High-utility conserved avian microsatellite markers enable parentage and population studies across a wide range of species

Background Microsatellites are widely used for many genetic studies. In contrast to single nucleotide polymorphism (SNP) and genotyping-by-sequencing methods, they are readily typed in samples of low DNA quality/concentration (e.g. museum/non-invasive samples), and enable the quick, cheap identification of species, hybrids, clones and ploidy. Microsatellites also have the highest cross-species utility of all types of markers used for genotyping, but, despite this, when isolated from a single species, only a relatively small proportion will be of utility. Marker development of any type requires skill and time. The availability of sufficient “off-the-shelf” markers that are suitable for genotyping a wide range of species would not only save resources but also uniquely enable new comparisons of diversity among taxa at the same set of loci. No other marker types are capable of enabling this. We therefore developed a set of avian microsatellite markers with enhanced cross-species utility. Results We selected highly-conserved sequences with a high number of repeat units in both of two genetically distant species. Twenty-four primer sets were designed from homologous sequences that possessed at least eight repeat units in both the zebra finch (Taeniopygia guttata) and chicken (Gallus gallus). Each primer sequence was a complete match to zebra finch and, after accounting for degenerate bases, at least 86% similar to chicken. We assessed primer-set utility by genotyping individuals belonging to eight passerine and four non-passerine species. The majority of the new Conserved Avian Microsatellite (CAM) markers amplified in all 12 species tested (on average, 94% in passerines and 95% in non-passerines). This new marker set is of especially high utility in passerines, with a mean 68% of loci polymorphic per species, compared with 42% in non-passerine species. 
Conclusions When combined with previously described conserved loci, this new set of conserved markers will not only reduce the necessity and expense of microsatellite isolation for a wide range of genetic studies, including avian parentage and population analyses, but will also now enable comparisons of genetic diversity among different species (and populations) at the same set of loci, with no or reduced bias. Finally, the approach used here can be applied to other taxa in which appropriate genome sequences are available.


Background
Microsatellite loci are suitable for a wide range of applications and have remained the most commonly used marker for studies of population structure and paternity since the early 1990s [1][2][3]. Microsatellites are likely to continue to be used for many years to come. They are comparatively cheap to genotype and provide more population genetic information per marker than biallelic markers such as single nucleotide polymorphisms (SNPs; [4]). A single set of microsatellite markers can be used to genotype several related species, but SNP markers lack cross-species utility, and are therefore only suitable for population and paternity studies where the project involves just a single species. Microsatellites can be successfully used for genotyping samples of low DNA concentration or low-quality samples (such as museum and non-invasive samples, e.g. feather, hair and faecal samples), in contrast to, for example, SNPs and genotyping-by-sequencing methods. A relatively large amount of DNA (typically 250 ng per individual) is usually required for SNP-typing versus >1 ng for microsatellite-based genotyping. Microsatellites have a wide range of other applications, and for some of these they have been found to be more suitable than SNPs, e.g. in genetic stock identification ([5], cf. [6]). They are the most convenient marker for establishing whether an individual (a plant, for example) is a clone of its parent. They enable investigation of ploidy in a species, which for many species remains unknown. Plants and insects can be haploid, diploid or tetraploid, etc. and in some cases, one sex may be haploid and the other diploid (e.g. some wasp species). Finally, microsatellites enable the rapid identification of cryptic species (e.g. [7]) and have been used successfully to identify species hybrids (e.g. [8,9]).
Unfortunately, like most markers, the isolation, development and validation of microsatellite markers can take time to complete and therefore prove costly. Due to their low abundance in birds compared to other taxa [10,11], enrichment protocols are routinely employed to isolate avian microsatellite loci. The enrichment and cloning of microsatellite sequences is a skilled task, and is, therefore, often out-sourced, to be performed at specialist research facilities or by commercial laboratories. The use of 454-pyrosequencing can increase the number of loci isolated (e.g. [12]) but this also has to be performed at a specialist facility and can therefore increase costs [13]. Several weeks are then usually required for the in-house stages of primer testing and validating markers.
Moreover, the development and selection of microsatellite markers using a single population from an individual species often results in ascertainment bias [14]. Thus, even when markers amplify in multiple species, they are often most polymorphic in the same population and/or species from which they have been isolated (e.g. [15][16][17][18][19]), preventing meaningful cross-species comparisons. Ideally, any marker type would be applicable to several species to enable cross-species comparisons and allow investigation of karyotype and genome evolution. The cross-species utility of microsatellites is higher than other types of markers. However, when microsatellites are developed in the traditional way, from a cloned single species, their utility is normally limited to closely-related taxa.
Since the early demonstrations of cross-species microsatellite amplification in birds (e.g. [20]), attempts have been made to identify a useful number of primer sets of high utility in a wide range of avian species. A small number of such primer sets of high cross-species utility have been identified (e.g. [21]; see also the BIRDMARKER webpage http://www.shef.ac.uk/nbaf-s/databases/birdmarker, [22]). Unfortunately, loci that are polymorphic are often rendered useless for genetic studies due to deviation from Hardy-Weinberg equilibrium and high null allele frequencies [23]. However, Durrant et al. [24] demonstrated, by testing the 34 TG conserved microsatellite markers developed by Dawson et al. [21], that it is possible to identify at least 20 validated polymorphic loci in species of Passeridae or Fringillidae (classification based on Sibley & Monroe [25]), with the term "validated" indicating that each locus, when assessed in a single population of unrelated individuals, adhered to Hardy-Weinberg equilibrium and had an estimated null allele frequency lower than 10%. Between 12 and 40 such validated markers are normally sufficient for parentage and population studies (e.g. [26][27][28]), although some analyses, such as heterozygosity-fitness correlations, may require larger numbers of loci [29,30]. A large number of zebra finch (Taeniopygia guttata) expressed sequence tag (EST) microsatellite loci have been identified as useful in the blue tit (Cyanistes caeruleus) and, due to the relatively large genetic distance between zebra finch and blue tit, these are expected to be of utility in multiple species of Paridae [31].
However, although sufficient conserved markers probably exist for paternity and population studies of most species of Paridae, Passeridae and Fringillidae, additional loci are required to combine with existing conserved markers and enable genetic studies and cross-species comparisons in the large majority of bird species (including over 5,000 passerines and 4,000 non-passerines, [25]). To identify highly conserved microsatellite loci in the avian genome, the ideal scenario would be to compare homologous sequences in the two most genetically distant avian species. The two most genetically distant bird groups are the ratites and non-ratites [32]. However, there are relatively few species of ratites (n = 57, [25]), none of which have as yet had their genomes sequenced (as of 10th February 2013). In an attempt to identify such highly-conserved microsatellite loci in the avian genome, Dawson et al. [21] previously compared homologous sequences in two very distantly related species, the zebra finch and chicken (Gallus gallus). The primer sequences of these loci were a complete match to both zebra finch and chicken and the marker names were therefore given the prefix "TG", representing the first letters of the binomial names of these two species, Taeniopygia guttata and Gallus gallus. The zebra finch and chicken are both non-ratites but belong to two distantly related groups of birds and have the highest recorded genetic distance for any two bird species based on DNA:DNA melting temperature (ΔTm) hybridisation distances (28.0, [33]). Both of these species have now had their whole genomes sequenced and assembled (see http://www.ensembl.org).
Dawson et al. [21] identified loci that amplified in all non-ratite bird species, a high proportion of which were polymorphic in most species tested. This earlier study utilised microsatellites mined from zebra finch EST sequences with very strong similarity to their chicken homologue, but where the repeat region in zebra finch was not necessarily present in its chicken homologue. The longest uninterrupted string of dinucleotide repeat units in the sequenced zebra finch and chicken alleles was low for most loci (zebra finch: n = 3-15, mean 8 repeats; chicken: n = 0-13, mean 6 repeats). For the markers developed in this way, the proportion of loci polymorphic in a species was inversely related to the genetic distance from the "source" species, with the "source species" being regarded as zebra finch, the species that contained the most uninterrupted microsatellite repeat units. Passerine species were regarded as those with a genetic distance of 12.8 or less from zebra finch based on DNA:DNA melting temperature (ΔTm) hybridisation distances [25]. On average, 47% of those TG loci amplifying were polymorphic in passerines and 22% in non-passerines (zebra finch and chicken data excluded; [21]). The variability of a locus is related to the number of repeats it possesses [34]. The decrease in polymorphism with increasing genetic distance may have been due to a correlated reduction in the number of repeat units in the target species compared to the source species. In this new study, we have attempted to identify markers that are polymorphic in a larger range of species. We followed the approach of Dawson et al. [21] by identifying highly similar homologous sequences in two distantly related species (zebra finch and chicken).
However, here we (1) selected homologous sequences in which both species contained repeat motifs, (2) attempted to align sequences that contained more repeat units than in the earlier study (≥ 8, in both species) and (3) we searched the whole genome for conserved microsatellite loci (i.e. not just for microsatellites in EST sequences, as performed by Dawson et al. [21]). Microsatellites with more repeat units generally have higher mutation rates [35,36] and are therefore expected to be more variable. The use of the whole genome was expected to increase the number of useful loci identified due to the huge increase in the number of microsatellite sequences that were now available. It is unclear if the source origin of the sequence (i.e. anonymous genomic sequence versus EST) would be expected to have any influence on locus variability. There is evidence that there is no difference between the variability of microsatellite markers developed from non-EST and EST sequences but other studies suggest non-EST markers may be more variable than those from ESTs (cf. [37][38][39]). We developed a set of conserved markers for 24 loci using the stated criteria and assessed their utility across a wide range of avian species. Additionally, we compared the utility of the new marker set to that of the previously-developed conserved marker set [21].

Identification of microsatellite loci in the zebra finch and chicken genome
In order to identify microsatellite sequences we searched the contigs and supercontigs of the unassembled zebra finch genome (now assembled and published by [40]) and the assembled chicken genome version 2.1 [41], using a version of the SPUTNIK software modified by Cornell University (http://wheat.pw.usda.gov/ITMI/EST-SSR/LaRota/, [42]). We identified sequences containing any dinucleotide repeat regions (CA, GA, AT, GC or their complements) which had more than ten repeats and which were at least 90% pure (i.e. >18 bp long; Table 1). We extracted 200 bp of sequence flanking either side of the repeat region, or all of the available sequence if it was less than 200 bp.
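The search-and-extract step can be illustrated with a minimal Python sketch. This is not the SPUTNIK algorithm: it finds only perfect (100% pure) dinucleotide tracts with a regular expression, whereas the search described above also accepted tracts that were at least 90% pure. The function name, the dictionary keys and the example sequence are hypothetical, and the repeat threshold is left as a parameter.

```python
import re

# Motifs named in the text (CA, GA, AT, GC) plus reverse complements (TG, TC).
# AT and GC are their own reverse complements, so they appear once.
MOTIFS = ("CA", "TG", "GA", "TC", "AT", "GC")


def find_dinucleotide_repeats(seq, min_repeats, flank=200):
    """Return perfect dinucleotide tracts of at least min_repeats units.

    Each hit carries up to `flank` bases on either side of the tract, or
    all available sequence when less than `flank` bases are present.
    Tracts are reported in the phase of the listed motif.
    """
    seq = seq.upper()
    hits = []
    for motif in MOTIFS:
        pattern = re.compile("(?:%s){%d,}" % (motif, min_repeats))
        for m in pattern.finditer(seq):
            start, end = m.span()
            hits.append({
                "motif": motif,
                "repeats": (end - start) // 2,
                "start": start,
                "left_flank": seq[max(0, start - flank):start],
                "right_flank": seq[end:end + flank],
            })
    return sorted(hits, key=lambda h: h["start"])


# Hypothetical toy sequence: ten CA units between short flanks.
example = "GATTACGG" + "CA" * 10 + "TTGACCGT"
hits = find_dinucleotide_repeats(example, min_repeats=8, flank=5)
```

Here the single hit is a CA tract of ten units, returned with five bases of flank on each side. Handling imperfect repeats would need a scoring scheme of the kind SPUTNIK uses, which is beyond this sketch.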

Identification of highly-conserved microsatellite loci
The length of the sequence compared against another affects the strength of the E-value obtained. The zebra finch sequences extracted and used for the BLAST sequence comparison to chicken were 421-487 bp long (Table 2). We attempted to create a zebra finch-chicken consensus primer set for all zebra finch microsatellite sequences that exhibited an NCBI BLAST E-value of E-59 or better (lower) when compared to their chicken microsatellite homologue (Table 2). BLAST E-value scores were obtained using standalone blastN (version 2.2.8 of Blast for 32-bit Windows; [43]).
Table footnotes: *, the zebra finch sequences were isolated by a search of the unassembled contigs and super contigs of the zebra finch genome and the chicken sequences were isolated by a search of the assembled chicken genome (v2.1). The sequence of each locus is provided in Additional file 2. bp, base pairs; ZF, zebra finch Taeniopygia guttata; CH, chicken Gallus gallus; F, forward primer sequence; R, reverse primer sequence; ¥, The forward and reverse primer sequences match 100% to zebra finch and 86-100% to chicken Gallus gallus when the degenerate bases are accounted for. The degenerate bases used in the primer sequences shown in bold and underlined, R = A or G, Y = C or T, M = A or C, S = C or G, W = A or T, K = G or T; Ψ, calculated by dividing the number of bases matching chicken (after accounting for the degenerate bases) by the total length of the primer sequence; Ŧ, assessed for (a) similarity to sequences in the NCBI nucleotide EST and nr/nt databases identified using blastn (distant homologies) settings and (b) for similarity to protein coding regions in the CH & ZF assembled genomes which was identified by the presence of exons within 5 kb of the source sequence (searches performed 30/09/2011). Details of the sequence homologues found are provided in Additional file 6.
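The table footnote defines the primer-to-chicken similarity (Ψ) as the number of primer bases matching chicken, after accounting for the degenerate bases, divided by the primer length. A minimal Python sketch of that calculation follows. The function name and both example sequences are hypothetical, and the primer and target are assumed to be already aligned, of equal length and in the same orientation.

```python
# Two-fold degenerate codes as listed in the table footnote.
IUPAC = {"R": "AG", "Y": "CT", "M": "AC", "S": "CG", "W": "AT", "K": "GT"}


def percent_match(primer, target):
    """Percentage of primer positions compatible with the target base.

    A degenerate code counts as a match when the target base is one of
    the bases that the code stands for.
    """
    if len(primer) != len(target):
        raise ValueError("primer and target must have equal length")
    matches = sum(
        1
        for p, t in zip(primer.upper(), target.upper())
        if t in IUPAC.get(p, p)
    )
    return 100.0 * matches / len(primer)


# Hypothetical 10-base primer with one degenerate base (R) and a
# hypothetical chicken target that differs at the final position.
similarity = percent_match("ACGRTTCAGT", "ACGGTTCAGA")
```

In this toy case the degenerate R is compatible with the target G, one ordinary base mismatches, and the similarity is 9 of 10 positions (90%).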

Creation of a consensus hybrid sequence and primer design
Consensus zebra finch-chicken sequences were created by aligning homologous sequences using MEGA3 software [44] and replacing mismatching bases and gaps with the code "n" to represent an unknown base. We used the zebra finch-chicken consensus microsatellite sequences to design primer sets using PRIMER3 software [45]. The primer sequences were designed from the consensus zebra finch-chicken hybrid sequence including "n" at those base pair locations where the zebra finch and chicken bases did not match. When necessary, we altered the "General Primer Picking Conditions" and set the "Max #N's" parameter (maximum number of unknown bases (N) allowable in any primer) to "1" or "2" so that degenerate bases (if needed) could be included in the primer sequence. Primers were selected to have a melting temperature between 57 and 63°C and the maximum allowable difference in the melting temperature between the forward and reverse primer was set as 1.0°C. However, it should be noted that the melting temperature assigned to an unknown "n" base by PRIMER3 is an average of all four bases and not the melting temperature of any actual base. The real melting temperature of primer sequences including degenerate bases will be different to that requested in the PRIMER3 selection criteria and also stated in the PRIMER3 output. The actual melting temperature will therefore be 0.88/2.18°C higher than that stated if the actual base at the location of the degenerate base was a G/C and 0.55/2.41°C lower if an A/T. We manually selected the primer-binding sites to be positioned in regions where the sequences were highly similar between zebra finch and chicken and attempted to include as few degenerate bases as possible, but most primers (encompassing 18 pairs) required the inclusion of degenerate bases. These degenerate bases were placed at the sites where a base mismatch occurred between the zebra finch and chicken sequence in an attempt to make the primer sequences amplify in multiple species.
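The consensus rule described above (mismatching bases and gaps replaced with "n") can be sketched in a few lines of Python. This assumes the two sequences have already been aligned, for example by exporting the alignment from the alignment software, and that gaps are written as "-". The function name and the toy sequences are hypothetical.

```python
def consensus_with_n(aligned_a, aligned_b, gap="-"):
    """Build a consensus from two aligned sequences of equal length.

    Positions where the bases agree are kept; mismatches and any
    position containing a gap are written as "n" (unknown base).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    out = []
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == b and a != gap:
            out.append(a)
        else:
            out.append("n")
    return "".join(out)


# Hypothetical aligned fragments: one mismatch (G/T) and one gap.
hybrid = consensus_with_n("ACGT-A", "ACTTGA")
```

For the toy input the result is "ACnTnA". A sequence of this form can then be passed to primer-design software that tolerates a limited number of unknown bases per primer.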
We used a maximum of two degenerate bases per primer and a maximum of three per primer pair (Table 2). With two degenerate bases per primer the difference in true melting temperatures versus those calculated by PRIMER3 ranges from a maximum of −4.82°C (n × 2 versus T × 2) to +4.36°C (n × 2 versus G × 2). The (multiple) different combinations of alternative primer sequences due to the inclusion of degenerate primer bases were not checked for adherence to PRIMER3 primer design criteria prior to ordering the primer sets due to the complexity of performing this task. The forward primer of each primer set was labelled with either a HEX or 6-FAM fluorescent dye ( Table 2). The loci were named with the prefix CAM representing "Conserved Avian Microsatellite".
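The quoted range follows from simple arithmetic on the per-base offsets given above, and can be checked with a short Python sketch. It assumes, as the quoted figures do, that the offsets of individual degenerate positions add up linearly. Only the two extreme per-base values (2.18°C higher, 2.41°C lower) are used; the constant and function names are hypothetical.

```python
# Extreme per-position offsets (degrees C) quoted in the text for the true
# melting temperature relative to the value reported for an unknown "n" base.
MAX_RISE_PER_BASE = 2.18   # largest increase for a single degenerate position
MAX_DROP_PER_BASE = 2.41   # largest decrease for a single degenerate position


def tm_offset_range(n_degenerate):
    """Return (lowest, highest) possible offset for n degenerate positions.

    Assumes the per-position offsets are additive.
    """
    return (-MAX_DROP_PER_BASE * n_degenerate,
            MAX_RISE_PER_BASE * n_degenerate)


lowest, highest = tm_offset_range(2)
```

With two degenerate bases this reproduces the stated extremes of -4.82°C and +4.36°C; with one degenerate base the range narrows to -2.41°C to +2.18°C.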

Genome locations
All of the sequences were assigned chromosome locations in the zebra finch and chicken genomes by performing a BLAT search against each genome, using the masked genome and the distant homologies settings implemented on the ENSEMBL webpage (http://www.ensembl.org/Multi/blastview; methods as in [46,47]; Table 3, Figure 1). The genome assemblies used were the Taeniopygia_guttata-3.2.4 (v 1.1), released 14 July 2008 [40] and the chicken genome assembly version 2.1 [41]. The locations of the loci were displayed using MAPCHART software [48].

Cross-species amplification and polymorphism
The 24 primer sets developed were used to genotype a minimum of four individuals from each of eight species of Passeriformes and one species each of Ciconiiformes (Charadriiformes), Strigiformes, Coraciiformes and Galliformes (including zebra finch and chicken; classification following Sibley & Monroe [25]). The species tested covered a wide range of genetic distances from the zebra finch (species identities and sample sizes are provided in Table 4). All individuals had been sampled in the wild with the exception of the zebra finch and chicken individuals (Table 4). The latter were sampled from captive populations maintained at the University of Sheffield and the United States Department of Agriculture (Agriculture Research Service, East Lansing, USA), respectively. For each species, all individuals genotyped were unrelated as far as known, except for the chickens and European rollers. All four chickens were siblings and three of the European rollers were siblings. The chicken individuals genotyped were four siblings from the East Lansing mapping population, which consists of fifty-two BC1 animals derived from a backcross between a partially inbred jungle fowl line and a highly inbred white leghorn line [49]. These individuals, therefore, will display a maximum of four alleles per locus, but often fewer. Additionally, a higher proportion of the chicken siblings might be expected to be heterozygous than in a wild population because the mother and father of the chicken pedigree originated from different breeds. Polymorphism in chickens at the TG and CAM loci was omitted from analyses for three reasons: (1) the chicken individuals tested belonged to a backcrossed mapping pedigree; (2) all the other species tested were comparable, being all at a genetic distance of 28 from chicken (genetic distance: DNA:DNA melting temperature (ΔTm) hybridisation distance, [33]); and, finally, (3) the primer sets had been engineered more specifically to amplify in chicken than in the other species tested. The European rollers genotyped initially included four nestlings sampled from two nests (including three siblings from one nest). When the loci that failed to amplify were rechecked, unrelated European roller individuals were used. All individuals genotyped were sampled from a single population, except the Leach's storm-petrels, for which the six individuals were sampled from four populations, and Berthelot's pipits, for which each of the four individuals sampled was from a different population.
Table footnotes: Exp. length in ZF or CH (bp), expected PCR product size based on the pure zebra finch (ZF) or pure chicken sequence (CH); those expected allele sizes in parentheses assume that a product is amplified in spite of the additional mismatches between the primer bases and the chicken genome. Minimum expected allele size in ZF or CH (bp), is based on the same sequences as above but after the deletion of the repeat region and repeat-like regions; n, number of individuals genotyped (of species stated); #A, number of alleles observed in the individuals genotyped; €, same two alleles amplified in all individuals. Based on difference between the expected and observed allele sizes we suspected a different locus is amplifying in chicken; Table 2. Those loci underlined are less than 5 Mb apart and may display linkage disequilibrium.
Approximately 20-50 μl of blood was collected from each individual and stored in 1.5 ml of absolute ethanol in rubber-sealed screw-topped microfuge tubes. Genomic DNA was extracted using an ammonium acetate precipitation method [50] or a salt extraction method [51]. Each DNA extraction was tested for amplification and sex-typed using the Z-002 [52] or (for the Berthelot's pipit and the European roller) P2/P8 [53] sex-typing markers.
Each primer set was tested in isolation (single-plexed) in all species. Primer sets (using the zebra finch version of the primer sequence) were checked for their potential to form hairpins and to identify any PCR incompatibilities due to primer sequence similarity using AUTODIMER software ([54], http://www.cstl.nist.gov/strbase/software.htm) using a 'conservative minimum threshold score' of seven.
Single-plex PCR reactions were performed in 2-μl volumes using QIAGEN Multiplex PCR Master Mix (QIAGEN Inc.) for all species except the European roller and its reruns. Each 2-μl PCR contained approximately 10 ng of lyophilised genomic DNA, 0.2 μM of each primer and 1 μl QIAGEN Multiplex PCR Master Mix [55]. For all species, PCR amplification was performed in the same laboratory in Sheffield using a DNA Engine Tetrad 2 thermal cycler (model PTC, MJ Research, Bio-Rad, Hemel Hempstead, Herts, UK). PCR amplification was performed using an annealing temperature of 56°C or a touchdown PCR program (Table 4). Slightly different PCR protocols were used for some species, since they were performed by different researchers at different times and using different DNA Taq polymerases (Table 4). However, these differences are not expected to have any measurable effect. The European roller amplifications were performed in a 10-μl PCR reaction that contained approximately 20 ng of genomic DNA, 0.5 μM of each primer, 0. . Products were diluted 1 in 500 prior to separation on an ABI 3730 48-well capillary DNA Analyser and allele sizes were assigned using GENEMAPPER v3.7 software (Applied Biosystems, California, USA). The same DNA Analyser at Sheffield was used for separating the amplified products for all species. Alleles were scored separately for each species, using species-specific allele bin sets, in different sessions by different researchers but in the same laboratory and using the same methods (details in Table 4).
Previous work has shown that it is worth retesting any markers that fail to amplify at the first PCR attempt [21]. All markers that failed to amplify were therefore rechecked by performing a repeat PCR. When the 24 markers were initially tested, a maximum of six markers (25%) failed to amplify in a single species; however, the majority amplified at the second PCR attempt (Table 4 and Additional file 1).
For four species, Berthelot's pipit, rifleman, Leach's storm petrel and European roller, a proportion of the CAM and TG loci [21] were assessed in a larger sample of unrelated individuals (n = 17-30) from a single population in order to check for Hardy-Weinberg equilibrium and estimate null allele frequencies (calculated using GENEPOP v4.0.10, [56] and CERVUS v3.0.3, [57]). The characteristics of the CAM and TG marker sets were then compared for these four species, in terms of the number of loci deviating from Hardy-Weinberg equilibrium and the proportion possessing high null allele frequency estimates.
All statistical analyses were carried out in R version 2.14.1 [58]. Differences in the proportions of polymorphic loci across passerines and non-passerines, and between CAM and TG loci, were tested using chi-squared (χ²) tests. Linear regression was used to test whether the percentage of polymorphic loci per species was related to the genetic distance from zebra finch.

Results and discussion
Identification of microsatellite sequences in the zebra finch and chicken genomes
There were similar total numbers of dinucleotide microsatellite sequences of eight or more repeats in the zebra finch and chicken genomes (6,458 versus 6,581, respectively; Table 1). Hits to the "unknown" chromosome were not included, since duplicate sequences have been observed on both the named chromosomes and the 'unknown' chromosome and these occurrences are probably artefacts of the assembly process (DAD pers. obs.). It should also be noted that a male was sequenced to obtain the zebra finch genome, whereas a female was used for the chicken, so that only the chicken genome includes sequence derived from the W chromosome. However, due to the small size of the W chromosome (representing only 0.02% of the assembled chicken genome), its inclusion is not expected to influence significantly the total number of microsatellites detected.
Only one chicken and no zebra finch microsatellites were found that contained a GC/CG motif, suggesting that these motif types are rare and/or shorter than eight units in length in the avian genome. Although the total numbers of microsatellite loci were similar between the zebra finch and chicken, the zebra finch possessed a higher proportion of AT/TA repeats, and fewer CA/GT and GA/CT motifs, than chicken (Table 1; heterogeneity test, χ² = 381.6, d.f. = 2, p < 0.0001). These differences were unexpected and the reasons for them are currently unknown.
Table footnotes: Genetic distance to ZF, genetic distance from species tested to zebra finch based on [33] and the classification of [25]; Genetic distance to CH, genetic distance from species tested to chicken [33].

Identification of highly conserved microsatellite loci
Forty-two homologous microsatellite loci were identified in both the zebra finch and chicken, with each pair having a BLAST E-value better than E-59. None of these newly identified conserved sequences matched any of the conserved EST-based microsatellite loci for which primer sets had already been developed by Dawson et al. [21]. The conserved loci possessed the following dinucleotide motifs: CA/GT motif (n = 22), AT/TA (n = 16) and GA/CT (n = 4). The distribution of motif types in the conserved loci did not differ from expectation based on their frequencies in the zebra finch (heterogeneity test, χ² = 5.42, d.f. = 2, p = 0.07) or chicken genome (heterogeneity test, χ² = 2.95, d.f. = 2, p = 0.23; Table 1). All 42 zebra finch sequences were aligned with their chicken homologues in an attempt to create a consensus hybrid sequence.
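The p-values of the two heterogeneity tests above can be checked from the reported statistics alone. For a chi-squared distribution with exactly two degrees of freedom, the upper-tail probability has the closed form exp(-χ²/2), so no statistics library is needed. The following Python sketch applies that identity; the function name is hypothetical and the shortcut holds only for d.f. = 2.

```python
import math


def chi2_upper_tail_df2(statistic):
    """Upper-tail probability of a chi-squared statistic with 2 d.f.

    With two degrees of freedom the chi-squared distribution is an
    exponential distribution with mean 2, so P(X >= x) = exp(-x / 2).
    """
    if statistic < 0:
        raise ValueError("chi-squared statistic cannot be negative")
    return math.exp(-statistic / 2.0)


p_zebra_finch = chi2_upper_tail_df2(5.42)   # conserved loci vs zebra finch genome
p_chicken = chi2_upper_tail_df2(2.95)       # conserved loci vs chicken genome
```

Rounded to two decimal places these give 0.07 and 0.23, in agreement with the values reported. For other degrees of freedom a general chi-squared distribution function would be required.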

Creation of a consensus hybrid sequence and primer design
Consensus primer sets were created for 24 of the 42 unique loci identified (57%) using the primer design criteria outlined above (Tables 1 & 2; full sequences of the loci are provided in Additional file 2). In contrast to Dawson et al. [21], we were not able to create primer sets that were always 100% homologous to chicken but all matched 100% to zebra finch, and were at least 86% similar to their homologous chicken sequences (by including 1-2 degenerate bases in 25 primers). Only a single degenerate base in just one primer was required in the earlier EST study, which then matched 100% to both species (34 primer sets; [21]). Many more degenerate bases were used in the CAM marker set than in the earlier TG marker set (CAM: 28 degenerate bases spread over 18 of the 24 markers; TG: one degenerate base in one of the 34 markers; this study versus Dawson et al. [21]). Only six CAM consensus sequences contained regions of microsatellite-flanking sequence that were identical in zebra finch and chicken for a sufficient length from which to design primers without using any degenerate bases (CAM-06, CAM-13, CAM-17, CAM-18, CAM-20 and CAM-24; Table 2). The remaining 18 primer sets contained one or two degenerate bases per primer sequence (a maximum of 3 degenerate bases per primer pair) and, of these, only six were 100% matches to both zebra finch and chicken, when accounting for the degenerate bases used. We attempted to design primers that matched the consensus sequence as closely as possible. The primer sequences of the remaining 12 degenerate primer sets were a 100% match to zebra finch and a match to chicken of between 86 and 96%. As expected, all 24 loci possessed dinucleotide motifs in chicken and zebra finch, with the majority being the CA/GT motif (n = 16), although some had AT/TA (n = 4) and GA/CT (n = 4) motifs. The same motif type was present in both chicken and its zebra finch homologue at all 24 loci (Table 3). Most loci possessed several different dinucleotide repeat regions and some also possessed additional mononucleotide repeat regions in the sequence (Table 3).
When the longest string of uninterrupted dinucleotide repeats at each orthologous locus was compared between chicken and zebra finch there was a significant difference in the number of repeat units (paired t-test, t = 2.18, d.f. = 23, P = 0.04; 15 loci had fewer repeats in chicken, six had more and three the same number of repeat units; Table 3). The 24 selected loci possessed a minimum of eight uninterrupted dinucleotide repeat units (in both species) and a maximum of 27 in zebra finch and 20 in chicken (Table 3).
No hairpins were detected in any primer sequences when analysed using only the pure zebra finch version of each primer (assessed using AUTODIMER software). Three pairs of primer sequences displayed some degree of similarity and should be avoided as potential multiplex combinations to prevent the risk of forming primer dimers (CAM-02R-CAM-15R, CAM-03R-CAM-20F and CAM-05R-CAM-06R). However, the check for primer similarity (using AUTODIMER software) is of limited utility when checking primers containing degenerate bases because the degenerate bases are regarded as unknown bases and some unidentified primer pairs may turn out to be incompatible. We therefore recommend typing the loci both singly and in multiplex PCR reactions to confirm that the genotypes match before routinely using any multiplex set, especially when the primer sequences contain degenerate bases. When up to three degenerate bases are used, as in this study, the maximum number of forward and reverse sequence combinations per primer set is eight and the resulting variation in annealing temperatures between the forward and reverse primers might potentially cause PCR amplification problems. We recommend designing primer sets for standard microsatellite loci using PRIMER3 with a maximum difference between the forward and reverse primer melting temperature of 0.5°C. However, a difference of up to 2°C has been found to be acceptable for the amplification of many primer sets (e.g. [59]). Unreliable PCR amplification of these loci is most likely in the non-passerine species, as they are more genetically distant from zebra finch and are therefore more likely to exhibit base mismatches in the primer binding regions. Incomplete PCR amplification can be identified by testing a range of annealing temperatures, performing repeat PCRs and/or the typing of a pedigree (if available), and, if detected, can be improved by PCR optimisation methods.
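The figure of eight sequence combinations per primer set follows from each degenerate code used here standing for two bases, so three degenerate positions give 2 × 2 × 2 = 8 combinations. A minimal Python sketch that enumerates the concrete sequences is given below. The primer sequences are hypothetical and are not taken from the CAM marker set, and the function name is likewise an assumption.

```python
from itertools import product

# Two-fold degenerate codes used in the primer sequences.
IUPAC = {"R": "AG", "Y": "CT", "M": "AC", "S": "CG", "W": "AT", "K": "GT"}


def expand_primer(primer):
    """List every concrete sequence represented by a degenerate primer."""
    choices = [IUPAC.get(base, base) for base in primer.upper()]
    return ["".join(bases) for bases in product(*choices)]


# Hypothetical pair: one degenerate base forward, two degenerate bases reverse.
forward_variants = expand_primer("ACGRTTC")
reverse_variants = expand_primer("GGYATKCA")
n_pair_combinations = len(forward_variants) * len(reverse_variants)
```

The forward primer expands to two sequences and the reverse primer to four, giving eight forward and reverse combinations. Each expanded sequence could then be checked individually for melting temperature or primer-dimer formation, a step that was not carried out for the primer sets described here.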

Homology to expressed and coding sequence
Highly conserved microsatellites have been successfully isolated from ESTs [21]. The majority of the 24 CAM sequences (17/24) were found to be homologous to avian ESTs, avian (or mammalian) mRNA sequences or known genes (identified by sequence similarity searches of the GenBank nr, EST ("EST_others") nucleotide databases and the zebra finch and chicken genomes; Table 2). Some of the microsatellite sequences were located within exons, which may explain why these sequences are conserved among many species.

Genome locations and linkage
All 24 loci could be assigned a location in both the zebra finch and chicken genome based on sequence similarity. Twenty-three loci were assigned to an autosomal location and one locus (CAM-11) was assigned to the Z chromosome in both species (Figure 1). Two pairs and one triplet of loci were assigned locations less than 5 Mb apart in both the chicken and zebra finch genomes; there is therefore an increased possibility of these loci being in linkage disequilibrium because recombination rates between them will be relatively low (Figure 1). Loci CAM-02 and CAM-13 were not typed in the house sparrow pedigree, so they could not be checked for linkage to the other locus located on the same chromosome (CAM-03 and CAM-23, respectively).

Cross-species amplification
All loci amplified in both zebra finch and chicken (Tables 3 & 4, Figure 2). The ranges of allele sizes obtained by genotyping zebra finches and chickens were close to those expected based on the respective genome sequences, with the exception of locus CAM-09 in chicken. The maximum difference between the expected allele size and the allele size range observed for each species was 11 bp (except CAM-09 in chicken; Table 3); since the source genome sequence was isolated from an individual belonging to a different population to the individuals genotyped, small allele size differences (such as 1-20 bp) are expected. Locus CAM-09 was 101 bp smaller in chicken than expected; however, this marker remains of potential utility in other species. We suspect that a deletion may have occurred in the chicken (breed/population) genotyped, or that a different locus is being amplified, possibly due to poor similarity of the CAM-09 primer sequences to chicken (three degenerate bases were used (one in the forward primer and two in the reverse) but, despite this, three bases in the forward primer and two in the reverse still did not match chicken 100%; Table 2). It was surprising that, despite up to three chicken-primer base mismatches per primer sequence (in addition to the presence of up to two degenerate bases), and the differences in primer annealing temperatures in different species caused by this (Additional file 3), all the primer sets amplified in chicken. Amplification may have been assisted by the use of a touchdown PCR program and the use of the QIAGEN Multiplex PCR Master Mix, which enhances the likelihood of successful PCR amplification from primers with differing annealing temperatures. For the majority of loci (including CAM-09), the sizes of the alleles observed in the ten other species tested were very similar to those expected and observed in zebra finches (and/or chickens, except CAM-09) (Additional file 1). It is expected that for each species a few loci will not possess high sequence similarity and, because the identity of those not possessing sequence similarity is different in each species, this does not present a problem. We compared sequences to the recently released collared flycatcher (Ficedula albicollis) and budgerigar (Melopsittacus undulatus) genome sequences (http://www.ensembl.org/index.html; Dawson et al. unpublished data). A homologue was identified in each case and all contained a microsatellite repeat (including CAM-09; CAM-24 cannot be checked because it cannot be identified in the available assemblies). This suggests the correct target locus was being amplified in the majority of species-marker tests. The degree of sequence similarity between distantly related species affects the range of species that will amplify [60]. Those markers designed from sequences with high similarity between distantly related species (i.e. those with an E-value of E-80 or better between zebra finch and chicken) have been found to amplify in virtually all birds [21]. Dawson et al. [21] used a different BLAST program (WU-BLAST) when assessing loci for potential cross-species utility. However, the BLAST E-values obtained via WU-BLAST and NCBI BLAST (as used for this study) for the same sequence are normally very similar (DAD unpublished data).
Figure 2 Percentage of CAM loci amplified (white squares) and polymorphic (black circles), alongside genetic distance from the zebra finch (grey triangles) for 12 species. % Polymorphic, proportion of loci polymorphic of those amplifying for each set of loci. Four individuals were genotyped for each species at 24 loci. Genetic distance, DNA:DNA ΔTm hybridisation distance [33].
During this study we utilised sequences with a lower similarity between zebra finch and chicken (those displaying a BLAST E-value better than E-59). This weaker cut-off was necessary to enable the identification of homologous sequences that possessed eight repeats in both zebra finch and chicken but the trade-off was that in most cases the poorer similarity made it impossible to design primers that were a complete match to both zebra finch and chicken. The reduced primer similarity to chicken was expected to lower the utility of these markers in species distant to zebra finch but it was hoped that, for those species close to zebra finch (passerines), a high number of polymorphic loci would be identified. On average, 94% of loci amplified in each of the seven passerine species tested (range 83-96%) and 95% amplified in each of three non-passerine species (range 92-96%; zebra finch and chicken data excluded, Table 4, Figure 2). The number of loci that amplified within each species was not related to their genetic distance from the zebra finch ( Figure 2).
When all 24 CAM markers were considered as a whole, the proportion of loci polymorphic per species was negatively correlated with genetic distance from the zebra finch (Figure 2), as was also previously found for the TG loci [21], despite the fact that the CAM loci displayed a repeat region of at least eight repeat units in chicken (chicken excluded; CAM loci: F = 27.55, d.f. = 1, 9, R² = 0.73, P = 0.0005; TG loci: F = 15.30, d.f. = 1, 17, R² = 0.44, P = 0.001; Figure 3A). Additionally, the mean number of alleles per polymorphic locus decreased with increasing genetic distance from the zebra finch (chicken excluded; F = 22.99, d.f. = 1, 9, R² = 0.68, P < 0.001; Figure 4A). These regressions remained significant after controlling for differences between passerines and non-passerines, and when a phylogenetic correction was used (data not shown), indicating that the effect of genetic distance on polymorphism was a linear effect rather than a group effect. Approximately 20% more of the loci that amplified were polymorphic per species than was achieved previously by studies attempting to create conserved avian microsatellite loci. Each marker displayed a varying degree of cross-species utility (Figure 5, Additional file 4), possibly due to the differing degree of primer sequence similarity to chicken (Table 4, Additional file 3). In order to investigate this, we selected two subsets of six CAM markers: (Set 1) those that were a 100% match to chicken (and zebra finch) and possessed no degenerate bases (CAM-06, CAM-13, CAM-17, CAM-18, CAM-20 and CAM-24) and (Set 2) those which displayed poor similarity to chicken (but a 100% match to zebra finch; CAM-03, CAM-04, CAM-10, CAM-15, CAM-21 and CAM-23) and analysed these two groups separately. For Set 1 (the highly conserved markers), there was no relationship between the percentage of species polymorphic and genetic distance from zebra finch (linear regression: R² = 0.11, d.f. = 10, P = 0.15, zebra finch and chicken excluded; Figure 3B). This appears to be a result of more markers in this set being polymorphic in those species distant to zebra finch (Figure 3B). However, in Set 2 (the more weakly conserved markers), the percentage polymorphism declined significantly with genetic distance from zebra finch (linear regression: R² = 0.75, d.f. = 10, P = 0.0002, zebra finch and chicken excluded; Figure 3C). Set 2 also displayed a decrease in the mean number of alleles with increasing genetic distance from zebra finch (R² = 0.8, d.f. = 10, P = 0.0002; Figure 4C), whereas in Set 1 there was no such fall (R² = 0.07, d.f. = 10, P = 0.42; Figure 4B). In order to identify why markers with poor primer sequence similarity to chicken displayed a fall in variability as genetic distance increased, we checked both sets of loci for sequence similarity with the collared flycatcher and budgerigar genome sequences. These species are both useful for this investigation because their genetic distance from chicken is the same as that of the other species used in this study (genetic distances (ΔTm): collared flycatcher-chicken = 28 and budgerigar-chicken = 28; collared flycatcher-zebra finch = 11.7 and budgerigar-zebra finch = 23.1; [33]). We checked how many bases in each primer sequence mismatched with their zebra finch and chicken homologue and how the repeat regions varied between the species. This revealed that only two primer sets in Set 1 and one in Set 2 completely matched flycatcher, but the number of bases mismatching in each primer set was quite low in both groups (a maximum of three mismatches per primer set, except for CAM-06 and CAM-21).
In the more distant budgerigar, when the weakly-conserved markers of Set 2 were analysed, there were more mismatches per primer set than observed in the flycatcher: four markers had over three bases mismatching per primer set, one marker had one mismatch and for only one marker did both the forward and reverse primer sequences completely match budgerigar. In contrast, in the strongly-conserved marker Set 1, for the five homologous loci that could be identified (i.e. all except CAM-24), all primer sets were a complete match to budgerigar. It was surprising that the primer sequences of the markers in Set 1 displayed higher similarity to budgerigar than flycatcher. All loci in both sets contained at least five uninterrupted repeats in both species, except CAM-03 in budgerigar (CAM-24 could not be checked). There was no relationship between the mean number of repeats possessed and the number of bases mismatching in the primer sequences (mean number of repeats in Set 1 versus Set 2, flycatcher: 11 versus 11, budgerigar: 6 versus 7). This suggests that primer sequence similarity is the main factor affecting the identification of a polymorphic locus in this set of 24 CAM markers. Based on the number of repeats observed in budgerigar, other CAM loci would be expected to be polymorphic in non-passerines but the primers appear to be amplifying only one of the alleles (19 loci had more than 5 repeats in budgerigar, with a maximum of 11 repeats observed; CAM-24 could not be checked). Perhaps, in distantly related species, mismatches between the target sequence and primer sequence result in amplification failure of some alleles due to large differences in the melting temperatures between the forward and reverse primer and between these and the PCR annealing temperature used. These base mismatches and mismatched melting and annealing temperatures may lead to only a single allele (with highest similarity to the primers) being amplified during the PCR.
Figure 3 Percentage of markers polymorphic in relation to genetic distance from zebra finch. B: Six CAM markers with 100% primer sequence similarity to chicken (and zebra finch); C: Six CAM markers with poor primer sequence similarity to chicken (but 100% identical to zebra finch). Percentage markers polymorphic, proportion of loci polymorphic of those amplifying for each set of loci (CAM and TG sets). Genetic distance, DNA:DNA ΔTm hybridisation distance [33]. Four individuals were genotyped at 24 loci for each of the 11 species (including zebra finch Taeniopygia guttata but excluding chicken Gallus gallus; see text).
Figure 4 Allelic richness (mean number of alleles per polymorphic locus) of the CAM markers in relation to genetic distance from zebra finch.* A: All 24 CAM markers included; B: Six CAM markers with 100% primer sequence similarity to chicken (and zebra finch); C: Six CAM markers with poor primer sequence similarity to chicken (but 100% identical to zebra finch). Genetic distance, genetic distance of the genotyped species from zebra finch (Taeniopygia guttata), DNA:DNA ΔTm hybridisation distance [33]. *Four individuals were genotyped at 24 loci for each of 11 species (including zebra finch Taeniopygia guttata but excluding chicken Gallus gallus; see text).
It is unclear why the primer set does not simply fail to amplify a product, but perhaps the use of QIAGEN Multiplex PCR Master Mix reaction buffer enables amplification even when a primer set has poor similarity to the target. Alternatively, perhaps those displaying poor similarity to chicken are amplifying a different (invariant) locus in many of those species distant to zebra finch, although this seems unlikely based on the agreement between the observed and expected allele size for each locus. The six well-
We deduce that there are several important factors for ensuring polymorphism across the widest range of species (and for avoiding null alleles) when designing conserved markers: (1) the most distantly related species possible should be selected for designing the primers; (2) the similarity of the homologous regions should be high (displaying a BLAST E-value of E-80 or better); (3) a minimum of 8 uninterrupted repeats should be present in each species' sequence used in the alignment; (4) the primer sequence must match both/all species 100%; (5) the use of degenerate bases should ideally be avoided or else minimized (to no more than one degenerate base per primer set); (6) the forward and reverse primer melting temperatures should ideally be within 0.5°C of each other (maximum 2°C); and (7) when degenerate bases are used it is important to confirm that all the alternative states of the forward and reverse primers are compatible and to ensure that the melting temperatures of all alternative states are within 0.5°C of each other.
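The measurable recommendations in this list can be expressed as a simple screening function. The following is a minimal sketch only: the `Candidate` record and its field names are hypothetical, the thresholds are taken from points (2) to (6) above, and points (1) and (7) are not encoded because they require judgement about species choice and primer compatibility.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    """Hypothetical summary of one candidate conserved locus."""
    evalue: float           # BLAST E-value between the reference species
    min_repeats: int        # fewest uninterrupted repeats in any reference species
    primer_mismatches: int  # primer bases not matching every reference species
    degenerate_bases: int   # degenerate bases across the whole primer set
    tm_difference: float    # |Tm(forward) - Tm(reverse)| in deg C

def failed_recommendations(c):
    """Return the recommendations a candidate fails (empty list = passes all)."""
    failures = []
    if c.evalue > 1e-80:
        failures.append("(2) E-value weaker than E-80")
    if c.min_repeats < 8:
        failures.append("(3) fewer than 8 uninterrupted repeats")
    if c.primer_mismatches > 0:
        failures.append("(4) primers do not match all species 100%")
    if c.degenerate_bases > 1:
        failures.append("(5) more than one degenerate base per primer set")
    if c.tm_difference > 2.0:
        failures.append("(6) primer Tm difference above 2 deg C")
    return failures

def tm_is_ideal(c):
    """True when the primer Tm difference is within the ideal 0.5 deg C."""
    return c.tm_difference <= 0.5

# Illustrative values only; these are not measurements from any CAM locus.
strict = Candidate(evalue=1e-85, min_repeats=8, primer_mismatches=0,
                   degenerate_bases=0, tm_difference=0.4)
relaxed = Candidate(evalue=1e-59, min_repeats=8, primer_mismatches=0,
                    degenerate_bases=3, tm_difference=2.5)
```

Returning the list of failed points, rather than a single pass/fail flag, makes it easier to see which trade-off was accepted for a given locus, as was done here when the E-value cut-off was relaxed to E-59.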
The CAM loci were of utility in non-passerine birds. The nearest avian order, in terms of genetic distance, to Passeriformes is the order Ciconiiformes (also known as Charadriiformes, shorebirds and allies; [33]). We tested one ciconiiform, the Leach's storm-petrel, in which 23 (96%) loci amplified and 13 (56%) of those amplifying were found to be polymorphic (Table 4, Additional file 1, Figure 2). In the two species very distant from both zebra finch and chicken, the barn owl and European roller, most of the markers amplified (92-96%) and 32-39% of those amplifying were polymorphic (Table 4, Additional file 1, Figure 2). When tested in chicken, 38% of the loci (n = 9) were polymorphic (Table 4, Additional file 1).

Typical proportions of loci polymorphic among those amplifying in other studies
The levels of variability in each species when typed with the CAM loci might be affected by factors other than genetic distance, for example, genetic bottlenecks, founder effects, or long-term inbreeding, though we are unaware that these factors have affected any of the species/populations we typed. Additional polymorphic loci have been genotyped in the same three non-passerine species that we tested and this work did not suggest that any of the three species had exceptionally low variability (barn owl, [61]; Leach's storm-petrel, [62]; European roller, [63]).
We found that the proportion of CAM loci polymorphic among those amplifying ranged from 38% to 92% per species when all 24 loci were considered (Table 4; i.e. including those markers with good zebra finch-chicken primer sequence similarity and those loci in which it was poor). These figures are typical of those found in other studies. The proportion of loci polymorphic of those amplifying appears to vary widely among species (Additional file 5).
It is currently unclear if non-passerines are generally less variable than passerines. Further species need to be tested and more work performed to resolve this. If, however, the majority of non-passerine species do display lower variation than passerines then possible causes could be: (1) smaller effective population sizes in non-passerines, (2) higher microsatellite mutation rates in passerines compared to non-passerines or (3) different life histories between passerines and non-passerines. (1) Using a database for North American birds (Partners in Flight Landbird Population Estimates Database, http://rmbo.org/pif_db/laped/default.aspx, [64]), we found that passerines generally exhibited much larger population sizes than non-passerines (mean ± s.e. individuals per population = 15,524,224 ± 1,950,522 for passerines and 2,789,765 ± 835,772 for non-passerines; independent samples t-test, t = 5.83, d.f. = 383, P < 0.0001). The higher mean population size of passerines may lead to them retaining more genetic variability than non-passerines. (2) Microsatellite mutation rates vary among species [34]. Microsatellites may mutate more rapidly in passerines than non-passerines and, as a result, passerines are more variable. (3) The typically longer generation time of non-passerines [65] is expected to result in a lower evolutionary rate [66]. In contrast, non-passerines generally display lower levels of extra-pair paternity (EPP) than passerines [67]. A high rate of EPP will increase the variance in male reproductive success and reduce the effective population size (Ne), and hence the level of genetic variability. However, the difference in male variance and the consequent effect on Ne will be relatively small.

Individual marker performance
Nineteen loci were polymorphic in a minimum of 50% of the eight passerine species tested (when all loci were assessed in a minimum of 4 individuals/species; Figure 5, Additional file 1). The best performing loci in passerines were CAM-13 and CAM-19, which were polymorphic in all eight passerine species tested (including zebra finch, Figure 5, Additional file 1). Seven further loci were polymorphic in seven of the eight passerine species tested (CAM-01, CAM-02, CAM-05, CAM-10, CAM-15, CAM-17 and CAM-20, Figure 5, Additional file 1). The poorest performing locus, CAM-22, failed to amplify in five passerine species (however, all non-passerines amplified; Figure 5).

Locus homology to bird EST/genic sequences
Seventeen of the 24 markers developed were homologous to a bird EST sequence and/or gene (all markers except CAM-02, CAM-03, CAM-04, CAM-09, CAM-12, CAM-22 and CAM-24; Table 2, Additional file 6). Homology to bird EST/genic sequences, which are expected to be most conserved, did not reduce the number of species found to be polymorphic. In fact, the opposite was true: markers homologous to EST/genic bird sequences were more polymorphic across bird species (χ² = 11.77, d.f. = 1, P = 0.006). This is in accordance with evidence from previous studies, which have failed to show that microsatellite markers developed from non-EST sequences are more variable than those from ESTs [37,38].

Null alleles
For four species (Berthelot's pipit, rifleman, Leach's storm-petrel and European roller), some of the polymorphic CAM and TG loci (n = 5-12) were additionally typed in 17-30 individuals from a single population; these loci were assessed for deviation from Hardy-Weinberg equilibrium and their null allele frequencies were estimated (Additional file 7). When the data from these four species were combined, there was no overall difference in the proportion of loci displaying high estimated null allele frequencies between the CAM and TG loci (χ² = 0.001, d.f. = 1, P = 0.98; Additional file 7).
It is likely that null alleles will be more common in more distant species, especially when using primer sets that are less conserved (between chicken and zebra finch). If this happens, the amplified product could be sequenced and species-specific primer sets designed.
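To illustrate how a null allele frequency can be estimated from a heterozygote deficit, the following minimal sketch compares observed and expected heterozygosity at a single locus. The estimator used for Additional file 7 is not stated in this section; the sketch assumes Brookfield's estimator 1, r = (He - Ho)/(1 + He), as a stand-in, computes He without a small-sample correction, and uses made-up genotypes.

```python
from collections import Counter

def heterozygosities(genotypes):
    """Return (observed, expected) heterozygosity for one locus.

    genotypes: list of (allele_1, allele_2) tuples of allele sizes in bp.
    Expected heterozygosity is 1 - sum(p_i^2), with no sample-size correction.
    """
    n = len(genotypes)
    observed = sum(a != b for a, b in genotypes) / n
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = 2 * n
    expected = 1 - sum((c / total) ** 2 for c in counts.values())
    return observed, expected

def null_allele_frequency(genotypes):
    """Estimate null allele frequency as (He - Ho) / (1 + He), floored at zero.

    Assumption: Brookfield's estimator 1, chosen for illustration only.
    """
    observed, expected = heterozygosities(genotypes)
    return max(0.0, (expected - observed) / (1 + expected))

# Made-up data: two alleles at equal frequency but no heterozygotes observed.
all_homozygous = [(100, 100)] * 5 + [(102, 102)] * 5
# Made-up data: every individual heterozygous, so no deficit is present.
all_heterozygous = [(100, 102)] * 10
```

A strong heterozygote deficit, as in the first made-up sample, yields a high estimate, whereas a sample with no deficit yields zero. A high estimate at a conserved locus in a distant species would be a cue to sequence the product and design species-specific primers, as suggested above.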

Chromosome locations and sex linkage
All individuals genotyped with the CAM loci were of known sex based on plumage characteristics or PCR sex-typing. The individuals genotyped included both males and females for each species. Males (ZZ) of all species amplified at all loci, indicating that no CAM loci were purely W-linked in any species.
All the predicted genome locations of these loci were autosomal except for locus CAM-11, which was predicted to be Z-linked (Figure 1). Genotypic evidence supported the suggested Z-linked status of this locus in every species in which it was polymorphic: zebra finch, house sparrow, Berthelot's pipit, chaffinch, Eurasian bullfinch, rifleman, European roller and Leach's storm-petrel (Additional file 1). All females were hemizygous, whereas at least some males were heterozygous (5-28 males and 3-22 females per species; regarding Leach's storm-petrel, see below). In Leach's storm-petrels, CAM-11 amplified both W and Z-linked alleles and could be used to sex-type individuals. Females were hemizygous, displaying one allele of size 113 bp (n = 22 females) and males were heterozygous or homozygous with observed allele sizes of 134, 136, 138 and 145 bp (n = 26 males). This suggests that the 113-bp allele is located on the W chromosome and the 134-145-bp alleles are located on the Z chromosome. The absence of an amplified Z-allele in females suggests that the 113-bp W allele is amplified in preference to the Z alleles that must also be present. This is expected to happen, for example, if the primers are a better match to the W locus than the Z locus. Upon re-examination, very weak Z alleles (peak heights of 97-288 relative fluorescence units (RFU)) were seen in some female chromatographs, supporting this hypothesis. These weakly-amplified female Z alleles were only observed when the peak height of the W allele was well over 2000 RFU (most over 6000 RFU) and they often failed to amplify at all when the sample was rerun. Locus CAM-11 may prove suitable for sex-typing other related species of Charadriiformes, such as petrels, albatrosses and shearwaters, and this is under investigation.
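The sex-typing rule described for Leach's storm-petrel reduces to checking for the 113-bp allele. The following is a minimal sketch under that assumption: the function name is hypothetical, allele sizes are assumed to be already binned to whole base pairs, and the rule applies only to this species as reported here.

```python
W_ALLELE_BP = 113  # putative W-linked CAM-11 allele in Leach's storm-petrel

def sex_from_cam11(alleles, w_allele=W_ALLELE_BP):
    """Infer sex from called CAM-11 allele sizes (bp) for one individual.

    The presence of the W-linked allele indicates a female, whether or not a
    weak Z allele was also called; any other non-empty call indicates a male.
    """
    called = set(alleles)
    if not called:
        return "unknown"  # failed amplification, no call possible
    return "female" if w_allele in called else "male"

# Illustrative calls using the allele sizes reported in the text.
examples = {
    "hemizygous female": sex_from_cam11([113]),
    "female with weak Z peak": sex_from_cam11([113, 136]),
    "heterozygous male": sex_from_cam11([134, 138]),
    "homozygous male": sex_from_cam11([136]),
}
```

Treating any genotype that contains the 113-bp allele as female accommodates the weakly amplified Z alleles occasionally seen in females.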

Future directions for identifying conserved microsatellite markers
Since this study began, four additional avian genomes have been sequenced and assembled: the turkey (Meleagris gallopavo), mallard duck (Anas platyrhynchos), collared flycatcher (Ficedula albicollis) and budgerigar (Melopsittacus undulatus; as of 10th February 2013; http://www.ensembl.org/). As the costs of sequencing whole genomes continue to fall, many more bird genomes will be sequenced in the near future, so providing an increasingly rich resource for developing conserved markers. For example, following the release of the turkey and mallard genome sequences, it is now possible to identify microsatellite markers that are conserved between the chicken, turkey and mallard, and design conserved primer sets that should then amplify in a wide range of galliform and anseriform species. There are approximately 250 living species of Galliformes, which are separated from their nearest order, the Craciformes (chachalacas, curassows, guans and megapodes), by a genetic distance (ΔTm) of 21.6 [33]. Since the genetic distance between chicken and turkey is less than that between chicken and zebra finch (11.1 versus 28.0), it should be possible to create a much larger number of conserved markers for the Galliformes. However, because chicken and turkey are separated by a relatively small genetic distance (11.1), these sets would probably not be particularly highly conserved and would, therefore, be useful for only a subset of galliform species and few non-Galliformes. A comparison of zebra finch and turkey would not be expected to yield many additional new conserved microsatellite sequences, since the majority should have been identified in the zebra finch-chicken comparisons already performed (this study and Dawson et al. [21]). The approach used here can also be applied to the mallard genome sequence to identify highly conserved sequences and create markers (i.e. zebra finch-mallard markers) suitable for the majority of Anseriformes and Galliformes (via chicken-mallard, turkey-mallard and chicken-turkey-mallard markers).
Birds belong within the reptilian clade. Only two non-avian reptile genomes have been sequenced and assembled: the anole lizard (Anolis carolinensis) and Chinese softshell turtle (Pelodiscus sinensis) (http://www.ensembl.org/; as of 10th February 2013). The anole lizard is more closely related to birds than the turtle (http://www.ensembl.org/info/about/species_tree.pdf). Only one CAM locus had an identifiable lizard homologue, which included a microsatellite containing at least eight repeat units and which matched to both sides flanking the repeat region (CAM-20), but even for this locus it is probably not possible to create a consensus bird-lizard primer set due to low sequence similarity.
This study and that of Dawson et al. [21] indicate that few (if any) conserved microsatellite markers will be usefully polymorphic across all bird species (passerines and non-passerines). There are 23 orders of extant birds that are separated by a genetic distance (DNA:DNA melting temperature (ΔTm) hybridisation distance) of more than 20 ([33]; classification based on Sibley & Monroe [25]). Both studies also indicate that when the required (genome and/or EST) sequence data from each avian order become available, a conserved set of over 50 markers can be created that will be of high utility for all the species within that order. It is likely that future avian genome sequencing projects will include species originating from different bird orders and so facilitate the creation of conserved microsatellite marker sets suitable for genotyping and comparing multiple species.

Conclusions
We have successfully developed primer sets for 24 polymorphic microsatellite loci that are of high utility in passerine birds, with some utility in non-passerine species. The microsatellite markers described here are particularly useful for genotyping species closely related to the zebra finch, such as those belonging to the Passeridae and Fringillidae families, which encompass 1,383 species [25]. When these markers are combined with 34 conserved markers developed previously [21], the requirement to isolate microsatellite loci will be alleviated for most genetic studies of passerine birds. These conserved loci are suitable for many applications, including studies of population structure, parentage and relatedness; they can also contribute towards linkage mapping and the identification of gene order rearrangements among many species. The less polymorphic loci will be useful, where required, for distinguishing between species and identifying hybrid birds (such as occur naturally in warblers, flycatchers, petrels, ducks, owls and other raptors). These loci also have potential for studying the population genetics of extinct or highly endangered species in which it is difficult to develop microsatellite libraries due to the lack of sufficient (high-quality) DNA. Conserved markers can potentially be used to genotype samples from museum collections or from other non-invasive sources (such as mouth swabs or feathers). The loci will, in particular, enable the comparison of populations and species at the same loci, and so allow genetic variability to be compared directly, without ascertainment bias.