An SNP discovery strategy based on the development of GSPs and their deployment in the search for SNPs in wheat is reported here. As a first step in SNP discovery, a CP pipeline was built starting with 6,045 ESTs. Of these ESTs and an additional 290 for which primers were generated manually, only a small portion, about 17.4%, ultimately resulted in validated GSPs. Even though the wheat genomes show low SNP levels, more than half of the A- and B-genome GSPs yielded an SNP, demonstrating that the development of GSPs is more difficult in polyploid wheat than SNP discovery.
The rationale for GSP development is that they make SNP markers versatile; any SNP detection method can theoretically be used if GSPs are available for the site in polyploid wheat. However, some SNP genotyping methods do not require prior PCR amplification of the SNP-containing targets in polyploid wheat [38, 39] making GSPs superfluous. The cost/benefit ratio of GSP development should therefore be considered in the future development of SNPs for wheat.
Another aspect of the SNP development strategy employed here that needs consideration is the use of a distant relative as a source of information about the exon-splicing boundaries in ESTs for the design of CPs. The reliance on wheat-rice comparisons preferentially selected for the conserved gene repertoire, which is concentrated in the proximal, low-recombination regions of wheat chromosomes [5, 30, 40–42]. A focus on conserved loci could also bias diversity estimates downward.
A focus on single-copy loci may also affect the distribution of loci with SNPs along the chromosomes. In wheat, as in other plants, single-copy genes are preferentially located in the proximal, low-recombination regions whereas distal, high-recombination regions are enriched for multigene families. Focusing on ESTs from single-copy genes may cause preferential development of SNP markers for genes located in the proximal, low-recombination regions of chromosomes.
For these multiple reasons, the SNP markers developed here are more abundant in the proximal, low-recombination regions of wheat chromosomes than in distal, high-recombination regions. This is particularly true for the distal 30 cM of the short arms of chromosomes in homoeologous groups 1, 2, and 3, which are poorly populated with SNP markers.
Comparative mapping based on RFLP markers showed that gene order along the T. aestivum homoeologous chromosomes is highly conserved and that any one chromosome of a trio of homoeologous chromosomes can be used to approximate gene order along the other two and, indeed, along homoeologous chromosomes of other species throughout the tribe Triticeae. Gene order is also surprisingly conserved across the entire grass family. Approximately 64, 65, and 66% of the loci on the Ae. tauschii genetic map are colinear with genes along the sorghum, B. distachyon, and rice pseudomolecules, respectively.
The conservation of gene order among wheat homoeologous chromosomes and across the grass family was exploited here to summarize diversity in the wheat genomes using a single map. A comparative map of Ae. tauschii was selected for that purpose. The high degree of gene synteny across grasses was exploited to insert into that map additional genes that contain SNPs in wheat but could not be mapped in Ae. tauschii for lack of polymorphism.
The utility of the Ae. tauschii linkage map as a representation of the linear order of genes in the wheat genomes depends on the extent to which the assumption of colinearity of the Ae. tauschii and wheat chromosomes is true. Known translocations exist among chromosomes 4A, 5A, and 7B, and chromosome 4A also acquired pericentric and paracentric inversions [33, 34]. For chromosomes 4A and the translocated regions of 5A and 7B, the diversity maps reported here are of limited relevance.
Because virtually all of the ESTs employed in SNP discovery here had been previously mapped on the wheat deletion-bin maps, it is possible for the first time to compare the wheat bin maps with a high-density genetic map of a closely related genome. The Ae. tauschii genetic map that formed the backbone of the diversity maps was highly colinear with rice, B. distachyon, and sorghum genomic sequences [30, 31]. There was remarkably good agreement between the deletion-bin maps and the Ae. tauschii genetic map for most chromosome arms, and discrepancies were found for less than 10% of the loci. Some of these discrepancies were biological in nature. The greatest number of discrepancies was in the B-genome deletion-bin map and the smallest in the D-genome deletion-bin map. Paralogous loci in the B genome outnumber those in the A or D genomes 2 to 1. The B genome is also more prone to translocation [46–48] and undoubtedly other structural changes. Both paralogous gene duplications and changes in chromosome structure manifest themselves as breaks in synteny between the Ae. tauschii genetic map and wheat deletion-bin maps. The poorest fit between the genetic map and the deletion-bin maps found here for the B genome is therefore consistent with greater divergence of the B genome relative to the A and D genomes.
Although the wheat D-genome map was the most similar to the Ae. tauschii map of the three wheat deletion-bin maps, it too showed discrepancies relative to the Ae. tauschii map in several chromosome arms. The largest number of loci showing a perturbed location on the deletion-bin map was observed in chromosome arm 4DL. Ordering of loci in the 4DL arm bins on the basis of the Ae. tauschii genetic map resulted in interdigitation of loci mapped in the neighboring bins 4DL12 and 4DL13. The Ae. tauschii genetic map shows many rearrangements in that region compared to rice chromosome Os3. It is therefore possible that chromosome 4D may contain a paracentric inversion spanning the boundary of bins 4DL12 and 4DL13, which could account for the difficulties encountered during an attempt to recombine wheat homoeologous chromosome arms 4DL and 4BL in the KNA1 region.
A total of 36% of the loci on the diversity maps were mapped on the basis of synteny with rice. Even though mapping of these loci was based on several lines of corroborating information, it is nevertheless an inference and must be treated with caution. The prerequisite corroborating information was not available for the remaining 209 (11.7%) of the A-, B-, and D-genome loci harboring SNPs, and these markers were neither included on the diversity maps nor used in computations of diversity estimates, although they were included in the database (http://probes.pw.usda.gov:8080/snpworld/Search). The most frequent reason for the inability to map a locus on the basis of synteny was the failure to identify an orthologous region in rice. Synteny is lost more rapidly in the distal regions of wheat chromosomes than in the proximal regions because of greater rates of gene deletion and gene duplication in the distal regions [5, 40, 41]. This factor contributed to the poor SNP marker coverage in the distal regions of some of the chromosomes. For the same reasons, however, ESTs harboring SNPs that could not be mapped on the basis of synteny are preferentially located in distal chromosome regions. The project SNP database should therefore be interrogated if additional SNPs are needed, particularly those in the distal chromosome regions.
Genetic application of the diversity maps
The diversity maps reported in Additional file 2, Table S1 provide a convenient summary of SNPs (http://probes.pw.usda.gov:8080/snpworld/Search) in genes that were mapped on the Ae. tauschii map. A θ value of zero indicates that no SNP was present, and high values suggest several SNPs at a locus in the respective population of T. aestivum and wild emmer lines. Negative Tajima's D values indicate an excess of low-frequency SNPs, and positive Tajima's D values indicate a predominance of intermediate-frequency SNPs at a locus.
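The reading of θ and Tajima's D given above can be illustrated with a minimal sketch. It assumes that the θ values are of the usual Watterson (segregating-site) or pairwise-difference kind and uses the standard constants from Tajima (1989); the function name `diversity_stats` and the toy alignments are hypothetical and are not taken from the project database or from the pipeline used in this study.

```python
from itertools import combinations
from math import sqrt


def diversity_stats(seqs):
    """Return (theta_w, theta_pi, tajima_d) for equal-length aligned sequences.

    theta_w and theta_pi are per-site values; tajima_d is None when no SNP exists.
    """
    n = len(seqs)
    if n < 4:
        raise ValueError("at least four sequences are needed for Tajima's D")
    length = len(seqs[0])
    # S: number of segregating (polymorphic) alignment columns.
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if s == 0:
        return 0.0, 0.0, None
    # k: mean number of differences over all pairs of sequences.
    pairs = list(combinations(seqs, 2))
    k = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    theta_w = s / a1
    tajima_d = (k - theta_w) / sqrt(e1 * s + e2 * s * (s - 1))
    return theta_w / length, k / length, tajima_d


# Hypothetical toy alignments of six lines, ten sites each.
rare_allele = ["AAAAAAAAAA"] * 5 + ["TAAAAAAAAA"]       # one singleton SNP
balanced = ["AAAAAAAAAA"] * 3 + ["TAAAAAAAAA"] * 3      # one SNP at frequency 0.5
print(diversity_stats(rare_allele))
print(diversity_stats(balanced))
```

In this sketch a locus whose only SNP is carried by a single line gives a negative D, a locus whose SNP splits the sample evenly gives a positive D, and a monomorphic locus gives θ = 0 with D undefined.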
Tetraploid wheats were parents of nine synthetic wheats that were screened, with the data subsequently reported in the SNP database. They included durum lines (Sn24, Sn29, Sn30, and Sn31), the tetraploid component of T. aestivum 'Canthatch' (Sn25 to Sn28), and an emmer line (Sn31). SNPs present in these lines are tabulated in the database. Because they were not used in the computation of diversity measures, θ may be 0.00 for a gene in Additional file 2, Table S1 even though SNPs may exist in the A and B genomes of synthetic wheats in the database. This fact should be kept in mind when a specific locus is interrogated for an SNP on the diversity maps.
Although synthetic wheats RL5402, RL5403, RL5405, and RL5406 share tetraploid Canthatch as the source of their A and B genomes, they are occasionally polymorphic in the database. The tetraploid Canthatch was developed by recurrent backcrossing of the pentaploid hybrid T. durum 'Steward' × T. aestivum 'Canthatch' to T. aestivum Canthatch, selecting tetraploid offspring in each generation. SNPs occasionally observed among the four synthetic wheats presumably reflect residual germplasm of T. durum Steward present in the tetraploid Canthatch, indicating that a complete extraction of hexaploid wheat A and B genomes was not reached in tetraploid Canthatch and that the tetraploid is heterozygous at some loci.
Sampling nucleotide diversity for SNP development
Nucleotide diversity measured as θ was similar in the T. aestivum A and B genomes and averaged 0.59 × 10⁻³, which is close to an estimate of 0.8 × 10⁻³ reported earlier. The agreement between these two independent studies suggests that the sample of the T. aestivum lines used here was representative of T. aestivum and was adequate for SNP discovery in all three wheat genomes. However, nucleotide diversity averaged across genomes of wild emmer (θ = 0.72 × 10⁻³) was lower than the estimated θ = 2.7 × 10⁻³ for wild emmer as a whole, indicating that the population in the Diyarbakir region has low diversity relative to species-wide samples of wild emmer. This is consistent with earlier RFLP results, which indicated that the greatest diversity in wild emmer exists in northern Israel, southern Lebanon, and southwestern Syria. Because T. aestivum originated in Transcaucasia, the failure to sample wild emmer in those regions may have had a limited effect on the discovery of SNPs relevant for hexaploid wheat. However, it must have had a great effect on the discovery of SNPs relevant for durum wheat, because durum wheat originated in the eastern Mediterranean. Inclusion of only a few durum accessions in the sample screened for SNPs here was inadequate to characterize durum diversity, and an additional SNP search is needed for cultivated tetraploid wheat.
Wheat diversity architecture
Although the three T. aestivum genomes have coexisted within a single nucleus since the origin of T. aestivum, profound differences were found among them. The A and B genomes are more diverse and show more uniform distributions of diversity across the genome than does the D genome. Because of the short time that has elapsed since the origin of T. aestivum, 8,500 years or less, it is unlikely that most SNPs observed in T. aestivum originated there. It is much more likely that SNPs were contributed by gene flow from the ancestral species, tetraploid wheat and diploid Ae. tauschii, or potentially polyploid species of Aegilops having a D genome, such as Ae. cylindrica, that occasionally hybridize with wheat [51, 52].
This intuitive argument is supported by differences in the ratio of replacement to silent polymorphisms in the T. aestivum genomes. Evolution in young polyploids is accompanied by relaxed purifying selection acting on genes, which is shown by an order of magnitude greater rate of fixation of deletions of single-copy genes in tetraploid wheat than in diploid Ae. tauschii and T. urartu. If SNPs observed in T. aestivum were contributed by gene flow, genes in the A and B genomes should show ratios of replacement to silent site variation shifted towards 1.0 (indicating relaxed selection) compared to those in the D genome, which was observed. Additionally, if the haplotypes present in T. aestivum were largely contributed by gene flow, this could increase the effective population size Ne of the A and B genomes relative to the D genome because haplotype recombination in the A and B genomes could have taken place during the evolution of wild emmer. Hence LD in the A and B genomes of T. aestivum is expected to be stronger than in the A and B genomes of wild emmer and LD in the D genome of T. aestivum is expected to be stronger than in the T. aestivum A and B genomes, which is what was observed. We therefore conclude that most of the differences in diversity between the A and B genomes on the one hand and the D genome on the other hand can be attributed to differences in gene flow.
The difference in gene flow among the genomes has a material basis. It is well known that very little reproductive isolation exists between hexaploid and tetraploid wheat because these species readily hybridize and the resulting pentaploid hybrids are usually fertile. In contrast, hybridization between hexaploid wheat and Ae. tauschii is difficult and hybrids are sterile. Landraces of hexaploid and tetraploid wheat have often been grown together, which has facilitated hybridization. In contrast, sympatry between T. aestivum and Ae. tauschii has been limited by the geographic distribution of Ae. tauschii. Greater gene flow from the T. aestivum ancestors into the A and B genomes than into the D genome is therefore expected.
This study substantiated a previous survey of modern wheat varieties with SNPs developed here and showed that limited gene flow into the T. aestivum D genome has enriched it for rare alleles. The preponderance of rare alleles in the D genome is indicated by the negative average Tajima's D observed in all seven D-genome chromosomes. Site frequency spectra in the T. aestivum genomes show a steeper decline in the D genome than in the A and B genomes, which is consistent with more limited gene flow into the T. aestivum D genome than into the A and B genomes. These observations agree with previous isozyme, RFLP, and SNP studies on the origin of hexaploid wheat, which suggested that wheat originated via a very limited number of hybridization events [23, 24, 26, 56–58]. SNP data generated here showed that 93% of the 138 polymorphic genes in the D genome include only two haplotypes.
Diversity contributed by gene flow into wheat was further shaped by several factors. One was reduced effective recombination accompanying self-pollination, the prevalent mating system in wheat. Self-pollination can reduce the effective population size to half that expected under cross-pollination and enhance the effects of genetic drift on diversity. Self-pollination, by greatly reducing effective recombination, increases the sizes of chromosomal segments hitchhiking along with positively selected genes [61–64]. Low effective recombination is likely one of the factors contributing to the greatly uneven distribution of diversity in the T. aestivum D genome compared to the A and B genomes; the average θ per chromosome was six-fold higher in the most diverse D-genome chromosome than in the least diverse D-genome chromosome. Diversity is high along the entirety of chromosomes 1D and 2D, the distal portion of the long arm of chromosome 4D, and both distal regions of chromosome 6D. In contrast, the entirety of chromosomes 3D and 5D, three-quarters of chromosome 4D, and proximal regions of 6D have very low levels of diversity. This suggests that under limited gene flow and self-pollination, genetic drift and selection may impact diversity along large chromosomal regions in wheat.
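The halving of effective population size and the loss of effective recombination under self-pollination follow from standard population-genetic approximations, sketched below. The relations F = s/(2 − s), Ne = N/(1 + F), and r_e = r(1 − F) are textbook results for partial selfing and are used here as assumptions; they are not computations reported in this study, and the function names are hypothetical.

```python
def inbreeding_coefficient(selfing_rate):
    """Equilibrium inbreeding coefficient F for a given selfing rate (0 to 1)."""
    if not 0.0 <= selfing_rate <= 1.0:
        raise ValueError("selfing rate must lie between 0 and 1")
    return selfing_rate / (2.0 - selfing_rate)


def effective_size(census_size, selfing_rate):
    """Effective population size under partial selfing: N / (1 + F)."""
    return census_size / (1.0 + inbreeding_coefficient(selfing_rate))


def effective_recombination(rate, selfing_rate):
    """Effective recombination rate under partial selfing: r * (1 - F)."""
    return rate * (1.0 - inbreeding_coefficient(selfing_rate))


# Illustrative values only: an outcrosser, a mixed mater, and a near-complete selfer.
for s in (0.0, 0.5, 0.99):
    print(s, effective_size(1000, s), effective_recombination(0.01, s))
```

With complete selfing (s = 1) the sketch gives F = 1, so the effective size is half of that under cross-pollination and the effective recombination rate approaches zero, which is why hitchhiking segments are expected to be long in a selfer such as wheat.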
Several A- and B-genome chromosomes show that effects shaping the diversity of entire chromosomes may occasionally take place even under the regime of moderate gene flow in polyploid genomes. Diversity in T. aestivum chromosome 4B mimics in all respects diversity in the D genome. The entire chromosome is diversity impoverished and has a highly negative Tajima's D. As in the D-genome chromosomes, most of the 4B genes have either one or two haplotypes. Chromosome 4B is polymorphic for a pericentric inversion in T. aestivum, and homoeologous group 4 has a lower number of genes than the remaining six Triticeae homoeologous groups, presumably due to the translocation of the gene-rich terminal region of the short arm of chromosome 4 to the long arm of chromosome 5. Recombination takes place primarily in genes. A low number of genes on chromosome 4B would probably result in low crossover frequencies in this chromosome, which was observed. The net effect of limited effective recombination may be that a large portion of this chromosome has hitchhiked during episodes of positive selection during the evolution of T. aestivum or was subjected to a reduction in effective population size during episodes of background selection. A long-range loss of diversity may have also taken place in wild emmer chromosome 5B, which also has a negative average Tajima's D. Another chromosome in which a chromosome-sized loss of diversity has taken place is 4A. In this chromosome, the loss of diversity was undoubtedly caused by the fixation of inversions suppressing recombination in a heterozygous state.
Another factor that must have had a significant impact on the architecture of diversity in wheat is the expression of the Ph1 locus, which is unique to polyploid wheat. Its primary function is to preclude recombination between homoeologous chromosomes [67–69]. Importantly, Ph1 also negatively affects recombination between heterozygous homologues. The activity of Ph1 therefore has effects on diversity similar to those of self-pollination. For an unknown reason, Ph1 negatively affects recombination in the B genome more than in the A genome. The T. aestivum B genome shows greater variation in diversity among chromosomes than the A genome. The coefficients of variation of the two θ estimates were 0.18 and 0.21 among the T. aestivum A-genome chromosomes but 0.30 and 0.38, respectively, among the T. aestivum B-genome chromosomes, which is consistent with more reduced recombination in the B genome than in the A genome due to Ph1 effects. Recombination between the Ae. tauschii chromosomes and wheat D-genome chromosomes is even more affected by Ph1 than recombination between wheat heterozygous homologues. In agreement, T. aestivum D-genome chromosomes show the greatest variation in diversity among the three genomes; the coefficients of variation of the two θ estimates were 0.52 and 0.59, respectively, among the D-genome chromosomes. We suggest that the synergy of self-pollination and suppression of recombination due to Ph1 results in high levels of random drift, loss of diversity from large chromosome regions, and relatively high variance in diversity among chromosomes.
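The comparison of variation in diversity among chromosomes rests on the coefficient of variation, the standard deviation of the per-chromosome θ averages divided by their mean. A minimal sketch follows; the per-chromosome values are invented for illustration and are not data from this study, and the use of the sample (n − 1) standard deviation is an assumption, because the text does not state which form was used.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean of the values."""
    return stdev(values) / mean(values)


# Hypothetical per-chromosome average theta values (x 10^-3) for seven chromosomes.
uniform_genome = [0.55, 0.60, 0.62, 0.58, 0.65, 0.57, 0.61]
uneven_genome = [0.40, 0.35, 0.08, 0.15, 0.07, 0.25, 0.12]

print(coefficient_of_variation(uniform_genome))
print(coefficient_of_variation(uneven_genome))
```

Because the statistic is scaled by the mean, it allows genomes with different overall diversity, such as the D genome versus the A and B genomes, to be compared for unevenness among chromosomes.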