Nucleotide diversity maps reveal variation in diversity among wheat genomes and chromosomes

Background A genome-wide assessment of nucleotide diversity in a polyploid species must minimize the inclusion of homoeologous sequences into diversity estimates and reliably allocate individual haplotypes into their respective genomes. The same requirements complicate the development and deployment of single nucleotide polymorphism (SNP) markers in polyploid species. We report here a strategy that satisfies these requirements and deploy it in the sequencing of genes in cultivated hexaploid wheat (Triticum aestivum, genomes AABBDD) and wild tetraploid wheat (Triticum turgidum ssp. dicoccoides, genomes AABB) from the putative site of wheat domestication in Turkey. Data are used to assess the distribution of diversity among and within wheat genomes and to develop a panel of SNP markers for polyploid wheat. Results Nucleotide diversity was estimated in 2114 wheat genes and was similar between the A and B genomes and reduced in the D genome. Within a genome, diversity was diminished on some chromosomes. Low diversity was always accompanied by an excess of rare alleles. A total of 5,471 SNPs was discovered in 1791 wheat genes. Totals of 1,271, 1,218, and 2,203 SNPs were discovered in 488, 463, and 641 genes of wheat putative diploid ancestors, T. urartu, Aegilops speltoides, and Ae. tauschii, respectively. A public database containing genome-specific primers, SNPs, and other information was constructed. A total of 987 genes with nucleotide diversity estimated in one or more of the wheat genomes was placed on an Ae. tauschii genetic map, and the map was superimposed on wheat deletion-bin maps. The agreement between the maps was assessed. Conclusions In a young polyploid, exemplified by T. aestivum, ancestral species are the primary source of genetic diversity. Low effective recombination due to self-pollination and a genetic mechanism precluding homoeologous chromosome pairing during polyploid meiosis can lead to the loss of diversity from large chromosomal regions. 
The net effect of these factors in T. aestivum is large variation in diversity among genomes and chromosomes, which impacts the development of SNP markers and their practical utility. Accumulation of new mutations in older polyploid species, such as wild emmer, results in increased diversity and its more uniform distribution across the genome.


Background
While nucleotide diversity studies and the development and deployment of single nucleotide polymorphism (SNP) markers are straightforward in diploid and paleopolyploid species, such as maize or soybean [1][2][3], they are complicated in recently evolved polyploid species by high levels of orthologous gene similarity. Sequence similarity makes sequencing of single genes and allocation of sequences into respective genomes difficult. Special strategies are therefore required for nucleotide diversity studies and the development of SNP markers for young polyploid species, which include wheat and other economically important plants.
A possible strategy for nucleotide diversity studies and SNP discovery in young polyploid species, such as wheat, is to find diverged regions in orthologous genes and use them for the design of polymerase chain reaction (PCR) primers that anneal to only a single DNA target. These genome-specific primers (GSPs) amplify DNA from only a single genome and facilitate gene sequencing and SNP discovery [13]. An alternative strategy is to shotgun-sequence cDNAs and then allocate each sequence to a genome. Both approaches have been used in polyploid wheat [13][14][15] although those studies were of limited scope and genome coverage [14][15][16][17] and none mapped the markers.
A domestication bottleneck at the tetraploid level and a polyploidy bottleneck during the transition from the tetraploid to hexaploid level are expected to have reduced the diversity of polyploid wheat compared to wild emmer. Nucleotide diversity θπ [18] was reported to be 2.7 × 10⁻³ in 24 A- and B-genome wild emmer genes [16]. For comparison, θπ was estimated to be 9.7 × 10⁻³ in teosinte genes (Zea mays ssp. parviglumis) [3] and 7.7 to 8.1 × 10⁻³ in wild barley genes (Hordeum vulgare ssp. spontaneum) [19,20]. The diversity of emmer was reduced by the domestication bottleneck but, curiously, no further diversity loss took place in the A and B genomes during the polyploidy bottleneck accompanying the evolution of T. aestivum from domesticated tetraploid wheat [16]. Levels of diversity in the T. aestivum D genome are unknown.
Genetic evidence suggests that wild emmer was domesticated in the Diyarbakir region in southeastern Turkey [21,22]. The result was hulled domesticated emmer (T. turgidum ssp. dicoccon), which was then the primary source of free-threshing tetraploid wheat, such as durum (T. turgidum ssp. durum, henceforth T. durum). Transcaucasia and northwestern Caspian Iran appear to be the primary sites of the evolution of T. aestivum [23]. Gene flow from wild to domesticated tetraploid wheat and from tetraploid wheat and Ae. tauschii to T. aestivum has been experimentally documented [23][24][25][26][27] but its impact on the evolution of the T. aestivum A, B, and D genomes is not clear.
We report here the development of GSPs for T. aestivum and their use in sequencing of T. aestivum genes with the goal of characterizing the nucleotide diversity of the wheat genomes and discovering SNPs. To make the GSP development possible, a set of primers anchored in conserved exons flanking one or several introns was developed and is also reported. We refer to these as conserved primers (CPs), as in [13]. Primers of this type have also been known as conserved orthologous sets (COS) [28]. A map of genes bearing SNPs constructed in diploid Ae. tauschii is presented and compared with wheat deletion-bin gene maps [29]. Nucleotide diversity in individual chromosomes in a wild emmer population from the Diyarbakir region in Turkey and in T. aestivum was computed and the distribution of diversity among and within wild emmer and T. aestivum genomes was used to analyze the early stages of polyploid evolution.
EST, a reference sequence for a locus and its source, and graphical and numerical displays of each SNP. Reference sequences were used to specify the positions of SNPs. For the majority of the loci, the cv 'Chinese Spring' (code Ta21, Table 1) sequence was used as a reference sequence because of the central position of Chinese Spring in the unrooted phylogenetic tree of 468 T. aestivum lines (Additional file 1, Figure S1). If the sequence from Chinese Spring was unavailable, the next most complete sequence for the locus was used. SNPs can be viewed in the context of the entire reference sequence in the expanded view window for each EST.
The database also contains data for portions of 1,651 genes amplified and sequenced with CPs in T. urartu, Ae. speltoides, and Ae. tauschii (http://probes.pw.usda.gov:8080/snpworld/Search). The accession used as a reference sequence for a locus is indicated for each species. Data in the database include 488 polymorphic loci containing 1,271 SNPs for T. urartu, 463 polymorphic loci containing 1,218 SNPs for Ae. speltoides, and 641 polymorphic loci containing 2,203 SNPs for Ae. tauschii. Additional SNPs for Ae. tauschii can be found in the database for the D genomes of the synthetic wheats.

Diversity maps
A single Ae. tauschii EST linkage map [30] was used as the backbone of the diversity maps. The Ae. tauschii map backbone contained 870 loci (Table 3). Cosegregating genes were allocated into "recombination blocks" which were sequentially numbered (Additional file 2, Table S1). The order of orthologous genes in rice was used to order genes within a recombination block. Synteny of the Ae. tauschii genetic map with the rice genome sequence [30] was exploited in mapping additional loci for which the parents of the Ae. tauschii mapping population were not polymorphic (Table 3) and which met the conditions detailed in Materials and Methods (Map construction). In a few cases, in which an ambiguity was encountered in the rice genome sequence, sorghum and Brachypodium distachyon genome sequences were employed [30,31]. Consider, for example, locus BG313769 located on the short arm of chromosome 1D (Additional file 2, Table S1). This locus was mapped to bins 1AS1, 1BS9, and 1DS1 [32] (http://wheat.pw.usda.gov/cgi-bin/westsql/map_locus.cgi). The locus with the highest sequence similarity in rice is on pseudomolecule Os5 starting at nucleotide 1,903,106. Os5 is homoeologous with 1DS and mapping of the locus in the 1AS1, 1BS9, and 1DS1 bins is consistent with the position of the locus in Os5 (Additional file 2, Table S1). PCR using genomic DNA of N1A-T1B and N1A-T1D as templates with the BG313769 A-genome GSPs showed that the locus used for the diversity study was on chromosome 1A (http://probes.pw.usda.gov:8080/snpworld/Search). Inserting locus BG313769 into the map on the basis of synteny of 1AS1, 1BS9, and 1DS1 with Os5 placed it between recombination block 36 (locus BE445121, which is at 56.85 cM on the Ae. tauschii map and at nucleotide 1,679,201 in the Os5 pseudomolecule) and recombination block 37 (locus BF291549, which is at 57.06 cM on the Ae. tauschii map and at nucleotide 1,954,380 in the Os5 pseudomolecule). 
Locus BG313769 and its diversity data were therefore placed between loci BE445121 and BF291549. No cM value was attached to the locus but its coordinates on Os5 were given (Additional file 2, Table S1).
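The placement of BG313769 can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the function name `flanking_loci` and the tuple layout are hypothetical, and the sketch assumes that the backbone is colinear with rice in the interval of interest, so that sorting backbone loci by their Os5 coordinate reproduces their order on the Ae. tauschii map. The coordinates are those quoted in the text.

```python
from bisect import bisect_left

# Backbone loci named in the text:
# (locus, cM on the Ae. tauschii map, start coordinate on rice Os5).
backbone = [
    ("BE445121", 56.85, 1_679_201),  # recombination block 36
    ("BF291549", 57.06, 1_954_380),  # recombination block 37
]

def flanking_loci(backbone, rice_pos):
    """Return the backbone loci immediately left and right of rice_pos.

    Assumption: the backbone is colinear with rice in this interval, so
    ordering by rice coordinate matches the genetic-map order.
    """
    ordered = sorted(backbone, key=lambda rec: rec[2])
    positions = [rec[2] for rec in ordered]
    i = bisect_left(positions, rice_pos)
    left = ordered[i - 1] if i > 0 else None
    right = ordered[i] if i < len(ordered) else None
    return left, right

# BG313769 has its best rice match on Os5 starting at nucleotide 1,903,106.
left, right = flanking_loci(backbone, 1_903_106)
print(left[0], right[0])  # prints "BE445121 BF291549"
```

As in the text, the inserted locus receives no cM value of its own; only its flanking recombination blocks and its rice coordinate are recorded.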
Loci corresponding to 484 ESTs were inserted into the diversity maps on the basis of this process (Additional file 2, Table S1), bringing the total number of loci on the map to 1,354 (Table 3). Diversity was estimated from at least one genome for 987 EST loci on the map. From 348,938 to 351,542 bp were sequenced and mapped on the diversity maps for each genome × taxon combination (Table 4). The numbers of discovered SNPs ranged from 377 in the T. aestivum D genome to 1,979 in the wild emmer B genome. The highest average number of haplotypes per gene and highest average haplotype diversity was in the D genome of synthetic wheats whereas the lowest number of haplotypes per gene and lowest haplotype diversity was in the D genome of T. aestivum (Table 4). In wild emmer and T. aestivum, the average numbers of haplotypes per gene and haplotype diversity did not significantly differ between the A and B genomes (Table 4). However, both variables were significantly higher in the genomes of wild emmer than in the corresponding genomes of T. aestivum (Table 4).
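For readers unfamiliar with the two haplotype statistics, the following Python sketch shows how the number of haplotypes per gene (H) and haplotype diversity (h) can be computed for one locus. The estimator is not stated in this section, so the sketch assumes Nei's unbiased form, h = n/(n-1) × (1 - Σp²); the function name and the toy alignment are hypothetical.

```python
from collections import Counter

def haplotype_stats(sequences):
    """Return (H, h) for aligned sequences sampled at one locus.

    H is the number of distinct haplotypes; h is haplotype diversity,
    assumed here to be Nei's unbiased estimator n/(n-1) * (1 - sum p_i^2).
    """
    n = len(sequences)
    if n < 2:
        raise ValueError("need at least two sequences")
    counts = Counter(sequences)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return len(counts), n / (n - 1) * (1.0 - sum_sq)

# Toy alignment (not data from the study): three haplotypes in four lines.
toy = ["ACGT", "ACGT", "ACGA", "TCGA"]
H, h = haplotype_stats(toy)
```

A locus at which every line carries the same sequence gives H = 1 and h = 0, the pattern that dominates the T. aestivum D genome.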

Superimposition of diversity maps on the deletion-bin maps
Wheat EST deletion-bin maps are an important resource for the use of ESTs in wheat comparative mapping, map-based  Table S1). The Ae. tauschii linkage map [30] and wheat deletion-bin maps share large numbers of loci, which facilitated comparison of the two sets of maps. Only loci mapped by linkage were used for these comparisons. Totals of 534, 654, and 646 ESTs on the wheat A-, B-, and D-genome deletion-bin maps were compared, respectively. The bin location of a locus was considered incongruent between the genetic and deletion-bin maps if it disagreed with the order of recombination blocks (Additional file 2, Table S1); the order of loci within recombination blocks was disregarded. The known translocation differences involving chromosome 4A and chromosome arms 5AL and 7BS [33,34] were not considered. Because the genetic maps of Ae. tauschii chromosomes are highly colinear with the rice pseudomolecules (Additional file 2, Table S1), most of the disagreements between the linkage maps and deletion-bin maps would have to be due to structural differences between wheat and Ae. tauschii chromosomes or due to incompleteness or inconsistencies in the deletion-bin maps.
The Ae. tauschii linkage map portion of the diversity maps (Additional file 2, Table S1) is expected to be more consistent with the D-genome deletion-bin map than the A- and B-genome deletion-bin maps because the Ae. tauschii chromosomes are phylogenetically more closely related to those of the wheat D genome than to those of the wheat A and B genomes, and this was indeed observed. While the locations of only 8.8% of the loci on the D-genome deletion-bin maps were incongruent with the linkage map, 10.8 and 12.4% of the A- and B-genome loci were incongruent (Table 5). The greatest discrepancies relative to gene order in Ae. tauschii and rice were encountered in chromosome arms 1AL, 5AS, 7AL, 1BL, 5BS, 4DL, and 7DS, and none were found in chromosome arms 2BS, 2DS, 2DL, 3DS, and 5DL (Table 5 and Additional file 2, Table S1).

Nucleotide diversity
From 609 to 704 genes with estimated diversity were mapped in a genome × species combination (Table 6). However, some of the loci were excluded from diversity analyses because of small sample size or because of unreasonably high diversity indicating the possibility of orthologous or paralogous sequences being included in a diversity estimate. The numbers of loci used for analyses of diversity were therefore lower (Table 6). Of the analyzed loci, 305 (52%) and 296 (51%) were polymorphic in the A and B genomes of T. aestivum, respectively, and 316 (54%) and 338 (59%) were polymorphic in the A and B genomes of wild emmer, respectively (Table 6). Only 138 (20%) loci of the 679 analyzed in the T. aestivum D genome were polymorphic (Table 6). Because the same GSPs resulted in the discovery of 477 (74%) SNP-bearing loci in the D genome of synthetic wheats (Table 6), the low number of polymorphic loci in the wheat D genome must be an attribute of wheat, not of Ae. tauschii, its diploid source. Genome-wide θw and θπ were similar between the T. aestivum A and B genomes (Table 7). Both estimates were higher than those in the T. aestivum D genome (Table 7). The estimates were also similar between the A and B genomes in wild emmer, which showed higher diversity than the corresponding genomes in T. aestivum (Table 7).
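The two diversity estimators can be made concrete with a short Python sketch. It is an illustration under simplifying assumptions, not the software used in the study: sequences are assumed to be aligned, of equal length, and free of gaps or missing data, and no correction for multiple hits is applied. Function names and the toy alignment are hypothetical. Watterson's θw is computed per site as S / (a_n × L), where S is the number of segregating sites, L the alignment length, and a_n the sum of 1/i for i = 1 to n-1; θπ is the mean number of pairwise differences per site.

```python
from itertools import combinations

def segregating_sites(seqs):
    """Indices of alignment columns with more than one nucleotide state."""
    return [i for i in range(len(seqs[0])) if len({s[i] for s in seqs}) > 1]

def theta_w(seqs):
    """Watterson's estimator per site: S / (a_n * L)."""
    n, length = len(seqs), len(seqs[0])
    a_n = sum(1.0 / i for i in range(1, n))
    return len(segregating_sites(seqs)) / (a_n * length)

def theta_pi(seqs):
    """Nucleotide diversity: mean pairwise differences per site."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return diffs / (len(pairs) * length)

# Toy alignment (not data from the study): 4 lines, 4 sites, S = 2.
toy = ["ACGT", "ACGT", "ACGA", "TCGA"]
tw, tp = theta_w(toy), theta_pi(toy)
```

Under neutrality and constant population size both quantities estimate the same parameter, which is why a contrast between them is informative.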
Tajima's D contrasts θw and θπ to detect differences in the distribution of diversity relative to neutral expectations. The expectation for a neutral locus in a population is a Tajima's D of zero. Positive values of Tajima's D indicate a paucity of rare alleles and a preponderance of intermediate frequency alleles while negative values indicate a preponderance of rare alleles and a paucity of intermediate frequency alleles. Average Tajima's D was near zero in the A and B genomes of T. aestivum and wild emmer but was negative in the T. aestivum D genome and positive in the Ae. tauschii genome present in synthetic wheats (Table 7). The positive value of Tajima's D in the D genome of synthetic wheats is very likely due to strong subdivision of Ae. tauschii into two major subpopulations. This subdivision has been acknowledged taxonomically by elevating individuals of the two subpopulations to subspecies, Ae. tauschii ssp. strangulata and Ae. tauschii ssp. tauschii [35]. Estimates of diversity at the replacement to silent codon sites in the D genome were similar to those in Ae. tauschii and differed in both genomes from those in the A and B genomes of T. aestivum and wild emmer (Table 7).
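A minimal Python sketch of the statistic, following the standard formulation of Tajima (1989), is given below for orientation. It takes per-locus (not per-site) quantities: the number of sequences n, the number of segregating sites S, and the mean number of pairwise differences k. The function name and the example values are hypothetical and are not taken from the study.

```python
from math import sqrt

def tajimas_d(n, S, k):
    """Tajima's D from sample size n, segregating sites S, and mean
    pairwise differences k (all per locus).

    Returns None when the statistic is undefined (no segregating sites).
    """
    if n < 4:
        raise ValueError("n must be at least 4 for a non-zero variance")
    if S == 0:
        return None
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Numerator: pairwise-difference estimate minus Watterson's estimate.
    return (k - S / a1) / sqrt(e1 * S + e2 * S * (S - 1))

# Illustrative values: few pairwise differences relative to S, i.e. an
# excess of rare alleles, which yields a negative D.
d = tajimas_d(10, 16, 3.888889)
```

When k falls below S/a1, as in the example call, D is negative, the pattern reported for the T. aestivum D genome.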

Diversity among individual chromosomes
In the A genome of wild emmer and T. aestivum, diversity was lower in chromosome 4A than in the remaining chromosomes (Table 8). This was true for diversity in coding sequences and in replacement and silent codon positions (Additional file 1, Tables S1, S2). Because chromosome 4A differs structurally from the Ae. tauschii homoeologue, the distribution of diversity along the chromosome was not investigated and it is not included in Figure 2 and Additional file 1, Figure S2, which illustrate the distribution of nucleotide diversity and the number of haplotypes per gene among and along the A-genome chromosomes. The distribution of diversity on chromosome 4A relative to its rearrangements will be addressed separately. In wild emmer, chromosome 5A also had lower diversity than the genome-wide average. Chromosome 5A of T. aestivum and chromosomes 2A and 7A of wild emmer had higher diversity than the genome-wide average. With the sole exception of T. aestivum chromosome 2A, diversity was low in genes in proximal chromosomal regions and high in genes in distal chromosomal regions (Figure 2). In T. aestivum chromosome 3A, most genes had only one or two haplotypes (Additional file 1, Figure S2). Average Tajima's D was close to zero in most A-genome chromosomes in T. aestivum with the exception of chromosome 7A, which had a negative average value, and chromosome 6A, which had a positive average value.
In the B genome of T. aestivum, chromosome 2B had higher diversity and chromosome 4B had lower diversity than the rest of the chromosomes (Table 8). Diversity was low across the entire chromosome 4B (Figure 3), which also had the lowest number of haplotypes per locus and lowest haplotype diversity (Table 9). Except for genes in the distal region of the short arm of 4B, in which three haplotypes were observed in several genes, the proximal region of the short arm and the entire long arm had only one or two haplotypes per gene (Additional file 1, Figure S3). 
The sole exception to this trend in the entire long arm was locus BF201102, which had three haplotypes. However, the third haplotype, defined by a singleton SNP that was not observed in any of the remaining 31 accessions, could be a sequencing error. No other B-genome chromosome showed a similar pattern either in T. aestivum or wild emmer (Figure 3 and Additional file 1, Figure S3). Wild emmer chromosome 5B had reduced diversity, a lower number of haplotypes per gene, and lower haplotype diversity compared to the genome mean (Tables 8 and 9). Diversity was reduced only in the proximal regions of both arms (Figure 3), and three or more haplotypes were observed in many genes (Additional file 1, Figure S3). As in the A genome, most chromosomes in both T. aestivum and wild emmer showed low diversity in the proximal regions (Figure 3). T. aestivum chromosome 4B and wild emmer chromosome 5B had highly negative average Tajima's D. In wild emmer, chromosome 4B also had a negative Tajima's D and a high ratio of silent to replacement sites (Additional file 1, Table S1).
Table 6 Numbers of loci on the diversity maps harboring one or more SNPs, the total numbers of loci with estimated diversity (nt), and the total numbers of loci used for analyses (na)
The D genome was the most uneven of the three T. aestivum genomes in terms of average nucleotide diversity per chromosome (Table 8). The coefficient of variation among the D-genome chromosomes was three times greater than in the D genome of synthetic wheats (Table 8). Nucleotide polymorphism θw and nucleotide diversity θπ, the number of haplotypes per gene H, and haplotype diversity h were high in chromosomes 1D and 2D compared to genome averages (Table 9) and high values were distributed across the entire lengths of the chromosomes (Figure 4). Diversity was low in chromosomes 3D and 5D and both chromosomes were diversity impoverished across their entire lengths. 
Chromosome 4D had low diversity across its length except for the distal region of the long arm in which diversity was high. A similar pattern was observed in chromosome 6D, in which genes in both distal regions showed relatively high diversity and those in the proximal regions showed low diversity. In only a few genes were there more than two haplotypes (Additional file 1, Figure S4). Genes with more than two haplotypes were invariably in regions of elevated nucleotide diversity in the D genome. Computation of ratios of silent and replacement sites was greatly affected by the low levels of diversity in the D genome and was of limited value (Additional file 1, Table S2). All D-genome chromosomes had negative values of Tajima's D but all seven D-genome chromosomes in synthetic wheats had positive values of Tajima's D (Table 8), which, like the estimates of diversity, shows that negative Tajima's D is an attribute of the T. aestivum D genome but not of its ancestor.
Wall's B is a measure of intralocus linkage disequilibrium (LD). The higher the value of Wall's B the greater the proportion of neighboring sites in complete disequilibrium. In wild emmer and T. aestivum, the A and B genomes showed similar values of Wall's B, which ranged from 0.40 to 0.49 (Table 10). No significant differences were observed among the chromosomes. Triticum aestivum chromosome 7B and the low diversity T. aestivum chromosome 4B had the highest Wall's B values, 0.70 and 0.67, respectively, indicating that genes on those chromosomes have on average the highest levels of intralocus LD in the A and B genomes. The average Wall's B value of combined T. aestivum A and B genomes (0.49) was significantly higher (P = 0.024, paired t-test) than that of wild emmer (0.41), indicating a higher LD in T. aestivum than in wild emmer. In the D genome, average Wall's B (0.81) was significantly higher than in the A and B genomes, indicating stronger linkage disequilibrium in the D-genome genes than in the A- and B-genome genes.
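The definition of Wall's B can be sketched in Python as follows. The sketch uses the usual formulation, B = B'/(S - 1), where B' is the number of pairs of adjacent segregating sites that are congruent, that is, at which the sample carries only two of the possible two-site haplotypes. It assumes biallelic sites in an aligned, gap-free sample; the function name and toy alignment are hypothetical.

```python
def walls_b(seqs):
    """Wall's B for one locus: the fraction of adjacent pairs of
    segregating sites that are congruent (only two two-site haplotypes).

    Returns None when fewer than two segregating sites are present.
    Assumes aligned, gap-free sequences with biallelic sites.
    """
    seg = [i for i in range(len(seqs[0])) if len({s[i] for s in seqs}) > 1]
    if len(seg) < 2:
        return None
    congruent = 0
    for i, j in zip(seg, seg[1:]):
        # Distinct two-site haplotypes observed at this adjacent pair.
        if len({(s[i], s[j]) for s in seqs}) == 2:
            congruent += 1
    return congruent / (len(seg) - 1)

# Toy alignment (not data from the study): sites 0 and 1 are in complete
# disequilibrium, sites 1 and 2 are not, so one of two adjacent pairs
# is congruent.
b = walls_b(["AAC", "AAC", "TTC", "TTG"])
```

Values near 1, such as the average of 0.81 reported for the D genome, mean that most neighboring polymorphic sites within a gene are in complete disequilibrium.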
for the T. aestivum D genome declined faster than the spectra for the T. aestivum A and B genomes, which is consistent with higher numbers of rare polymorphisms in the D genome than in the A and B genomes as indicated by negative Tajima's D for the T. aestivum D-genome chromosomes.

SNP discovery
A SNP discovery strategy based on the development of GSPs and their deployment in the search for SNPs in wheat is reported here. As a first step in SNP discovery, a CP pipeline was built starting with 6,045 ESTs [37].
Another aspect of the SNP development strategy employed here that needs consideration is the use of a distantly related relative as a source of information about the exon-splicing boundaries in ESTs for the design of CPs [37]. The reliance on wheat-rice comparisons preferentially selected for the conserved gene repertoire, which is concentrated in the proximal, low-recombination regions of wheat chromosomes [5,30,40,41,42]. There is also the potential that a focus on conserved loci could result in a downward bias in diversity estimates.
A focus on single-copy loci may also affect the distribution of loci with SNPs along the chromosomes. In wheat [40], as in other plants [43], single-copy genes are preferentially located in the proximal, low-recombination regions whereas distal, high-recombination regions are enriched for multigene families. Focusing on ESTs from single-copy genes may cause preferential development of SNP markers for genes located in the proximal, low-recombination regions of chromosomes.
For these multiple reasons, the SNP markers developed here are more abundant in the proximal, low-recombination regions of wheat chromosomes than in distal, high-recombination regions. This is particularly true for the distal 30 cM of the short arms of chromosomes in homoeologous groups 1, 2, and 3, which are poorly populated with SNP markers.

Diversity maps
Comparative mapping based on RFLP markers showed that gene order along the T. aestivum homoeologous chromosomes is highly conserved and that any one chromosome of a trio of homoeologous chromosomes can be used to approximate gene order along the other two [44] and, as a matter of fact, along homoeologous chromosomes of other species throughout the tribe Triticeae [45]. Gene order is also surprisingly conserved across the entire grass family. Approximately 64, 65, and 66% of the loci on the Ae. tauschii genetic map are colinear with genes along the sorghum, B. distachyon, and rice pseudomolecules, respectively [30].
The conservation of gene order among wheat homoeologous chromosomes and across the grass family was exploited here to summarize diversity in the wheat genomes using a single map. A comparative map of Ae. tauschii [30] was selected for that purpose. The high degree of gene synteny across grasses was exploited to insert into that map additional genes that in wheat contain SNPs but could not be mapped in Ae. tauschii for lack of polymorphism.
The utility of the Ae. tauschii linkage map as a representation of the linear order of genes in the wheat genomes depends on the extent to which the assumption of colinearity of the Ae. tauschii and wheat chromosomes is true. Known translocations exist among chromosomes 4A, 5A, and 7B, and chromosome 4A also acquired pericentric and paracentric inversions [33,34]. For chromosomes 4A and the translocated regions of 5A and 7B, the diversity maps reported here are of limited relevance.
Since virtually all of the ESTs employed in SNP discovery here had been previously mapped on the wheat deletion-bin maps, it is possible for the first time to compare the wheat bin maps with a high density genetic map of a closely related genome. The Ae. tauschii genetic map that formed the backbone of the diversity maps was highly colinear with rice, B. distachyon, and sorghum genomic sequences [30,31]. There was a remarkably good agreement between the deletion-bin maps and the Ae. tauschii genetic map for most chromosome arms and discrepancies were found for less than 10% of the loci. Some of these discrepancies were biological in nature. The greatest number of discrepancies was in the B-genome deletion-bin map and the smallest in the D-genome deletion-bin map. The numbers of paralogous loci in the B genome outnumber those in the A or D genomes 2 to 1 [41]. The B genome is also more prone to translocation [46][47][48] and undoubtedly other structural changes. Both paralogous gene duplications and changes in chromosome structure manifest themselves as breaks in synteny between the Ae. tauschii genetic map and wheat deletion-bin maps. The poorest fit between the genetic map and the deletion-bin maps found here for the B genome is therefore consistent with greater divergence of the B genome relative to the A and D genomes. Although the wheat D-genome map was the most similar to the Ae. tauschii map of the three wheat deletion-bin maps, it too showed discrepancies relative to the Ae. tauschii genetic map. A discrepancy with the deletion-bin map was observed in chromosome arm 4DL. Ordering of loci in the 4DL arm bins on the basis of the Ae. tauschii genetic map resulted in interdigitation of loci mapped in the neighboring bins 4DL12 and 4DL13. The Ae. tauschii genetic map shows many rearrangements in that region compared to rice chromosome Os3 [30]. 
It is therefore possible that chromosome 4D may contain a paracentric inversion spanning the boundary of bins 4DL12 and 4DL13, which could account for the difficulties encountered during an attempt to recombine wheat homoeologous chromosome arms 4DL and 4BL in the KNA1 region [49].
A total of 36% of the loci on the diversity maps was mapped on the basis of synteny with rice. Even though mapping of these loci was based on several lines of corroborating information, it is nevertheless an inference and must be treated with caution. The prerequisite corroborating information was not available for the remaining 209 (11.7%) of the A-, B-, and D-genome loci harboring SNPs, and these markers were neither included on the diversity maps nor used in computations of diversity estimates, although they were included in the database (http://probes.pw.usda.gov:8080/snpworld/Search). The most frequent reason for the inability to map a locus on the basis of synteny was the failure to identify an orthologous region in rice. Synteny is more rapidly lost in the distal regions of wheat chromosomes due to greater rates of gene deletions and gene duplications in the distal regions than in the proximal regions [5,40,41]. This factor contributed to the poor SNP marker coverage in the distal regions of some of the chromosomes. For the same reasons, however, ESTs harboring SNPs that could not be mapped on the basis of synteny are preferentially located in distal chromosome regions. The project SNP database should therefore be interrogated if additional SNPs are needed, particularly those in the distal chromosome regions.

Genetic application of the diversity maps
The diversity maps reported in Additional file 2, Table S1 provide a convenient summary of SNPs (http://probes.pw.usda.gov:8080/snpworld/Search) in genes that were mapped on the Ae. tauschii map. A θw value of zero indicates no SNP was present and high values suggest several SNPs at a locus in the respective population of T. aestivum and wild emmer lines. Negative Tajima
Tetraploid wheats were parents of nine synthetic wheats that were screened, with data subsequently reported in the SNP database. They included durum lines (Sn24, Sn29, Sn30, and Sn31), the tetraploid component of T. aestivum 'Canthatch' (Sn25 to Sn28), and an emmer line (Sn31). SNPs present in these lines are tabulated in the database. Because they were not used in the computation of diversity measures, θw may be 0.00 for a gene in Additional file 2, Table S1 even though SNPs may exist in the A and B genomes of synthetic wheats in the database. This fact should be kept in mind when a specific locus is interrogated for a SNP on the diversity maps.
Although synthetic wheats RL5402, RL5403, RL5405, and RL5406 share tetraploid Canthatch as the source of their A and B genomes, they are occasionally polymorphic in the database. The tetraploid Canthatch was developed by recurrent backcrossing of the pentaploid hybrid T. durum 'Steward' × T. aestivum 'Canthatch' to T. aestivum Canthatch, selecting tetraploid offspring in each generation [50]. SNPs occasionally observed among the four synthetic wheats presumably reflect residual germplasm of T. durum Steward present in the tetraploid Canthatch, indicating that a complete extraction of the hexaploid wheat A and B genomes was not reached in tetraploid Canthatch and that the tetraploid is heterozygous at some loci.

Sampling nucleotide diversity for SNP development
Nucleotide diversity measured as θπ was similar in the T. aestivum A and B genomes and averaged 0.59 × 10⁻³, which is close to an estimate of 0.8 × 10⁻³ reported earlier [16]. The agreement between these two independent studies suggests that the sample of the T. aestivum lines used here was representative of T. aestivum and was adequate for SNP discovery in all three wheat genomes. However, nucleotide diversity averaged across genomes of wild emmer (θπ = 0.72 × 10⁻³) was lower than the estimated θπ = 2.7 × 10⁻³ for wild emmer as a whole [16], indicating that the population in the Diyarbakir region has low diversity relative to species-wide samples of wild emmer. This is consistent with earlier RFLP results, which indicated that the greatest diversity in wild emmer exists in northern Israel, southern Lebanon, and southwestern Syria [22]. Because T. aestivum originated in Transcaucasia [35], the failure to sample wild emmer in those regions may have had a limited effect on the discovery of SNPs relevant for hexaploid wheat. However, it must have had a great effect on the discovery of SNPs relevant for durum wheat, because durum wheat originated in the eastern Mediterranean [22]. Inclusion of only a few durum accessions in the sample screened for SNPs here was inadequate to characterize durum diversity, and an additional SNP search is needed for cultivated tetraploid wheat.

Wheat diversity architecture
In spite of the fact that the three T. aestivum genomes have coexisted within a single nucleus since the origin of T. aestivum, profound differences were found among them. The A and B genomes are more diverse and show more uniform distributions of diversity across the genome than does the D genome. Because of the short time that has elapsed since the origin of T. aestivum, 8,500 years or less [10], it is unlikely that most SNPs observed in T. aestivum originated there. It is much more likely that SNPs were contributed by gene flow from the ancestral species, tetraploid wheat and diploid Ae. tauschii, or potentially polyploid species of Aegilops having a D genome, such as Ae. cylindrica, that occasionally hybridize with wheat [51,52].
This intuitive argument is supported by differences in the ratio of replacement to silent polymorphisms in the T. aestivum genomes. Evolution in young polyploids is accompanied by relaxed purifying selection acting on genes, which is shown by an order of magnitude greater rate of fixation of deletions of single-copy genes in tetraploid wheat than in diploid Ae. tauschii and T. urartu [5]. If SNPs observed in T. aestivum were contributed by gene flow, genes in the A and B genomes should show ratios of replacement to silent site variation shifted towards 1.0 (indicating relaxed selection) compared to those in the D genome, which was observed. Additionally, if the haplotypes present in T. aestivum were largely contributed by gene flow, this could increase the effective population size Ne of the A and B genomes relative to the D genome because haplotype recombination in the A and B genomes could have taken place during the evolution of wild emmer. Hence LD in the A and B genomes of T. aestivum is expected to be stronger than in the A and B genomes of wild emmer and LD in the D genome of T. aestivum is expected to be stronger than in the T. aestivum A and B genomes, which is what was observed. We therefore conclude that most of the differences in diversity between the A and B genomes on the one hand and the D genome on the other hand can be attributed to differences in gene flow.
The difference in gene flow among the genomes has a material basis. It is well known that very little reproductive isolation exists between hexaploid and tetraploid wheat because these species readily hybridize and the resulting pentaploid hybrids are usually fertile [53]. In contrast, hybridization between hexaploid wheat and Ae. tauschii is difficult and hybrids are sterile [54]. Landraces of hexaploid and tetraploid wheat have often been grown together, which has facilitated hybridization. In contrast, sympatry between T. aestivum and Ae. tauschii has been limited by the geographic distribution of Ae. tauschii. Greater gene flow from the T. aestivum ancestors into the A and B genomes than into the D genome is therefore expected.
This study substantiated a previous survey of modern wheat varieties with SNPs developed here [55] and showed that limited gene flow into the T. aestivum D genome has enriched it for rare alleles. The preponderance of rare alleles in the D genome is indicated by the negative average Tajima's D observed in all seven D-genome chromosomes. Site frequency spectra in the T. aestivum genomes show a steeper decline in the D genome than in the A and B genomes, which is consistent with more limited gene flow into the T. aestivum D genome than into the A and B genomes. These observations agree with previous isozyme, RFLP, and SNP studies on the origin of hexaploid wheat, which suggested that wheat originated via a very limited number of hybridization events [23,24,26,56-58]. SNP data generated here showed that 93% of the 138 polymorphic genes in the D genome include only two haplotypes.
Diversity contributed by gene flow into wheat was further shaped by several factors. One was reduced effective recombination accompanying self-pollination, the prevalent mating system in wheat. Self-pollination can reduce the effective population size to half that expected under cross-pollination [59] and enhance the effects of genetic drift on diversity [60]. Self-pollination, by greatly impacting effective recombination [59], increases the sizes of chromosomal segments hitchhiking along with positively selected genes [61-64]. Low effective recombination is likely one of the factors contributing to the greatly uneven distribution of diversity in the T. aestivum D genome compared to the A and B genomes; the average θ π per chromosome was six-fold higher in the most diverse D-genome chromosome than in the least diverse D-genome chromosome. Diversity is high along the entirety of chromosomes 1D and 2D, the distal portion of the long arm of chromosome 4D, and both distal regions of chromosome 6D. In contrast, the entirety of chromosomes 3D and 5D, three-quarters of chromosome 4D, and proximal regions of 6D have very low levels of diversity. This suggests that under limited gene flow and self-pollination, genetic drift and selection may impact diversity along large chromosomal regions in wheat.
Several A- and B-genome chromosomes show that effects shaping the diversity of entire chromosomes may occasionally take place even under the regime of moderate gene flow in polyploid genomes. Diversity in T. aestivum chromosome 4B mimics in all respects diversity in the D genome. The entire chromosome is diversity impoverished and has a highly negative Tajima's D. As in the D-genome chromosomes, most of the 4B genes have either one or two haplotypes. Chromosome 4B is polymorphic for a pericentric inversion in T. aestivum [65], and homoeologous group 4 has a lower number of genes than the remaining six Triticeae homoeologous groups [29], presumably due to the translocation of the gene-rich terminal region of the short arm of chromosome 4 to the long arm of chromosome 5 [30]. Recombination takes place primarily in genes. The low number of genes on chromosome 4B would probably result in low crossover frequencies in this chromosome, which was observed [66]. The net effect of limited effective recombination may be that a large portion of this chromosome has hitchhiked during episodes of positive selection during the evolution of T. aestivum or was subjected to a reduction in effective population size during episodes of background selection [60]. A long-range loss of diversity may have also taken place in wild emmer chromosome 5B, which also has a negative average Tajima's D. Another chromosome in which a chromosome-sized loss of diversity has taken place is 4A. In this chromosome, the loss of diversity was undoubtedly caused by the fixation of inversions suppressing recombination in a heterozygous state.
Another factor that must have had a significant impact on the architecture of diversity in wheat is the expression of the Ph1 locus, which is unique to polyploid wheat. Its primary function is to preclude recombination between homoeologous chromosomes [67-69]. Importantly, Ph1 also negatively affects recombination between heterozygous homologues [66]. The activity of Ph1 therefore has effects on diversity similar to those of self-pollination. For an unknown reason, Ph1 negatively affects recombination in the B genome more than in the A genome [66]. The T. aestivum B genome shows greater variation in diversity among chromosomes than the A genome. The coefficients of variation were 0.18 and 0.21 for θ w and θ π, respectively, among the T. aestivum A-genome chromosomes but were 0.30 and 0.38 among the T. aestivum B-genome chromosomes, which is consistent with more reduced recombination in the B genome than in the A genome due to Ph1 effects. Recombination between the Ae. tauschii chromosomes and wheat D-genome chromosomes is even more affected by Ph1 than recombination between wheat heterozygous homologues [70]. In agreement, T. aestivum D-genome chromosomes show the greatest variation in diversity among the three genomes; coefficients of variation were 0.52 and 0.59 for θ w and θ π, respectively, among the D-genome chromosomes. We suggest that the synergy of self-pollination and suppression of recombination due to Ph1 results in high levels of random drift, loss of diversity from large chromosome regions, and relatively high variance in diversity among chromosomes.

Conclusions
Distinctly different diversity patterns were found in two closely related polyploid species of differing age, the recently evolved T. aestivum and the older wild emmer. In wild emmer, diversity is uniform among genomes and chromosomes, but in T. aestivum, diversity is heterogeneous among both genomes and chromosomes. These observations suggest the following scenario of polyploid evolution. In a nascent polyploid, diversity almost entirely depends on gene flow from the ancestral species. During that period, diversity is greatly affected by stochastic and directional processes, particularly under self-pollination, which is widespread in polyploids. Dependence on gene flow and the synergy of self-pollination and action of Ph1-like genes result in low and heterogeneous diversity across genomes. If gene flow cannot keep pace with the population expansion, diversity is dominated by rare alleles. Large chromosomal regions or whole chromosomes are subjected to genetic drift and hitchhiking, resulting in their low diversity. As time passes, the accumulation of new mutations results in an increased and more uniformly distributed diversity across the genome, as is seen in wild emmer.

CP design
ESTs showing simple cDNA hybridization profiles with T. aestivum genomic DNA in Southern blots http://wheat.pw.usda.gov/cgi-bin/westsql/map_locus.cgi were selected from the wEST database [71] for CP design. Only ESTs mapped on the wheat deletion-bin maps [32,72-77] were used. The wheat deletion-bin maps were constructed by hybridization of random cDNA clones with DNAs of 101 deletion stocks [78] and a set of wheat telocentric stocks [79] that subdivided the 21 wheat chromosomes into 159 bins [80]. CPs located in exons flanking one or more introns were designed on the basis of comparison of wheat EST sequences with rice genomic sequence. EST contigs or EST singletons were extracted from the wEST database http://wheat.pw.usda.gov/cgi-bin/westsql/map_locus.cgi and compared with rice genomic sequences to identify exon/exon junctions. The Primer3 program [81] was modified for PCR primer design in batch mode [82]. A pipeline for batch homology search between wheat ESTs and rice genomic sequence http://avena.pw.usda.gov/SNP/new/bioinformatics.shtml was built [37,82]. With this pipeline, PCR primers were successfully designed for 2,223 EST unigenes (1,958 EST contigs and 265 EST singletons). Of these, primer pairs for 1,624 were from 5' ESTs and 599 from 3' ESTs. Since these primers were located in exons and were designed on the basis of homology with the rice exonic sequences, they were highly conserved in grasses; hence their name. An additional 290 primers for loci in the distal bins were designed manually using the Primer3 program or the GeneTools primer design program [83].

GSP design
Genomic DNAs of T. urartu accessions G1812 (PI 428198, Turkey) and ICTW600161 (Syria, supplied by J. Valkoun, ICARDA, Syria), Ae. tauschii ssp. strangulata accession AL8/78 (Armenia, supplied by V. Jaaska, Estonian University, Tartu), and Ae. tauschii ssp. tauschii accession AS75 (Shaanxi, China) (Table 11) were used as PCR templates. Pairs of accessions were selected among 193 and 188 accessions representative of the geographic distribution of T. urartu and Ae. tauschii, respectively, and genotyped with random restriction fragment length polymorphism (RFLP) markers [23]. A pair of genetically distant accessions was selected within each species. Target DNAs were PCR amplified using CPs and amplicons were directly sequenced, using CPs as sequencing primers. Triticum urartu and Ae. tauschii are self-pollinating species, and the four accessions were assumed to be homozygous at the targeted loci. In contrast, Ae. speltoides is cross-pollinating and was expected to be heterozygous at many loci targeted for sequencing. DNA of two randomly selected Ae. speltoides F4 plants from the cross 2-12-4 × PI 136909-12-II/134-1 [84] were therefore used as PCR templates in the hope that at least one was homozygous at a targeted locus and the amplicon could be sequenced.
B-genome sequences were obtained from 'Langdon' durum wheat by PCR amplification of Langdon genomic DNA using CPs. The amplicons were purified using the Promega PCR amplicon purification kit and cloned into the TA cloning site of the pGEM-T vector (Promega) following manufacturer's recommendations, and after transformation, E. coli cells were plated on LB agar medium. Twelve positive transformants were picked and plasmid inserts were PCR amplified using M13-48 and T7 Universal Primers. The PCR reaction consisted of 1X Taq polymerase buffer, 0.2 mM dNTPs mix, 50 pmols of primers, 1U of Taq polymerase, and sterile distilled water. PCR conditions were 10 min at 94°C, 10 cycles of 94°C for 20 sec, 58°C for 20 sec, and 72°C for 2 min, followed by 35 cycles of 94°C for 20 sec, 55°C for 20 sec, and 72°C for 2 min. PCR was terminated by final extension at 72°C for 5 min. Success of PCR amplification was checked by 1% agarose electrophoresis. For sequencing, 5 μl of amplified DNA was treated with exonuclease I (USB) and shrimp alkaline phosphatase (USB) according to manufacturer's recommendations in a 10 μl reaction volume. The reaction was diluted to 18 μl with water before an aliquot was taken for subsequent sequencing. The clones were sequenced as described below. The sequences of the clones were compared with the T. urartu and Ae. speltoides sequences and each Langdon clone was assigned to either the A or B genome. Because CPs annealed to both A- and B-genome templates during PCR amplification of Langdon DNA, chimeric amplicons could be generated during the amplification process [85]. A-genome/B-genome chimeric clones were occasionally encountered and attention was paid to their presence during the assignment of sequences to genomes.
The T. urartu, Ae. speltoides, Ae. tauschii, and Langdon sequences were assembled into contigs with the Staden program [86]. Clean FASTA sequences were aligned with ClustalW [87] or MUSCLE [88] programs. Sequence alignments were visually checked using Bioedit (Tom Hall, Ibis Therapeutics, Carlsbad, CA). Genome-specific nucleotide substitutions were recorded and GSPs were designed. The primers were limited to the Tm range of 55 to 60°C, so that subsequent PCR amplifications could be performed in batches. A genome-specific nucleotide substitution was used as the 3' end of each GSP [89]. In addition, the third nucleotide from the 3' end was occasionally purposely mismatched with the template to increase the genome specificity of the primer. These modifications are included in the GSPs reported in the SNP database http://probes.pw.usda.gov:8080/snpworld/Search. A GSP was combined with one of the CPs to obtain a primer pair for genome-specific amplification. Most of the amplicons therefore consisted of exonic and intronic sequences.
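The core of the GSP design rule (a genome-specific substitution at the 3' end, Tm between 55 and 60°C) can be illustrated with a minimal Python sketch. This is not the procedure or software used in this study: the function names are hypothetical, the sequences are assumed to be aligned without gaps, only forward-strand primers are considered, and the Wallace rule (2°C per A/T, 4°C per G/C) is used as a crude stand-in for whatever Tm calculation was actually applied.

```python
def genome_specific_sites(target, others):
    """Return 0-based alignment columns where the target genome's base
    differs from the base of every other genome (gap-free alignment assumed)."""
    return [i for i, base in enumerate(target)
            if all(other[i] != base for other in others)]


def wallace_tm(primer):
    """Approximate Tm with the Wallace rule; an assumption, not the study's method."""
    gc = primer.count("G") + primer.count("C")
    at = primer.count("A") + primer.count("T")
    return 2 * at + 4 * gc


def candidate_gsps(target, others, tm_range=(55, 60), lengths=range(18, 25)):
    """For each genome-specific site, return the shortest forward primer that
    ends (3' end) on that site and whose Tm falls inside tm_range."""
    candidates = []
    for site in genome_specific_sites(target, others):
        for length in lengths:
            start = site - length + 1
            if start < 0:
                continue  # primer would run off the 5' end of the alignment
            primer = target[start:site + 1]
            if tm_range[0] <= wallace_tm(primer) <= tm_range[1]:
                candidates.append((site, primer))
                break
    return candidates


# Toy alignment: the two genomes differ only at the last column.
a_genome = "ACGT" * 5
b_genome = "ACGT" * 4 + "ACGA"
print(candidate_gsps(a_genome, [b_genome]))
```

The deliberate mismatch at the third nucleotide from the 3' end, which was applied only occasionally, is left out of the sketch.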

GSP validation
Because the bin location of each targeted gene was known, only the chromosomes of the homoeologous group in which the bins were located were used for PCR validation of GSPs. DNA was PCR amplified from the relevant nullisomic-tetrasomic (N-T) lines in the T. aestivum 'Chinese Spring' genetic background [90]. If a GSP functioned properly, DNA of the N-T line nullisomic for the chromosome in the targeted genome produced no amplicon but DNA of the N-Ts for the remaining two chromosomes of the homoeologous group produced amplicons. Primers that produced amplicons with DNA of all three N-Ts failed the validation step. This could happen if the gene was actually located in a different homoeologous group than assumed. Primers that failed the N-T test were therefore used in PCR with N-Ts for all 21 wheat chromosomes. If amplification occurred in all but one N-T, it was assumed that the targeted gene was located on the chromosome that was absent in the N-T line that failed to produce an amplicon. Such primers were considered validated. If no single N-T line consistently failed to produce an amplicon, the putative GSP was discarded.
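The two-step validation logic can be summarized in a short Python sketch. The function names and the dictionary representation of gel results (nullisomic chromosome mapped to whether an amplicon was observed) are hypothetical and introduced only for illustration.

```python
def passes_group_test(amplicon_seen, target_chrom):
    """Step 1: within the homoeologous group, a valid GSP gives no amplicon in
    the line nullisomic for the target chromosome and an amplicon in the others.

    amplicon_seen maps each nullisomic chromosome (e.g. "4B") to True/False."""
    return all(seen != (chrom == target_chrom)
               for chrom, seen in amplicon_seen.items())


def locate_with_full_panel(amplicon_seen):
    """Step 2: with N-T lines for all 21 chromosomes, return the chromosome
    whose nullisomic line alone lacks the amplicon, or None (discard the GSP)."""
    missing = [chrom for chrom, seen in amplicon_seen.items() if not seen]
    return missing[0] if len(missing) == 1 else None


# A primer targeting 4B that behaves as expected in the group 4 N-T lines.
group4 = {"4A": True, "4B": False, "4D": True}
print(passes_group_test(group4, "4B"))

# A primer that amplified in all group 4 lines but fails only in nullisomic 5A.
panel = {f"{g}{x}": True for g in range(1, 8) for x in "ABD"}
panel["5A"] = False
print(locate_with_full_panel(panel))
```

The requirement that the absence of an amplicon be consistent across repeated PCRs is not modeled here.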

SNP discovery
To maximize the likelihood of the relevance of discovered SNPs to cultivated wheat while minimizing the number of lines screened for SNPs, a resequencing panel representative of lines of wild emmer from the Diyarbakir region (10 lines) and T. aestivum (13 lines) (Table 1) was used. Twelve of the 13 T. aestivum lines were selected from representative branches of a neighbor-joining tree constructed for 476 T. aestivum lines genotyped at 153 RFLP loci [91] (Additional file 1, Figure S1). One T. aestivum line ('Opata 85') was added because it was one of the parents of the International Triticeae Mapping Initiative (ITMI) mapping population (Table 1) [92]. The wild emmer lines were selected from wild emmer populations in the Diyarbakir region so that each represented a branch in a neighbor-joining tree (Additional file 1, Figure S5) based on genetic distances using 131 RFLP loci [22]. In addition, 9 synthetic hexaploid wheats produced by crossing tetraploid wheat with Ae. tauschii and doubling the chromosome number [12] were included in the screening population. Synthetic wheat is used in wheat breeding as a source of new D-genome variation [93-95]. Four synthetics (Sn25 through Sn28, Table 2) selected for the project were supplied by E.R. Kerber (Agriculture Canada, Winnipeg). They were included because they were previously used as sources of Ae. tauschii chromosomes in the development of disomic substitution lines of single Ae. tauschii chromosomes in the Chinese Spring genetic background (J. Dvorak, unpublished). The donor of the A and B genomes of these synthetic wheats was a tetraploid extraction of T. aestivum 'Canthatch' [50]. Synthetic Sn24 was the parent of the ITMI mapping population [92]. Synthetic wheats Sn29 through Sn31 were extensively used in the CIMMYT wheat breeding program. Finally, synthetic wheat Sn32 was the parent of an RFLP mapping population [44]. The tetraploid parents of Sn24, Sn29, Sn30, and Sn31 were durum and that of Sn32 was emmer. The Ae. tauschii parents of the synthetics, if known, are indicated in Table 2.
Target DNAs of these 32 lines were amplified with GSPs, the sequences were aligned and edited as described above, and SNPs were submitted to the central database http://probes.pw.usda.gov:8080/snpworld/Search. All wheat sequences were also submitted to NCBI. Their accession numbers are HQ389550 to HQ391340.

DNA sequencing
As explained above, a GSP primer pair consisted of a CP and a GSP primer. PCR amplification was performed in seven different labs but sequencing of the amplicons was performed centrally at the Western Regional Research Center, USDA-ARS, Albany, California. Two replicas containing 96 different processed amplicons were made in semi-skirted 96-well PCR plates. To make the first replica, 3 μl of a processed amplicon and 1 μl of the corresponding CP primer (3.2 pmol/μl) were placed into a well of one plate. To make the second replica, 3 μl of a processed amplicon and 1 μl of the corresponding GSP primer (3.2 pmol/μl) were placed into the corresponding well of a second plate. The plates were frozen and shipped on dry ice to the sequencing lab along with a directory of the amplicons in each well. The plates were thawed in the sequencing lab and 1.5 μl of 5X sequencing buffer, 1 μl of 50% DMSO, 1 μl of Big Dye v.3.1, and 2.5 μl of deionized water were added to each well. The cycling conditions were: 5 min at 98°C followed by 40 cycles of 10 sec at 96°C, 5 sec at 50°C, and 4 min at 60°C. DNA was precipitated with ethanol followed by a 70% ethanol rinse, dried, 12 μl of sequencing grade formamide was added, and the DNA was sequenced on an ABI3730xl. Both strands of each amplicon were sequenced; one was produced in the plate containing CPs as sequencing primers (1st DNA strand) and the other was produced in the plate containing GSPs as sequencing primers (2nd DNA strand).
The Phred/Phrap [96] or Staden package http://staden.sourceforge.net/ programs were used for base calling and assembly of sequencing trace files. Assembled contigs were edited with the Staden package. Perl and Java programs were written to manipulate the data. The PolyPhred v. 5.0 program [97] and mutation detection modules from the Staden package were utilized for SNP detection.

Map construction
A genetic map based on segregation of markers in a population of 572 F2 plants from the cross Ae. tauschii AL8/78 × Ae. tauschii AS75 [30] was used as a backbone for the development of diversity maps. The backbone map contained 878 markers of which 863 were ESTs; 12 of the remaining loci were random RFLP markers and three were microsatellite loci. ESTs were mapped either on the basis of RFLP or SNP. The latter were mapped with the SNaPshot™ SNP assay (Applied Biosystems, Foster City, California) or GoldenGate BeadArray SNP assay (Illumina Inc., San Diego, California). A total of 174 F2 plants was used for RFLP and SNaPshot mapping and 560 F2 plants were used for Illumina GoldenGate assays.
The EST markers were compared with the NCBI rice genomic sequence to assess the wheat-rice macrosynteny (henceforth synteny) [30]. Loci that cosegregated were grouped into "recombination blocks" [30] within which they were arranged to parallel the order of orthologous genes in rice. Genes that could not be mapped because of the lack of polymorphism between the parents of the Ae. tauschii mapping population were inserted into the Ae. tauschii map at a location corresponding to that of a putative rice orthologue, provided that the following conditions were met: (1) the allocation of the gene to wheat chromosome by PCR using N-T lines and GSPs agreed with the previous deletion-bin mapping, (2) the section of the wheat chromosome in which the gene resided was homoeologous with the rice chromosome on which the rice orthologue resided, and (3) the bin location of the gene http://wheat.pw.usda.gov/cgi-bin/westsql/map_locus.cgi agreed with the location of the putative rice orthologue. If any ambiguity was encountered, the location of the gene on the Brachypodium distachyon and sorghum pseudomolecules [30,31] was taken into account. The loci inserted into the diversity maps on the basis of synteny with rice are indicated in Additional file 2, Table S1 by having no cM value assigned.

Comparisons of genetic maps with wheat deletion-bin maps
The positions of genes on the Ae. tauschii genetic map were compared with their locations on the wheat deletion-bin maps compiled in the GrainGenes database http://wheat.pw.usda.gov/cgi-bin/westsql/map_locus.cgi. The deletion bins in Additional file 2, Table S1 were named according to the proximal breakpoint delimiting a bin. The proximal-most bin delimited by the centromeric break on the proximal side received the letter c (for centromeric) following the chromosome arm name. Bins within a chromosome were arbitrarily colored in Additional file 2, Table S1. If a gene was previously mapped into a wheat bin located in a different homoeologous group, the cell of the bin map was not colored. If no bin information was available for a locus, the cells were left blank in Additional file 2, Table S1. The bin location of a locus was considered inconsistent with the location of the locus on the deletion-bin map if it conflicted with the linear order of recombination blocks on the genetic map.

Diversity estimation
Diversity was estimated only for mapped loci. The edited alignments of the 13 T. aestivum lines were compared. Nucleotide polymorphism θ w [98], nucleotide diversity θ π [18], average number of haplotypes per locus (H), haplotype diversity (h) [99], Tajima's D [100], and Wall's B [101] were computed for each gene using tools from the libsequence library [102]. The same descriptive statistics were computed for 10 lines representative of wild emmer in the Diyarbakir region and 9 accessions of synthetic wheat.
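For readers unfamiliar with these statistics, the following minimal Python sketch computes the number of segregating sites, θ w, θ π, and Tajima's D for one locus. It is not the libsequence implementation used in this study; the function names are hypothetical, the alignment is assumed to hold equal-length sequences without gaps or missing data, and the values are per locus (dividing θ w and θ π by the alignment length gives per-site values).

```python
from itertools import combinations
from math import sqrt


def segregating_sites(seqs):
    """Number of alignment columns with more than one nucleotide."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def harmonic(n, power=1):
    """Sum of 1/i**power for i = 1 .. n-1 (a1 when power=1, a2 when power=2)."""
    return sum(1.0 / i ** power for i in range(1, n))


def theta_w(seqs):
    """Watterson's estimator: segregating sites divided by a1."""
    return segregating_sites(seqs) / harmonic(len(seqs))


def theta_pi(seqs):
    """Average number of pairwise differences between sequences."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return total / len(pairs)


def tajimas_d(seqs):
    """Tajima's D; returns None when the locus is monomorphic."""
    n, s = len(seqs), segregating_sites(seqs)
    if s == 0:
        return None
    a1, a2 = harmonic(n), harmonic(n, 2)
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (theta_pi(seqs) - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Toy locus in which every polymorphism is a singleton (rare alleles only).
locus = ["AAA", "TAA", "ATA", "AAT", "AAA"]
print(theta_w(locus), theta_pi(locus), tajimas_d(locus))
```

In the toy locus all variants are singletons, so θ π falls below θ w and Tajima's D is negative, the same direction of departure reported here for chromosomes with an excess of rare alleles.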
An unbiased estimation of diversity of a population requires the alignment of homologous sequences. This prerequisite is complicated in polyploid populations by the potential for inadvertently incorporating homoeologous sequences into the alignments, which would upwardly bias the estimates of average sequence diversity, sometimes dramatically. Coalescent simulations were used to estimate diversity variance expected under neutral coalescent histories. Simulations were performed in ms [103] and results were summarized using the msstats tool from the libsequence library [102]. For the A and B genomes, simulations were based on estimates of diversity in wild emmer, with the generative value of θ based on mean θ for each chromosome and the average length of amplicons for the same chromosome. A total of 10,000 simulations per chromosome was performed. The 99th percentile of θ π from the 10,000 simulations was taken as the upper bound of θ π expected for each chromosome. Loci in both wild emmer and T. aestivum for which empirical estimates of θ π exceeded the upper bound were excluded from estimation of sequence diversity. Simulations for the D genome were based on the mean chromosome-specific estimates of θ π for the D genome of synthetic wheats. Loci for which empirical estimates of θ π in synthetic wheats exceeded the upper bound for the chromosome were excluded from further analysis in both synthetic wheats and T. aestivum. Loci excluded from computations are reported in Additional file 2, Table S1 but are indicated by a yellow cell color. Data containing less than 75% of the lines were also excluded from the analyses. They are reported in Additional file 2, Table S1 but are indicated by a red cell color.
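The two exclusion rules can be sketched in Python as follows. The simulated θ π values would come from the ms runs summarized with msstats; here they are simply passed in as a list. The function names and the nearest-rank definition of the percentile are assumptions made for illustration, since the exact percentile method is not specified above.

```python
def upper_bound(simulated_theta_pi, percent=99):
    """Nearest-rank percentile of the simulated values (assumed method)."""
    ordered = sorted(simulated_theta_pi)
    # Integer ceiling of percent * N / 100 avoids floating-point rounding.
    rank = (percent * len(ordered) + 99) // 100
    return ordered[rank - 1]


def keep_locus(theta_pi, lines_with_data, lines_total, bound, min_fraction=0.75):
    """Keep a locus only if enough lines were sequenced and its empirical
    theta_pi does not exceed the simulated upper bound for its chromosome."""
    if lines_with_data / lines_total < min_fraction:
        return False  # fewer than 75% of the lines: excluded
    return theta_pi <= bound  # above the bound: possible homoeologue, excluded


# Stand-in for 10,000 simulated values on one chromosome.
simulated = [i / 10000 for i in range(1, 10001)]
bound = upper_bound(simulated)
print(bound)
print(keep_locus(0.5, 12, 13, bound), keep_locus(0.995, 13, 13, bound))
```

With 13 T. aestivum lines, the 75% rule in this sketch keeps a locus sequenced in 10 lines and drops one sequenced in 9.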
The polydNdS program from libsequence [102] was used to estimate polymorphism at replacement and silent codon positions. The outputs estimated diversity for the whole gene, exons only, introns plus flanking sequences (UTRs) only, and replacement and silent polymorphisms. Only codons that differed at one position or codons that differed at two positions where the sites could be unambiguously assigned as synonymous or nonsynonymous were used.
The frequency spectrum of the less frequent (minor) allele was estimated in the sample of 10 homozygous wild emmer lines (10 chromosomes) and 13 homozygous T. aestivum lines (13 chromosomes). The distribution of the minor allele frequency in a sample of size n is described by the folded spectrum [36,104], which estimates the frequency of SNP sites at which the minor allele is present in i of the n investigated chromosomes (i ranges from 1 to n/2). The folded spectra were computed for silent and replacement codon positions for individual genomes of wild emmer and T. aestivum.
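A folded spectrum of this kind can be tabulated with a few lines of Python. The sketch below is illustrative only: the function name is hypothetical, the alignment is assumed to be gap-free with one sequence per homozygous line, and sites with more than two alleles are skipped, which is an assumption rather than a documented choice of this study.

```python
from collections import Counter


def folded_sfs(seqs):
    """Return counts of biallelic sites by minor-allele count.

    Element k of the result is the number of sites whose minor allele
    occurs in k + 1 of the n sequences (minor-allele count 1 .. n // 2)."""
    n = len(seqs)
    spectrum = [0] * (n // 2 + 1)
    for column in zip(*seqs):
        counts = Counter(column)
        if len(counts) != 2:
            continue  # monomorphic or multi-allelic site: not counted
        spectrum[min(counts.values())] += 1
    return spectrum[1:]


# Four sequences: one singleton site and one site with the minor allele twice.
alignment = ["AAAA", "AAAT", "AATT", "AATT"]
print(folded_sfs(alignment))
```

Dividing each count by the total number of counted sites converts the spectrum to the proportions usually plotted.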

Statistical tests
Significance of differences among genomes was tested with the GLM and LSD procedures (SAS), using mean θ, the mean number of haplotypes (H) and mean haplotype diversity (h) per chromosome as variables. Because θ, H, and h across loci are not normally distributed, the GLM procedure could not be used to evaluate the significance of differences in these variables among chromosomes. The significance of differences between chromosome means was tested by estimating a 99% confidence interval (CI) about the genome mean of θ, H, and h from the distribution of 1000 means of random samples drawn with replacement from the population of θ, H, and h of genes within a genome (A, B, D genomes) nested within a species (T. aestivum and wild emmer). Chromosome means outside of the 99% CI were declared significantly different from the genome mean.
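The resampling test can be sketched in Python as below. The source does not state the size of each random sample, so the sketch assumes it equals the number of genes on the chromosome being tested; the function names, the fixed random seed, and the use of simple order statistics for the interval limits are likewise illustrative assumptions.

```python
import random


def bootstrap_ci(genome_values, sample_size, n_boot=1000, level=0.99, seed=0):
    """Confidence interval for the mean of random samples drawn with
    replacement from the per-gene values of one genome."""
    rng = random.Random(seed)
    means = sorted(
        sum(rng.choices(genome_values, k=sample_size)) / sample_size
        for _ in range(n_boot)
    )
    tail = round((1 - level) / 2 * n_boot)  # 5 means cut from each tail
    return means[tail], means[n_boot - 1 - tail]


def differs_from_genome(chromosome_mean, ci):
    """A chromosome mean outside the interval is declared significant."""
    low, high = ci
    return chromosome_mean < low or chromosome_mean > high


# Toy per-gene values for one genome and a chromosome carrying 25 genes.
genome = [0.0, 1.0] * 50
ci = bootstrap_ci(genome, sample_size=25)
print(ci, differs_from_genome(0.05, ci))
```

The same function applies unchanged to θ, H, or h, since only the list of per-gene values differs.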

Additional material
Additional file 1: Table S1 summarizes estimates of nucleotide polymorphism θ w and nucleotide diversity θ π at the replacement (N) and silent (S) codon positions and in noncoding portions of genes, and the ratios of diversity at the replacement and silent codon positions in genes in the individual chromosomes of the A and B genomes of the T. dicoccoides population from the Diyarbakir region in Turkey. Table S2 summarizes estimates of nucleotide polymorphism θ w and nucleotide diversity θ π at the replacement (N) and silent (S) codon positions and in noncoding portions of genes, and the ratios of diversity at the replacement and silent codon positions in the A, B, and D genomes of T. aestivum. Figure S1 is a neighbor-joining unrooted tree of 476 T. aestivum accessions constructed from Nei's genetic distances computed from RFLP at 131 loci. The tree depicts genetic relationships among 13 T. aestivum lines used for resequencing and SNP discovery. Figures S2 and S3 show the numbers of haplotypes per gene along the A-genome and B-genome chromosomes, respectively, in T. aestivum and wild emmer (T. dicoccoides) in the Diyarbakir region in Turkey. Figure S4 shows the numbers of haplotypes per gene along the D-genome chromosomes in T. aestivum. Figure S5 is a neighbor-joining unrooted tree of 55 wild emmer (T. dicoccoides) accessions from the Diyarbakir region in Turkey constructed from Nei's genetic distances computed from RFLP at 153 loci. The tree depicts genetic relationships among 10 wild emmer accessions used for resequencing and SNP discovery in wild emmer.
Additional file 2: Table S1 is an Excel table summarizing locus diversity measures in the A, B, and D genomes of T. aestivum, the A and B genomes of the Diyarbakir population of wild emmer, and the D genome of synthetic wheats. Table S1 further shows synteny of the diversity map with the wheat deletion-bin maps and the 12 rice pseudomolecules.