Five 1536-SNP GoldenGate assays (Figure 1, Table 2)
Three pilot-phase 1536-SNP GoldenGate assays were developed. These "pilot OPAs" are referred to as POPA1, POPA2 and POPA3. Two 1536-SNP production-scale OPAs, referred to as BOPA1 and BOPA2, were developed from SNPs tested on the pilot OPAs. All sequences used as SNP sources were generated using the Sanger dideoxy chain termination method.
POPA1 and POPA2
The contents of POPA1 and POPA2 came from an initial list of SNPs comprising the union of three intersecting lists from SCRI (1,658 SNPs), IPK (985 SNPs) and UCR (12,615 SNPs). SCRI and IPK SNPs were derived from PCR amplicon sequences, whereas UCR SNPs were derived almost entirely from EST sequences. In the selection of SNPs for the OPAs, preference was given to SNPs derived from amplicon sequences. Nearly all SNPs on POPA1 and about 60% of the SNPs on POPA2 targeted stress-regulated genes. POPA1 included 1524 barley SNPs, one per gene, of which 1033 were derived from ESTs and 491 from amplicon sequences. POPA2 included 1536 barley SNPs, one per gene including 258 genes represented on POPA1, of which 1456 were from ESTs and 80 from amplicon sequences.
BOPA1 represented 705 SNPs from POPA1 and 832 from POPA2, including one SNP common to both (1536 unique SNPs). All BOPA1 SNPs had a satisfactory technical performance on POPA1 or POPA2 and a minor allele frequency (MAF) of at least 0.08. Based on the results presented in this manuscript, BOPA1 included 1414 mapped and 122 unmapped SNPs.
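The two BOPA1 selection criteria described above can be illustrated with a minimal sketch. The record layout, the SNP names and the boolean `passed_qc` flag (a stand-in for "satisfactory technical performance") are assumptions made for illustration only; the actual selection was not necessarily performed this way.

```python
from dataclasses import dataclass

MAF_THRESHOLD = 0.08  # BOPA1 threshold stated in the text


@dataclass(frozen=True)
class PilotSnp:
    name: str        # hypothetical SNP identifier
    pilot_opa: str   # "POPA1" or "POPA2"
    passed_qc: bool  # stand-in for "satisfactory technical performance"
    maf: float       # minor allele frequency estimated on the pilot OPA


def select_bopa1_candidates(snps):
    """Map each qualifying SNP name to the set of pilot OPAs it came from.

    A SNP present on both pilot OPAs appears once, so it is counted once.
    """
    selected = {}
    for snp in snps:
        if snp.passed_qc and snp.maf >= MAF_THRESHOLD:
            selected.setdefault(snp.name, set()).add(snp.pilot_opa)
    return selected


# Toy records (not real data)
toy = [
    PilotSnp("snp_a", "POPA1", True, 0.25),
    PilotSnp("snp_a", "POPA2", True, 0.25),   # shared SNP, counted once
    PilotSnp("snp_b", "POPA2", True, 0.05),   # fails the MAF threshold
    PilotSnp("snp_c", "POPA1", False, 0.40),  # fails technical performance
    PilotSnp("snp_d", "POPA2", True, 0.08),   # exactly at threshold: kept
]
candidates = select_bopa1_candidates(toy)
print(sorted(candidates))

# The counts in the text are consistent with a single shared SNP.
assert 705 + 832 - 1 == 1536
```

Keying the result by SNP name is one simple way to make sure a SNP tested on both pilot OPAs occupies only one of the 1536 positions.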
Residual SNPs from the sources of POPA1 and POPA2 were insufficient to complete the design of POPA3 without compromising on the SNP selection criteria. Additional SNPs for POPA3 came from three sources: 1) an extended list of 5,732 SNPs identified in SCRI amplicon sequences, 2) colleagues who contributed SNPs from amplicon sequences of specific genes of biological interest and 3) an expanded barley EST resource. The first two of these additional sources were exhausted for POPA3 design. In the selection of EST-derived SNPs, priority was given to genes previously classified as having interesting expression patterns during malting or upon exposure to pathogens, or relevant to malting, brewing quality, abiotic stress or phenology. The composition of POPA3 included 1536 barley SNPs, in many cases more than one per gene and in some cases including genes represented on POPA1 or POPA2. In total, 967 POPA3 SNPs were derived from ESTs and 569 from amplicon sequences.
BOPA2 represented 406 SNPs from POPA1, 178 from POPA2 and 952 from POPA3. The primary emphases of BOPA2 were representation of mapped SNPs that were not included on BOPA1 and inclusion of multiple SNPs for certain genes to reveal haplotypes at these loci, with some weight given to MAF. BOPA2 contained 921 SNPs with MAF of at least 0.08, 256 SNPs with MAF of at least 0.04 but less than 0.08, 345 SNPs with MAF of at least 0.005 but less than 0.04, and 14 SNPs with only one allele (MAF = 0) in the germplasm examined using POPA3. Based on the results presented in this manuscript, BOPA2 included 1263 mapped and 273 unmapped SNPs. A total of 967 SNPs were from ESTs and 569 from amplicon sequences.
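The MAF classes used to describe BOPA2 can be written as a small classifier. This is only a sketch: the class labels and the function name are hypothetical, and the text does not describe how values between 0 and 0.005 were treated, so they are labelled "unclassified" here.

```python
def maf_class(maf):
    """Assign a MAF value to one of the BOPA2 classes described in the text."""
    if not 0.0 <= maf <= 0.5:
        raise ValueError("MAF must lie between 0 and 0.5")
    if maf >= 0.08:
        return ">=0.08"
    if maf >= 0.04:
        return "0.04-0.08"
    if maf >= 0.005:
        return "0.005-0.04"
    if maf == 0:
        return "monomorphic"
    return "unclassified"  # 0 < MAF < 0.005 is not described in the text


# Class sizes reported for BOPA2; they account for all 1536 SNPs.
bopa2_counts = {
    ">=0.08": 921,
    "0.04-0.08": 256,
    "0.005-0.04": 345,
    "monomorphic": 14,
}
print(sum(bopa2_counts.values()))
```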
Table S4 (Additional File 14) provides alternative SNP names arising from this work and several annotation fields for all SNPs represented on POPA1, POPA2, POPA3, BOPA1 and BOPA2. The annotations include BLAST hits to the rice and Arabidopsis genomes and UniProt, the relationship of SNP source sequences to HarvEST:Barley unigenes and probe sets on the Affymetrix Barley1 GeneChip, and source consensus sequences. To assign SNP loci on the genetic map to chromosome arms, centromere positions were identified using flow-sorted chromosome arms following the method described in Simkova et al. ; results of this work will be described elsewhere (Bhat et al., in preparation). The annotation information in Table S4 (Additional File 14) is also available from HarvEST:Barley  and . The HarvEST BLAST server  provides the 2943 mapped SNP unigene sequences as a searchable database.
Genomic DNAs of 93 doubled haploid maplines and the parents (Dom, Rec) of the Oregon Wolfe Barley (OWB) population [26, 27], 148 doubled haploids and the parents of the Steptoe × Morex (SxM) population [7, 28], 95 doubled haploid maplines and the parents of the Haruna Nijo × OHU602 (HxO) population, and 213 additional germplasm samples were purified using Plant DNeasy (Qiagen, Valencia, CA, USA) starting with 100-300 mg of young seedling leaves. Genomic DNAs of 93 doubled haploid maplines and the Barke parent from the Morex × Barke (MxB) population (Stein et al. unpublished) were produced using a CTAB method. All DNA samples were checked for concentration using UV spectroscopy and Quant-iT PicoGreen (Invitrogen, Carlsbad, CA, USA) and adjusted to approximately 120 ng/μl in TE buffer.
Data production for map construction and MAF estimation
DNA concentrations were re-checked using Quant-iT PicoGreen (Invitrogen, Carlsbad, CA) and standardized to 80 ng/μl in TE buffer in preparation for the GoldenGate assay; 5 μl (400 ng) were used for each assay. Data were generated from each progeny line in the OWB, SxM and MxB doubled haploid populations using POPA1 and POPA2. Data were also produced using POPA3 from the complete OWB and MxB sets of DNA samples, but from only 92 SxM doubled haploids. Data from 95 HxO doubled haploids were also generated using BOPA1. For each of these four mapping populations, extensive integration of SNP data with other types of marker data will be described elsewhere (for example OWB marker integration in Szűcs et al. ). Data used for the determination of allele frequency (see below) came from 125 germplasm samples for POPA1, 195 germplasm samples for POPA2, and 189 germplasm samples for POPA3.
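The concentration arithmetic in this paragraph can be checked with a short sketch. The function names and the 10 μl stock volume are hypothetical; only the 120 ng/μl starting concentration, the 80 ng/μl target and the 5 μl input volume come from the text.

```python
def diluent_volume_ul(stock_ng_per_ul, stock_volume_ul, target_ng_per_ul):
    """Volume of TE buffer to add so the stock reaches the target concentration.

    Uses C1 * V1 = C2 * V2; dilution cannot raise a concentration.
    """
    if target_ng_per_ul <= 0 or target_ng_per_ul > stock_ng_per_ul:
        raise ValueError("target must be positive and not exceed the stock")
    final_volume_ul = stock_ng_per_ul * stock_volume_ul / target_ng_per_ul
    return final_volume_ul - stock_volume_ul


def dna_mass_ng(conc_ng_per_ul, volume_ul):
    """DNA mass delivered by a given volume at a given concentration."""
    return conc_ng_per_ul * volume_ul


# Hypothetical example: 10 ul of a 120 ng/ul stock diluted to 80 ng/ul
te_to_add = diluent_volume_ul(120.0, 10.0, 80.0)
# Assay input stated in the text: 5 ul at 80 ng/ul
assay_mass = dna_mass_ng(80.0, 5.0)
print(te_to_add, assay_mass)
```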
Raw data were transformed to genotype calls, initially using Illumina GenCall and subsequently using Illumina BeadStudio version 3 with the genotyping module. For each OPA, the data from all samples were visually inspected to manually set 1536 archetypal clustering patterns. The cluster positioning was guided by knowledge that heterozygotes are nearly non-existent in doubled haploids and rare in highly inbred parental genotypes and germplasm samples. Several "synthetic heterozygote" DNA samples were made by mixing parental DNAs in a 1:1 mass ratio (Figure 2A, green dots) and were included to anchor heterozygote cluster positions. This anchoring enabled the identification of true heterozygotes, which occur at a significant frequency in germplasm samples that have not been sufficiently inbred to reach a state of genome-wide allele fixation. The spatial positions of heterozygote and homozygote data clusters were confined to areas of high certainty, so that data points with less certainty fell outside the cluster boundaries and were scored as "no-call" (Figure 2A, one germplasm sample shown as a black dot). Polymorphisms with theta-compressed clusters were not used if the compression was such that any homozygote call was not plainly distinguishable (Figure 2B, set as GenTrain 0.000, 100% "no-call"). Vertically separated data clusters were not accepted as polymorphisms (Figure 2C). Following the production of one master workspace for each pilot OPA using all DNA samples, customized workspaces were produced for each mapping population to optimize the genotype calls via minor adjustments of cluster positions. Genotype calls were exported as spreadsheets from BeadStudio and then parsed to create input for mapping programs.
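The idea of confining calls to areas of high certainty, with everything else scored as "no-call", can be sketched as follows. This is not the BeadStudio algorithm: the cluster windows and the intensity cut-off are hypothetical values, and the polar transformation (theta as a scaled angle, R as the sum of the two channel intensities) is assumed here as it is commonly described for Illumina data. In practice the heterozygote window would be placed around the position of the synthetic heterozygote samples.

```python
import math


def normalized_theta(x, y):
    """Angle of the (x, y) intensity pair, scaled to the range 0..1."""
    return math.atan2(y, x) / (math.pi / 2)


# Hypothetical high-certainty windows on the theta axis
CLUSTER_WINDOWS = {"AA": (0.00, 0.15), "AB": (0.40, 0.60), "BB": (0.85, 1.00)}
MIN_R = 0.2  # hypothetical minimum total intensity


def call_genotype(x, y, windows=CLUSTER_WINDOWS, min_r=MIN_R):
    """Return 'AA', 'AB' or 'BB' inside a window, otherwise 'NC' (no-call)."""
    if x + y < min_r:  # signal too weak to trust
        return "NC"
    theta = normalized_theta(x, y)
    for genotype, (low, high) in windows.items():
        if low <= theta <= high:
            return genotype
    return "NC"  # between clusters: uncertain, so not called


# Toy intensity pairs (not real data)
calls = [
    call_genotype(1.00, 0.05),  # near the x axis
    call_genotype(0.50, 0.50),  # equal signal in both channels
    call_genotype(0.05, 1.00),  # near the y axis
    call_genotype(0.70, 0.30),  # falls between two windows
    call_genotype(0.05, 0.05),  # low intensity
]
print(calls)
```

Leaving gaps between the windows is what produces conservative calls: a point drifting between clusters is dropped rather than forced into the nearest genotype.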
Individual and consensus map production
Individual maps were made principally using MSTMap [29, 30] for each data set from the four doubled haploid mapping populations. In brief, MSTMap first identifies linkage groups, then determines marker order by finding the minimum spanning tree of a graph for each linkage group, then calculates distances between markers using recombination frequencies. JoinMap 4  was used to confirm linkage groups and marker order determined by MSTMap. Raw data for problematic markers were reviewed using BeadStudio. Either the marker was discarded entirely, if an ambiguity in data calling could not be resolved, or individual genotype calls were modified, if it was plainly evident that such adjustments were warranted. Each such review of primary data was followed by the production of new maps; this iterative process generally involved 10-20 cycles for each individual map. At several points, a consensus map was produced using MergeMap , which also flags problematic markers for review. MergeMap takes into account marker order from individual maps and calculates a consensus marker order. Briefly, the input to MergeMap is a set of directed acyclic graphs (DAGs)  from each individual map, and the output is a set of consensus DAGs (Figure 3, Figures S3-S9, Additional Files 6, 7, 8, 9, 10, 11, 12), each of which is consistent with all (or nearly all) of the markers in the individual input maps. MergeMap then linearizes each consensus DAG using a mean distance approximation. The consensus map coordinates from MergeMap were normalized to the arithmetic mean cM distance for each linkage group from the four individual maps (Figure S2, see Additional file 4 and Table S4, see Additional file 14).
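The minimum spanning tree idea behind marker ordering can be illustrated with a toy example. This is a simplified sketch and not the MSTMap implementation: it assumes error-free doubled haploid calls coded "A"/"B" with "-" for missing data, uses the raw recombination fraction as the edge weight, and only reports an order when the tree happens to be a simple path. Marker names and genotypes are invented.

```python
def recombination_fraction(a, b):
    """Fraction of lines whose calls differ between two markers ('-' skipped)."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.5  # no information: treat as unlinked
    return sum(x != y for x, y in pairs) / len(pairs)


def minimum_spanning_tree(names, weight):
    """Prim's algorithm on the complete marker graph; returns the tree edges."""
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        u, v = min(
            ((u, v) for u in in_tree for v in names if v not in in_tree),
            key=lambda e: weight[frozenset(e)],
        )
        edges.append((u, v))
        in_tree.add(v)
    return edges


def order_from_path(names, edges):
    """Walk the tree end to end if it is a simple path, otherwise return None."""
    adj = {n: [] for n in names}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    ends = [n for n in names if len(adj[n]) == 1]
    if len(ends) != 2 or any(len(adj[n]) > 2 for n in names):
        return None
    order, prev = [min(ends)], None  # start at one end, chosen deterministically
    while len(order) < len(names):
        current = order[-1]
        nxt = next(n for n in adj[current] if n != prev)
        prev = current
        order.append(nxt)
    return order


# Toy genotypes for ten doubled haploid lines (not real data)
genotypes = {
    "m3": "AAABBBBBBA",
    "m1": "AAAAABBBBB",
    "m4": "AABBBBBABA",
    "m2": "AAAABBBBBB",
}
names = list(genotypes)
weight = {
    frozenset((a, b)): recombination_fraction(genotypes[a], genotypes[b])
    for i, a in enumerate(names)
    for b in names[i + 1:]
}
marker_order = order_from_path(names, minimum_spanning_tree(names, weight))
print(marker_order)
```

With clean data, tightly linked markers are joined by the lightest edges, so the tree tends to follow the chromosome; noisy or missing calls break this property, which is one reason problematic markers had to be reviewed iteratively.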
Implementation of BOPA1 and BOPA2 in US barley breeding germplasm
As part of Barley CAP , the two BOPAs have been used in an effort to genotype a total of 3840 US barley breeding lines contributed by ten US barley breeding programs for association mapping analyses. As of January 2009, data from both BOPAs had been generated for 1920 breeding lines, with 960 submitted from the selections of each of two years, 2006 and 2007. Table S5 (Additional File 15) provides the MAF observed in these samples for each SNP in BOPA1 and BOPA2.
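For reference, a MAF value of the kind reported per SNP can be computed from genotype calls as in the sketch below. The call coding ("AA", "AB", "BB", with "NC" for no-call) and the function name are assumptions; the text does not state the exact procedure used to produce Table S5.

```python
from collections import Counter


def minor_allele_frequency(calls):
    """MAF from diploid genotype calls; no-calls ('NC') are ignored.

    Returns None when no usable call is available.
    """
    counts = Counter(c for c in calls if c in ("AA", "AB", "BB"))
    n_alleles = 2 * sum(counts.values())
    if n_alleles == 0:
        return None
    a_alleles = 2 * counts["AA"] + counts["AB"]
    b_alleles = n_alleles - a_alleles
    return min(a_alleles, b_alleles) / n_alleles


# Toy calls for one SNP across eleven lines (not real data)
toy_calls = ["AA"] * 8 + ["BB"] * 2 + ["NC"]
print(minor_allele_frequency(toy_calls))
```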