Plant material and DNA samples
The GPMx population was developed from a cross between a two-rowed barley (Golden Promise, ari-e.GP/Vrs1) and a six-rowed barley (Morex, Ari-e/vrs1) at the James Hutton Institute (JHI). DNA was extracted from one-week-old seedling tissue using the DNeasy Plant Mini kit (Qiagen). Three 48-plex GBS libraries were constructed from a set of 138 progeny from the F11 single-seed descent generation, along with replicated samples of each parent.
Plant growth and phenotyping
Ten seeds harvested from a single F9 generation plant of the GPMx RIL population were planted in soil in a polytunnel in spring 2009. Planting was randomized and plants were grown using automatic watering. Plant height was measured on mature plants prior to harvest. Plant height for each line was determined by selecting the 3–5 longest tillers and measuring the distance from the ground to the top spikelets (excluding awns). Bulked seeds harvested from 3–5 plants of each line of the F10 generation of the GPMx RIL population were planted in the field in spring 2010. Before planting, the TGW (Thousand Grain Weight) of each sample was determined and used to calculate the weight of seed to be planted in 1 × 2.5 m plots, so that each plot received about the same number of seeds. In total, 17 randomly selected lines and the parents were planted as randomized replicates (2–3×). Plant height was measured 3–4 weeks after anthesis following the same procedure as above. Lodged plants were lifted before measuring their height.
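The seed-weight calculation from TGW described above can be sketched as follows. This is a minimal illustration only: the target number of seeds per plot is not stated in the text, so `target_seeds` and the example values are hypothetical.

```python
def seed_weight_for_plot(tgw_g: float, target_seeds: int) -> float:
    """Return the weight of seed (g) that contains about `target_seeds` seeds.

    tgw_g        : thousand grain weight of the sample, in grams
    target_seeds : desired number of seeds per plot (hypothetical value,
                   not given in the text)
    """
    if tgw_g <= 0 or target_seeds < 0:
        raise ValueError("TGW must be positive and seed count non-negative")
    # TGW is the weight of 1000 grains, so one grain weighs tgw_g / 1000 g.
    return tgw_g * target_seeds / 1000.0


# Example with made-up numbers: a line with TGW = 50 g and 400 seeds per plot.
example_weight = seed_weight_for_plot(50.0, 400)
```

Lines with heavier grain receive proportionally more seed by weight, which keeps the seed count per plot roughly constant.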
Constructing GBS libraries
GBS libraries were constructed in a similar manner to Poland et al. Briefly:
A set of 48 barcoded adapters (Additional file 1: Table S1) were generated from complementary oligonucleotides (Sigma) with a PstI overhang sequence and unique barcodes of length 4 nt to 8 nt. In addition, a common Y-adapter was generated corresponding to the 5’ TA overhang generated by MseI. Top and bottom strand complementary oligonucleotides for each adapter (50 μM) were annealed using the following program: 95°C for 2 min, decrease to 25°C by 0.1°C/s, hold at 25°C for 30 min. Annealed adapters were diluted 1:10 and their concentration measured using PicoGreen. Barcoded adapters were normalised to 2 ng/μl and the common Y-adapter to 40 ng/μl.
DNAs were digested in 30 μl reactions containing 200 ng of genomic DNA, 1 × NEB buffer 4, 8 u PstI-HF, 8 u MseI, incubated at 37°C for 3 h, then 80°C for 20 min to inactivate the enzymes. For ligation, 4 ng annealed barcoded adapter and 200 ng annealed common Y-adapter were added along with 1 × T4 DNA ligase buffer and 200 u T4 ligase in a total volume of 50 μl. All 48 ligation reactions were incubated at 22°C for 2 h, then 65°C for 20 min.
An aliquot (5 μl) was removed from each ligation reaction, pooled, purified using QIAquick PCR Purification Kit (Qiagen) and eluted in 30 μl of dH2O. PCR amplification was conducted in 50 μl reactions containing 4 μl of pooled and purified library DNA, 1 × high fidelity Phusion polymerase buffer, 0.2 μM dNTP, 0.2 μM primer 1 (complementary to barcode adapter), 0.2 μM primer 2 (complementary to common Y-adapter), 1 u Phusion polymerase Taq. PCR was conducted as follows: 98°C for 30 s for one cycle; 20 cycles of 98°C for 10 s, 65°C for 20 s, 68°C for 20 s; one cycle of 75°C for 5 min, cool to 4°C. The PCR enriched library was gel-purified, selecting the 200–500 bp size fraction, using the MinElute Gel Extraction Kit (Qiagen), eluted in 12 μl dH2O, and quality and quantity of the library measured using a Nanodrop and Agilent Bioanalyzer.
Sequencing and processing raw GBS data
Single-end sequencing from the PstI sites was carried out using Illumina GAII and/or HiSeq2000 sequencers. Of the three GBS libraries (GPMx_1, GPMx_2 and GPMx_3), GPMx_1 was initially sequenced on two lanes of an Illumina GAII, and subsequently all three GBS libraries were sequenced on one lane each of an Illumina HiSeq2000. All GBS sequences were submitted to the Sequence Read Archive section of the European Nucleotide Archive (ENA) (submission: ERP002594 Genotyping by sequencing of a barley mapping population).
Generation of reference sequences
Reference sequences for the mapping of GBS tags were generated from existing genomic assemblies of the barley cultivars Morex, Bowman and Barke, which are based on Illumina whole genome shotgun sequencing. As a first step in the workflow (see Additional file 3: Figure S1 for a diagram of the full workflow), the EMBOSS program restrict (http://emboss.sourceforge.net/) was used to discover PstI restriction sites in the assemblies. Custom-written Java code was then used to extract from the Morex genomic assembly two separate flanking 64 bp sequences per site, extending from the restriction site in the forward and in the reverse direction. This process was repeated for the other two cultivar assemblies, and the extracted 64 bp sequences were then compared with the sequences generated from the Morex assembly using the standalone BLASTN program from NCBI (version 2.2.26+). A single hit was obtained per query, and from these we extracted the hits with alignments along the full length of the query sequence, an identity value of less than 100%, and at least 2 mismatches. These hits were added to the full set of Morex flanking sequences, thereby providing a global set of reference sequences from the three barley genome assemblies. To further refine the reference sequences, we screened them for chloroplast DNA, which is commonly present in whole genome shotgun sequence data. This was done with BLASTN, using the combined set of sequences as query against the full barley chloroplast genome sequence (http://www.ncbi.nlm.nih.gov/nuccore/118430366?report=fasta). Hits were filtered to require a sequence identity ≥ 90% and an alignment length ≥ 64. We detected 568 chloroplast DNA sequences, which were removed from the reference set.
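The extraction of flanking sequences and the BLASTN hit filter described above can be sketched in Python. This is not the authors' Java code. It assumes that each 64 bp tag starts at the TGCAG of the PstI site (matching the start of the processed reads), and the function names and the example contig are hypothetical.

```python
import re

PSTI_SITE = "CTGCAG"   # PstI recognition sequence
TAG_LEN = 64           # length of each flanking reference sequence
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def flanking_tags(contig: str, tag_len: int = TAG_LEN):
    """Return (site_position, strand, tag) for every PstI site in `contig`.

    Assumption: a tag begins with the TGCAG of the site and extends away
    from it, so forward and reverse tags both start with TGCAG.
    Tags that would run off the end of the contig are skipped.
    """
    contig = contig.upper()
    tags = []
    for match in re.finditer(PSTI_SITE, contig):
        pos = match.start()
        # Forward tag: starts at the T of CTGCAG.
        fwd = contig[pos + 1:pos + 1 + tag_len]
        if len(fwd) == tag_len:
            tags.append((pos, "+", fwd))
        # Reverse tag: ends at the A of CTGCAG, then reverse-complemented.
        rev_end = pos + 5
        rev_start = rev_end - tag_len
        if rev_start >= 0:
            tags.append((pos, "-", reverse_complement(contig[rev_start:rev_end])))
    return tags


def is_variant_hit(align_len: int, pct_identity: float, mismatches: int,
                   query_len: int = TAG_LEN) -> bool:
    """Filter for Bowman/Barke tags: full-length, <100% identity, >=2 mismatches."""
    return align_len == query_len and pct_identity < 100.0 and mismatches >= 2


# Hypothetical contig with one PstI site at position 70.
example_contig = "A" * 70 + "CTGCAG" + "C" * 70
example_tags = flanking_tags(example_contig)
```

Because CTGCAG is its own reverse complement, both tags from one site start with TGCAG, which is what a read sequenced from either side of the cut site looks like after barcode removal.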
Prior to mapping, the raw Illumina reads were assigned to their respective samples (‘deconvoluted’) based on the sample-specific barcodes included in the sequence. Because barcode lengths varied between 4 and 8 bases, custom-written Java code was used for deconvolution. This code also removed the barcode after assigning each read to a sample, which is required for successful mapping of the read to a reference sequence. Reads that started with the PstI overhang sequence (TGCAG) after barcode removal were accepted, quality trimmed to remove bases with Phred quality < 20 from the 3’-end (distal to the PstI site), and then shortened from the 3’-end to a standard length of 64 bases. Reads that were shorter than this after quality trimming were discarded.
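A minimal Python sketch of the deconvolution and trimming steps above is given below. It is an illustration under stated assumptions, not the Java code used in the study: barcodes are matched exactly and together with the TGCAG overhang, qualities are supplied as a list of Phred integers, and the barcode-to-sample table is hypothetical.

```python
OVERHANG = "TGCAG"   # PstI overhang expected directly after the barcode
READ_LEN = 64        # standard read length after processing
MIN_PHRED = 20       # 3'-end bases below this quality are trimmed


def assign_barcode(seq, barcodes):
    """Return (sample, barcode) if `seq` starts with barcode + overhang, else None.

    Longer barcodes are tried first so that a short barcode which is a
    prefix of a longer one cannot claim the read by accident.
    """
    for barcode in sorted(barcodes, key=len, reverse=True):
        if seq.startswith(barcode + OVERHANG):
            return barcodes[barcode], barcode
    return None


def process_read(seq, quals, barcodes):
    """Deconvolute, strip the barcode, quality-trim and shorten one read.

    Returns (sample, 64-base sequence), or None if the read is rejected.
    """
    hit = assign_barcode(seq, barcodes)
    if hit is None:
        return None
    sample, barcode = hit
    seq, quals = seq[len(barcode):], quals[len(barcode):]
    # Trim low-quality bases from the 3'-end only.
    end = len(seq)
    while end > 0 and quals[end - 1] < MIN_PHRED:
        end -= 1
    if end < READ_LEN:
        return None  # too short after trimming
    return sample, seq[:READ_LEN]


# Hypothetical barcode table and read.
example_barcodes = {"ACGT": "sample_1", "ACGTTG": "sample_2"}
example_seq = "ACGT" + "TGCAG" + "A" * 70
example_quals = [30] * (len(example_seq) - 5) + [10] * 5
example_result = process_read(example_seq, example_quals, example_barcodes)
```

Matching the barcode and the overhang as one prefix is a design choice of this sketch; it resolves most ambiguity between barcodes of different lengths without a separate lookup step.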
Reads were then mapped to the 64 bp reference sequences using the Bowtie mapping tool (version 0.12.7). To avoid cross-mapping of reads between similar sequences, the “--best --strata” switches were used, which ensure that multi-mapped reads are only mapped to the location with the fewest mismatches. To reduce the number of false positive SNPs during downstream analysis, only a single mismatch per read was allowed (“-v 1”), and only uniquely mapped reads were retained (“-m 1”).
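The Bowtie options named above can be assembled into a command line as in the following sketch. The index prefix and file names are hypothetical, and the `-S` switch (SAM output) is an assumption, since the output format is not stated in the text.

```python
def bowtie_command(index_prefix, reads_fastq, out_sam):
    """Build a Bowtie 1 command using the switches described in the text."""
    return [
        "bowtie",
        "-v", "1",      # allow at most one mismatch per read
        "-m", "1",      # keep only reads with a single reportable alignment
        "--best",       # report the best alignment first
        "--strata",     # consider only alignments in the best mismatch stratum
        "-S",           # assumption: write SAM output (not stated in the text)
        index_prefix,   # hypothetical index built from the 64 bp references
        reads_fastq,    # deconvoluted, trimmed reads for one sample
        out_sam,
    ]


example_cmd = bowtie_command("gbs_refs", "sample_1.fastq", "sample_1.sam")
# To run it (requires Bowtie on the PATH):
#   import subprocess
#   subprocess.run(example_cmd, check=True)
```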
SNP discovery and genotype calling
We used the FreeBayes software to discover single nucleotide polymorphisms (SNPs), and custom Java code to convert the resulting VCF file into a human-readable text file. Within FreeBayes, the SNPs were filtered to retain those where the minimum number of reads with the alternative allele was greater than 3, which provided a total of 57,328 SNPs. We then applied the following filters: the minimum fraction of reads with the alternative allele for a SNP should be greater than or equal to 0.1; the percentage difference between the base qualities for the reference and alternative alleles should be less than or equal to 5; and the SNP quality score should be greater than or equal to 20. This procedure yielded 18,251 SNPs. Then, within Excel, further filters were applied: we required a total read coverage of greater than or equal to 700 (i.e. a mean of at least 5 reads for each sample in the population), which left 3,246 SNPs; a percentage of heterozygous samples of less than or equal to 2%, which left 1,985 SNPs; and a ratio of alternative allele/reference allele of greater than or equal to 0.5, which left 1,968 SNPs.
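The post-FreeBayes filter cascade above can be expressed as a sequence of threshold checks. The sketch below assumes the per-SNP statistics have already been computed. The field names are hypothetical, and how each statistic was derived (in particular the base-quality difference and the allele ratio) is not specified in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SnpStats:
    """Per-SNP summary statistics (hypothetical field names)."""
    alt_fraction: float        # fraction of reads carrying the alternative allele
    base_qual_diff_pct: float  # % difference in base quality, ref vs alt
    quality: float             # SNP quality score
    total_depth: int           # total read coverage across all samples
    het_pct: float             # % of samples called heterozygous
    alt_ref_ratio: float       # ratio of alternative allele / reference allele


def first_failed_filter(s: SnpStats) -> Optional[str]:
    """Return the name of the first filter the SNP fails, or None if it passes."""
    checks = [
        ("alt_fraction >= 0.1", s.alt_fraction >= 0.1),
        ("base_qual_diff_pct <= 5", s.base_qual_diff_pct <= 5),
        ("quality >= 20", s.quality >= 20),
        ("total_depth >= 700", s.total_depth >= 700),
        ("het_pct <= 2", s.het_pct <= 2),
        ("alt_ref_ratio >= 0.5", s.alt_ref_ratio >= 0.5),
    ]
    for name, passed in checks:
        if not passed:
            return name
    return None


# Made-up examples: one SNP that passes and one with too little coverage.
good_snp = SnpStats(0.45, 1.2, 250.0, 1500, 0.7, 0.9)
shallow_snp = SnpStats(0.45, 1.2, 250.0, 420, 0.7, 0.9)
```

Returning the name of the first failed filter, rather than a plain boolean, makes it easy to tabulate how many SNPs are lost at each step, as reported in the text.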
Genotypes were then called based on the proportion of reads carrying the reference allele. A sample was called homozygous for the reference allele if the proportion was greater than 0.8, homozygous for the alternative allele if the proportion was less than 0.2, and heterozygous if the proportion was between 0.2 and 0.8. Samples with fewer than three reads if designated homozygous, or with fewer than six reads if designated heterozygous, were recoded as missing. Nineteen SNPs had a missing genotype for one of the parents, and these were also excluded to leave 1,949 SNPs for linkage mapping. Visual inspection of both mappings and SNPs was carried out using the Tablet software.
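The genotype-calling rule above translates directly into code. The following sketch assumes that the proportion is computed from reference and alternative read counts only, and that the boundary values 0.2 and 0.8 are treated as heterozygous; the genotype labels are hypothetical.

```python
def call_genotype(ref_reads: int, alt_reads: int) -> str:
    """Call a genotype from read counts: 'ref', 'alt', 'het' or 'missing'."""
    total = ref_reads + alt_reads
    if total == 0:
        return "missing"
    ref_proportion = ref_reads / total
    if ref_proportion > 0.8:
        call, min_reads = "ref", 3
    elif ref_proportion < 0.2:
        call, min_reads = "alt", 3
    else:
        # Assumption: exactly 0.2 and 0.8 count as heterozygous.
        call, min_reads = "het", 6
    # Homozygous calls need at least 3 reads, heterozygous calls at least 6.
    return call if total >= min_reads else "missing"


example_calls = [call_genotype(r, a) for r, a in [(10, 0), (0, 5), (5, 5), (2, 0), (2, 3)]]
```

The higher read requirement for heterozygous calls reflects that both alleles must be sampled with enough depth before a mixed call is credible.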
The SNP data were sorted by decreasing quality score before analysis with JoinMap. This ensured that when co-segregating SNPs were excluded, the lower quality SNPs were preferentially dropped. SNPs with greater than 20% missing values were also excluded from the JoinMap analysis. SNPs were grouped using the independence LOD score, and then ordered within each linkage group using the maximum likelihood algorithm. The GBS tags were mapped to reference sequences generated from the Morex, Bowman and Barke whole genome shotgun assemblies. Those from Morex contain previously published anchored genetic/physical markers, which we assumed to be correct. We define these as anchoring markers on the genetic linkage groups. Additional file 4: Table S3 provides a list of 1,332 unique co-dominant GBS markers used for map construction, ordered according to their map location in the GPMx population. It highlights 403 genetically redundant markers, the correspondence of all GBS tags to expressed genes (MLOCs) and their genetic position on the IBSC consensus map (IBSC, 2012).
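The pre-processing applied before JoinMap (removing SNPs with more than 20% missing values and sorting by decreasing quality) can be sketched as below. The record layout and the missing-value code "-" are hypothetical; JoinMap itself is not called.

```python
MISSING = "-"  # hypothetical code for a missing genotype


def missing_fraction(genotypes):
    """Fraction of samples with a missing genotype."""
    return sum(g == MISSING for g in genotypes) / len(genotypes)


def prepare_for_mapping(snps, max_missing=0.20):
    """Drop SNPs with > max_missing missing calls, then sort by quality (high first).

    `snps` is a list of dicts with keys 'name', 'quality' and 'genotypes'.
    """
    kept = [s for s in snps if missing_fraction(s["genotypes"]) <= max_missing]
    return sorted(kept, key=lambda s: s["quality"], reverse=True)


# Made-up example with five samples per SNP.
example_snps = [
    {"name": "snp_a", "quality": 40.0, "genotypes": ["a", "b", "a", "a", "b"]},
    {"name": "snp_b", "quality": 90.0, "genotypes": ["a", "-", "a", "b", "b"]},
    {"name": "snp_c", "quality": 99.0, "genotypes": ["-", "-", "a", "b", "b"]},
]
example_order = [s["name"] for s in prepare_for_mapping(example_snps)]
```

With the SNPs in this order, a tool that keeps the first of several co-segregating markers will retain the one with the highest quality score.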
QTL interval mapping was used to locate QTLs for the 2009 and 2010 height data separately, using MapQTL. A permutation test with 1,000 permutations was used to establish the LOD threshold. Restricted multiple QTL mapping (rMQM mapping) was used to search for further QTLs, taking into account the most significant ones. A regression analysis, using Genstat 15 for Windows, was used to test for significant interactions among the selected QTLs. Genstat was also used to test which of the mapped SNPs showed the greatest association with the two-rowed/six-rowed type, using chi-square tests of independence.
Cross-referencing barley genome data sets
In total, 4,607 individual sequences from the manifest files accompanying barley OPA SNP mapping platforms were used to identify corresponding sequences in the barley genome, represented by ~2.6 million sequence contigs, using the BLASTN algorithm. The resulting table cross-referenced SNP markers used to define introgressions in the Bowman backcross-derived lines and sequence contigs in the barley genome assembly. Tables containing the contig anchoring results from POPSEQ are available for download from ftp://ftp.ipk-gatersleben.de/barley-popseq/.