SNP and haplotype diversity in target genes
Molecular analyses of 12 genes from seven loci known to be involved in the genetic determination of growth habit and inflorescence type - the two major traits causing strong population structuring - from a unique set of 102 barley accessions revealed high sequence diversity and complex haplotypes. For VRN-H1, VRN-H3, and VRS1 - the three genes where we had sequence information from the full set of 102 genotypes - we found 10, 39, and nine SNP sites, which resulted in eight, seven, and seven haplotypes, respectively. Analysis of a subset of the core set - 26 spring, two facultative, and two winter growth habit accessions - also revealed high nucleotide diversity at six loci: VRN-H2 (10 SNPs and four haplotypes of HvSNF2), PPD-H1 (20 SNPs and six haplotypes of HvPRR7), PPD-H2 (seven SNPs and three haplotypes of HvFT3), and FR-H2 (19 SNPs and five haplotypes for HvCBF3, 17 SNPs and six haplotypes for HvCBF6, and eight SNPs and six haplotypes for HvCBF9). Identification of these haplotypes in marker-assisted breeding programs will require the use of multiple SNPs per gene, or alternative assays.
Multi-locus haplotypes define growth habit classes
Even more genetic diversity was found when we considered multi-locus haplotypes of loci involved in the regulation of growth habit. Functional polymorphisms in, and epistatic interactions between, the three VRN-H and two PPD-H loci have been reported [12, 14, 23, 27]. Based on the proposed functional polymorphisms, we determined multi-locus haplotypes of the three VRN-H and two PPD-H loci and found 16 of the 32 possible allele combinations (Additional File 1). Thirty-two of the 85 spring growth habit cultivars had "spring" alleles at all five loci, while 41 spring growth habit genotypes had the recessive (winter) vrn-h3 (HvFT1) allele and "spring" alleles at the other four loci. The abundance of the winter vrn-h3 allele in cultivated spring growth habit germplasm was unexpected, since allelic variants at the VRN-H3 locus were previously reported only in exotic barley genotypes. Seven of the 11 winter growth habit genotypes had "winter" alleles at all five loci. In total, vernalization-insensitive (spring and facultative growth habit) genotypes had 13 multi-locus haplotypes and vernalization-sensitive (winter growth habit) genotypes had three. Our results show that a large number of single-locus and multi-locus haplotypes are involved in determining flowering time and thus growth habit.
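The count of 32 possible combinations follows from five loci with two allele classes each (2^5 = 32). The following is a minimal sketch of how multi-locus haplotypes could be enumerated and tallied. The two-class "spring"/"winter" coding, the function names, and the toy accessions are illustrative assumptions, not the data or scripts used in this study.

```python
from collections import Counter
from itertools import product

# The five loci named in the text. Coding every locus as "spring" or
# "winter" is a simplifying assumption made for this illustration.
LOCI = ("VRN-H1", "VRN-H2", "VRN-H3", "PPD-H1", "PPD-H2")
ALLELE_CLASSES = ("spring", "winter")


def possible_haplotypes():
    """Return every allele-class combination across the five loci."""
    return list(product(ALLELE_CLASSES, repeat=len(LOCI)))


def make_calls(winter_loci=()):
    """Build hypothetical allele calls: 'winter' at the given loci, else 'spring'."""
    return {locus: ("winter" if locus in winter_loci else "spring") for locus in LOCI}


def count_haplotypes(accessions):
    """Count how many accessions carry each observed multi-locus haplotype."""
    return Counter(tuple(calls[locus] for locus in LOCI) for calls in accessions)


# Made-up accessions, for illustration only.
toy_accessions = [
    make_calls(),                         # "spring" alleles at all five loci
    make_calls(),                         # a second all-"spring" accession
    make_calls(winter_loci=("VRN-H3",)),  # winter vrn-h3, "spring" elsewhere
    make_calls(winter_loci=LOCI),         # "winter" alleles at all five loci
]

observed = count_haplotypes(toy_accessions)
print(f"{len(observed)} of {len(possible_haplotypes())} possible haplotypes observed")
```

Applied to real allele calls, the same tally would show which of the 32 combinations are present in a germplasm set and how many accessions carry each one.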
Effect of population structure and linkage disequilibrium on association analysis
The CAP Core is a limited sample of heterogeneous germplasm. The array comprises accessions representing a range of growth habit, row type, usage and origin. However, this set is representative of the array of lines usually used in barley breeding programs of North America. The distribution and the number of lines in each group defined by the PCA follow the same pattern in the CAP I, II and III sets and in the CAP Core (Figure 2). We used row type and growth habit (as measured by flowering time of non-vernalized plants under long photoperiod) as models for genome-wide association analysis for several reasons. These phenotypes exemplify two different genetic scenarios. In the case of row type, there are multiple SNPs in the VRS1 gene that lead to the loss-of-function phenotype. The genes determining vernalization sensitivity interact epistatically and the functional basis is known. The two phenotypes can be scored unequivocally. Finally, growth habit and row type are the two traits considered world-wide to define germplasm structure in barley. Even after accounting for structure in the CAP Core, we were able to identify markers associated with the two traits because there was some degree of admixture.
The limited admixture between the three subgroups representing combinations of growth habit and row type (spring two-rowed, spring six-rowed, and winter six-rowed) is likely due to the tendency of geographically dispersed breeding programs to limit germplasm exchange. Accessions from the same structure group will have a more similar history of selection for genomic regions that control row type and growth habit than accessions from other groups. Within each structure group, accessions from any given breeding program will also tend to be more similar. As a consequence, the extent of LD is greatest in germplasm of the same growth habit/row type group. However, the extent of LD within each group was reduced by the fact that accessions within each group came from different programs. The reduced estimated extent of LD in the winter six-rowed group may also be due to the small sample size (33 individuals). Barley, a self-pollinated species, is reported to have a sufficiently high level of LD that, with sufficient marker coverage, association mapping should be effective [2, 11, 40]. In the case of the CAP Core, we found an average extent of LD of 0.7 cM. Since the average marker density is 0.6 cM and we have markers (SNPs and/or resequencing-based assays) in the candidate genes responsible for the target traits, we would expect genome-wide association mapping to be effective.
However, the estimated extent of LD in any germplasm array must be interpreted cautiously, since the estimated values are averaged within each chromosome. There are individual cases of high r² values between distant loci (up to 100 cM apart on the same chromosome and even on different chromosomes) and also other cases where LD decreases abruptly within a few hundred bp (Figure 1).
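For reference, the r² statistic between two biallelic loci is commonly defined as D² / (pA(1 - pA) pB(1 - pB)), where D = pAB - pA pB. The sketch below computes it under the assumption that lines are homozygous (reasonable for inbred barley accessions) and that alleles are coded 0/1. The function name and toy genotypes are hypothetical, and this is not necessarily how the LD estimates reported here were produced.

```python
def r_squared(locus_a, locus_b):
    """Pairwise LD (r^2) between two biallelic loci scored 0/1 in homozygous lines.

    r^2 = D^2 / (pA * (1 - pA) * pB * (1 - pB)), with D = pAB - pA * pB.
    """
    n = len(locus_a)
    if n == 0 or n != len(locus_b):
        raise ValueError("both loci must be scored on the same non-empty set of lines")
    p_a = sum(locus_a) / n                                   # frequency of allele 1 at locus A
    p_b = sum(locus_b) / n                                   # frequency of allele 1 at locus B
    p_ab = sum(a * b for a, b in zip(locus_a, locus_b)) / n  # frequency of the 1-1 haplotype
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("r^2 is undefined when either locus is monomorphic")
    d = p_ab - p_a * p_b
    return d * d / denominator


# Toy genotypes for four lines (illustrative only).
marker_1 = [0, 0, 1, 1]
marker_2 = [0, 0, 1, 1]  # identical to marker_1: complete LD
marker_3 = [0, 1, 0, 1]  # statistically independent of marker_1

print(r_squared(marker_1, marker_2))
print(r_squared(marker_1, marker_3))
```

Averaging such pairwise values by genetic distance gives a chromosome-wide decay estimate, which is why individual marker pairs can deviate strongly from the reported average.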
Genome-wide single marker association analysis
In the CAP Core we found significant associations for SNPs located in the regions of the genome where the target genes are located (Figure 3A and 3C). However, we also found other significant associations at alpha = 0.05, after multi-test adjustment, for 140 row type and seven vernalization sensitivity markers in other regions. Therefore it would be difficult to identify VRS1 and the VRN-H genes based on these results. Without a priori information regarding the locations of the target genes, it would be very difficult to differentiate the true associations from the spurious ones in the CAP Core.
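As an illustration of what a multiple-testing adjustment at alpha = 0.05 involves, the sketch below applies a Bonferroni correction to single-marker p-values. The specific adjustment procedure is not stated in this section, so Bonferroni is an assumption chosen only for simplicity; the marker names and p-values are made up.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Return markers whose p-value passes a Bonferroni-adjusted threshold.

    p_values maps marker name -> unadjusted p-value from a single-marker test.
    The per-test threshold is alpha divided by the number of tests.
    """
    if not p_values:
        return {}
    threshold = alpha / len(p_values)
    return {marker: p for marker, p in p_values.items() if p <= threshold}


# Hypothetical markers and p-values, for illustration only.
toy_p_values = {
    "marker_a": 1e-6,
    "marker_b": 0.02,
    "marker_c": 0.2,
    "marker_d": 0.004,
}

significant = bonferroni_significant(toy_p_values)
print(sorted(significant))
```

Even after such an adjustment, markers can remain significant because of population structure or long-range LD rather than proximity to a causal gene, which is the situation described above.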
Including "synthetic" allele type markers for VRS1 and VRN-H2, we obtained more highly significant associations than with any SNP marker. This is especially relevant in the case of VRS1, where some SNPs were located within the candidate genes (Figure 3A and 3C). In the case of VRN-H1, although some SNP markers from the determinant genes were also included, two other BOPA markers showed p-values comparable to the allele type marker (Figure 3C). However, these marker loci mapped 10 and 20 cM from VRN-H1. Even though these BOPA markers are in LD with the functional polymorphism, they would not identify VRN-H1 without prior knowledge of this gene's location and function.
Analysis of interactions
In addition to the significant interaction between the two markers for the functional polymorphisms in VRN-H1 and VRN-H2, we found a highly significant interaction for VRN-H2 with two markers located on the short arm of chromosome 2H (Table 2). No genes or QTLs for winter vs. spring growth habit are reported in this region, but one allele is predominant at both loci in the vernalization-sensitive accessions in the CAP Core (all 11 accessions), and OSU CAP I, II and III (96 out of 116 lines). The fact that the same allele was also present in a considerable number of vernalization-insensitive (facultative) lines of the OSU CAP I, II and III (93 out of 129) suggests that this region could play a role in any of the multiple aspects of winter growth habit (e.g. low temperature tolerance or photoperiod sensitivity), since these are target traits in the OSU breeding program. A member of the CBF gene family (HvCBF8) cosegregates with this locus in the consensus map. Other members of the CBF gene family are implicated in low temperature tolerance [28, 41, 42], but no direct effect of HvCBF8 on cold tolerance has been reported.
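The counts given above can be expressed as frequencies of the predominant 2H allele in each group. The short sketch below only restates that arithmetic; the group labels and function name are ours, and no statistical test is implied.

```python
# (lines carrying the predominant allele, total lines) as stated in the text.
counts = {
    "CAP Core, vernalization-sensitive": (11, 11),
    "OSU CAP I-III, vernalization-sensitive": (96, 116),
    "OSU CAP I-III, vernalization-insensitive (facultative)": (93, 129),
}


def allele_frequency(carriers, total):
    """Fraction of lines in a group that carry the predominant allele."""
    if total <= 0:
        raise ValueError("total must be positive")
    return carriers / total


frequencies = {group: allele_frequency(*pair) for group, pair in counts.items()}

for group, frequency in frequencies.items():
    print(f"{group}: {frequency:.1%}")
```

The allele is therefore common in both the vernalization-sensitive and the facultative OSU lines, consistent with selection on a trait other than vernalization sensitivity itself.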
Regarding row type, we found a perfect association for the interaction of the allele type marker VRS1_AT with four other markers (one each on chromosomes 1H, 2H, 4H and 5H) (Table 3). All four markers are located in regions where other loci related to row type have been described. These include VRS3 on the long arm of chromosome 1H, and VRS2 and int-b on the short arm of 5H. The marker in 2H co-locates with VRS1 in the consensus map. The marker in 4H maps to the reported position of INT-C [39, 47]. Alleles at these four marker loci follow a consistent pattern in the parents of the OWB population (OWB-Dominant (six-rowed) and OWB-Recessive (two-rowed)). However, there is no correlation between alleles at these four regions and the row type in the DH progeny after one cycle of recombination (even marker 1_0213 on 2H is monomorphic in this population segregating for row type). Therefore, genes in these regions are not causal for row type, but have reached a high degree of fixation in two-rowed vs. six-rowed germplasm. INT-C is an excellent example, where a specific dominant allele is necessary for the commercial six-rowed phenotype, but dominant or recessive alleles are found in cultivated two-rowed germplasm.
Results of AM from larger populations
Using 2,575 lines, we found significant associations in the same regions on chromosomes 2H and 1H that we detected using the CAP Core. Regarding 2H, these results confirm that the extent of LD in barley, even in highly structured germplasm, is sufficient to detect associations with markers in LD with a gene, such as VRS1, where different loss-of-function mutations lead to the same phenotype. The most significant markers are located in a 5 cM interval around VRS1 and one of them (3_0900) is located within the coding region of the gene, although it is not the marker with the highest significance (Figure 3B). On chromosome 1H we found significant markers in the same region we identified using interaction analysis and the CAP Core data. These associations may correspond to VRS3 or to a gene, or genes, that interact with the transcription factor encoded by VRS1. VRS3 has not been characterized phenotypically or genetically in the CAP germplasm and none of the annotations for significant EST-based SNPs correspond to obvious candidate genes (HarvEST: http://www.harvest-web.org/). The CAP Core interaction analysis and the analysis of the full set both identified associations on chromosome 5H in the vicinity of INT-B. On chromosomes 3H, 6H and 7H, significant associations were found in both analyses. There are no obvious candidate genes for these associations. Therefore, with the exception of INT-C (chromosome 4H), the two approaches identified the same regions.
In the case of vernalization sensitivity, using 247 accessions we found significant associations on chromosomes 4H and 5H in the same regions identified using interaction analysis based on the 102 accessions in the CAP Core (Figure 3B and 3D). These associations correspond to the locations of VRN-H2 and VRN-H1, respectively. As described previously, the most significant associations may not always correspond to candidate genes. There is no obvious candidate for the 2H associations, which differed in position between the two analyses.