- Research article
- Open Access
Genome-wide association study for flowering time, maturity dates and plant height in early maturing soybean (Glycine max) germplasm
BMC Genomics volume 16, Article number: 217 (2015)
Soybean (Glycine max) is a photoperiod-sensitive and self-pollinated species. Days to flowering (DTF) and maturity (DTM), duration of flowering-to-maturity (DFTM) and plant height (PH) are crucial for soybean adaptability and yield. To dissect the genetic architecture of these agronomically important traits, a population consisting of 309 early maturity soybean germplasm accessions was genotyped with the Illumina Infinium SoySNP50K BeadChip and phenotyped in multiple environments. A genome-wide association study (GWAS) was conducted using a mixed linear model that involves both relative kinship and population structure.
The linkage disequilibrium (LD) decayed slowly in soybean, and a substantial difference in LD pattern was observed between euchromatic and heterochromatic regions. A total of 27, 6, 18 and 27 loci for DTF, DTM, DFTM and PH were detected via GWAS, respectively. The Dt1 gene was identified in the locus strongly associated with both DTM and PH. Ten candidate genes homologous to Arabidopsis flowering genes were identified near the peak single nucleotide polymorphisms (SNPs) associated with DTF. Four of them encode MADS-domain containing proteins. Additionally, a pectin lyase-like gene was also identified in a major-effect locus for PH where LD decayed rapidly.
This study identified multiple new loci and refined chromosomal regions of known loci associated with DTF, DTM, DFTM and/or PH in soybean. It demonstrates that GWAS is powerful in dissecting complex traits and identifying candidate genes although LD decayed slowly in soybean. The loci and trait-associated SNPs identified in this study can be used for soybean genetic improvement, especially the major-effect loci associated with PH could be used to improve soybean yield potential. The candidate genes may serve as promising targets for studies of molecular mechanisms underlying the related traits in soybean.
Flowering, maturity and plant height in plants are complex traits controlled by internal and external factors. They considerably impact the adaptability, biomass and economic yield in agricultural crops. The genetic signaling pathways of flowering have been well characterized in Arabidopsis, a model organism in plant biology. The floral integrator genes FLOWERING LOCUS T (FT) and SUPPRESSOR OF OVEREXPRESSION OF CONSTANS 1 (SOC1) play a central role in flowering regulation. They promote the expression of a group of floral meristem identity genes such as APETALA 1 (AP1), LEAFY (LFY) and CAULIFLOWER (CAL) to initiate the floral transition of the plant . In the floral pathway, upstream of FT and SOC1 are FLOWERING LOCUS C (FLC) and CONSTANS (CO), which are two key regulators of floral integrators but in different manners. In Arabidopsis, a high level of FLC represses expression of both FT and SOC1, and it can be released by vernalization treatment or autonomous development process . CO is circadian-regulated and involved in the photoperiod pathway through promoting expression of FT and SOC1 [2,3]. Additionally, more genes involved in the inductive photoperiod pathway, the vernalization pathway, the autonomous pathway and the gibberellins pathway have been characterized in Arabidopsis [1,4,5].
Soybean (Glycine max) is a major crop of agronomic importance grown across a wide range of latitudes from 50°N to 35°S . However, every soybean cultivar adapts to a limited latitudinal region because of photoperiod sensitivity. Days to flowering (DTF) and maturity (DTM) and plant height (PH) are important traits related to soybean adaptability and productivity . DTM represents the entire duration of growth and development, consisting of two periods: DTF and duration of flowering-to-maturity (DFTM). All the four traits are quantitatively inherited in soybean. Previous studies identified nine major-effect loci affecting flowering and maturity in soybean, which have been designated as E1 to E8, and the J locus for “long juvenile period” . Of these genes, E1, E2, E3 and E4 have been map-based cloned and functionally characterized. E1 encodes a nuclear-localized B3 domain-containing protein, which is induced by long day conditions and is inversely related to both GmFT2a and GmFT5a expression , two FT orthologs promoting early flowering in soybean . E2 encodes a homolog of GIGANTEA, which regulates expression of CO and FT in Arabidopsis, and controls soybean flowering through regulating GmFT2a but not GmFT5a . E3 and E4 encode phytochrome A (PHYA) proteins GmPHYA3 and GmPHYA2, respectively [11,12]. Loss-of-function alleles of E1, E3 or E4 lead to photoperiod insensitivity and promote early flowering under long day conditions [8,11,12], which are important for soybean plants, a typical short-day crop, to adapt to high-latitude environments. In addition to these major loci, many minor-effect quantitative traits loci (QTLs) related to soybean flowering and maturity have also been identified (SoyBase, www.soybase.org). Recently, comparative genomic analyses revealed that there are a large number of soybean orthologs of Arabidopsis flowering genes [6,13], suggesting a complex genetic basis of flowering and maturity in soybean.
Overlaps between PH loci and maturity loci have been observed in soybean [14,15], indicating that PH and maturity might share a similar genetic basis to some extent. Previous research showed that stem termination affected both stem elongation and maturity in soybean, and the gene determinate stem 1 (Dt1) plays a primary role in soybean determination . Determinate soybean plants cease stem elongation through initiating floral transition of the shoot apical meristem (SAM) soon after photoperiod-induced floral transition. In Arabidopsis, stem elongation is regulated through cell wall component modification, phytohormone- and light-regulated development [17-19]. To date, at least 180 QTLs associated with PH have been reported across all the 20 chromosomes in soybean (SoyBase, www.soybase.org). However, limited knowledge of genes conditioning soybean stem elongation is available.
Genome-wide association study (GWAS) using high-density markers and a population of non-cross-derived lines provides higher mapping resolution than conventional QTL mapping based on cross-derived segregating populations, and enables one to predict or identify causal genes. GWAS has been widely used to dissect complex traits in some major crops, e.g., maize and rice [20-23]. However, there are very few reports of GWAS with high-density single nucleotide polymorphism (SNP) in soybean [24-26]. Thus the application of GWAS in soybean, a highly self-pollinated crop with complex genome structure, remains to be explored, especially for agronomic traits.
To better understand the genetic architecture of DTF, DTM, DFTM and PH in soybean, we conducted a GWAS for these traits in a population consisting of 309 plant introductions (PIs) with 31,045 SNPs. Many new loci and previously reported loci were identified for each trait. Candidate genes with known function or Arabidopsis orthologs were also proposed. This study enriches our knowledge of the genetic basis underlying DTF, DTM, DFTM and PH in soybean and provides valuable markers for molecular breeding of soybean.
Plant materials and field trials
Three hundred and nine accessions, obtained from the USDA Soybean Germplasm Collection, were planted in a randomized complete block design with three replications on the Agricultural Research Farms of South Dakota State University at three locations: Aurora (2011AU), Brookings (2012BK) and Watertown (2012WT), SD. According to the Germplasm Resources Information Network (GRIN, http://www.ars-grin.gov/), most of the PIs originate from China, and 90% are maturity group (MG) 0 and the rest are MG 00 (Additional file 1). They mainly adapt to the upper Midwest in the United States and the southern region in Canada.
Phenotypic evaluation and statistical analysis
DTF and DTM were recorded in the field as the number of days from planting to the date when 50% of the plants in a plot had showed the first flower and when 95% of the pods had ripened as indicated by mature pod color, respectively. DFTM was calculated as the difference between DTM and DTF (or days from flowering to maturity). PH was the average of four measurements per plot, and each measurement was recorded as the length of main stem from the ground to the top extremity of the plant at maturity. The model for the phenotypic trait was y ijk = μ + g i + l j + (gl) ij + b k(j) + e ijk , where μ is the total mean, g i is the genetic effect of the i th genotype, l j is the effect of the j th environment, (gl) ij is the interaction effect between the i th genotype and the j th environment, b k(j) is the block effect within the j th environment, and e ijk is a random error following N(0, σ 2 e ). Broad heritability on an entry-mean basis was calculated as H 2 = σ 2 g /[σ 2 g + σ2 gl /k + σ2 e /(rk)], where σ 2 g is the genotypic variance, σ 2 gl is the genotype by environment interaction variance, k is the number of environments, r is the number of replications. Estimation of variance components was performed by the varcomp procedure in SAS version 9.3 (SAS Institute, Inc., Cary, NC) with all effects considered to be random. To estimate the proportion of phenotypic variation explained by the mixed linear model (MLM) containing all identified loci, the likelihood-ratio-based R2 was calculated for each trait .
Genotyping and quality control
The Illumina Infinium SoySNP50K BeadChip was used to genotype the population as described in a previous study , and 42,509 SNPs were identified with a call success rate of 85% or greater. Of them, 61 SNPs that were presented in unanchored sequence scaffolds were excluded from further analyses. The dataset had a missing rate of 0.6%. Markers with missing rate larger than 10% were ruled out and the remaining missing data were imputed using BEAGLE version 3.3.1 with default parameter settings [29,30]. SNPs with a minor allele frequency (MAF) < 5% after imputation were excluded from further analyses as well. Finally, a total of 31,045 SNPs were used for GWAS.
Linkage disequilibrium estimation
Pairwise LD between markers was calculated as squared correlation coefficient (r 2) of alleles using R package synbreed . In light of substantial difference in recombination rate between euchromatic and heterochromatic regions, r 2 was calculated separately for the two chromosomal regions. The physical length of euchromatin and heterochromatin on each chromosome were defined as in SoyBase (www.soybase.org). Only r 2 for SNPs with pairwise distance less than 10 Mb in either euchromatic or heterochromatic region of each chromosome were used to draw the average LD decay figure by R script using the equation described in a previous study . The LD decay rate of the population was measured as the chromosomal distance where the average r 2 dropped to half its maximum value .
Genome-wide association analysis
To minimize the effects of environmental variation, best linear unbiased predictors (BLUPs) of individual lines were calculated for each trait using the R package lme4 , and were then used to fit the one-way ANOVA model for naive test (without correction of population structure and familial relatedness) implemented in R 2.15.3 (www.R-project.org) and MLM implemented in the GAPIT R package [34,35]. The latter takes both familial relatedness and population structure into account.
For the naive test, the equation was
For the MLM analysis, the equation was
where y is the phenotype BLUPs of each line, μ is the total mean, X is the incidence matrix relating the individuals to the fixed marker effects α, P is the incidence matrix relating the individuals to the fixed principal component (PC) effects β, and Z is the incidence matrix relating the individuals to the random group effects u obtained from the compression algorithm. The random group effects u follows a multivariate normal distribution with mean 0 and variance-covariance matrix 2KV g , where K is the kinship matrix, and V g is the polygenic variance. The random error term e follows a multivariate normal distribution with mean 0 and variance-covariance matrix IV e, where I is the identity matrix and V e is the error variance component. The optimal number of PCs to be involved in the MLM was determined by Bayesian information criterion of the model (Additional file 2).
For association analysis of DTF and DTM, regular MLM (K model) was suggested by GAPIT. For PH, the results between the regular MLM and compressed MLM (cMLM) were very similar, thus a regular MLM (K model) was adopted too. While for DFTM, a K model was not applicable (Additional file 2) and regular MLM (K + P model) could not detect significant association, thus cMLM (K + P model) with a compression level of 1.9 was used as suggested by GAPIT (Additional file 3). The threshold of significance for SNP-trait associations was determined by the false discovery rate (q) < 0.05 or the empirical significance level at P < 0.001, whichever was more stringent. To assess the empirical significance of SNPs, we performed 1,000 permutations of genome-wide association analyses. Since we could not find appropriate permutation for K + P model, permutation was conducted for K model only, which was used for association analysis of DTF, DTM and PH as described above. For each iteration, the phenotype values and kinship matrix (K) in the MLM remained unchanged, while genotypes of each SNP were permuted. Briefly, we shuffled the rows randomly but kept the order of row names unchanged in a genotypic data file where each column represented one SNP and each row represented one germplasm accession. GAPIT was run with the same parameter setting as the original test for each trait. This method preserves the association between the phenotypes and K but eliminates the association between the SNP and P (Peter J. Bradbury, personal communication). As a result, we applied it to the K model only but not to the K + P model.
Prediction of candidate genes
Genes annotated in Glyma1.1, Glyma1.0 and NCBI RefSeq gene models in SoyBase (www.soybase.org) were used as the source of candidate genes. The prediction of candidate genes was referred to the following preferences: i) genes of known function in soybean related to the trait under study, ii) genes with function-known orthologs in Arabidopsis related to the trait under study, and iii) genes pinpointed by the peak SNPs.
Statistics of phenotypes
Analysis of variance indicated that the effects of genotypes, environments, and their interactions were significant for all the four traits except the G × E interaction for PH (Table 1). Averaged over three environments, all traits showed a large variation, especially for PH where a four-fold difference was observed. The frequency distribution skewed towards early flowering for DTF but approximated a normal distribution for the other three traits (Additional file 4). There were high correlations between different environments, ranging from 0.77 (P < 10−4) for DFTM between 2011 AU and 2012WT to 0.91 (P < 10−4) for DTF between 2011 AU and 2012BK. Accordingly, a high heritability (>80%) was estimated for each trait, indicating that genetic effects play a predominant role in the performance of the traits (Table 1). However, the correlations between traits were low to moderate (Additional file 5). Interestingly, DTF was positively correlated to DTM (r = 0.34, P < 10−4) while negatively correlated to DFTM (r = −0.47, P < 10−4).
Distribution of markers and linkage disequilibrium
A total of 31,045 SNPs with MAF ≥ 0.05 were used for further analyses after quality control, with an average marker density of 1 SNP every 29 kb genome-wide, varying across chromosomes from 44.4 kb/SNP on chromosome 1 (Gm01) to 22.3 kb/SNP on Gm13 (Additional file 6). Most (74.6%) of the SNPs were harbored within euchromatic regions where 78% of putative genes are located , resulting in an average marker density of 1 SNP per 20 kb in euchromatin and 1 SNP per 62 kb in heterochromatin. Using the whole set of SNPs, the LD decay rate of the population was estimated at 326 kb in euchromatin, where r 2 = 0.23 (half of its maximum value) (Figure 1). In heterochromatin, however, r 2 did not drop to half of its maximum value until 4,285 kb.
GWAS of traits
GWAS was conducted by using the BLUPs of individual performance over three environments in a MLM, which accounts for both population structure and familial relatedness [34,37]. As shown in quantile-quantile plots (Additional file 3), the genomic inflation was considerably controlled in MLMs versus the naive model (i.e., one-way ANOVA model without correcting for relatedness and population structure).
In total 135, 11, 103 and 115 SNPs significantly associated with DTF, DTM, DFTM and PH were identified, respectively (Figure 2a, b, c and d). Twenty-eight (on Gm18) and 53 (on Gm20) of the 135 SNPs associated with DTF and 66 (on Gm04) of 103 SNPs associated with DFTM were located in the extensive LD blocks in the heterochromatic regions with physical length of 2.9 Mb, 6.3 Mb and 20.0 Mb, respectively (Additional file 7). To determine the trait-associated loci, all significant SNPs located in close physical proximity were clumped at r 2 > 0.70 and only the strongest trait-associated SNP (or peak SNP) within each LD block was kept, except in the case of the extensive LD blocks where multiple peak SNPs were found. Finally, 27, 6, 18 and 27 loci associated with DTF, DTM, DFTM and PH were identified across the 20 chromosomes (Additional file 8). The final model containing these loci explained 77.2%, 52.5%, 69.8% and 76.3% of phenotypic variation for DTF, DTM, DFTM and PH, respectively.
Some loci were found to be associated with multiple traits. The locus at the 45.0 Mb position on Gm19, representing the strongest association for PH, was also associated with DTM. It explained 15% and 4.5% of total phenotypic variation for PH and DTM, respectively. On average, the lines carrying the major frequency allele of the peak SNP (Gm19_45000827, MAF = 0.05) at this locus were 31.2 cm taller and matured 4.4 days later than those with the alternative allele (Figure 3). The locus at the 6.98 Mb position on Gm09 was associated with both DTF and DFTM, but their effects were in opposite directions. Two loci at 2.4 Mb and 3.0 Mb positions on Gm16 were detected for both DTM and DFTM with similar effects. No overlap was found between loci of DTF and DTM.
Prediction of candidate genes
Based on the results of GWAS and genes annotated in SoyBase (www.soybase.org), we further predicted candidate causal genes for loci significantly associated with each trait. A total of 18 candidate genes were predicted for 15 of the 27 loci associated with DTF (Additional file 8). Ten of them have orthologs of Arabidopsis flowering genes. Glyma07g08831 and Glyma07g08890 were located at 27.8 kb upstream and 8.9 kb downstream of the peak SNP of DTF7 locus, respectively (Figure 4a). Glyma07g08831 is homologous to AtSOC1, and encodes a protein sharing 77% amino acid sequence identity with the product of soybean flowering gene GmSOC1 [38,39]. Glyma07g08890 and the candidate gene for DTF26 locus Glyma20g21082 are both homologous to the Arabidopsis flowering gene AGAMOUS-LIKE 6 (AGL6)  (Additional file 8). The locus at about the 7.0 Mb position on Gm09 associated with DTF was targeted by two SNPs in high LD (r 2 = 0.90). Glyma09g07980, identified at 7 kb away from the peak SNP of DTF16 (Figure 4b), is homologous to Arabidopsis EARLY FLOWERING 8 (ELF8) .
We identified the Dt1 gene at 18.6 kb upstream of the peak SNP (Gm19_45000827, MAF = 0.05) on Gm19, which was associated with both DTM and PH (Figure 4c). We also predicted candidate genes for other SNPs associated with PH. A putative gene Glyma18g49080, encoding a membrane transport protein, was found at 6.3 kb away from the peak SNP (Gm18_58462762, MAF = 0.22) of locus PH21 on Gm18 (Figure 4d). It is homologous to Arabidopsis auxin efflux carrier protein PIN-FORMED 1 (AtPIN1), which is involved in auxin-induced shoot and root development . Glyma19g37180, encoding a putative pectinesterase, was identified in a small chromosomal region associated with locus PH24 on Gm19 where LD decayed rapidly (Figure 5). This locus alone could explain 10% of PH phenotypic variation. For DFTM, seven candidate genes were proposed for seven of 18 significantly associated loci. The detailed information of the peak SNPs for all trait-associated loci and candidate genes is presented in the Additional file 8.
Recombination rate is one of the major factors affecting LD extension. In soybean, the recombination rate in euchromatic regions is about five times that in heterochromatic regions . In this study, a large difference in LD decay rate was observed between these two chromosomal regions (326 kb in euchromatin versus 4,285 kb in heterochromatin) (Figure 1). The low LD decay rate in euchromatic regions was also reported in a recent GWAS on soybean , in which the population had a higher diversity of origin (China, Korea, Japan) and more maturity groups (II, III and IV) than the population used in this study. It is indicated that the relatively low LD decay rate might be a common phenomenon in Asian soybean landraces, the major source of the USDA Soybean Germplasm Collection. Therefore, LD decay rate is the primary factor limiting the mapping resolution in GWAS for soybean, and a lower density of SNPs should be suitable for GWAS in soybean as compared with other crops like maize and rice.
Kinship and population structure are known as the major confounding factors leading to spurious results in association analysis, and corrected MLM containing both kinship matrix (K) and population structure (P) is more effective than the MLM containing either K or P alone [43,44]. In this study, however, no PC was involved in the MLM for the association analysis of DTF, DTM and PH, but two PCs were involved for DFTM (Additional file 2), indicating that the improvement of model fitness by involving P in the K model could vary with traits. The possible explanations might include: i) the phenotypic variation attributed to population structure varies with traits; and ii) the degree of overlap between K and P in controlling genetic relationships is different for individual traits . Therefore, the inclusion of population structure in MLM depends on the genetic relationships of the association panel and the divergence of the trait of interest.
Previous reports suggested that DTM and DTF would be highly correlated . However, a low correlation between DTF and DTM (r = 0.34, P < 10−4) was observed in the present study. The relatively higher correlation between DTM and DFTM (r = 0.63, P < 10−4) indicated that DFTM had greater impact on DTM than DTF. In addition, the negative correlation between DTF and DFTM (r = −0.47, P < 10−4) indicated that for a certain early maturity group in soybean, shorter vegetative growth might imply longer reproductive growth to some extent, which may help accumulate more dry matter in seeds. The GWAS results showed that the flowering locus DTF16 was also associated with DFTM but not with DTM, while maturity loci DTM2 and DTM4 were associated with DFTM but not DTF. Therefore, we suppose that some flowering loci may also affect growth stage after flowering as reported in a previous study , and some loci condition maturity through affecting reproduction period only. The loci at the 7.0 Mb position on Gm09 for DTF (DFT16) and DFTM (DFTM11) were located within the same region but exhibited opposite effects, which may provide an underlying genetic explanation of the negative correlation between DTF and DFTM.
Previous research showed that major maturity loci (E1 through E8, and J) also affected flowering in soybean , and E3 and E4 affected post-flowering photoperiod responses as well . However, we did not detect these loci for any of the three traits (DTF, DTM and DFTM) in the present study. One possible explanation was lack of functional polymorphism at these loci in the association panel. GWAS using even all SNPs (MAF > 0), including four SNPs in E2 and one in E3, did not detect any of the loci at q < 0.05 either. Another explanation was that genetic variants might exist at these major loci, but could not be captured due to the lack of SNP coverage. Notably, some genetic variants were actually undetectable by SNP genotyping. For example, the e1-nl allele, one of the natural variants of E1, was a deletion of the entire gene , which is hard to detect through SNP genotyping. The different loci identified between DTF and DTM as well as DFTM suggested that soybean flowering and maturity could be controlled by common major-effect loci, but also modified by numerous trait-specific minor-effect loci.
Natural and artificial selections during domestication can decrease the genetic diversity and increase LD in modern soybean cultivars , which difficults prediction of causal genes in soybean through association analysis. In this study, the estimate of genome-wide LD decay rate was much lower than that in rice . However, rapid LD decay was found at some loci, allowing the prediction of candidate genes in these regions (Figures 4 and 5). For DTF, ten of the proposed gene candidates have orthologs of Arabidopsis flowering genes, and seven of them were previously identified by comparative genomic analysis [6,13]. Of them, Glyma07g08831 was located in close proximity to the DTF7 locus. It encodes a protein sharing 77% of amino acid sequence identity with the GmSOC1, which promotes flowering in soybean . The high protein sequence identity between Glyma07g08831 and GmSOC1 might imply their functional abundance in soybean flowering.
The SNP at the 45.0 Mb position on Gm19, a region similar to previously reported QTLs Pod mat 13–6 and Plant height 4–2 and 13–8 (Additional file 8), was strongly associated with both DTM and PH. The Dt1 gene was found in this region. Dt1 is homologous to Arabidopsis terminal flower 1, and plays a predominant role in determining stem growth habit in soybean . Based upon the stem growth habit, soybean cultivars can be classified into two major categories, determinate and indeterminate. For the determinate soybean cultivars (dt1/dt1), SAM switches from vegetative growth to reproductive growth soon after photoperiod-induced floral transition, and stem growth stops . In contrast, the transition of SAM to floral meristem is suppressed in indeterminate cultivars (Dt1/Dt1) and vegetative growth of SAM continues until a cessation caused by the demand of developing seeds . Therefore, stem growth habit has broad effects on plant height and maturity of soybean. This is highly consistent with the result of the present study that the locus harboring Dt1 was strongly associated with both DTM and PH, indicated that Dt1 is very likely the causal gene for DTM5 and PH25. Because plant height is one of the major factors determining yield potential in soybean, PH25 with large effect on plant height may also affect soybean yield substantially. However, application of the preferred allele of PH25 locus needs to be careful, as it may also affect maturity dates.
The locus PH24, explained 10% of phenotypic variation, was mapped to a small region on Gm19 where LD decayed rapidly. Three QTLs associated with PH were previously reported in the similar region (Additional file 8). The candidate gene Glyma19g37180 was identified near the peak SNP (Figure 5). It was proposed to encode a pectinesterase (SoyBase, www.soybase.org). Pectin is a structurally complex polysaccharide contained in primary cell walls of plant and has functions in plant growth, morphology and plant defense . Pectinesterase catalyzes the de-esterification of pectin into pectate and methanol, and plays important roles in some physiological processes such as stem elongation that requires rearrangement of cell wall architecture. Transient stem elongation was observed in potato plants overexpressing a Petunia inflate pectinesterase . In Arabidopsis, a reduction in cross-linking of cell wall pectic polysaccharide resulted in dwarf phenotype . More recent research showed that pectinesterase regulates cell growth and hypocotyl elongation in Arabidopsis by affecting the degree of pectin methyl-esterification . Therefore, Glyma19g37180 encoding a putative pectinesterase was the most likely causal gene for PH24, a major-effect locus associated with plant height in soybean. Notably, unlike PH25, PH24 had no association with maturity, and selection of the desired allele of this locus might improve the yield potential of soybean without affecting the maturity dates.
In this study, 27, 6, 18 and 27 loci associated with DTF, DTM, DFTM and PH were identified via GWAS, respectively. Thirty-five candidate genes were proposed, including a function-known gene (Dt1) and 16 genes orthologous to Arabidopsis genes functioning in similar traits. It evidently demonstrates the high efficiency of GWAS in dissecting complex traits in soybean. A medium number of SNPs generated from the SoySNP50K analysis is capable of capturing genome-wide allelic variation, and candidate genes are regionally accessible for crops like soybean with a low LD decay rate. The genetic variants and trait-associated SNPs identified in this study will be useful for soybean cultivar improvement, especially for major-effect loci associated with PH that may have great potential for soybean yield improvement. Additionally, biological validation of the candidate genes will be also of great interest.
Best linear unbiased predictor
Duration of flowering-to-maturity
Days to flowering
Days to maturity
Genome-wide association study
Minor allele frequency
Mixed linear model
Quantitative traits locus
Shoot apical meristem
Single nucleotide polymorphism
Ratcliffe OJ, Riechmann JL. Arabidopsis transcription factors and the regulation of flowering time: a genomic perspective. Curr Issues Mol Biol. 2002;4(3):77–91.
Suarez-Lopez P, Wheatley K, Robson F, Onouchi H, Valverde F, Coupland G. CONSTANS mediates between the circadian clock and the control of flowering in Arabidopsis. Nature. 2001;410(6832):1116–20.
Onouchi H, Igeño MI, Périlleux C, Graves K, Coupland G. Mutagenesis of plants overexpressing CONSTANS demonstrates novel interactions among Arabidopsis flowering-time genes. The Plant Cell Online. 2000;12(6):885–900.
Komeda Y. Genetic regulation of time to flower in Arabidopsis thaliana. Annu Rev Plant Biol. 2004;55:521–35.
Henderson IR, Dean C. Control of Arabidopsis flowering: the chill before the bloom. Development. 2004;131(16):3829–38.
Watanabe S, Harada K, Abe J. Genetic and molecular bases of photoperiod responses of flowering in soybean. Breed Sci. 2012;61(5):531–43.
Cober ER, Morrison MJ. Regulation of seed yield and agronomic characters by photoperiod sensitivity and growth habit genes in soybean. Theor Appl Genet. 2010;120(5):1005–12.
Xia Z, Watanabe S, Yamada T, Tsubokura Y, Nakashima H, Zhai H, et al. Positional cloning and characterization reveal the molecular basis for soybean maturity locus E1 that regulates photoperiodic flowering. Proc Natl Acad Sci U S A. 2012;109(32):E2155–64.
Kong F, Liu B, Xia Z, Sato S, Kim BM, Watanabe S, et al. Two coordinately regulated homologs of FLOWERING LOCUS T are involved in the control of photoperiodic flowering in soybean. Plant Physiol. 2010;154(3):1220–31.
Watanabe S, Xia Z, Hideshima R, Tsubokura Y, Sato S, Yamanaka N, et al. A map-based cloning strategy employing a residual heterozygous line reveals that the GIGANTEA gene is involved in soybean maturity and flowering. Genetics. 2011;188(2):395–407.
Watanabe S, Hideshima R, Xia Z, Tsubokura Y, Sato S, Nakamoto Y, et al. Map-based cloning of the gene associated with the soybean maturity locus E3. Genetics. 2009;182(4):1251–62.
Liu B, Kanazawa A, Matsumura H, Takahashi R, Harada K, Abe J. Genetic redundancy in soybean photoresponses associated with duplication of the phytochrome A gene. Genetics. 2008;180(2):995–1007.
Jung CH, Wong CE, Singh MB, Bhalla PL. Comparative genomic analysis of soybean flowering genes. Plos One. 2012;7(6):e38250.
Zhang WK, Wang YJ, Luo GZ, Zhang JS, He CY, Wu XL, et al. QTL mapping of ten agronomic traits on the soybean (Glycine max L. Merr.) genetic map and their association with EST markers. Theor Appl Genet. 2004;108(6):1131–9.
Lee SH, Bailey MA, Mian MAR, Carter TE, Ashley DA, Hussey RS, et al. Molecular markers associated with soybean plant height, lodging, and maturity across locations. Crop Sci. 1996;36(3):728–35.
Bernard R. Two genes affecting stem termination in soybeans. Crop Sci. 1972;12(2):235–9.
O’Neill MA, Eberhard S, Albersheim P, Darvill AG. Requirement of borate cross-linking of cell wall rhamnogalacturonan II for Arabidopsis growth. Science. 2001;294(5543):846–9.
Parks BM, Folta KM, Spalding EP. Photocontrol of stem growth. Curr Opin Plant Biol. 2001;4(5):436–40.
Xu YL, Gage DA, Zeevaart JAD. Gibberellins and stem growth in Arabidopsis thaliana - effects of photoperiod on expression of the GA4 and GA5 loci. Plant Physiol. 1997;114(4):1471–6.
Tian F, Bradbury PJ, Brown PJ, Hung H, Sun Q, Flint-Garcia S, et al. Genome-wide association study of leaf architecture in the maize nested association mapping population. Nat Genet. 2011;43(2):159–62.
Poland JA, Bradbury PJ, Buckler ES, Nelson RJ. Genome-wide nested association mapping of quantitative resistance to northern leaf blight in maize. Proc Natl Acad Sci U S A. 2011;108(17):6893–8.
Li H, Peng ZY, Yang XH, Wang WD, Fu JJ, Wang JH, et al. Genome-wide association study dissects the genetic architecture of oil biosynthesis in maize kernels. Nat Genet. 2013;45(1):43–U72.
Huang XH, Wei XH, Sang T, Zhao QA, Feng Q, Zhao Y, et al. Genome-wide association studies of 14 agronomic traits in rice landraces. Nat Genet. 2010;42(11):961–U976.
Hwang EY, Song Q, Jia G, Specht JE, Hyten DL, Costa J, et al. A genome-wide association study of seed protein and oil content in soybean. BMC Genomics. 2014;15:1.
Wen Z, Tan R, Yuan J, Bales C, Du W, Zhang S, et al. Genome-wide association mapping of quantitative resistance to sudden death syndrome in soybean. BMC Genomics. 2014;15:809.
Mamidi S, Lee RK, Goos JR, McClean PE. Genome-wide association studies identifies seven major regions responsible for iron deficiency chlorosis in soybean (Glycine max). Plos One. 2014;9(9):e107469.
Sun G, Zhu C, Kramer MH, Yang SS, Song W, Piepho HP, et al. Variation explained in mixed-model association mapping. Heredity (Edinb). 2010;105(4):333–40.
Song QJ, Hyten DL, Jia GF, Quigley CV, Fickus EW, Nelson RL et al. Development and Evaluation of SoySNP50K, a High-Density Genotyping Array for Soybean. Plos One 2013, 8(1).
Browning BL, Browning SR. Efficient multilocus association testing for whole genome association studies using localized haplotype clustering. Genet Epidemiol. 2007;31(5):365–75.
Browning BL, Browning SR. A unified approach to genotype imputation and haplotype-phase inference for large data sets of trios and unrelated individuals. Am J Hum Genet. 2009;84(2):210–23.
Wimmer V, Albrecht T, Auinger HJ, Schon CC. synbreed: a framework for the analysis of genomic prediction data using R. Bioinformatics. 2012;28(15):2086–7.
Remington DL, Thornsberry JM, Matsuoka Y, Wilson LM, Whitt SR, Doebley J, et al. Structure of linkage disequilibrium and phenotypic associations in the maize genome. Proc Natl Acad Sci U S A. 2001;98(20):11479–84.
Bates D, Maechler M, Bolker B. lme4: Linear mixed-effects models using S4 classes. R package version 0.999999-0. http://CRAN.R-project.org/package=lme4. 2012.
Zhang ZW, Ersoz E, Lai CQ, Todhunter RJ, Tiwari HK, Gore MA, et al. Mixed linear model approach adapted for genome-wide association studies. Nat Genet. 2010;42(4):355–U118.
Lipka AE, Tian F, Wang QS, Peiffer J, Li M, Bradbury PJ, et al. GAPIT: genome association and prediction integrated tool. Bioinformatics. 2012;28(18):2397–9.
Schmutz J, Cannon SB, Schlueter J, Ma JX, Mitros T, Nelson W, et al. Genome sequence of the palaeopolyploid soybean. Nature. 2010;463(7278):178–83.
Yu JM, Pressoir G, Briggs WH, Bi IV, Yamasaki M, Doebley JF, et al. A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nat Genet. 2006;38(2):203–8.
Zhong XF, Dai X, Xv JH, Wu HY, Liu B, Li HY. Cloning and expression analysis of GmGAL1, SOC1 homolog gene in soybean. Mol Biol Rep. 2012;39(6):6967–74.
Na XF, Jian B, Yao WW, Wu CX, Hou WS, Jiang BJ, et al. Cloning and functional analysis of the flowering gene GmSOC1-like, a putative SUPPRESSOR OF OVEREXPRESSION CO1/AGAMOUS-LIKE 20 (SOC1/AGL20) ortholog in soybean. Plant Cell Rep. 2013;32(8):1219–29.
Yoo SK, Wu X, Lee JS, Ahn JH. AGAMOUS‐LIKE 6 is a floral promoter that negatively regulates the FLC/MAF clade genes and positively regulates FT in Arabidopsis. Plant J. 2011;65(1):62–76.
Oh S, Zhang H, Ludwig P, van Nocker S. A mechanism related to the yeast transcriptional regulator Paf1c is required for expression of the Arabidopsis FLC/MAF MADS box gene family. Plant Cell. 2004;16(11):2940–53.
Sassi M, Lu Y, Zhang Y, Wang J, Dhonukshe P, Blilou I, et al. COP1 mediates the coordination of root and shoot growth by light through modulation of PIN1-and PIN2-dependent auxin transport in Arabidopsis. Development. 2012;139(18):3402–12.
Stich B, Mohring J, Piepho HP, Heckenberger M, Buckler ES, Melchinger AE. Comparison of mixed-model approaches for association mapping. Genetics. 2008;178(3):1745–54.
Zhao KY, Aranzana MJ, Kim S, Lister C, Shindo C, Tang CL et al. An Arabidopsis example of association mapping in structured samples. Plos Genet 2007, 3(1).
Orf JH, Chase K, Jarvik T, Mansur LM, Cregan PB, Adler FR, et al. Genetics of soybean agronomic traits: I. Comparison of three related recombinant inbred populations. Crop Sci. 1999;39(6):1642–51.
Xu M, Xu Z, Liu B, Kong F, Tsubokura Y, Watanabe S, et al. Genetic variation in four maturity genes affects photoperiod insensitivity and PHYA-regulated post-flowering responses of soybean. BMC Plant Biol. 2013;13:91.
Hyten DL, Song QJ, Zhu YL, Choi IY, Nelson RL, Costa JM, et al. Impacts of genetic bottlenecks on soybean genome diversity. Proc Natl Acad Sci U S A. 2006;103(45):16666–71.
Liu B, Watanabe S, Uchiyama T, Kong F, Kanazawa A, Xia Z, et al. The soybean stem growth habit gene Dt1 is an ortholog of Arabidopsis TERMINAL FLOWER1. Plant Physiol. 2010;153(1):198–210.
Tian Z, Wang X, Lee R, Li Y, Specht JE, Nelson RL, et al. Artificial selection for determinate growth habit in soybean. Proc Natl Acad Sci U S A. 2010;107(19):8563–8.
Caffall KH, Mohnen D. The structure, function, and biosynthesis of plant cell wall pectic polysaccharides. Carbohyd Res. 2009;344(14):1879–900.
Pilling J, Willmitzer L, Fisahn J. Expression of a Petunia inflata pectin methyl esterase in Solanum tuberosum L enhances stem elongation and modifies cation distribution. Planta. 2000;210(3):391–9.
Derbyshire P, McCann MC, Roberts K. Restricted cell elongation in Arabidopsis hypocotyls is associated with a reduced average pectin esterification level. BMC Plant Biol. 2007;7(1):31.
The authors are grateful to Siddhi Bhusal, Nicholas Hall, Marci Green, Sivananda Tirumalaraju and Mukhtar Agoub for assistance in implementing the field experiments, and Alexander E. Lipka, Institute for Genomic Diversity, Cornell University, for assistance in using GAPIT R package. This research was supported in part by the South Dakota Agricultural Experiment Station and USDA-NIFA/CSREES Hatch project, and the grants of the United Soybean Board and the South Dakota Soybean Research and Promotion Council awarded to G.-L. Jiang. Virginia State University Agricultural Research Station provided support in completion of the manuscript.
The authors declare that they have no competing interests.
JZ and GLJ designed the research experiments. QS and PBC designed SNPs and performed genotyping. RLN prepared and provided germplasm accessions. JZ and XW performed phenotyping. JZ and JW analyzed data. GLJ designed the overall project. JZ and GLJ wrote the manuscript. All authors read and approved the final manuscript.
List of 309 soybean germplasm accessions analyzed in this study. Information given in this file for each accession includes accession name, origin and maturity group according to GRIN (http://www.ars-grin.gov/) and the present study.
Bayesian Information Criterion (BIC) values of mixed linear model with different numbers of principal components (PCs) used for associate analyses of each trait.
Quantile-quantile plots of genome-wide association analysis for each trait with different models. For days to flowering (a) and maturity (b), the naive model and regular mixed model (MLM) are presented. For duration of flowering-to-maturity (c) and plant height (d), the naive model, MLM and compressed MLM (cMLM) are presented. The expected distribution of negative log10-transformed P values is indicated in red, and those of the naive model, MLM and/or cMLM are indicated in black, green and/or blue, respectively.
Frequency distribution of observations of four agronomic traits in soybean. (a) Days to flowering (DTF), (b) Days to maturity (DTM), (c) Duration of flowering-to-maturity (DFTM) and (d) Plant height (PH). Shown is the average of each trait in a population of 309 germplasm accessions over 3 environments each with 3 replicates.
Correlation analyses of traits. Information given in this file includes the correlation coefficients of each pair of traits calculated by using the average of each trait over three environments.
Distribution and density of single nucleotide polymorphisms (SNPs) across the soybean genome. Each chromosome is labeled on the horizontal axis and the physical length of each chromosome is labeled on vertical axis. The vertical bar on each chromosome represents the heterochromatic region. The number of SNPs per 100 kb in the consensus data set is shown in a grey scale on right.
Extensive linkage disequilibrium (LD) blocks on heterochromatic regions of the soybean genome associated with days to flowering (DTF) and duration of flowering-to-maturity (DFTM). (a) and (b) LD blocks associated with DTF on Gm18 and Gm20, respectively. (c) LD block associated with DFTM on Gm04. At the top of each panel, the negative log10-transformed P values from the regular mixed linear model (MLM) or compressed MLM are plotted against the physical distance on the horizontal axis. The physical length of each region is labeled. In the bottom of each panel, pairwise LD r 2 values are indicated in a low diagonal matrix heat map. The r 2 values are shown using a color intensity index as indicated on right bottom of each panel.
List of loci significantly associated with days to flowering (DTF) and maturity (DTM), duration of flowering-to-maturity (DFTM) and plant height (PH) in soybean. Information given in this file includes the name, alleles, minor allele frequency (MAF), allelic effect and P value of genome-wide association analyses of the peak SNP for each locus. The candidate gene(s) and previously reported QTLs at similar chromosome region with some loci identified in this study are also presented.
About this article
Cite this article
Zhang, J., Song, Q., Cregan, P.B. et al. Genome-wide association study for flowering time, maturity dates and plant height in early maturing soybean (Glycine max) germplasm. BMC Genomics 16, 217 (2015). https://doi.org/10.1186/s12864-015-1441-4
- Genetic architecture
- Genetic improvement
- Quantitative trait locus
- Single nucleotide polymorphism
- Soybean (Glycine max)