A genome-wide association study of seed protein and oil content in soybean
© Hwang et al.; licensee BioMed Central Ltd. 2014
Received: 29 August 2013
Accepted: 21 December 2013
Published: 2 January 2014
Association analysis is an alternative to conventional family-based methods for detecting the location of gene(s) or quantitative trait loci (QTL), and it provides relatively high resolution in defining the genome position of a gene or QTL. Seed protein and oil concentration are quantitative traits determined by the interaction among many genes with small to moderate genetic effects and by their interaction with the environment. In this study, a genome-wide association study (GWAS) was performed to identify QTL controlling seed protein and oil concentration in 298 soybean germplasm accessions exhibiting a wide range of seed protein and oil content.
A total of 55,159 single nucleotide polymorphisms (SNPs) were genotyped using various methods including Illumina Infinium and GoldenGate assays, and 31,954 markers with minor allele frequency >0.10 were used to estimate linkage disequilibrium (LD) in heterochromatic and euchromatic regions. In euchromatic regions, the mean LD (r²) rapidly declined to 0.2 within 360 Kbp, whereas the mean LD declined to 0.2 at 9,600 Kbp in heterochromatic regions. The GWAS results identified 40 SNPs in 17 different genomic regions significantly associated with seed protein. Of these, the five SNPs with the highest associations and seven adjacent SNPs were located in the 27.6-30.0 Mbp region of Gm20. A major seed protein QTL has been previously mapped to the same location and potential candidate genes have recently been identified in this region. The GWAS results also detected 25 SNPs in 13 different genomic regions associated with seed oil. Of these markers, seven SNPs had a significant association with both protein and oil.
This research indicated that GWAS not only identified most of the previously reported QTL controlling seed protein and oil, but also resulted in narrower genomic regions than the regions reported as containing these QTL. The narrower GWAS-defined genome regions will allow more precise marker-assisted allele selection and will expedite positional cloning of the causal gene(s).
Keywords: GWAS, Glycine max, Seed protein and oil content, Single nucleotide polymorphism, Linkage disequilibrium
Association studies provide an alternative to conventional family-based methods for detecting the genomic location of genes or quantitative trait loci (QTL). While both rely on the correlation between DNA marker alleles and the phenotypic expression of a trait of interest, association studies can provide relatively higher resolution in defining the genomic position of a gene or QTL, since they can be applied to naturally occurring populations such as human populations or germplasm collections. The important difference between the degree of QTL detection possible with these natural populations and that possible with progeny derived from crosses between two individuals is the level of linkage disequilibrium (LD).
LD is the non-random association of alleles at different loci in a population. The detection of genes or QTL depends on the level of LD between a causal mutation and physically linked markers. The higher the degree of association between marker alleles and the phenotypic variants, the greater the likelihood that the causal mutation is physically linked to the marker. A confounding factor that influences the success of an association study is the presence of population structure. Population structure is the result of allele frequency differences between populations that arise from population history. Factors such as selection, migration, local adaptation, geographical isolation, or genetic drift can result in population structure. It is well known that population structure can cause spurious associations between markers and the trait under study, because of allele frequency differences between subpopulations rather than genuine genetic associations with the trait of interest.
Owing to the lack of sufficient numbers of DNA markers, early association studies in plant species used the candidate gene approach to identify specific single nucleotide polymorphisms (SNPs) or genes controlling a phenotypic trait of interest, such as flowering time, endosperm color and kernel starch biosynthesis in maize (Zea mays ssp. mays), enzymatic discoloration in potato (Solanum tuberosum L.), flowering time in sorghum (Sorghum bicolor L. Moench), and aluminum tolerance in triticale (X Triticosecale Wittmack). However, the availability of high throughput DNA sequencing and genotyping technologies has provided a platform for conducting genome-wide association studies (GWAS). Such studies have verified the location of loci associated with frost tolerance in barley (Hordeum vulgare L.); leaf architecture, southern leaf blight, and waterlogging tolerance in maize (Zea mays L.); agronomic traits in rice (Oryza sativa L.); flowering time and defense metabolites in Arabidopsis (Arabidopsis thaliana L. Heynh); and plant architecture and flowering time in sunflower. In soybean (Glycine max L. Merr.), GWAS have been performed to detect genes/markers associated with iron deficiency chlorosis and with chlorophyll and chlorophyll fluorescence parameters.
Seed protein and oil content in soybean are quantitatively inherited traits determined by the interaction of a number of genes subject to genotype × environment interactions. Many seed protein and oil QTL have been reported in a number of studies over the past two decades (SoyBase, the USDA, ARS Soybean Genetics and Genomics Database). These QTL, identified using linkage analysis of populations derived from crosses of two parents with contrasting seed protein and oil concentration, have been detected in many different genomic regions throughout all 20 chromosomes. Several of these QTL have been identified three or more times at identical or very similar chromosomal positions in different populations, which suggests that these QTL are not likely false positives. Because these regions likely contain a gene or genes with relatively large genetic effects on seed protein and oil content, their re-identification in a GWAS targeted at seed protein and oil would provide a measure of the success of GWAS in an independent test of its ability to detect the presence of seed protein and oil QTL.
In the research reported here, the degree of LD was assessed in 298 soybean germplasm accessions obtained from the USDA Soybean Germplasm Collection. Seed protein and oil concentration were measured in seeds harvested from plants grown in replicated field trials at two locations. The resulting seed composition data were used to evaluate the GWAS approach, and ultimately to provide an assessment of the likely success of GWAS when it is used to detect the QTL controlling the two well-known soybean quantitative traits–seed protein and oil content.
Linkage disequilibrium in euchromatic and heterochromatic genome regions
A total of 42,368 SNPs positioned on the 20 soybean chromosomes were used to estimate the most likely number (K) of subgroups in the 298 germplasm accessions. The most likely value was K = 17. The accessions within seven subgroups originated from a single Asian country, and the other nine subgroups comprised a mix of accessions from different countries. In terms of maturity group, eight subgroups consisted of accessions with the same maturity group, whereas the other eight subgroups were a mix of accessions of different maturity groups.
Seed protein and oil phenotypes
[Table: Analysis of variance of seed protein content. Recoverable labels: Accession × location; Pr > F.]
[Table: Analysis of variance of seed oil content. Recoverable labels: Accession × location; Pr > F.]
Genome-wide association study for the genes controlling seed protein and oil concentration
[Table: SNP loci associated with both seed protein and oil content. Recoverable column headings: Mean protein content (%) associated with SNP allele; Mean oil content (%) associated with SNP allele; Protein vs. oil relationship.]
The soybean germplasm accessions used in this study, which include landraces collected in China, Korea, and Japan over a 60-year period, represent an excellent reservoir of genetic variation for the application of a GWAS. It is assumed that these accessions possess a diversity of alleles. Such diversity can be examined (and possibly exploited) for soybean genetic improvement by identifying the alleles that would be useful in achieving that improvement. GWAS offers one approach for accomplishing that objective.
An important consideration in the application of genome-wide association for gene discovery is the extent of LD. Because of obvious differences in the extent of LD between chromosomal regions, we elected to define LD separately for the euchromatic and heterochromatic regions of the 20 soybean chromosomes. The reliability of the approach used in the present study, i.e., defining the heterochromatic region of each chromosome as the region between the two inflection points of the cumulative genetic distance plotted against the cumulative physical distance, was previously examined in rice. In that species, the heterochromatic regions defined by suppressed recombination rate were in agreement with those identified by the conventional DAPI staining method. The plot of genetic distance against physical distance clearly indicated higher recombination per unit of physical distance in euchromatic than in heterochromatic regions. Approximately five times greater recombination per unit of physical distance occurs in soybean euchromatic versus heterochromatic regions. In the grass species sorghum (Sorghum bicolor (L.) Moench), the recombination rate in euchromatic regions was approximately 34 times higher than that in heterochromatic regions. In fact, 97-98% of the recombination occurred in the euchromatin of this species. The difference in recombination rate between the heterochromatic and euchromatic regions explains the difference in the extent of LD observed in the two chromosomal regions. The different levels of LD, as indicated by r², between species and between chromosome regions within species can be explained by a combination of population histories including mating system, mutation rate, founding effects, the magnitude of selection, admixture, and genetic drift.
The large difference in the LD level between euchromatic and heterochromatic regions, as shown in the present study, supports our contention that the density of markers required for GWAS should be adjusted to take into account the changing ratio, along the length of a given chromosome, of genetic to physical distance. In euchromatic regions, where recombination per unit of physical distance is high, a high marker density will be required, whereas in heterochromatic regions a lower marker density is acceptable. Of course, until a whole genome sequence and a densely populated genetic map become available, it is not possible to precisely define the ratio of genetic to physical distance across the genome of a species on which a GWAS is being conducted.
Association analysis for seed protein and oil content
The challenge for association studies in crop plants is the identification of genes associated with quantitatively inherited, agriculturally important traits. Currently, many seed protein and oil QTL have been reported at many positions across the 20 soybean chromosomes/linkage groups in numerous studies (SoyBase, http://www.soybase.org). The genome positions where QTL were reported a number of times in studies using different sources of high seed protein and oil germplasm should be good candidates for the validation of associations detected via GWAS. All the previously reported seed protein and oil QTL listed in SoyBase were identified using linkage analysis, and thus the causal gene(s) could be located a substantial distance (in cM) from the markers that were reported to be genetically linked to the QTL. Therefore, the position of a QTL identified by linkage analysis could not be precisely determined. However, with the release of Version 4.0 of the soybean genetic map, which was produced from a JoinMap 3.0 analysis of five soybean mapping populations that collectively segregated for over 5,000 markers, it is possible to more narrowly define the regions to which previously reported protein and oil QTL had been mapped. Using this information, it is possible with some confidence to relate the physical positions of the reported QTL from different studies to the SNP positions identified by GWAS in the present study. In that regard, we were able to align the 17 genome regions associated with seed protein identified in this study with previously reported seed protein QTL locations and, based on that alignment, determined that QTL had previously been reported in 12 of the 17 regions (Figure 3A and Additional file 2).
Notably, the seed protein QTL on Gm20 has been reported seven times, the QTL at the 14.7 Mbp position on Gm06 six times, the regions located on Gm05, Gm07, and Gm15 four times, the region on Gm09 three times, and the regions at 43.9 Mbp on Gm08, 1.4 Kbp on Gm10, and on Gm12 have been reported in two previous studies. Of the 13 regions associated with seed oil content, the regions on Gm05, Gm06, and Gm20 had been previously reported more than three times, and the regions on Gm02 and Gm04, the region at 9.9 Mbp on Gm09, and the region on Gm15 have been reported twice. In addition, there were QTL reported in previous studies that could not be found in this study. One reason could be that, since the accessions used in this study were from maturity groups II, III and IV, QTL identified in accessions from other maturity groups are not present in the germplasm accessions used here. For example, we did not find QTL reported in studies using very early or late maturity groups, such as three QTL identified in populations derived from maturity group 00 × 00, three QTL from maturity group 0 × 0 [23, 24], nine QTL from maturity group VI × VI, and three QTL from maturity group 0 × I [26, 27]. A second reason may be that, although many protein and oil QTL have been identified in a number of different populations, a QTL controlled by a rare allele present only in a specific accession used in creating a QTL mapping population could not be detected in a GWAS such as is reported here. The inability of GWAS to detect rare alleles occurring in one or a few members of a population under study is well documented [28, 29].
The main differences between the findings of an association study and those of linkage analysis are, firstly, the ability to detect a range of genes controlling the phenotype under study rather than just those segregating in a given mating of two parents and, secondly, the increased resolution resulting from historical recombination rather than the limited recombination in the progeny of a biparental population. The QTL identified on Gm20 provides an example of the increased resolution of GWAS. This seed protein QTL has been reported seven times in different G. soja × G. max and G. max × G. max populations with R² values ranging from 0.15 to 0.65 [22, 30–35]. This QTL on Gm20 has been recognized as a candidate region for possibly cloning the causal gene(s) controlling seed protein content. Bolon et al. analyzed gene expression of developing seeds in a pair of near isogenic lines (NILs) contrasting in seed protein concentration and differing for the region of Gm20 containing this major seed protein QTL. Differential transcript accumulation in the developing seeds of the NILs was detected for 12 candidate genes. The region defined by Bolon et al. in which these genes reside spans approximately 8.4 Mbp, from 24.5 to 32.9 Mbp on Gm20. Based on the results of the LD estimation in the current study, we verified that the region defined by Bolon et al. possessed two large LD blocks, as determined by r², each spanning approximately 3 Mbp, one from 24.5 to 27.6 Mbp and the other from 27.6 to 30.0 Mbp (Figure 4). All the SNPs showing significant association with seed protein concentration were located in the 27.6 to 30.0 Mbp region. Thus, our GWAS results support a narrowing of the candidate gene region of this major seed protein QTL on Gm20 to about 2.4 Mbp versus the previously defined region of 8.4 Mbp. Only six of the 12 genes identified by Bolon et al. were located within the GWAS-defined 2.4 Mbp region (Figure 4).
The gene candidates that remain include Hsp22.5 (Glyma20g19680), a putative ammonium transporter AMT1 (Glyma20g21030), an ATP synthase D chain (Glyma20g21080) (Figure 4) and three genes with limited similarity to known genes. One or more of these genes is likely to be the causative gene(s) associated with soybean seed protein content. This high resolution mapping demonstrates the capacity of GWAS to utilize historical recombination to increase resolution.
The GWAS for seed oil concentration also successfully defined 13 chromosome regions with high resolution, of which eight regions corresponded with previously reported oil QTL. Thus, for both seed protein and seed oil content, five chromosome regions were identified at which no previously reported QTL had been detected. These regions may be of particular interest to soybean breeders and geneticists as sources of genetic variation to alter soybean seed protein and oil levels.
It has been recognized that there is a negative correlation between seed size and seed protein concentration [32, 34], but more importantly, an even stronger negative correlation between seed oil and seed protein content [22, 38–41]. G. soja has an extremely small seed size and much higher seed protein concentration (>50%) [42, 43] than G. max. The average seed protein content in most G. max germplasm accessions is approximately 42.1%. A reasonable explanation could be that ancient soybean farmers selected for larger seed types during domestication, with the concurrent genetic consequence of lower seed protein concentration. The lower seed protein concentration resulting from selection would also simultaneously lead to an increase in seed oil concentration. The possible cause for this would be either very tight genetic linkage of the genes controlling seed protein and oil concentration or control of both traits by the same gene(s). The latter is more likely, given that high seed protein and low seed oil almost always co-segregate and that many attempts to separate low seed oil from high seed protein (or the inverse) have been unsuccessful [30, 32]. The GWAS results from the current study were not in complete agreement with this assertion: at the SNP at the 4.92 Mbp position on Gm09, one allele was associated with both higher protein and higher oil content and the alternative allele with lower protein and lower oil content. This is a QTL that should be of interest to those breeding for seed constituents.
One suggested virtue of association studies is the ability to take advantage of existing phenotypic data. However, there are also obstacles to performing association studies using existing phenotypic data, especially in the case of quantitative traits such as soybean seed protein and oil content. The seed protein and oil data obtained from GRIN were derived from field evaluations of the USDA Soybean Germplasm Collection conducted over the past 50 years at various locations, and the evaluations included accessions from one or more maturity groups. The seed protein and oil data for the accessions used in the current study were not from a single field trial or set of trials in which all of the 298 accessions were included. Rather, the data were from a number of different trials grown in different environments and years. A quantitative trait measured in different studies and in a range of environments and years is likely to produce accession values that are confounded by these non-genetic effects, and it is, of course, the accession means that are used in GWAS. Such may be the case for the seed protein and oil concentration data from the GRIN database used in this study (Figure 2A and B). The correlation between the replicated field data from the current study and the data reported in GRIN was only r = 0.61 for seed protein concentration and 0.78 for seed oil concentration. We have a high level of confidence in the data obtained in the current study given the low coefficient of variation in the two replicated field experiments at two locations and the high correlation of the seed protein and oil values between the two locations. Thus, it is at least possible that the advantage often suggested for association studies, i.e., the availability of large amounts of existing phenotypic data, may in some cases not be the advantage that is often assumed.
In this study, we performed a GWAS to detect genome regions controlling the quantitative traits seed protein and oil content, using 42,368 SNP markers in a genetically diverse set of 298 soybean germplasm accessions. Despite the relatively low level of LD and complex population structure, we were able to identify many of the previously reported QTL associated with soybean seed protein and oil, and we were also able to further narrow the size of the genome regions in which those QTL were likely to be located. The chromosome regions defined in this study can be used for further analysis to identify the causal gene(s) as well as to identify DNA markers that can be used in selection to alter soybean seed protein and oil in a predictable manner. The SNP data used in this study will soon be available through SoyBase (http://soybase.org/) for more than 19,000 G. max and G. soja germplasm accessions, based upon the analysis of the entire USDA Soybean Germplasm Collection with the Illumina Infinium SoySNP50K BeadChip described by Song et al. These genotypic data should be a useful resource for the detection of agriculturally important genes/QTL using association analysis.
Soybean germplasm accessions were selected based upon seed protein content (% on a dry weight basis) as reported in the GRIN (Germplasm Resources Information Network, U.S. Department of Agriculture, Agricultural Research Service, http://www.ars-grin.gov/npgs/index.html) database (Additional file 4). A total of 298 accessions were identified so as to create two groups: a case group including 151 accessions with high seed protein values ranging from 46 to 51%, and a control group including 147 accessions with more typical seed protein values ranging from 40 to 43%. The selection of accessions for the two groups was constrained to ensure balance between the groups in terms of geographical origin in the Far East (China, Korea, Japan), maturity group (II, III and IV), and other known phenotypic and morphological traits. Such balancing was done in an attempt to mitigate differences in population structure between the two groups. Genomic DNA of all accessions was extracted from bulked young leaf tissue of greenhouse-grown plants using the CTAB method.
Field trials and measurement of protein and oil in the harvested seed
Field tests were conducted using a randomized complete block design with two replicates of hillplots grown at Beltsville, MD and two replicates grown at Lincoln, NE in 2003. For the analysis of total seed nitrogen, the seeds were dried for one day at 60°C and ground with a coffee grinder. The powdered grain was stored in an air-tight polyethylene bag and weighed into small tin capsules (LECO, St. Joseph, MI) to a targeted powder weight of approximately 0.2 g. The percentage of total nitrogen in the grain powder was determined using a LECO CHN 2000 analyzer (LECO, St. Joseph, MI). The seed protein percentage was calculated by multiplying the total nitrogen percentage by 6.25. The seed oil percentage was determined with approximately 10 g of seed using a Maran pulsed NMR (Resonance Instruments, Witney, Oxfordshire, UK), following the free induction decay-spin echo procedure. Protein and oil concentrations were expressed on a % dry weight basis.
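The nitrogen-to-protein conversion described above can be illustrated with a minimal Python sketch. The function name and the example nitrogen readings are hypothetical and are not values from this study; only the factor 6.25 comes from the text.

```python
# Conversion factor stated in the text: protein % = total nitrogen % x 6.25
NITROGEN_TO_PROTEIN = 6.25


def protein_percent(total_nitrogen_percent: float) -> float:
    """Convert total seed nitrogen (% dry weight) to seed protein (% dry weight)."""
    if total_nitrogen_percent < 0:
        raise ValueError("nitrogen percentage cannot be negative")
    return total_nitrogen_percent * NITROGEN_TO_PROTEIN


# Hypothetical analyzer readings (illustrative only, not data from the study)
typical = protein_percent(6.8)   # about 42.5 % protein
high = protein_percent(8.0)      # 50.0 % protein
```

Under this conversion, the 40 to 43% and 46 to 51% protein ranges used to define the control and case groups correspond to roughly 6.4 to 6.9% and 7.4 to 8.2% total nitrogen.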
A total of 55,159 single nucleotide polymorphisms (SNPs) were genotyped in the 298 accessions, with 52,041 SNPs analyzed using an Illumina Infinium assay and 3,072 SNPs analyzed using the Illumina GoldenGate assay following the protocol described by Fan et al. and Hyten et al. In addition, 46 SNP markers were genotyped by direct sequencing and by a single-base-extension method using the Luminex flow cytometer as described by Choi et al. A total of 1,363 SNPs were analyzed using both the Illumina GoldenGate and Illumina Infinium assays. After eliminating redundant SNPs, non-polymorphic SNPs and SNPs with >25% missing data, a total of 42,368 SNPs (Additional file 5) remained. The physical positions of these markers in the soybean genome were determined using the whole genome assembly of Williams 82 soybean (Glyma 1.01) at the U.S. Department of Energy, Joint Genome Institute, Walnut Creek, CA (http://www.phytozome.net/soybean).
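The three filtering criteria above (redundant, non-polymorphic, and >25% missing data) can be sketched in Python as follows. This is an illustration under stated assumptions, not the pipeline used in the study: the Snp record and filter_snps function are hypothetical names, and redundancy is assumed here to mean two assays that share the same chromosome and physical position.

```python
from typing import List, NamedTuple, Optional


class Snp(NamedTuple):
    """Hypothetical SNP record used only for this illustration."""
    name: str
    chrom: str
    pos: int                     # physical position in bp
    calls: List[Optional[str]]   # one allele call per accession; None = missing


def filter_snps(snps: List[Snp], max_missing: float = 0.25) -> List[Snp]:
    """Drop redundant, non-polymorphic, and high-missing-data SNPs.

    Assumes every SNP has at least one accession in `calls`.
    """
    kept: List[Snp] = []
    seen_positions = set()
    for snp in snps:
        # Assumption: a SNP is redundant if one at the same position was already kept
        if (snp.chrom, snp.pos) in seen_positions:
            continue
        # Remove SNPs with more than 25% missing calls
        if snp.calls.count(None) / len(snp.calls) > max_missing:
            continue
        # Remove SNPs showing fewer than two alleles among the non-missing calls
        if len({call for call in snp.calls if call is not None}) < 2:
            continue
        seen_positions.add((snp.chrom, snp.pos))
        kept.append(snp)
    return kept
```

As a check on the counts reported above, the three genotyping sources sum to the stated total: 52,041 + 3,072 + 46 = 55,159.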
Linkage disequilibrium estimation
For the estimation of the level of LD, a total of 31,954 loci with minor allele frequency >0.10 and less than 25% missing data were used. Heterozygous genotypes were treated as missing data. Only physically linked markers located within 20 Mbp of each other were used for LD estimation. Haploview 4.2 was used to make all pair-wise comparisons of the alleles to calculate r² (the squared allele frequency correlation between two loci) and to compute D' (the standardized disequilibrium coefficient). In addition, LD (D' and r²) was estimated separately for euchromatic and heterochromatic regions. The euchromatic and heterochromatic regions of each of the 20 chromosomes were determined as follows. The physical positions of 3,321 SNP and 862 simple sequence repeat (SSR) markers mapped in the soybean genome [21, 51, 54–56] were determined by BLAST analysis of the SNP- and SSR-containing source sequences against the soybean whole genome sequence using the standalone Megablast software as previously described. The cumulative genetic distances (cM) were plotted against the cumulative physical distances (Mbp) to determine the base pair/centiMorgan relationship via the positions of the common SSR and SNP loci on the genetic linkage map and their physical positions in the genome sequence along each chromosome. The region between the two inflection points of the plot of cumulative genetic distance against cumulative physical distance was defined as the heterochromatin, and the regions on each chromosome flanking the inflection points were defined as the euchromatin. In order to assess the difference in the extent of LD between euchromatic and heterochromatic regions, LD was calculated by a pairwise comparison of physically linked heterochromatic SNPs, and then two separate LD calculations were made for all pairs of markers from the two flanking euchromatic regions on each chromosome.
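To make the two LD statistics concrete, the following Python sketch computes r² and |D'| for one pair of biallelic loci. It is a simplified illustration, not Haploview's algorithm: it assumes homozygous (inbred) accessions so that each accession contributes one directly observed two-locus haplotype, alleles coded as 0 or 1, and missing calls (including heterozygotes treated as missing) coded as None. The function name pairwise_ld is hypothetical.

```python
from typing import Optional, Sequence, Tuple


def pairwise_ld(locus_a: Sequence[Optional[int]],
                locus_b: Sequence[Optional[int]]) -> Tuple[float, float]:
    """Return (r2, |D'|) for two biallelic loci coded 0/1, None = missing."""
    # Use only accessions with a call at both loci
    pairs = [(a, b) for a, b in zip(locus_a, locus_b)
             if a is not None and b is not None]
    n = len(pairs)
    if n == 0:
        raise ValueError("no accessions with calls at both loci")

    p_a = sum(a for a, _ in pairs) / n               # frequency of allele 1 at locus A
    p_b = sum(b for _, b in pairs) / n               # frequency of allele 1 at locus B
    p_ab = sum(1 for a, b in pairs if a == 1 and b == 1) / n  # 1-1 haplotype frequency

    d = p_ab - p_a * p_b                             # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    r2 = d * d / denom                               # squared allele frequency correlation

    # Standardize D by its maximum possible magnitude given the allele frequencies
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return r2, abs(d) / d_max
```

For example, the loci [0, 0, 0, 1] and [0, 0, 1, 1] give r² = 1/3 but |D'| = 1, which illustrates why r², being sensitive to allele frequency differences, is the statistic reported for LD decay in this study.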
The mean value of LD was estimated by calculating the mean LD of SNP pairs at distances of 0-200 Kbp, 200-400 Kbp, etc. to 19,800-20,000 Kbp in euchromatic and heterochromatic regions.
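The binning step described above can be sketched as follows, assuming the pairwise r² values and inter-marker distances have already been computed. The function names, the half-open bin convention [start, start + 200 Kbp), and the rule of reporting the upper edge of the first bin whose mean falls to the threshold are assumptions made for illustration, not details taken from the study.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


def mean_ld_by_distance(pairs: Iterable[Tuple[float, float]],
                        bin_kbp: int = 200,
                        max_kbp: int = 20000) -> Dict[int, float]:
    """Average r2 within distance bins; `pairs` holds (distance_kbp, r2) tuples.

    Returns {bin_start_kbp: mean r2}, e.g. key 0 is the 0-200 Kbp bin.
    """
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for distance_kbp, r2 in pairs:
        if distance_kbp < 0 or distance_kbp >= max_kbp:
            continue  # outside the 0-20,000 Kbp window considered here
        start = int(distance_kbp // bin_kbp) * bin_kbp
        sums[start] += r2
        counts[start] += 1
    return {start: sums[start] / counts[start] for start in sorted(sums)}


def decay_distance(binned: Dict[int, float],
                   threshold: float = 0.2,
                   bin_kbp: int = 200) -> Optional[int]:
    """Upper edge (Kbp) of the first bin whose mean r2 is at or below threshold."""
    for start in sorted(binned):
        if binned[start] <= threshold:
            return start + bin_kbp
    return None  # mean LD never dropped to the threshold within the window
```

Running this separately on marker pairs from euchromatic and heterochromatic regions would yield the two decay curves that are compared in this study.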
An analysis of variance was conducted to obtain the variance components which were used to calculate the heritability of seed protein and oil content. The variances of location, replications within locations, accessions, and the accession × location interaction were determined using the PROC GLM procedure of the Statistical Analysis System (SAS institute, Inc., Cary, NC). Genetic and environmental variances were extracted from the variance component estimates based on the expected mean squares. For the estimation of the heritability of seed protein and seed oil concentration, replications and locations were considered to be random effects.
Heritability was calculated as h² = σ²g / (σ²g + σ²e), where σ²g is the genetic variance among accessions and σ²e is the environmental variance, which results from error and the accession × location interaction.
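As a worked illustration of the heritability calculation, the sketch below assumes the common two-component form in which heritability is the genetic variance divided by the sum of the genetic and environmental variances. The function name and the variance values are hypothetical and are not estimates from this study.

```python
def heritability(var_genetic: float, var_environment: float) -> float:
    """Heritability as genetic variance over total (genetic + environmental) variance.

    Assumed form: h2 = var_g / (var_g + var_e), with var_e combining the error
    and accession x location interaction components.
    """
    if var_genetic < 0 or var_environment < 0:
        raise ValueError("variance components must be non-negative")
    total = var_genetic + var_environment
    if total == 0:
        raise ValueError("total variance must be positive")
    return var_genetic / total


# Hypothetical variance components (illustrative only)
h2_example = heritability(var_genetic=3.0, var_environment=1.0)  # 0.75
```

A value near 1 would indicate that most of the observed variation among accessions is genetic, which is the situation in which accession means are most informative for GWAS.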
To obtain the matrix of population structure, a total of 42,368 SNPs were analyzed in the 298 germplasm accessions using the Admixture program v. 1.22. The 10-fold cross-validation procedure was performed with 25 random seeding replications for K values from 2 to 30. The minimum mean standard error was obtained when K = 17. The kinship coefficient matrix that explained the most probable identity by state of each allele between individuals was estimated with the TASSEL program. For the genome-wide association study, we compared the false positive rate using the general linear model (GLM), the mixed linear model (MLM), and the compressed MLM of the TASSEL program. The MLM was as good as the compressed MLM and greatly reduced the false positive rate versus the GLM. For this study, a P value of 0.001 was used as the Type I error significance threshold. As a verification of the genome regions identified in this study, we compared the genomic locations of previously reported seed protein and oil QTL with the physical positions of the markers showing significant associations in this study.
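The two selection rules in this paragraph, choosing K by the minimum mean error across replicate runs and declaring SNPs significant at the 0.001 threshold, can be sketched in Python. This does not call Admixture or TASSEL; it assumes their outputs have already been collected into plain dictionaries. All names and numbers below are hypothetical, and the use of a strict "less than" comparison at the threshold is an assumption.

```python
from statistics import mean
from typing import Dict, List


def best_k(errors_by_k: Dict[int, List[float]]) -> int:
    """Return the K with the lowest mean error across replicate runs.

    `errors_by_k` maps each K to the errors from its random-seed replicates.
    Ties are broken in favor of the smaller K.
    """
    mean_errors = {k: mean(v) for k, v in errors_by_k.items() if v}
    if not mean_errors:
        raise ValueError("no error values supplied")
    return min(mean_errors, key=lambda k: (mean_errors[k], k))


def significant_snps(p_values: Dict[str, float],
                     threshold: float = 0.001) -> List[str]:
    """Names of SNPs whose association P value is below the threshold."""
    return sorted(name for name, p in p_values.items() if p < threshold)


# Hypothetical inputs (illustrative only, not results from the study)
k = best_k({2: [0.60, 0.62], 3: [0.55, 0.57], 4: [0.58, 0.59]})   # -> 3
hits = significant_snps({"snpA": 0.0002, "snpB": 0.03, "snpC": 0.0009})
```

In the study itself the same logic would be applied over K = 2 to 30 with 25 replicates per K.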
GWAS: Genome-wide association study
SNPs: Single nucleotide polymorphisms
QTL: Quantitative trait locus
GRIN: Germplasm resources information network
SSR: Simple sequence repeat
CV: Coefficient of variation
RFLP: Restriction fragment length polymorphism
NILs: Near isogenic lines
GLM: General linear model
MLM: Mixed linear model
We thank the National Science Foundation (IOS 0822258) and the United Soybean Board Project #8265 for funding to support this work.
- Knowler WC, Williams RC, Pettitt DJ, Steinberg AG: Gm3;5,13,14 and type 2 diabetes mellitus: an association in American Indians with genetic admixture. Am J Hum Genet. 1988, 43 (4): 520-526.
- Thornsberry JM, Goodman MM, Doebley J, Kresovich S, Nielsen D, Buckler ES: Dwarf8 polymorphisms associate with variation in flowering time. Nat Genet. 2001, 28 (3): 286-289. 10.1038/90135.
- Palaisa KA, Morgante M, Williams M, Rafalski A: Contrasting effects of selection on sequence diversity and linkage disequilibrium at two phytoene synthase loci. Plant Cell. 2003, 15 (8): 1795-1806. 10.1105/tpc.012526.
- Wilson LM, Whitt SR, Ibanez AM, Rocheford TR, Goodman MM, Buckler ES: Dissection of maize kernel composition and starch production by candidate gene association. Plant Cell. 2004, 16 (10): 2719-2733. 10.1105/tpc.104.025700.
- Urbany C, Stich B, Schmidt L, Simon L, Berding H, Junghans H, Niehoff KH, Braun A, Tacke E, Hofferbert HR, et al: Association genetics in Solanum tuberosum provides new insights into potato tuber bruising and enzymatic tissue discoloration. BMC Genomics. 2011, 12: 7-20. 10.1186/1471-2164-12-7.
- Bhosale SU, Stich B, Rattunde HF, Weltzien E, Haussmann BI, Hash CT, Ramu P, Cuevas HE, Paterson AH, Melchinger AE, et al: Association analysis of photoperiodic flowering time genes in west and central African sorghum [Sorghum bicolor (L.) Moench]. BMC Plant Biol. 2012, 12: 32-41. 10.1186/1471-2229-12-32.
- Niedziela A, Bednarek PT, Cichy H, Budzianowski G, Kilian A, Aniol A: Aluminum tolerance association mapping in triticale. BMC Genomics. 2012, 13: 67-82. 10.1186/1471-2164-13-67.
- Visioni A, Tondelli A, Francia E, Pswarayi A, Malosetti M, Russell J, Thomas W, Waugh R, Pecchioni N, Romagosa I, et al: Genome-wide association mapping of frost tolerance in barley (Hordeum vulgare L.). BMC Genomics. 2013, 14: 424. 10.1186/1471-2164-14-424.
- Tian F, Bradbury PJ, Brown PJ, Hung H, Sun Q, Flint-Garcia S, Rocheford TR, McMullen MD, Holland JB, Buckler ES: Genome-wide association study of leaf architecture in the maize nested association mapping population. Nat Genet. 2011, 43 (2): 159-162. 10.1038/ng.746.
- Kump KL, Bradbury PJ, Wisser RJ, Buckler ES, Belcher AR, Oropeza-Rosas MA, Zwonitzer JC, Kresovich S, McMullen MD, Ware D, et al: Genome-wide association study of quantitative resistance to southern leaf blight in the maize nested association mapping population. Nat Genet. 2011, 43 (2): 163-168. 10.1038/ng.747.
- Zhang X, Tang B, Yu F, Li L, Wang M, Xue Y, Zhang Z, Yan J, Yue B, Zheng Y, et al: Identification of major QTL for waterlogging tolerance using genome-wide association and linkage mapping of maize seedlings. Plant Mol Biol Rep. 2013, 31: 594-606. 10.1007/s11105-012-0526-3.
- Huang X, Wei X, Sang T, Zhao Q, Feng Q, Zhao Y, Li C, Zhu C, Lu T, Zhang Z, et al: Genome-wide association studies of 14 agronomic traits in rice landraces. Nat Genet. 2010, 42 (11): 961-967. 10.1038/ng.695.
- Brachi B, Faure N, Horton M, Flahauw E, Vazquez A, Nordborg M, Bergelson J, Cuguen J, Roux F: Linkage and association mapping of Arabidopsis thaliana flowering time in nature. PLoS Genet. 2010, 6 (5): e1000940. 10.1371/journal.pgen.1000940.
- Chan EK, Rowe HC, Kliebenstein DJ: Understanding the evolution of defense metabolites in Arabidopsis thaliana using genome-wide association mapping. Genetics. 2010, 185 (3): 991-1007. 10.1534/genetics.109.108522.
- Mandel JR, Nambeesan S, Bowers JE, Marek LF, Ebert D, Rieseberg LH, Knapp SJ, Burke JM: Association mapping and the genomic consequences of selection in sunflower. PLoS Genet. 2013, 9 (3): e1003378. 10.1371/journal.pgen.1003378.
- Mamidi S, Chikara S, Goos RJ, Hyten DL, Moghaddam SM, Cregan PB, McClean PE: Genome-wide association analysis identifies candidate genes associated with iron deficiency chlorosis in soybean. Plant Genome. 2011, 4: 154-164. 10.3835/plantgenome2011.04.0011.View ArticleGoogle Scholar
- Hao D, Chao M, Yin Z, Yu D: Genome-wide association analysis detecting significant single nucleotide polymorphisms for chlorophyll and chlorophyll fluorescence parameters in soybean (Glycine max) landraces. Euphytica. 2012, 186: 919-931. 10.1007/s10681-012-0697-x.View ArticleGoogle Scholar
- Zhao Q, Zhang Y, Cheng Z, Chen M, Wang S, Feng Q, Huang Y, Li Y, Tang Y, Zhou B, et al: A fine physical map of the rice chromosome 4. Genome Res. 2002, 12 (5): 817-823. 10.1101/gr.48902.PubMed CentralPubMedView ArticleGoogle Scholar
- Kim JS, Islam-Faridi MN, Klein PE, Stelly DM, Price HJ, Klein RR, Mullet JE: Comprehensive molecular cytogenetic analysis of sorghum genome architecture: distribution of euchromatin, heterochromatin, genes and recombination in comparison to rice. Genetics. 2005, 171 (4): 1963-1976. 10.1534/genetics.105.048215.PubMed CentralPubMedView ArticleGoogle Scholar
- Paterson AH, Bowers JE, Bruggmann R, Dubchak I, Grimwood J, Gundlach H, Haberer G, Hellsten U, Mitros T, Poliakov A, et al: The Sorghum bicolor genome and the diversification of grasses. Nature. 2009, 457 (7229): 551-556. 10.1038/nature07723.PubMedView ArticleGoogle Scholar
- Hyten DL, Choi I, Song Q, Specht JE, Carter TE, Shoemaker RC, Hwang EY, Matukumalli LK, Cregan PB: A high density integrated genetic linkage map of soybean and the development of a 1536 universal soy linkage panel for quantitative trait locus mapping. Crop Sci. 2010, 50 (3): 960-968. 10.2135/cropsci2009.06.0360.View ArticleGoogle Scholar
- Brummer EC, Graef GL, Orf J, Wilcox JR, Shoemaker RC: Mapping QTL for seed protein and oil content in eight soybean populations. Crop Sci. 1997, 37 (2): 370-378. 10.2135/cropsci1997.0011183X003700020011x.View ArticleGoogle Scholar
- Specht JE, Chase K, Macrander M, Graef GL, Chung J, Markwell JP, Germann M, Orf JH, Lark KG: Soybean response to water: a QTL analysis of drought tolerance. Crop Sci. 2001, 41 (2): 493-509. 10.2135/cropsci2001.412493x.View ArticleGoogle Scholar
- Mansur LM, Orf JH, Chase K, Jarvik T, Cregan PB, Lark KG: Genetic mapping of agronomic traits using recombinant inbred lines of soybean. Crop Sci. 1996, 36 (5): 1327-1336. 10.2135/cropsci1996.0011183X003600050042x.View ArticleGoogle Scholar
- Lee SH, Bailey MA, Mian MAR, Carter TE, Shipe ER, Ashley DA, Parrott WA, Hussey RS, Boerma HR: RFLP loci associated with soybean seed protein and oil content across populations and locations. Theor Appl Genet. 1996, 93 (5/6): 649-657.PubMedView ArticleGoogle Scholar
- Orf JH, Chase K, Jarvik T, Mansur LM, Cregan PB, Adler FR, Lark KG: Genetics of soybean agronomic traits: I: comparison of three related recombinant inbred populations. Crop Sci. 1999, 39 (6): 1642-1651. 10.2135/cropsci1999.3961642x.View ArticleGoogle Scholar
- Stombaugh SK, Orf JH, Jung HG, Chase K, Lark KG, Somers DA: Quantitative trait loci associated with cell wall polysaccharides in soybean seed. Crop Sci. 2004, 44: 2101-2106. 10.2135/cropsci2004.2101.View ArticleGoogle Scholar
- Raychaudhuri S: Mapping rare and common causal alleles for complex human diseases. Cell. 2011, 147 (1): 57-69. 10.1016/j.cell.2011.09.011.PubMed CentralPubMedView ArticleGoogle Scholar
- Ladouceur M, Dastani Z, Aulchenko YS, Greenwood CM, Richards JB: The empirical power of rare variant association methods: results from sanger sequencing in 1,998 individuals. PLoS Genet. 2012, 8 (2): e1002496-10.1371/journal.pgen.1002496.PubMed CentralPubMedView ArticleGoogle Scholar
- Chung J, Babka HL, Graef GL, Staswick PE, Lee DJ, Cregan PB, Shoemaker RC, Specht JE: The seed protein, oil, and yield QTL on soybean linkage group I. Crop Sci. 2003, 43 (3): 1053-1067. 10.2135/cropsci2003.1053.View ArticleGoogle Scholar
- Diers BW, Keim P, Fehr WR, Shoemaker RC: RFLP analysis of soybean seed protein and oil content. Theor Appl Genet. 1992, 83 (5): 608-612.PubMedView ArticleGoogle Scholar
- Nichols DM, Glover KD, Carlson SR, Specht JE, Diers BW: Fine mapping of a seed protein QTL on soybean linkage group I and its correlated effects on agronomic traits. Crop Sci. 2006, 46: 834-839. 10.2135/cropsci2005.05-0168.View ArticleGoogle Scholar
- Reinprecht Y, Poysa VW, Yu K, Rajcan I, Ablett GR, Pauls KP: Seed and agronomic QTL in low linolenic acid, lipoxygenase-free soybean (Glycine max (L.) Merrill) germplasm. Genome. 2006, 49 (12): 1510-1527. 10.1139/g06-112.PubMedView ArticleGoogle Scholar
- Sebolt AM, Shoemaker RC, Diers BW: Analysis of a quantitative trait locus allele from wild soybean that increases seed protein concentration in soybean. Crop Sci. 2000, 40 (5): 1438-1444. 10.2135/cropsci2000.4051438x.View ArticleGoogle Scholar
- Tajuddin T, Watanabe S, Yamanaka N, Harada K: Analysis of quantitative trait loci for protein and lipid contents in soybean seeds using recombinant inbred lines. Breeding Science. 2003, 53: 133-140. 10.1270/jsbbs.53.133.View ArticleGoogle Scholar
- Bolon YT, Joseph B, Cannon SB, Graham MA, Diers BW, Farmer AD, May GD, Muehlbauer GJ, Specht JE, Tu ZJ, et al: Complementary genetic and genomic approaches help characterize the linkage group I seed protein QTL in soybean. BMC Plant Biol. 2010, 10: 41-10.1186/1471-2229-10-41.PubMed CentralPubMedView ArticleGoogle Scholar
- Cardon LR, Bell JI: Association study designs for complex diseases. Nat Rev Genet. 2001, 2 (2): 91-99. 10.1038/35052543.PubMedView ArticleGoogle Scholar
- Brim CA, Burton JW: Recurrent selection in soybeans: II: selection for increased percent protein in seeds. Crop Sci. 1979, 19: 494-498. 10.2135/cropsci1979.0011183X001900040016x.View ArticleGoogle Scholar
- Cober ER, Voldeng HD: Developing high-protein, high-yield soybean populations and lines. Crop Sci. 2000, 40: 39-42. 10.2135/cropsci2000.40139x.View ArticleGoogle Scholar
- Wilcox JR: Increasing seed protein in soybean with eight cycles of recurrent selection. Crop Sci. 1998, 38 (6): 1536-1540. 10.2135/cropsci1998.0011183X003800060021x.View ArticleGoogle Scholar
- Burton JW: Quantitative genetics: results relevant to soybean breeding. Soybeans: Improvement, production, and Uses. Edited by: Wilcox JR. 1987, 211-247. 2Google Scholar
- Bao XU: A decade of wild soybean (Glycine soja) research in China. Jilin Agric Sci. 1989, 39 (1): 5-13.Google Scholar
- Xu B: The protein resources of Glycine max in China. Soybean Sci. 1985, 3 (4): 327-331.Google Scholar
- Wilson RF: Seed composition: p. 621-677. HR Boerma and JE Specht (ed) Soybean: Improvement, production, and Uses 3rd ed. 2004Google Scholar
- Song Q, Hyten DL, Jia G, Quigley CV, Fickus EW, Nelson RL, Cregan PB: Development and evaluation of SoySNP50K, a high-density genotyping array for soybean. PLoS One. 2013, 8 (1): e54985-10.1371/journal.pone.0054985.PubMed CentralPubMedView ArticleGoogle Scholar
- Keim P, Olson TC, Shoemaker RC: A rapid protocol for isolating soybean DNA. Soybean Genet Newsl. 1988, 15: 150-152.Google Scholar
- Campbell CR: Total carbon, hydrogen and nitrogen–combustion method. Res Ser Bull. 1992, 368: 21-23.Google Scholar
- Rubel G: Simultaneous determination of oil and water contents in different oilseeds by pulsed nuclear magnetic resonance. J Am Oil Chem Soc. 1994, 71 (10): 1057-1062. 10.1007/BF02675896.View ArticleGoogle Scholar
- Fan JB, Gunderson KL, Bibikova M, Yeakley JM, Chen J, Wickham Garcia E, Lebruska LL, Laurent M, Shen R, Barker D: Illumina universal bead arrays. Methods Enzymol. 2006, 410: 57-73.PubMedView ArticleGoogle Scholar
- Hyten DL, Song Q, Choi I, Yoon M, Specht JE, Matukumalli LK, Nelson RL, Shoemaker RC, Young ND, Cregan PB: High-throughput genotyping with the GoldenGate assay in the complex genome of soybean. Theor Appl Genet. 2008, 116 (7): 945-952. 10.1007/s00122-008-0726-2.PubMedView ArticleGoogle Scholar
- Choi IY, Hyten DL, Matukumalli LK, Song Q, Chaky JM, Quigley CV, Chase K, Lark KG, Reiter RS, Yoon MS, et al: A soybean transcript map: gene distribution, haplotype and single-nucleotide polymorphism analysis. Genetics. 2007, 176: 685-696. 10.1534/genetics.107.070821.PubMed CentralPubMedView ArticleGoogle Scholar
- Barrett JC, Fry B, Maller J, Daly MJ: Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics. 2005, 21 (2): 263-265. 10.1093/bioinformatics/bth457.PubMedView ArticleGoogle Scholar
- Gaut BS, Long AD: The lowdown on linkage disequilibrium. Plant Cell. 2003, 15 (7): 1502-1506. 10.1105/tpc.150730.PubMed CentralPubMedView ArticleGoogle Scholar
- Cregan PB, Jarvik T, Bush AL, Shoemaker RC, Lark KG, Kahler AL, Kaya N, VanToai TT, Lohnes DG, Chung J: An integrated genetic linkage map of the soybean genome. Crop Sci. 1999, 39 (5): 1464-1490. 10.2135/cropsci1999.3951464x.View ArticleGoogle Scholar
- Song QJ, Marek LF, Shoemaker RC, Lark KG, Concibido VC, Delannay X, Specht JE, Cregan PB: A new integrated genetic linkage map of the soybean. Theor Appl Genet. 2004, 109 (1): 122-128. 10.1007/s00122-004-1602-3.PubMedView ArticleGoogle Scholar
- Zhu YL, Song QJ, Hyten DL, Van Tassell CP, Matukumalli LK, Grimm DR, Hyatt SM, Fickus EW, Young ND, Cregan PB: Single-nucleotide polymorphisms in soybean. Genetics. 2003, 163 (3): 1123-1134.PubMed CentralPubMedGoogle Scholar
- Song Q, Jia G, Zhu Y, Grant D, Nelson RT, Hwang EY, Hyten DL, Cregan P: Abundance of SSR motifs and development of candidate polymorphic SSR markers (BARCSOYSSR_1.0) in soybean. Crop Sci. 2010, 50 (5): 1950-1960. 10.2135/cropsci2009.10.0607.View ArticleGoogle Scholar
- Alexander DH, Novembre J, Lange K: Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 2009, 19 (9): 1655-1664. 10.1101/gr.094052.109.PubMed CentralPubMedView ArticleGoogle Scholar
- Yu J, Pressoir G, Briggs WH, Vroh Bi I, Yamasaki M, Doebley JF, McMullen MD, Gaut BS, Nielsen DM, Holland JB, et al: A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nat Genet. 2006, 38 (2): 203-208. 10.1038/ng1702.PubMedView ArticleGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.