Skip to main content

Dissection of complicate genetic architecture and breeding perspective of cottonseed traits by genome-wide association study



Cottonseed is one of the most important raw materials for plant protein, oil and alternative biofuel for diesel engines. Understanding the complex genetic basis of cottonseed traits is requisite for achieving efficient genetic improvement of the traits. However, it is not yet clear about their genetic architecture in genomic level. GWAS has been an effective way to explore genetic basis of quantitative traits in human and many crops. This study aims to dissect genetic mechanism seven cottonseed traits by a GWAS for genetic improvement.


A genome-wide association study (GWAS) based on a full gene model with gene effects as fixed and gene-environment interaction as random, was conducted for protein, oil and 5 fatty acids using 316 accessions and ~ 390 K SNPs. Totally, 124 significant quantitative trait SNPs (QTSs), consisting of 16, 21, 87 for protein, oil and fatty acids (palmitic, linoleic, oleic, myristic, stearic), respectively, were identified and the broad-sense heritability was estimated from 71.62 to 93.43%; no QTS-environment interaction was detected for the protein, the palmitic and the oleic contents; the protein content was predominantly controlled by epistatic effects accounting for 65.18% of the total variation, but the oil content and the fatty acids except the palmitic were mainly determined by gene main effects and no epistasis was detected for the myristic and the stearic. Prediction of superior pure line and hybrid revealed the potential of the QTSs in the improvement of cottonseed traits, and the hybrid could achieve higher or lower genetic values compared with pure lines.


This study revealed complex genetic architecture of seven cottonseed traits at whole genome-wide by mixed linear model approach; the identified genetic variants and estimated genetic component effects of gene, gene-gene and gene-environment interaction provide cotton geneticist or breeders new knowledge on the genetic mechanism of the traits and the potential molecular breeding design strategy.


Cotton, one of the most important crops, has been used extensively in many fields, such as textiles, food consumption and medical use. Cottonseed accounts for approximately two-thirds of the total cotton harvested while the remaining one-third is explained by fiber [1]. Cottonseed meal is a very good source of protein, and generally less expensive per unit of protein than soybean meal [2]. Benefiting from effective decrease of free gossypol in cottonseed, cottonseed protein has been regarded as a good food source with well-balanced and high nutritional value [3]. Cottonseed oil is typically composed of saturated fatty acids (about 1% myristic acid (C14:0), 22% palmitic acid (C16:0), and 3% stearic acid (C18:0) which confer a relatively stable vegetable oil without partial hydrogenation, and enough unsaturated fatty acids (22% oleic acid (C18:1) and 52% linoleic acid (C18:2) which are requisite ingredients for a heart healthy oil [4]. In addition, cottonseed oil could also be purified to be a kind of alternative fuel for diesel engines [5].

Using linkage mapping, several studies have been carried out for detecting QTLs associated with cottonseed traits in specifically designed mapping populations [6,7,8,9,10]. However, limited recombination events and low genetic diversity in the designed population are major obstacles to distinguishing more QTLs at fine level by conventional linkage mapping [11]. The genome-wide association study (GWAS), as an alternative strategy, has been proved to be an effective way to identify genetic variants underlying traits at a relatively finer resolution in maize, rice, soybean, sesame, and other crops [11,12,13]. Badigannavar et al. used the GWAS successfully to detect genetic diversity, population structure and variants associated with cottonseed quality traits [14], but this study based on AFLP markers, and might lose precision and power due to relatively small number of markers. Currently, the availability of draft genome sequence of the G. raimondii, G. arboreum and the G. hirsutum provides a crucial groundwork for identification, isolation and manipulation of important cotton functional genes controlling agronomic, yield, and quality traits [15,16,17,18,19,20]. Due to the narrow genetic basis and the characteristic of allotetraploid genome, it is difficult to discover or design polymorphic molecular marker in cotton, however, as the development of high-throughput DNA sequencing technology, it has been possible to discover a large number of SNPs [21, 22] saturated in the entire cotton genome; as a result, GWAS could be conducted for exploring intricate genetic architecture of most cotton traits of interested, which is mostly referred to the underlying genetic basis and variation properties of a trait, and usually depicted by the associated QTLs and their genetic effects including additive, dominance, epistasis and their interactions with environments [23].

Generally, most GWAS approaches are based on simple additive or additive and dominance effects models [24, 25], ignoring the fact that both the polygenic interaction (epistasis) and the gene × environment interaction are involved in the genetic variation of a complex trait substantially; as a result, these methods are not effective for detecting many minor-effect loci and paired epistatic loci and will result in the problem of missing heritability [26,27,28]. Zeng et al. (2016) investigated the association between SNPs of GhSus family genes and fiber- and seed-related traits in 277 upland cotton accessions based on an epistatic association mapping (EAM) model, which included main-effects, epistatic effects and one-dimensional gene × environment interaction effect. However, their study only focused on some specific genes, it’s still very valuable to explore the whole genetic architecture of the cottonseed traits across entire AD genomes [29]. In this study, we employed a newly proposed method, which could analyze single-locus effects, epistatic effects, and their interaction effects with environments simultaneously, to conduct the GWAS for seven cottonseed traits (protein content, oil content, myristic acid, palmitic acid, stearic acid, oleic acid, and linoleic acid), based on the phenotype of 316 accessions in 3 different locations and the genotype of ~ 390 k SNPs. This study aims to uncover the complicate genetic architecture of cottonseed traits and predict the breeding potential of the detected genetic variants in genetic improvement of target traits.


Population structure and LD analysis

According to the results of the population structure analysis, the total cotton accessions could be classified into two subgroups (Additional file 1: Figure S2). The half LD decay distance measured by the average correlation coefficient (r2) of pairwise SNPs decreases to the half of its maximum value (Additional file 2: Figure S3) was ~ 160 kb, in the range of maize (~ 500 kb) and rice (~ 123 kb in Indica rice landrace and ~ 167 kb in Japonica rice landrace) [30, 31], which is in concert with the result expected for cotton as a species of often cross-pollinated plant. Abdurakhmonov et al. reported that the genome-wide average of LD block size of cotton, based on SSR marker at r2 ≥ 0.1, is less than 4 Mb in the landrace germplasm and larger than 12 Mb in the improved variety germplasm [32]; whereas the corresponding value in our study was around 163 Kb, much higher resolution than those in the previous studies.

Total heritability analysis for seven cottonseed traits

According to the result of SNP screening by GMDR method for each trait, total 4864 SNPs (996 SNPs for protein, 717 SNPs for oil, 784 SNPs for oleic acid, 889 SNPs for linoleic acid, 1026 SNPs for palmitic acid, 881 SNPs for myristic acid, and 1139 SNPs for stearic acid) were selected and used to perform the subsequent GWAS analysis. A total of 124 significant associated QTSs were identified for the protein content, the oil content, and the fatty acids. In general, all seven cottonseed traits exhibited significant additive and dominance effects; however, the epistatic effects and gene-environment effects, were largely diverse across traits (Table 1); the broad-sense heritabilities of these traits varied from 71.62 to 93.43%.

Table 1 Estimated heritability for seven cottonseed traits

The protein content was not only regulated by additive and dominance effects, but also by four kinds of epistatic effects (aa, ad, da, dd) with total epistatic heritability of 65.18%, in which the heritability of dominance × dominance epistatic effect (\( {h}_{DD}^2=31.43\% \)) was larger than that of other epistatic effects. Similarly, the palmitic acid was affected by additive and dominance effects, as well as epistatic effects (aa, da, dd), and the dominance × dominance effect played a key role in genetic variation (\( {h}_{DD}^2=36.79\% \)). Like the protein content, the oil content was also affected by all genetic main effects, but the dominance effect (\( {h}_D^2=33.44\% \)) contributed the largest to the total heritability, and dominance × environment and additive-dominance epistasis × environment effects (de and ade) were also involved in the genetic architecture of the oil content. The oleic acid and the linoleic acid are two important and genetically related traits, and both were regulated mainly by dominance effect (\( {h}_D^2=54.99\% \) for the oleic acid, and \( {h}_D^2=45.09\% \) for the linoleic acid); Furthermore, epistatic effects and gene × environment interaction effects were also involved in the genetic architecture of these two traits and account for more proportion of the genetic heritability for the linoleic acid than that for the oleic acid. The additive and dominance effects explained 70.85% of the total variation and 95.54% of total heritability for the myristic acid. On the stearic acid, the gene × environment interaction effects were very prominent, compared with the main effect (the additive effect); notably, the additive × environment interactions (\( {h}_{AE}^2=25.05\% \)) exhibited remarkable effects on the stearic acid than the dominance × environment interactions (\( {h}_{DE}^2=3.41\% \)).

Genetic architecture of the protein content and the oil content

15 QTSs, consisting of 5 QTSs with significant main-effects of additive or dominance, named individual QTS hereafter, and 6 pairs of QTSs involved in epistatic interaction, named epistatic QTSs hereafter, were detected to be significantly associated with the protein content (Table 2, Fig. 1 and Additional file 3: Table S3). QTS A3_115443958 was identified with the largest additive heritability (−log10p = 69.59, \( {h}_a^2=5.14\% \)). Of epistatic QTSs, three pairs of QTSs A3_115443958 & D2_383581, A6_29542325 & D8_18888997, and A7_1504479 & D2_383581, accounted for major epistatic heritability of 61.03%. Generally, QTSs involved in epistasis with heritability from 0.12 to 20.16%, also had both additive and dominance effects. In addition to the dominance effect, D2_383581 contributed four kinds of epistatic effects (aa, ad, da, dd) through interaction with A3_115443958, wherein the additive-dominance epistasis explained 20.16% heritability, and other 3 kinds of epistasis (aa, da, dd) taked effects on the protein together with A7_1504479.

Table 2 Genome-wide significant QTSs associated with the protein content and the oil content at -log10p > 7 (at least one kind of effect of QTS)
Fig. 1
figure 1

Network plot of highly significant QTSs associated with seven cottonseed traits. Red dot (square) indicates the QTS expresses additive (dominance) effects, green dot (square) indicates the QTS expresses additive (dominance) by environment interaction effect, blue dot (square) indicates the QTS expresses both additive (dominance) effect and additive (dominance) by environment effect, black dot (square) indicate the QTS doesn’t express additive (dominance) effect but interacts with other QTS, red line indicates there is only interaction (epistasis) between genetic components of two QTSs at the ends of the line, green line indicates there is only interaction between epistasis and environment for the ends of the line, blue line indicates there is both epistasis effect and interaction between epistasis and environment for the end of the line

A total of 21 significant QTSs were detected for the oil content, of which were 3 additive QTSs, 1 dominance QTS, 9 QTSs with both additive and dominance effects, and 4 paired epistatic QTSs, respectively (Table 2, Fig. 1 and Additional file 3: Table S3). Except QTS A9_85081637, whose additive heritability was almost twice as much as the dominance heritability, all the other QTSs activating in both additive and dominance exhibited lower additive heritability than their corresponding dominance heritability. Being the largest contributor to the total heritability and expressing the largest dominance heritability, QTS A13_83121382 not only exhibited additive, dominance and dominance-environment interaction effects, but also larger epistatic effects through interaction with A7_110987422 (aa, ad, da, dd, ade2). The total heritability of both A13_83121382 and A7_110987422, reached to 47.35%, accounting for over half of the total heritability of cottonseed oil content, indicating the importance of the two epistatic QTSs in regulating oil content. Both the phenotypic and the genotypic correlation analysis revealed significant negative correlation between the protein content and the oil content (r p  =  − 0.63, r g  =  − 0.92) (Additional file 4: Table S2). Association analysis detected a pleiotropic genetic variant D3_35705563 which contributed a positive additive effect to the protein content and positive additive and negative dominance effects to the oil content.

Genetic architecture of the fatty acids

Based on the above analysis on heritability, it could be concluded that the dominance effect played a core role in regulating the oleic acid. There were total 21 QTSs associated with the oleic acid (Table 3, Fig. 1 and Additional file 3: Table S3), consisting of 17 QTSs with additive or dominance effects, 2 QTSs with both additive effect and additive × environmental effect, 1 pair of QTSs with epistatic effect. Of 10 QTSs with dominance effects, 8 QTSs also contributed additive effects, and each dominance heritability was larger than the corresponding additive heritability except QTS A7_2266630. D3_1889546, contributing to the largest heritability, significantly affected the oleic acid by a positive dominance effect (\( {h}_d^2=20.28\%,d=1.019 \)). QTSs A1_44951529 and A1_85724143 was a unique pair of epistatic QTSs detected with a negative additive × additive effect (aa  − 0.186) for the oleic acid, accounting for 2.69% epistatic heritability and 3.33% additive heritability. In the case of the linoleic acid, 21 QTSs were identified in total, including 19 main-effect QTSs and 1 pair of epistatic QTSs. For the oleic acid, the dominance effects predominated the variation, and were mainly attributed to the QTSs exhibiting multiple different effect types, the number of additive QTSs with a p value below 1 × 10− 7 was more than that of dominance effect QTSs. D4_21291786 and D5_47644432 not only contributed single locus effects, but also strong digenic additive × dominance and dominance × dominance effects which explained heritability of 24.68% for the linoleic acid. The oleic acid was negatively correlated with the linoleic acid in both phenotypic and genotypic effects (r p  =  − 0.54, r g  =  − 0.75) (Additional file 5: Table S2) and we detected 3 pleiotropic QTSs (D3_1889546, D3_29047260, D4_21291786) associated simultaneously with these two traits (Fig. 2, Additional file 6: Table S3). In addition to the negative specific additive effect in environment 1 (ae 1 ) on the linoleic acid, D3_1889546 also exhibited a strong positive dominance effect on the oleic acid and a negative dominance effect on the linoleic acid with the largest dominance heritability 20.28 and 10.76%, respectively. D3_29047260 affected the oleic acid by the additive and environment-specific additive effects, but the linoleic acid by the dominance and environment-specific dominance effects. D4_21291786 was associated with the oleic acid through the significant additive and dominance effects, as well as with the linoleic acid by the significant additive effect and epistatic effects with the D5_47644432.

Table 3 Genome-wide significant QTSs associated with five fatty acids at -log10p > 7 (at least one kind of effect of QTS)
Fig. 2
figure 2

The genetic relationship among six cottonseed traits. The black arrow denotes individual genetic effect; blue brace denotes gene-gene epistatic interaction effect

A total of 8 QTSs with a significant additive effect (−log10p > 8.30) were detected for the palmitic acid (Table 3, Fig. 1 and Additional file 3: Table S3). Three QTSs (A1_61493378, A13_119809048 and D4_10313468) showed the concordant effect direction, either positive dominance and additive effects or negative dominance and additive effects. The QTSs A7_642514 and D4_10313468 is a pair of key genetic variants, their total heritability reached 61.51% and explained 71.6% of the total heritability of the palmitic acid. They not only exhibited larger individual additive or dominance effects, as well as very strong epistatic effects, especially the dominance × dominance epistasis explaining as much as 36.79% of heritability.

On the genetic architecture of the myristic acid, 25 QTSs were identified, consisting of 12 QTSs with additive effect only, 10 QTSs with both additive and dominance effects, 1 QTS with additive and additive × environment effect and 1 pair of epistatic QTSs (Table 3, Fig. 1 and Additional file 3: Table S3). A1_35871478, expressing both additive and dominance effects, was the variant which accounted for the largest heritability by dominance effect, followed by A7_642514, whose dominance heritability was 5.00%. QTS pair A3_58421047 and A12_117532407 expressed a significant positive epistatic effect in environment 2 (aae2); A3_58421047 had no individual effect, whereas A12_117532407 contributed a positive additive effect, accounting for heritability of 1.47% (Additional file 3: Table S3). In addition to positive additive effect, A6_26471461 also had positive additive × environmental interaction effect in environment 2, and the interaction heritability (\( {h}_{ae}^2=1.07\% \)) was over twice as that of its main effect (\( {h}_a^2=0.40\% \)).

Different from the genetic basis of the fatty acid component traits discussed above, more genes of the stearic acid were found to be involved in environment interaction (Table 3, Fig. 1 and Additional file 3: Table S3) and no epistatic QTS was detected. Of 22 QTSs detected, 11 QTSs contributed additive or dominance × environment interaction effects. Apart from significant positive additive effect, A4_17627308 was also found to contribute the strongest additive × environment interaction effects in all 3 environments compared with other QTSs, its interaction heritability in environment 3 was larger than those in both environment 1 and environment 2, and the average gene-environment interaction heritability reached to 8%; furthermore, the effect direction of A4_17627308 in environment 3 was opposite to those in environment 1 and environment 2, implying that environment factor should be considered in the utilization of this genetic variant in breeding. QTS D10_30598593 was detected with not only significant additive × environment interaction effect in three environments but also dominance × environment interaction effect in environment 3.

According to the Fig. 2, we could find that fatty acids, except the stearic acid, were genetically linked mainly through pleiotropic QTSs and epistatic QTSs. There were in total 10 QTSs related to the genetic correlations of fatty acid composition, wherein, 8 QTSs were involved in the correlation of the linoleic acid with other 3 fatty acids (oleic, palmitic and myristic), indicating the key role of the linoleic acid in the genetic architecture of fatty acids. The A4_16656838 was a pleiotropic variant, which affected the myristic acid by significant additive effect, and the linoleic acid by significant additive and dominance effects. Similarly, the A6_26471461 and the D9_7944 were other two pleiotropic QTSs. The A6_26471461 affected the linoleic acid by significant negative additive effect and the additive × environment interaction effect in both environment 1 and environment 2, and the myristic acid by positive additive and positive additive-environment interaction effects in environment 2; while, the D9_7944 affected the linoleic acid and the myristic acid by opposite additive effects (a = 0.225 for the linoleic acid and a =  − 0.010 for the myristic acid). In addition, there seemed to exist an indirect genetic pathway connecting the linoleic acid and the myristic acid, in which A12_11753239 regulated the linoleic acid and the palmitic acid by opposite additive effects; while A7_642514 regulated the palmitic acid through individual negative additive effect and epistatic effect with D4_10313468, and also affected the myristic acid through negative additive and dominance effects.

Candidate gene annotation

16 QTSs were identified in the regions of the annotated genes captured by 124 QTSs of cottonseed traits by using InterProScan (Table 4). A7_1504479, located in CDS of gene Gh_A07G0108, is associated with five kinds of protein domains relevant to crop resistance. Within these protein domains, protein kinase, catalytic domain (IPR000719), was found to participate in regulation on the lint yield of cotton in previous study [33]; Protein kinase-like domain (IPR011009) is related to the salt tolerance mechanism in Arabidopsis thaliana or rice [34, 35]; Kinase-like domain (IPR011009), which was found in cotton boll weevil transcriptome and associated with putative genes underling qNLB8.06DK888 of maize, is also relevant to insect control or disease resistance [36, 37]; Tyrosine-protein kinase, catalytic domain (IPR020635), relevant to A5_22579901, is associated with disease-related gene in rice [38]. In addition, A5_22579901, located in the CDS region of gene Gh_A05G1876, is associated with peptidase C48, SUMO/Sentrin/Ubl1 protein domain; D1_616439, located in 3’-UTR of gene Gh_D01G0083, encodes pseudouridine synthase family protein. A3_100487624, located in the CDS region of gene Gh_A10G1894, encoding SLC26A/SulP transporter (IPR001902), STAS domain (IPR002645), SLC26A/Sulphate transporter domain (IPR011547), SLC26A/SulP transporter (IPR001902), is associated with the oil content with concordant negative additive and dominance effects. However, A2_58832915 relevant to UDP-glusuronosyl/UDP-glucosyltransferase (IPR002213), and D1_62433793 relevant to light harvesting complex photosystem II, exhibited positive additive and dominance effects on the oil content, respectively.

Table 4 Candidate genes significantly associated with seven cottonseed traits

QTS A7_2266630, which was associated with the oleic acid and had larger additive heritability than the dominance heritability, is located in the intron of Gh_D07G0133, encoding Armadillo-like helical (IPR011989), Uncharacterised protein domain (IPR012978) and Armadillo-type fold (IPR016024). Another QTS, D1_1087912 that exhibited positive dominance effect (d = 0.478), is located in the CDS of Gh_D01G0153, which encodes Pentatricopeptide repeat (IPR002885), Tetratricopeptide-like helical domain superfamily (IPR011990) and DYW domain (IPR032867). D5_66975738 that was significantly associated with the palmitic acid by negative additive effect (a =  − 0.256, −log10p = 10.59)is located in the CDS of Gh_A04G0196. Pleiotropic A7_642514, affecting both the palmitic acid and the myristic acid, is located in the CDS of Gh_A07G0057, which encodes IPR001806, IPR002041, IPR003577, IPR003578, IPR003579, IPR005225, IPR020849 and IPR027417. For the stearic acid, A4_17627308 is located in the intron of Gh_A03G0465 and its annotated functions are transferase (IPR003480) and Chloramphenicol acetyltransferase-like domain superfamily (IPR023213); A2_25310067 is located in the CDS of Gh_A02G0110, whose function is Calreticulin/calnexin (IPR001580), Calreticulin/calnexin, P domain superfamily (IPR009033), Calreticulin (IPR009169), Concanavalin A-like lectin/glucanase domain superfamily (IPR013320), Calreticulin/calnexin, conserved site (IPR018124); in addition, there are other three candidate genes (Gh_D13G1997, Gh_Sca004725G01 and Gh_A06G1283) which are involved in the regulation on the stearic acid.

Prediction of superior genotype for seven cottonseed traits

In order to assess the potential of these identified QTSs in genetic and molecular improvement of cottonseed traits, we conducted molecular design and genotypic value evaluation on the general superior homozygous line (GSL), the general superior hybrid line (GSH), the environment-specific superior homozygous line (SL) and the environment-specific superior hybrid line (SH), based on genetic effects of QTSs [39].

On the genetic architecture of seven cottonseed traits, genotypic values were consistently observed across three environments for the protein and the palmitic acid, because of no significant gene × environment interaction, whereas the genotypic values of other 5 traits varied (Additional file 5: Table S4). Obviously, both the pure line (homozygous genotype: QQ, qq, and SL) and the hybrid line (SH) possessed potential to improve 7 cottonseed traits, and hybrid line exhibited more advantages than the pure line in terms of the range of genetic values achieved in designed lines. The oleic acid and the myristic acid had the highest SH and the highest SL value in environment 1 or in environment 2 respectively. Analogously, the linoleic acid and the stearic acid, influenced by both the additive × environment and the dominance × environment interaction effects, could achieve the highest SH and SL values in environment 2 or in environment 3. Different from other traits influenced by gene × environment interaction, the genotypic values of both SL and SH (SL(+) and SH(+)) remained constant across 3 environments for the oil content.

Because there was no significant gene × environment interaction for the protein content and the palmitic acid, the designed superior genotypes (GSL(+), GSH(+)) based on genetic main effects of QTSs would keep unchanged in every environment (Table 5 and Additional file 6: Table S5). Notably, although the oil content, the oleic acid, and the myristic acid were influenced by gene × environment interaction, the same superior homogenous genotypes for all three environments could be designed, as well as superior hybrid genotypes, for each trait, respectively. To design QTS genotype for synchronous improvement of multiple cottonseed traits, it is evaluable to classify all other detected QTSs into two groups, one consists of non-pleiotropic QTS and the other consists of pleiotropic QTS. The designed genotypes for each QTS residing in the regions of annotated genes were presented in the upper part above the middle split line of the Table 5 and the lower part for the pleiotropic QTSs. The Table 5 showed that selecting AA at the A7_1504479, TT at the D1_616439, and AA at the A5_22579901 could achieve the maximum genetic value of the protein content by the superior homozygous lines; similarly, GG at the A2_58832915, and AA at the A3_100487624 would be preferred for homozygous lines for the oil content, GG at the D5_66975738 for the palmitic acid, AA at the A13_128192242 for the myristic acid; however, GG and AG at the D1_62433793 were optimal for the superior homozygous lines and the superior hybrids for the oil content, respectively, and similar designs could be found at A7_2266630, A12_120581335, and D1_1087912 for the oleic acid.

Table 5 The genotypes achieving the maximum genetic value in designed lines at the QTSs in annotated genes and the QTSs with pleiotropic effects

On the linoleic acid and the stearic acid, the designed genotypes at most QTSs exhibited greater consistency across different environments except QTS D3_1889546 for the linoleic acid and 7 QTSs (A2_25310067, A4_135747886, A4_17627308, A12_78651650, D5_44746794, D10_5643096, D10_30598593) (Table 4, Additional file 6: Table S5) for the stearic acid. On the stearic acid, the design of superior genotype was more complex due to gene-environment interaction. For example, CC at A2_25310067 was preferred for achieving the maximum genetic value of the stearic acid for the superior homozygous lines (GSL, SL) and the superior hybrids (GSH, SH) in environment 1 and 3, whereas CT was better in environment 2 for the superior hybrids.

Pleiotropic genes make it more difficult to conduct synchronous improvement of multiple crop traits. Out of 11 pleiotropic QTSs (Table 5), the genotypes of 5 QTSs (D3_35705563, A4_16656838, A6_26471461, A12_117532394 and D9_7944), exhibited identical across superior lines (GSL, SL, GSH, SH) for same trait, but completely different between correlated traits. For example, CC at D3_35705563 was preferred for achieving the maximum genetic value of the protein content, but completely different genotype TT for the oil content, which was in concert with the negative genetic correlation between two traits. Selecting TT at D3_29047260 was a better choice to improve the linoleic acid, and the oleic acid simultaneously. On selection of epistatic effect between A7_642514 and D4_10313468, TT at A7_642514 together with CC at D4_10313468 was preferred for both the palmitic acid and the myristic acid; likewise, selecting GG at D5_47644432 together with GG or AA at D4_21291786 for the oleic acid and for the linoleic acid, respectively, was preferred. TT in superior homogenous design (GSL, SL) and TC in superior hybrid design (GSH, SH) were preferred in all environments for the oleic acid.

Myristic acid and palmitic acid could increase the level of low-density lipoprotein (LDL) cholesterol in serum, it is desired to reduce the components of myristic acid and palmitic acid in cottonseed for human health; thus, the optimal genotypes are expected to lower myristic acid and palmitic acid of cottonseed, which are provided in the Additional file 7: Table S6. Such information on designed genotypes will bring new insight into the molecular improvement of cottonseed traits.


Mixed linear model approach has been successfully applied in linkage mapping of complex traits, including seed traits in crops [32, 40, 41]. Because of the advantages of the mixed linear model approach in aggregating total genetic effects of numerous genetic variants scattered in the whole genome and dealing with covariates, such as age, sex and population stratification, many tools have been developed based on mixed linear model for estimating total genetic heritability and performing genome wide association mapping [42,43,44,45]. However, genetic effects due to epistasis or gene × environment interaction (interactions of one gene or two paired genes with environment) are ignored in most of these methods, such as the method proposed by Lü et al. [45], whose model ignored the interaction of epistasis with environment [42,43,44,45]. More recently, the method proposed by Zhang et al. [46] included effects of additive, dominance, epistasis and their interaction effects with environment in a mixed linear model to conduct genome wide association mapping and the corresponding software QTXNetowork has been available ( Thus, our study employed this new method and software to explore intricate genetic architecture of cottonseed traits. Although relatively small compared with the main effects of an individual locus (e.g., additive and dominance) for most of the cottonseed traits studied, significant epistatic effects and gene-environment interactions were detected, justifying the effectiveness of the method we employed.

It should be noted that we first performed filtering on total ~ 390 K SNPs using the generalized multifactor dimensionality reduction (GMDR) [47, 48], and then the filtered SNPs were used in association mapping by the QTXNetwork. The effectiveness of this strategy has been simulated and confirmed recently [49]. Considering the influence of potential population stratification on association mapping, we conducted population stratification analysis based on 203,021 unlinked SNPs and two subpopulations were found in the 316 accessions (Additional file 1: Figure S2). Under the assumption of two subpopulations existent in our cotton population, chi-square test was applied to investigate significance of difference in genotypic frequency between subpopulations at the QTSs detected by our strategy. It was observed that the p-values of 120 of 124 QTSs were not significant after Bonferroni adjustment (p-value < 2.46 × 10− 7); However, 4 QTSs (A11_110829220, A13_21415280, D10_30598593 and D10_5643096) had significant difference in frequency between subpopulations.

It has been well documented that epistasis and gene by environment interaction are mostly involved in the genetic architecture of most agronomic, yield and quality traits in crops. Liu et al. reported that 12 QTLs were detected for the protein content using IF2 population, of which 6 QTLs interacted with environment [7]. Zeng et al. (2016) investigated the association of SNPs in GhSus family genes with fiber- and seed-related traits in 277 upland cotton accessions by epistatic association mapping (EAM) model. For cottonseed-related traits, one main-effect quantitative trait nucleotides (QTNs) and one epistatic QTN were found to be associated with seed oil content and protein content respectively, but no significant one-dimensional gene by environment interaction was found [29]. In our study, in addition to additive or dominance effect, we also detected significant epistatic effects on the oil content and the protein content; however, significant epistasis × environment interaction effect was detected only on the oil content and not on the protein content. These results may be due to the difference in environmental factors and genetic materials employed in these studies.

Bioinformatics analysis showed that the QTS A7_1504479, is located in the CDS of gene Gh_A07G0108, and associated with protein domains relevant to resistance mechanism in rice, maize, etc.; thus, further study on Gh_A07G0108 may reveal more information about resistance mechanism of cotton. Many studies have detected negative correlation between the protein content and the oil content in cottonseed and other oil crops, such as in soybean and sesame [8, 12, 14, 50]. Our study detected the key QTS D3_35705563 controlling these two traits by additive and dominance effects simultaneously. Both the oleic acid and the linoleic acid have medical efficacy in lowering LDL-cholesterol, thus they could reduce one’s risk of cardiovascular disease. Oleic acid could be converted into linoleic acid by the delta-12 fatty acid desaturase gene [51, 52], this genetic relationship might be related to 3 significant pleiotropic QTSs (D3_1889546, D3_29047260, D4_21291786) which affect both traits by additive effect, dominance effect, epistatic effect, as well as gene × environmental effect.

Currently, some studies have reported that variation of complex traits result from non-coding regulatory variants more frequently than from coding sequence polymorphisms [53]. Similarly, our study also found that most QTSs accounting for significant heritability are located in non-coding region and play a key role in dissection of the genetic regulation mechanism and the relationship among traits. Further studies, including the functional verification of annotated genes through molecular experiments and the QTL-GWAS joint analysis will be necessary for us to identify the real causal variants.


In summary, this study has detected significant epistasis and gene by environment interactions in the genetic architecture of seven cottonseed traits using association analysis based on a QTS full model and high density SNPs in cotton. The results on the genetic information of the distinguished variants and the superior genotype design provide new vision on the molecular mechanism and insight into marker-assisted selection (MAS) strategy on efficient molecular improvement of the traits in cotton.


Plant materials and phenotyping

316 tetraploid cotton accessions were selected according to the diversity in geographical origins, phenotypic variation, which were provided by the National Cotton Mid-term GeneBank (Anyang, Henan, China). In 2009, all accessions were planted with three replicates in 3 regions of three China provinces (E1 = Anyang of Henan (114.35E,36.10N), E2 = Kuche of Xinjiang (82.97E,41.68N), E3 = Nanjing of Jiangsu (118.46E,32.03N). The seeds of each replicate were collected and the protein, the oil (total of all fatty acid) and its components including oleic acid, linoleic acid, palmitic acid, myristic acid and stearic acid were tested three times each replicate. The names of all 316 accessions and the summary statistics of seven seed traits were presented in the supplement file (Additional file 8: Fig. S1, Additional file 9: Table S1).

The seed kernel proteins of 316 accessions were determined by Kjeldahl method referred to National food safety standard determination of protein in foods (GB 5009.5–2010). The seed kernel oil was measured by reference to animal and vegetable fats and oils--determination of moisture and volatile matter content (GB/T 9696–2008/ISO662:1998). The contents of seed kernel fatty acids including almitic, linoleic, oleic, myristic, stearic etc. in each accessions were assayed by gas chromatography (GC) using a 7890A chromatograph (Agilent tech., Wilmington, USA) according to the national standard for animal and vegetable fats and oils analysis by gas chromatography of methyl esters of fatty acids (GN/T17377–2008/ISO5508:1990, IDT) and national standard for fatty acid test in the food (GB 5009.168–2016). This measurement was carried out by the Laboratory of Quality & Safty Risk Assessment for Oilseed Products (Wuhan), MOA, PR China. Since the absolute concentration is not the crucial factor for the fatty acids in 316 accession population, the internal normalization method was employed as a quantitative tool for relative amounts estimation of fatty acids analysis in this research.

The peaks of different fatty acids were estimated by the reference for retention time and relative retention time of certain fatty acids. The reference for retention time (min) of the myristic acid, the palmitic acid, the stearic acid, the oleic acid and the linoleic acid, are 32.45, 38.81, 42.27, 44.38, and 47.73, respectively. The calculation for fatty acid components as follows, and the absolute difference of two experiment replications is less than 10% of arithmetic mean value,

$$ {X}_i=\raisebox{1ex}{${A}_i\times 100$}\!\left/ \!\raisebox{-1ex}{$\sum A$}\right. $$

where X i is the content of a certain ingredient (%), A i is the peak area of a certain ingredient, ∑A is the total area of all ingredients.

Genotyping by sequencing

Total genomic DNA was extracted from the leaf tissues of 316 accessions using CTAB method with some modifications; the sequencing on all the accessions was conducted in 2013. The paired-end reads were mapped to the reference genome of G. arboreum (v1.0, and G. raimondii (v2.0, by the software BWAv0.5.927. SNP calling procedure was carried out with realSFS (v0.983, based on the Bayesian estimation. SNPs matching the following criteria were removed for quality control: (1) sequencing depth is greater than 6500× or less than 60×; (2) the distance between two adjacent SNP loci is shorter than 5 bp; (3) the call rate is less than 70% in the whole population; (4) the proportion of its heterozygous genotypes is more than 30%; and (5) minor allele frequency (MAF) is less than 0.01. Eventually, a total of 390,612 SNPs were obtained for association analysis.

Population structure analysis

The software PLINK [25] was applied in calculation of LD measured by the average pairwise correlation coefficient (r2), and selection of unlinked SNPs. The possible population structure was analyzed by the fastStructure software [54]. After excluding SNPs that have a r2 value larger than 0.1 with any other SNP within a 50-SNP sliding window, a total of 203,021 SNPs were retained. The hypothetical number of subpopulations (K) from 1 to 10 was performed at five independent run iterations using 203,021 unlinked SNPs by the fastStructure. The output of the fastStructure was used to estimate K according to the method described by Raj et al. [54].

Association mapping

Mixed linear approach was used in the GWAS on seven cottonseed traits of 316 accessions. The statistical model consists of the effects of environment, additive, dominance, additive × additive, additive × dominance, dominance × additive, and dominance × dominance epistatic effects, as well as their interactions with environments. The genetic effects were treated as fixed effects, and the environment effects and gene-environment interactions as random effects. Suppose one surveyed trait is controlled by s loci (QTSs), then, the phenotype y kh of the k-th genotype in the h-th environment can be expressed by following mixed linear model:

$$ {\displaystyle \begin{array}{c}{y}_{kh}=\mu +\sum \limits_i{a}_i{x}_{A_{ik}}+\sum \limits_i{d}_i{x}_{D_{ik}}+\sum \limits_{i<j}{aa}_{ij}{x}_{A{A}_{ij k}}+\sum \limits_{i<j}{ad}_{ij}{x}_{A{D}_{ij k}}+\sum \limits_{i<j}{da}_{ij}{x}_{D{A}_{ij k}}+\sum \limits_{i<j}{dd}_{ij}{x}_{D{D}_{ij k}}\\ {}+{e}_h+\sum \limits_i{ae}_{ih}{u}_{A_{ik}{E}_h}+\sum \limits_i{de}_{ih}{u}_{D_{ik} Eh}+\sum \limits_{i<j}{aa e}_{ij h}{u}_{A{A}_{ij k}{E}_h}+\sum \limits_{i<j}{ad e}_{ij h}{u}_{A{D}_{ij k}{E}_h}\\ {}+\sum \limits_{i<j}{da e}_{ij h}{u}_{D{A}_{ij k}{E}_h}+\sum \limits_{i<j}{dd e}_{ij h}{u}_{D{D}_{ij k}{E}_h}+{\varepsilon}_{kh}\end{array}} $$

Where μ is the population mean; a i and d i are the additive and the dominance effects of the i-th QTS, with coefficients \( {x}_{A_{ik}}\left(=1,0,-1\right) \) and \( {x}_{D_{ik}}\left(=0,1,0\right) \) for the QTS genotype QQ, Qq and qq, respectively; aa ij , ad ij , da ij and dd ij are the additive-additive, the additive-dominance, the dominance-additive, the dominance-dominance epistatic effects between the i-th QTS and the j-th QTS, with coefficients \( {x}_{AA_{ijk}}={x}_{A_{ik}}\cdot {x}_{A_{jk}} \), \( {x}_{AD_{ijk}}={x}_{A_{ik}}\cdot {x}_{D_{jk}} \), and \( {x}_{DD_{ijk}}={x}_{D_{ik}}\cdot {x}_{D_{jk}} \), respectively; \( {e}_h\sim \left(0,{\sigma}_E^2\right) \) is the random effect of the h-th environment; \( {ae}_{ih}\sim \left(0,{\sigma}_{A_iE}^2\right) \) and \( {de}_{ih}\sim \left(0,{\sigma}_{D_iE}^2\right) \) are the interaction effects of the additive, the dominance of the i-th QTS and the h-th environment, random, with coefficients \( {u}_{A_{ik}{E}_h}\left(={x}_{A_{ik}}\right) \) and \( {u}_{D_{ik}{E}_h}\left(={x}_{D_{ik}}\right) \), respectively; \( {aae}_{ij h}\sim \left(0,{\sigma}_{AA_{ij}E}^2\right) \), \( {ade}_{ij h}\sim \left(0,{\sigma}_{AD_{ij}E}^2\right) \), \( {dae}_{ij h}\sim \left(0,{\sigma}_{DA_{ij}E}^2\right) \) and \( {dde}_{ij h}\sim \left(0,{\sigma}_{DD_{ij}E}^2\right) \) are the interaction effects of four epistasis components of the i-th QTS and j-th QTS with h-th environment, random, with coefficients \( {u}_{AA_{ijk}{E}_h}\left(={x}_{AA_{ijk}}\right) \), \( {u}_{AD_{ijk}{E}_h}\left(={x}_{AD_{ijk}}\right) \), \( {U}_{DA_{ijk}{E}_h}\left(={x}_{DA_{ijk}}\right) \) and \( {u}_{DD_{ijk}{E}_h}\left(={x}_{DD_{ijk}}\right) \), respectively; \( {\varepsilon}_{kh}\sim \left(0,{\sigma}_{\varepsilon}^2\right) \) is the residual effect, random.

The software QTXNetwork, developed for the mixed linear model-based association analysis method by Zhang et al. [46], was employed to analyze the above model. Since the number and the positions of all significant SNPs involved in variation of the surveyed trait are unknown, the QTXNetwork first use 1-dimensional and 2-dimensional genome scanning to distinguish all candidate SNPs with significant single-locus effects and all paired SNPs with significant epistatic effects; finally, a full QTS model was established to estimate all quantitative trait SNP (QTS) parameters. The detail of this mapping strategy can refer to the paper by Zhang et al. [46].

In order to alleviate enormous computational burden and exponentially increased multiple testing for detecting epistasis, we used generalized multifactor dimensionality reduction method (GMDR) [47] and the corresponding software GMDR-GPU to screen potential trait-associated SNPs [48]; In order to screen SNPs potentially associated with the traits as many as possible, two different scanning strategies for GMDR analysis were separately conducted, at the level of three dimensional interaction, for each traits; one is ignoring environmental factor and the other is including environmental factor as a covariate in the GMDR model. Then, all screened SNPs by two strategies were used in association analysis by the QTXNetwork software [46]. In order to control the experiment-wise type I error rate, 1000 permutation testing for each SNP and each pair of SNPs were employed to determine the empirical threshold value of the F-statistic at the experiment significance level of 0.05. Finally, a full QTS model was established by all significant QTSs and the genetic effects of QTSs were predicted using a Monte Carlo Markov Chain method with 20,000 Gibbs sampler iterations.



Epistatic association mapping


Generalized multifactor dimensionality reduction


Genome-wide association study


Marker-assisted selection.


Quantitative trait loci


Quantitative trait nucleotides


Quantitative trait SNP


  1. Davis LC. Modification of oil content in cottonseed using chemical mutagenesis: Texas Tech University; 2015.

  2. Li MH, Robinson EH. Use of cottonseed meal in aquatic animal diets: a review. N Am J Aquac. 2006;68(1):14–22.

    Article  Google Scholar 

  3. Wang X, Zhang H, Wu S, Yue H, Wang J, Li J, Qi G. Dietary protein sources affect internal quality of raw and cooked Shell eggs under refrigerated conditions. Asian Australas J Anim Sci. 2015;28(11):1641.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  4. O’Brien R, Wakelyn P. Cottonseed oil: an oil for trans-free options. Inform. 2005;16(11).

  5. Rashid U, Anwar F, Knothe G. Evaluation of biodiesel obtained from cottonseed oil. Fuel Process Technol. 2009;90(9):1157–63.

    Article  CAS  Google Scholar 

  6. Alfred Q, Liu HY, Xu HM, Li JR, Wu JG, Zhu SJ, Shi CH. Mapping of quantitative trait loci for oil content in cottonseed kernel. J Genet. 2012;91(3):289–95.

    Article  PubMed  CAS  Google Scholar 

  7. Liu H, Quampah A, Chen J, Li J, Huang Z, He Q, Shi C, Zhu S. QTL analysis for gossypol and protein contents in upland cottonseeds with two different genetic systems across environments. Euphytica. 2012;188(3):453–63.

    Article  CAS  Google Scholar 

  8. Yu J, Yu S, Fan S, Song M, Zhai H, Li X, Zhang J. Mapping quantitative trait loci for cottonseed oil, protein and gossypol content in a Gossypium hirsutum× Gossypium barbadense backcross inbred line population. Euphytica. 2012;187(2):191–201.

    Article  Google Scholar 

  9. Shang L, Abduweli A, Wang Y, Hua J. Genetic analysis and QTL mapping of oil content and seed index using two recombinant inbred lines and two backcross populations in upland cotton. Plant Breed. 2016;135(2):224–31.

    Article  CAS  Google Scholar 

  10. Liu H, Quampah A, Chen J, Li J, Huang Z, He Q, Shi C, Zhu S. QTL mapping with different genetic systems for nine non-essential amino acids of cottonseeds. Mol Gen Genomics. 2017;292(3):671–84.

    Article  CAS  Google Scholar 

  11. Huang X, Han B. Natural variations and genome-wide association studies in crop plants. Annu Rev Plant Biol. 2014;65:531–51.

    Article  PubMed  CAS  Google Scholar 

  12. Hwang E-Y, Song Q, Jia G, Specht JE, Hyten DL, Costa J, Cregan PB. A genome-wide association study of seed protein and oil content in soybean. BMC Genomics. 2014;15(1):1.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  13. Wei X, Liu K, Zhang Y, Feng Q, Wang L, Zhao Y, Li D, Zhao Q, Zhu X, Zhu X. Genetic discovery for oil production and quality in sesame. Nat Commun. 2015;6:8609.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  14. Badigannavar A, Myers GO. Genetic diversity, population structure and marker trait associations for seed quality traits in cotton (Gossypium hirsutum). J Genet. 2015;94(1):87–94.

    Article  PubMed  CAS  Google Scholar 

  15. Lee S-B, Kaittanis C, Jansen RK, Hostetler JB, Tallon LJ, Town CD, Daniell H. The complete chloroplast genome sequence of Gossypium hirsutum: organization and phylogenetic relationships to other angiosperms. BMC Genomics. 2006;7(1):61.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  16. Wang K, Wang Z, Li F, Ye W, Wang J, Song G, Yue Z, Cong L, Shang H, Zhu S. The draft genome of a diploid cotton Gossypium raimondii. Nat Genet. 2012;44(10):1098–103.

    Article  PubMed  CAS  Google Scholar 

  17. Zhu YX, Li FG. The Gossypium raimondii genome, a huge leap forward in cotton genomics. J Integr Plant Biol. 2013;55(7):570–1.

    Article  PubMed  Google Scholar 

  18. Li F, Fan G, Wang K, Sun F, Yuan Y, Song G, Li Q, Ma Z, Lu C, Zou C. Genome sequence of the cultivated cotton Gossypium arboreum. Nat Genet. 2014;46(6):567–72.

    Article  PubMed  CAS  Google Scholar 

  19. Li F, Fan G, Lu C, Xiao G, Zou C, Kohel RJ, Ma Z, Shang H, Ma X, Wu J. Genome sequence of cultivated upland cotton (Gossypium hirsutum TM-1) provides insights into genome evolution. Nat Biotechnol. 2015;33(5):524–30.

    Article  PubMed  CAS  Google Scholar 

  20. Zhang T, Hu Y, Jiang W, Fang L, Guan X, Chen J, Zhang J, Saski CA, Scheffler BE, Stelly DM. Sequencing of allotetraploid cotton (Gossypium hirsutum L. acc. TM-1) provides a resource for fiber improvement. Nat Biotechnol. 2015;33(5):531–7.

    Article  PubMed  CAS  Google Scholar 

  21. Byers RL, Harker DB, Yourstone SM, Maughan PJ, Udall JA. Development and mapping of SNP assays in allotetraploid cotton. Theor Appl Genet. 2012;124(7):1201–14.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  22. Hulse-Kemp AM, Lemm J, Plieske J, Ashrafi H, Buyyarapu R, Fang DD, Frelichowski J, Giband M, Hague S, Hinze LL. Development of a 63K SNP Array for Cotton and High-Density Mapping of Intra-and Inter-Specific Populations of Gossypium spp. G3: Genes| Genomes| Genetics. 2015;g3:115.018416.

    Google Scholar 

  23. Yang J, Hu C, Hu H, Yu R, Xia Z, Ye X, Zhu J. QTLNetwork: mapping and visualizing genetic architecture of complex traits in experimental populations. Bioinformatics. 2008;24(5):721–3.

    Article  PubMed  CAS  Google Scholar 

  24. Bradbury PJ, Zhang Z, Kroon DE, Casstevens TM, Ramdoss Y, Buckler ES. TASSEL: software for association mapping of complex traits in diverse samples. Bioinformatics. 2007;23(19):2633–5.

    Article  PubMed  CAS  Google Scholar 

  25. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, Maller J, Sklar P, De Bakker PI, Daly MJ. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81(3):559–75.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  26. Beaty TH, Ruczinski I, Murray JC, Marazita ML, Munger RG, Hetmanski JB, Murray T, Redett RJ, Fallin MD, Liang KY. Evidence for gene-environment interaction in a genome wide study of nonsyndromic cleft palate. Genet Epidemiol. 2011;35(6):469–78.

    PubMed  PubMed Central  Google Scholar 

  27. Liu Y, Maxwell S, Feng T, Zhu X, Elston RC, Koyutürk M, Chance MR. Gene, pathway and network frameworks to identify epistatic interactions of single nucleotide polymorphisms derived from GWAS data. BMC Syst Biol. 2012;6(Suppl 3):S15.

    Article  PubMed  PubMed Central  Google Scholar 

  28. Xu S. Mapping quantitative trait loci by controlling polygenic background effects. Genetics. 2013;195(4):1209–22.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  29. Zeng Y-D, Sun J-L, Bu S-H, Deng K-S, Tao T, Zhang Y-M, Zhang T-Z, Du X-M, Zhou B-L. EcoTILLING revealed SNPs in GhSus genes that are associated with fiber-and seed-related traits in upland cotton. Sci Rep. 2016;6:29250.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  30. Huang X, Wei X, Sang T, Zhao Q, Feng Q, Zhao Y, Li C, Zhu C, Lu T, Zhang Z. Genome-wide association studies of 14 agronomic traits in rice landraces. Nat Genet. 2010;42(11):961–7.

    Article  PubMed  CAS  Google Scholar 

  31. Riedelsheimer C, Czedik-Eysenberg A, Grieder C, Lisec J, Technow F, Sulpice R, Altmann T, Stitt M, Willmitzer L, Melchinger AE. Genomic and metabolic prediction of complex heterotic traits in hybrid maize. Nat Genet. 2012;44(2):217–20.

    Article  PubMed  CAS  Google Scholar 

  32. Abdurakhmonov I, Kohel R, Yu J, Pepper A, Abdullaev A, Kushanov F, Salakhutdinov I, Buriev Z, Saha S, Scheffler B. Molecular diversity and association mapping of fiber quality traits in exotic G. hirsutum L. germplasm. Genomics. 2008;92(6):478–87.

    Article  PubMed  CAS  Google Scholar 

  33. Jia Y, Sun X, Sun J, Pan Z, Wang X, He S, Xiao S, Shi W, Zhou Z, Pang B. Association mapping for epistasis and environmental interaction of yield traits in 323 cotton cultivars under 9 different environments. PLoS One. 2014;9(5):e95882.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  34. Guo M, Yang A, Zhou C, Liu X. The new understanding of Arabidopsis thaliana proteins associated with salinity. J Plant Interact. 2012;7(4):348–55.

    Article  CAS  Google Scholar 

  35. Wang J, Chen L, Wang Y, Zhang J, Liang Y, Xu D. A computational systems biology study for understanding salt tolerance mechanism in rice. PLoS One. 2013;8(6):e64929.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  36. Chung C-L, Jamann T, Longfellow J, Nelson R. Characterization and fine-mapping of a resistance locus for northern leaf blight in maize bin 8.06. Theor Appl Genet. 2010;121(2):205–27.

    Article  PubMed  CAS  Google Scholar 

  37. Firmino AAP, de Assis Fonseca FC, de Macedo LLP, Coelho RR, de Souza JDA Jr, Togawa RC, Silva-Junior OB, Pappas-Jr GJ, da Silva MCM, Engler G. Transcriptome analysis in cotton boll weevil (Anthonomus grandis) and RNA interference in insect pests. PLoS One. 2013;8(12):e85079.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  38. Zhang Q-J, Zhu T, Xia E-H, Shi C, Liu Y-L, Zhang Y, Liu Y, Jiang W-K, Zhao Y-J, Mao S-Y. Rapid diversification of five Oryza AA genomes associated with rice adaptation. Proc Natl Acad Sci. 2014;111(46):E4954–62.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  39. Yang J, Zhu J. Methods for predicting superior genotypes under multiple environments based on QTL effects. Theor Appl Genet. 2005;110(7):1268–74.

    Article  PubMed  Google Scholar 

  40. Qi T, Cao YJ, Cao LY, Gao YM, Zhu SJ, Lou XY, Xu HM. Dissecting genetic architecture underlying seed traits in multiple environments. Genetics. 2015;199(1):61–71.

    Article  PubMed  Google Scholar 

  41. Qi T, Jiang B, Zhu Z, Wei C, Gao Y, Zhu S, Xu H, Lou X. Mixed linear model approach for mapping quantitative trait loci underlying crop seed traits. Heredity. 2014;113(3):224–32.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  42. Yang JA, Benyamin B, McEvoy BP, Gordon S, Henders AK, Nyholt DR, Madden PA, Heath AC, Martin NG, Montgomery GW, et al. Common SNPs explain a large proportion of the heritability for human height. Nat Genet. 2010;42(7):565–U131.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  43. Yang JA, Lee SH, Goddard ME, Visscher PM. GCTA: a tool for genome-wide complex trait analysis. Am J Hum Genet. 2011;88(1):76–82.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  44. Zhang ZW, Ersoz E, Lai CQ, Todhunter RJ, Tiwari HK, Gore MA, Bradbury PJ, Yu JM, Arnett DK, Ordovas JM, et al. Mixed linear model approach adapted for genome-wide association studies. Nat Genet. 2010;42(4):355–U118.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  45. Lü H-Y, Liu X-F, Wei S-P, Zhang Y-M. Epistatic association mapping in homozygous crop cultivars. PLoS One. 2011;6(3):e17773.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  46. Zhang F-T, Zhu Z-H, Tong X-R, Zhu Z-X, Qi T, Zhu J. Mixed linear model approaches of association mapping for complex traits based on omics variants. Sci Rep. 2015;5:10298.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  47. Lou XY, Chen GB, Yan L, Ma JZ, Zhu J, Elston RC, Li MD. A generalized combinatorial approach for detecting gene-by-gene and gene-by-environment interactions with application to nicotine dependence. Am J Hum Genet. 2007;80(6):1125–37.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  48. Zhu Z, Tong X, Zhu Z, Liang M, Cui W, Su K, Li MD, Zhu J. Development of GMDR-GPU for gene–gene interaction analysis and its application to WTCCC GWAS data for type 2 diabetes. PLoS One. 2013;8(4):e61943.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  49. Xu HM, Jiang BB, Cao YJ, Zhang YX, Zhan XD, Shen XH, Cheng SH, Lou XY, Cao LY. Detection of epistatic and gene-environment interactions underlying three quality traits in Rice using high-throughput genome-wide data. Biomed Res Int. 2015;2015(2):135782.

    PubMed  PubMed Central  Google Scholar 

  50. Li C, Miao H, Wei L, Zhang T, Han X, Zhang H: Association mapping of seed oil and protein content in Sesamum indicum L. using SSR markers. 2014.

    Google Scholar 

  51. Liu Q, Singh S, Green A. High-oleic and high-stearic cottonseed oils: nutritionally improved cooking oils developed using gene silencing. J Am Coll Nutr. 2002;21(sup3):205S–11S.

    Article  PubMed  CAS  Google Scholar 

  52. Chapman KD, Neogi PB, Hake KD, Stawska AA, Speed TR, Cotter MQ, Garrett DC, Kerby T, Richardson CD, Ayre BG. Reduced oil accumulation in cottonseeds transformed with a nonfunctional allele of a Delta-12 fatty acid desaturase (). Crop Sci. 2008;48(4):1470–81.

    Article  CAS  Google Scholar 

  53. Wang YH, Zhang HY, Zhang CL, Chen H, Fang XT, Zhang YS, Hou SS. The fear gene Stathmin alleles generated Heterosis on feed efficiency parameters in Peking ducks. Anim Biotechnol. 2012;23(4):233–40.

    Article  PubMed  CAS  Google Scholar 

  54. Raj A, Stephens M, Pritchard JK. fastSTRUCTURE: variational inference of population structure in large SNP data sets. Genetics. 2014;197(2):573–89.

    Article  PubMed  PubMed Central  Google Scholar 

Download references


The design of the study, cotton material collection, field experiment and sequencing were supported by grants from the National Key Research and Development Program of China (2016YFD0100203); data analysis, interpretation and manuscript writing were supported by grants from the National Natural Science Foundation grants 31671570, 31470083, and the National Science Foundation grant DMS1462990.

Availability of data and materials

All supporting data can be found within the manuscript and its additional files.

Author information

Authors and Affiliations



XD and HX conceived the ideas; JS, YJ, ZP, SH, BP, LW, WG, SX, JL, WS, JM, and GS conducted the field trials and collected the data; GZ, HX, QX, ZQ, SL, XL and JZ analyzed the data. The manuscript was written by SL and improved by XD, JJ, XL and HX. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Xiongming Du or Haiming Xu.

Ethics declarations

Ethics approval and consent to participate

This study does not contain any research requiring ethical consent or approval.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Additional files

Additional file 1:

Figure S2. Population structure of 316 cotton accessions based on pruned unlinked SNPs. (DOC 69 kb)

Additional file 2:

Figure S3. Genome-wide average LD decay estimated based on the population of 316 cotton accessions. (DOC 58 kb)

Additional file 3:

Table S3. The other genome-wide significant QTSs associated with five fatty acids. (DOC 185 kb)

Additional file 4:

Table S2. Phenotypic and genotypic correlation between seven cottonseed traits. (DOC 58 kb)

Additional file 5:

Table S4. Predicted genetic values (G) for QQ, qq, Qq, superior homozygous line (SL), and superior hybrids (SH) of seven cottonseed traits. (DOC 114 kb)

Additional file 6:

Table S5. The SNP genotypes achieving the maximum genetic value of seed traits in designed superior homozygous lines and hybrids at the QTSs absent in the Table 5. (DOC 168 kb)

Additional file 7:

Table S6. The SNP genotypes achieving the minimum genetic value of seed traits in designed lines for QTSs (DOC 210 kb)

Additional file 8:

Figure S1. The observed phenotype distribution of seven traits. (DOC 83 kb)

Additional file 9:

Table S1. The information of 316 accessions and the summary statistics of seven seed traits (DOC 321 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Du, X., Liu, S., Sun, J. et al. Dissection of complicate genetic architecture and breeding perspective of cottonseed traits by genome-wide association study. BMC Genomics 19, 451 (2018).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: