- Research article
- Open Access
Genome-wide association mapping of resistance to a Brazilian isolate of Sclerotinia sclerotiorum in soybean genotypes mostly from Brazil
BMC Genomicsvolume 18, Article number: 849 (2017)
Sclerotinia Stem Rot (SSR), caused by the fungal pathogen Sclerotinia sclerotiorum, is ubiquitous in cooler climates where soybean crops are grown. Breeding for resistance to SSR remains challenging in crops like soybean, where no single gene provides strong resistance, but instead, multiple genes work together to provide partial resistance. In this study, a genome-wide association study (GWAS) was performed to dissect the complex genetic architecture of soybean quantitative resistance to SSR and to provide effective molecular markers that could be used in breeding programs. A collection of 420 soybean genotypes were selected based on either reports of resistance, or from one of three different breeding programs in Brazil, two commercial, one public. Plant genotype sensitivity to SSR was evaluated by the cut stem inoculation method, and lesion lengths were measured at 4 days post inoculation.
Genotyping-by-sequencing was conducted to genotype the 420 soybean lines. The TASSEL 5 GBSv2 pipeline was used to call SNPs under optimized parameters, and with the extra step of trimming adapter sequences. After filtering missing data, heterozygosity, and minor allele frequency, a total of 11,811 SNPs and 275 soybean genotypes were obtained for association analyses. Using a threshold of FDR-adjusted p-values <0.1, the Compressed Mixed Linear Model (CMLM) with Genome Association and Prediction Integrated Tool (GAPIT), and the Fixed and Random Model Circulating Probability Unification (FarmCPU) methods, both approaches identified SNPs with significant association to disease response on chromosomes 1, 11, and 18. The CMLM also found significance on chromosome 19, whereas FarmCPU also identified significance on chromosomes 4, 9, and 16.
These similar and yet different results show that the computational methods used can impact SNP associations in soybean, a plant with a high degree of linkage disequilibrium, and in SSR resistance, a trait that has a complex genetic basis. A total of 125 genes were located within linkage disequilibrium of the three loci shared between the two models. Their annotations and gene expressions in previous studies of soybean infected with S. sclerotiorum were examined to narrow down the candidates.
Many advances are being made in soybean (Glycine max (L.) Merrill) breeding that have been reflected by continual increased production worldwide . For example, in the US there has been a steady approximate 1.9% average increase in yield (kg/ha) per year from 1960 to 2015 (http://usda.mannlib.cornell.edu/MannUsda/viewDocumentInfo.do?documentID=1290/). However, although yields are increasing, yields are still vulnerable to a variety of environmental restrictions, such as the constraint caused by the constant attack by pests and pathogens. The pathogen Sclerotinia sclerotiorum (Lib.) de Bary is one such yield-limiting pathogen, causing the disease Sclerotinia Stem Rot (SSR), and causing considerable damage in regions that have had several weeks of cool wet weather during the flowering period .
Resistance to S. sclerotiorum is not complete in most dicotyledon crops such as soybean. Instead, resistance in soybean to SSR is only partial, with multiple alleles providing a small amount of enhanced defense. Due to this lack of adequate resistance in commercial varieties, and the heavy influence of weather conditions, losses due to SSR can be zero one year, but then account for nearly half of all disease-associated losses the next, as seen in Iowa 2003 and 2004 . Even though complete resistance to SSR is currently not available for soybean, there are clear differences in the susceptibility of tested genotypes [4,5,6,7,8,9,10,11]. And although not ideal, partial resistance to SSR does increase the yield potential, as slowing of the disease progression can be successful enough to allow plants to recover with minimal damage, especially if the weather changes to be more favorable to the plant, and less favorable to the pathogen.
Molecular studies looking into genes that are differentially expressed during SSR development in soybean identified thousands of genes responding within the first 24 h [12, 13]. These results, in addition to those of physiological and biochemical studies of S. sclerotiorum infection [14,15,16], show that S. sclerotiorum – plant molecular interactions are very complex, involving numerous pathogen-released proteins and oxalic acid for the plant to cope with . The complexity of SSR partial resistance has led many researchers to use QTL analyses to identify markers associated with SSR resistance. Some early QTL mapping studies used biparental populations with limited genetic variation, or with populations of limited size [9, 18,19,20,21]. But due to limits of mapping studies involving only a few genotypes, and due to the subtle effect of these QTLs, it has been a challenge to find markers consistent enough to be satisfactorily applied to marker-assisted selection on a commercial scale.
In recent years the use of next generation sequencing technologies has led to a drastic reduction in the cost of genotyping, facilitating the identification of vast numbers of SNPs from large numbers of genotypes that can then be used for more accurate association mapping [22, 23]. The technique of genotype-by-sequencing (GBS) is one such approach to rapidly and cheaply produce a large number of single base polymorphisms (SNPs) based on the comparison of restriction fragment sequences that are held in common within the population [24, 25]. In this sense, studies of genomic association, proposed by Meuwissen et al. , are analyzed based on the evaluation of a large number of markers widely distributed throughout the genome. This type of association mapping involves the search for genotype-phenotype correlations in unrelated individuals and is often faster and more profitable than traditional biparental mapping . Genotypic and phenotypic data are collected from a population in which kinship is not strongly controlled by the researcher, and correlations between genetic markers and phenotypes are sought within this population. In this context, the Genome Wide Association Studies (GWAS) may be a promising strategy for the identification of QTLs for genes of interest. Several GWAS reports have been published recently on soybean response to S. sclerotiorum, highlighting the common view that GWAS is a valuable approach to identify key genes and regions providing some enhanced resistance to this importance disease. Using GWAS to look at soybean response to SSR, Iquira et al.  found SNPs of high significance on chromosomes 1, 3, 8, and 20 with the strongest being on 1; Bastein et al.  found significance on chromosomes 1, 15, 19, and 20 with the strongest being on 15. By measuring chemical changes to pigments that might be associated with SSR resistance , Zhao et al.  found significance on chromosomes 6, 10, and 13 with the strongest being on 13. Of these two pathogen-based studies conducted by the same lab [28, 29], only two of the six QTL were identified in both studies, highlighting that conducting only one GWAS is not 100% effective in identifying all the variable nature of the SSR disease interaction, and that there is a need for additional analyses to identify more SNPs that are significantly associated with enhanced SSR resistance.
In this manuscript we present the results of another GWAS of soybean response to SSR, with a focus on genotypes from three different SSR resistance breeding programs in Brazil, and with the use of a Brazilian isolate for inoculations. Previously published SSR GWAS studies were conducted on genotypes largely used in breeding programs in Canada, US, and China, and with pathogen isolates collected in the Northern hemisphere [28, 29, 31]. Additionally, we compared significance using the GWAS analysis methods CMLM  and FarmCPU , as these programs handle testing markers, population structure, and kinship differently. The results provide soybean breeders with additional SNPs that they can have high confidence are associated with enhanced resistance to SSR and that could be used in marker-assisted selection.
Plant material, inoculation, and phenotypic scoring
Soybean [Glycine max (L.) Merrill] plants were grown in 500 ml plastic cups filled with Bioflora® (Bioflora LTDA-ME, Prata, MG, Brazil) organic plant growth substrate based on pine, other natural fibers, minerals and nutrient enriched, in a greenhouse at approximately 25–30 °C under natural light cycle in Uberlandia Brazil from December 2015 through February 2016 (approximately 12 h days). At least nine seeds were planted for each genotype. Soybean genotypes originated from the breeding programs of FT Sementes (Ponta Grossa, PR, Brazil), Tropical Melhuramente & Genetica (TMG, Cambé, PR, Brazil), or the Federal Universidade de Uberlandia (MG, Brazil), as well as various plant introductions (PIs) of the Embrapa Soja collection chosen from the literature. Immature shoot tips were removed at the V2 stage, and within 2–4 days, the plants were taken to the lab and cut just below the second trifoliate node. The freshly cut stems were inoculated with 2–3 day old S. sclerotinorum cultures of isolate Jataí  grown on potato dextrose agar plates for cut stem inoculation, a method shown to have low variability , with mycelial plugs picked up with a 200 μl pipette tip, similar to the straw test . Plants were placed in an 18–20 °C, low light, growth chamber immediately after inoculation. After 4 days incubation, plants were phenotyped by measuring the length of necrotic lesion in centimeters.
DNA extraction, GBS library construction and sequencing
Total DNA was extracted using a CTAB based method as previously described . DNA integrity was verified on an agarose gel. Samples that were determined to be intact and of high quality, were pipetted into 96 well plates. Aliquots were removed to new plates to quantify with picogreen (Molecular Probes, Eugene Oregon, USA) on a Synergy HT (BioTek, Winooski, Vermont, USA) microplate reader. Based on both DNA amounts and plant samples, 352 samples were chosen, together with 32 controls (10 water blanks and 22 random repeats) for GBS library construction. GBS library construction was based on the method described in Poland et al. (2012) . In summary, after adjusting DNA concentrations to be approximately 50 ng/μl, five microliters (250 ng) were pipetted into a new set of 96-well plates containing 2.5 μl 0.1 μM specific DNA barcoded HindIII adaptors. Pretesting of restriction enzyme pairs PstI-HinP1I, PstI-MspI, PstI-MseI, HindIII-HinP1I, HindIII-MspI, and HindIII-MseI indicated that the HindIII-MseI pair gave the best digestion results of having an even smear, with a broad peak near 300 nt (Additional file 1: Figure S1). Samples were therefore restriction digested using 1 U HindIII and 1 U MseI in a buffer mix at 37 °C two hours, followed by 80 °C for 20 min (all enzymes and buffers used for the GBS library construction were purchased from New England Biolabs, Ipswich, Massachusetts, USA). Following digestion, a common MseI adaptor was added, in addition to 40 U T4 Ligase and 1 mM ATP and 2× Cutsmart buffer were added, and the samples incubated at 25 °C for 2 h, followed by a 65 °C incubation for 20 m. After ligation, 8 μl of each well from each row were collected in a strip of 8 PCR tubes. Then 25 μl from each of the tubes of a strip were transferred to a 1.5 ml microfuge tube, totaling 200 μl that originated from one 96-well plate. Then 50 μl of samples were cleaned using Agencourt AMPure XP beads (Beckman Coulter Life Sciences, Indianapolis Indiana, USA), dried, and suspended in 15 μl a resuspension buffer. PCR enrichment was performed using master mix containing Illumina primers and NEB Phusion Master Mix. The following PCR settings were used for the enrichment reaction: 98 °C 30s, 15 cycles (98 °C 10s, 68 °C 30s, 72 °C 30s), 72 °C 5 m, 4 °C forever. After PCR, samples were purified using the Agencourt AMPure XP beads, and then pooled such that all the wells of a single 96-plate were now represented in a single microcentrifuge tube, and run on a Bioanalyzer 2100 (Agilent, Santa Clara, CA) using a DNA7500 chip to verify correct size, general success of amplification, and estimation of DNA amounts. Concentrations were adjusted to have approximately 10 nM DNA in 10 mM TRIS-HCl, 0.05% Tween-20 and run on a single lane of an Illumina HiSeq4000 using a HiSeq SBS sequencing kit version 4 at the Roy J. Carver Biotechnology Center on the University of Illinois (Urbana, IL) campus. The Fastq sequence files were demultiplexed with bcl2fastq v220.127.116.11 conversion software (Illumina).
Processing the data from the Illumina sequence readings and selection of SNPs
The TASSEL 5 GBS v2 SNP-calling pipeline , together with the soybean reference genome assembly, Gmax_Wm82.v2 , were used for SNP identification. Prior to running TASSEL, the adapter sequences were removed from reads using the software Cutadapt . To run the TASSEL pipeline, the default settings were used, except for the following parameters: minimum base quality score (mnQS 20), minimum mapping quality score (minMAPQ 20), and kmer length (kmerLength 80). The alignment step was performed using BWA-MEM  with default settings. SNPs with more than 50% missing data were removed, as were genotypes with more than 75% missing SNPs, prior to the imputation step, which was accomplished using Beagle 4.1  with the parameter window = 1000, overlap = 200, and ne = 1000. After imputation, SNPs or genotypes with higher than 10% heterozygosity were removed from the dataset.
Genome-wide association analysis
For the association analysis of SNPs to phenotypes, the compressed mixed linear model (CMLM) was selected and performed within the GAPIT package of R, which assigned similar individuals into 259 groups to estimate the kinship matrix [30, 39]. The reduced kinship matrix and three principal components (PCs) generated from principal component analysis (PCA) (Additional file 2: Figure S2) were included in the model to control population structure and individual relatedness. A batch effect was also included in the model as a covariate to control for possible variation between different phenotyping dates. The SNPs with a minor allele frequency (MAF) higher than 0.01 were used to estimate the population structure and the kinship. Only SNPs with a MAF higher than 0.1 were used for association tests. The cutoff of significant association was a False Discovery Rate (FDR) adjusted p-value less than 0.1 using the Benjamini and Hochberg procedure to control for multiple testing .
Another computational method named FarmCPU  was also implemented to do an association analysis, which separated the mixed linear model into a fixed effect model and a random effect model to reduce false negatives that might result from confounding population structure, kinship, and SNPs. The same batch effect and three PCs generated from GAPIT were included as covariates. Likewise, only SNPs with a minor allele frequency higher than 0.1 were used for association tests, and the cutoff for significant association was a FDR-adjusted p-value less than 0.1.
In addition to the two main models described above, three other models were also performed to test marker-trait associations. They were a naïve model without any control for population structure, a general linear model (GLM) with three PCs, and a mixed linear model (MLM) with three PCs and a kinship matrix without any compression.
Linkage disequilibrium (LD) analysis
A function built into the TASSEL 5.0 program was used to determine squared allele-frequency correlations (r2) between pairs of SNPs with MAF greater than 0.1. The local LD was viewed, and LD plots were built, using Haploview 4.2 . The LD blocks were defined with the Confidence Interval methods and the default parameters . The LD decay plot was built based on the r2 values and distances between each pair of SNPs. To calculate the physical distance of LD decay (r2 < 0.2), a nonlinear model was used to estimate the expected values of E(r2) [44, 45].
Plants that germinated and grew healthily were used for DNA extraction and phenotyping. Most (108 of the 352 total utilized) genotypes were represented by nine plants, 86 samples had eight plants, 52 had seven, 48 had six, 26 had five, 20 had four, and only eight genotypes relied on just three plants. Following stem inoculation with S. sclerotiorum, the size distribution of necrotic lesions after 4 days incubation ranged from 0.67 to 8.43 cm, and all genotypes developed a lesion, even the most resistant genotypes, indicating that an infection had successfully occurred in all the plants assayed. Samples were removed from analysis if: the standard deviation of their phenotypic values was greater than 2.0 cm (19 genotypes), the range of their major and minor phenotypic values was greater than 5.0 cm (five genotypes), viral infected or miss-labeled (four genotypes). The remaining 324 genotypes had an approximate normal distribution of phenotypes (Additional file 3: Figure S3) and an average lesion length of 4.65 cm (Additional file 4: Table S1), and were used in for SNP analysis. The genotypes that were the most resistant, including genotypes like EMGOPA 316 , PI194639 [8, 9], and NK S19–90 which have been reported to have partial resistance, are shown in Additional file 5: Table S2. The most susceptible, including genotypes like BRSGO Ipameri, BRS RAISSA RES 1/3 NCS, and MSOY 7908 RR which have previously been shown to be very susceptible , are listed in Additional file 6: Table S3.
The HindIII and MseI restriction enzyme double digestions produced a wide peak of fragments in the range of 300–500 bp with little strong banding (Additional file 1: Figure S1F), and therefore this enzyme pair was used for construction of the GBS libraries. The HindIII adaptors were barcoded, allowing all the libraries to be pooled and sequenced on a single lane of an Illumina flow cell, yielding 373 million high-quality, single-stranded readings of 100 nucleotides in length.
With the TASSEL 5 GBS v2 SNP-calling pipeline, 68,242 SNPs were identified. After filtering for less than 50% missing datas 14,637 SNPs remained. After imputation and removing high heterozygosity, 12,411 SNPs remained. Among them, 11,811 SNPs met the MAF threshold at 0.01, and 6478 SNPs at MAF >0.1. The 11,811 SNPs spread across all chromosomes (Fig. 1), with chromosome 18 containing the most SNPs (1179) and chromosome 5 the fewest (276) SNPs. Filtering of genotypes that had missing data at 75% or more of the SNP sites, or that showed more than 10% heterozygous SNPs, reduced the samples to 275 genotypes for use in association of SNPs to phenotypes. The phenotypes of these 275 genotypes were fairly normally distributed (Fig. 2) and the most resistant (Table 1) and most susceptible (Table 2) are listed.
Association analysis using SNP data from the TASSEL 5 GBS v2 SNP-calling pipeline
A principal component (PC) analysis was performed with the 11,811 SNPs with a MAF >0.01 and the 275 soybean genotypes to estimate the population structure. Based on the resulting scree plot (Additional file 2: Figure S2), the first three PCs were included in the association analysis. PC1 explained 10.3% of the genetic variation, PC2 explained 7.8%, and PC3 explained 5.5%. The PC analysis scatter plot (Fig. 3) showed that the first and second PCs were composed largely of three subpopulations of genotypes originiaing from different sources. The subpopulation “LAGER-UFU” was more obviously separated while the subpopulations ‘FT-Sementes’ and ‘TMG’ were more close to each other. Soybean genotypes in the ‘miscellaneous’ group were collected from different sources, including many that were selected based on the literature.
Four different models were implemented and compared in the marker-trait association tests. Q-Q plots were generated to visualize the effectiveness of controlling population structure and familial relatedness (Fig. 4). The expected distribution of p-values is a uniform [0,1] distribution under the assumption that no associations exist between SNP markers and the trait. The naïve model and the GLM model resulted in 38.7% and 23.1% of SNPs with p-values <0.05, respectively and it can be seen from the Q-Q plot that both models produced a large proportion of p-values that deviated from the expected distribution, which indicated an excess of false positives. Compared to the naïve model and the GLM, the CMLM and the FarmCPU resulted in 3.8% and 4.9% of SNPs with p-values <0.05, indicating the inflation of p-values was reduced and supported the appropriateness of including population structure and kinship matrix in the model. In addition, FarmCPU yielded most of the SNPs fitting the expected distribution of p-values with a few SNPs exhibiting high significance (higher significance than CMLM).
Because the CMLM and FarmCPU both seemed appropriate for analyzing the association between the soybean SNPs and SSR resistance, the association results from CMLM (conducted in the R package GAPIT) and FarmCPU (conducted in the R package FarmCPU) were compared. Setting the FDR-corrected p-value cutoff at 0.05, six SNPs (on chromosomes 1, 4, 9, 11, 16, and 18) were detected as significant by FarmCPU, with each significant SNP explaining approximately 2% to 3.2% of the total phenotypic variation (Table 3). The CMLM analysis did not identify any SNPs as significantly associated at the 0.05 FDR-corrected p-value cutoff. But loosening the strict threshold to an FDR-corrected p-value <0.1, a total of 19 SNPs (on chromosomes 1, 11, 18 and 19) showed significant associations with CMLM, where each SNP explained 3.2% to 4.4% of the total phenotypic variation (Table 3). No additional SNPs were significant with FarmCPU at the <0.1 cutoff. The three most significant SNPs detected by FarmCPU (S1_36,045,483, S11_6,493,121 and S18_14,327,556) were also deemed significant by CMLM, as shown by the Manhattan plot (Fig. 5 and Table 3). However, the SNPs on chromosomes 4, 9 and 16 that were identified with FarmCPU as significant, were not significant by CMLM, although weak signals could be seen at those sites on the CMLM-produced Manhattan plot (Fig. 5). A single SNP on chromosome 19 determined by CMLM did not pass the significance threshold with the FarmCPU, but there was a weak peak on the FarmCPU-produced Manhattan plot (Fig. 5). The Manhattan plots generated by the two models look different for the significant loci, as the LD associated with the SNPs affected whether there would be a string of hits (on the CMLM plot) or a single spot (on the FarmCPU plot) as FarmCPU removed all the SNPs in LD, and these SNPs in LD were kept by the CMLM analysis (Fig. 5).
LD and candidate genes analysis of three major loci
Because both CMLM and FarmCPU analyses identified significant SNPs on chromosome 1, 11 and 18, these three loci were considered to be the most significant of the study, and further analyses were therefore performed on them. LD plots were generated for these three loci to examine the local LD blocks. Seven significant SNPs on chromosome 1 were located within a LD block of 1738 kb from 35 to 36.8 Mb, and all SNPs in this LD block showed moderate to high r2 values from 0.4 to 1 (Fig. 6), indicating that these seven SNPs were highly associated, and therefore might have the same causal site(s) that contributed to enhanced resistance. The significant SNP on chromosome 11 lied within an LD block of 822 kb (Fig. 7). However, since the LD block was determined by D’ values, the SNP at 5711798 bp showed fairly low LD with the other three SNPs in the block considering r2 values (r2 < 0.1). Three SNPs from 6.2 Mb to 6.5 Mb showed high r2 values with each other, suggesting the r2-based LD block here was about 0.3 Mb (Fig. 7). Another large (1194 kb) LD block was located around the most significant SNP on chromosome 18, from 13.3 Mb to 14.5 Mb. By the CMLM analysis, this block contained ten significant SNPs that exhibited very high pairwise r2 values >0.9 (Fig. 8).
The soybean reference genome  was queried to identify the genes that predicted to be within the LD blocks. The 1.7 Mb LD block around the SNP peak on chromosome 1 contained 36 genes, the 0.3 Mb LD block on chromosome 11 contained 35 genes, and the 1.1 Mb LD block on chromosome 18 contained 54 genes (Additional file 7: Table S4).
The number of genes within the LD blocks on chromosomes 1, 11, and 18 totalled 125. The translations of the putative coding sequences of these genes were aligned by blastx to the nr database of NCBI  to identify possible gene function (Additional file 7: Table S4). The genes with the LD blocks were also checked for their expression pattern in a microarray experiment involving soybean transcriptome response to S. sclerotiorum infection . Although most of the genes showed moderate to no expression change in response to infection, 34 of the 125 genes were identified as being differentially expressed (inoculated vs mock) during the first 36 h post inoculation (hpi) (Fig. 9). There were sixteen genes with significant differential expression between infected samples and mock treated samples in at least one time point, making them more promising candidates for genes potentionally involved in defense efforts. Several of the genes in the microarray study (Fig. 9) showed a much stronger expression during S. sclerotiorum infection than the others. The gene with the strongest differential expression was Glyma.01G106000, that encodes a tau class glutathione S-transferase; it had the strongest expression differences in both genotypes used in that sudy, at 12, 24, and 36 hpi, and is about 189.0 kb away from the peak SNP on chromosome 1, S1_36,045,483. Three genes on chromosome 11, Glyma.11G084000, Glyma.11G084200, and Glyma.11G086600, that are in LD with the peak SNP S11_6,493,121, also showed fairly strong induction of expression after infection compared to other genes. Other genes that were part of the microarray study did not change any more than 2 fold in response to inoculation (Fig. 9).
The study was successful in identifying SNPs significantly associated with soybean defense against SSR, a disease that is increasing in importance worldwide, and whose resistance QTL are very challenging to map due to the high phenotypic variability of the disease, and the minor contribution from each QTL. The use of GWAS with thousands of SNP markers, improves the ability to identify QTL with higher statistical significance, and several other research groups have already applied GWAS to identify loci associated with SSR responses [28, 29, 31].
GWAS succeeded in identifying new QTL associated with SSR
In this presented GWAS of soybean resistance to SSR, six QTL were identified using the FarmCPU computation, and four with the CMLM computation, with an overlap of three QTL identified by SNPs: S1_36,045,483, S11_6,493,121, and S18_14,327,556. The study did not have good overlap with the QTL identified in the other recent GWAS studies, except the possible overlap of our QTL on chromosome 1 near 35.5 Mb, to the QTL near 27.7 Mb (position updated to the same version of the soybean reference genome sequence) of Bastein et al. (2014). The QTL on chromosome 19 was only identified as significant by CMLM, but it might be the same identified by Vuong et al. (2008) where a QTL for SSR resistance in PI194639 was bordered by Satt495 (Chr19:650,674) and Satt388 (Chr19:4,212,645) [9, 48]. This lack of overlapping QTL identification between different experiments, might reflect the subtle effects of each QTL for SSR resistance or the high variability in phenotyping this disease. The lack of QTL overlap could also be due to the use of different soybean genotypes, the use of different pathogen isolates that were collected on different continents, or the use of different inoculation methods or treatments. Among the other GWAS, two of the studies inoculated flowers and measured disease progression down the stem seven days later [28, 29], and the other study treated soybeans with oxalic acid released by S. sclerotinia, at 40 mM, and measured pigment induction in response to this treatment 48 h later . In the present study, young seedlings were inoculated on freshly cut stems. Therefore, it is not too surprising that the QTL identified in each study could differ.
How CMLM and FarmCPU compare
The model chosen to conduct GWAS should be carefully decided based on species and traits being studied. For this study of soybean, it involved a plant that is self-pollinated and with high LD across the genome, and resistance to SSR is underlied by multiple genes with small effects. The analysis tested two different methods, CMLM and FarmCPU, for marker-trait association. The Q-Q plot showed both CMLM and FarmCPU controlled inflated p-values due to population structure better than the other more naïve models. The FarmCPU method identified more significant loci and with higher levels of significance for the most significant SNPs.
The mixed linear model has been widely used in GWAS studies on soybean and in a variety of other crops as it was shown to largely reduce the spurious associations resulting from both population structure and unequal individual relatedness [49,50,51,52,53]. Based on the MLM, the CMLM implemented in the GAPIT package to deal with large and computational challenging datasets by clustering individuals into groups in the kinship matrix [32, 54]. Although popular for GWAS, in some cases the CMLM can be insufficient due to the confounding between the population structure, kinship, and different testing SNPs, which could potentially lead to compromise of true postives. It was reported that the recently developed method FarmCPU mitigated this problem and led to both increased statistical power and reduced false positives . FarmCPU implements a fixed effect model that contains the testing markers and multiple associated markers as covariates, and a random model that contains the kinship matrix. These steps are performed separately but optimize each other iteratively. So compared to CMLM in which the kinship matrix remains constant for all the markers, FarmCPU adjusts its kinship based on the testing markers and covariates in the fixed effect model. Moreover, since the CMLM only tests one marker at a time, other associated loci nearby or elsewhere in the genome will sometimes disrupt with the test and resulted in false positives or false negatives, especially when the effects of the other loci are large . Therefore, FarmCPU puts selected associated markers as covariates and tests multiple markers simultaneously, thus improving control of both false positives and false negatives . This could be illustrated by comparing the Manhattan plots (Fig. 5) where “spikes” consisting of multiple SNPs in LD are present with CMLM, but only single SNPs, representing the most significant association, appear with FarmCPU. However, FarmCPU has a weakness in that removing the significant SNPs in LD with the peak SNPs, reduces information, and these SNPs in LD help to confirm the existence of an truly associated locus. Considering upsides and downsides, it is beneficial to implement different models to do association studies to increase confidence in the overlapping loci, and to also increase the possibility of identifying unique novel loci.
Examining LD and genes around the significant SNPs
Soybean has a relatively high LD genome-wide, and LD increases extensively in the pericentromere regions [28, 56, 57]. In our study, seven and ten significant SNPs on chromosome 1 and 18 were found to be in megabase-level LD blocks in pericentromeric regions, where genes are sparsely distributed according to the soybean reference genome. The size of the LD blocks could also be affected by the density of the SNP markers. For example, the closest marker next to the LD block from 35,045,463 bp to 36,783,951 bp on chromosome 1 was 1 Mb away, at 34050470 bp, so there were no SNPs indicating where the LD block terminated in this 1 Mb region. A higher density SNP map would help refine the large LD, and perhaps narrow the region of interest.
Looking at the genes that are located in LD with the SNPs, and combining soybean gene expression data after S. sclerotiorum infection, can increase the confidence in identifying candidate SSR defense-associated genes. Two of the more promising genes in LD with the QTL identified on chromosome 1 based on gene expression were Glyma.01G104100 and Glyma.01G106000. Likewise, the most promising genes on chromosome 11 were three that showed the strongest induction in response to S. sclerotiorum infection: Glyma.11G084000, Glyma.11G084200, and Glyma.11G086600. The genes in LD with the significant SNPs on chromosome 18, were weakly differentially expressed, with the strongest ones, like Glyma.18G113400 and Glyma.18G117400, showing about a 2-fold induction.
Of the genes that are in LD with the significant SNPs on chromosome 1, and whose expression were induced by S. sclerotiorum infection, Glyma.01G106000 was one of the most interesting as it was strongly induced over all time points and samples, and also significantly induced after Pseudomonas syringae infection on soybean . Glyma.01G106000 encodes a tau class glutathione S- transferase (GST) protein, which may be involved in the process of xenobiotic detoxification, reduction of organic hydroperoxides, or oxidative protection [59, 60]—all common needs during plant-pathogen interactions. Overexpression of a rice tau-class GST increased the tolerance to salinity and oxidative stress in Arabidopsis thaliana and tobacco, which may be due to the lower accumulation of reactive oxygen species [61, 62]. Another interesting defense-related gene near the QTL on chromosome 1 is Glyma.01G104100 which encodes an isochorismate synthase, the key enzyme in the synthesis of salicylic acid. This gene was not strongly induced by infection, and was actually reducing over time (Fig. 9). However, salicylic acid is a very important regulatory signal in plant microbe interactions [63, 64], especially for the host-related cell-death pathways [65, 58]. Inducing cell death is believed to be a beneficial strategy that necrotrophic pathogens utilize, and S. sclerotiorum has been shown to manipulate host cell death to its advantage . That salicylic acid is known to induce cell death, and that S. sclerotiorum benefits from enhanced cell death, expression of this isochorismate synthase might enhance susceptibility. Interestingly, the allele present in both of the genotypes of the microarray study is predicted to have a negative effect on defense according to the GWAS results.
Looking at the genes within LD of the significant SNPs on chromosome 11, most of the genes seem to be related to primary metabolism. Although not directly associated with defense, primary metabolism may also affect the outcome of plant pathogen interactions, so the involvement of one of these genes on defense cannot be dismissed. Gene Glyma.11G084200 was strongly induced and putatively encodes a GRIP-like protein. GRIP proteins was shown to target the golgi , but their potential function in plant-pathogen interactions is unknown.
The genes within LD of the significant SNPs on chromosome 18 were more weakly differentially expressed than some of those genes discussed on chromosome 1 and 11. Two genes are associated with cell wall modification, Glyma.18G116400 (a probable polygalacturonase) and Glyma.18G117100 (a cellulose synthase). Although their expression is not significantly affected by S. sclerotiorum infection, plant cell wall degrading enzymes play an important role in SSR disease . Another gene of interest within this region on chromosome 18 is Glyma.18G113400, which encodes a putative Myb transcription factor, increased in expression after S. sclerotiorum infection along the time course, from 12 to 36 h (Fig. 9). It was shown that Myb transcription factors can be involved in various biological processes in plants, including response to biotic stresses [67, 59].
Using different genetic materials and inoculation methods than the other recent SSR GWAS reports, we succeeded to identify new QTL associated with soybean resistance to SSR disease caused by S. sclerotiorum. The study identified four or six SSR resistance QTL, depending on GWAS computational analysis performed. Three of the QTL (on chromosomes 1, 11, and 18) were detected by both methods, giving more confidence that they are signicantly associated with soybean defense to SSR disease. Looking at the genes within the LD blocks of the significant QTL allowed for prediction of candidate defense-associated genes responsible for the enhanced resistance. One of the most interesting genes in these LD blocks was Glyma.01G104100, which encodes an isochorismate synthase, and could play a role in regulating host cell death pathways. Further experimentation will be needed to verify the relevance of the identified SNPs and putative gene functions in the soybean- S. sclerotiorum interaction.
Fixed and random model Circulating Probability Unification
Genome Association & Prediction Integrated Tool
Genotype by Sequencing
Genome Wide Association Study
Quantitative Trait Locus/Loci
Rincker K, Nelson R, Specht J, Sleper D, Cary T, Cianzio SR, Casteel S, Conley S, Chen P, Davis V, et al. Genetic improvement of U.S. soybean in maturity groups II, III, and IV. Crop Sci. 2014;0(0):0.
Sinclair JB, Hartman GL. Sclerotinia stem rot. In: Hartman GL, Rupe JC, Sikora EJ, Domier LL, Davis JA, Steffey KL, editors. Compendium of soybean diseases and pests, 5th edn. St. Paul: American Phytopathological Society; 2015. p. 59–62.
Wrather JA, Koenning SR. Estimates of disease effects on soybean yields in the United States 2003 to 2005. J Nematol. 2006;38:173–80.
Grau CR, Radre VL, Gillespie FL. Resistance of soybean cultivars to Sclerotinia sclerotiorum. Plant Dis. 1982;66:506–8.
Boland GJ, Hall R. Evaluating soybean cultivars for resistance to Sclerotinia sclerotiorum under field conditions. Plant Dis. 1987;71:934–6.
Kim HS, Sneller CH, Diers BW. Evaluation of soybean cultivars for resistance to Sclerotinia stem rot in field environments. Crop Sci. 1999;39:64–8.
Kim HS, Hartman GL, Manandhar JB, Graef GL, Steadman JR, Diers BW. Reaction of soybean cultivars to Sclerotinia stem rot in field, greenhouse, and laboratory evaluations. Crop Sci. 2000;40:665–9.
Vuong TD, Hartman GL. Evaluation of soybean resistance to Sclerotinia stem rot using reciprocal grafting. Plant Dis. 2003;87:154–8.
Vuong TD, Diers BW, Hartman GL. Identification of QTL for resistance to Sclerotinia stem rot in soybean plant introduction 194639. Crop Sci. 2008;48(6):2209.
Garcia RA, Juliatti FC. Avaliação da resistência da soja a Sclerotinia sclerotiorum em diferentes estádios fenológicos e períodos de exposição ao inóculo. Tropical Plant Pathology. 2012;37(3):196–203.
Castro LHS, Figueiro AA, Nogueira APO, Clough SJ, Fernando CJ. Resistance of soybean genotypes to Sclerotinia sclerotiorum isolates in different incubation environments. Genet Mol Res. 2016;15(4):gmr15049061.
Calla B, Blahut-Beatty L, Koziol L, Zhang Y, Neece DJ, Carbajulca D, Garcia A, Simmonds DH, Clough SJ. Genomic evaluation of oxalate-degrading transgenic soybean in response to Sclerotinia sclerotiorum infection. Mol Plant Pathol. 2014;15(6):563–75.
Calla B, Vuong T, Radwan O, Hartman GL, Clough SJ. Gene expression profiling soybean stem tissue early response to Sclerotinia sclerotiorum and in silico mapping in relation to resistance markers. Plant Genome. 2009;2(2):149–66.
Kabbage M, Yarden O, Dickman MB. Pathogenic attributes of Sclerotinia sclerotiorum: switching from a biotrophic to necrotrophic lifestyle. Plant Sci. 2015;233:53–60.
Jamir Y, Guo M, H-S O, Petnicki-Ocwieja T, Chen S, Tang X, Dickman MB, Collmer A, Alfano JR. Identification of Pseudomonas syringae type III effectors that can suppress programmed cell death in plants and yeast. Plant J. 2004;37:554–65.
Cessna SG, Sears VE, Dickman MB, Low PS. Oxalic acid, a pathogenicity factor for Sclerotinia sclerotiorum, suppresses the oxidative burst of the host plant. Plant Cell. 2000;12:2191–9.
Wei W, Clough SJ. Sclerotinia sclerotiorum molecular aspects in plantpathogenic interactions. In: Dalio RJD, editor. RAPP Revisão Anual de Patologia de Plantas. vol. 24. Brasilia: Sociedade Brasileira de Fitopatologia; 2016. p. 174–89.
Kim HS, Diers BW. Inheritance of partial resistance to Sclerotinia stem rot in soybean. Crop Sci. 2000;40:55–61.
Hoffman DD, Diers BW, Hartman GL, Nickell CD, Nelson RL, Pedersen WL, Cober ER, Graef GL, Steadman JR, Grau CR, et al. Selected soybean plant introductions with partial resistance to Sclerotinia sclerotiorum. Plant Dis. 2002;86:971–80.
Arahana VS, Graef GL, Specht JE, Steadman JR, Eskridge KM. Identification of QTLs for resistance to Sclerotinia sclerotiorum in soybean. Crop Sci. 2001;41:180–8.
Guo X, Wang D, Gordon SG, Helliwell E, Smith T, Berry SA, St. Martin SK. Dorrance AE: genetic mapping of QTLs underlying partial resistance to in soybean PI 391589A and PI 391589B. Crop Sci. 2008;48(3):1129.
Bernardo R, Yu J. Prospects for Genomewide selection for quantitative traits in maize. Crop Sci. 2007;47(3):1082.
Faleiro FG. Aplicaçòes de marcadoresmoleculares como ferramenta auxiliar em programas de conservaçào, caracterizaçào, e uso de germplasma e melhoramento genético vegetal. In: Faleiro FG, Andrade SRM, Planaltina SFBR, editors. Biotecnologia - estado da arte e aplicaçòes na agricultura. Empraba Cerrados: DF, Brasil; 2011. p. 730.
Elshire RJ, Glaubitz JC, Sun Q, Poland JA, Kawamoto K, Buckler ES, Mitchell SE. A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PLoS One. 2011;6(5):e19379.
Poland JA, Brown PJ, Sorrells ME, Jannink JL. Development of high-density genetic maps for barley and wheat using a novel two-enzyme genotyping-by-sequencing approach. PLoS One. 2012;7(2):e32253.
Meuwissen THE, Hayes BJ, Goddard ME. Prediction of total genetic value using genome-wide dense marker maps. Genetics. 2001;157:1819–29.
Myles S, Peiffer J, Brown PJ, Ersoz ES, Zhang Z, Costich DE, Buckler ES. Association mapping: critical considerations shift from genotyping to experimental design. Plant Cell. 2009;21(8):2194–202.
Iquira E, Sonah H, Belzile F. Association mapping of QTLs for sclerotinia stem rot resistance in a collection of soybean plant introductions using a genotyping by sequencing (GBS) approach. BMC Plant Biol. 2015;15:5.
Bastien M, Sonah H, Belzile F. Genome wide association mapping of resistance in soybean with a genotyping-by-sequencing approach. The Plant Genome. 2014;7(1):0.
Wegulo SN, Yang XB, Martinson CA. Soybean cultivar responses to Sclerotinia sclerotiorum in field and controlled environment studies. Plant Dis. 1998;82:1264–70.
Zhao X, Han Y, Li Y, Liu D, Sun M, Zhao Y, Lv C, Li D, Yang Z, Huang L, et al. Loci and candidate gene identification for resistance to Sclerotinia sclerotiorum in soybean (Glycine max L. Merr.) via association and linkage maps. Plant J. 2015;82(2):245–55.
Lipka AE, Tian F, Wang Q, Peiffer J, Li M, Bradbury PJ, Gore MA, Buckler ES, Zhang Z. GAPIT: genome association and prediction integrated tool. Bioinformatics. 2012;28(18):2397–9.
Liu X, Huang M, Fan B, Buckler ES, Zhang Z. Iterative usage of fixed and random effect models for powerful and efficient genome-wide association studies. PLoS Genet. 2016;12(2):e1005767.
Kull LS, Vuong TD, Powers KS, Eskridge KM, Steadman JR, Hartman GL. Evaluation of resistance screening methods for Sclerotinia stem rot of soybean and dry bean. Plant Dis. 2003;87:1471–6.
Petzoldt R, Dickman M. Straw test for resistance to white mold in beans. Annual report of the Bean Improvement Cooperative. 1996;39:142–3.
Dellaporta SL, Wood J, Hicks JB. A plant DNA minipreparation: version II. Plant Mol Biol Report. 1983;1:19–21.
Glaubitz JC, Casstevens TM, Lu F, Harriman J, Elshire RJ, Sun Q, Buckler ES. TASSEL-GBS: a high capacity genotyping by sequencing analysis pipeline. PLoS One. 2014;9(2):e90346.
Schmutz J, Cannon SB, Schlueter J, Ma J, Mitros T, Nelson W, Hyten DL, Song Q, Thelen JJ, Cheng J, et al. Genome sequence of the palaeopolyploid soybean. Nature. 2010;463(7278):178–83.
Li H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. In: arXiv:13033997v2; 2013.
Browning BL, Browning SR. Genotype imputation with millions of reference samples. Am J Hum Genet. 2016;98(1):116–26.
Barrett JC, Fry B, Maller J, Daly MJ. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics. 2005;21(2):263–5.
Gabriel SB, Schaffner SF, Nguyen H, Moore JM, Roy J, Blumenstiel B, Higgins J, DeFelice M, Lochner A, Faggert M, et al. The structure of haplotype blocks in the human genome. Science. 2002;296:2225–9.
Remington DL, Thornsberry JM, Matsuoka Y, Wilson LM, Whitt SR, Doebley J, Kresovich S, Goodman MM, Buckler ES. Structure of linkage disequilibrium and phenotypic associations in the maize genome. Proc Natl Acad Sci U S A. 2001;98:11479–84.
Weir BS, Hill WG. Am J Hum Genet. 1986;38:776–81.
Phytozome [https://phytozome.jgi.doe.gov/pz/portal.html]. Accessed Apr 2017.
Blastx at NCBI [https://blast.ncbi.nlm.nih.gov/Blast.cgi]. Accessed Apr 2017.
SoyBase and the Soybean Breeder's Toolbox [https://www.soybase.org/search/index.php]. Accessed Apr 2017.
Wallace JG, Bradbury PJ, Zhang N, Gibon Y, Stitt M, Buckler ES. Association mapping across numerous traits reveals patterns of functional variation in maize. PLoS Genet. 2014;10(12):e1004845.
Lipka AE, Gore MA, Magallanes-Lundback M, Mesberg A, Lin H, Tiede T, Chen C, Buell CR, Buckler ES, Rocheford T, et al. Genome-wide association study and pathway-level analysis of tocochromanol levels in maize grain. G3 (Bethesda). 2013;3(8):1287–99.
Wei L, Jian H, Lu K, Filardo F, Yin N, Liu L, Qu C, Li W, Du H, Li J. Genome-wide association analysis and differential expression analysis of resistance to Sclerotinia stem rot in Brassica Napus. Plant Biotechnol J. 2016;14(6):1368–80.
Chang HX, Brown PJ, Lipka AE, Domier LL, Hartman GL. Genome-wide association and genomic prediction identifies associated loci and predicts the sensitivity of tobacco ringspot virus in soybean plant introductions. BMC Genomics. 2016;17:153.
Crowell S, Korniliev P, Falcao A, Ismail A, Gregorio G, Mezey J, McCouch S. Genome-wide association and high-resolution phenotyping link Oryza Sativa panicle traits to numerous trait-specific QTL clusters. Nat Commun. 2016;7:10527.
Zhang Z, Ersoz E, Lai CQ, Todhunter RJ, Tiwari HK, Gore MA, Bradbury PJ, Yu J, Arnett DK, Ordovas JM, et al. Mixed linear model approach adapted for genome-wide association studies. Nat Genet. 2010;42(4):355–60.
Segura V, Vilhjalmsson BJ, Platt A, Korte A, Seren U, Long Q, Nordborg M. An efficient multi-locus mixed-model approach for genome-wide association studies in structured populations. Nat Genet. 2012;44(7):825–30.
Zhou Z, Jiang Y, Wang Z, Gou Z, Lyu J, Li W, Yu Y, Shu L, Zhao Y, Ma Y, et al. Resequencing 302 wild and cultivated accessions identifies genes related to domestication and improvement in soybean. Nat Biotechnol. 2015;33(4):408–14.
Hyten DL, Choi IY, Song Q, Shoemaker RC, Nelson RL, Costa JM, Specht JE, Cregan PB. Highly variable patterns of linkage disequilibrium in multiple soybean populations. Genetics. 2007;175(4):1937–44.
Spoel SH, Dong X. How do plants achieve immunity? Defence without specialized immune cells. Nat Rev Immunol. 2012;12(2):89–100.
Ambawat S, Sharma P, Yadav NR, Yadav RC. MYB transcription factor genes as regulators for plant responses: an overview. Physiol Mol Biol Plants. 2013;19(3):307–21.
Kilili KG, Atanassova N, Vardanyan A, Clatot N, Al-Sabarna K, Kanellopoulos PN, Makris AM, Kampranis SC. Differential roles of tau class glutathione S-transferases in oxidative stress. J Biol Chem. 2004;279(23):24540–51.
Jha B, Sharma A, Mishra A. Expression of SbGSTU (tau class glutathione S-transferase) gene isolated from Salicornia brachiata in tobacco for salt tolerance. Mol Biol Rep. 2011;38:4823. doi:10.1007/s11033-11010-10625-x.
Sharma R, Sahoo A, Devendran R, Jain M. Over-expression of a rice tau class glutathione s-transferase gene improves tolerance to salinity and oxidative stresses in Arabidopsis. Plos One. 2014;9(3):e92900.
Wildermuth MC, Dewdney J, Wu C, Ausubel FM. Isochorismate synthase is required to synthesize salicylic acid for plant defence. Nature. 2001;414:562–5.
Klessig DF, Durner J, Noad R, Navarre DA, Wendehenne D, Kumar D, Zhou JM, Shah J, Zhang S, Kachroo P, et al. Nitric oxide and salicylic acid signaling in plant defense. Proc Natl Acad Sci U S A. 2000;97(16):8849–55.
Mazel A, Levine A. Induction of cell death in arabidopsis by superoxide in combination with salicylic acid or with protein synthesis inhibitors. Free Radic Biol Med. 2001;30:98–106.
Stefano G, Renna L, Hanton SL, Chatre L, Haas TA, Brandizzi F. ARL1 plays a role in the binding of the GRIP domain of a peripheral matrix protein to the Golgi apparatus in plant cells. Plant Mol Biol. 2006;61(3):431–49.
Raffaele S, Rivas S. Regulate and be regulated: integration of defense and other signals by the AtMYB30 transcription factor. Front Plant Sci. 2013;4:98.
Authors would like to thank Larissa Miguel for assisting with some of the plant care, inoculations, and DNA extractions; and to thank University of Illinos professors Patrick Brown and Alexander Lipka, and Ph.D. graduate Marcio Pais de Arruda, for valuable discussions related to GBS and GWAS analyses. Figure 1b kindly provided by Sufei Wang, PhD candidate, University of Illinois. We would like to thank TMG (Tropical Melhoramento & Genética, Cambé, PR, Brazil) and FT-Sementes (Ponta Grossa, PR, Brazil) for providing soybean seeds from their breeding programs, as well as Marcelo Oliveira, curator of the Embrapa Soja (Londrina, PR, Brazil) soybean germplasm collection, for providing the additional genotypes that were requested.
The following sources helped to finance this research: US Department of Agriculture (USDA) National Sclerotinia Initiative, and USDA CRIS to SJC, as well as Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), and Fundação de Amparo à Pesquisa do Estado de Minas Gerais (FAPEMIG) to FCJ. Funds from all agencies were utilized to enable the research to be conducted, for the data to be obtained and processed, as well as for data interpretation and manuscript writing.
Availability of data and materials
Data generated or analysed during this study are included in this published article and its supplementary information files, with details of all genotypes, including GBS sequences, deposited at NCBI as BioProject PRJNA396424.
Mention of trade names or commercial products in this publication is solely for the purpose of providing specific information and does not imply recommendation or endorsement by the U.S. Department of Agriculture, the University of Illinois, nor the Universidade Federal de Uberlândia.
Ethics approval and consent to participate
No exotic or endangered plant materials were used, only the common soybean, which originated in Asia, and now widely cultivated across the globe. Soybean genotypes used in the study were a mixture of publically available ones from the USDA Soybean Germplasm Collection, Urbana, IL, USA, the Soybean Germplasm Collection of Embrapa Soja, Londrina, Brazil, or from the Universidade Federal de Uberlandia, Uberlandia, Brazil; additionally, some seed were freely obtained for research purposes from Tropical Melhoramento & Genética, Cambé, Brazil, or from FT Sementes, Ponta Grossa, Brazil.
Consent for publication
"The authors declare that they have no competing interests"
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Testing Restriction_Enzymes. (DOCX 508 kb)
Scree plot of eigen values to determine number of principle components for the GWAS models. (PDF 8 kb)
Distribution of 324 phenotypes. (PDF 99 kb)
Phenotypes of all 324 genotypes used. (XLSX 31 kb)
Top resistant genotypes prior to filtering based on SNPs. (DOCX 87 kb)
Top susceptible genotypes prior to filtering based on SNPs. (DOCX 86 kb)
All genes in the LD blocks containing the significant SNPs on chromosome 1, 11 and 18. (XLSX 45 kb)