Identification of two major QTLs for pod shell thickness in peanut (Arachis hypogaea L.) using BSA-seq analysis

Background Pod shell thickness (PST) is an important agronomic trait of peanut because it affects the ability of shells to resist pest infestations and pathogen attacks, while also influencing the peanut shelling process. However, very few studies have explored the genetic basis of PST. Results An F2 segregating population derived from a cross between the thick-shelled cultivar Yueyou 18 (YY18) and the thin-shelled cultivar Weihua 8 (WH8) was used to identify the quantitative trait loci (QTLs) for PST. On the basis of a bulked segregant analysis sequencing (BSA-seq), four QTLs were preliminarily mapped to chromosomes 3, 8, 13, and 18. Using the genome resequencing data of YY18 and WH8, 22 kompetitive allele-specific PCR (KASP) markers were designed for the genotyping of the F2 population. Two major QTLs (qPSTA08 and qPSTA18) were identified and finely mapped, with qPSTA08 detected on chromosome 8 (0.69-Mb physical genomic region) and qPSTA18 detected on chromosome 18 (0.15-Mb physical genomic region). Moreover, qPSTA08 and qPSTA18 explained 31.1–32.3% and 16.7–16.8% of the phenotypic variation, respectively. Fifteen genes were detected in the two candidate regions, including three genes with nonsynonymous mutations in the exon region. Two molecular markers (Tif2_A08_31713024 and Tif2_A18_7198124) that were developed for the two major QTL regions effectively distinguished between thick-shelled and thin-shelled materials. Subsequently, the two markers were validated in four F2:3 lines selected. Conclusions The QTLs identified and molecular markers developed in this study may lay the foundation for breeding cultivars with a shell thickness suitable for mechanized peanut shelling. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-024-10005-x.

Pod shell thickness (PST), which is calculated as the distance between the exocarp and the mesocarp of peanut, is an important peanut shell trait that affects peanut processing and resistance to pest infestations [3].Thickshelled peanuts are suitable for storage, but breaking and removing their shells can be difficult (i.e., decreased shelling efficiency), whereas thin-shelled peanuts are appropriate for shelling, but they are susceptible to pest infestations and their seeds may be damaged during shelling [4,5].Therefore, optimizing PST is conducive to maximizing the mechanized shelling efficiency and peanut quality, while also improving peanut storage and processing, potentially leading to increases in the economic value of peanuts.
A previous study revealed that PST is a complex quantitative trait with a normal distribution [3].Pod shell thickness is typically determined by measuring the thickness of the pod waist [6], the ridge of the pod posterior chamber [7,8], the pod stalk [9], and the most convex part of the posterior chamber [3] using a digital vernier caliper.Guo et al. were the first to use vernier calipers to measure the mature pod waist and calculate the shell thickness.Moreover, they used 204 chromosome segment substitution lines obtained from a cross between the recurrent parent Qinhuangdaoguangyang and the donor parent Shiyaodou to identify QTLs for PST, among which two QTLs were detected on chromosomes 2 and 15, with phenotypic variance explained (PVE) values of 7.65% and 9.00%, respectively [6].Li et al. used a recombinant inbred line (RIL) population comprising 151 lines derived from 79,266 and D893 to identify QTLs for PST; 14 QTLs were detected in seven environments (PVE value of 6.90-23.16%),while qPST12 on chromosome 12 was detected in three environments (PVE value greater than 20%) [7].Liu et al. analyzed a RIL population consisting of 441 lines derived from Shanhua15 and Zhon-ghua12 and identified four QTLs on chromosomes 7, 8, 13, and 17 (PVE value of 1.75-3.22%)[8].Yang et al. examined the 267 lines of a RIL population derived from Yueyou92 and Xinhuixiaoli and detected three QTLs on chromosomes 4, 7, and 13 (PVE of 7.36-11.61%)[9].In a recent study, one stable major QTL for PST (qAHPS07) was finely mapped to a 36.46-kbphysical interval on chromosome 7 [10].However, PST measurements are generally imprecise and the complex genetic mechanisms underlying this trait remain relatively uncharacterized.
A bulked segregant analysis (BSA), which involves the pooling of samples with extreme phenotypes, may be useful for cost-effective and highly efficient marker-based screening or QTL mapping [11].During the last decade, advances in next-generation sequencing technologies and decreases in sequencing costs have enabled researchers to gradually develop and improve BSA sequencing (BSAseq) methods and technical systems for analyses of multiple crops, including maize [12], soybean [13], rice [14], and wheat [15].They have also been used in studies on various peanut traits, including testa color [16], branch number [17], sucrose content [18], and pod size [10].
Four kinds of statistical algorithms can be used to identify the single nucleotide polymorphisms (SNPs) or genomic regions associated with target traits by BSA-seq analysis.The SNP-index is a classical algorithm.Specifically, Δ(SNP-index) is used to calculate the difference in the genotype frequency between mixed pools.A strong association between the genomic region and the target trait is reflected by a Δ(SNP-index) value close to 1 [19].The Euclidean distance (ED) algorithm is used to identify the SNP sites that differ significantly between two mixed pools as well as to evaluate the regions associated with traits.Theoretically, in the two mixed pools constructed for a BSA, all loci except for those related to the target trait tend to be the same.Thus, the ED of non-target loci should be close to 0. The magnitude of ED represents the degree of the difference in the markers between the two mixed pools [20].The G value is a modified statistic obtained after smoothing the G statistic, which is useful for mapping a relatively narrow region.The G value of each SNP is calculated on the basis of the allele sequencing depth, and is weighted according to the physical distance of the adjacent SNP [21].Fisher's exact tests were used to calculate P values at each variant position to generate a P value-based plot corresponding to the genomic position [22].
Kompetitive allele-specific PCR (KASP) is a fluorescence-based genotyping technology that enables the accurate detection of bi-allelic SNP and insertiondeletion (InDel) sequences at specific loci in complex genomic DNA samples [23].It is potentially very useful for improving crop traits because of its high flux and strong operability [24].In addition, BSA-seq may be combined with KASP marker-based fine mapping for the rapid and efficient identification of QTLs linked to target traits.This combined approach has been widely applied to identify QTLs for agronomic traits in diverse crops, including plant height in rice [25], powdery mildew resistance in melon [26], kernel length in wheat [27], and sucrose content in peanut [18].
In this study, a new method for calculating PST was proposed and then BSA-seq and fine mapping were used to identify QTLs for PST in an F 2 population obtained from a cross between Yueyou 18 (YY18) and Weihua 8 (WH8).The results of this study may be used to further clarify the genetic mechanism underlying PST, while also providing the theoretical basis for developing relevant molecular markers and cloning important genes.

Phenotypic identification of the two parents and the F 2 population
The phenotypic analysis revealed significant differences in PST between YY18 and WH8 (Fig. 1B).The continuous variation in the PST of 410 selected individuals was consistent with an approximately normal distribution (Fig. 1C, Table S1), with maximum and minimum values of 1.96 and 0.36 mm, respectively, and a coefficient of variation of 34.7% (Table S2).The correlation analysis of 350 materials indicated PST was significantly positively associated with the pod area (PA), pod perimeter (PP), pod length (PL), and pod width (PW) (Fig. 1D).The broad-sense heritability of the pod shell thickness, pod area, pod perimeter, pod width and pod height was 0.9801, 0.9213, 0.9571,0.8746and 0.9136, respectively.One-way analysis of variance showed that PST was reproducible in one material and varied significantly in different materials (Table S3).

BSA-seq analysis of PST
A total of 399.87 Gb raw data were generated by the sequencing platform.After filtering for quality, 398.06 Gb clean data were retained, with an average of 99.52 Gb per sample.The reads with Q20 value ≥ 95.40%, Q30 value ≥ 88.15%, and error rate ≤ 0.04 were used for the subsequent variant detection and analysis.The mean genome coverage depth was 43.66×, 42.52×, 85.32×, and 83.03× for YY18, WH8, thick-shelled pool, and thinshelled pool, respectively.Additionally, 99.80%, 98.25%, 98.91%, and 98.82% of the clean reads for YY18, WH8, thick-shelled pool, and thin-shelled pool, respectively, were mapped to the reference genome (Table 1).A total of 351,510 high-quality homozygous SNPs/InDels between the thick-shelled pool and the thin-shelled pool were obtained after removing the low-quality SNPs and InDels as well as the heterozygous sites in the two parents.These SNPs/InDels were used to conduct the association analysis for PST.

KASP marker development and fine mapping for PST
To narrow down the candidate regions, 10, 5, and 7 KASP markers were designed for the candidate regions on chromosomes 8, 13, and 18, respectively (Table S4).The 410 individuals randomly selected from the F 2 population were genotyped using the 22 KASP markers and then the genetic linkage analysis was performed.The results revealed that the genomic region on chromosome 3 detected by BSA-seq was not identified during the linkage analysis, indicative of a false positive locus.The BSA-seq and linkage analysis suggested that three QTLs (qPSTA08, qPSTA13, and qPSTA18) were associated with PST.The major QTL qPSTA08, which was detected on chromosome 8, had PVE and LOD values of 31.3-32.3%and 27.5-28.7,respectively, with a genetic distance of 1.6 cM and a physical distance of 0.69 Mb (31,024,672-31,713,024 bp) (Fig. 3A).In contrast, qPSTA18, which was detected on chromosome 18 in a region highly homologous to qPSTA08, had PVE and LOD values of 16.7-16.8%and 13.5-13.6,respectively, with a genetic distance of 2.83 cM and a physical distance of 0.15 Mb (7,198,347,232 bp) (Fig. 3C).Moreover, the qPSTA18 genome sequence was highly homologous to part of the qPSTA08 genome sequence.The minor QTL (qPSTA13), which was mapped to a 0.96-Mb physical interval (126,828,166-127,788,471 bp) on chromosome 13, had a PVE value of 6.4-6.6% (Fig. 3B; Table 3).
The 410 individuals of the F 2 population and the two parents were grouped according to the genotyping results of two markers (Tif2_A08_31713024 and Tif2_ A18_7198124).For the Tif2_A08_31713024 locus, the YY18 and WH8 genotypes were denoted as AA and aa, respectively.For the Tif2_A18_7198124 locus, the YY18 and WH8 genotypes were denoted as bb and BB, respectively.Therefore, there were four possible genotypes (AABB, AAbb, aaBB, and aabb) at the two loci.The lines with the AABB genotype had a thick shell, whereas the lines with the aabb genotype had a thin shell.The multiple comparisons test results indicated that the difference in PST between the two genotypes was significant, implying that the genotypes at these two loci were highly correlated with PST (Fig. 4).

KASP marker validation
After genotyping all the F 2 individuals using the two markers Tif2_A08_31713024 and Tif2_A18_7198124, four individuals were selected from F 2 to self-cross in May 2023, of which two lines genotyped aaBb, were homozygous at the A08_31713024, while heterozygous at the A18_7198124, two lines genotyped AaBB, were homozygous at the A08_31713024, while heterozygous at the A18_7198124.Linkage analysis of the pod shell thickness involving two loci in four F 2:3 lines showed that the two markers were associated with PST significantly (Fig. 5).

Candidate gene annotation
A total of 25 genes were predicted in the candidate regions.According to the resequencing data for the two parents, these genes contained SNPs/InDels.Among these genes, 10, 5, and 10 were detected in the qPSTA08, qPSTA18, and qPSTA13 regions, respectively (Table S5).Eight genes had SNPs in the exon regions, resulting in one codon mutation, one frameshift mutation, two synonymous mutations, one missense mutation and three in the untranslated region.Three genes with a nonsynonymous mutation or in the untranslated regions in the two    major QTLs (qPSTA08 and qPSTA18), namely Arahy.R07MUD, Arahy.X1RCBJ, and Arahy.QT7BNH, were designated as candidate genes (Table 4).

Discussion
Accurate method for measuring mature peanut pod shell thickness Previous studies on peanut shells mainly focused on chemical compositions, with relatively little research on the genetic basis of shell traits, especially thickness.Pod shell thickness is usually determined by measuring a fixed position of the shell using a digital vernier caliper.However, because the pod shell has an irregular shape and an uneven thickness, measuring PST at a specific position is inappropriate.Moreover, the shell may deform to some extent if it is examined using vernier calipers and the shell positions used for measuring PST cannot be located precisely.Accordingly, traditional PST measurements may be inaccurate.Therefore, an algorithm was developed to calculate the thickness of the whole pod shell.Specifically, one point on the exocarp was selected and then the closest point on the mesocarp was detected using the algorithm.The distance between the two points was the thickness at that point.After measuring all sampling points on the outer shell, the average value was calculated to determine PST [28].The number of sampling points (approximately 2,000) was positively correlated with PP.The code of MATLAB algorithm was in supplementary text.Finally, this method resulted in a PST with good repeatability, which contributed to the identification of two major QTLs for PST.Hence, the method developed in this study can accurately measure PST in peanut, with potential implications for measuring the shell thickness of other crops.

QTLs for traits that are positively or negatively correlated tended to co-localize in the same or adjacent genomic regions
Earlier research indicated PST is strongly positively correlated with PA, PL, and PW [10].In the current study, the correlation analysis of 350 individuals revealed the highly positive correlations among PST, PA, PL, and PW, but these traits were only weakly positively correlated with PP.Notably, for the two mixed pools, the pods with thick shells were significantly bigger than the pods with thin shells (Figure S2).Therefore, using 22 KASP markers, eight QTLs related to PA, PP, PW, and PL were detected in the same or adjacent regions as the QTLs for PST (Table S6).The QTL (qSPA08) for shelling percentage was detected in a region adjacent to qPSTA08 on chromosome 8 [29].Besides, Khedikar et al. also identified the QTL for shelling percentage at 43.51 cM within TC6H03-GM1760 on chromosome A08, which may overlap with qPSTA08 [30].Furthermore, the PA, PP, PW, and PL values were significantly higher for the samples with the AABB genotype than for the samples with the aabb genotype (Figure S3), suggesting the two major QTLs may be associated with an increase in peanut yield.
In previous study, a QTL associated with pod weight and size on chromosome A08 was identified in three consecutive years with 3.33-5.58%PVE [31], which had an overlap with qPSTA08.Miao et al. identified QTLs related to pod and seed traits on chromosomes A04, A08, B04, B05, B06 and B08, of which B08 was a major QTL, and two QTL clusters were found on A08 [32,33].

Genetic mechanism controlling shell thickness in other crops
The peanut shell develops from the ovary wall and PST may be mainly related to the secondary cell wall (SCW), which is between the plasma membrane and the primary cell wall and is mainly composed of cellulose, hemicellulose, and lignin.Some genes controlling shell thickness in other crops have been identified.In walnut, JrPXC1 encodes a leucine-rich repeat protein kinase that affects SCW synthesis and functions in xylem fibers [34,35].The ortholog of PXCL in soybean (GmLPK1) is also involved in the development of the cell wall architecture.In an earlier study involving sequencing-based homozygosity mapping, Singh et al. identified the gene controlling oil palm seed shell thickness (Shell) and determined it is a homolog of the MADS-box gene STK that controls ovary formation and seed development in Arabidopsis [36].A homologous gene in peach contributes to lignified splitpit formation [37].In pumpkin, CpKST1 with a natural mutation controls the hull-less trait by arresting SCW biosynthesis [38].Other studies indicated NST1 influences anther dehiscence and pod shattering by regulating SCW synthesis in Arabidopsis [39,40].

Analysis of candidate genes controlling pod shell thickness in peanut
In the current study, three genes with nonsynonymous mutations in the two major QTLs qPSTA08 and qPSTA18, namely Arahy.R07MUD, Arahy.X1RCBJ, and Arahy.QT7BNH, were predicted as the candidate genes.Of these genes, Arahy.R07MUD encodes a serine/threonine protein phosphatase, which is an enzyme that catalyzes the dephosphorylation of the phosphoserine/ phosphothreonine side chains of proteins.Serine/threonine protein phosphatases have crucial roles in signal transduction pathways (e.g., MAPK cascades), while also regulating cell cycle-related cell growth and metabolism as well as stress responses and defense activities [41,42].In contrast, Arahy.X1RCBJ encodes a ubiquitin carboxylterminal hydrolase (UCH), which can specifically cleave ubiquitin from proteins during ubiquitin-associated proteasome degradation to facilitate ubiquitin reuse and recycling.The UCHs are a subclass of the deubiquitylating enzymes, which are essential for normal cell metabolism, growth, and development and are involved in important biological processes, including DNA repair and cell apoptosis [43,44].The Arabidopsis uch1 and uch2 mutants reportedly have altered shoot architecture because the corresponding UCHs directly affect auxin signaling [45].Finally, Arahy.QT7BNH encodes fasciclinlike arabinogalactan protein 16 (FLA16), which helps mediate plant growth and development and cell wall synthesis.Many studies have demonstrated that FLAs are related to cell wall biosynthesis [46][47][48][49][50]. Previous research indicated FLA16 is mainly expressed in inflorescence tissue cells with secondary cell walls, with the encoded protein playing an important role in plant SCW synthesis and function [51].Thus, the candidate gene regions will need to be more finely mapped and the candidate genes should be functionally characterized.

Conclusion
Two major QTLs (qPSTA08 and qPSTA18) for PST were identified on the basis of a BSA-seq analysis and fine mapping.Two molecular markers (Tif2_A08_31713024 and Tif2_A18_7198124) were developed and revealed to be closely linked to PST.The three candidate genes (Arahy.R07MUD, Arahy.X1RCBJ, and Arahy.QT7BNH) were detected in QTL regions.

Plant materials
An F 2 segregating population consisting of 1,153 individuals was derived from a cross between YY18 (female parent) and WH8 (male parent).Specifically, YY18 is a Spanish-type cultivar that was released in 2015 by the Crops Research Institute, Guangdong Academy of Agricultural Sciences, whereas WH8 is a Virginia-type cultivar that was released in 2003 by the Weifang Academy of Agricultural Sciences.The PST of YY18 (1.17 mm) is greater than that of WH8 (0.60 mm) (Fig. 1A).

Field trials and phenotyping
The F 2 population and the two parents were grown in rows at the experimental farm of Henan Academy of Agricultural Sciences, Xinxiang (Henan province) in May 2022, and the F2:3 lines in May 2023, with 0.2 m between plants and 0.4 m between rows.Routine field management practices for peanut production were employed according to local conditions.Six full and uniform pods were collected from each plant and peeled along the middlemost ridge.The peanut shells were scanned using the Microtek ScanMaker i600 system.Next, the peanut shell exocarp and mesocarp were cut out using Lasso Tool (Photoshop 2020) [52] and a MATLAB algorithm was developed to calculate the average thickness of a whole pod shell (Figure S1).Additionally, PA, PP, PL, and PW were measured using the Image J Fiji software [53].

Statistical analysis of the phenotypic data
The descriptive statistics analysis, correlation analysis, one-way analysis of viarance, Student's t-test, and multiple comparisons test were performed using the SPSS (version 26.0) software.The frequency distributions of the pod-related traits were visualized using Origin 2022.Histograms and boxplots were created using GraphPad Prism 9.5.The broad-sense heritability was estimated with QTL Icimapping [54].

Bulk construction, whole-genome resequencing, and SNP calling
Following a visual evaluation, 100 thick-shelled individuals and 100 thin-shelled individuals were selected from 1,153 lines and then analyzed using the MATLAB algorithm.A total of 60 extremely thick-shelled and 60 extremely thin-shelled individuals were screened for the BSA-seq analysis (Fig. 1E).Young leaves were collected from the 120 samples with extreme phenotypes and the two parents and then genomic DNA was extracted using the Plant Genomic DNA Kit (DP305-03; Tiangen).Equal amounts of the genomic DNA extracted from extremephenotype materials were mixed to generate the thickshelled and thin-shelled genomic DNA pools.The two mixed pools and the parental genomic DNA were used for the whole-genome resequencing, which was completed using the Illumina HiSeq™ 2000/MiSeq™ platform to generate 150-bp paired-end reads.The sequencing depths of the mixed pools and the parental genomic DNA were 60× and 30×, respectively.
A CASAVA Base Calling analysis was performed to transform the raw image data obtained from the highthroughput sequencing into raw sequencing reads.During the quality control step, the reads with more than 10% unknown bases or more than 50% low-quality bases (Q ≤ 5) were eliminated [55,56].The clean reads were aligned to the Arachis hypogaea cv.Tifrunner (version 2) reference genome (https://www.peanutbase.org/).The GATK software [57] was used to detect variants (SNPs and InDels), whereas the SnpEff software [58] was used to annotate variants and predict variant effects.

BSA-seq analysis
In this study, Δ(SNP-index), ED, G statistics, and P values were used to identify the SNPs associated with PST, with a 2-Mb sliding window and a step size of 10 kb (https:// github.com/xiekunwhy/bsa).A 99% confidence level was selected as the threshold.The overlapping intervals among the four methods were considered as the candidate QTL regions associated with PST.The genes in the candidate intervals were annotated using an online resource (https://www.peanutbase.org/).

KASP marker development, QTL validation, and fine mapping
To narrow the candidate intervals, 410 materials comprising 120 extreme-phenotype individuals and 290 randomly selected individuals from the F 2 population were used for data validation and fine mapping.As previously described, the resequencing data of the two parents were used to convert the polymorphic SNPs in the candidate intervals to KASP markers [59].For each KASP marker, two allele-specific forward primers and one common reverse primer were designed according to the 200 bp upstream and downstream genomic sequences flanking the genic SNPs [60].
Joinmap (version 5.0) [61] was used to construct the genetic linkage map on the basis of independent LOD scores ranging from 2 to 28.Loci orders were determined using the maximum likelihood algorithm and Kosambi's mapping function.In addition, multiple QTL mapping (MQM) of MapQTL 6 [62] was used for analyzing QTLs.The LOD threshold was set as 2.5, with a mapping step size of 0.1 cM.The QTL regions were drawn using Map-Chart 2.3 [63].

Fig. 1
Fig. 1 Phenotypic analysis of the two parents and the F 2 population.(A) Phenotypes of the mature pod shells from YY18 and WH8.Bar = 1 cm.(B) Phenotypic difference between YY18 and WH8 in pod thickness.(C) Frequency distribution of PST in the 410 individuals of the F 2 population.(D) Correlation analysis of the pod size-related traits in the 350 individuals of the F 2 population.(E) Data for the pod shell thickness of each F 2 individual selected to construct the mixed bulks.**, P < 0.01; ***, P < 0.001

Fig. 2
Fig. 2 QTL analysis of the pod shell thickness based on BSA-seq.Manhattan plots were generated to present the squared ED (A) and the distribution of Δ(SNP-index) (B), the distribution of G values (C), and the distribution of -log 10 (P values) according to Fisher's exact test (D) on chromosomes.The green/ blue and red lines represent 95% and 99% confidence intervals, respectively.The black lines indicate the average values for the four algorithms based on the sliding windows analysis.The significant genomic regions related to PST are indicated by red rectangles

Fig. 3
Fig. 3 Fine mapping of QTLs related to pod shell thickness.Fine mapping of QTLs related to pod shell thickness on chromosomes 8 (A), 13 (B), and 18 (C).The identified QTLs are indicated by gray boxes

Fig. 5 Fig. 4
Fig.5 Linkage analysis of the pod shell thickness involving Tif2_A08_31713024 and Tif2_A18_7198124 in four F 2:3 lines

Table 1
The summary of the sequencing quality and alignment results for BSA-seq analysis Sample ID, name of sample; Raw Base (bp), number of original bases; Clean Base (bp), number of filtered bases; Effective Rate (%), proportion of the raw bases that were clean bases; Error Rate (%), proportion of all bases that were incorrect; Q20 (%), proportion of all bases with a Phred value greater than 20; Q30 (%), proportion of all bases with a Phred value greater than 20; GC Rate (%), proportion of all bases that were C or G; Mapped (%), proportion of the clean bases that were mapped to the reference genome; Coverage Rate (%), proportion of the genome covered by the bases; Mean Depth, average sequencing depth

Table 2
The candidate regions associated with pod shell thickness based on BSA-seq analysis

Table 3
The QTLs associated with pod shell thickness identified by fine mapping

Table 4
The candidate genes with SNPs in the exon regions covered by the three candidate regions