High-density linkage map construction and QTL analyses for fiber quality, yield and morphological traits using CottonSNP63K array in upland cotton (Gossypium hirsutum L.)

Background Improving fiber quality and yield are the primary research objectives in cotton breeding for enhancing the economic viability and sustainability of Upland cotton production. Identifying the quantitative trait loci (QTL) for fiber quality and yield traits using the high-density SNP-based genetic maps allows for bridging genomics with cotton breeding through marker assisted and genomic selection. In this study, a recombinant inbred line (RIL) population, derived from cross between two parental accessions, which represent broad allele diversity in Upland cotton, was used to construct high-density SNP-based linkage maps and to map the QTLs controlling important cotton traits. Results Molecular genetic mapping using RIL population produced a genetic map of 3129 SNPs, mapped at a density of 1.41 cM. Genetic maps of the individual chromosomes showed good collinearity with the sequence based physical map. A total of 106 QTLs were identified which included 59 QTLs for six fiber quality traits, 38 QTLs for four yield traits and 9 QTLs for two morphological traits. Sub-genome wide, 57 QTLs were mapped in A sub-genome and 49 were mapped in D sub-genome. More than 75% of the QTLs with favorable alleles were contributed by the parental accession NC05AZ06. Forty-six mapped QTLs each explained more than 10% of the phenotypic variation. Further, we identified 21 QTL clusters where 12 QTL clusters were mapped in the A sub-genome and 9 were mapped in the D sub-genome. Candidate gene analyses of the 11 stable QTL harboring genomic regions identified 19 putative genes which had functional role in cotton fiber development. Conclusion We constructed a high-density genetic map of SNPs in Upland cotton. Collinearity between genetic and physical maps indicated no major structural changes in the genetic mapping populations. Most traits showed high broad-sense heritability. One hundred and six QTLs were identified for the fiber quality, yield and morphological traits. Majority of the QTLs with favorable alleles were contributed by improved parental accession. More than 70% of the mapped QTLs shared the similar map position with previously reported QTLs which suggest the genetic relatedness of Upland cotton germplasm. Identification of QTL clusters could explain the correlation among some fiber quality traits in cotton. Stable and major QTLs and QTL clusters of traits identified in the current study could be the targets for map-based cloning and marker assisted selection (MAS) in cotton breeding. The genomic region on D12 containing the major stable QTLs for micronaire, fiber strength and lint percentage could be potential targets for MAS and gene cloning of fiber quality traits in cotton.


(Continued from previous page)
Conclusion : We constructed a high-density genetic map of SNPs in Upland cotton. Collinearity between genetic and physical maps indicated no major structural changes in the genetic mapping populations. Most traits showed high broad-sense heritability. One hundred and six QTLs were identified for the fiber quality, yield and morphological traits. Majority of the QTLs with favorable alleles were contributed by improved parental accession. More than 70% of the mapped QTLs shared the similar map position with previously reported QTLs which suggest the genetic relatedness of Upland cotton germplasm. Identification of QTL clusters could explain the correlation among some fiber quality traits in cotton. Stable and major QTLs and QTL clusters of traits identified in the current study could be the targets for map-based cloning and marker assisted selection (MAS) in cotton breeding. The genomic region on D12 containing the major stable QTLs for micronaire, fiber strength and lint percentage could be potential targets for MAS and gene cloning of fiber quality traits in cotton.
As the largest natural fiber source, cotton is one of the most important economic crops worldwide. In 2018/19 season, cotton was primarily grown in around 30 countries, with more than 116 million bales of fiber produced [4]. In the United States, which is the third largest cotton fiber producing country as well as the largest cotton fiber exporting country in the world, 18.59 million bales of cotton fiber was produced with 15 million bales exported in 2018/19 season [4]. The production, distribution and processing of cotton in the United States provide about $27 billion direct business revenue while supporting more than 200 thousand jobs [5]. However, the world cotton fiber market is recently under a lot of pressure because of the development of synthetic fibers [6]. In addition, the US cotton has to compete with handpicked cotton from Asia. Currently, the US cotton could compete in the international markets because of its higher fiber quality. Therefore, improving the fiber quality has been an important objective of cotton breeders in the US. Farm productivity and economic viability of cotton production directly related to the lint yields [5]. As such, continued improvements in the fiber quality and yield are critical for the US cotton production.
Plant height, a typical quantitatively inherited trait [7][8][9], can indirectly influence the yield of cotton fiber because optimal plant height can contribute to machine harvesting and help achieve higher harvesting index [7]. Fuzziness seed trait, an important seed trait related to the cotton yield and fiber quality [10], was usually considered as a binomial trait (fuzzy seed or fuzzless seed) while some reports indicated this trait was polygenically controlled [10][11][12][13].
In general, fiber quality and yield traits in cotton are known to inherit polygenically and influenced by environment [14][15][16]. Further, fiber quality traits often have negative association with some yield traits [17]. Although, traditional breeding methods played an important role in the development of cotton cultivars [18,19], further improvements in the trait values especially for the quantitative traits using these breeding approaches have been limited [20,21]. With the advancement of molecular marker technology, maker-assisted selection (MAS) has been increasingly applied in the cotton breeding programs [22]. Restriction fragment length polymorphism (RFLP) markers were the first type of the markers used in the cotton improvement [23] and the first linkage maps in cotton were constructed using RFLP markers in 1994 [24]. From then on, various types of the molecular markers were used in the cotton genetics and breeding [25][26][27][28][29][30][31][32]. High-density genetic maps with broadly adaptable markers are required for improving the efficiency in detection and MAS-based transfer of quantitative trait loci (QTLs) [33][34][35][36][37][38][39]. The abundance, extensive polymorphism and compatibility to high-throughput genotyping platforms have made the single nucleotide polymorphism (SNP) markers the most popular markers used in plant translational genomics [40][41][42]. With the development of next-generation sequencing (NGS) technologies, several methods to discover large numbers of SNP-based markers are now developed for cotton [36][37][38][39][40]. This enabled the development of high-density linkage maps in cotton [36][37][38][39][40]. In the present study, we used 63K SNP array [40] for genotyping a recombinant inbred line (RIL) population, derived from landrace by elite germplasm line cross, to construct a high-density linkage map and to map the QTLs for cotton fiber quality, yield and morphological traits in Upland cotton.

Analyses of the phenotypic traits
A summary of the statistical analyses for the phenotypic performance of the twelve traits is presented in Table 1. Among the six fiber quality traits measured, micronaire (MIC), upper half mean length (UHM), uniformity index (UI) and fiber strength (STR) of the parental accession NC05AZ06 were significantly (P < 0.05) higher (13.0-16.9%, 34.1-36.6%, 4.4-7.6%, 7.4-8.1%, respectively) than those of the parental accession NC11-2091 while the short fiber content (SFC) of NC11-2091 was significantly (P < 0.05) greater (26.3-55.3%) than that of NC05AZ06. No significant difference was found between the two parents for the fiber elongation (ELO). All the four yield traits, boll weight (BW), lint percentage (LP), seed index (SI) and lint index (LI) were significantly (P < 0.01) higher (209.4-222.8%, 137.2-160.0%, 12.5-24.6%, 311.8-317.9%, respectively) in NC05AZ06 than in NC11-2091. For morphological traits, the plant height (PH) of NC05AZ06 was significantly (P < 0.01) lower (− 32.5%) than NC11-2091. The seed fuzziness grade (FG) of NC05AZ06 was 100% (fuzz-rich) and the FG of NC11-2091 was 0 (fuzz-free). The broad-sense heritability of the traits calculated by the ratio of total genetic variance to total phenotypic variance for all the traits is listed in Table 2. Most traits, except for PH, had high broad-sense heritability across 2 years with values ranging from 82 to 96%. The broad-sense heritability of PH was only 56%. Since we only had 1 year's data for PH, we can just state that the trait performance of PH might be sensitive to the environment.
The results of correlation analyses for the twelve traits was described in Table 3. Among the fiber quality traits, UHM was significantly (P < 0.01) positively correlated with UI, BW, LP, LI, FG, and significantly (P < 0.01) negatively correlated with MIC, ELO and SFC. The STR was significantly positively correlated with BW (P < 0.05), SI (P < 0.01) and PH (P < 0.05), and was significantly negatively correlated with ELO (P < 0.05) and LP (P < 0.01). The SFC was significantly (P < 0.01) positively correlated to MIC, ELO and it was significantly (P < 0.01) negatively correlated to UI. The ELO was significantly (P < 0.01) positively correlated with MIC and significantly negatively related to UI (P < 0.01) and BW (P < 0.05) ( Table 3). Almost all the four yield traits BW, LP, SI, and LI showed a highly positive correlation with each other, except for LP and SI, which the correlation was not significant ( Table 3). The morphological trait PH had a negative correlation with yield traits BW, LP and LI, and a positive correlation with SI and STR, respectively. Another morphological trait fuzziness grade was highly positively correlated with all the four yield traits ( Table 3).

Construction of linkage maps
Out of 63,058 SNPs used in the genotyping, 11,255 (17.8%) SNPs were polymorphic between the two parents.
A total of 3129 SNPs were selected for linkage map construction after removing the poor quality or duplicate SNPs.   Of the 3129 mapped SNPs, 175 (5.6%) SNP markers showed segregation distortion which spanned on 22 chromosomes, with the most distorted markers (34) and highest distortion rate (25.37%) on Chr.02 (A13) ( Table  4). Seventeen segregation distortions region (SDR) were identified on 13 chromosomes, with 9 of the SDRs in A sub-genome and 8 SDRs in the D sub-genome (Table 4).
Hence, the sub-genomes did not show any bias for the SDRs.
Comparison of the genetically mapped SNPs with the sequence based physical map of the TM-1 (G. hirsutum) reference genome sequence [43] for syntenic relationships showed that the strong collinearity between the genetic map and physical map (Fig. 8). The SNP based genetic map of 4422.44 cM corresponded to 1911.76 Mb of the sequence based physical map which represented 98.8% of the total length of the sequence based physical map (Additional file 2: Table S2 and Additional file 4: Table S4). All linkage groups showed good collinearity with the physical map. Coverage of the individual chromosomes ranged from 96.4 to 99.5% of the sequence based physical map. Figure 8 shows the circos plots that describe strong collinearity between the genetic map and physical map. Finally, collinearity between genetic and physical maps suggest that the genetic mapping population used in the current study did not contain any chromosomal rearrangements.
QTL analysis for cotton fiber quality, yield and morphological traits QTL analysis using composite interval mapping (CIM) identified a total of 106 QTLs, with 59 of QTLs for fiber quality traits, 38 for yield traits and 9 for morphological traits (Additional file 1: Table S1).
Overall the phenotypic variation explained by the QTLs ranged from 3.6-48.0% (Additional file 1:      with favorable alleles from NC05AZ06 and 9 from NC11-2091).

Upper half mean length (UHM)
UHM is a measure of fiber length. Ten QTLs explaining 5.5 to 12.1% of PV were identified (Table 5 and Additional file 1: Table S1).  the PVE ranging from 10.1 to 12.1% were detected. Majority of the QTLs with favorable alleles were derived from the parent NC05AZ06. The qUHM-16-CH5-D11-1 was the only QTL with favorable alleles derived from NC11-2091.
In each of these 5 clusters, the favorable alleles of SFC and UI QTLs were contributed by same parents with different signs ("+" or "-") of additive effects. For yield traits, four QTL clusters (Y-1 to Y-4) were identified (Table 8). The favorable alleles for most of the yield QTLs in these clusters were derived from NC05AZ06. Ten QTL clusters contained multiple QTLs from different trait categories (Table 9). The QYA-1 and QYA-2 were two clusters carrying multiple QTLs from all 3 trait categories. The QYA-1 with a region in Chr. 8 Table S3).

Construction of high-density linkage maps with SNP arrays
The limited quantity of the polymorphic markers available were often limitations for the construction of high-density linkage maps in cotton [60,61]. Due to the lack of the marker polymorphism in cotton, the linkage maps built by second-generation molecular markers such as SSRs and AFLPs, usually carried some disadvantages viz low marker coverage of the cotton genome, poor marker density and large gaps [61][62][63][64]. SNPs provide abundant genetic variation and their loci distribute evenly along the whole genome. Hence, they have been the most reliable markers for building high-density linkage maps and have been widely used in the QTL studies [40,65,66]. Recently, two sets of cotton SNP arrays CottonSNP63K and CottonSNP80K were developed and were used in the QTL mapping [40,65]. Several high-density cotton genetic maps constructed successfully using these SNP arrays [35,[66][67][68][69][70]. In the current study, a linkage map was constructed using SNPs from CottonSNP63K array. The genetic map spanned a total length of 4422.44 cM, which was in correspondence with the estimated size of tetraploid cotton genome (4500 cM) [71]. The average  (Fig. 8) suggesting there were no chromosome rearrangements among the parents and mapping population used for mapping. Finally, the circos plot suggested the accuracy of the linkage maps in comparison to the physical maps. Circos plots further confirmed that the polymorphic SNPs detected in each of the chromosomes distributed unevenly, which support an observation that the SNPs showed uneven distribution for polymorphism-rich and polymorphism-poor regions along each chromosome [73].

QTL mapping population
The quality of a QTL map depends on the number of polymorphic markers and the genetic mapping populations used. Tetraploid cotton, in general, show a low level of marker polymorphism [77,78]. According to the previous cotton QTL mapping research, it was observed that the marker polymorphism rates in the interspecific mapping populations [24,31,32,72,79,80] were higher than intraspecific mapping populations [35][36][37][38][39] on the whole. In order to potentially detect a broader array of polymorphic markers and QTL alleles, interspecific mapping populations derived from G. hirsutum and G. barbadense have been extensively used for QTL mapping in cotton [24,31,32,72,79,80]. However, the QTL type and their mapping information from an interspecific (G. hirsutum × G. barbadense) population were inconsistent in comparison to the QTLs studied based on an intraspecific G. hirsutum population [15]. Further, QTLs identified using the interspecific RILs could not be transferred precisely into Upland cotton due to the genetic bottlenecks associated with interspecific hybridizations during the breeding process. Hence, QTLs of the interspecific mapping studies were not utilized in Upland cotton improvement. In this study, an intraspecific G. hirsutum RIL population, developed from a cross between an improved germplasm line NC05AZ06 and a landrace accession NC11-2091 was used for QTL mapping. The CottonSNP63K array based genotyping provided good number of the candidate markers. This allowed us to obtain enough polymorphic markers to develop high density genetic maps in Upland cotton which in general suffers from low density of markers and low marker polymorphism [24,31].

QTLs with favorable alleles identification
The identification of favorable QTLs alleles can help improving the fiber quality and yield in Upland cotton by genomic and marker assisted selection [81]. As expected, the performance of parent NC05AZ06 was superior to those of the landrace parental accession NC11-2091 for MIC, UHM, UI, STR and 4 yield traits. Among the 106 QTLs, the favorable alleles of 80 QTLs originated from NC05AZ06 while other 26 from NC11-2091. Only a few of the QTLs with favorable alleles of a given trait were derived from the parent NC11-2091 (Table 7 and Additional file 1: Table S1). Fifteen QTLs with favorable alleles contributed by NC11-2091. Of these, 8 were major QTLs.
The phenotypic trait correlation analysis showed high values of positive or negative correlations between different traits, which can be partially explained by the QTL clusters we identified. For example, Q-1, Q-3, Q-4, Q-5, Q-6 contained multiple QTLs from SFC and UI (Table  8). However, the signs of additive effects of SFC and UI QTLs in each of these clusters were opposite with favorable alleles from same parent. In this case, when we choose the favorable alleles of NC05AZ06 for this QTL, SFC will decrease and UI will increase. If we choose the other alleles for the QTL, the UI will decrease and SFC will increase. This explained why UI and SFC would always show negative correlation values (− 0.93). This strong negative relationship between SFC and UI was also reported in the previous study by Ramey et al. [84]. On the contrary, Y-1, Y-3, Y-4 clusters provided the evidence of why all the yield traits were highly positively correlated since all the favorable alleles of QTLs in these clusters were derived from same parent. These positive correlations among yield traits were also widely observed in many previous studies [31-33, 35, 37, 66, 67]. Similarly, Q-2, Q-7 explained a negative correlation (− 0.62) between UHM and ELO. This is consistent with previous observation by Wang et al. [33] who reported a negative correlation (− 0.59) between fiber length and ELO. Q-7 also explained a positive correlation (0.46) between ELO and SFC as well as a negative correlation (− 0.79) between UHM and SFC. Interestingly, previous reports suggested both positive [85] as well as negative correlation (− 0.349) [61] between ELO and SFC. In the current study, some of the clusters contained both fiber quality traits and yield traits, which provided us an efficient way to improve the quality traits and yield traits at the same time. For example, QY-3 were shared by 3 QTLs from LI, SI and MIC, of which the favorable alleles were all contributed by NC05AZ06. In this case, the QTL markers in QY-3 can help improving the MIC, LI and SI concurrently. Similarly, the QTLs in QY-4 had the potential to improve the UHM, MIC, LI and LP simultaneously. Similar type of positive correlation between LI, LP and MIC was reported by Wang et al. [33].

Candidate gene analysis of the QTLs
The identification of the candidate genes with known functions in cotton fiber development, located in the mapped QTL regions, could add additional validity to the fiber quality QTLs. Out of the 11 stable and major QTLs analyzed, 8 regions with QTLs qELO-CH8-A9-1, qELO-CH4-A11-1, qSTR-CH2-A13-1, qELO-CH19-D1-1, qSTR-CH19-D1-1, qSTR-CH25-D12-1, qMIC-CH25-D12-1, qLP-CH25-D12-1 showed two or more cotton fiber related candidate genes (Additional file 5: Table S5). Moreover, a fiber related gene-rich QTL cluster QYA2 was identified on chromosome D12. The presence of 6 reported fiber related candidate genes localized in this QTL region may partially explain and confirm the QTL cluster containing multiple different QTLs. The importance and validation of this QTL on chromosome D12 could also be confirmed from the previous mapping efforts. Huang et al. 2017 [82] reported a

Development of the RIL population
The G. hirsutum accessions NC05AZ06 and NC11-2091 were used as parents to develop the RIL population. NC05AZ06 is a sub-okra germplasm line with improved fiber quality and yield traits released by our program [86]. The landrace accession NC11-2091(TEX 2313; PI 607640), collected from Thailand, was obtained from the U.S. National Cotton Germplasm Collection (NCGC), USDA-ARS, College Station, Texas. As landraces tend to be heterogeneous, we inbred the land race accession NC11-2091 for three generations using manual selfing and single seed descent method of line advancement. In the summer of 2010, the inbred parental accessions were planted at the Central Crops Research Station at Clayton, North Carolina and crossed to develop F 1 seeds.
The F 1 plants were planted in the winter nursery, Tecoman, Mexico and manually selfed using glassine bags to obtain F 2 seed. The F 2 plants were grown and individual plants were manually selfed to obtain F 3 seed in the summer nursery of 2012. From 2013 to 2015, 107 F 2:3 lines were advanced to F 5 generation by single seed decent method in the greenhouses. The 107 F 5:6 lines were grown in the summer nursery at Clayton, NC in 2015 and seed increased by manual self-pollination. Seed cotton samples were ginned using 10-saw gin. Seed were acid delinted and treated with fungicide and insecticide before using in the current study.

Field experiments and phenotyping
The F 5:6 RIL population containing 107 RILs along with parents and four checks (UA-48, UA-222, DP-393, SG-747) were planted using an augmented randomized complete block design with seven blocks in Clayton, NC in summer 2016. Each line was planted (2.5-3 seeds per ft) in the single row 10-ft plots with 38-in row spacing and 10 ft. alleys. Standard morphological practices were followed. Fifty fully opened bolls from each plot were hand-harvested in November of each year of the trials. Four yield traits, including boll weight (BW), lint percentage (LP), seed index (SI), lint index (LT) were evaluated. Approximately 15 g (g) of fiber sample ginned from each boll sample was tested for the fiber quality parameters using high-volume instrument (HVI) system at the Cotton Incorporated, Cary, North Carolina. The fiber quality traits evaluated were fiber elongation (ELO), micronaire (MIC), short fiber content (SFC), fiber strength (STR), upper half mean length (UHM) and uniformity index (UI). MIC is an airflow measurement of fibers and indicates fiber fineness and maturity. UHM is the mean length of the longer half of the fibers in the sample, measured in hundredths of an inch. STR is the force in grams required to break a bundle of fibers one tex unit in size. ELO is the amount in percentage a fiber sample can stretch prior to breakage. UI is a ratio between the mean length and the upper half mean length of the fibers, expressed as a percentage. It indicates the uniformity of fiber length in a sample. SFC is the percentage by weight of fibers 0.5 in. (12.7 mm) long or less. BW is the average weight in grams of seedcotton in a boll. LP is a ratio between the total fiber weight and the total seedcotton weight. SI is the weight of 100 seeds in grams. LI is the weight of lint in grams obtained from 100 seeds. The morphological trait, fuzziness grade of seed (FG) was determined by rating based on four levels of the seed fuzziness (0, 33.3, 66.6 and 100%). Progressive numbers 0 to 100% indicate fuzz-free to fuzz-rich cotton seed.
In the summer of 2017, the RIL population along with parents and the same four checks were planted using a completely randomized block design (RCBD) with two replications in Clayton, NC. Each line was planted in the single row 20-ft plot with a plant density of 2.5 seeds per foot. Forty fully opened bolls from each plot were handharvested in December 2017. Same phenotyping methods were used for evaluating the 11 cotton traits as in year 2016. Plant height (PH) trait values was evaluated by taking the average of the manually-measured height of five randomly selected plants from each plot.

Marker genotyping and linkage map construction
Genomic DNA was extracted from 3 to 4 weeks old plant leaf tissue of the RIL population and their parents using DNeasy Plant Mini Kit (Qiagen, Hilden, Germany). One hundred and four of the 107 phenotyped RILs and the parents were genotyped with 63 K cotton SNP array [41] at Texas A&M Institute for Genome Sciences and Society. Candidate SNPs were filtered from the array with 63,058 SNPs based on the rules as follows: (1) SNPs with monomorphic genotypes were removed, (2) poorquality SNPs and SNPs with missing values more than 30% were removed and (3) duplicate SNPs were removed [67].
The resultant candidate SNPs were used to construct the linkage map by JoinMap 4.1 [87] using Kosambi's mapping function [88] with logarithm of the odds (LOD) threshold of 7.0. The SNPs were then aligned to the TM-1 (G. hirsutum) Genome NAU-NBI Assembly v1.1 and Annotation v1.1 database [43] by BLAST (https://www.cottongen.org/ blast). Correspondence of the linkage map groups with the physical map groups was performed with the circos plots by Circa software (http://omgenomics.com/circa/).

Data analysis
All the trait phenotypic values of the RILs and parents were estimated using the linear mixed models in SAS version 9.4 (SAS Institute, Cary, NC). The SAS software was also used for calculating the statistics, including the T-test of the difference between the value means of two parents, the broad-sense heritabilities of the traits, the genetic correlations between the traits and other basic statistical parameters.
Segregation of the markers from the Mendelian ratio 1:1 was tested using chi-square analysis (P < 0.05) and a segregation distortion region (SDR) was identified when at least three adjacent markers showing significant (P < 0.05) segregation distortion [89] using JoinMap 4.1.
All 12 traits related QTLs were detected using composite interval mapping (CIM) method [90] using WinQTLCart2.5 [91]. The genotype of alleles from parental accession NC05AZ06 (P1) was coded as "AA" and the genotype of alleles from accession NC11-2091(P2) was coded as "aa". Based on the results of a 1000-time permutation procedure, logarithm of the odds (LOD) ≥ 2.5 with at least 1 year's phenotypic variation explained (PVE) ≥ 5.0 was used as the threshold for a QTL identified in both years with overlap region and LOD ≥ 3.0 with PVE ≥ 5.0 was the threshold to determine a QTL detected only in 1 year. The resulting linkage map with identified QTLs were drawn using MapChart version 2.32 [92]. Further, the identified QTLs were used to detect the QTL clusters and meta QTL analysis was performed by comparing them with the QTLs reported in previous studies. Information of all the previously reported QTLs was downloaded from the CottonQTLdb database (http://www.cottonqtldb.org; Release 2.3) developed by Said et al. [45]. The marker defined QTL regions with DNA sequence information were BLAST searched on Cotton Functional Genomics Database (https://cottonfgd.org/) [59] for identifying the possible candidates genes for the each of the major stable QTL.

QTL nomenclatures
All the QTLs were labeled based on their population, trait type, and chromosome information. The names of the QTL clusters are given based on the trait categories of QTLs they contained. For example, Q-* meant a cluster contained only fiber quality traits QTLs; QY-* meant a cluster contained both fiber quality and yield traits QTLs; QYA-* meant a cluster contained fiber quality, yield and morphological traits QTLs and so on.