A high density SLAF-SNP genetic map and QTL detection for fibre quality traits in Gossypium hirsutum

Background Upland Cotton (Gossypium hirsutum) is a very important cash crop known for its high quality natural fiber. Recent advances in sequencing technologies provide powerful tools with which to explore the cotton genome for single nucleotide polymorphism marker identification and high density genetic map construction toward more reliable quantitative trait locus mapping. Results In the present study, a RIL population was developed by crossing a Chinese high fiber quality cultivar (Yumian 1) and an American high fiber quality line (CA3084), with distinct genetic backgrounds. Specific locus amplified fragment sequencing (SLAF-seq) technology was used to discover SNPs, and a genetic map containing 6254 SNPs was constructed, covering 3141.72 cM with an average distance of 0.5 cM between markers. A total of 95 QTL were detected for fiber quality traits in three environments, explaining 5.5-24.6% of the phenotypic variance. Fifty-five QTL found in multiple environments were considered stable QTL. Nine of the stable QTL were found in all three environments. We identified 14 QTL clusters on 13 chromosomes, each containing one or more stable QTL. Conclusion A high-density genetic map of Gossypium hirsutum developed by using specific locus amplified fragment sequencing technology provides detailed mapping of fiber quality QTL, and identification of ‘stable QTL’ found in multiple environments. A marker-rich genetic map provides a foundation for fine mapping, candidate gene identification and marker-assisted selection of favorable alleles at stable QTL in breeding programs. Electronic supplementary material The online version of this article (10.1186/s12864-018-5294-5) contains supplementary material, which is available to authorized users.


Background
Cotton, the main source of natural textile fiber, is one of the most important cash crops. Cultivated species of cotton include two diploids (Gossypium herbaceum and G. arboreum) and two tetraploids (G. hirsutum and G. barbadense). Due to its higher yield and wide adaptability, G. hirsutum L. accounts for more than 95% of worldwide cotton production [1]. Cotton fiber yield is of primary importance for cotton growers, whereas the textile industry demands high fiber quality. In order to meet the diverse demands of cotton growers and the textile industry, it is necessary for cotton breeders to develop cultivars with both high yield and superior fiber quality. Cotton fiber quality components, important for spinning, primarily include length, strength and micronaire (a measure of fineness). These traits have genetically complex quantitative inheritance and are substantially influenced by environmental factors. Traditional breeding approaches are often not precise enough to uncover the genetic architecture of these traits. Marker assisted selection (MAS) has proven to be a robust and cost effective breeding tool to manipulate genes controlling quantitative traits [2,3].
In the past few decades, the efficiency of QTL mapping has been improved with the advent of molecular markers. More than 1500 QTL for fiber quality traits have been reported [4]. Simple sequence repeats (SSR), a type of DNA marker, have been used for genetic mapping and QTL detection for about two decades. However, low polymorphism rate limits the use of SSRs for creating high density cotton genetic maps. Recent advances in 'next generation sequencing (NGS)' enabled researchers to identify DNA sequence polymorphisms i.e. SNPs, scanning large numbers of nucleotides to find sufficiently large subsets for genome mapping even between closely-related parents. Therefore, SNP markers are effective for creating high density genetic maps, mapping QTL and use in MAS [5][6][7][8], understanding population structure and studying genetic diversity [9].
The release of genome sequences of cotton species including G. raimondii [10,11], G. arboreum [12], G. hirsutum [13,14], and G. barbadense [15] made it easy to utilize NGS for construction of high density genetic maps and QTL identification. A variety of technologies were developed based on NGS, such as genotyping by sequencing (GBS) [16], restriction-site associated DNA sequencing (RAD-seq) [17] and specific locus amplified fragment sequencing (SLAF-seq) [18]. SLAF-seq is one of the most efficient and cost effective tools for manipulating SNPs and genotyping [18]. SLAF-seq has many merits over the other advanced technologies: 1) it does not necessarily require a reference genome sequence and polymorphism information; 2) repetitive sequences can be avoided; and 3) a balance between marker density and population size can be maintained by varying the fragment size. SLAF-seq has previously been applied in many plant species for developing high density genetic maps and mapping QTL [19][20][21][22][23]. Shen et al. [23] identified 132,880 SNPs and 6,296 InDels between a reference genome (TM-1) and five tetraploid cotton species by using SLAF-seq. Zhang et al. [24] exploited SLAF-seq to construct a genetic map of 5521 SNPs covering 3259.4 cM and reveal 18 QTL for cotton boll weight, explaining 4.15-16.70% phenotypic variance, with 344 putative candidate genes identified.
In this study, a RIL population of 180 lines from a cross between a Chinese high fiber quality cultivar (Yumian 1) and an American high fiber quality line (CA3084) was investigated, with these parents selected based on their distinct genetic background. SLAF-seq was used for the discovery of SNPs, which were then used to develop a high density genetic map and identify QTL for fiber quality traits.

Phenotypic analysis of fiber quality traits
Descriptive statistics for fiber quality traits i.e. fiber length (FL), fiber uniformity (FU), fiber micronaire (FM), fiber elongation (FE) and fiber strength (FS) across three environments were summarized in Table 1. In all environments, the fiber length and strength of CA3084 were higher than Yumian 1. All five traits changed according to circumstances for both parents and population. Besides, the phenotype of RIL population showed continuous variation and transgressive segregation. Skewness and kurtosis tests showed that all these traits were normally distributed (Table 1).
One-way ANOVA showed that all traits had significant genetic and environmental effects (p < 0.01) ( Table 2). Correlations among all five fiber quality traits were significant except FU-FM and FM-FE ( Table 3).

Analysis of SLAF-seq data and SLAF markers
After SLAF library construction and high-throughput sequencing, a total of 83.65 GB data comprising 418.26 M paired-end reads was generated. Among them, 93.80% of bases met or exceeded a quality score of 30 (i.e., Q30, indicating a 0.1 % chance of an error, and thus 99.9% confidence) and guanine-cytosine (GC) content was 38.54%. The 418.26 M paired-end reads consisted of 158985 SNP markers. A total of 33288 (20.93% of ) markers were polymorphic in the RIL population. The polymorphic SNP markers were classified into four genotypes as follows: aa × bb meant that both parents were homozygous in this SNP position, the genotype of one parent was aa and the other was bb; hk × hk meant that both parents were heterozygous; and the lm × ll and nn × np meant that one parent was heterozygous and the other was homozygous. Only genotype aa × bb, consisting of 21781 SNPs, was used for further analysis in our study. Markers with low sequencing depth or more than 30% missing data were filtered out. A total of 6254 markers meeting these quality standards were used for genetic map construction.

Construction of the genetic map
From the 6254 high quality SNP markers, a genetic map covering a total distance of 3141.7 cM with an average marker interval of 0.5 cM was constructed (

Collinearity between the genetic and the physical map
Collinearity with the physical map was studied to assess the quality of the genetic map. The physical maps of 26 chromosomes were constructed based on the positions of 6254 mapped loci on the reference genome sequence of G. hirsutum [13], and covered 96.53% of the genome. All chromosomes showed more than 90% coverage except Chr08 and Chr22 which showed 88.08% and 89.97% coverage, respectively (Additional file 1: Table  S1). Most genetically mapped loci were collinear with their physical map locations (Fig. 8).

QTL mapping
A total of 95 QTL for fiber quality traits were identified in this study (Additional file 2: Table S2; Figs. 1, 2, 3, 4, 5, 6 and 7). Phenotypic variance explained (PVE) by these QTL ranged from 5.5-24.6% and LOD values ranged from 2.0-10.6. The A t genome contained 50 QTL, while the D t genome contained 45 QTL. Every chromosome had at least one QTL except Ch04, Ch15, and Ch19. Among the 95 detected QTL, Yumian 1 contributed 46 favorable alleles while CA3084 contributed 49. A total of 55 QTL were found in more than one environment with 9 found in all three environments (Table 5).

Cluster analysis
QTL clusters were defined as regions which contained multiple QTL associated with different traits within approximately 20 cM [25]. In this study 14 QTL clusters were found on 13 chromosomes (

Prominent features of the method SLAF-seq
Specific locus amplified fragment sequencing (SLAF-seq) is an efficient and cost effective tool that uses high-throughput sequencing for genotyping [18], and has been applied in many plant species for developing high density genetic maps and mapping QTL [19][20][21][22][23]. As compared to other high throughput genotyping technologies, such as GBS and RAD-seq, SLAF-seq provides a reliable and cost effective approach for QTL mapping [24]. High quality DNA is required for RAD-seq, with modified protocols necessary for low quality DNA [26,27]. Shared restriction sites among genetically divergent clades may be lost progressively due to sequence polymorphism [26]. GBS is less complicated and more cost effective than RAD-seq [28]. Although GBS and SLAF-seq share some basic principles, some features distinguish SLAF-seq from GBS, i.e. 1) pre-experiment endonuclease digestion scheme development through in silico analysis of available sequencing database; 2) ability to create markers evenly distributed over a genome; 3) high quality markers due to stable sequence depth of each sample; 4) high density genetic map development by exploiting abundant SNPs.
In previous studies, usually one or a few common restriction endonucleases were used for genome digestion. The genome specificity was often ignored, which might lead to uneven distribution of the selected fragments in the whole genome. The SLAF-seq strategy remedies this defect [18,23,24]. Furthermore, compared with the highest-density SSR genetic map of an Upland cotton intraspecific population [29], the present map also showed some advantages due to prominent features of SLAF-seq. For example, the total number of markers in the present map is far more than the highest-density SSR map, and marker distribution in the present map is more uniform.

Genetic map
In previous studies, most cotton genetic maps were based on SSR, AFLP or RFLP. However, it is very difficult to construct a high density genetic map covering the genome sufficiently due to the low polymorphism rate of SSR and other markers which exploit length polymorphism [30][31][32]. Therefore, SNP markers are a better choice to construct high resolution maps due to their high abundance and even distribution. In the present study, we constructed a high density genetic map which harbored 6254 SNP markers with only 8 gaps larger than 10cM. The distribution of mapped SNP markers was not random throughout the genome. This uneven distribution of markers may be due to the multi-step screening of SLAF markers or selective sweeps during domestication. Good collinearity between physical and genetic maps and higher density suggests that this map is high-quality for QTL mapping.

Segregation Distortion
Segregation distortion is widely observed in plant species [33]. Many studies have reported some factors influencing this phenomenon, such as species, population type and marker type [34][35][36]. Some studies also suggest the genetic effect as the main factor controlling segregation distortion. Segregation distortion in the present study largely favored Yumian 1 alleles, in accordance with our previous studies [29,31,35,37,38] which showed that Yumian 1 plays a significant role in segregation distortion due to its complex genetic background. Previous studies suggest that segregation distortion will not affect QTL detection if the distorted marker is not significantly linked to the QTL. Large populations can help to decrease the effect of segregation distortion. Fang et al. [39] fine-mapped a QTL (qFS07.1) located in an SDR , illustrating that segregation distortion does not significantly impact QTL position. In some cases segregation distortion may contribute to higher genetic variation, which in turn benefits QTL detection [40].

QTL identification
Significant differences among the RIL population showed that the parents had diverse alleles for many fiber quality traits. Among the total of 95 detected QTL, Yumian 1 contributed 46 favorable alleles while CA3084 contributed 49. Each of the parents contributed equally for many traits. QTL diversity among phenotypically similar parent is consistent with previous reports [29,[41][42][43]. Many QTL for different traits showed overlapping regions, which suggests that these QTL may represent genes with pleiotropic effects contributing to the fiber development network and may contribute to the significant correlations among different fiber quality traits [25,35,41,[44][45][46]. A total of 53 detected QTL occurred in 14 clusters on 13 chromosomes. Yumian 1 contributed to all QTLs in Chr.12-cluster-1 and Chr.16-cluster-1. While CA3084 contributed to the all QTL in Chr.25-cluster-1. In rest of the QTL clusters Yumian 1 was the main contributor. Fiber micronaire alleles in all QTL clusters were contributed by CA3084 except for Chr.11-cluster-1 and Chr.12-cluster-1. This was the proof of expected segregation of the trait because of significant difference of this trait of this trait between the parents [32,46,47]. Some clusters had notable numbers of stable QTL. For example, Chr21-cluster-1 had 4 QTL mapped at approximately the same location. All QTL in this cluster were stable, explaining 6.9-18.1% of total phenotypic variance. These QTL may be high priorities for fine mapping studies and candidate gene identification. Concentration of the most of QTL in 14 clusters confirmed the significant positive correlations among fiber quality traits except for fiber micronaire. Similar results were reported in previous studies [29,35,[47][48][49][50]. This can be the result of presences of multiple genes in certain genomic region or pleiotropic effects of certain genes [25].
The fiber micronaire QTL in the clusters had no consistent correlation in additive effects, hence, showing insignificant phenotypic correlation between micronaire and other traits in the current study. Similar results have been reported in previous study using Yumian 1 as a parent [47].

Stable and common QTL
Environment plays an important role in phenotypic variation of fiber quality traits. Dissection of variability according to environmental and genetic factors enables us to identify 'stable' QTL, i.e., which tend to maintain their effects across multiple environments and are high-priority candidates for further studies. In the present study, 55 of 95 detected QTL were found in two or more environments.
We compared the detected QTL with CottonQTLdb (release 2.2, February 01, 2017) [4,25] based on the physical position of the nearest DNA markers. In total, 59 QTL (Additional file 4: Table S4) corresponded closely with the physical positions on the TM-1 reference genome of previously reported QTL [13], while the rest were newly found in this study. We also compared our QTL results with recently published GWAS related to cotton fiber traits [51][52][53], comparing the physical position of our SNP markers linked with QTL to those of GWAS-identified markers (Additional file 4: Table  S4). Only 5 QTL corresponded closely with the physical positions of GWAS markers. These common and stable QTL could be priorities for fine mapping, candidate gene identification and MAS to improve cotton fiber quality. Unstable QTL can be the result of imprecise coarse mapping, or unusual environmental conditions.

Conclusion
In this study, we constructed a high-density genetic map by using SLAF-seq, which covers 3141.7 cM of recombinational length. The average distance between markers was 0.5 cM. A total of 13 QTL clusters containing 29 stable QTL were identified. Some of the stable QTL (qFL06.1, qFM07.1 , qFL16.1, qFL21.1) could be valuable for fine mapping and candidate gene identification and can be used in further breeding programs.

Mapping Population
G. hirsutum cultivars Yumian 1 and CA3084 were crossed to produce a segregating population at Southwest University, Chongqing, China, in the summer of 2010 [38]. Yumian 1 is a Chinese cultivar with high fiber strength, bred by our lab [35] while CA3084 is an American breeding line with high fiber quality (provided by Dr. John R. Gannaway at Texas A & M University Research Experimental Center, Lubbock, TX). Single-seed descent was used from the F 2:3 to F 2:7 generations to produce a RIL population with 180 lines. The RIL a b Fig. 8 Colinearity between the Yumian 1 × CA3084 genetic map and physical map. a Colinearity for the A t 581 subgenome. b Colinearity for the D t subgenome

DNA extractions, SLAF library construction and highthroughput sequencing
In the summer of 2015, the leaves of parents and 87 lines were sampled for DNA extraction. Total genomic DNA was extracted according to a modified CTAB method [54]. To assess the number of markers generated by the combination of different endonucleases, an [18,55] in silico analysis was carried out using a G.  hirsutum reference genome [13]. Based on this analysis, two endonucleases HaeIII and SspI (New England Biolabs, USA) were selected to digest the genome of mapping population. The SLAF-seq strategy with some modifications was followed for library construction.

Sequencing data grouping and genotyping
We identified and genotyped SLAF markers according to the procedures described by Sun et al [18] and Jiang et al [55]. Briefly, low-quality reads (< 20e) were filtered out and the remaining high quality reads were sorted to each progeny according to duplex barcode sequences. Then the terminal 5-bp of each high-quality read was trimmed off. Finally, 80 bp paired-end clean reads obtained from the sample were mapped to the G. hirsutum genome sequence [13] using BWA software (V 1.5) with default parameters [56]. Sequences mapped on the same position with over 95 % identity were defined as one SLAF locus [55]. SNP loci in each SLAF locus were then detected between parents using the software GATK (V 3.1.1) with default parameters [57]. All polymorphic SLAF loci with a sequence depth of more than 10 fold in parents were genotyped and the individuals were genotyped based on similarity to the parents. Genotype scoring was conducted to approve the genotyping quality by using a Bayesian approach as described [18]. Markers with more than 15% missing data were removed.

Linkage map construction
SLAF markers were arranged in specific order and genotyping errors were corrected by using the HighMap strategy and SMOOTH algorithm [58,59]. The k-nearest neighbor algorithm was used to deal with missing genotypes [60]. Then, the genetic linkage map was constructed by Joinmap v4.0 [61]. The genetic map was constructed according to the maximum likelihood method [62]. The Kosambi mapping function was applied to estimate the genetic map distances in centimorgan (cM) [63]. Markers showing significant deviation from Mendelian expectations for segregation (p <0.05) were considered distorted. Regions containing more than three adjacent loci which showed significant segregation distortion were denoted 'segregation distortion region(s)' (SDR) [64].

QTL analysis
MapQTL 6.0 [65] was used to identify QTL. A threshold value of log of odds ratio (LOD) ≥2.0 was used to declare suggestive QTL [66]. Positive additive effect means that favorable alleles were contributed by Yumian 1 while negative additive effect means that favorable alleles were contributed by CA3084. QTL identified in two or more environments were considered to be potential stable QTL. MapChart 2.2 [67] was used to graphically represent the genetic map and QTL bars. QTL were named starting with 'q' , followed by a trait abbreviation and the chromosome number, then by the number of QTL affecting the trait on the same chromosome (e.g., qFE06.1 for the first fiber elongation QTL on chromosome 6). QTL for the same trait across different environments were declared in the same QTL region when their confidence intervals (CI) overlapped.

Additional files
Additional file 1: Table S1. Genome coverage of genetic map (XLSX 11 kb) Additional file 2: Table S2. QTL for fiber quality in RIL population in three environments (XLSX 20 kb) Additional file 3: Table S3. Details of QTL Clusters for fiber quality traits (XLSX 14 kb) Additional file 4: Table S4.