Genome survey and high-resolution backcross genetic linkage map construction of the ridgetail white prawn Exopalaemon carinicauda applications to QTL mapping of growth traits

Background High-resolution genetic linkage map is critical for QTL mapping, genome sequence assembly and marker-assisted selection in aquaculture species. The ridgetail white prawn Exopalaemon carinicauda is one of the most economic shrimp species naturally distributed in the coasts of eastern China and western Korea. However, quite limited genomics and genetics information have been exploited for genetic improvement of economic traits in this species. Results In the present study, we conducted genome survey and constructed high-resolution genetic linkage maps of the ridgetail white prawn with reciprocal-cross mapping family genotyped using next-generation sequencing approaches. The estimated genome size was 9.33 Gb with a heterozygosity of 0.26% and a repeat sequence ratio of 76.62%. 65,772 protein-coding genes were identified by genome annotation. A total of 10,384 SNPs were used to high-throughput genotyping and assigned to 45 linkage groups (LGs) from reciprocal backcross families of E. carinicauda, and the average marker distances were 0.73 cM and 0.55 cM, respectively. Based on the high-resolution linkage map, twenty-three QTLs related to five growth traits were detected. All QTLs could explain 8.8–15.7% of the total growth-traits variation. Conclusions The genome size of E. carinicauda was estimated more accurately by genome survey analysis, which revealed basic genomic architecture. The first high-resolution backcross genetic linkage map and QTLs related to growth traits will provide important information for QTL fine mapping, genome assembly and genetic improvement of E. carinicauda and other palaemon shrimps. Electronic supplementary material The online version of this article (10.1186/s12864-019-5981-x) contains supplementary material, which is available to authorized users.


Background
The ridgetail white prawn Exopalaemon carinicauda is an important economic shrimp species naturally distributed in the coasts of eastern China and western Korea [1]. Due to multiple advantages of fast growth, good reproductive performance and strong adaptability to the environment, the aquaculture scale of the ridgetail white prawn expanded rapidly and contributed to one third of the total output of the polyculture ponds in eastern China [2]. The annual aquaculture area and yield of E. carinicauda in China are about 20, 000 ha and 45, 000 tons, respectively. But the aquaculture of E. carinicauda mainly relyed on natural seeds and wild spawning broodstock, which resulted in unclear genetic background and germplasm degradation, seriously affecting industrial development. Conventional breeding methods based on selection of individuals on phenotypic values. The approaches of using molecular markers for selection, or by both markers and other phenotypic data, named marker-assisted selection (MAS) [3] was first applied when the restriction fragment length polymorphisms (RFLP) were detected in economic species [4]. MAS is especially helpful for traits that are difficult to measure, exhibit low heritability and/or are expressed late in development process. Implementation of MAS requires DNA markers that are tightly linked to quantitative trait loci (QTL) for economic traits based on genetic linkage maps [5]. Therefore, MAS approaches is urgently required for sustainable development in culture of E. carinicauda.
High-resolution genetic linkage map is necessary for genome assembly, as well as for mapping QTL of economic traits [6]. In the past decade, genetic linkage map had been constructed using various molecular markers in multiple aquaculture species [5,[7][8][9][10], including Tilapia, Rainbow trout, Atlantic salmon, Asian seabass, Half tongue sole, Channel catfish, Common carp, Japanese flounder and many others. Linkage map provide essential tools for QTL localization and promote MAS process in multiple aquaculture species. For example, growth traits have been mapped in Rainbow trout [11], Asian seabass [12], Salmons [13]; Vibrio anguillarum and lymphocystis disease-resistant traits in Japanese flounder have been successfully mapped and applied to marker-assistant breeding [9,14]; sex-determination traits have been localized by QTL mapping approaches in Half tongue sole [7], Tilapia [15] and Atlantic halibut [16]. Recently, High-resolution genetic linkage maps have been constructed in crustaceans, such as Kuruma prawn [17], Chinese shrimp [18], Black tiger shrimp [19], Pacific white shrimp [20], Swimming crab [21] and Chinese mitten crab [22]. In contrast, a high-resolution linkage map is urgently needed for genomic and genetic researches in the ridgetail white prawn.
Next-Generation Sequencing (NGS) technology and associated genotyping advancements have become widely used to implement de novo genome sequencing and highresolution linkage map construction in non-model species. In recent years, Specific-Length Amplified Fragment sequencing (SLAF-seq) has been successfully applied in large-scale de novo SNP discovery and genotyping in various species [23]. As a reliable, high throughput and double-barcode genotyping platform, SLAF-seq is fast, accurate and cost-effective for high-resolution linkage maps construction, such as sesame [24], soybean [25,26] and cucumber [27]. For aquaculture species, a high-resolution linkage map, including 5,885 markers, was constructed in common carp using SLAF-seq with marker intervals of 0.68 cM on average [23]. High-resolution genetic map of the Pacific white shrimp was also constructed using this method with an average marker distance of 0.7 cM [20].
The main purposes of constructing high-resolution linkage map is to conduct the mapping of QTLs related to economic traits. Based on the high-resolution linkage map, several QTLs related to body length and body weight of shrimps were detected [20,22].
Constructing a genetic linkage map requires reference populations/families where molecular markers segregate [28]. Because common crustaceans reproduced one generation per year, the most of reference populations of shrimps and crabs were F 1 and F 2 families [17][18][19][20][21][22]. Backcross populations, derived by crossing the F 1 hybrid to one of their parents, are usually used to construct genetic linkage map and QTL mapping due to direct reflection of separation of F 1 gametes and higher mapping efficiency than F 2 population [11,29]. In the present study, we constructed reciprocal-cross backcross population of E. carinicauda for the first time based on the characteristics of multiple reproduction annually. We then conducted genome survey analysis and construction of a high-resolution linkage map to investigate the genomic and genetic architecture of E. carinicauda. Based on the high-resolution linkage map, QTL mapping was also conducted to detect markers related to growth traits.

Genome survey of E. carinicauda
We constructed six 270 bp DNA libraries with short paired-end and sequenced them by Illumina Hiseq 4000 platform. 294.46 Gb high-quality reads with 40.01% GC content were obtained in total after sequencing, which covered approximately 31.57-fold the genome size of E. carinicauda. Sequencing raw data have been submitted to the Nucleotide database of NCBI with the accession number QUOF000000000.1. K-mer curve was obtained based on the frequencies of 23-mers (nucleotide strings with a length of 23 bp) within sequencing data ( Fig. 1). K-mer analysis revealed that there was a peak at the Kmer depth of 24. The genome size of E. carinicauda was estimated as 9.33 Gb with remarkably low heterozygosity (0.26%), which were possessed by approximately 76.62% repeat sequences ( Table 1). The total length of assembled scaffolds was 9.18 Gb, which covered approximately 98.40% of E. carinicauda genome (Table 2). Finally, we predicted 65,772 genes (Additional file 2: Table S2) based on sequencing contigs/scaffold.

SLAF-seq library construction and sequencing
A total of 204 SLAF-seq libraries from 4 parents and 200 backcross offsprings were constructed and sequenced on 4 lanes of Illumina HiSeq 2500 platform to generate 4.98 billion raw reads. After data trimming, 100.60 Gb of sequencing data, were individually divided into SLAF tags according to their MIDs. Finally, female and male parental data sets in cross BC 1

SNP marker genotyping and genetic linkage map
After SLAF sequencing, 497888010 paired-end reads were generated for the mapping family (four parents and 200 progenies). A total of 301,314 polymorphic SLAF markers were identified from 1433342 SLAF markers, of which 10384 SNP markers could be successfully genotyped in both parents and offspring (Table 5). SNP   markers with the seven segregation patterns (aa × bb, ab × cc, cc × ab, hk × hk, ef × eg, lm × ll, nn × np) could be used in linkage map construction for the BC 1 family, of which aa × bb was one of the major pattern and most efficient markers for map construction (Fig. 2). The average read depth in backcross family of genotyped markers were 6.52, 15.36 and 22.75 in the offspring, male and female parents, respectively (Additional file 3: Table S3).
And the average read depth in reverse backcross family of genotyped markers were 7.09, 16.90 and 28.51 in the offspring, male and female parents, respectively (Additional file 4: Table S4). High-resolution SLAF-based SNP genetic maps of E. carinicauda based on backcross populations were first constructed using the pseudo-testcross strategy. The BC 1 family genetic linkage map contained 45 linkage groups with 4050 markers and the reverse cross BC 1 family genetic linkage map contained 45 linkage groups with 6,334 markers ( Table 6, Fig. 3, Fig. 4, Additional file 5: Table S5 and Additional file 6: Table S6). The total map distance of two maps were 2939.27 cM with an average inter-locus distance of 0.73 cM and 3460.28 cM with an average interlocus distance of 0.55 cM respectively, which covered 97.65 and 98.44% respectively of the genome based on the total length of the genome.

Genetic linkage map integration
Among the 4050 markers in the backcross map, 2733 markers could align to 1468 genomic scaffolds/contigs with high confidence (Additional file 7: Table S7). And 91 markers were anchored to unigenes of transcriptome (Additional file 8: Table S8). In the reverse backcross map, 4,297 markers could align to 2403 genomic scaffolds/contigs with high confidence (Additional file 9: Table S9). And 156 markers were anchored to unigenes of transcriptome (Additional file 10: Table S10). Based on blast information, there were some markers of genetic linkage maps, genomic scaffolds/contigs and unigenes of transcriptome that can be integrated together (Fig. 5, Fig. 6). 7,030 markers could be aligned to the genomic scaffolds/contigs or transcriptome unigenes, of which 846 markers could be blasted via with the public databases.

QTL mapping of growth-related traits
The high-resolution genetic linkage map was used in the present study for QTL mapping of growth-related traits using MapQTL 4.0 software. In the map of Backcross family, 7 QTLs were detected for all the five growth-related traits, which were distributed on LG1, LG2 and LG20 (Table 7 and Fig. 7). These QTLs with LOD values of 2.66-3.62 contributed to PVE of 11.5-15.7%. Among them, qBW-1 located at 49.652-79.283 cM of LG2 with the highest LOD score of 3.62, and correspondingly had the higher PVE value of 15.4%. Some QTL intervals were clustered together on the same respective linkage groups (LGs). One major cluster containing three QTLs (qBL-2, qCH-1 and qBW-1) was detected between the positions of 47.024-79.283 cM on LG2. On LG20, another cluster located within the region (0-1.826 cM) also consisted of two QTLs (qCW-1 and qBW-2). A total of 16 markers were located in the QTL intervals, among which, 8 markers (50%) distributed on LG2. In the map of reverse Backcross family, a total of 16 QTLs were detected for all the five growth-related traits, which were distributed on LG1, LG10, LG13, LG17, LG19, LG24 and LG38 (Table 7 and Fig. 8). The QTLs with LOD values of 2.0-2.74 contributed to PVE of 8.8-11.9%. Of which, one major cluster containing four QTLs (qBL-2, qCW-3,    [20] and the N. denticulate genome [32]. A total of 31371476 contigs with N50 length of 422 bp was obtained, which was higher than that of previous study [30]. And a total of 65772 unigenes with mean length of 1386 bp were obtained, which was relatively smaller than those of decapod shrimps [30]. The reciprocal-cross backcross population of E. carinicauda were firstly obtained for construction of genetic linkage map and the growth traits were analyzed. Significant correlations among five growth traits (BL, BW, CL, CW and CH) have been reported in other species, such as Atlantic salmon [35] and Zhikong Scallop [36]. Similarly with our study, five growth traits were closely related to each other, with the highest correlation coefficient (r = 0.96) between BL and BW, and relatively low correlation coefficient (r = 0.81) between CW and other traits.
High-resolution genetic linkage map is an essential tool for research of genetics and genomics, such as comparative genome analysis and QTL fine mapping [37]. SNPs are especially suitable for genetic linkage map construction, which representing the most abundant and stable form of genetic variation in most genomes [38]. In this study, 301314 polymorphic SLAF tags were developed from 4 E. carinicauda parents and 200 offspring individuals. 4050 SNPs distributed throughout the  The total length of the backcross map was 2939.27 cM with an average distance between adjacent markers of 0.73 cM, and the total length of the reverse backcross map was 3460.28 cM with an average distance between adjacent markers of 0.55 cM. The mapping resolution of reciprocal backcross population exceeded those of P. monodon (0.9 cM) [19], and similar to L. vannamei (0.7 cM) [20], P. trituberculatus (0.51 cM) [34] and E. sinensis (0.49 cM) [22]. Additionally, the marker developed in our study contained a 200 bp terminal sequences, which provided enough information for trait-related QTLs verfication and comparative genome analysis.
To identify potential growth-related genes, we compared the detected QTLs with the scaffold assembly from genome survey and transcriptome of E. carinicauda [39]. Based on the integrated map, 91 markers could be aligned to genome assemblies or transcript sequences in the backcross map, and 13 of 91 markers could annotated candidate genes via blast in public databases. Meanwhile, 156 markers could be aligned to genome assemblies or transcript sequences in the reverse backcross map, and 25 of 156 markers could annotated candidate genes via blast in public databases. These aligned genes were related to many important physiological processes and functions, such as protein kinase activity [40], regulation of apoptotic process [41] and sequence-specific DNA binding transcription factor activity [42]. These annotated genes provide candidate information for analysis of economic traits in E. carinicauda. QTL mapping is a very effective strategy to locate traitrelated genes for MAS in genetic breeding of aquaculture species [36]. The high-resolution genetic map constructed in this study provided important tools for performing QTL fine mapping for economical traits of E. carinicauda. In the present study, 7 QTLs associated with growth traits were found to be distributed on three LGs (LG1, LG2 and LG20) in the backcross linkage map. And 16 QTLs associated with growth traits were found to be distributed on six LGs (LG1, LG10, LG13, LG17, LG19, LG24 and LG38) in the reverse backcross linkage map. Interestingly, most of the QTLs were concentrated within a narrow region (cluster) on the LGs. In the backcross linkage map, three QTLs were clustered together (47.024-79.283 cM) on LG2 corresponding to assembly contig85361652 (2.2 kb) and scaffold1051673 (6.4 kb). And two QTLs were found in another cluster (0-1.826 cM) on LG20, corresponding to scaffold1658074 (1.8 kb). Meanwhile, in the reverse backcross map, three QTLs were clustered together (60.091-60.576 cM) on LG17 corresponding to assembly contig84295559 (1.3 kb), scaffold234386 (1.2 kb) and scaffold037331 (3.7 kb). Additionally, three QTLs were also clustered together (90.716-91.096 cM) on LG19, indicating that the growth traits may be controlled by the same genes. The quite small genetic and physical distances among QTLs in specific clusters suggested that the individual clusters might be highly effective QTLs. The results were in accordance with the previous results in P. trituberculatus [34]. The demonstration of one trait controlled by a few significant QTLs with higher PVE values was consistent with the characteristics of growth traits controlled by several major genes with higher heritability [43,44].

Conclusions
In conclusion, genome size of E. carinicauda was estimated as 9.33 Gb with remarkably low heterozygosity. Large-scale SNPs identified and genotyped via SLAF-seq technology were used to construct a high-resolution genetic map of the ridgetail white prawn. The developed genetic map is the most comprehensive genetic map to date for this species. Based on SNP mapping analysis, we identified 23 positive QTLs for growth traits that will be helpful to clarifying the genetic mechanism of growth regulation in the ridgetail white prawn. The obtained SNPs and high-resolution linkage map, coupled with genome survey and transcriptome established an important platform for QTL mapping and also provided an extremely useful resource for future molecular breeding efforts such as genome selection.

Genome survey analysis
DNA samples of E. carinicauda was extracted from the muscles of parent of backcross family for sequencing. Six paired-end DNA libraries with an insert size of 270 bp were constructed following the Illumina operating protocols. The paired-end sequencing was performed on the Illumina Hiseq 4000 platform (Illumina, Inc.; San Diego, CA, USA). The raw data were trimmed to filter out lowquality reads and adapter contaminates using NGS QC Toolkit [45]. De novo assembly was performed to obtain contigs using SOAP denovo software [46] (https://sourceforge.net/projects/soapdenovo2/files/SOAPdenovo2/) with the following parameters: the k value in K-mer was set at 23, unsolve repeats by reads fill gaps in scaffolds.

SLAF library construction and high-throughput sequencing
Based on the size of E. carinicauda genome, content of GC and in-silico analysis, endonuclease Hae III were used to digest the genome. SLAF library were constructed as described by Sun et al. [23] with minor modifications. For the BC 1 population, endonuclease Hae III (New England Biolabs, NEB, USA) were chosen to digest the genomic DNA of E. carinicauda. Subsequently, a single nucleotide (A) overhang was attached to the digested fragments by using Klenow Fragment (3′ → 5′ exo − ) (NEB, USA) and dATP at 37°C. PAGE-purified duplex tag-labelled sequencing adapters (Life Technologies, USA) were then ligated to the A-tailed fragments Fig. 6 The integrated map of genome, reverse BC 1 map and transcriptome in E. carinicauda. Outer ring, the linkage group; Intermediate ring, contigs or scaffolds of genome assembly aligned with markers from the linkage map; Inner ring, unigene sequences of transcriptome aligned with scaffold/contig sequences To avoid false positive reads, the sequence error rate was estimated using the data of Oryza sativa as a control. The ratio of high quality reads with quality scores greater than Q30 (indicating a 1% chance of an error, and thus 99% confidence) in the raw reads and GC amounts were calculated for quality control.

SNP calling and genotyping
After filtering out the low-quality reads (quality score < 30e), the remaining reads were sorted to each progeny according to duplex barcode sequences. Then each of the high-quality read was trimmed off 5-bp terminal position, sequences clustered by similarity above 90% were defined as one SLAF locus [47]. Genome Analysis Toolkit (GATK) [48] and Sequence Alignment/Map tools (SAMtools) [43] was used for calling the SNPs. Local realignment was performed to avoid false alignments near InDels. The "Unified Genotyper" module of GATK was used for variant calling. Both SAMtools and GATK tools were used to identify SNPs, and their intersection was merged as the candidate SNP dataset.
Only biallelic SNPs were identified as the final SNP dataset. Polymorphic SNP markers were converted to four segregation patterns (hk × hk, lm × ll, nn × np and aa × bb). The reciprocal-cross backcross families of E. carinicauda were obtained from the crosses between parents and F 1 individuals with genotype aa or bb. Thus, only the SNPs whose segregation patterns were aa × bb were used to construct the genetic linkage map. The average sequencing depths of SNPs were more than 18-fold in the parents and greater than 6-fold in the progeny, respectively. And a progeny contained more than 70% of the SNPs in both parents, i.e., 70% integrity of SNPs in individuals. A χ 2 test was implemented for each SNP with a null hypothesis that the two alleles at a locus segregated with 1:1 ratio in our BC population. All SNPs that significantly deviated from this ratio (P < 0.001) were eliminited from the SNP dataset.

Genetic linkage map construction
To ensure the quality of the genetic map, HighMap strategy was utilized to sort the SNP markers and correct genotyping errors within LGs [44]. Marker loci were distributed primarily into LGs by the modified logarithm of odds (MLOD) scores > 5.0 and a maximum recombination fraction of 0.4. MSTmap algorithms [49] and SMOOTH algorithms [50] were chosen to order the SNP markers and correct genotyping errors, respectively. The LGs were constructed as follows: Primary marker orders were first obtained by their location on the chromosomes according to the relationship of the ordered markers, and genotyping errors or deletions were corrected using the SMOOTH algorithm; then, MSTmap was used to sort the map and SMOOTH was reused to correct the newly ordered genotypes. The processes were conducted repetitively to ensure the accuracy of marker