Comparative transcriptomics uncovers alternative splicing and molecular marker development in radish (Raphanus sativus L.)

Background Alternative splicing (AS) plays important roles in gene expression and proteome diversity. Single nucleotide polymorphism (SNP) and insertion/deletion (InDel) are abundant polymorphisms and co-dominant inheritance markers, which have been widely used in germplasm identification, genetic mapping and marker-assisted selection in plants. So far, however, little information is available on utilization of AS events and development of SNP and InDel markers from transcriptome in radish. Results In this study, three radish transcriptome datasets were collected and aligned to the reference radish genome. A total of 56,530 AS events were identified from three radish genotypes with intron retention (IR) being the most frequent AS type, which accounted for 59.4% of the total expressed genes in radish. In all, 22,412 SNPs and 9436 InDels were identified with an average frequency of 1 SNP/17.9 kb and 1 InDel/42.5 kb, respectively. A total of 43,680 potential SSRs were identified in 31,604 assembled unigenes with a density of 1 SSR/2.5 kb. The ratio of SNPs with nonsynonymous/synonymous mutations was 1.05:1. Moreover, 35 SNPs and 200 InDels were randomly selected and validated by Sanger sequencing, 83.9% of the SNPs and 70% of the InDels exhibited polymorphism among these three genotypes. In addition, the 15 SNPs and 125 InDels were found to be unevenly distributed on 9 linkage groups. Furthermore, 40 informative InDel markers were successfully used for the genetic diversity analysis on 32 radish accessions. Conclusions These results would not only provide new insights into transcriptome complexity and AS regulation, but also furnish large amount of molecular marker resources for germplasm identification, genetic mapping and further genetic improvement of radish in breeding programs. Electronic supplementary material The online version of this article (doi:10.1186/s12864-017-3874-4) contains supplementary material, which is available to authorized users.


Background
The advent of next-generation sequencing (NGS) has greatly advanced our opportunities to obtain abundant sequences data by resequencing plant genome and transcriptome in greater depth [1,2]. RNA-Seq technology had been used to rapidly isolate and identify alternative splicing (AS) [3,4], novel transcripts [5] and structural variations [6]. Surprisingly, the transcriptome data offers rich resources to identify and develop large number of single nucleotide polymorphism (SNP) and insertion/deletion (InDel) markers based on comparing transcriptome sequences to the reference genome sequences [7,8]. This approach offers a direct, reliable and high-efficient strategy for identification and development of SNPs and InDels using data mining by bioinformatic methods. Recently, using comparative transcriptomic analysis, a large number of SNP and InDel markers have been identified in many plant species including Arabidopsis [7], Brassica napus [9], rice [10] and tomato [11]. However, a limited number of SNP and InDel markers have been developed in transcriptome data of radish [12].
Molecular markers related to phenotypes provide us with valuable tools for elucidation of genetic variations. In the past two decades, several molecular marker systems, such as restriction fragment length polymorphism (RFLP), random amplified polymorphic DNA (RAPD), amplified fragment length polymorphism (AFLP) and simple sequence repeats (SSR) markers had been widely employed in genetic diversity analysis, and genetic mapping studies in many plant species [10,[13][14][15]. Recently, SNP and InDel markers has increasingly being received attention owing to their high frequency, wide distribution, co-dominant inheritance and abundant DNA polymorphisms [16,17]. Previous studies indicated that the SNPs and InDels within or near the coding sequences could be influenced by certain phenotype, transcription factor binding and alternative splicing [18,19]. SNP and InDel markers have been extensively utilized in several crop improvement programs including genetic diversity analysis, quantitative trait locus (QTL) mapping and genome-wide association analyses (GWAS) [20,21].
Alternative splicing, generating multiple transcript variants from a single pre-mRNA and increasing the transcriptome complexity, is one of the most important regulatory mechanisms for gene expression and functional diversity of proteome [22,23]. In plants, it has been estimated that 60% of Arabidopsis genes [24], 33% of rice genes [23], 40% of soybean genes [25] and 59.3% of tomato genes [26] could be alternatively spliced. Exon-skipping (ES) events were the predominant AS type in animals, while intron retention (IR) was more frequent in plants [27]. Moreover, AS played crucial roles in regulation of biological functions, particularly in developmental processes and stress responses [24]. Genome-wide analysis of AS events had been surveyed in tomato, which showed that the splice variants of multi-exon genes in the seedlings and flowers were lower than those in early growth fruits [26]. In six teosinte and 10 maize transcriptomes, a large number of genes were induced in AS complexity during domestication from teosinte to maize [3]. These results provided useful information for identification and characterization of AS in other plant species.
Radish (Raphanus sativus L., 2n = 2× = 18) is an economically annual or biennial root vegetable crop of the Brassicaceae family. Although a set of SNP and/or InDel markers had been developed from the public radish EST database or whole genome resequencing data [28,29], the number of SNP and InDel markers are far from saturated for linkage mapping and association study in radish. Recently, the release of radish reference genome [30] and availability of de novo transcriptome datasets [31][32][33] generated a great number of transcriptome sequences, which provided useful sequence information for identification and development of SNP, InDel and SSR markers in radish.
To better systematically characterize the overall transcriptional landscape in radish, three available radish transcriptomes data were collected in this study. The alignment of the short reads to the radish genome dataset were conducted for identification of AS events, novel mRNA transcripts as well as SNP, InDel and SSR markers among these three radish transcriptomes. Additionally, the frequency, distribution and polymorphic of SNPs and InDels were estimated and characterized in three radish genotypes. The newly developed InDel markers were applied to investigate the genetic diversity among 32 accessions. The outcomes of this study could enhance our understanding of radish gene structure and facilitate marker-assisted selection and genetic improvement of some important horticulture traits in radish breeding programs.

Identification of AS events and discovery of novel mRNA transcripts
In all, 170,086 unique splice junctions sites were identified using TopHat program [34] (Additional file 1: Table  S2); among which 131,106 (77.1%) were annotated in the radish genome and 38,980 (22.9%) were novel splice junctions. As shown in Additional file 1: Table S2, the 'NAU-LB' had the largest number of junctions, followed by 'NAU-YH' and 'NAU-RG'. Among the dinucleotides at the intron border, a total of 218,574 GT-AG splice sites (95.9%), 163 GC-AG splice sites (0.1%), and 9256 splice sites of other types (4.0%) were identified (Fig. 2a).
To analyze the AS events in radish, the ASTALA-VISTA software was employed to extract the AS events and classify different types of alternative splicing [35]. In total, 56,530 AS events were identified in three transcriptome datasets. These AS events were distributed in 36,546 genes, which accounted for 59.4% of the total radish genes, with an average of 1.68 AS events per gene. In this study, 25.0% of the radish AS-related genes undergo multiple AS events resulting in a variety of transcripts from a single gene (Additional file 1: Figure  S1). The splice junctions of these genes were used to identify and categorize the AS events into five main types of AS events as intron retention (IR), exon skipping (ES), alternative 3′ splice site donor (A3SS), alternative 5′ splice site acceptor (A5SS) and other events. Among the different AS types, IR events were the predominant AS type (27,254,48.2%), followed by other (19,674,34.8%), A3SS (4228, 7.5%), and A5SS (3995, 7.1%) events (Fig. 2b, Additional file 2). To verify the accuracy of AS events, a reverse transcription-PCR (RT-PCR) analysis was performed on the leaf and root of 'NAU-RG' , 'NAU-LB' and 'NAU-YH' , respectively. Nine AS events from nine genes were exhibited to produce splice variants, which were consistent with the RNA-Seq data results. The RT-PCR results indicated that the expression of AS events was tissue-specific, for example, Rsa1.0_01846.1_g00001.1 was expressed in the root of 'NAU-LB' , while not expressed in the leaf (Fig. 2c, Additional file 1: Table S3).
To detect the differential splicing events, the AS events consisted of A3SS, A5SS, IR, ES and mutually exclusive exons (MXE) were analyzed using the Multivariate Analysis of Transcript Splicing (MATS) program [36]. The maximum number of differential AS (1360) was identified in 'NAU-LB' vs 'NAU-RG' , of which RI (459 events in 422 genes) possessed the predominant events (Fig. 3a, Additional file 3). Gene ontology (GO) enrichment analysis showed that a total of 336 (79.6%) differentially spliced genes were assigned into 43 GO terms in the comparison between 'NAU-LB' and 'NAU-RG' (Fig. 3b). In terms of three main categories, 'cellular process' , 'cell' and 'binding' represented the most abundant GO terms in biological process, cellular component and molecular function, respectively. In 'NAU-LB' and 'NAU-RG' comparison, the Rsa1.0_01282.1_g00001.1 with IR event was selected and validated (Fig. 3c). A novel isoform was found in 'NAU-RG' , but not in 'NAU-LB' by RT-PCR (Fig. 3d). In this study, the novel and unknown genes were defined as novel transcripts, a total of 10,657 novel transcripts were detected based on the transcriptome data (Additional file 4). The newly

Identification and characterization of SNPs, InDels and SSRs in radish
To systematically identify genetic variations, three radish transcriptomes were utilized for SNPs and InDels discovery using Genome Analysis Toolkit [37]. Overall, 22,412 putative SNPs were detected among these three genotypes with an average frequency of 1 SNP/17.9 kb in comparison with the reference sequence (Additional file 5). Of these, 19,013, 20,547 and 20,551 SNPs were identified in 'NAU-LB' , 'NAU-RG' and 'NAU-YH' , respectively. A total of 16,997 SNPs were shared by all three genotypes, while 439, 571 and 385 SNPs were specific for 'NAU-YH' , 'NAU-LB' and 'NAU-RG' , respectively (Fig.  4a). The identified SNPs were distributed on the 8248 scaffolds, and the majority of scaffolds only had one SNP (Fig. 4b).
SNPs were further classified based on their zygosity and nucleotide substitution. Between these three pairwise comparisons, 14,583 SNPs (13,241 homozygous and 1342 biallelic SNPs) detected between 'NAU-RG' and 'NAU-YH' were more than those in the other two transcriptome comparisons (Table 2). A total of 3838 (17.1%) and 3209 (14.3%) homozygous SNPs were presented in 'NAU-LB' vs 'NAU-RG' comparison and 'NAU-LB' vs 'NAU-YH' comparison, respectively. There were 1342 homozygous SNPs between 'NAU-RG' vs 'NAU-YH' comparison, which comprised only 6.0% of the total SNPs, indicating that the self-incompatible 'NAU-YH' presented a larger degree of heterozygosity than other two accessions (Table 2). Moreover, the frequency of transitions (58.5%) was higher than transversions (41.5%) with transition to transversion ratio (Ts/ Tv) of 1.41:1 ( Table 2). Across all pairwise comparisons, the Ts/Tv of bi-allelic SNPs was slightly more than the homozygous SNPs, the overall Ts/Tv being relatively stable between genotypes with a range from 1.29 to 1.4.
To identify the potential SSRs, a total of 104,801 assembled unigenes were obtained from three transcriptome datasets. In total, 43,680 potential SSRs were identified in 31,604 unigenes with a density of 1 SSR/ 2.5 kb, among which 8009 (25.0%) possessed more than one SSR (Additional file 7). Among the identified SSRs, the mono-nucleotide repeats were the most abundant (21,599, 49.4%), followed by tri-(10,529, 25.6%), di-(11,164, 24.1%), and tetra-(287, 0.7%) nucleotide repeats, respectively (Fig. 5). For the mono-nucleotide motifs, A/T was the primary motif. AG/CT and GA/TC were the most common motifs among the di-nucleotide motifs (Additional file 1: Table S5). The SSRs identified in this study would facilitate genetic linkage map construction and marker-assisted selection (MAS) in radish breeding programs.

Analysis of genetic variation between three genotypes
To investigate the distributions of SNPs and InDels in genic and intergenic regions, the location of genetic variation was compared with the annotation of the reference gene models. Among the 22,412 SNPs, 11,236 (50.1%) SNPs were located in intergenic regions, 5483 (24.5%) in intron, and 5693 (25.4%) in coding sequence (CDS); and the SNP frequencies were 1 SNP/35.8, 73.3 and 70.6 kb, respectively (Table 3). For SNPs in the CDS regions, the maximum number of 4895 was detected in 'NAU-RG' and 'NAU-YH' comparison, with a frequency of 1 SNP/ 82.1 kb; the lowest number of 4536 SNPs was observed in 'NAU-LB' vs 'NAU-RG' comparison with a frequency of 1 SNP/88.6 kb. Moreover, 1413 (15%) InDels were located in intergenic regions, 1135 (12%) in intron, 6888 In addition, all the detected SNPs in the coding region were categorized as synonymous or non-synonymous. A set of 5318 coding SNPs were obtained, of which 2592 (48.74%) were synonymous SNPs, whereas 2726 (51.26%) were nonsynonymous SNPs and caused amino acid changes in protein sequences. The ratio of nonsynonymous vs. synonymous SNPs was 1.05:1. The majority of synonymous SNPs were the leucine (L) (13.35%), while methionine (M) and tryptophan (W) were absent in synonymous changes. Serine (S) (9.21%) and arginine (R) (8%) were the top two types of the nonsynonymous SNPs change. For the nonsynonymous mutation, the alanine (A) (7.3%) was mutated easier than the original  Table 2 SNPs zygosity and corresponding transition to transversion ratio (Ts/Tv) in pairwise transcriptome comparisons amino acid loci (6.09%) (Fig. 6). Moreover, the predominant proteins correlation to chromatin binding, signal transducer activity and motor activity had more variations as their encoding genes had more nonsynonymous SNPs. Among these genes containing synonymous SNPs, the predominant molecular functions of proteins were involved in structural molecular activity, DNA binding, kinase, and transporter activity (Additional file 1: Figure S2).

Validation of markers and linkage group (LG) locations in radish
To evaluate the accuracy and efficiency of putative polymorphisms, a total of 17 DNA sequences spanning 35 SNPs in 'NAU-LB' , 'NAU-YH' and 'NAU-RG' were randomly selected for PCR amplification and Sanger sequencing (Additional file 8). In total, 15 primer pairs could amplify expected sizes, containing a total of 31 SNP loci, of which 26 (83.9%) SNPs revealed identifiable polymorphisms among three genotypes. A total of 180 candidate InDels with lengths greater than or equal to 3 were designed for PCR validation on polyacrylamide gels electrophoresis (PAGE) gels, of which 131 primer pairs showed distinguishable polymorphisms among three genotypes ( Fig. 7a-b). However, the other 25 primer pairs showed monomorphism, 14 primer pairs exhibited unexpected size and 10 primer pairs could not successfully be amplified in three genotypes. As for the InDels with ≤2 bp could not be unambiguously discriminated on PAGE gels, a total of 20 InDels with lengths of less than 3 bp were randomly selected for Sanger sequencing (Additional file 1: Table S6), from which 9 loci were polymorphic among three genotypes. Accordingly, 140 of the 200    Table S1 primer pairs (70%) revealed identifiable polymorphisms among three genotypes.
To investigate the linkage group (LG) locations of SNP and InDel markers developed in this study, the scaffolds contained SNPs and InDels were separately anchored onto the approximate location of linkage groups. For the validated SNPs and InDels, 15 SNPs and 125 InDels were aligned on radish linkage groups. A total of 140 markers were found to be unevenly distributed on 9 linkage groups (R1-R9) (Additional file 1: Figure S3). Among the 9 linkage groups, R2 contained the largest number of markers (26 markers), while only 5 markers were mapped to R3. Moreover, there were 15 and 9 markers clustered in the bottom of chromosome R2 and R4, respectively.

Application of InDels in genetic diversity analysis of radish germplasm
To assess the value of the InDel markers, 40 InDel marker primers were further used for genetic diversity analysis among 32 worldwide radish accessions. The amplification of 40 polymorphic InDels markers generated 86 alleles across 32 germplasm accessions. The total number of alleles (Na) ranged from 2 to 4 with an average of 2.15. Expected homozygosty (HO) per locus ranged from 0 to 0.53, with an average of 0.24. Expected heterozygosity (HE) per locus ranged from 0.32 to 0.75 with an average of 0.44. The polymorphic information content (PIC) values varied from 0.09 to 0.69 and the mean PIC value was 0.40 (Table 4). The dendrogram showed that the 32 radish accessions could be divided into four major clusters with similarity coefficient varying from 0.55 to 0.81 (Fig. 7c). The cluster I comprised four Chinese accessions, most of which had late maturity characters, and the root shapes were long cylindrical. The cluster II included 22 accessions, and most of the accessions had different geographical origins, taproot colors and maturities. In this cluster, LLYB showed a high similarity with XBY, both had white phloem and xylem with medium maturity. PI358483 originated from Macedonia and PI263262 collected from Japan were divided into a subgroup, and both of them had similar taproot color and maturity. The cluster III contained four wild accessions, the RKZ was collected from China, while the PI271451, PI436536 and X50 with different taproot colors were originated from India, Guatemala and Germany, respectively. The cluster IV consisted of one Russian and Turkey accession with medium maturity.

Discussion
Alternative splicing emerged as a key scheme in multicellular eukaryotes to enhance the proteome diversity [38]. SNP and InDel markers played important roles in molecular breeding program, which had been widely utilized in high-density genetic map construction, genome-wide association mapping and marker-assisted selection [20,21]. Although several molecular markers had been successfully developed in radish [28,29], the number of SNP and InDel markers were still limited for genetic diversity analysis and genetic mapping. To better systematically characterize the overall transcriptional landscape in radish, the identification and development of largescale AS, SNP and InDel markers is needed for future genetic studies. In this study, a total of 177,540 AS events, 22,412 SNPs, 9436 InDels and 43,680 SSRs were successfully detected from three transcriptomes sequencing data of radish. Moreover, 40 InDel markers were further applied for genetic diversity analysis among 32 different radish accessions. To the best of our knowledge, this is the first report on identification of AS events, SNPs and InDels using comparative transcriptomic analysis in radish.

Alternative splicing plays a crucial role in transcriptional regulation
RNA sequencing had been widely used for identification of alternative splicing levels in plants [23,39]. In this study, a genome-wide identification and characterization of alternative splicing in radish was conducted by three transcriptome datasets. It was found that 59.4% of the total radish genes were alternatively spliced, and the frequency was corroborated with previous findings in Arabidopsis [24] and tomato [26]. It was commonly accepted that IR is the major type of AS in plants [23,25]. IR is the most frequent AS type in radish, while ES is the least type (2.4% of the total events). In Arabidopsis, the fewer number of ES was caused by multiple exons skipped together and exon skipping utilized alternative 5′ and/or 3′ splice sites [4]. The UDP-galactose 4-epimerase as a member of the short chain dehydrogenase superfamily were underwent splicing variants in Aspergillus flavus [40]. In this study, Rsa1.0_16819.1_g00001.1, encoding the UDP-D-galactose 4-epimerase 1 also showed alternative splicing. Rsa1.0_01846.1_g00001.1, encoding the 2-oxoglutarate (2OG) and Fe (II)-dependent oxygenase superfamily protein showed tissue-specific expression, which was coincided with previously studies that a few genes exhibited tissue-specific splicing variants in plants [23,26]. Recent studies revealed that different transcripts produced from individual multi-exon genes could accelerate genome evolution through generating new functions [3]. Three BrFLC1 alleles with alternative splicing patterns greatly contributed to flowering-time variations in B. rapa [41], indicating that the AS has important biological consequences in plants. Together, the identification of AS patterns in genes would facilitate understanding the transcriptional complexity in root vegetable crops.

Features of SNPs and InDels by comparative transcriptomic analysis
It has increasingly been proven that SNPs and InDels can potentially be linked to functional genes involved in important agronomic traits [8,17]. Comparative transcriptomic analysis has been widely applied for identification of SNPs and InDels in many plants, such as B. napus [9], tomato [11] and Petunia [42]. In this study, the SNP and InDel density was 1 SNP/17.9 kb and 1 InDel/42.6 kb, respectively, which was lower than previously described average occurrence of 3.9/kb by resequencing 93 radish F 2 individuals [29]. Generally, genomic DNA sequences had higher SNP frequency than those observed in the transcribed regions. Additionally, the SNPs and InDels within the coding regions are more likely to contribute to agronomic phenotypes. Among pairwise comparisons, the largest numbers of SNPs and InDels were present in the 'NAU-RG' vs 'NAU-YH' comparison, which suggested that 'NAU-RG' vs 'NAU-YH' had a far relationship than the other two comparisons. Moreover, the transition-transversion ratio (1.41:1) presented in this study was similar to the previous findings in melon [43] and Petunia [42]. The Ts/Tv bias could be explained by the fact that cytosine-guanine (CpG) dinucleotides exhibited the high transition frequencies after methylation [44]. Notably, a total of 13 InDel types were found in these three genotypes, and the two most common types were single-and binucleotide InDels, which have also been reported in several plant species including B. rapa, cotton and sesame [45][46][47].
In the current study, the identified SNPs/InDels heterozygotes (81.5%/80.5%) were predominate as compared with homozygotes (18.5%/19.5%), confirming that the high heterozygosity rate was found in three accessions. Meanwhile, the clean reads uniquely mapped to the radish reference genome were less than 80% in each transcriptome, which was similar to previously reported results that heterozygous polymorphisms had been supported by a low proportion of uniquely mapped reads [48]. Previous studies have identified a number of heterozygous single-nucleotide variants (SNVs) in Puccinia striiformis f. sp. Tritici (Pst)-CY32 genome, and some of the predicted genes were located in the heterozygous regions [49]. Wang et al. [50] revealed that heterozygous alleles played significant roles in restoring male fertility of cytoplasmic male-sterile in radish. Therefore, the results would provide useful information for the research of genome heterozygosity and its molecular function in radish.
Based on the genome position of SNPs and InDels, the sequence region types were classified. The SNPs frequency was enriched almost equally between genic (49.9%) and intergenic (50.1%) regions. Approximately 85% of the total InDels was identified in genic regions, which was similar to previously reported in rice and sesame [47,51]. It has been extensively shown that SNPs/ InDels in the coding region were most influential variations, and mainly lead to non-function of protein [48,52]. Notably, 73% of all InDels were identified in the CDS regions, which could cause frameshift mutation that results in the changes of gene function. Numerous studies demonstrated that non-synonymous played significant roles in amino acid variation in the protein product of genes [53]. The total numbers of nonsynonymous SNPs and synonymous SNPs were 2592 and 2726, the nonsynonymous to synonymous ratio of 1.05 exhibited a medium value compared with Arabidopsis (0.83) [54] and rice (1.29) [55]. Together, these results illustrated the identified SNPs and InDels would be valuable molecular tool for identification of causal gene mutations in radish.

Characterization and polymorphism of SNP and InDel markers
In this study, a total of 35 SNPs and 200 InDels identified from three transcriptome sequences of the 'NAU-RG' , 'NAU-LB' and 'NAU-YH' were employed for designing PCR primer. The polymorphism rate of SNP/InDel (83.9%/70%) among these three radish genotypes was in accordance with previously reported results [42,45]. Obviously, the polymorphism ratios for transcriptome derived SNPs was 84.3% by selecting a 51 SNPs set in Petunia spp. [42]. A large-scale ESTs sequence data from different cotton species were used to identify putative InDels, of which 71.1% InDels were showed loci polymorphisms [46]. Several identified markers amplified with unexpected bands and failed amplification, which might be related to the presence of large introns in genomic sequence [56]. Most of the InDel (87.5%, 35/40) markers produced two allele products, which is similar to the previous results in tomato [8]. Moreover, the average PIC value of these 40 InDels markers was 0.40, which is relatively higher than the PIC value calculated in sesame [47] but lower than that in trifoliate orange [48], indicating that the moderate level of polymorphism of the developed markers would greatly facilitate genetic diversity analysis in radish.

Availability of InDels in radish genetic diversity analysis
InDels are important tools for a wide range of genetics and genomic studies. In this study, phylogenetic analysis revealed that 32 radish accessions were grouped into four distinct clusters, thereby demonstrating the availability of the InDel markers to differentiate the cultivars. Besides, the distribution of radish cultivars was not strictly base on the geographical origins, which is consistent with previous studies in genus Oryza [57]. It is generally accepted that InDel markers display higher interspecies differentiation than SSR markers [58]. Currently, the efficient differentiation of the cultivars and wild radish by InDel markers in third and fourth main cluster was higher than previous reported SSR marker in 32 radish accessions [59]. However, most of Chinese cultivars were clustered in the same subgroup in the dendrogram, indicating that the genetic relationship of Chinese radish cultivars is still narrow. Therefore, it is important to introduce more exotic genotype for expanding the genetic basis of radish.

Conclusions
In summary, this is the first report on transcriptomebased characterization of AS events, SNP and InDel markers in radish. Totally, 56,530 AS events were accounted for 59.4% of the total radish genes among these three transcriptomes. Moreover, 22,412 SNPs, 9436 InDels and 43,680 SSRs were found in all the three genotypes. Among the total SNPs located in coding regions, the percentage of nonsynonymous and synonymous SNPs were 48.74% and 51.26%, respectively. The polymorphism rate of SNPs and InDels was verified among three radish accessions, and 83.9% of SNPs and 70% of InDels exhibited polymorphism. Totally, 15 SNPs and 125 InDels were found to be unevenly distributed on radish nine linkage groups. Germplasms were successfully differentiated by several newly developed InDel markers. Taken together, our results could provide a better understanding of the radish transcriptome complexity, and these newly developed molecular markers would be excellent tool for construction of linkage map, identification of candidate genes for interest traits, and marker-assisted selection in radish breeding programs.

Plant materials and DNA extraction
Seeds of radish advanced inbred line, 'NAU-RG' , 'NAU-LB' , 'NAU-YH' and other 32 radish accessions with different root colors and origins (Additional file 1: Table S1) were soaked, surface-sterilized and incubated for 3 days. The germinated seeds were grown in plastic pots and cultured in greenhouse under 25°C/14 h light and 18°C /10 h dark. Young leaves of 30-day-old seedlings were collected for DNA extraction. Genomic DNA of all accessions were extracted from young leaves using a modified CTAB procedure [60]. Then DNA samples were diluted to a final concentration of 10 ng μl −1 with 1 × TE buffer for further use.
Transcriptome sequences and mapping reads to reference genome Three transcriptome data, sequencing from roots of genotypes 'NAU-YH' and 'NAU-RG' (NCBI accession ID No. SRX707630 and SRX316199) [31,32], leaves of 'NAU-LB' (NCBI accession ID No. SRX1671013) [33], were collected from our previously reports. The radish reference genome sequence and gene model annotation files were downloaded from the available genome website (ftp://ftp.kazusa.or.jp/pub/radish/). Index of the reference genome was built using Bowtie v2.2.3, and paired-end clean reads mapped to the radish reference genome using TopHat v2.0.12 [34]. TopHat can generate a database of splice junctions based on the gene model annotation file.

Identification and validation of AS events
Splice junctions were identified using TopHat program, and then filtered the junctions with reads number less than three. The known and novel transcripts were identified by Cufflinks v2.1.1 Reference Annotation Based Transcript (RABT) assembly method [61]. The final transcript annotations in a GTF format file were used to identify AS events using ASTALAVISTA program with default settings [35]. Five types including IR, A3SS, A5SS, ES and others that contain more than one of the four basic types were analyzed as previously described [4]. The mapping results of AS events were visualized using the Integrative Genomics Viewer (IGV) [62]. Alternative splicing isoforms were quantified between the three genotypes using the MATS program [36]. The differentially spliced events were defined with at least 10% change in exon inclusion level and false discovery rate (FDR) of less than 0.05. The splicing events were represented with the sashimi plots [63].
Total RNA was isolated from leaves and roots of 'NAU-RG' , 'NAU-LB' , and 'NAU-YH' with Trizol reagent (Invitrogen, Carlsbad, CA, USA) according to the manufacturer's instructions. The RNAs were treated with RNase-free DNase I (Takara, Japan), and reversetranscribed into cDNAs using SuperScript II reverse transcriptase (Invitrogen). The primer information is listed in Additional file 1: Table S3. RsActin was used as the internal control, and PCR products were visualized in agarose gel stained by ethidium bromide (EB).

Identification of SNPs, InDels and SSRs
The Picard-tools v1.96 and samtools v0.1.18 programs were used to sort and mark duplicated reads as well as reorder the alignment results of each sample [64]. SNP and InDel calling were carried out using the Genome Analysis Toolkit (GATK, version v3.2) [37], and the variant were filtered as follows: low quality scores (QUAL) is less than 30 and quality by depth (QD) is less than 5. Only the putative SNPs or InDels with 10 or more reads coverage were recorded to reduce the numbers of false positive SNPs. The localization of SNPs and InDels was performed based on the annotation of gene models of the radish genome. Synonymous and non-synonymous substitutions of the SNPs were determined using the SNPEff program [65]. Blast2GO program was applied to achieve Gene Ontology analysis on the genes contained SNPs and InDels.
To generate the reference transcriptome sequences, a de novo assembly of three transcriptome data was performed using Trinity program [66]. The potential SSRs were identified from the assembled unigenes using the MISA (microsatellite identification tool) program [67]. The number of repetitive units with a minimum of 10, 6, 5, 5, 5 and 5 were defined for the mono-to hexanucleotide, respectively.

Primer design and validation of DNA polymorphism
Primer pairs flanking the SNPs or InDels were designed using Primer Premier 5.0 program. The length of primer pairs ranged from 18 to 24 bp, while the predicted product sizes were varied from 80 bp to 400 bp. The Tm value was restricted to between 50 and 60°C. PCR was carried out in a 15 μl reaction volume containing 10 ng of template DNA, 2.0 mM of MgCl 2 , 0.2 mM dNTPs, 0.1 μM of each primer and 0.5 U Taq DNA polymerase (TaKaRa Bio Inc., Dalian, China). The PCR procedure comprised an initial denaturation at 94°C for 3 min, followed by 35 cycles of 94°C for 50 s, 56°C for 50 s and 72°C for 1 min, with a final extension at 72°C for 10 min. The SNP amplified PCR products were sequenced by Sanger method and analyzed by BioEdit v 7.0.5.3 (http://www.mbio.ncsu.edu/BioEdit/ bioedit.html). The InDel amplified products were separated on 8% PAGE and visualized with silver staining as described previously [68]. The validated SNPs and InDels were mapped on the prior dense genetic maps of radish according to their approximate positions on the linkage group locations using MapInspect software (http://mapinspect.software.informer.com/) [30].

Data analysis
To estimate the allelic variation of InDel markers in 32 radish accessions, POPgene program (v1.32) was employed to estimate the total number of alleles, expected homozygosty and expected heterozygosity [48]. The polymorphic information content value for the InDel marker was calculated using Power Marker v3.0 [69]. The 0 -1 data matrix was further used to calculate the coefficients of genetic similarity among all the accessions using the SIMQUAL program of NTSYS-pc software. The dendrogram was constructed basing on the unweighted pair-group method with arithmetic averages (UPGMA) in the SAHN module of NTSYS-pc software [59].