Genome-wide characterization and evolutionary analysis of heat stress transcription factors (HSFs) to reveal their potential role under abiotic stresses in radish (Raphanus sativus L.)

Background Abiotic stresses due to climate change pose a great threat to crop production. Heat shock transcription factors (HSFs) are vital regulators that play key roles in protecting plants against various abiotic stresses. Therefore, the identification and characterization of HSFs is imperative to dissect the mechanism responsible for plant stress responses. Although the HSF gene family has been extensively studied in several plant species, its characterization, evolutionary history and expression patterns in the radish (Raphanus sativus L.) remain limited. Results In this study, 33 RsHSF genes were obtained from the radish genome, which were classified into three main groups and 12 subgroups based on HSF protein domain structure. Chromosomal localization analysis revealed that 28 of 33 RsHSF genes were located on nine chromosomes, and 10 duplicated RsHSF genes were grouped into eight gene pairs by whole genome duplication (WGD). Moreover, there were 23 or 9 pairs of orthologous HSFs were identified between radish and Arabidopsis or rice, respectively. Comparative analysis revealed a close relationship among radish, Chinese cabbage and Arabidopsis. RNA-seq data showed that eight RsHSF genes, including RsHSF-03, were highly expressed in the leaf, root, cortex, cambium and xylem, results that these genes might be involved in plant growth and development. Further, quantitative real-time polymerase chain reaction (RT-qPCR) indicated that the expression patterns of 12 RsHSF genes varied upon exposure to different abiotic stresses, including heat, salt, and heavy metals. This data indicated that the RsHSFs may be involved in abiotic stress response. Conclusions These results could provide fundamental insights into the characteristics and evolution of the HSF family and facilitate further dissection of the molecular mechanism responsible for radish abiotic stress responses. orthologs and paralogs were obtained and the expression pattern of RsHSFs in different radish tissues was investigated. The outcomes of this study provide an opportunity to further explore the roles of HSF genes HSF: RT-qPCR: Reverse-transcription quantitative PCR; HSR: heat stress response; TDR: temperature-dependent repression; HSPs: heat shock proteins; DEGs: differentially expressed genes; HMM: Hidden Markov Model; HR-A/B: hydrophobic amino acid residues; OD: oligomerization domain; DBD: DNA-binding domain; NLS: nuclear localization Signal; NES: nuclear export signal; MWs: molecular weights; pI: theoretical isoelectric points; DAS: days after sowing; CDS: coding sequence; TF: transcription factor; WGD: whole-genome duplication; WGT: whole genome triplication.

Background Abiotic stresses due to climate change pose a great threat to crop production.
Heat shock transcription factors (HSFs) are vital regulators that play key roles in protecting plants against various abiotic stresses. Therefore, the identification and characterization of HSFs is imperative to dissect the mechanism responsible for plant stress responses. Although the HSF gene family has been extensively studied in several plant species, its characterization, evolutionary history and expression patterns in the radish (Raphanus sativus L.) remain limited. Results In this study, 33 RsHSF genes were obtained from the radish genome, which were classified into three main groups and 12 subgroups based on HSF protein domain structure. Chromosomal localization analysis revealed that 28 of 33 RsHSF genes were located on nine chromosomes, and 10 duplicated RsHSF genes were grouped into eight gene pairs by whole genome duplication (WGD).
Moreover, there were 23 or 9 pairs of orthologous HSFs were identified between radish and Arabidopsis or rice, respectively. Comparative analysis revealed a close relationship among radish, Chinese cabbage and Arabidopsis. RNA-seq data showed that eight RsHSF genes, including RsHSF-03, were highly expressed in the leaf, root, cortex, cambium and xylem, results that these genes might be involved in plant growth and development.
Further, quantitative real-time polymerase chain reaction (RT-qPCR) indicated that the expression patterns of 12 RsHSF genes varied upon exposure to different abiotic stresses, including heat, salt, and heavy metals. This data indicated that the RsHSFs may be involved in abiotic stress response. Conclusions These results could provide fundamental insights into the characteristics and evolution of the HSF family and facilitate further dissection of the molecular mechanism responsible for radish abiotic stress responses. Background 3 Many inevitable environmental factors (e.g., heat, drought, flooding, salinity and heavy metals) trigger abiotic stress, affect plant growth and consequently increase crop losses [1][2][3]. Heat stress (HS), one of the major abiotic stresses, severely inhibits crop growth and development. These effects have had devastating economic impacts on the yield and quality of rice, wheat, maize and vegetable crops [4][5][6]. Heat shock transcription factors (HSFs) are important regulators that could contribute to controlling the differential expression of heat shock proteins (HSPs) and other functional genes in the process of protecting plants from heat and other stresses including chilling, salinity, drought and heavy metal [7,8]. It is imperative to clarify the molecular mechanisms that govern how plants respond and adapt to HS.
Plants have developed multiple defense mechanisms and strategies to cope with adverse conditions and respond accordingly [9][10][11]. Under abiotic stresses, the induction of myriad proteins, including transcription factors (TFs), could regulate the expression of specific functional genes to enhance plant resistance through signal transduction pathways.
Reactive oxygen species (ROS)-scavenging enzymes and HSPs are important functional proteins induced by HS, and their corresponding genes are targets of several HSresponsive TFs [12][13][14]. Previous studies indicate that HSFA6b vitally acts as a downstream regulator of the ABA-mediated heat stress response (HSR) [4]. Moreover, there are several differentially expressed genes (DEGs) involved in heat stress response process in radish [30]. Recently, the available radish genome sequence provided a useful resource for whole-genome identification of TF families [31]. Nevertheless, little information on systematic characterization of HSF genes and their families in response to abiotic stresses is available in the radish. This study aimed to to isolate the full-length RsHSF sequences from the radish genome, map the RsHSFs onto chromosomes and explore their expression in response to abiotic stresses. Moreover, RsHSF orthologs and paralogs were obtained and the expression pattern of RsHSFs in different radish tissues was investigated. The outcomes of this study provide an opportunity to further explore the roles of HSF genes involved in abiotic stresses and present the expansion and evolutionary history of the HSF gene family in the radish.

Results
Identification and characterization of radish HSF proteins To comprehensively identify the candidate HSF proteins in radish, a profile Hidden Markov Model (HMM) [32] search against NODAI radish genome protein sequences with the HSF domain (Pfam: PF00447) was performed. A total of 55 putative RsHSFgene sequences were obtained from the whole genome. SMART (http://smart.embl-heidelberg.de/) and Pfam (http://pfam.xfam.org/) were used to remove proteins with incomplete hsf-type DBD domain, while MARCOIL (http://toolkit.tuebingen.mpg.de/marcoil) was used to confirm OD (HR-A/B) domain presence. After removing sequences that encoded proteins without the complete DBD, OD (HR-A/B) and/or start/stop codons, 33 RsHSF genes remained for further analysis (Additional file 1: Table S1-S3). Subsequently, the conserved domains of all retained genes were identified by Heatster. The DBD, which consists of approximately 100 amino acids (AAs) at the N-terminus, was the most conserved domain. Overall, 31 AAs of the DBD domain were highly conserved among all RsHSFs (Additional file 2: Figure S1). NLS and NES are crucial for HSF intracellular distribution between the nucleus and cytoplasm. All RsHSFs had an NES, whereas only 13 contained the NLS (Additional file 3: Table S4). Furthermore, the physical and chemical properties of 33 RsHSF proteins were analyzed with ExPASy (Additional file 3: Table S4). Among these, the RsHSF protein sizes varied from 238 to 2,427 AAs with molecular weights (MWs) from 27.80 to 274.05 KDa, respectively. Additionally, the theoretical pI of most RsHSF proteins was < 7, with the exception RsHSF-04, . All RsHSFs were classified as unstable proteins according to the instability index. The aliphatic index may be a positive factor that increased globular protein thermostability. RsHSF-25 had the largest aliphatic index (77.14). Most RsHSFs showed similar physical and chemical properties, while different AA sequences in non-conserved regions may alter some of the molecular characteristics.

Intron-exon structure and conserved motif distribution
To obtain information about the diversity of RsHSF gene structure, full-length complementary DNA (cDNA) sequences were compared with the corresponding genomic DNA sequences through GSDS. To further analyze RsHSF protein motif distribution, whole sequences were subjected to the MEME web server and a total of 25 AA motifs were generated (Additional file 2: Figure.S2).
Overall, 25 of 33 RsHSF genes had one intron, while the ramaining members have two or more introns. Additionally, proteins within the same subgroups had similar motif structures. All RsHSF proteins harbored motifs 1, 2 and 4, all of which highly correspond to the conserved DBD. Moreover, RsHSF-07, RsHSF-14 and RsHSF-26 in subgroup A6 presented a similar gene and motif structure. However, some motifs were only detected in specific RsHSF protein classes. For example, motif 3 was identified in classes A and C, while motifs 9 and 17 only existed in class A1. A proportion of RsHSF proteins in the same class exhibited similar motif distribution, indicating that these proteins might have conserved functions (Fig.1 Table S5). Based on the number of AA residues between the A and B portions of the HR-A/B region, RsHSFs were classified into three classes, namely A, B and C. Intriguingly, the motifs 5 and 19 were only identified in class B (Fig.1).
A phylogenetic tree was constructed using neighbor-joining and maximum likelihood methods by MEGA6 [38] with the full sequences of the 134 HSF proteins (Fig.2). All proteins were categorised into three groups corresponding to the HSF classes A, B and C.
Additionally, there were 9 RsHSF class A subgroups clearly obtained on the basis of the bootstrap values and phylogenetic analysis of Arabidopsis and rice. According to the phylogenetic tree, class A consisted of 8 subclasses (A1, A3, A4, A5, A6, A7, A8, and A9) and contained the most RsHSF numbers, while class C accounted for the least proportion of HSFs among these five species. Interestingly, no RsHSFs, AtHSFs or BrHSFs clustered into classes A2, whereas AcHSFs did not appear in classes A6 and A7. This finding suggests that most RsHSF genes are more closely related to their corresponding homologous genes in Chinese cabbage and Arabidopsis.

Chromosomal location and duplication of RsHSF genes
To obtain the chromosomes distributions of RsHSF genes, their DNA sequences were physically plotted on the chromosomes through blast searches against the genomic sequences (Additional file 5: Table S6). In total, 28RsHSF genes were mapped on the chromosomes with uneven distribution, with exception of five genes 23,25,29 and 33) (Fig.3). Chromosome (Chr.) 5 had the largest number of RsHSF genes among three classes, followed by Chr. 4 (5 RsHSF genes from class A and B). Only one RsHSF gene was present in Chr. 8, while Chr. 1, 6 and 9 harbored two genes. Moreover, all the class B RsHSF genes were mapped on the chromosomes.
Genome duplications have been recognized and contributed to the expansion of gene family in plants [39,40]. MCScanX was used to obtain information about origins of duplicate RsHSF genes in radish genome [41]. 13 (46.4%) RsHSF genes were duplicated and retained from a whole genome duplication (WGD) event and 11 (39.3%) were duplicated and retained from a singleton event (Additional file 6: Table S7). Eight WGD events of 10 duplicated RsHSF genes were identified and classified into four groups.

Identification of orthologous and paralogous HSF genes
Polyploidization events contributed significantly to the evolution of flowering plants [42].
Previous studies showed that the common ancestor of Brassica and Raphanus have experienced ' whole-genome triplication (WGT) event since its divergence from Arabidopsis. However, the gene losses of orthologous groups between Arabidopsis, Brassica and Raphanus for protein-coding genes occurred in both the Brassica and Raphanus lineages [39,42]. Orthologous HSF genes among radish, Arabidopsis and rice were identified for triplication assessment using the Orthomcl-pipeline [43]. In total, there were 23 pairs of orthologous HSFs between radish and Arabidopsis, and eight paralogous gene pairs were identified in radish (Additional file 8: Table S9). It was reported that there are only six orthologous gene pairs between Arabidopsis and rice [23]. In addition, seven orthologous HSF gene pairs were identified between radish and rice (Additional file 8: Table S9). Among the orthologous HSF gene pairs between radish and Arabidopsis, AtHSF-11 and AtHSF-20 had three orthologous genes, three AtHSFs have two orthologous genes and 11 AtHSFs only had one orthologous gene (Additional file 2: Fig S3). These results indicate that many orthologous groups experienced gene losses. Moreover, five genes  had no orthologous gene in either Arabidopsis or rice. These results provide useful resource and reference for further exploring the functions of RsHSF genes in radish (Fig.4).

Expression pattern of RsHSF genes in different tissues
To determine spatiotemporal expression patterns of RsHSF genes, the Reads Per Kilobase Per Million (RPKM) values of the 33 RsHSF genes in different tissues and developmental stages were collected from RNA-Seq data and presented in a heatmap (Fig.5, Additional file 9: Table S10). In total, the expression profiling showed that not all RsHSF genes were expressed. The number of RsHSF genes in different tissues and developmental stages with RPKM >1 was 14 (42.4%). Almost all RsHSFA1s RPKM values were < 10 among different tissues and stages, except for RsHSF-18 and RsHSF-24 that were highly expressed in the cortex, cambium and xylem. The majority of RsHSF genes exhibited differential expression patterns in the various tissues and developmental stages. RsHSF-21 and RsHSF-29 (class C) were relatively lowly expressed during all developmental stages. RsHSF-21 and RsHSF-29 (class C) were relatively lowly expressed during all developmental stages. In addition, RsHSF-13 expression increased significantly in both root and leaf after 7 days after sowing (DAS), and it was in leaves after 40 DAS. RsHSF-30 (class A8) expression was higher compared to RsHSF-28 (class A3) during all developmental stages. Besides, RsHSFB genes exhibited stage-specific expression patterns. Among class B, both RsHSF-04 and RsHSF-10 were lowly expressed during all stages, while RsHSF-03 was highly expressed.

RsHSF gene expression profile under abiotic stresses
RNA-Seq data showed that some HSF genes were differentially expressed under disparate stresses. The transcription levels (log 2 (Fold Change) and log 2 TPM value) of certain RsHSF genes from the NCBI SRA were further analyzed (Additional file11: Table S12; Additional file 2: Fig S5). Several HSF genes, including HSFA1s were up-regulated in response to heat and salt stresses. RsHSFA4s (RsHSF-08, RsHSF-16 and RsHSF-22) were up-regulated under salt and lead (Pb) stresses, while they were down-regulated under HS. Additionally, HSFC1 (RsHSF-21) expression increased after Cd and salt treatments. HSFA1s as master regulators play vital roles in the HSR and activation of transcriptional networks.
Expression levels of genes that encode HS responsive TFs including DREB2A, HSFA7s andHSFBs, are directly regulated by HSFA1s [44]. HsfA4a is induced by oxidative stress, including HS, and regulates APX1 expression [45]. To gain insight into the expression patterns of HSF genes in radish, 12 RsHSF genes that belonged to different groups were verified by RT-qPCR analysis under heat, salt, Pb, and Cd stress (Additional file 10: Table   S11). Overall, the gene expression patterns varied significantly under the various treatments (Fig.6). Most genes were up-regulated after 24h HS exposure, while some genes were down-regulated after HS for 6h (Fig.6) The HSFs family allows plants to cope with abiotic stresses (e.g. heat, Cd and high light) by regulating gene expression to prevent damage [15,46]. Recently, the availability of increasingly complete genome sequencing has enabled the application of a bioinformatics approach to identify various gene families, including HSF in different species [27,28].
However, the available information about the RsHSFs is limited. This study is the first comprehensive overview of the HSF gene family within the radish genome.

Genome-wide identification and characterization of RsHSF genes
It is widely accepted that the HSF-type DBD and OD domains are necessary HSF structural components. The highly conserved DBD is composed of a three-helical bundle (H1, H2 and H3) and a four-stranded antiparallel ß-sheet [9]. The HR-A/B, connected to the DBD by a flexible linker of variable length (15-80 AAs), is comprised of hydrophobic heptad repeats [47,48]. In brief, the DBD ensures HSFs combine with HSE, and the OD is the basis for differentiating the three HSF classes. It is reported that the DNA-binding domain of plant HSF is separated by a single intron. Although the position of this intron is unanimous in all HSF DBDs, the intron size is variable [21]. Therefore, we predicted the conserved domain using Heatster online tools [49] after performing Pfam and MARCOIL analysis.
Among five plant species, 134 HSFs divided into three major classes were used to analyze the relationships. The results suggested that HSFs originated prior to the divergence of these species. Phylogenetic analysis evidenced that RsHSFs were more similar to BrHSFs and AtHSFs than OsHSFs, which corresponds with the fact that the radish, Chinese cabbage and Arabidopsis belong to Brassicaceae family (Fig.2). The radish HSF number (33) was larger than that in Arabidopsis (21)

Arabidopsis-Brassica lineages, and the time of the Raphanus genus diverging from
Brassica was longer than previously predicted [58,59]. Furthermore, recent reports revealed that ~60% of genes that belong to the neopolyploid ancestor of Raphanus and Brassica disappeared due to a WGT event. However, a considerable number of genes still exist within the Raphanus and Brassica genomes, a phenomenon that represents the benefit of this evolutionary novelty [40]. Phylogenetic analysis of HSF genes among radish, Arabidopsis and Chinese cabbage indicated that many RsHSF genes are highly similar to their corresponding Arabidopsis and Chinese cabbage homologs. Moreover, a large number of the identified RsHSFs were detected as orthologous genes in Arabidopsis, including RsHSF-04, RsHSF-05 and RsHSF-06, which were the orthologous genes of AtHSF-20 (Fig.4). These results provide valuable cues on further understanding the functions of these highly homologous genes in radish. For instance, RsHSF-07,  were highly similar to AtHSF-11, which is vital for HS tolerance and is a downstream regulator during the ABA-mediated stress response [15]. In this study, we identified eight pairs of RsHSF paralogs that might be related to the extended genome triplication of the radish. Collectively, the RsHSF gene family expansion may be largely related with gene duplication, and the identification of the orthologous gene pairs between radish and Arabidopsis provide a reference for exploring the roles of HSFs under different stresses in radish.
The potential roles of differentially expressed RsHSF genes HSFs play important roles in plants in response to abiotic stresses by regulating the expression of different genes [13,23,60]. Among HSF genes in plants, HsfA1 (a major transcriptional activator) is necessary to evoke the HSR. Indeed, the HsfA1-knockout mutant shows reduced expression of many HS-induced genes [45,61]. Moreover, HsfA1a is associated with Cd tolerance by improving melatonin biosynthesis [8]. In this study, RsHSF genes were significantly differentially expressed in response to various stresses. . Taken together, several RsHSF genes were differentially expressed upon abiotic stresses including heat, salt or heavy metal stress, and these results indicate that they might be involved in the plant response to abiotic stresses.
RsHSF gene expression during different stages and in tissues is associated with abiotic stresses tolerance. In this study, RsHSF genes exhibited tissue-specific expression patterns. At the seedling stage, RsHSF-04 and RsHSF-33 were lowly expressed in 20 and 14 DAS root, respectively. Throughout development, RsHSF-10 was only detected in 7 and 14 DAS root. AtHSF-16 (At4g36990) is a direct target gene of AtHsfA1s, which are critical for plants to initiate positive activities when coping with HS [16,67]. RsHSF-03 is highly similar to AtHSF-16 (At4g36990) and exhibited high expression level during seedling and taproot-thickening stages. Notably, RsHSF-19, the orthologous gene of AtHSF-02(At5g1682), was not expressed during all stages. These findings imply that the highly expressed genes among different stages and tissues may contribute to enhance stress tolerance in radish. Further characterization of these differentially expressed RsHSF genes would facilitate investigating the abiotic stresses response regulatory networks in radish.

Conclusion
In summary, a total of 33 RsHSF genes were identified at the genome-wide level in radish.
The protein motifs in most HSF members within one class were similar, and most HSF genes had only one intron. The expression patterns of several RsHSF genes were characterized under different stresses: the patterns were more similar within the same class than among different classes. Moreover, comparative analysis revealed a series of paralogous RsHSF genes in radish and orthologous RsHSF genes in Arabidopsis and rice.
These results may enhance our understanding of RsHSF functions and their involvement in the radish stress responses. Identification of RsHSF genes provides a rich resource for comprehensive understanding of the roles of HSF genes in abiotic stresses response regulatory networks. These findings will facilitate further functional characterization of RsHSF genes, and provide valuable information to clarify the molecular mechanisms underlying abiotic stresses response in radish. Gene sequences with ≥ 98% identity and length difference ≤ 5 base pairs were considered to be the same genes between two genomes, and the corresponding location of RsHSF genes in chromosomes was localized using MapChart Software [74]. The Multiple Collinearity Scan toolkit (MCScanX) was used to identify the RsHSF duplication events [41,73]. BLASTP was performed to identify the intra-species paralogous pairs using protein sequences with the following parameters settings: 1) alignment significance: E_VALUE (default: 1e-05); 2) MATCH_SCORE: final score (default: 50); 3) MATCH_SIZE: number of genes required to call a collinear block (default: 5) and the maximum gaps (default: 25).

Sequence collection and HSF identification
The OrthoMCL pipeline [43] was used with standard settings to identify potential orthologous and paralogousHSFgenes in radish, Arabidopsis, and rice. The relationships of orthologous and paralogous genes among the three species were plotted using Circos software [75].

Expression analyses of RsHSF genes by RNA-Seq
To analyze the expression patterns of the RsHSF genes, the Illumina RNA-Seq data from five tissues (cortex, cambium, xylem, root tip and leaf) and six stages [7,14,20,40,60 and 90 days after sowing (DAS)] were collected from radish reference genome [31] (Additional file 10: Table S12). The expression level for each RsHSF gene was presented with the RPKM method. Finally, heat maps were generated using TBtools [72].
Abiotic stress treatments 'NAU-YH' seeds, a radish advanced in a breeding line with small red global roots, were sterilised, rinsed and incubated for three days. Germinated seeds were transferred into a plug tray and cultured in a growth chamber at 25°C day/18°C night with 14 h light/10 h dark, 60% humidity and 12,000 lx light. Three weeks later, seedlings with 6-7 true leaves at the cortex split stage were exposed to 38 °C for 0, 6 or 24 h (termed control, Heat6 or Heat24, respectively). For salt and heavy metal (Cd and Pb) treatment, seedlings of identical sizes were grown in a plastic container with modified half-strength Hoagland nutrient solution as described previously [76]. The nutrient solution was changed every three days. One week later, the seedlings were treated with CdCl 2 2.5H 2 O(50mg/L), Pb(NO 3 ) 2 (100mg/L or 200mg/L) and NaCl (150mM or 250mM), respectively [29]. Seedlings grown in Cd/Pb-free nutrient solution were used as the control. In this study, three similar seedlings with 6 true leaves at the cortex split stage were used for each treatment, and each treatment was performed in triplicate. Additionally, 0.1g root tissue was used for RNA isolation from each sample. These samples were immediately frozen in liquid nitrogen and stored at -80°C for further analysis.

RNA-Seq data and RT-qPCR analysis
In this study, the raw RNA-Seq data was collected from NCBI Sequence Read Archive (SRA, http://www.ncbi.nlm.nih.gov/Traces/sra) using the SRA Toolkit (ver.2.8.2). After removing the adaptor sequences, contaminants and low-quality reads, the clean reads were mapped to the radish reference genome using

Availability of data and materials
The datasets supporting the conclusions of this article are included within the article and its additional files. among the radish, Arabidopsis and rice.
Additional file 9: Table S10. Tissue-specific expression of radish HSF genes.