Genome-wide identification, phylogenetic and expression analysis of the heat shock transcription factor family in bread wheat (Triticum aestivum L.)

Background Environmental toxicity from non-essential heavy metals such as cadmium (Cd), which is released from human activities and other environmental causes, is rapidly increasing. Wheat can accumulate high levels of Cd in edible tissues, which poses a major hazard to human health. It has been reported that heat shock transcription factor A 4a (HsfA4a) of wheat and rice conferred Cd tolerance by upregulating metallothionein gene expression. However, genome-wide identification, classification, and comparative analysis of the Hsf family in wheat is lacking. Further, because of the promising role of Hsf genes in Cd tolerance, there is need for an understanding of the expression of this family and their functions on wheat under Cd stress. Therefore, here we identify the wheat TaHsf family and to begin to understand the molecular mechanisms mediated by the Hsf family under Cd stress. Results We first identified 78 putative Hsf homologs using the latest available wheat genome information, of which 38 belonged to class A, 16 to class B and 24 to class C subfamily. Then, we determined chromosome localizations, gene structures, conserved protein motifs, and phylogenetic relationships of these TaHsfs. Using RNA sequencing data over the course of development, we surveyed expression profiles of these TaHsfs during development and under different abiotic stresses to characterise the regulatory network of this family. Finally, we selected 13 TaHsf genes for expression level verification under Cd stress using qRT-PCR. Conclusions To our knowledge, this is the first report of the genome organization, evolutionary features and expression profiles of the wheat Hsf gene family. This work therefore lays the foundation for targeted functional analysis of wheat Hsf genes, and contributes to a better understanding of the roles and regulatory mechanism of wheat Hsfs under Cd stress. Electronic supplementary material The online version of this article (10.1186/s12864-019-5876-x) contains supplementary material, which is available to authorized users.


Background
Heat shock proteins (HSPs) perform important roles not only in cellular protection against stress-related damage, but also in the regular folding, intracellular distribution, and degradation of proteins.These functions facilitate organismal survival under stressful conditions [1,2].Heat shock transcription factors (Hsfs) modulate the expression of HSPs, and participate in various aspects of protein homeostasis, such as refolding, assembly and transporting damaged proteins, which sustain protein stability [3][4][5][6][7].Hsfs share a core structure consisting of an N-terminal DNA binding domain (DBD) and an adjacent bipartite oligomerization domain (HR-A/B) [6,8].Some Hsfs also share a leucine-rich nuclear export signal (NES) for nuclear export, a nuclear localization signal (NLS) essential for nuclear import,, and short peptide motifs (AHA motifs) for activator functions [9][10][11][12].Based on the characteristics of their HR-A/B domain and phylogenetic comparisons, plant Hsf genes may be classified into three broad groups: A, B and C [6,8].The HR-A/B regions of class B Hsfs are relatively compact, not including any insertions, while all class A and class C HSFs have an outspread HR-A/B region due to an insertion of 21 (class A) and seven (class C) amino acid residues [6].This classification is also supported by differences in the flexible linkers between the DBD domain and HR-A/B domain, which consists of 9 to 39, 50 to 78, and 14 to 49 amino acid residues in class A, B and C Hsfs, respectively [6,9].Recent studies indicate that Hsfs are engaged in plant development and growth, as well as in response to abiotic stresses such as salt, cold, drought and cadmium challenge [7,9,[13][14][15][16][17][18][19].For example, HsfA9 is related to seed maturation and embryogenesis in sunflowers and Arabidopsis [14][15][16].HsfA4a is involved in cadmium tolerance in wheat [19].Due to the essential modulatory functions of Hsf genes in plants [16][17][18], the Hsf gene family has been studied in the models Arabidopsis thaliana and rice (Oryza sativa), and nonmodels such as poplar (Popupus trichocarpa), maize (Zea mays), and apple (Malus domestica) [5,6,9,[20][21][22].However, the Hsf gene family in the bread wheat (Triticum aestivum) has not been systematically examined.
Bread wheat is one of the most widely grown and consumed crops worldwide [23].Bread wheat is hexaploid (2n = 6x = 42; AABBDD genome), originating from two amphiploidization events: the first hybridization producing the tetraploid wheat species (2n = 4x = 28, genome AABB) was between the Triticum urartu (2n = 2x = 14, genome AA) and presumably Aegilops speltoides, belonging to the section Sitopsis (2n = 2x = 14, genome SS); the second hybridization was between the tetraploid wheat and Aegilops tauschii (2n = 2x = 14, genome DD) [24,25].Therefore, bread wheat has a huge and highly complex genome with three subgenomes (A, B and D) and ~17Gb total size [26], leading to great challenges for genomic studies.Recently, however, a quality draft genome of hexaploid 'Chinese Spring' wheat has provided the foundation upon which we can investigate wheat gene families and to clearly recognize homologous gene copies in these three sub-genomes [27].Further, it has allowed the study of interactions of loci during polyploidization and the retention and dispersion of homologous gene [28,29].
Here we first perform an in silico genome-wide study to comprehensively identify members of the wheat Hsf gene family.Next, to characterize evolutionary and functional features, we determine chromosome locations, gene structures, conserved protein domains, phylogenetic relationships and expression profiles for this family.
Our study provides a foundation for downstream targeted functional investigation of wheat Hsf genes, and will be allow for better understanding of the molecular mechanisms by which Hsfs regulate in growth, development and stress resilience in wheat.

Genome-wide identification and classification of Hsf family in wheat
Through the availability of the genome sequence, it is possible for the first time to identify all the Hsf family members in wheat.In this study, we identified a total of 78 genes as Hsf members in the wheat genome, designating the predicted wheat Hsf genes TaHsf1 to TaHsf78.Members of the Hsf gene family have been broadly subdivided into Classes A, B, and C according to differences in the length of the flexible linkers between the A and B parts of the HR-A/B regions.In the TaHsf gene family, 38, 16 and 24 genes were accordingly assigned to Classes A, B and C, respectively.Within the A clade, 8 distinct subclades (A1, to A8) were resolved.The B-type Hsf genes were grouped into a separate clade subdivided into three groups (B1, B2 and B4).And the C-type genes were subdivided into two groups (C1 and C2).We further performed a BLASTN search against the wheat expressed sequence tag (EST) using the 78 identified Hsfs as queries to verify the existence and completeness of this set of wheat Hsfs.Results showed that most of the TaHsfs were supported by EST hits except 2 Hsfs (TaHsf57 and TaHsf75).We speculated these 2 unsupportted TaHsfs might not be expressed under any the assayed conditions or may be expressed at very low level that cannot be easily detected.Among the supported TaHsf genes, TaHsf8 has the largest number of EST hits, with 49, followed by TaHsf21 and TaHsf27 with 48 and 30 hists, respectively.Chromosome localization analysis found that 4 TaHsfs did not have corresponding chromosomal locations, and that the remaining 74 TaHsf genes were distributed on all of the 21 wheat chromosomes.Chromosome 3B contained the most Hsf genes with 8, followed by 4B, 5A and 5D, with each harboring 6, then 3A with 5, and finally 6A, 6B and 6D with one each.The predicted lengths of the putative TaHsf proteins ranged from 209 to 701 amino acids, with the molecular weights (Mw) ranging from 22.72 to 73.92 kDa and theoretical isoelectric points (PI) ranging from 4.67 to 9.50 (Table 1).

Conserved domains analysis of TaHsf
We identified five conserved domains by sequence alignment approaches (Table 2).All the TaHsf predicted proteins contained a highly conserved DBD domain, forming with a three helical bundles (H1, H2 and H3) and four-stranded antiparallel β-sheet in their Nterminal regions.However, within the Hsf family, the length of the DBD domain was quite different.We then used the MARCOIL tool to detect the presence of a property of the HR-A/B, the coiled-coil structure characteristic of leucine zipper-type protein interaction domains.We found that most of the TaHsfs proteins consisted of NES and NLS domains, which are vital for shuttling Hsfs between the nucleus and cytoplasm.As was expected in the A-type TaHsfs, additional sequence comparisons identified AHA domain in the middle of the C-terminal activation domains.By contrast, these domains were not detected in the B-and C-type TaHsfs.
To further predict and verify domains in the TaHsfs proteins, we used the Multiple EM for Motif Elicitation (MEME) motif search tool.Using this, we found thirty corresponding consensus motifs (Additional file 1: Figure S1, Additional file 2).Compared with class B and C, the members of class A contained the greatest number of conserved motifs (22), with the majority (12) detected in TaHsf1 and TaHsf3.

Phylogenetic analysis in wheat Hsf proteins
To further evaluate the phylogenetic relationships amidst Hsf families, the Hsf conserved amino acid sequences (from the beginning of the DNA-binding domain to the

Genome distribution and gene duplication of TaHsf gene family
We next determined chromosomal locations of TaHsf genes by leveraging the available wheat genome annotation information (Fig. 2).A total of 25, 26 and 23 TaHsf genes are found in the A, B and D sub-genomes, respectively (B > A > D).The distribution of Hsf genes was not even across the chromosomes.There were 7, 9, 17, 13, 16, 3 and 9 genes in the group 1 to group 7 chromosomes, which reveal obvious differences between group 3, 4, 5 and other four groups.Chromosome 3B had the highest number of Hsf genes with 8, while chromosome 6A, 6B and 6D all had only one Hsf gene eachs.These results suggest that Hsf gene duplication events may have happened in wheat 3, 4 and 5 group chromosomes during wheat formation and the evolution of gene families in the different sub-genome is independent, which may relate to gene function.
Gene duplication is frequently revealed in plant genomes, resulting from polyploidization or through tandem and segmental duplication related to replication [30].Here, we found 17 homologous gene groups with a copy on each of A, B and D homologous chromosome, and 7 gene pairs with a copy on only 2 of the 3 homologous chromosomes, while the other 13 genes were not found as homologs (Fig. 2

, Additional file 3). Our results indicate
Fig. 1 Phylogenetic tree of Hsf proteins from wheat, Arabidopsis, rice, brachypodium and maize.The N-proximal regions (from the start of the DNA-binding domain to the end of the HR-A/B region) of Hsf proteins were used to construct an unrooted neighbor-joining tree with MEGA6.0 (with pairwise deletion and Poisson correct).For Hsf proteins of Arabidopsis (prefixed by AT), rice (prefixed by Os), Brachyposium (prefixed by Bradi) and maize (prefixed by ZM), both locus ID and subclass numbers are given.TaHsf proteins are marked in red that gene loss may happen throughout the wheat Hsf gene family, leading to the loss of some homologous copies.Moreover, these homologous genes are clustered in group 3, 4 and 5 chromosomes, which was in line with the above analysis of chromosome localization, suggesting that group 3, 4 and 5 chromosomes subjected less sequence loss and interaction impact compared to other homologous chromosome groups.In addition, 17 pairs of duplicated genes from different sub-genomes were also found, containing 3 duplication events in the same chromosome and 14 segmental duplication events between different chromosomes, indicating that the duplication events could play important roles in the extension of the Hsf genes in wheat genome (Fig. 3, Additional file 3).

Phylogenetic analysis of Hsfs between the T. urartu, A. tauschii, and wheat orthologs
We also identify the Hsfs gene in the diploid ancestors of wheat, T. urartu and A. tauschii, to investigate the change of Hsf number in transition from diploidy to hexaploidy within a given subgenome.Results showed that 16 and 15 putative Hsfs were identified in T. urartu and A. tauschii through our methods, respectively (Additional file 4).Total 16 T. urartu-Hsfs, 25 T. aestivum-A-Hsfs, 15 A. tauschii-Hsfs, and 23 T. aestivum-D-Hsfs gene sequences were applied to build gene trees.16 pairs of T. urartuwheat A genome orthologs were mapped to T. urartu chromosomes with 2 on 1A, 2 on 2A, 4 on 3A, 3 on 4A, 2 on 5A, 1 on 6A and 2 on 7A (Fig. 4).15 pairs of A. tauschii-wheat D genome orthologs were mapped to A.tauschii chromosomes with 2 on 1D, 3 on 2D, 3 on 3D, 2 on 4D, 3 on 5D, 1 on 6D and 1 on 7D (Fig. 4).The majority of the orthologs (75 and 66.67% for T. urartu and A. tauschii, respectively) belonged to class A, as expected due to the high proportional composition of this type (48.72%) among the identified wheat Hsf genes.Moreover, the chromosome locations of the majority of wheat Hsf genes and their orthologs in T. urartu and A. tauschii corresponded to one another (Additional file 5).

Modulatory network between TaHsf genes with other wheat genes
In order to comprehend the interactions between TaHsfs and other wheat genes, the modulatory network of them Fig. 2 Chromosomal localizations and the homologous TaHsf genes in wheat A, B and D sub-genomes (Fig. 5) was predicted via the orthology-based method [31].Results showed that 15 TaHsfs were shown to have homology with Arabidopsis genes and the 420 gene pairs of network interactions were found with the average of 28 gene per TaHsf, suggesting the TaHsfs were broadly engaged in the regulatory network and biological process in wheat.Among these, 292 genes interacted with TaHsfA and 128 genes interacted with TaHsfB.TaHsf16 (A3) was found to interact with 77 wheat genes, including Hsp81.4,ZF2, HBT and HSP90.1, suggesting it was mainly participated in response to stress, metal ion binding, cell differentiation and protein folding.TaHsf18 (A4a) was found to interact with 24 wheat genes, including ZAT6, STZ and S6K2, suggesting it was mainly engaged in metal ion binding, intracellular signal transduction and negative regulation of cell proliferation.TaHsf50 (B4b) was predicted to interact with 88 wheat genes, including MYB15, MYB70, ZFP2, FMA, and HB31, suggesting it is engaged primarily in the regulation of transcription, asmonic acid, metal ion binding and DNA binding.TaHsf44 (B2c) was found to interact with 30 wheat genes including AGC2-1, WRKY39, BAG6 and NF-YC2, suggesting it is mainly engaged in defense response, calmodulin binding, response to heat and flower development (Additional files 6, 7).Moreover, GO and KEGG pathway descriptions of those interacting genes were analyzed to understand the potential function and pathway of the 15 TaHsfs (Fig. 6).The 15 TaHsf interacting genes were significantly enriched for transcription, DNA-templating, response to heat, transcription factor activity, sequence-specific DNA binding and calmodulin binding (Fig. 6a).Significantly enriched pathways included plant hormone signal transduction, PI3K-Akt signaling pathway, and protein processing in endoplasmic reticulum (Fig. 6b).

Verification of the expression of TaHsf in wheat under cd stress by qRT-PCR
According to the expression analysis based on diverse RNA sequencing data above, we obtained an overview of expressed TaHsfs under various agriculturally-relevant stressors.To further verify these results we selected a subset of these TaHsfs to detect their expression levels in root under Cd stress through qRT-PCR.Results showed that compared with H17CK group, levels of TaHsf3 (A1a), TaHsf4 (A2a), TaHsf5 (A2a), TaHsf16 (A3), TaHsf18 (A4a), TaHsf20 (A4a), TaHsf31 (A6b) and TaHsf32 (A6b) were significantly increased, while levels of TaHsf7 (A2b), TaHsf8 (A2b), TaHsf9 (A2b), TaHsf26 (A5) and TaHsf50 (B4b) were significantly decreased (P < 0.05, Fig. 9).The qRT-PCR results were highly consistent with that of RNA sequencing data, confirming that it is reasonable to use RNA sequencing data to evaluate the expression level of transcripts in wheat Cd-response.

Discussion
A growing body of evidence shows that Hsfs play essential roles in plant developmental and defense processes [16,[32][33][34][35]. Due to growing numbers of quality genomes available, putative functions of Hsf family genes have been predicted in many species, from the model plants Arabidopsis [13], rice [5] and maize [36], now to other crops, such as apple [21], Chinese cabbage [37], Chinese white pear [38] and pepper [39].However, despite the global impact of wheat, as well as the importance of environmental Cd contamination, there has been limited investigation into the molecular basis of Cd accumulation, and the Hsf family in wheat.
Here we took advantage of the high quality wheat reference genome, to first identify 78 Hsf wheat genes and to characterize these bioinformatically (Table 1).A first contrast lies on the sheer quantity of these genes in wheat: while we identify 78 in wheat, there are only 21 Hsfs in Arabidopsis, 25 in rice, 30 in maize, 29 in Chinese white pear and 25 in apple [5,13,36,38] We next investigated occurrences of possible gene duplication, which contributes differentially to the extension of specific gene families in plant genomes, and results from polyploidization or tandem and segmental duplication related [30,40,41].In wheat, we found that homologous genes are gathered in group 3, 4 and 5 chromosomes, which was in line with the above analysis of chromosome localization.These results indicated that compared to other homologous chromosome groups, Fig. 7 Heat map of the expression profiles of 46 TaHsf genes in five different tissues (grain, leaf, root, spike and stem).Log2 transformed FPKM values were used to create the heat map.The red or green colors stand for the higher or lower relative abundance of each transcript in each sample.Z represent Zadoks scale, a decimal code for the growth stages of cereals.P-value< 0.05 were regarded as statistically significant group 3, 4 and 5 chromosomes suffered less sequence loss and interaction impact.Three duplication events with the same chromosome and 14 segmental duplication events between various chromosomes were identified, suggesting that in wheat genome, the duplication events could play important roles in the extension of the Hsf cascade genes.A previous study indicated that more than 90% of the enhancement in regulatory genes in the Arabidopsis lineage were facilitated via genome duplications [42].Compared with tandem duplications, segmental Hsf gene duplications were more often.This situation appeared in Arabidopsis, maize, poplar [21,22,36], and also in wheat.
Our phylogenetic analysis indicated that compared with Arabidopsis, maize and rice, brachypodium Hsfs were nearer to wheat Hsf proteins, which was in line with broader classifications.Identification of Hsf genes in wheat and its diploid ancestors, T. urartu and A. tauschii, which suggesting that the number of Hsf in a known subgenome was increased in transition from diploidy to hexaploidy (for A subgenome, 16 to 25 genes, and for D subgenome, 15 to 23 genes).These results further indicate that gene gain happened broadly during the formation of hexaploid [27].
Moreover, protein-protein regulatory interactions were constructed to provide inference of mechanisms of life activities and to explore potential biological functions for unknown proteins.Results showed that TaHsf18 (A4a) interacts with 24 wheat genes, including ZAT6, STZ and S6K2, suggesting it was mainly engaged in metal ion binding, intracellular signal transduction, and the negative regulation of cell proliferation.A previous study indicated that ZAT6 coordinately activates the expression of phytochelatin synthesis-related gene and positively modulate Cd accumulation and tolerance by directly targeting GSH1 in Arabidopsis [43].HsfA4a was also engaged in cadmium tolerance in wheat [19], suggesting it might be involved in metal ion binding via interacting with ZAT6 to further play a role in cadmium tolerance in wheat.TaHsf50 (B4b) interacts with 88 wheat genes, including MYB15, MYB70, ZFP2, FMA, and HB31, suggesting it is involved in regulation of transcription, regulation of jasmonic acid, metal ion binding and DNA binding.It has been reported that MYB15 is required for the defense-induced synthesis of G-rich lignin and the constitutive synthesis of the coumarin metabolite scopoletin, both of which contribute to disease resistance against a hemibiotrophic bacterial pathogen [44].TaHsf44 (B2c) was found to interact with 30 wheat genes including AGC2-1, WRKY39, BAG6 and NF-YC2, suggesting it is engaged in defense response, calmodulin binding, response to heat and flower development.AtBAG6 can induce programmed cell death in yeast and plants [45].Aspartyl protease-mediated cleavage of BAG6 plays an important role in autophagy and fungal resistance in plants [46].GO analysis showed that 15 TaHsfs interacted genes were significantly enriched for transcription, DNA-templating, response to heat, transcription factor activity, sequence-specific DNA binding and calmodulin binding.It has been reported that Hsf family has a unique role as master modulators of thermotolerance, and were essential for plants survival under serious heat stress [9,47].
Furthermore, we characterize wheat Hsf genes that expression throughout tissues and development stages.Many of these genes were highly expressed across development.For example, TaHsf2, 3, 20, 17 and 45 were high expressed in roots, stems, leaves, spikes and grains including whole endosperm, starchy endosperm, transfer cells and aleurone layer, as well as seed coats during different developmental stages.It has been reported that Hsfs were involved in plant growth and development [9,16].Our results further indicated that Hsf genes play important regulatory roles in wheat growth, development and reproductive processes.
In addition, we comprehensively analyzed the expression levels of Hsf genes in response to drought, heat and Cd stresses to predict potential roles.The expression of most Hsf genes were differentially regulated in response to a given stress, which strongly suggests that they may be vital stress response genes.A previous study indicated that Hsfs are involved in responses to the abiotic stress as heat, cold, salt, drought and cadmium [13,17,19].Our results first comprehensively illustrate that Hsf genes likely play important regulatory roles in wheat Cd stress response.Therefore, these genes stand as strong functional candidates for followup research into Cd stress in wheat.

Conclusion
We present the first comprehensive identification and characterization of the wheat Hsf gene family.Through the latest available wheat genome information, total 78 putative wheat Hsf gens were identified through a genome-wide search, and categorized into class A, B and C subfamilies based on conserved motifs.Chromosome localizations, gene structures, conserved protein motifs, and phylogenetic relationship of these TaHsfs were comprehensively analyzed and strongly supported these classifications.Moreover, the gene duplication and homologous genes between wheat A, B and D sub-genome were also surveyed.Expression profiles of these TaHsfs through development and under various abiotic stresses were surveyed and provide strong functional candidates for followup work.Finally, through qRT-PCR analysis, 13 TaHsf genes were selected to verify their expression level in wheat under Cd stress, which provide top candidates for further functional analysis of Hsf genes in response to wheat Cd stress.

Identification and classification of Hsf gene family in wheat
The Hsf gene family was identified following the method as described by Wang et al. with some modifications [48].First, to construct a local protein database, all the wheat (T.aestivum L.) protein sequences available were downloaded from the Ensemble database (http://plants.ensembl.org/index.html).Then, the database were searched with 100 known Hsf gene sequences collected from A. thaliana (21), O. sativa (25), B. distachyon (24) and Z. mays (30) using the local BLASTP program with an e-value of le-5 and identity of 50% as the threshold.Moreover, a self-blast of these sequences was performed to remove redundancy, the physical localizations of all candidate Hsf genes were checked and redundant sequences with the same chromosome location were rejected.Furthermore, all obtained Hsf protein sequences were analyzed to detect DBD domains and coiled-coil structures by the SMART and MARCOIL programs (SMART: http://smart.embl-heidelberg.de/,MARCOIL: http://toolkit.tuebingen.mpg.de/marcoil).Those protein sequences lacking the DBD domain or a coiled-coil structure were removed.Finally, to verify the existence of all the obtained sequences, BLASTN similarity searches against the wheat ESTs deposited in the NCBI database were performed.The theorectical pI (isoelectric point) and Mw (molecular weight) of the putative Hsf from T. aestivum L were calculated using compute pI/Mw tool online (http://web.expasy.org/compute_pi/),respectively.Classification of the three different groups A, B and C was based on structural characteristics and phylogenetic comparisons [49,50].

Gene structure construction, protein domain and motif analysis
Gene structure information were obtained from the Ensemble plants database (http://plants.ensembl.org/index.html).Conserved domains annotation was performed using Pfam (http://pfam.xfam.org/search),SMART (http://smart.emblheidelberg.de/)and Heatster online tools [39].All full-length amino acid sequences of the TaHsfs were used to identify conserved domain motifs by the Multiple Em for Motif Elicitation (MEME) tool [51].The parameters were set as follows: maximum numbers of different motifs, 30; minimum motif width, 4; maximum motif width, 50.

Chromosomal locations and gene duplication
Genes were mapped onto chromosomes by identifying their chromosomal position provided in the wheat genome database.Gene duplication events of Hsf genes in wheat were investigated based on the following three criteria: (a) the alignment covered > 80% of the longer gene; (b) the aligned region had an identity > 80% [52].In order to visualize the duplicated regions in the T. aestivum genome, lines were drawn between matching genes using Circos-0.67program (http://circos.ca/).

Phylogenetic analysis
The N-terminal Hsf protein sequences containing the DBD and HR-A/B regions and parts of the linker between these two regions from A. thaliana, O. sativa, B. distachyon, Z. mays and T. aestivum L. were performed for multiple alignments by CLUSTALW and the results of alignment were used to construct phylogenetic tree using the NJ method in MEGA (version 6.0) [53].Bootstrap test method was adopted and the replicate was set to 1000.

Analysis of the TaHsf family orthologs in T. urartu and A. tauschii
The wheat-T.aestivum, wheat-T.urartu and wheat-A .tauschiiHsf genes were used to construct phylogenetic trees using neighbor-joining method with 1000 bootstrap replicates.According to these orthologous Hsf genes, a collinear map of the T. urartu-wheat A genome and A. tauschii-wheat B genome was created using genome visualization tool CIRCOS according to these orthologous Hsf genes.The locations of Hsf orthologous genes on the chromosomes of T. urartu and A. tauschii were obtained from the database published by Ling et al. [23] and Jia et al. [54], respectively.

Network interaction analysis
The interaction network involving the TaHsf genes was based on the orthologous genes between Wheat and Arabidopsis using the AraNet V2 tool (http://www.inetbio.org/aranet/)[48].Enrichment analysis was implemented by BiNGO, a cytoscape plugin, for gene ontology analysis and identifying processes and pathways of specific gene sets.Over-represented GO full categories were identified with a significance threshold of 0.01.

The TaHsf gene expression analysis by RNA-seq data
To study the expression of TaHsf genes in different organs and response to stress, the wheat expression database (http://wheat.pw.usda.gov/WheatExp/)was used The FPKM (fragments per kilobase of transcript per million fragments mapped) value was calculated for each Hsf gene, the log2 transformed values of the TaHsf genes were used for heat map generation.P-values < 0.05 were taken as statistically significant thresholds [55].

Plant materials, growth conditions, and treatments
The plant of wheat cultivar Chuanyu17, a high-Cdaccumulating cultivar, was planted in growth chambers at 23 ± 1 °C with a photoperiod of 16 h light/8 h dark.One-week-old seedlings were treated with 0 (H17CK) and 100 μM CdCl 2 for 24 h (H17Cd).Roots from the plants with similar size were harvested separately and washed three times with deionized water.All the plant samples from three biological replicates were frozen in liquid nitrogen immediately and stored at − 80 °C for RNA extraction.

RNA extraction and real-time quantitative RT-PCR (qRT-PCR) analysis
Total RNA was extracted from roots of Chuanyu17 in H17CK and H17Cd groups using TRIzol Reagent (Invitrogen, USA) according to the manufacturer's instructions.RNA was quantified by using NanoDrop-1000 and RNA integrity was checked by electrophoresis.First strand cDNA was synthesized using HiScript IIQ RT SuperMix (Vazyme, R223-1).The primers used in the qRT-PCR analyses are listed in Additional file 9. βactin was used as an internal control.The qRT-PCR was carried out using QuantiFast® SYBR® Green PCR kit (Qiagen, 204,054) according to the manufacturer's instructions.Each treatment was repeated three times.The expression levels were calculated from the 2 -ΔΔCt value [ΔΔCt = (CT target/Cd -CT actin/Cd -(CT target/control -CT actin/control )] [45].

DBD
DND-binding domain, HR-A/B OD (oligomerisation domain), heptad pattern of hydrophobic amino acid residues; NLS: Nuclear localization signal, NES Nuclear export signal.AHA Activator motifs, aromatic (W, F, Y), larger hydrophobic (L, I, V) and acidic (E, D) amino acid residues; Numbers in brackets reveals the position of the first amino acid present in the putative NLS, NES, and AHA in the C-terminal; nd: no domains detectable by sequence similarity end of the HR-A/B region) of 39 proteins from wheat (Triticum aestivum L.), 21 proteins from Arabidopsis (A. thaliana), 25 from rice (O.sativa), 24 from brachypodium (B.distachyon) and 30 from maize (Z.mays) were used to construct a phylogenetic tree (Fig.1).According to this tree, class HsfA showed the maximum number of subclasses among the three major groups, and contained eight smaller clusters of which five (A6, A2, A8, A1 and A7) were closer to class HsfC than class HsfA3, A4 and A5.Two HsfA6 members from Arabidopsis (At5g43840 and At3g22830) were not clustered with the HsfA6 subclass from other plant species, but were closer to the HsfA7 subclass.Brachypodium Hsfs were closer to wheat Hsf proteins compared with Arabidopsis, maize and rice, which was in line with the botanical classification.

Fig. 3
Fig. 3 Duplicated Hsf gene pairs identified in wheat.Seven homologous groups of wheat chromosomes are depicted in different colors.Duplicated gene pairs are depicted in corresponding colors and linked using lines with the corresponding color

Fig. 4 Fig. 5
Fig. 4 Collinear analysis for the Hsf gene family among wheat, T.urartu and A.tauschii.The green annulus on the top left represent chromosomes of A. tauschii and the blue annulus on the top right represent chromosomes of T. urartu.Different colors represent seven homologous groups of wheat chromosomes.Homeologous genes of each group are linked by lines with corresponding color . The vast majority of Hsfs can be categorized into three classes: A, B and C. The quantity of class A in Arabidopsis, rice, maize, Chinese white pear and apple are 15, 13, 16, 19 and 16, respectively.Class B Hsfs amount to 5, 8, 9, 8 and 7, in the five plants respectively.Finally, class C is represented by 1, 9, 4, 2 and 2, respectively.In contrast, of 78 putative wheat Hsf genes, 38 belonged to class A, 16 to class B and 24 to class C. Thus class C is relatively expanded in wheat in contrast to these other genomes.

Fig. 8
Fig. 8 Heat map of the expression profiles of TaHsf genes under Cd treatment.FPKM values were used to create the heat map.The red or green colors indicate the higher or lower relative abundance or each transcript in each sample

Fig. 9
Fig. 9 Verification of the expression level of TaHsfs by qRT-PCR analysis.Relative expression levels of 13 TaHsfs under Cd treatment.* represents P < 0.05 vs H17CK

Table 1
The list of the putative wheat Hsf genes

Table 1
The list of the putative wheat Hsf genes (Continued)

Table 2
Functional domains of TaHsfs motif 7 was detected in class B. The conserved motifs 10, 20, 22, 23, 25, 28, 30 were identified as NLS domains.Motifs 10 and 25 represented NLS domains in class A, whereas NLS domains were represented by motifs 20, 23, 28 and 30 in class B, motifs 22 and 23 in class C. Motif 15 represented NES domains, and motif11 was identified as characteristic AHA domains.Thus through the combination of the two methods, predicted DBD domains and HR-A/B domains were observed in each TaHsfs and varied greatly in size and sequence.

Table 2
Functional domains of TaHsfs (Continued)