Genome-wide analysis of the Hsf family in soybean and functional identification of GmHsf-34 involvement in drought and heat stresses

Background High temperature affects organism growth and metabolic activity. Heat shock transcription factors (Hsfs) are key regulators in heat shock response in eukaryotes and prokaryotes. Under high temperature conditions, Hsfs activate heat shock proteins (Hsps) by combining with heat stress elements (HSEs) in their promoters, leading to defense of heat stress. Since the first plant Hsf gene was identified in tomato, several plant Hsf family genes have been thoroughly characterized. Although soybean (Glycine max), an important oilseed crops, genome sequences have been available, the Hsf family genes in soybean have not been characterized accurately. Result We analyzed the Hsf genetic structures and protein function domains using the GSDS, Pfam, SMART, PredictNLS, and NetNES online tools. The genome scanning of dicots (soybean and Arabidopsis) and monocots (rice and maize) revealed that the whole-genome replication occurred twice in soybean evolution. The plant Hsfs were classified into 3 classes and 16 subclasses according to protein structure domains. The A8 and B3 subclasses existed only in dicots and the A9 and C2 occurred only in monocots. Thirty eight soybean Hsfs were systematically identified and grouped into 3 classes and 12 subclasses, and located on 15 soybean chromosomes. The promoter regions of the soybean Hsfs contained cis-elements that likely participate in drought, low temperature, and ABA stress responses. There were large differences among Hsfs based on transcriptional levels under the stress conditions. The transcriptional levels of the A1 and A2 subclass genes were extraordinarily high. In addition, differences in the expression levels occurred for each gene in the different organs and at the different developmental stages. Several genes were chosen to determine their subcellular localizations and functions. The subcellular localization results revealed that GmHsf-04, GmHsf-33, and GmHsf-34 were located in the nucleus. Overexpression of the GmHsf-34 gene improved the tolerances to drought and heat stresses in Arabidopsis plants. Conclusions This present investigation of the quantity, structural features, expression characteristics, subcellular localizations, and functional roles provides a scientific basis for further research on soybean Hsf functions. Electronic supplementary material The online version of this article (doi:10.1186/1471-2164-15-1009) contains supplementary material, which is available to authorized users.


Background
Heat stress, defined as a rise in the temperature of 10-15°C above the ambient [1], beyond a given threshold level for a period of time, is an agricultural problem in many areas all over the world, affecting plant growth and development and often leading to reductions in yield. All organisms, including eukaryotes and prokaryotes, share a common heat shock response mechanism, which involves a number of reactions, including new protein synthesis, folding, intracellular targeting, specific biological functions, and protein degradations. Among these proteins, Hsps acting as molecular chaperones are essential for the maintenance and/or restoration of protein homeostasis [2][3][4][5][6][7][8]. The Hsp expression is regulated by the multiple mechanisms. The central regulators are Hsfs. Under high temperature conditions, Hsfs activate Hsps by combining with HSEs in their promoters, leading to the defense of the heat stress and even recovering from its effects.
A typical Hsf protein contains a modular structure with an N-terminal DNA-binding domain (DBD), an adjacent oligomerization domain (OD) composed of heptad repeats of hydrophobic amino acid residues (HR-A/B), a nuclear localization signal (NLS) region essential for nuclear uptake of the protein, a nuclear export signal (NES) region, and an activator motif (AHA) [9]. Arabidopsis Hsfs were classified into A, B, and C classes according to the differences in their HR-A/B regions. Due to the insertion of 21 (class A) or 7 (class C) amino acid residues between the A and B parts of the HR-A/B regions, the class A and class C Hsfs have longer HR-A/B regions than class B, which is distinguished from class A and C by the presence of a heptad repeat pattern instead of an insertion. Unlike class B and class C, the class A members contain a C-terminal AHA motif relevant to their own activator function, and a hydrophobic, frequently leucine-rich NES required for the receptor-mediated nuclear export in complex with the NES receptor [10].
Under normal circumstances, the inactive state of a monomeric Hsf is maintained by the interaction with the molecular chaperones, such as Hsp70 and Hsp90. In response to heat stress, Hsfs released from the chaperone complex are converted from a transcriptional inactive monomer to an active trimmer through combination of their ODs. As sequence-specific trimeric DNA binding proteins, the active Hsfs are capable of recognizing and combining HSEs in the Hsf-inducible gene promoters [11]. HSEs are formed of repetitive palindromic binding motifs of the 5'-AGAAnnTTCT-3' sequence upstream of the TATA box in the Hsf-inducible genes [12][13][14][15].
Since the first plant Hsf gene was identified in tomato [16], the Hsf family genes have been thoroughly characterized, and 21, 25, 25, and 27 Hsf genes were found in Arabidopsis, rice, maize, and tomato, respectively [9,[17][18][19]. In the present study, we scanned for and integrated all the nonredundant sets of the soybean Hsf genes, determined their chromosomal locations, predicted their protein structures by available software and network stations, analyzed the expression levels of the soybean Hsf genes by qRT-PCR and identified the function of GmHsf-34 in the tolerance to drought and heat stresses. This study provides a version on the structures and evolutionary history of the soybean Hsfs, and a candidate gene to the crop molecular breeding.

Identification, phylogenetic, and evolutionary analyses
The amino acid sequences of Hsf-type DBD domains (Pfam: PF00447) were submitted into JGI Glyma1.0 annotation for BLASTP searches. Fifty-eight putative soybean Hsf sequences were acquired. After surveyed using the Pfam database and SMART online tool, 4 soybean Hsf sequences were rejected due to the absence of typical Hsf DBD domains, and 16 were abandoned due to the absence of coiled-coil structures. Consequently, 38 nonredundant soybean Hsfs were identified ( Table 1). The polypeptide lengths of soybean Hsfs varied widely, ranging from 213 to 510. Isoelectric points of the proteins were diverse (Table 1).
To determine the phylogenetic relationships among soybean Hsfs, a phylogenetic analysis of 38 soybean Hsfs, 25 maize Hsfs, 25 rice Hsfs, and 21 Arabidopsis Hsfs was performed by generating a neighbor-joining phylogenetic tree ( Figure 1). According to differences in the amino acid sequences of DBD, the HR-A/B region, and the linker between them, the A, B, and C Hsf classes formed three clusters. Class A was divided into 10 sub-clusters, designated A1, A2, A3, A4, A5, A6, A7, A8, A9, and A10. Class B was divided into sub-clusters B1, B2, B3, and B4, and the class C contains sub-clusters C1 and C2. Soybean Hsfs were further divided into 12 sub-clusters according to their phylogenetic relationship, defined as A1, A2, A3, A4, A5, A6, A8, B1, B2, B3, B4, and C1 ( Figure 1). As a dicot, soybean was more similar to Arabidopsis than to the monocots rice and maize. AtHsf-09 and AtHsf-10 were the only two members of subclass A7. The A8 and B3 subclasses were present only in the dicots, and A9 and C2 existed only in the monocots. Interestingly, soybean subclass B4 had higher similarity to Arabidopsis B4 than to the rice or maize B4 subclasses, and soybean subclass A6 Hsfs showed higher similarity to A4 rather than to Arabidopsis subclass A6.

Physical locations of soybean Hsfs
According to the soybean genome database, 38 soybean Hsf genes were distributed among 15 chromosomes, with the exception of chromosome 2, 6, 7, 12, and 18 ( Figure 2). The number of soybean Hsf genes in each chromosome differed considerably. For example, chromosome 1 and 10 carried 5 soybean Hsf genes, whereas only one was present in chromosome 4, 14, and 19 respectively. Using soybean genome repeat informations, 15 paralogous genes were identified ( Figure 2).

Gene structures and cis-acting elements
Gene structure analysis revealed the existence of introns in the soybean Hsf genes. Four introns were found in GmHsf-20, 3 in GmHsf-23, and 2 in GmHsf-02, GmHsf-12,   GmHsf-17, and GmHsf-18 respectively ( Figure 3). Ciselement analysis demonstrated that every soybean Hsf member carried one or more MYB and MYC elements in their promoters. In addition, 52.6% of the members contained an ABA-responsive element (ABRE), 31.6% contained a dehydration-responsive element (DRE), and 42.1% contained a low-temperature responsive element (LTRE) ( Table 2). It was reported that the above-mentioned 5 elements play different significant roles in stress responses in plants. For example, MYB is involved in stress-induced drought, low temperature, salt, ABA, and GA responses [20]. ABRE responds to drought and ABA via combination with ABRE binding proteins (AREB) [21]. DRE combining with DRE binding proteins (DREB) participate in drought, salt, low temperature, and ABA responses [22]. LTRE contributes primarily to low temperature response and regulation [23]. Analyses of cis-elements in the promoters suggest that Hsfs are significantly related to stress response.

Conserved domains and motifs of soybean Hsfs
The modular structure of the Hsf family in plants has been described thoroughly in several model plants [9,18,24]. A typical soybean Hsf protein contains 5 conserved domains. There is a gradation of DBD, OD, NLS, NES, and AHA domains from N-terminal to C-terminal (Table 3). The DBD domain, the most conserved section, composed of approximately 100 amino acids, contains 3 α-helices and a four-stranded antiparallel β-sheet (α1-β1-β2-α2-α3-β3-β4) ( Figure 4). This helix-turn-helix motif (H2-T-H3) specifically combines to HSEs in the promoters of heatstress-induced genes. HR-A/B adjacent to the DBD domain in the C-terminal is characterized by a coiled-coil structure (coil-coil structure). According to the distinction between the HR-A and HR-B motifs, Hsfs were artificially divided into A, B, and C classes. Because of the insertion of 21 (class A) or 7 (class C) amino acid residues between the A and B parts of the HR-A/B motif, class A and class C Hsfs have longer HR-A/B regions than class B Hsfs, which are distinguished from classes A and C by the presence of a heptad repeat pattern instead of an insertion ( Figure 5). Depending on the balance of nuclear import and export, the intracellular distribution of Hsfs changes dynamically between nucleus and cytoplasm [10,25]. A hydrophobic, frequently leucine-rich NES at the C-terminal of many Hsfs is required for receptor-mediated nuclear export in a complex with the NES receptor. Together with the adjacent AHA motifs, NES serves as part of a typespecific signature region in the C-terminal of class A Hsfs in plants [26]. AHA motifs exist only in class A Hsfs (Table 3). It was noted that the α2-α3 sequence in DBD of GmHsf-12 was unique ( Figure 4). An NLS was not detected in GmHsf-10, GmHsf-14, GmHsf-05, and GmHsf-27; NES was located in HR-A/B in 9 Hsf proteins (GmHsf-03, GmHsf-07, GmHsf-16, GmHsf-18, GmHsf-19, GmHsf-21, GmHsf-26, GmHsf-32, and GmHsf-35); and two HR-A/B regions were found in GmHsf-23 (Table 3).

Expression patterns of soybean Hsf genes
To examine expression patterns in different soybean tissues and organs, an expression pattern map of soybean Hsf genes based on the gene-chip data downloaded from the soybean genome database was drawn ( Figure 6 and Additional file 1: Table S1). The data analysis revealed that soybean Hsf genes were expressed in 14 tissues and organs and at different developmental stages. Moreover, soybean Hsf genes were expressed at the highest level in roots and at the lowest level in seeds after 21 days of development ( Figure 6A).
Three soybean Hsf genes showed tissue-specific expression patterns. For example, GmHsf-02 was expressed in roots; GmHsf-28 in roots and seeds after 14 days of development; and GmHsf-37 in young leaves and root nodules. GmHsf-02, GmHsf-19, and GmHsf-28 expressed at a low level, whereas GmHsf-08, GmHsf-25, GmHsf-33, and GmHsf-34 at an extremely high level. Expression levels were disparate in different soybean Hsf subclasses. Compared with others, the expression levels for subclass A3 were lower. Even in the same subclass, expression levels were varied. For example, GmHsf-17 transcripts reached maximum levels in young leaves, whereas GmHsf-33 reached maximum levels in flowers and pod shells at 14 DAF, and also in nodules. In addition, data from the tissue expression chip revealed differences in expression between 15 pairs of paralogous genes. For example, although GmHsf-20 and GmHsf-28 were expressed at relatively low levels, GmHsf-20 was expressed in 8 tissues and organs, and GmHsf-28 was expressed only in seeds 14 DAF and in roots at a very low level; Although GmHsf-23 was expressed much like GmHsf-37 in quantity, GmHsf-37 was expressed only in young leaves and nodules, whereas GmHsf-23 was also expressed in flowers and seeds at 35 DAF.
GmHsf-04, GmHsf-33, and GmHsf-34 were localized in the nucleus Three genes (GmHsf-04, GmHsf-33, and GmHsf-34) upregulated strongly by both heat and drought were selected for subcellular localization. Expression vectors with green fluorescent protein (GFP) tags were constructed for subcellular localization analysis. The coding regions of GmHsf-04, GmHsf-33, and GmHsf-34 were amplified from the soybean cDNA by PCR with specific primers and fused to the N-terminal of GFP under control of the CaMV 35S promoter. Subcellular localization of GFP expression in mesophyll cell protoplasts of Arabidopsis was monitored by confocal microscopy 16 h after transformation mediated by PEG; 35S::GFP vector was transformed as the control. As shown in Figure 8, control hGFP was uniformly distributed throughout the mesophyll cell protoplast, whereas GmHsf-04, GmHsf-33, and GmHsf-34 fusion proteins were exclusively localized in the nucleus. These results suggest that GmHsf-04, GmHsf-33, and GmHsf-34 are nuclear proteins, possibly serving as transcription factors.

Overexpression of GmHsf-34 improved tolerance to drought and heat stresses in Arabidopsis
According to the expression analysis, GmHsf-34 was strongly induced by drought and heat stresses. To confirm the functions of GmHsf-34 in abiotic stress response, three lines of Arabidopsis overexpressing GmHsf-34 were tested under drought and high temperature conditions, respectively. Seed germination and root growth of transgenic Arabidopsis were tested in the presence of 4% PEG ( Figure 9A to D). Under standard culture conditions, no significant differences in germination rate or morphology between transgenic and wild-type (Col-0) plants were observed. However, germination percentage of transgenic plants was enhanced by nearly 15% compared to wild-type after 2-3 days ( Figure 9A and B). In the presence of 4% PEG, roots of transgenic lines were longer than those of wild-type plants ( Figure 9C and D), showing that overexpression of GmHsf-34 improved tolerance to the imposed drought treatment in Arabidopsis. After heat stress treatment, survival rates of Arabidopsis seedlings overexpressing GmHsf-34 and wild-type were recorded ( Figure 9E and F). The survival rate of wild-type Arabidopsis seedlings was about 11.5%, whereas that of Arabidopsis seedlings overexpressing GmHsf-34 was improved to 58.5-62.5%. Obviously, the transgenic seedlings displayed higher tolerance to high temperature compared to wild-type plants.

Discussion
In previous work, 59 soybean Hsfs presented from the soybean Transcription Factor Database website (http:// soybeantfdb.psc.riken.jp/) were reported [27]. In this study, although a comprehensive set of 58 possible soybean Hsfs were obtained after scanning the current version of the soybean genome (JGI Glyma1.0 annotation), 38 nonredundant soybean Hsfs were finally identified and characterized. Comparison with the locus numbers indicated that soybean Hsfs identified in our study were completely included in reported 59 soybean Hsfs. The widely accepted model of Hsfs defines the necessity of Hsf-type DBD and OD characterized by coiled-coil structure [19]. Briefly, DBD ensures the Hsfs combination with HSEs, and coiled-coil domain is indispensable for trimerization leading to Hsfs activity. Consequently, we surveyed and discarded extra 21 similar Hsfs due to the absence of Hsf-type DBD domains and/or coiled-coil structures. The monocots rice and maize contain the same number of Hsf genes (25), whereas the numbers in dicots soybean (38) and Arabidopsis (21) are quite different. This probably results from the double duplications of genome in soybean [28] but only a single replication in Arabidopsis [29] during evolution. The cluster analysis indicated that Hsfs in the same subclasses in Arabidopsis and soybean, or maize and rice, belong to the same branch in accord with the evolutionary relationships of Arabidopsis and soybean being dicots and maize and rice being monocots. Several genes are unique to monocots or dicots. For example, the subclasses A8 and B3 are restricted to dicots, and A9 and C2 are characteristic of monocots, suggesting the evolution of these subclasses followed the divergence of monocots and dicots. In addition, the subclass A7 is absent in soybean, presumably was lost in the processes of gene recombination, mutation, or redundancy.
In recent years, research on the role of introns has made significant progress. Studies in mammals, nematodes, insects, fungi, and plants suggest that the introns not only regulate the gene expression, but also participate in the gene evolution [30]. Analysis of gene structures revealed that soybean Hsf genes contain a single intron except GmHsf-12 containing two introns. Combined with analysis of the gene chip expression results (Figure 6), the soybean Hsf gene GmHsf-12 was not expressed at lower levels than others under the normal conditions. Seemingly, an intron does not affect gene expressions. Combined with the analysis to phylogenetic evolution (Figure 1), the number and location of Hsf intron in the same subclass are conserved (Figure 3). For example, GmHsf-30 and GmHsf-34 in subclass A2 respectively at 307 bp to 2110 bp and 304 bp to 2113 bp section contains an intron, which indicated that introns could be a reference to the evolution of plant genes.
Transcriptional activity of class A Hsfs is normally mediated by the AHA motif in the C-terminal region. However, the AHA motif is absent in GmHsf-10 or GmHsf-14 in the A3 subclass in soybean (Table 3). It was proposed that proteins without an AHA motif were activated through formation hetero-polymers with other class A Hsfs [17]. Unlike class A Hsfs, most of the class B and C Hsfs do not have the transcription activation ability, since their CTDs lack detectable the AHA motifs. Instead, the class B of Hsfs is characterized with a tetrapeptide-LFGV-in the C-terminal region, which is assumed to function as a repressor motif in the transcription machinery. The previous research showed that several other transcription factors functioning as the repressors also contain a conservative tetrapeptide-LFGV-motif, such as ABI3/VP1, AP2/ERF, MYB and GRAS, although their mechanisms of the action remain unclear [31,32].
The signal transduction pathways are complicated networks where components work together to control the plant physiological and biochemical process. AREB1 (ABRE-BINDING PROTEIN 1), AREB2, and ABF3, members of class A bZIP transcription co-factors of the ABRE elements, regulate the response to osmotic stress through combining with the element in the DREB2A promoter region [33]. The analysis of cis-acting elements in their promoter regions revealed that the soybean Hsf genes contain the MYB/MYC elements and some contain the ABRE, DRE, and/or LTRE elements, demonstrating that the soybean Hsfs play significant roles in the regulation of stress responses ( Table 2). The MYB elements basically participate in the drought, low temperature, salt, ABA, and GA stress responses [34,35] and the MYC elements participate in the drought, salt, and ABA stress responses. We show that the Arabidopsis, rice, and maize Hsf gene promoters contain the MYB/MYC elements (Table 2) and we concluded that Hsfs are involved in the responses to drought, salt, and ABA in plants. However, the expression results from the gene chips did not agree perfectly with the conclusion. For instance, GmHsf-25 contained 15 cis-acting elements and its expression value was up to 1090, whereas the value for GmHsf-10 carrying 85 cis-acting elements was only 228 (Table 2 and Additional file 1: Table S1). One explanation may be that, several elements lost their activities or performed in a negative way.
In consideration of putative similar functions of Hsf genes in the same subclass, 19 genes containing all subclass were selected to investigate responses to drought and heat stresses using qRT-PCR. The results showed that the soybean Hsf genes were differently expressed under the drought stress and heat stress conditions. We detected that three soybean Hsfs (GmHsf-04, GmHsf-17, and GmHsf-33) belonged to the subclass A1 were expressed at significantly high levels under the drought stress and heat stress (Figure 7). Moreover, GmHsf-08 and GmHsf-34 (subclass A2), GmHsf-11 (subclass A4), GmHsf-09 (subclass A5), GmHsf-10 (subclass A8), GmHsf-25 (subclass B1), and GmHsf-16 (subclass C1) were up-regulated under both drought stress and heat stress conditions. It was reported that tomato HsfA1a and Arabidopsis HsfA2 function as master regulators for acquired thermo-tolerance [36,37], and tomato HsfB1 was a co-regulator with HsfA1a [24]. These results indicate that A1, A2, and several other subclass members may be involved in the drought stress and heat stress responses in plants. According to the soybean gene chip data, GmHsf-02, GmHsf-07, GmHsf-19, GmHsf-20, GmHsf-28, GmHsf-35, and GmHsf-37 were expressed at very low levels. Among them, GmHsf-07, GmHsf-20, and GmHsf-28 belong to subclass A3. It is likely that the functions of these soybean Hsf genes, especially subclass A3, are not related to drought or heat stress responses. Most of the soybean Hsf genes expressed in roots were regulated by drought while those expressed in young leaves were regulated by heat. This is consistent with the fact that the root is the first organ sensing drought stress whereas leaf is first to experience heat stress.
In the former publication [27], expression analyses of 5 soybean Hsf genes (GmHsf12, GmHsf28, GmHsf34, GmHsf35, and GmHsf47) were performed under heat, low-temperature, NaCl, and drought stresses, respectively. These 5 genes were named in our study as GmHsf-08, GmHsf-21, GmHsf-26, GmHsf-25, and GmHsf-32 respectively. We founded that GmHsf-08, GmHsf-21, and GmHsf-25 showed markedly up-regulation by heat, which was consistent with the former works, while GmHsf-26 showed no detectable alteration. Under drought condition, GmHsf-08, GmHsf-25, and GmHsf-26 showed to be up-regulated strongly while GmHsf-21 expression was not influenced. GmHsf-32 expressions under abotic conditions were not surveyed in our study. GmHsf-34 was strongly induced by drought and heat stresses and its overexpression improved survival rate and/or root development in Arabidopsis under simulated drought and heat conditions (Figure 9). Similarly, overexpression of Arabidopsis AtHsfA2, one of the most strongly induced genes, lead to enhanced tolerance to heat stress [38]. Given the close phylogenetic relationship of GmHsf-34 with AtHsfA2, it is speculated that GmHsf-34 functions as a typical transcription factor due to the existence of Hsf-type DBD, OD, NES, NLS, and AHA motifs, and participates in heat and drought responses.

Conclusions
Thirty eight soybean Hsf genes were initially identified and classified after scanning for the soybean genome data base. Their locations and duplications, intron-exon structures, DBD structures and HR-A/B, distribution of cis-acting elements in the soybean Hsf promoters, and the expression patterns were determined. Based on the expression analysis, we inferred that soybean Hsf subclasses A1 and A2 may be the primary regulators of the heat stress response Figure 9 Responses of GmHsf-34 overexpressing Arabdopsis to drought and heat stresses. (A) Seed germinations of wild-type (Col-0) and GmHsf-34 overexpressing Arabdopsis seeds. Seeds from three independent transgenetic lines of GmHsf-34 were grown on ½ MS medium with or without 4% PEG. Wild-type seeds were grown in the same condition as a control. Representative images of wild-type and one line (OE-1) of the transgenic plants were taken after treated by 4% PEG for 4 days. (B) Germination rates of wild-type and GmHsf-34 overexpressing Arabdopsis seeds. Germination rates were determined daily after 2 days stratification. Data represent means ± SD (n =90). (C) and (D) Root lengths of wild-type and GmHsf-34 overexpressing plants. Root length was measured by the ImageJ 2X software. Data represent means ± SD (n =90). (E) Representative images of wild-type and transgenic plants after treated under high temperature conditions. Four-week-old wild-type and GmHsf-34 overexpressing seedlings (OE-1, OE-2, and OE-3) were exposed to 42°C for 6 hours heat stress, and then transferred to normal condition. After one week, the photographs were taken. Three independent experiments were conducted. (F) Survival rates of wild-type and GmHsf-34 overexpressing plants after heat stress. One week after heat treated, survival rate of wild-type and transgenetic plants. Data represent means ± SD (n =90).
in soybean. GmHsf-34, a member of subclass A2, played an important role in the response to the drought and heat stress treatments imposed in this study.

Methods
Database searches for Hsf genes in soybean, Arabidopsis, and rice genomes The whole genome data and the repeat information of soybean and maize were obtained from JGI Glyma1.0 annotation [39]. The gene sequences and protein sequences of Arabidopsis and rice Hsfs were acquired from TAIR [40], and TIGR [41], respectively. The gene chip data of soybean were derived from SoyBase [42].

Identification and physical locations of soybean Hsfs
To gather the probable candidate soybean Hsf amino acid sequences, the Hsf-type DBD domain (Pfam: PF00447) was submitted as a query in a BLASTP (P = 0.001) search of the soybean genome data base. A total of 38 soybean Hsfs were obtained after manually filtering out repeated sequences, and sequences without integrated Hsf-type DBD domains or classic coilled-coil structures by SMART [43]. All non-redundant Hsfs were mapped on the 20 soybean chromosomes on the basis of the information in the soybean database using MapDraw software [44]. The paralogous genes are identified and connected by lines according to Lin's method [18].

Genetic structure and cis-acting elements
An exon-intron substructure map was produced by Tools Online GSDS [45], and Promoter 2.0 [46] was applied to predict the soybean Hsf promoters. Cis-acting elements were analyzed by referring to the plant cis-acting element database PLACE26.0 [47].

Domain prediction and phylogenetic relationships
Clustal X 2.0 [48] was applied in protein sequence comparison analyses of Hsfs in Arabidopsis, rice, and soybean. Database tools Pfam [49], SMART, PredictNLS [50] and NetNES [51] were consulted to analyze their typical functional structure domains. A phylogenetic tree was constructed using the adjacent method by MEGA5.0 [52] with a 1000 bootstrap value.

Expression patterns
An analysis was conducted using the soybean gene chip expression data, the analysis was carried out which included the 38 soybean Hsfs in the different tissues and development stages, and also the diversity of different genes, subclasses, organs, tissues, and development stages.

Plant materials and stress treatments
The soybean seeds were germinated in the vermiculite in a light chamber at 25°C for 14 days. The soybean seedlings were removed and exposed to a heat stress temperature (42°C) for 0, 6, and 12 h, after which they were sampled for RNA extraction. For the drought stress, soybean seedlings were removed from the soil, and dehydrated for 0, 6, and 12 h before being sampled and frozen in liquid nitrogen and stored at -80°C.

RT-PCR and qRT-PCR
The total RNA was isolated from the whole plants using an RNeasy Plant Mini Kit (Qiagen) according to the manufacture's handbook. The cDNA synthesis and reverse transcription-PCR (RT-PCR) were conducted as previously described [53]. Quantitative real-time PCR (qRT-PCR) for examination of the soybean Hsfs were performed with the SYBR Premix ExTaqTM kit (TaKaRa) and an ABI 7300 according to the manufacturer's protocols (Applied Biosystem). The expression patterns were analyzed with an ABI Prism 7300 sequence detection system (Applied Biosystems) as previously described [54]. The soybean Hsf genes primers for qRT-PCR were designed using the Primer Premier 5.0 software avoiding the Hsfs conservative domain and soybean Actin (U60506) was used as an internal control for normalization of the template cDNA.

Subcellular localization in Arabidopsis protoplasts
The expression vectors with green fluorescent protein (GFP) tags were constructed for the subcellular localization analysis as described previously [55]. The coding regions of GmHsf-04, GmHsf-33, and GmHsf-34 were amplified by PCR using the specific primers and fused to the Nterminal of GFP under control of the CaMV35S promoter. The subcellular localization of the GFP expression in the Arabidopsis protoplasts was monitored by confocal microscopy 16 h after polyethylene glycol mediated transformation as described [56].

Tolerance assays under stress conditions
The GmHsf-34 gene, which is induced by the drought and heat stresses, was selected to confirm gene functions. Expression vector pBI121::GmHsf-34 containing GmHsf-34 under control of the CaMV35S promoter was built. Three Arabidopsis lines overexpressing GmHsf-34 were obtained after the transformation mediated by agrobacterium (Agrobacterium tumefaciens). For the germination assays, the seeds of Col-0 and transgenic plants were placed on ½ MS medium containing no or 4% (w/v) PEG for 7 days. For the root growth assays, 4-day-old seedlings grown on ½ MS medium were transferred to ½ MS medium containing no or 4% (w/v) PEG for 4 days after which the root lengths were measured by a root system scanner. For the heat stress tolerance assays, the 21-day-old seedlings grown in the soil transferred from ½ MS medium were treated at 42°C for 6 h and then grown under the normal condition for several days.