Based on a genome-wide analysis, we propose that the soybean genome is enriched in small heat shock genes, presenting 51 Hsp20 genes, compared with the 31 and 39 Hsp20 genes observed in Arabidopsis and rice, respectively [12, 21, 30]. This greater expansion of the Hsp20 gene family in soybean was expected because the soybean genome has experienced successive duplications throughout its evolution. According to our analysis, a total of 23 gene duplications of the Hsp20 family can be predicted to have occurred in the soybean genome. As reported above, the vast majority of the GmHsp20 genes (47 gene models) were strongly induced by heat treatment (Figure 6), which suggests that our in silico pipeline for predicting GmHsp20 candidates was efficient. Three of the GmHsp20 candidates were expressed only under non-stressed conditions, while GmHsp19.5 was exclusively induced after M. javanica inoculation. These results indicate that some GmHsp20 genes have functions unrelated to heat shock: some appear to perform housekeeping activities under normal growth conditions, while others have more specialized activities, such as in the response to biotic stress. This functional diversification of the Hsp20 gene family has also been reported in sunflower and rice [12, 31].
In earlier attempts to categorize the subfamilies of Arabidopsis Hsp20 genes, it was proposed that the majority of the AtHsp20 genes could be divided into seven subfamilies (CI, CII, CIII, M, P, ER and Px) and that five other genes do not fall into any subfamily [21, 22]. In a more recent analysis, the AtHsp20 gene family was extended to include 12 subfamilies by placing the five uncategorized Hsp20 genes into four new nucleocytoplasmic subfamilies (CIV, CV, CVI and CVII) and adding a new mitochondrial subfamily, MII. A recent categorization of the rice Hsp20 gene family proposed a distribution of OsHsp20 genes into 16 subfamilies: four nucleocytoplasmic subfamilies (CVIII, CIX, CX and CXI) plus the 12 subfamilies already identified in Arabidopsis. GmHsp20 clustering with Arabidopsis subfamily CVII and rice subfamilies CVIII and CX was not observed in the present study when bootstrap values were considered. However, in the 15 remaining subfamilies, two of which have not yet been described in the literature, we were able to identify at least 51 members. Our results suggest that there are 10 nucleocytoplasmic subfamilies in soybean, the largest of which is subfamily CI, with 19 members (Figure 3).
The large number of GmHSP20 proteins classified into nucleocytoplasmic subfamilies is a feature shared with other species, such as Arabidopsis and rice [12, 22, 30], and indicates that the cytoplasm may be the primary site of action for HSP20 proteins. In the cytoplasm, where protein assembly occurs, a higher concentration of Hsp20 proteins could prevent inappropriate folding or interactions that could lead to the formation of harmful aggregates.
Notably, in the phylogenetic analysis, the Hsp20 genes from different species that are classified in the same subfamily were observed to be more closely related than the members of the same species that belong to different subfamilies. This finding suggested that synteny might exist among soybean, rice and Arabidopsis HSP20 proteins. The Hsp20 genes most likely had a common ancestor that gave rise to the different subfamilies before the diversification within these species.
Three soybean genes (GmHsp28.7, GmHsp28.6 and GmHsp17.7A) were not grouped into any of the known Hsp20 subfamilies (Figure 3). Among these so-called orphan genes, GmHsp17.7A was not responsive to heat shock stress, despite its high similarity with rice Hsp23.2-ER, which is HT responsive (Figure 6).
Regarding gene organization, 64% (33 of the 51 gene candidates) of the soybean Hsp20 genes are intronless based on genome prediction and qRT-PCR data, which is similar to the percentage reported for rice Hsp20s (74%). Few of the GmHsp20 genes contain introns, and their lengths are highly variable. The relationship between the occurrence of introns and the expression level of a gene is controversial [32, 33]. In some studies, the absence of an intron, or a short intron length, has been found to enhance the level of gene expression in plants [34, 35]. In addition, there are indications that during evolution, genes that must be rapidly activated in response to stress tend to show a decreased intron density. This may be the mechanism that has led to the more rapid induction of plant Hsp20 gene expression, which occurs within a few minutes after the initiation of heat shock.
Among the GmHsp20 genes containing introns, 10 (35.71%) contain only one intron, and two (GmHsp18.0B and GmHsp18.2A) contain an intron in the 5′UTR; these two genes were induced by cold stress. According to Kamo et al., the presence of an intron in the 5′UTR can potentiate the translation process.
Furthermore, our results indicate that the GmHsp20s can be classified as unstable proteins, since 76.5% of the amino acid sequences showed an unstable profile (when the instability index threshold was considered) (see Additional file 2: Table S4). An unstable profile is believed to be a common feature among stress-induced proteins. Considering that HSP20 proteins are synthesized at a specific time in the cell, their instability indicates a rapid turnover that should allow transcriptional regulation of these proteins in the cellular environment [31, 40].
As expected, the GmHsp20 genes were preferentially located in the terminal regions of the soybean chromosomes, which have been shown to be gene-rich. This localization might contribute to the occurrence of segmental duplications in the soybean Hsp20 family. Similarly, the genome duplications experienced by the soybean genome during its evolution and the high recombination rates between segmental regions of homologous chromosomes might have increased the occurrence of gene duplications and, consequently, favored the expansion and functional diversification of the GmHsp20 family. Based on our analysis and the findings of Schmutz et al., particularly their conclusions about soybean genome evolution and organization, we suggest that the evolution of the soybean Hsp20 family has involved a total of 23 gene duplications, five of which were segmental on four different chromosomes (Figure 5). The same number of duplications has been reported for the rice genome, in which 23 OsHsp20 genes originated via duplication events. These segmental duplications appear to have contributed significantly to increasing the number of members of the soybean CI subclass, located on chromosomes 7, 8, 13 and 14. In rice, the CI members are also distributed in clusters of segmental duplications.
Considering the concept of parsimony, the conserved pattern of Hsp20 gene duplication within the same chromosome observed in the genomes of rice, Arabidopsis and soybean most likely originated through segmental duplication events that occurred in the ancestral species, before the divergence of monocots and dicots, followed by chromosomal duplications in both the ancestral species and within each species [25, 42]. Still, it is notable that three of the six GmHsp20 genes that are responsive to M. javanica and H. glycines infection (GmHsp17.4A, GmHsp17.9B and GmHsp17.6B) are organized in blocks of segmental duplications in the genome. In soybean, expansion of the segmental gene families associated with the basal resistance response is common and has been observed in families including NBS-LRR, F-box and auxin-responsive genes. Such duplications may contribute to the diversification of relevant alleles during plant-pathogen interactions or to the maintenance of similar levels of gene expression within the block, as observed in rice [30, 43]. However, unlike the results reported by Ouyang et al., the expression pattern of the tandemly duplicated genes, under the stress conditions tested here, was observed to be highly heterogeneous for GmHsp20.
Based on the organization of the soybean genome, the number of Hsp20 paralog gene groups observed between chromosomes 14, 2, 4 and 17 corroborates their high synteny, as described by Schmutz et al. Furthermore, this organizational characteristic was observed between chromosomes 20 and 10 as well as between 6 and 4, but 7.08% of the length of chromosome 20 is still homologous to fragments of four other chromosomes. The putative interchromosomal duplication observed between GmHsp22.4 and GmHsp22.0, located at the ends of the lower arms of chromosomes 10 and 20, respectively, is an example of the high rate of recombination between homologous chromatids in chromosome arm end regions.
Our expression analysis showed that the regulation of soybean Hsp20 genes is generally associated more with heat stress than with the other tested stresses. A total of 47 GmHsp20 candidates, including all of the organellar genes, were highly induced under heat shock stress in the roots and leaves, with expression at 42°C ranging from four to 10,000 times higher than in the control condition. The Hsp20 chaperone function under heat shock has been elucidated, but the functional roles of these proteins under other stresses or non-stress conditions have not been extensively characterized. The fact that these genes can be induced not only by heat shock but also under other stress conditions, as demonstrated in this study, reflects an interconnected mechanism of induction involving the HSFs. Hsp20 genes are known to be specifically controlled by different HSFs, which is interesting considering that there are 52 soybean HSF genes, while other species have closer to 30 HSFs [44, 45].
The expression profiles of subfamily CIV and GmHsp17.7A differed from all of the other clustered nucleocytoplasmic GmHsp20 genes, mainly because they were not altered by HT, even when the leaves were tested. The tissue-specific expression patterns of Hsp20 genes have been reported in different species. In Arabidopsis, the expression profile of some AtHsp20 genes under heat shock shows great variation depending on the tissue tested, while in rice, the expression profiles of the OsHsp18.8-CV and OsHsp19.0-CII genes were shown to be regulated differently in flowers and pistils, respectively. In contrast, our results demonstrate very similar expression profiles of the GmHsp20 genes among the tissues analyzed under heat shock treatment (four GmHsp20 and two Acd genes).
The GmHsp22.4, GmHsp17.9B, GmHsp17.9A and GmHsp17.4 genes were induced by M. javanica in the susceptible genotype and have been described by Kandoth et al. as also being responsive to H. glycines infection (Figure 7). Similarly, four OsHsp20 genes were found to be induced under the biotic stress of infection with the fungus M. grisea. Similarity analysis revealed that the rice gene Hsp16.9A-CI is homologous to GmHsp17.9B, suggesting that the functional role of this gene, namely activation under pathogen infection, might be conserved. Furthermore, two other genes (GmHsp22.4 and GmHsp17.6B) are clearly involved in biotic responses. In earlier studies, GmHsp17.6B was mapped to a QTL responsible for Meloidogyne javanica resistance and displayed a differential expression profile in resistant and susceptible soybean genotypes [3, 7]. In our analyses, GmHsp22.4 was shown to be highly induced in the resistant genotype compared with the susceptible genotype; this gene has been described as being associated with the response to H. glycines infection in soybean and as being located near a biotic resistance QTL (http://soybase.org) (Additional file 4: Table S7).
In silico analysis of the GmHsp20 promoters was in line with previous results that reported the occurrence of putative HSEs within −83 bp from the transcription start site in Hsp20 genes that are responsive to nematodes. Five GmHsp20 genes induced by M. javanica followed this pattern described by Barcala et al. (Table 1 and Figure 8). Only GmAcd23.1 and GmHsp19.5, which were induced by nematodes, did not exhibit this pattern.
In Arabidopsis mutants for Hsp20 genes involved in the responses to nematode infection, the TATA box element should be preferentially located between 12 and 21 bp upstream of the transcription start site, followed by an HSE at around −83 bp and a CCAAT box between 84 and 141 bp upstream of the transcription start site. In addition to the promoter of one Hsp20, a CAAT box element was previously reported in the promoter region of Hs1 pro-1, a gene conferring complete resistance to H. glycines, and appears to be essential for site-specific regulation. The promoters of all Hsp20 genes that are responsive to nematode infection also show putative CAAT elements. Moreover, the GmHsp20 biotic stress-responsive genes followed the same pattern observed in Arabidopsis and sunflower, which was not observed in the other GmHsp20 genes: the CAAT box always occurs either between the HSEs or immediately upstream of them, while the W-box, when present, is further upstream of the HSE. Although previous studies have shown the function of these cis-elements in Hsp20 regulation in Arabidopsis and sunflower, their involvement in soybean responses to nematodes needs to be verified by in vivo experiments [9, 14].
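The positional layout of cis-elements discussed here can be illustrated with a short promoter scan. The following is only a minimal sketch, not the pipeline used in this study: the motif patterns (a two-unit nGAAn/nTTCn HSE core, CCAAT, and the TTGAC[C/T] W-box core) are commonly cited consensus forms assumed for illustration, and the function name `scan_promoter` and the toy sequence are hypothetical.

```python
import re

# Assumed consensus patterns, for illustration only (not taken from this study).
# Lookaheads are used so that overlapping occurrences are all reported.
MOTIFS = {
    "HSE": r"(?=(GAA..TTC|TTC..GAA))",   # two alternating nGAAn/nTTCn units
    "CCAAT": r"(?=(CCAAT))",
    "W-box": r"(?=(TTGAC[CT]))",
}

def scan_promoter(upstream):
    """Return motif start positions relative to the transcription start site.

    `upstream` is the sequence ending immediately before the TSS, so its
    last base is at position -1 and more negative values lie further upstream.
    """
    seq = upstream.upper()
    return {
        name: [m.start() - len(seq) for m in re.finditer(pattern, seq)]
        for name, pattern in MOTIFS.items()
    }

# Hypothetical toy promoter (not a real soybean sequence), built so that the
# W-box lies furthest upstream, then the CCAAT box, then an HSE at -83.
toy_upstream = "TTGACC" + "A" * 20 + "CCAAT" + "A" * 5 + "GAAGCTTC" + "A" * 75

print(scan_promoter(toy_upstream))
# {'HSE': [-83], 'CCAAT': [-93], 'W-box': [-119]}
```

Reporting positions relative to the transcription start site makes it straightforward to check the ordering described in the text, for example whether a CCAAT box sits immediately upstream of an HSE and whether any W-box lies further upstream still.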
TA-rich elements have been described as being directly involved in the regulation of the expression level of an Hsp20 gene in response to nematode infection in soybean, and they appear to act by altering the distances between other cis-elements, interfering with the strength of the promoter. The number of TA repetitions in the promoter region of a soybean genotype resistant to M. javanica appears to be correlated with the significantly higher level of GmHsp17.6B expression observed in response to this stress. The resistant plants contain 32 TA repetitions in the GmHsp17.6B promoter region, while the susceptible plants have only nine. Our in silico analysis showed the occurrence of a putative TA region in the promoter regions of GmHsp20 genes responsive to M. javanica infection (GmHsp17.6B, GmHsp22.4, GmHsp17.9B and GmAcd23.1). It will now be interesting to investigate whether these TA-rich regions are truly GmHsp20 cis-elements, i.e., whether the number of TA repetitions can be correlated with nematode resistance for these genes and whether deletion of the TA region can interfere with gene expression.
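Counting TA repetitions in a promoter, as in the comparison of 32 versus nine repeats above, can be sketched in a few lines. This is an illustrative example only: the helper name `longest_ta_run` and the toy sequences are hypothetical, and the study's own in silico procedure may have differed.

```python
import re

def longest_ta_run(promoter):
    """Return the number of TA units in the longest uninterrupted (TA)n run."""
    runs = re.findall(r"(?:TA)+", promoter.upper())
    return max((len(run) // 2 for run in runs), default=0)

# Hypothetical toy promoters mimicking the repeat counts reported for the
# resistant (32) and susceptible (9) genotypes; not real soybean sequences.
resistant_like = "GGC" + "TA" * 32 + "CCG"
susceptible_like = "GGC" + "TA" * 9 + "CCG"

print(longest_ta_run(resistant_like), longest_ta_run(susceptible_like))
# 32 9
```

Applied to aligned promoter sequences from several genotypes, such a count could be compared against expression levels to test the proposed correlation.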
Two Acd genes, GmAcd33.0 and GmAcd23.1, were not induced by heat shock in our analyses, and a sequence comparison showed that these genes exhibit high homology to the rice genes OsAcd41.4 and OsAcd31.8, respectively. The cellular roles of the Acd genes are not very well established, but their homologs in rice and Arabidopsis have been shown not to be involved in the heat shock response (HSR). These findings, combined with the irregular localization of the ACD at the N-terminal ends of the proteins, might suggest that these genes are not real Hsp20 genes. Interestingly, however, both genes present a normal HSE distribution in their promoters, and one of them, GmAcd23.1, was induced under biotic stress in the susceptible genotype. Thus, it appears that the Acd genes might play roles similar to those of the constitutive Hsp20 genes or could encode proteins that are involved in specialized functions.