Genome-wide identification of heat shock proteins (Hsps) and Hsp interactors in rice: Hsp70s as a case study

Background Heat shock proteins (Hsps) perform a fundamental role in protecting plants against abiotic stresses. Although researchers have made great efforts on the functional analysis of individual family members, Hsps have not been fully characterized in rice (Oryza sativa L.) and little is known about their interactors. Results In this study, we combined orthology-based approach with expression association data to screen rice Hsps for the expression patterns of which strongly correlated with that of heat responsive probe-sets. Twenty-seven Hsp candidates were identified, including 12 small Hsps, six Hsp70s, three Hsp60s, three Hsp90s, and three clpB/Hsp100s. Then, using a combination of interolog and expression profile-based methods, we inferred 430 interactors of Hsp70s in rice, and validated the interactions by co-localization and function-based methods. Subsequent analysis showed 13 interacting domains and 28 target motifs were over-represented in Hsp70s interactors. Twenty-four GO terms of biological processes and five GO terms of molecular functions were enriched in the positive interactors, whose expression levels were positively associated with Hsp70s. Hsp70s interaction network implied that Hsp70s were involved in macromolecular translocation, carbohydrate metabolism, innate immunity, photosystem II repair and regulation of kinase activities. Conclusions Twenty-seven Hsps in rice were identified and 430 interactors of Hsp70s were inferred and validated, then the interacting network of Hsp70s was induced and the function of Hsp70s was analyzed. Furthermore, two databases named Rice Heat Shock Proteins (RiceHsps) and Rice Gene Expression Profile (RGEP), and one online tool named Protein-Protein Interaction Predictor (PPIP), were constructed and could be accessed at http://bioinformatics.fafu.edu.cn/. Electronic supplementary material The online version of this article (doi:10.1186/1471-2164-15-344) contains supplementary material, which is available to authorized users.


Background
Plants have evolved a spectrum of molecular programs to adapt to environmental stresses. To survive, plants undergo dramatic changes in physiological and molecular mechanisms [1]. For instance, heat shock proteins (Hsps) are stimulated in response to a wide array of stress conditions and perform a fundamental role in protecting plants against abiotic stresses [1,2].
Furthermore, detailed studies have established that the overexpression of Hsp70 genes enhanced the plant's tolerance to environmental stresses [15][16][17]. Transgenic rice lines that overexpress sHsp17.7 exhibit increased drought tolerance during the seedling stage [18]. However, the cellular mechanisms underlying Hsp function under abiotic stress are not fully understood [3]. The completion of the Rice Genome Sequencing Project and high-throughput experimental methods have generated valuable data that can be used to identify proteins that interact with Hsps in rice, and consequently decipher the functions of Hsps.
Many computational approaches have been proposed to predict protein-protein interactions. In terms of test dataset types, these approaches can be grouped into three classes: sequence-oriented methods [19][20][21][22], gene expression profile-based methods [23] and structure-oriented methods [24]. Interolog, a sequence-oriented method, has been widely used to construct protein-protein interactions (PPIs) in diverse organisms [10,[25][26][27]. This method is based on the principle that orthologous pairs can be detected by mapping those known interactions in the source organism onto the target organism [21]. The gene expression profile-based methods identify genes that exhibit correlated changes in expression over conditions, since they tend to have similar functions or be involved in cellular processes [23,28]. Each protein interaction mapping technique has different advantages and disadvantages [29], and the techniques are complementary to some extent. In this study, we integrated interolog-and gene expression profile-based methods to identify the interactors of Hsps in rice.
To carry out more reliable functional analysis, we first conducted a genome-wide screening for the true Hsps in rice using integration of orthology and expression association data. Then, we used interolog-and expression profile-based methods to identify Hsp70s interactors in rice response to abiotic stresses. Through mining the signal behind their interactors, we further investigated the pattern of binding sites and the interaction network of Hsp70s in response to abiotic stresses.

Results
Gene expression in rice subjected to abiotic stresses Four sets of gene expression data from rice seedlings exposed to drought, salt, cold and heat treatment were collected (Table 1) from the Gene Expression Omnibus (GEO) [30]. The K-nearest neighbor (KNN) impute method was used to estimate the missing values in Gene-Chips [31]. A total of 22,707 probe-sets with detectable expression values were selected from these GeneChips. Within-slide normalization ( Figure 1) and multiple-slide normalization ( Figure 2) were performed sequentially to minimize systematic variations.
Then, we identified heat-responsive (HR) probe-sets and estimated the global gene-gene pairwise relationships. In this study, we applied boxplots [32,33] to identify HR probe-sets, which were defined as a group of probe-sets that were significantly up-or down-regulated by heat treatments. A total of 1,135 (5%) HR probe-sets that were expressed differentially under heat stress were detected ( Figure 3). Among them, 651 probe-sets were up-regulated, while 484 probe-sets were down-regulated. Meanwhile, bootstrap analysis [34] was performed to estimate the absolute median value of Pearson Correlation Coefficients (PCC) between any pair of genes. The bootstrapped 95% confidence interval for the population ranged from 0.5648 to 0.5842 ( Figure 4).

Genome-wide identification of Hsps in rice
Hsps screening in the rice proteome consisted of three steps. First, 41 candidate protein sequences, which were annotated as Hsps and contained the characteristic domains (Additional file 1: Table S1) of Hsps in Uniprot database [35], were downloaded. These sequences included 23 small Hsps (sHsps), eight Hsp70s, four Hsp60s, three Hsp90s and three Hsp100/ClpBs. Second, 10 of the 41 candidate proteins, whose expression value was absent in GSE6901 (GeneChips for drought, salt, and cold treatments) or GSE14275 (GeneChip for heat treatment), were filtered out. Third, since Hsps can stimulate a wide range of HR genes [3,36], and those genes involved in similar functions or cellular processes are likely to have similar expression profiles over conditions [23]. So we supposed the true Hsp genes should have a higher expression correlation with HR probe-sets compared with other genes. Therefore, 27 candidate genes, whose expression patterns were similar to that of the HR probe-sets (Table 2), were ultimately recognized as Hsps, including 12 sHsps, six Hsp70s, three Hsp60s, three Hsp90s and three Hsp100/ ClpBs (Table 3). The average absolute value of the PCC between them and HR probe-sets reached 0.76, which was markedly greater than that of the global pairwise values (0.5648-0.5842) and the value of the Ubq5/control (0.5089). Genome-wide identification of the interactors of Hsps in rice, with a focus on Hsp70s Using the interolog method, 9,132 potential PPIs related to Hsps in rice (Additional file 1: Table S3) were mapped from the experimentally identified PPI in yeast [37]. The predicted PPIs corresponding to 6 Hsp70s accounted for nearly 45% of the total interactions (4,091 out of 9,132). Therefore, in this paper, Hsp70s were selected as a case study.
Each of 6 Hsp70s sequences was used as a query to search its interactors in rice based on interlog method. After that, we applied an expression profile-based method to reduce the false-positive rate of Hsp70s PPIs predicted by interolog. The expression relationship between each interacting partner was further measured by Pearson Correlation Coefficients (PCCs). We found that the absolute PCC of 1,072 PPIs related to Hsp70s, including 430 interactors, were greater than 0.90  (Additional file 2: Supplemental Data 1A). Upon exposure to abiotic stresses, the expression of 166 interactors showed a positive relationship with that of Hsp70s, while the expression of 264 interactors was negatively correlated with that of Hsp70s ( Table 4).

Assessment of the PPIs of Hsp70s in rice
Two computational methods were used to evaluate the overall quality of the above prediction. Randomized PPIs were generated and used as a control.
First, the co-localization method was applied to assess the Hsp70 PPIs. This method is based on the principle that interacting proteins are more likely to localize to the same cellular compartment than randomized pairs [38]. The subcellular localization annotation of each protein in rice was obtained from WoLF PSORT [39], a stringent protein localization predictor based on experimental data. All of the predicted Hsp70s interactors contained subcellular localization annotations (Additional file 2: Supplemental Data 1B). We found that 582 PPIs (54% of 1,072 predicted PPIs) localized in common cellular compartments. In contrast, the maximum number of PPIs localized in the same subcellular compartment in 1,000 randomly repeated networks was 553 (51% of 1,072 randomized PPIs) ( Figure 5), which was significantly lower than that of the predicted Hsp70 PPIs (empirical p-value < 0.001).
Second, we used the co-function method to test the overall quality of predicted Hsp70s PPIs. This method is based on the assumption that interacting partners tend to participate in the same cellular processes or share similar functions [22,39]. The 6 Hsp70s contained four different GO terms (GO:0044260, GO:0005524, GO:0051082 and GO:0006457) in biological processes (BPs) or molecular functions (MFs). The result showed that 385 of 430 predicted Hsp70 interactors had GO annotations (Additional file 2: Supplemental Data 1B), and 300 of these interactors (78%) shared at least one common GO term with Hsp70s.  Bootstrap distribution of the estimated median absolute PCC value between the expression value of any two probe-sets in the GeneChips. Ten thousand non-redundant probe pairs were randomly selected, and the absolute PCC value between each pair was computed. Based on these 10,000 PCC values, 100,000 bootstrap samples were built by sampling with replacement, and the 95% confidence interval of the global median absolute PCC value was determined as ranging from 0.5648 to 0.5842.

Identification of the binding sites of Hsp70s in rice
The above assessments provided strong support for the reliability of the Hsp70 interactors predicted in this paper. Therefore, we used these interactors as the positive dataset, and constructed a negative dataset composed of 10,158 proteins that were less likely to interact with Hsp70s. Since binding sites tend to occur more frequently in interacting proteins than in non-interacting proteins [40], we sought to detect over-represented domains or motifs by comparing their frequency of occurrence in the two different datasets. The annotations of rice protein domains were obtained from Pfam [41]. We identified 102 domains of 397 proteins in the positive dataset (Additional file 2: Supplemental Data 1B), and 2,628 domains of 7,746 proteins in the negative dataset. The number of negative samples was much greater than that of positive samples (20:1). To reduce this bias, we implemented one-tailed Fisher's exact test [42] to detect the over-represented domains in the coordinated datasets (i.e., 397 positive samples versus 794 samples in the negative dataset; a ratio of 1:2), and used the Benjamini and Hochberg (BH) method [43] to control the false discovery rate (FDR). In addition, the above procedure was repeated 10 times by randomly changing the negative samples. Finally, 13 domains were detected with p-value lower than 0.05 in the 10 replicas (Additional file 3: Supplemental Data 2A). Similarly, we analyzed the binding motifs of Hsp70s in rice. The motif annotations were acquired from PROSITE [44,45]. There were 113 motifs in 404 proteins among the positive samples (Additional file 2: Supplemental Data 1B), while there were 1,071 motifs in 10,081 proteins among the negative samples. Twenty-eight overrepresented motifs were ultimately investigated (Additional file 3: Supplemental Data 2B).

Functional analysis of Hsp70s in rice
It is expected that the functions of proteins can be deduced from their interactors. As mentioned above, among the 430 interactors of Hsp70s, 385 have BP or MF GO annotations (Additional file 2: Supplemental Data 1B).
Furthermore, 147 interactors, whose expression levels positively correlated with that of Hsp70s, contained 109 GO annotations. In contrast, the 238 interactors, whose expression levels negatively correlated with Hsp70s, had 90 different GO annotations. The two distinct groups were defined as Positively Correlated Interactors (PCIs) and Negatively Correlated Interactors (NCIs). Using GO enrichment analysis, we found that 24 BP GO terms and five MF GO terms with p-values less than 0.05, were enriched in the PCIs compared with that in NCIs (Additional file 4: Supplemental Data 3A), suggesting that these biological processes or functions would be induced   First step  23  4  8  3  3  41   Second step  14  4  7  3  3  31   Third step  12  3  6  3 3 27 First step: Proteins that were annotated as heat shock proteins and contained the specific domains of heat shock proteins were downloaded from Uniprot database; Second step: Hsp candidates, whose expression value was absent in GSE6901 or GSE14275, were filtered out; Third step: Candidates, whose expression patterns were strongly correlated with the patterns of the HR probe-sets, were ultimately recognized as heat shock proteins.

Construction of tools and riceHsp database
We constructed two databases, named Rice Heat Shock Proteins (RiceHsps) and Rice Gene Expression Profile (RGEP), and one online tool, named Protein-Protein Interaction Predictor (PPIP). The RiceHsps was built to store and show our predicted results in this paper. The RGEP was constructed to store the integrated gene expression data for rice subjected to abiotic stresses, including drought, salt, cold and high temperature. It also provided a function for identifier conversion among Michigan State University Osa1 Rice Locus (MSU ID), Rice Annotation Project Locus (RAP ID) and Affymetrix Rice Genome Probe-set (Affymetrix ID) (Figure 7). The tool PPIP was developed based on the interolog method. Once the user uploads at least two protein sequences in FASTA format into the text area, or a sequence file less than 2 Mb, the corresponding orthologous protein pairs, whose interaction has been verified by biochemical experiments in the selected model organism, will be retrieved ( Figure 8). These online databases and tool can be accessible at http://bioinformatics.fafu.edu.cn.
(2009) identified 23 sHsps in rice [11], 12 of which were confirmed in this paper and showed a strong relationship with HR probe-sets under abiotic stresses. According to orthology-and expression level-based data, Singh et al.
(2010) discovered three Hsp100/ClpB proteins in rice [12], which were consistent with the result of this paper. We further noted that the expression pattern of the three Hsp100/ClpBs closely resembled that of HR probe-sets under abiotic stresses. Recently, Sarkar et al. (2013) identified 32 Hsp70 genes through sequence analysis and orthology-based method [13], including all the six Hsp70s in this paper. However, in this study, we not only adopted the sequence and orthology information, but also the gene expression association information to identify true Hsps in rice. Given that similar proteins in different species may have different functions, one has to take into account that an orthology-based strategy alone is not adequate to identify true Hsps in rice. Furthermore, it is not reliable to screen Hsps for evaluating the gene expression levels of candidates in rice in response to high-temperature stress, because some Hsps express constitutively [3]. Therefore, we used a   combination of orthology and expression association data to identify a highly reliable Hsps in rice.

Binding sites of Hsp70s in rice
Investigating the binding sites of Hsp70s will provide insight into the activity of those proteins and improve our ability to predict the potential risks of a particular mutation. In this study, we identified 13 domains and 28 motifs that occurred more frequently in the positive dataset than in the negative dataset, suggesting that these sequences are potential target sites for Hsp70s in rice. Therefore, the results of this study provided useful clues for experimental biologists in further analyzing the function of Hsp70s.

The Hsp70 interaction network in rice
The Hsp70s network was shown in Figure 9, and described in the following sections. We classified the interaction network into five sub-networks.

Sub-network A: Macromolecular translocation
Our results showed that the small GTPase Ran (LOC_ Os01g42530), importin α (LOC_Os01g14950, LOC_Os05g 06350) and importin β (LOC_Os05g28510) could bind to Hsp70s. Hsp70 and importin β were previously identified as Ran-interacting proteins (Rips) [49]. The results of this study indicated that the Ras family domain (PF00071) and ATP/GTP-binding site motif A (P-loop) (PS00017) of the small GTPase Ran were potential interacting sites of Hsp70s. Furthermore, the expression of Ran and importin proteins was strongly correlated with that of Hsp70s (PCC > 0.90) under abiotic stresses (Additional file 5: Figure S1; Additional file 1: Table S5). We then constructed a protein-protein interaction network consisting of Hsp70s, GTPase Ran and importin proteins in rice ( Figure 9A).
Importin α recognizes the nuclear localization signal (NLS) of nuclear proteins in the cytoplasm, forming a stable complex termed the nuclear pore-targeting complex (PTAC) [50,51]. Importin β docks the PTAC to the cytoplasmic face of the nuclear pore complex (NPC) [52], a channel for macromolecules into the nucleus [53]. In addition, the hydrolysis of GTP by the small GTPase Ran has been shown to be essential for the translocation of docked PATC into the nucleus [54]. Therefore, the interaction network between Hsp70s, GTPase Ran and importin proteins in rice might be involved in translocation of macromolecules. Shulga et al. (1996) stated that Hsp70 could act as a molecular chaperone to promote the formation and stability of the nuclear localization signalcontaining complex during both targeting and translocation phases of nuclear transport [55].

Sub-network B: Plant carbohydrate metabolism
The results of this study revealed that Hsp70s interacted with enolase (LOC_Os09g20820), fumaratehydratase (LOC_Os03g21950), malate dehydrogenase (LOC_Os07 g43700, LOC_Os01g61380, LOC_Os05g49880) and citrate synthase (LOC_Os02g10070), which were constructed in sub-network B ( Figure 9B). Most of these potential interactions have been partly validated by previous studies. In vitro studies indicated that Hsp70 might assist in transporting fumaratehydratase between the cytosol and mitochondria [56]. Furthermore, it has been reported that the Hsp70 complex significantly increased the spontaneous rate of refolding of denatured mitochondrial malate dehydrogenase [57]. Hsp70s have also been demonstrated to reduce the aggregation of citrate synthase under heat stress [58]. Recently, through co-immunoprecipitation (CoIP) assays, Luo et al. (2011) further confirmed that Hsp70 could directly interact with α-enolase [59].
Our results indicated that the expression levels of Hsp70s were positively and strongly correlated with that of enolase, fumaratehydratase, malate dehydrogenase and citrate synthase in response to abiotic stresses (Additional file 5: Figure S2; Additional file 1: Table S6), implying that Hsp70s might have essential functions in stimulating carbohydrate metabolism by regulating the activity of those key enzymes. In a metabolomics study, Kaplan et al. (2004) also found that carbohydrate metabolism was affected by heat shock in Arabidopsis [60]. The amount of pyruvate and oxaloacetate increased coordinately upon heat shock, while the fumarate and malate (oxaloacetate precursors) contents were similarly elevated, suggesting that the Embden-Meyerhof-Parnas (EMP) pathway and tricarboxylic acid cycle (TCA) cycle would be enhanced by abiotic stresses.

Sub-network C: plant innate immunity
In this study, we found that Hsp70s might cooperate with members of the small GTPaseRac family (LOC_Os01 g12900, LOC_Os02g02840, LOC_Os02g20850), Hsp90 (LOC_Os06g50300, LOC_Os08g39140), SKP1 (LOC_Os 09g36830) and MAPK6 (LOC_Os06g06090), as shown in Figure 9C. Hsp70, Hsp90 and RAR1 have been documented as the components of Rac1 complex in rice, based on CoIP experiments [61]. Moreover, multiple lines of evidence have shown that Hsp70 was a negative regulator of ASK1/MAP3K, and overexpression of Hsp70 inhibited the MAPK signaling cascade, which was associated with apoptosis [62][63][64]. Consistent with previous studies, our results further illustrated that the expression level of Hsp70s was positively correlated with that of Rac, Hsp90 and SKP1, and negatively correlated with that of MAPK6 in response to abiotic stresses (Additional file 5: Figure S3; Additional file 1: Table S7). Furthermore, in addition to Rac (PF00071 and PS00017, PS51420), MAPK6 (PF00069 and PS50011, PS00108, PS00107, PS01351) also contained potential binding sites for Hsp70s.
Previous reports have shown that Hsp90 and two cochaperone-like molecules, RAR1 and SGT1, performed a key role in effector-triggered immunity (ETI), the second line of the plant defense system [61,65,66]. Additionally, in vitro studies have indicated that SGT1 can interact with SKP1 and link it to the Hsp90 co-chaperone complexes [67]. Further research found that the SKP1-CULLIN1-Fbox (SCF) complex regulated the stability of resistance (R) proteins [68], suggesting that SKP1 might also be involved in the ETI response. In addition, the small GTPase Rac could function as a critical switch downstream of two types of innate immunity: PAMP-triggered immunity (PTI) and effector-triggered immunity (ETI) [66]. This finding was recently supported by Jung et al. (2013). They found that the OsctHsp70-1 had a functional association with Ras/Raf-mediated MAPK kinase cascades [14].

Sub-network D: photosystem II repair
Sub-network D showed that Hsp70s might interact with FtsH families (LOC_Os06g51029, LOC_Os01g62500 and LOC_Os01g43150) ( Figure 9D). Indeed, this interaction has been previously confirmed by Shen and colleagues [69]. In this study, we found that there was a close positive correlation (PCC > 0.90) between the expression of Hsp70s and FtsH families in rice subjected to abiotic stresses (Additional file 5: Figure S4; Additional file 1: Table S8). The AAA-protein family signatures (PF00004, PS00674) of FtsH proteins were identified as potential target sites for Hsp70s. Previous showed that FtsH family members played an important role in the D1 repair cycle of PSII [70][71][72]. Using native gel electrophoresis, Yokthongwattana et al. (2001) revealed that Hsp70s could form a complex with intact D1 protein and also with D2 and CP47 [73], suggesting Hsp70s have a function in the photosystem II (PSII) repair cycle.

Conclusions
By integrating orthology and functional association data, we identified 27 Hsps in rice, including 12 sHsps, 6 Hsp70s, 3 Hsp60s, 3 Hsp90s and 3 Hsp100/ClpBs. Then, using Hsp70s as a case study, we identified 430 interactors of Hsp70s in rice by combining interolog-and expression profile-based methods. According to the interactors of Hsp70s, we investigated the potential binding sites of Hsp70s, and analyzed the interacting network of Hsp70s in rice. Finally, we constructed two online databases and one tool, which could be accessed at http://bioinformatics. fafu.edu.cn/.

Yeast interaction data
Eight hundred and thirty-seven experimentally verified protein-protein interaction (PPI) pairs related to Hsps in yeast (Additional file 1: Table S2) were manually selected from the Database of Interaction Proteins (DIPs version 20101010; http://dip.doe-mbi.ucla.edu/dip/).

Microarray dataset
Gene expression data for rice subjected to drought, salt, cold or heat treatments were downloaded from GEO (accession number GSE6901 for drought, salt and cold treatments, and GSE14275 for heat treatment). All data were obtained using the same microarray platform (Affymetrix GeneChip Rice Genome Array; platform accession number GPL2025) and rice seedling samples (Table 1).

Heat-responsive probe-sets detection
Boxplot [32,33] in R was implemented to identify heatresponsive (HR) probe-sets. Probe-sets with M-values (log ratios) located beyond the upper or lower fence of the boxplot were considered as HR gene probe-sets.

Estimation of the global median absolute value of Pearson Correlation Coefficient (PCC)
The bootstrap method [34] was used to evaluate the median absolute value of PCCs between the expression levels of any two probe-sets among GeneChips. First, 10,000 non-redundant probe pairs were randomly selected, and the absolute PCC between each pair was computed. Based on these 10,000 PCC values, 100,000 bootstrap samples were built by sampling with replacement to measure the 95% confidence interval of the global median absolute value of PCC.

Identification of rice Hsps
Rice candidate Hsps were selected from the Uniprot database. These sequences satisfied the following criteria: (1) they possessed the conserved domains of Hsps (Additional file 1: Table S1); (2) they were functionally annotated as Hsps or involved in similar biological processes; (3) the sequence length was in agreement with the molecular mass of different Hsp family members; (4) Evidence at RNA or protein expression level; and (5) they were identified as Hsps in the MSU Rice Genome Annotation Project. After that, their corresponding Affymetrix IDs were retrieved from Ricechip.org (http://www.ricechip.org/). R software was used to calculate the PCC values between expression data of each candidate Hsp and HR gene probe-set.

Prediction of proteins interacting with Hsp70s in rice Interolog approach
For each experimentally verified PPI of Hsps, the pairwise amino acid sequence was locally run through BLASTP (version 2.2.23+) [85] against the entire rice proteome in an effort to identify orthologs in rice. The E-value cutoff, identity and alignment coverage were set at 10 −10 , 30% and 40%, respectively. Based on the core principle of interolog [19,21], corresponding orthologous pairs in rice were predicted to interact with each other. Briefly, if two interacting proteins, A and B, in yeast had the corresponding orthologs A' and B' in rice, respectively, A' might interact with B'.

Expression profile-based method
For each PPI predicted by interolog, we determined the absolute value of PCC between the corresponding gene expression data. R software was used to calculate the PCC values. Generally, the PCC values ranged from −1 to 1. A value of 1 indicated that the gene expression level of protein A would increase as that of protein B increased. In contrast, a value of −1 implied that the gene expression level of protein A would decrease as that of protein B increased. A value of 0 implied that there was no linear correlation between the expressions of these two genes. If the absolute value was less than 0.90, the PPI was filtered out.In addition, t-test was utilized to evaluate whether the paired PCC value was significantly greater or less than 0.

Assessment of PPIs of Hsp70s in rice
Protein localization method Subcellular localization information of proteins in rice was obtained from WoLF PSORT [39]. In addition, 1,000 randomized networks, in which the interacting partners of Hsp70s were randomly replaced by other proteins containing meaningful subcellular localization annotations in the rice proteome, were used as a control. The above process was repeated 1,000 times.

Function similarity method
The GO annotations of proteins in rice were downloaded from agriGO (http://bioinfo.cau.edu.cn/agriGO/download. php) [86]. Furthermore, 1,000 randomized repeats of Hsp70 interactors were generated. The predicted interactors of Hsp70s were randomly replaced by other proteins possessing GO annotations in the rice proteome. The above procedure was repeated 1,000 times.

Identification of binding sites of Hsp70s in rice Non-interactors dataset
Non-interactors of Hsp70s were used as negative controls. These proteins were collected from the rice proteome, and satisfied the following conditions: first, they could not interact with Hsp70s, based on the interolog prediction; and second, the absolute PCC value between the expression level of the non-interactor and that of any Hsp70 should be less than 0.40.

Domain assignment
The domain information of rice proteins was obtained from Pfam (http://pfam.sanger.ac.uk/) [41]. Because of the large number of sequences, we ran the PfamScan program (version 091007) [41] and HMMER package (version 3.0b3) [87] locally. Rice protein sequences were searched against Pfam-A domains in PfamScan databases (version 24.0) with an E-value cutoff of 0.0001.

Motif assignment
The motif annotations of proteins in rice were acquired from PROSITE (http://prosite.expasy.org/) [45]. The Scan-Prosite tool [44] was downloaded and applied locally to scan protein sequences against the PROSITE database (version 20.67).

Fisher's exact test
A one-tailed Fisher's exact test was used to detect the over-represented domains and motifs among the Hsp70s interactors in rice compared with the negative interactors. For each domain or motif annotation, a 2 × 2 contingency table was constructed, as shown in Additional file 1: Table S4. Then, R software was used to calculate the p-value to measure the significance level.

Multiple testing
To limit the false-positive error rate associated with multiple statistical tests, R software was further used to alter each p-value into the corresponding adjusted p-value based on the BH method [43]. Ultimately, the adjusted p-value was used to determine the potential binding sites. A cutoff value of 0.05 was used in this work.

Hsp70 network in rice GO enrichment
The GO information of the predicted Hsp70 interactors in rice was obtained from agriGO (http://bioinfo.cau. edu.cn/agriGO/). For each GO term, all parent nodes were retrieved according to the archive of the GO database, and the minimum distance from the root (depth) was determined. Only terms beyond the fourth depth were considered. After that, fisher's exact test was conducted to reveal the over-represented GO terms in the opposite dataset, and the BH method was used to control the false discovery rate (FDR). The Hsp70 network was generated using Cytoscape (http://www.cytoscape. org/) [88].