Genome-wide analysis of WRKY gene family in Cucumis sativus

Background WRKY proteins are a large family of transcriptional regulators in higher plants. They are involved in many biological processes, such as plant development, metabolism, and responses to biotic and abiotic stresses. Prior to the present study, only one full-length cucumber WRKY protein had been reported. The recent publication of the draft genome sequence of cucumber allowed us to conduct a genome-wide search for cucumber WRKY proteins, and to compare these positively identified proteins with their homologs in model plants, such as Arabidopsis. Results We identified a total of 55 WRKY genes in the cucumber genome. According to the structural features of their encoded proteins, the cucumber WRKY (CsWRKY) genes were classified into three groups (groups 1-3). Analysis of the expression profiles of CsWRKY genes indicated that 48 WRKY genes display differential expression either in their transcript abundance or in their expression patterns under normal growth conditions, and 23 WRKY genes were differentially expressed in response to at least one abiotic stress (cold, drought or salinity). The expression profiles of stress-inducible CsWRKY genes were correlated with those of their putative Arabidopsis WRKY (AtWRKY) orthologs, except for the group 3 WRKY genes. Interestingly, duplicated group 3 AtWRKY genes appear to have been under positive selection pressure during evolution. In contrast, there was no evidence of recent gene duplication or positive selection pressure among CsWRKY group 3 genes, which may have led to the expressional divergence of group 3 orthologs. Conclusions Fifty-five WRKY genes were identified in cucumber and the structure of their encoded proteins, their expression, and their evolution were examined. Considering that there has been extensive expansion of group 3 WRKY genes in angiosperms, the occurrence of different evolutionary events could explain the functional divergence of these genes.


Background
Transcription factors exhibit sequence-specific DNA binding and are capable of activating or repressing transcription of downstream target genes. In plants, WRKY proteins constitute a large family of transcription factors that are involved in various physiological processes. Proteins in this family contain at least one highly conserved signature domain of about 60 amino acid residues, which includes the conserved WRKYGQK sequence followed by a zinc-finger motif, located in the C-terminal region [1]. The WRKY domain facilitates binding of the proteins to the W box or the SURE (sugar-responsive cis-element) in the promoter regions of target genes [2,3]. As deduced from nuclear magnetic resonance (NMR) analysis of the C-terminal WRKY domain of Arabidopsis WRKY4 (AtWRKY4), the conserved WRKYGQK sequence of WRKY domains is directly involved in DNA binding [4]. WRKY proteins can be classified into three groups (1, 2 and 3) based on the number of WRKY domains and the pattern of the zinc-finger motif. Group 1 proteins typically contain two WRKY domains including a C2H2 motif. Group 2 proteins have a single WRKY domain and a C2H2 zinc-finger motif and can be further divided into five subgroups (2a-2e) based on the phylogeny of the WRKY domains. Group 3 proteins also have a single WRKY domain, but their zinc-finger-like motif is C2-H-C [1].
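The grouping rule described above can be illustrated with a minimal sketch. It assumes that the number of WRKY domains is approximated by counting occurrences of the WRKYGQK heptapeptide and that the zinc-finger type has already been determined by other means; the function names, the label strings and the toy sequence are hypothetical and are not part of the original analysis. The rule is deliberately simplified and ignores phylogenetic placement, which the study also used.

```python
import re

# The conserved heptapeptide named in the text.
WRKY_HEPTAPEPTIDE = re.compile(r"WRKYGQK")


def count_wrky_motifs(protein_seq: str) -> int:
    """Count non-overlapping WRKYGQK occurrences (a rough proxy for domain count)."""
    return len(WRKY_HEPTAPEPTIDE.findall(protein_seq.upper()))


def classify_wrky(n_domains: int, zinc_finger: str) -> str:
    """Assign a WRKY group from domain count and zinc-finger type.

    zinc_finger is assumed to be 'C2H2' or 'C2HC' (the C2-H-C pattern).
    """
    if n_domains >= 2 and zinc_finger == "C2H2":
        return "group 1"
    if n_domains == 1 and zinc_finger == "C2H2":
        return "group 2"
    if n_domains == 1 and zinc_finger == "C2HC":
        return "group 3"
    return "unclassified"


if __name__ == "__main__":
    toy_seq = "MSAAWRKYGQKPLLLLWRKYGQKTT"  # hypothetical sequence, not a real protein
    n = count_wrky_motifs(toy_seq)
    print(n, classify_wrky(n, "C2H2"))
```

Subgroups 2a-2e cannot be assigned by such a rule, because they are defined by the phylogeny of the WRKY domains.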
Since the cloning of the first cDNA encoding a WRKY protein, SPF1 from sweet potato [5], a large number of WRKY proteins have been experimentally identified from several plant species [6-17], and have been shown to be involved in various physiological processes under normal growth conditions and under various stress conditions [18]. It has been well documented that WRKY proteins play a key role in plant defense against various biotic stresses including bacterial, fungal and viral pathogens [19-27]. They also play important regulatory roles in developmental processes, such as trichome initiation [28], embryo morphogenesis [29], senescence [30], and some signal transduction processes mediated by plant hormones such as gibberellic acid [31], abscisic acid [32,33] or salicylic acid [34]. There is also accumulating evidence that WRKY proteins are involved in responses to various abiotic stresses. In Arabidopsis, microarray analyses have revealed that some of the WRKY transcripts are strongly regulated in response to various abiotic stresses, such as salinity, drought and cold [35-37]. In rice, under abiotic stresses (cold, drought and salinity) or various phytohormone treatments, 54 WRKY genes showed significant differences in their transcript abundance [18]. In barley, a WRKY gene, Hv-WRKY38, is expressed in response to cold and drought stress [38], while in soybean at least nine WRKY genes are differentially expressed under abiotic stress [15].
Because of their extensive involvement in various physiological processes, it is likely that the WRKY family in angiosperms has expanded greatly during evolution. There are at least 72 WRKY family members in Arabidopsis [1] and at least 109 in rice [17]. Gene duplication events have played a critical role in the expansion of WRKY genes. For example, in rice, 80% of WRKY gene loci are located in duplicated regions [18]. Gene duplication events can lead to the generation of new WRKY genes. It is worth noting that the three groups of WRKY genes appeared at different times during evolution. Most members of groups 1 and 2 appear to have arisen before the divergence of the monocots and dicots, while group 3 WRKY genes seem to have had a relatively late origin [17]. In addition, a recent study showed that expression divergence had occurred among duplicated WRKY genes [18]. However, the reasons for expression divergence among duplicated WRKY genes remain unclear.
Cucumber is not only an economically important cultivated plant, but also a model system for studies on sex determination and plant vascular biology [39]. A draft of the Cucumis sativus var. sativus L. genome sequence was reported recently [40]. In this study, we searched this genome sequence to identify the WRKY genes of cucumber (CsWRKY). We then analyzed the expression of the identified CsWRKY genes under normal growth conditions and under various abiotic stress conditions. We compared the structure of the encoded proteins and the expression profiles of CsWRKY genes with those of their putative homologs among the Arabidopsis thaliana WRKY (AtWRKY) genes, and found notable differences between the group 3 WRKY genes of Arabidopsis and cucumber. The evolutionary analysis of group 3 WRKY genes indicated that, unlike those of cucumber, the recently duplicated WRKY genes of Arabidopsis have been under positive selection pressure. This may explain the expression divergence of their orthologs. These studies will be useful for understanding the role of WRKY genes in plant responses to abiotic stresses. In addition, these results provide information about the relationship between evolution and functional divergence of the WRKY family.

Identification of WRKY family in cucumber
A total of 57 genes in the cucumber genome were identified as possible members of the WRKY superfamily; they encoded 57 WRKY proteins. Annotation of these proteins revealed that eight of them have two complete WRKY domains each. A total of 52 WRKY genes could be mapped on the chromosomes and were renamed CsWRKY1 to CsWRKY52 based on their order on the chromosomes, from chromosome 1 to chromosome 7 (Figure 1). Five WRKY genes (Csa018657, Csa018622, Csa018069, Csa018094 and Csa022995) that could not be conclusively mapped to any chromosome were renamed CsWRKY53-CsWRKY57, respectively. In addition, the nucleotide sequence of Csa026380 was completely identical to that of Csa014665; therefore, the latter was eliminated from this study.
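The renaming scheme can be sketched as follows. This is a minimal illustration, assuming that "order on the chromosomes" means sorting by chromosome number and then by start coordinate, and that the unmapped genes arrive already ranked; the gene identifiers, coordinates and function name are hypothetical.

```python
def rename_by_position(mapped, unmapped_ranked, prefix="CsWRKY"):
    """Assign sequential names to genes.

    mapped: list of (gene_id, chromosome_number, start_coordinate) tuples.
    unmapped_ranked: list of gene_ids that could not be placed on a
        chromosome, already in the order in which they should be numbered.
    Returns a dict mapping gene_id -> new name.
    """
    # Mapped genes first, ordered by chromosome and then by position.
    ordered = sorted(mapped, key=lambda gene: (gene[1], gene[2]))
    names = {}
    for index, (gene_id, _chrom, _start) in enumerate(ordered, start=1):
        names[gene_id] = f"{prefix}{index}"
    # Unmapped genes continue the numbering after the mapped ones.
    offset = len(ordered)
    for rank, gene_id in enumerate(unmapped_ranked, start=1):
        names[gene_id] = f"{prefix}{offset + rank}"
    return names


if __name__ == "__main__":
    # Hypothetical identifiers and coordinates, for illustration only.
    mapped = [("geneB", 2, 500), ("geneA", 1, 9000), ("geneC", 1, 100)]
    unmapped = ["geneX", "geneY"]
    for gene_id, name in rename_by_position(mapped, unmapped).items():
        print(gene_id, name)
```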
Next, to establish whether these WRKY genes are expressed, we screened the cucumber EST database in NCBI. Twenty-seven putative WRKY genes matched at least one EST (Table 1). We cloned and sequenced full-length cDNAs of 32 of the annotated CsWRKY genes (Table 1). Consequently, annotation errors in 17 putative WRKY genes could be corrected (data not shown). The CDSs of all 32 CsWRKY genes have been submitted to GenBank, and their GenBank accession numbers are shown in Table 1.

Multiple sequence alignment, structure and phylogenetic analysis
The phylogenetic relationship of the CsWRKY proteins was examined by multiple sequence alignment of their WRKY domains, which span approximately 60 amino acids (Figure 2). A comparison with the WRKY domains of several different AtWRKY proteins resulted in a better separation of the different groups and subgroups. For each of the groups or subgroups, 1, 2a to 2e and 3, one representative was chosen randomly. These were: AtWRKY20, 40, [...]. As shown in Figure 2, the sequences in the WRKY domain were highly conserved. Sequence comparisons and phylogenetic and structural analyses showed that the WRKY domains could be classified into three large groups corresponding to groups 1, 2 and 3 in Arabidopsis as shown by Eulgem et al., 2000 (Figure 3). It is worth noting that group 1 contained 12 CsWRKY proteins, eight of which contained two WRKY domains. However, the other four (CsWRKY15, CsWRKY16, CsWRKY38 and CsWRKY39) contained only one WRKY domain but clustered with CTWD (C-terminal WRKY domains) and NTWD (N-terminal WRKY domains), respectively. Our study further showed that CsWRKY15 and CsWRKY16 were actually two domains of one WRKY protein, while CsWRKY38 and CsWRKY39 were two independent WRKY proteins. Domain acquisition and domain loss events appear to have shaped the WRKY family [41,42]. Thus, CsWRKY38 and CsWRKY39 may have arisen from a two-domain WRKY protein that lost one of its WRKY domains during evolution.
Figure 1 Mapping of the WRKY gene family on Cucumis sativus L. chromosomes. The size of a chromosome is indicated by its relative length. To simplify the presentation, we renamed the putative WRKY genes from CsWRKY1 to CsWRKY52 based on their order on the chromosomes. Five putative WRKY genes could not be localized on a specific chromosome, so we renamed them from CsWRKY53 to CsWRKY57 according to their raw scores in a search of cucumber WRKY proteins with the Hmmsearch program.
The structure and phylogenetic tree of the CsWRKY domains clearly indicated that group 2 proteins can be divided into five distinct subgroups (2a-e). Compared with the 14 group 3 proteins in Arabidopsis, there are only 6 CsWRKY proteins in group 3. Whereas genome duplication events have resulted in the expansion of the WRKY genes in Arabidopsis and rice [17], it appears that these events have not occurred in the cucumber WRKY family. Although Huang et al. [40] reported that the cucumber genome shows no evidence of recent whole-genome duplication or tandem duplication, we used the method of Schauser et al. [43] to search for small duplication blocks in the CsWRKY family, but none were found. In addition, a rooted phylogenetic tree of WRKY domains was constructed to identify putative orthologs in Arabidopsis and cucumber (additional file 1). All orthologs are listed in additional file 2.
Analysis of the structure of CsWRKY genes showed that all WRKY genes except CsWRKY40 had at least one intron. Two major types of intron splicing were found in the conserved WRKY domains of CsWRKY genes (Figure 2), similar to those in the WRKY domains of AtWRKY genes. However, the conserved introns were 2.8 times longer in cucumber (~686 bp) than in Arabidopsis (~241 bp). Coincidentally, this ratio was very similar to the size ratio (2.9 times) between the genomes of cucumber (367 Mb) and Arabidopsis (125 Mb). The conserved motifs of WRKY family proteins in cucumber and Arabidopsis were investigated using MEME version 4.4 as described in the Methods (additional file 3), and a schematic overview of the identified motifs is given in additional file 4. As displayed schematically in Figure 4, except for the members of group 2c and group 2e, one or more conserved motifs outside of the WRKY domain motif can be detected in each WRKY protein. The CsWRKY and AtWRKY proteins from groups 1 and 2 always share the same conserved motifs. In contrast, some members of group 3 AtWRKY (AtWRKY63, AtWRKY64, AtWRKY66 and AtWRKY67) show Arabidopsis-specific conserved motifs (motifs 6, 7 and 8; additional file 3), but other members of group 3 share the same conserved motifs with other CsWRKY proteins.
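The two ratios quoted above can be checked with a few lines of arithmetic. This sketch uses the approximate intron lengths from the text and takes the cucumber genome size as 367 Mb, the value given in the Discussion; the variable names are illustrative only.

```python
# Approximate conserved-intron lengths (bp) quoted in the text.
intron_cucumber_bp = 686
intron_arabidopsis_bp = 241

# Genome sizes (Mb); 367 Mb for cucumber is the value stated in the Discussion.
genome_cucumber_mb = 367
genome_arabidopsis_mb = 125

intron_ratio = intron_cucumber_bp / intron_arabidopsis_bp
genome_ratio = genome_cucumber_mb / genome_arabidopsis_mb

print(f"intron length ratio: {intron_ratio:.1f}")
print(f"genome size ratio:   {genome_ratio:.1f}")
```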

Expression profile of CsWRKY genes under normal growth conditions and under various abiotic stress conditions
We analyzed the expression of all CsWRKY genes under normal growth conditions in seven different tissues: cotyledons, leaves, roots, stems, female flowers, male flowers and fruits. Not all of the predicted genes were expressed in plants grown under normal growth conditions. Among the 55 predicted genes, 48 (87%) were expressed in at least one of the seven tissues (Figure 5). The other seven genes did not show any detectable expression in these tissues as tested by RT-PCR, but they may be expressed in other tissues, e.g., seeds. Some of the CsWRKY genes may also be pseudogenes. The following ten genes were expressed in all tested tissues at relatively high levels: CsWRKY2, CsWRKY7, CsWRKY14, CsWRKY17, CsWRKY25, CsWRKY37, CsWRKY41, CsWRKY44, CsWRKY49 and CsWRKY57. Five WRKY genes (CsWRKY5, CsWRKY13, CsWRKY23, CsWRKY28 and CsWRKY55) were expressed at relatively low levels in all the tested tissues. We used RT-PCR analyses to examine the expression of CsWRKY genes in response to three different abiotic stresses: cold, drought and salinity. Of the 48 expressed CsWRKY genes, 23 showed differential expression in response to at least one stress, whereas the other 25 did not (Table 2). It should be noted that none of the stress-inducible CsWRKY genes belongs to group 3. We conducted real-time PCR analyses to confirm and quantify the expression levels of the 23 stress-inducible WRKY genes in response to abiotic stresses. As shown in Figure 6, RT-PCR and real-time PCR generally gave the same results for the expression profiles and abundance of transcripts. However, in rare instances, the difference in expression detected by real-time PCR was more pronounced than that detected by RT-PCR (Figure 5E). As shown in Table 2, the results of real-time PCR showed that most of the stress-responsive genes were upregulated in response to abiotic stress (Figure 6A, B, C), and only three genes were downregulated (Figure 6D).
As determined by real-time PCR analysis, there were no differences in the expression of the six group 3 CsWRKY genes in response to abiotic stress (Figure 6F).

Comparison of abiotic stress-inducible orthologs between cucumber and Arabidopsis
We compared the expression of CsWRKY genes with that of their possible orthologs in Arabidopsis under abiotic treatment. As shown in additional file 5, except for group 3 WRKY genes, Arabidopsis WRKY genes whose orthologous CsWRKY genes were not induced by abiotic treatments were also not stress-inducible. In addition, most of the AtWRKY genes orthologous to stress-inducible CsWRKY genes also responded to at least one type of stress treatment. These findings imply a possible correlation between the expression profiles of these orthologs in Arabidopsis and cucumber in response to abiotic stresses. Among the CsWRKY genes whose expression changed in response to abiotic stress, there were 13 for which stress-inducible orthologs existed in Arabidopsis (additional file 5). To investigate whether the expression of these orthologs was correlated between the two species, we compared the expression of these 13 pairs of orthologs under various stresses as described in the Methods section. This analysis generated a total of 22 sets of data (one pair of orthologs may be induced by more than one abiotic stress). As shown in Table 3, the correlation coefficients of 12 sets of data, more than half of the 22 sets, were greater than 0.5, indicating a positive correlation between the orthologous pairs under abiotic stresses (Figure 7A-D). The expression profiles of only two sets of data were negatively correlated (Figure 7G-H). Finally, the average correlation coefficient of the 22 datasets for all the putative orthologous WRKY genes was 0.40 and differed significantly (p < 0.01) from the average expression correlation of a control dataset composed of randomly chosen gene pairs (0.04) (Table 3). In contrast, when the correlation coefficients of group 3 CsWRKY and AtWRKY orthologs were calculated, there was no clear positive or negative correlation (Figure 7E-F).
Our results indicated that the expression profiles of stress-inducible CsWRKY genes are correlated with those of their putative AtWRKY orthologs, except for the group 3 WRKY genes. This finding suggests that the expression of group 3 WRKY orthologs differs between cucumber and Arabidopsis. All expression data used to calculate correlations are shown in additional file 6.
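The correlation analysis can be illustrated with a minimal sketch of the Pearson coefficient for one ortholog pair, together with a random-pair control of the kind summarized in Table 3. The seven time points and the 100 random pairs follow the description in the paper; the expression values, gene labels, function names and the fixed random seed are hypothetical, and the sampling scheme (independent draws with replacement) is an assumption rather than the authors' documented procedure.

```python
import math
import random

# Treatment time points (h) at which expression was compared.
TIME_POINTS_H = [0, 0.5, 1, 3, 6, 12, 24]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)


def mean_random_pair_correlation(cs_profiles, at_profiles, n_pairs=100, seed=0):
    """Average correlation over randomly drawn cucumber/Arabidopsis gene pairs."""
    rng = random.Random(seed)
    cs_ids = sorted(cs_profiles)
    at_ids = sorted(at_profiles)
    values = []
    for _ in range(n_pairs):
        cs_gene = rng.choice(cs_ids)
        at_gene = rng.choice(at_ids)
        values.append(pearson(cs_profiles[cs_gene], at_profiles[at_gene]))
    return sum(values) / len(values)


if __name__ == "__main__":
    # Hypothetical relative expression values, one per time point.
    cs = {"csGeneA": [1.0, 3.2, 2.5, 1.8, 1.2, 1.1, 0.9],
          "csGeneB": [1.0, 1.1, 1.4, 2.9, 4.0, 3.1, 2.2]}
    at = {"atGeneA": [1.0, 2.8, 2.9, 1.5, 1.3, 1.0, 1.0],
          "atGeneB": [1.0, 0.9, 0.7, 0.5, 0.6, 0.8, 1.1]}
    print("ortholog pair r:", round(pearson(cs["csGeneA"], at["atGeneA"]), 2))
    print("random-pair mean r:", round(mean_random_pair_correlation(cs, at), 2))
```

A mean over real ortholog pairs that clearly exceeds the random-pair mean is what supports the claim of correlated expression; the significance test used for that comparison is described in the Methods and is not reproduced here.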

Evolutionary analysis of group 3 WRKY genes in Arabidopsis and cucumber
The group 3 WRKY genes seem to have greatly expanded in angiosperms after the divergence of the monocots and dicots (160 Mya) [44]. Here, we further investigated the duplication and diversification of group 3 WRKY genes after divergence of the eurosids I group (which includes cucumber, soybean, and poplar) and the eurosids II group (which includes Arabidopsis) (110 Mya). A phylogenetic tree of WRKY proteins encoded by group 3 WRKY genes of Arabidopsis (14), cucumber (6), poplar (10), and soybean (7) was constructed using the most primitive WRKY domain, that of Giardia lamblia, as an outgroup. This analysis showed that many members of the group 3 AtWRKY proteins clustered together and displayed a close phylogenetic relationship (Figure 8), indicating that they arose after the divergence of the eurosids I and II. Two types of gene duplication events, tandem duplication and segmental duplication, were the main factors in the expansion of group 3 AtWRKY genes. The results of this phylogenetic analysis indicated that no gene duplication events have occurred in CsWRKY gene evolution, because no cucumber paralogs could be detected. Hence, the different evolutionary patterns of group 3 WRKY genes in cucumber and Arabidopsis arose after their divergence.
To determine whether selection pressure had affected group 3 WRKY genes, we estimated the ω (dN/dS) values for all branches of group 3 WRKY genes in Arabidopsis and cucumber (Figure 9 and Table 4). In Arabidopsis, the ML estimates of dN/dS for all nodes under model M0 were < 1, with a mean value of 0.276 (Table 4), indicating that purifying selection was the predominant force acting on the evolution of the group 3 AtWRKY genes. However, the log-likelihood differences between model M3 and model M0 were statistically significant for all nodes tested, suggesting that selective pressure varied among branches and that some genes might have been under positive selection. We further used models M7 and M8 of PAML to address whether positive selection has played a role in the evolution of group 3 AtWRKY genes. Of the eight nodes analyzed, log-likelihood values were significantly higher under the M8 model than under the M7 model for five nodes (nodes 1, 2, 3, 4 and 5), which indicates that positive selection has contributed to the evolution of group 3 AtWRKY genes. Interestingly, the terminal nodes with clusters of duplicated AtWRKY genes were all under positive selection, suggesting a correlation between gene duplication and positive selection. Furthermore, we identified the positively selected sites under model M8 using the Bayesian method. Several positively selected sites were detected in the above five nodes, but only one positively selected site could be detected within the WRKY domains. Thus, it appears that because of the high degree of conservation in the WRKY domains, positive selection acted mostly on the regions outside of the WRKY domains. In cucumber, although the log-likelihood differences between model M3 and model M0 suggest that selective pressure varied among branches, there was no detectable positive selection in any of the nodes.
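The model comparisons above are likelihood ratio tests, which can be sketched as follows. The sketch assumes the degrees of freedom commonly used for these nested site models (2 for M7 versus M8, and 4 for M0 versus M3 with three site classes); these values and the log-likelihoods in the example are assumptions for illustration and are not taken from the paper. For even degrees of freedom the chi-square tail probability has a closed form, so no external library is needed.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable with even df.

    Uses the closed form exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term = 1.0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return (2 * delta lnL, p-value) for a pair of nested models."""
    statistic = 2.0 * (lnl_alt - lnl_null)
    return statistic, chi2_sf_even_df(statistic, df)


if __name__ == "__main__":
    # Hypothetical log-likelihoods for one node (M7 as null, M8 as alternative).
    stat, p_value = likelihood_ratio_test(lnl_null=-1000.0, lnl_alt=-995.0, df=2)
    print(f"2*delta lnL = {stat:.2f}, p = {p_value:.4f}")
```

A significant M7 versus M8 test only indicates that a class of sites with ω > 1 improves the fit; identifying the individual sites requires the Bayesian step described in the text.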
Assuming that there were no duplication events in CsWRKY genes and that positive selection is associated with duplication of WRKY genes as described here, the extensive positive selection events probably followed the group 3 WRKY gene duplication events. This positive selection might be the main evolutionary force for group 3 AtWRKY genes. Owing to the absence of duplicated genes and positive selection in cucumber, the functions of group 3 CsWRKY genes might be more conserved than those of AtWRKY genes.

Discussion
Were the CsWRKY genes underrepresented in this study?
The WRKY gene family has 72 members in Arabidopsis [1] and 109 members in rice [17]. In this study, we identified a total of 55 CsWRKY genes. Compared with Arabidopsis (genome size 125 Mb) and rice (genome size 480 Mb), the WRKY family in cucumber (genome size 367 Mb) is small. We further compared the number of WRKY genes in the different subgroups among Arabidopsis, rice, grape and cucumber (Table 5). As shown in Table 5, the key difference is that the number of group 3 CsWRKY genes (6) is much lower than in Arabidopsis (14) and rice (36). This raises the question of whether CsWRKY genes, especially group 3 CsWRKY genes, are underrepresented in our study. Complete and accurate annotation of genes is an essential starting point for further evolutionary and functional studies of a gene family. We identified a total of 55 CsWRKY genes among the 26682 annotated genes in the cucumber genome. In addition, a total of 357882 cucumber EST sequences downloaded from the Cucumber Genome DataBase and NCBI were used to test whether these EST sequences encode any WRKY proteins that were missed in our annotation of CsWRKY proteins. The amino acid sequences of the open reading frames (ORFs) of the ESTs were subjected to an HMM search. The results were screened manually for false positives at E values above 10 100 . Even with this weak criterion, we failed to find any new WRKY proteins in the cucumber genome, which indicates that the annotation of cucumber WRKY genes is complete. We further used experimental methods to test the accuracy of the annotation of CsWRKY genes. Based on the annotated WRKY gene sequences, we detected the expression of 48 CsWRKY genes (87%), indicating that the accuracy of the annotation of CsWRKY genes is high. Moreover, we cloned and sequenced full-length cDNAs of 32 of the annotated CsWRKY genes (Table 1), and some annotation errors were corrected.
For example, we found that the predicted CsWRKY15 and CsWRKY16 were actually two domains of one WRKY protein. Through this process, the integrity and accuracy of the annotated CsWRKY genes were improved and were high enough for use in our further study. Therefore, we believe that CsWRKY genes are not underrepresented in our study.
Table 3 notes: Average correlation of random genes**: 0.04. *Available expression data on AtWRKY genes from microarray analysis and expression data on CsWRKY genes generated by real-time PCR analysis were used to calculate the Pearson correlation coefficient for the expression of orthologous WRKY genes under various abiotic stresses (after 0, 0.5, 1, 3, 6, 12, and 24 h of treatment) (as shown in Figure 7), as described in the Methods. **A randomly chosen abiotic stress-induced cucumber WRKY gene and a randomly chosen abiotic stress-induced AtWRKY gene formed a random gene pair. This process was repeated 100 times and produced 100 random WRKY gene pairs. The expression correlation of each of the 100 random WRKY gene pairs was calculated as described in the Methods.

The rapid expansion of group 3 WRKY genes is associated with recent duplication events
Many angiosperms underwent whole-genome duplication events (γ, β, α). The γ event appears to pre-date the monocot-dicot divergence. The β event pre-dated the divergence of Arabidopsis from the other dicots, but post-dated its divergence from the monocots about 170-235 Myr ago. The α duplication event (the recent duplication event) pre-dated the divergence of Arabidopsis from Brassica about 14.5-20.4 million years (Myr) ago [45]. Recent gene duplication events are the most important in the rapid expansion and evolution of gene families [46]. Therefore, in this manuscript, we analyzed only the influence of recent duplication events on CsWRKY genes. Both the Arabidopsis and rice genomes underwent recent duplication events, which led to the large-scale expansion of gene families in their genomes [46,47]. Zhang et al. reported that group 3 WRKY domains appear to have been duplicated independently after the divergence of monocots and dicots (160 Mya) [44]. In this study, we further examined the duplication of group 3 WRKY genes after the divergence of the eurosids I group and the eurosids II group (110 Mya). As shown in Figure 8, the close paralogous WRKY genes of Arabidopsis, poplar and soybean each clustered together, indicating that the expansion of the group 3 WRKY gene family may have occurred after the divergence of the eurosids I and eurosids II (110 Mya), and should be related to the most recent genome duplication events (24-40 Mya). Moreover, our results indicated that one important factor in the expansion of group 3 AtWRKY genes was the occurrence of tandem duplication events. Four tandemly duplicated genes clustered together in the phylogenetic tree, indicating that the tandem duplication occurred after the divergence of the eurosids I and eurosids II and was also related to recent duplication events. Interestingly, tandem duplication was an important recent gene duplication pattern in the Arabidopsis genome [46], but in the AtWRKY gene family only four AtWRKY genes came from tandem duplication blocks, and all of them belonged to group 3. From these observations, we conclude that the group 3 AtWRKY genes expanded quickly in the Arabidopsis genome through two duplication patterns, recent segmental duplication and recent tandem duplication, which indicates that group 3 WRKY genes may play important roles in the adaptability of angiosperms.
Figure 7 Pairwise comparisons of the expression profiles of putative orthologous cucumber and Arabidopsis WRKY genes under abiotic stresses. The relative expression of CsWRKY genes was obtained by real-time RT-PCR (indicated by triangles). Data are the means of three replicates with standard errors represented by bars. The CsWRKY expression data were compared with the mean-normalized expression data for their putative orthologous AtWRKY genes from a publicly available Arabidopsis microarray data set (indicated by circles) according to the description in Methods. The relative amount of mRNA (y-axis) was the ratio of treated to untreated sample. The treatment time (h) under the particular abiotic stress is presented on the x-axis. R indicates the correlation coefficient for expression between orthologs under the corresponding abiotic stresses. A distinct positive correlation was detected in most orthologs (A-D), but no obvious correlation was detected in group 3 orthologs (E-F). A negative correlation was detected in a small number of orthologs (G-H).
As far as cucumber is concerned, although Huang et al. reported that the cucumber genome lacks recent whole-genome duplication events and tandem duplication [40], we still used the method of Schauser [43] to test whether recent small duplication blocks occur in the CsWRKY family. We found no CsWRKY gene loci on any recent duplication blocks (additional file 2). In addition, Figure 1 shows that there are no tandemly arrayed WRKY genes at the same chromosomal location, which indicates the absence of recent tandem duplication events in CsWRKY genes. Therefore, the number of group 3 CsWRKY proteins is small compared with Arabidopsis and rice, which can be attributed to the absence of recent duplication events in the cucumber genome. To test this hypothesis, we searched for WRKY proteins (VvWRKY) in the grape genome. The grape genome, like that of cucumber, has not undergone recent duplication events [48]. As shown in Table 5, only five group 3 VvWRKY proteins (GSVIVT01028718001, GSVIVT01019511001, GSVIVT01027069001, GSVIVT01032662001 and GSVIVT01032661001) can be detected in the grape genome. Therefore, on the basis of the above discussion, we believe that the small size of group 3 CsWRKY compared with Arabidopsis and rice can be attributed to the absence of recent duplication events in the cucumber genome rather than to the underrepresentation of group 3 CsWRKY genes in our study.

CsWRKY proteins play important roles in various biological processes
The previously reported WRKY gene of cucumber (SE71, ID: AAC37515.1) shares 93% similarity with the CsWRKY37 reported here. The expression of SE71 increases in cotyledons as they expand and become photosynthetic, suggesting an involvement of SE71 in the development of cotyledons and in cucumber photosynthesis [7]. Our RT-PCR results showed that CsWRKY37 was expressed in all seven cucumber tissues at relatively high levels, which indicates that CsWRKY37 could play a role not only in the development of cotyledons and photosynthesis but also in processes such as flower formation and fruit development. Besides CsWRKY37, some other CsWRKY genes, such as CsWRKY25 and CsWRKY49, also showed relatively high expression levels in all seven organs. WRKY genes that are highly expressed in plant organs often play key roles in plant development [18]. The role of WRKY genes in plant development lies in the transcriptional regulation of target genes that are involved in particular physiological pathways [3]. We therefore speculate that the highly expressed CsWRKY genes reported here may play a regulatory role in cucumber development. However, more research is needed to determine the functions of the CsWRKY genes.
Evidence is accumulating that WRKY proteins are involved in responses to various abiotic stresses. At least 54 OsWRKY genes of rice and 26 GmWRKY genes of soybean were found to be differentially expressed under abiotic stresses [18]. In this study, we showed that 23 CsWRKY genes exhibited differential expression in response to at least one abiotic stress, indicating that CsWRKY genes may play an important role in the response of cucumber to abiotic stresses. In fact, previous studies indicated that some WRKY proteins are stable and resistant to environmental stresses. Huang et al. reported that a WRKY gene of bittersweet nightshade (STHP-64) encoded an antifreeze protein, which contains a unique 13-mer repeat in the C-terminus, known to be a common feature of animal antifreeze proteins [9]. However, an increasing number of studies indicate that WRKY proteins are transcription factors that regulate the tolerance of plants to abiotic stresses [38]. As shown in Figure 6, some of the CsWRKY genes responded to stresses at an early stage. For example, CsWRKY18 peaked at 0.5 h after drought treatment. These results indicate that some CsWRKY genes may act as transcription factors that regulate the tolerance of cucumber to stresses. To understand the biological functions of WRKY transcription factors, it is necessary to identify their target genes and regulatory networks. Expression of the soybean GmWRKY54 in transgenic Arabidopsis showed that GmWRKY54 can regulate the expression of DREB2A, which contains a W-box motif in its promoter region and is known to act as a transcription factor regulating the expression of many drought-inducible genes [15]. Other recent studies have revealed that two co-regulated networks exist in rice that regulate the response to various abiotic stresses [49]. These results indicate that the regulatory role of WRKY proteins under abiotic stresses is complex and that more work is needed to understand the regulatory mechanisms.
Note: * the WRKY proteins of grape (Vitis vinifera)

The functional conservation and divergence of orthologous genes between Arabidopsis and cucumber
In comparative genomics, the clustering of orthologous genes highlights the divergence and conservation of gene families among multiple genomes. Two strategies are commonly used to identify orthologs or paralogs: phylogeny-based methods and BLAST-based methods [50]. Phylogeny-based methods recover a broad set of orthologous pairs but may produce false positives [51]; strict criteria must therefore be adopted when they are used. The BLAST-based method (bidirectional best hit, BBH) shows good overall performance but is restricted to 1:1 orthologs, which may cause in-paralogs to be omitted [51]. In this study, a rooted phylogenetic tree based on the WRKY domains of rice, cucumber and Arabidopsis was used to assign possible orthologs of cucumber and Arabidopsis. In addition, the standard BBH approach was used as a reference. Relatively strict criteria were applied. Only nodes of the phylogenetic tree with bootstrap support values (1000 re-samplings) exceeding 50% were used to identify possible orthologous pairs. For example, AtWRKY65 and CsWRKY6 clustered together in the phylogenetic tree, but the bootstrap support of their node was no more than 50%. Therefore, AtWRKY65 and CsWRKY6 were excluded from the orthologous pairs, as were CsWRKY11 and AtWRKY18/60. In addition, members of group 1 were considered possible orthologous pairs only if the same phylogenetic relationship was detected for both their N-terminal and C-terminal domains in the phylogenetic tree. For example, CsWRKY8 and AtWRKY25/26 were excluded from the orthologous pairs because their N-domains and C-domains clustered differently in the phylogenetic tree. In total, we found 38 orthologous pairs between cucumber and Arabidopsis (additional file 2).
We further analyzed the correlation of expression between orthologous pairs under abiotic stresses. Our results show that the expression profiles of stress-inducible orthologous WRKY genes are correlated between cucumber and Arabidopsis. Mangelsen et al. reported that, in homologous organs, the average correlation coefficient of orthologous WRKY genes between monocots and dicots can reach 0.24 [52]. Because research on the role played by cucumber genes in abiotic stress tolerance is quite limited, our study provides a new starting point for investigating the function of cucumber genes by comparing orthologous genes between cucumber and Arabidopsis. Furthermore, in our study, orthologous WRKY genes with different evolutionary patterns displayed a low correlation in their expression patterns. Almost half of the CsWRKY genes in our study responded to at least one abiotic stress, but none of them belongs to group 3. In contrast, microarray expression data for AtWRKY genes revealed that all the genes orthologous to group 3 CsWRKY genes respond to abiotic stresses in Arabidopsis, and, interestingly, all of them are located in a recently segmentally duplicated region. Segmental duplication occurs frequently in plants because most plants are diploidized polyploids and retain numerous duplicated chromosomal blocks in their genomes [53]. As discussed earlier in this paper, after the divergence of eurosids I and eurosids II, the group 3 AtWRKY genes experienced segmental duplication events. The long-term evolutionary fate of duplicated genes is determined by their functions. Gene duplication may be followed by four types of functional differentiation: pseudogenization, conservation of gene function, subfunctionalization and neofunctionalization [54]. Many duplicated genes may be lost from the genome after duplication events, and neofunctionalization and subfunctionalization are the major factors in the retention of new genes.
In addition, positive selection may play important roles in the neofunctionalization and subfunctionalization of duplicated genes. In the case of neofunctionalization, positive selection accelerates the fixation of advantageous mutations that enhance the activity of the novel function. In the case of subfunctionalization, each daughter gene inherits one of the functions of the ancestral gene, and further substitutions under positive selection can refine these functions [47]. In Arabidopsis, the number of group 3 WRKY genes increased significantly owing to duplication events after the divergence of eurosids I and eurosids II, and our results suggest that all duplicated group 3 AtWRKY genes experienced positive selection after their duplication. The retention of new members of group 3 AtWRKY could be attributed to their neofunctionalization. In rice, high expression divergence could be one of the mechanisms for the retention of duplicated WRKY genes [18]. Owing to the lack of gene duplication events in the CsWRKY family, the functions of group 3 CsWRKY genes are probably more conserved than those of their AtWRKY counterparts. The functions of the group 3 CsWRKY genes likely resemble those of a common ancestor that existed before the divergence of eurosids I and II. Indeed, the common ancestor may not have been responsive to abiotic stresses, and the stress-responsive ability of the group 3 AtWRKY genes could be due to neofunctionalization following gene duplication event(s).

Conclusions
In this study, we identified a total of 55 cucumber WRKY genes and analyzed the expression profiles of 48 CsWRKY genes under normal growth conditions and in response to various abiotic stresses. The new WRKY sequences and expression information reported here will be useful for further investigating the function of WRKY genes under various stress conditions. Although the genome sequence of cucumber has been reported, functional studies on cucumber genes still lag behind. Our results show that correlated expression profiles exist between putative WRKY orthologs of cucumber and Arabidopsis. Hence, comparative genomics approaches could be used to investigate gene function. In addition, compared with group 1 and 2 WRKY genes, the group 3 WRKY genes seem to have arisen more recently in angiosperms but have expanded rapidly. Our results also indicate that positive selection could have led to the functional divergence of duplicated genes during the expansion of group 3 WRKY genes. Based on all the results presented here, we speculate that the functional divergence of WRKY proteins has played a critical role in the responses of plants to various stresses.

Sequence database searches
Arabidopsis WRKY protein sequences were obtained from TAIR [55]. The rice WRKY protein sequences were obtained from the Rice Genome Annotation Project [56]. The WRKY proteins of poplar and soybean were obtained from the PFAM database [57]. The GenBank accession numbers of the WRKY protein sequences are provided in additional file 7. The WRKY proteins of grape were obtained from http://www.genoscope.cns.fr/externe/Download/Projets/Projet_ML/data/12X/annotation/Vitis_vinifera_peptide.fa.gz.
The annotated (predicted) cucumber genes and proteins were obtained from the Cucumber Genome Sequencing Project, in which we participated. These annotation data can now be downloaded from the Cucumber Genome DataBase [58]. We searched for WRKY proteins among a total of 26682 predicted cucumber proteins. We used 72 Arabidopsis WRKY proteins as query sequences in BLASTP searches against the predicted cucumber proteins. Sequences were selected as candidate proteins if their E-value was ≤ 1e-10. Following the HMMER User's Guide, the hmmsearch program was then used to predict the WRKY domains (PF03106.7) of all these candidate proteins, with the E-value cutoff set to 1e-10. The new WRKY-like sequences confirmed by hmmsearch in the cucumber genome were in turn used iteratively to search the predicted cucumber proteins until no new sequences were found. The EST sequences of cucumber were downloaded from NCBI and the Cucumber Genome DataBase [58].
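The candidate selection and iterative search described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study. It assumes BLASTP results in the standard tabular format (-outfmt 6, where column 2 is the subject ID and column 11 is the E-value) and an E-value threshold of 1e-10. The function names, the toy protein identifiers and the `search` callable (a stand-in for one round of BLASTP followed by hmmsearch confirmation) are hypothetical.

```python
def filter_candidates(blast_rows, max_evalue=1e-10):
    """Return subject IDs with at least one hit whose E-value is <= max_evalue.

    Each row is one line of BLAST tabular output (-outfmt 6):
    column 2 is the subject ID and column 11 is the E-value.
    The 1e-10 default is an assumed threshold.
    """
    candidates = set()
    for row in blast_rows:
        fields = row.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        if float(fields[10]) <= max_evalue:
            candidates.add(fields[1])
    return candidates


def search_until_stable(seed_ids, search):
    """Repeat a search with newly found sequences until nothing new appears.

    `search` is a hypothetical callable that takes a set of query IDs and
    returns the set of confirmed hit IDs for one search round.
    """
    found = set()
    queries = set(seed_ids)
    while queries:
        new_hits = search(queries) - found
        found |= new_hits
        queries = new_hits  # only new sequences are used as the next queries
    return found


# Toy data with made-up identifiers and scores.
toy_rows = [
    "At_query1\tCs_protein_A\t71.2\t60\t17\t0\t1\t60\t101\t160\t3e-25\t98.6",
    "At_query1\tCs_protein_B\t35.0\t40\t26\t0\t5\t44\t12\t51\t0.002\t31.2",
    "At_query2\tCs_protein_C\t64.0\t58\t21\t0\t2\t59\t210\t267\t1e-10\t55.1",
]
toy_hits = {
    "At_query1": {"Cs_protein_A", "Cs_protein_C"},
    "Cs_protein_A": {"Cs_protein_A", "Cs_protein_D"},
    "Cs_protein_C": {"Cs_protein_C"},
    "Cs_protein_D": {"Cs_protein_D"},
}


def toy_search(queries):
    """Look up hits for each query in the toy table."""
    return set().union(*(toy_hits.get(q, set()) for q in queries))


print(sorted(filter_candidates(toy_rows)))
print(sorted(search_until_stable({"At_query1"}, toy_search)))
```

The loop stops once a round returns no sequence that has not already been recorded, which mirrors the "until no new sequences were found" criterion.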

Multiple sequence alignment, gene structure construction and phylogenetic analysis
The 60-amino-acid WRKY core domains of all CsWRKY proteins and of selected AtWRKY proteins (AtWRKY20 (At4g26640), 40 (At1g80840), 72 (At5g15130), 50 (At5g26170), 74 (At5g28650), 65 (At1g29280) and 54 (At2g40750)) were used to create multiple protein sequence alignments with ClustalW [59]. Default settings were applied for the alignment in Figure 2. The gene structures were obtained from the cucumber gene annotation GFF3 file downloaded from the Cucumber Genome DataBase. The neighbor-joining method was used to construct the phylogenetic tree based on the amino acid sequences of the WRKY domains. Two software packages, MEGA 4.0 and PHYLIP 3.2, were used [60,61]. The MEGA 4.0 analysis was carried out as described by Zhang et al. [62], and the PHYLIP 3.2 analysis as described by Zhou et al. [15]. Motif detection was performed with MEME 4.0 [63]. A rooted phylogenetic tree based on the WRKY domains of rice, cucumber and Arabidopsis was used to assign possible orthologs of cucumber and Arabidopsis. In addition, the standard bidirectional best hit (BBH) approach was used as a reference for assigning possible orthologs [51,64].
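The bidirectional best hit idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure of references [51,64]: each search direction is assumed to be summarized as a mapping from (query, subject) to a bit score, and the gene names and scores below are invented.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject.

    `scores` maps (query, subject) tuples to a similarity score
    (assumed here to be a BLAST bit score, higher is better).
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs in which a and b are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Invented example: CsGeneC's best hit is AtGeneY, but AtGeneY's best hit
# is CsGeneB, so CsGeneC has no 1:1 partner.
cs_vs_at = {
    ("CsGeneA", "AtGeneX"): 210.0,
    ("CsGeneA", "AtGeneY"): 95.0,
    ("CsGeneB", "AtGeneY"): 180.0,
    ("CsGeneC", "AtGeneY"): 120.0,
}
at_vs_cs = {
    ("AtGeneX", "CsGeneA"): 205.0,
    ("AtGeneY", "CsGeneB"): 178.0,
    ("AtGeneY", "CsGeneC"): 119.0,
    ("AtGeneZ", "CsGeneC"): 60.0,
}

print(bidirectional_best_hits(cs_vs_at, at_vs_cs))
```

The unpaired CsGeneC in the toy data shows why a strictly 1:1 criterion can miss in-paralogs, which is the reason BBH served only as a reference alongside the phylogenetic tree.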

Microarray based expression analysis and correlation calculation
For the expression analysis of AtWRKY genes, publicly available microarray data from the AtGenExpress global stress expression data set [37] were used. The microarray data for cold stress (ME00325), drought stress (ME00338) and salt stress (ME00328) were downloaded from the Weigel World database [65]. The mean-normalized values of the expression data were used in further analysis. The relative amount of mRNA was calculated by dividing the expression value under stress treatment by that of the control (0 h treatment).
Available expression data for AtWRKY genes from microarray analysis and for CsWRKY genes generated by the real-time RT-PCR analysis described here were used to calculate the Pearson correlation of the expression of orthologous WRKY genes. All expression data (relative amount of mRNA) comprise seven time points (0, 0.5, 1, 3, 6, 12, and 24 h) under the corresponding abiotic stresses. For each orthologous WRKY gene pair, the correlation of the expression data under the corresponding abiotic stress was calculated. The significance of the correlation between orthologous pairs was tested as follows. A randomly chosen abiotic stress-induced cucumber WRKY gene and a randomly chosen abiotic stress-induced AtWRKY gene constituted a random WRKY gene pair. This process was repeated 100 times to produce 100 random WRKY gene pairs. The expression correlation of each of the 100 random WRKY gene pairs was calculated as described above. Lastly, the average correlations of the orthologous WRKY gene pairs and of the randomly selected gene pairs were calculated. Student's t-test was used to assess the statistical significance of the difference in average correlation between the two datasets. The random WRKY gene pairs were generated using Perl scripts. Pearson correlations and t-test P-values were calculated using R. All programs were run on a computer running Ubuntu Linux.
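The analysis itself was carried out with Perl scripts and R. As an illustration only, the same steps can be sketched in Python: scaling a time series to its 0 h control, computing the Pearson correlation, drawing random gene pairs, and computing a pooled-variance (Student) t statistic for the difference in mean correlation. All function names, gene names and expression values below are hypothetical, and the P-value lookup is left to a statistics package.

```python
import math
import random


def relative_to_control(series):
    """Divide every time point by the first (0 h control) value."""
    control = series[0]
    return [value / control for value in series]


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def random_pair_correlations(cs_profiles, at_profiles, n_pairs=100, seed=0):
    """Correlations of randomly drawn (cucumber, Arabidopsis) gene pairs."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    cs_genes, at_genes = sorted(cs_profiles), sorted(at_profiles)
    return [
        pearson(cs_profiles[rng.choice(cs_genes)], at_profiles[rng.choice(at_genes)])
        for _ in range(n_pairs)
    ]


def pooled_t_statistic(a, b):
    """Two-sample Student t statistic assuming equal variances."""
    na, nb = len(a), len(b)
    mean_a, mean_b = sum(a) / na, sum(b) / nb
    var_a = sum((v - mean_a) ** 2 for v in a) / (na - 1)
    var_b = sum((v - mean_b) ** 2 for v in b) / (nb - 1)
    pooled = ((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2)
    return (mean_a - mean_b) / math.sqrt(pooled * (1 / na + 1 / nb))


# Invented relative expression values at 0, 0.5, 1, 3, 6, 12 and 24 h.
toy_cs = {
    "CsGeneA": [1.0, 2.5, 4.0, 3.0, 2.0, 1.5, 1.2],
    "CsGeneB": [1.0, 0.9, 0.7, 0.5, 0.6, 0.8, 1.1],
}
toy_at = {
    "AtGeneX": [1.0, 2.0, 3.5, 3.2, 1.8, 1.4, 1.0],
    "AtGeneY": [1.0, 1.1, 1.6, 2.4, 3.0, 2.2, 1.3],
}

ortholog_r = [pearson(toy_cs["CsGeneA"], toy_at["AtGeneX"]),
              pearson(toy_cs["CsGeneB"], toy_at["AtGeneY"])]
random_r = random_pair_correlations(toy_cs, toy_at, n_pairs=100)
print(pooled_t_statistic(ortholog_r, random_r))
```

With real data, the ortholog list would hold one correlation per orthologous pair, and the resulting t statistic would be compared with a t distribution with n1 + n2 - 2 degrees of freedom.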

Detection of positive selection
The amino acid sequences of group 3 AtWRKY and CsWRKY proteins were used to construct separate phylogenetic trees, which in turn were used for detecting positive selection. We used PAML4 [66] to analyze codon substitution patterns by maximum likelihood, implementing site-specific models. We detected variation in ω values among sites by employing likelihood ratio tests (LRTs) of M0 vs. M3 and M7 vs. M8 according to Yang et al. [67]. Nodes were considered to have undergone positive selection if they satisfied the following criteria: (1) an estimate of ω > 1 under M8; (2) sites identified as being under positive selection by Bayes Empirical Bayes (BEB) analysis; and (3) a statistically significant LRT.
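For readers unfamiliar with the likelihood ratio test, the following Python sketch shows how the test statistic 2(lnL1 - lnL0) can be compared with a chi-square distribution and how the three criteria above could be combined. It makes several assumptions that are not stated in the text: degrees of freedom of 2 for M7 vs. M8 and 4 for M0 vs. M3 (the values commonly used when M3 has three site classes), a significance level of 0.05, and invented log-likelihood values. The function names are hypothetical and the sketch does not call PAML.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable for even df.

    Uses the closed form exp(-x/2) * sum_{k < df/2} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return (2 * delta lnL, P-value) for nested models."""
    statistic = 2.0 * (lnl_alt - lnl_null)
    return statistic, chi2_sf_even_df(statistic, df)


def supports_positive_selection(omega_m8, n_beb_sites, p_value, alpha=0.05):
    """Combine the three criteria; alpha is an assumed significance level."""
    return omega_m8 > 1.0 and n_beb_sites > 0 and p_value < alpha


# Invented log-likelihoods for a M7 (null) vs. M8 (alternative) comparison.
stat, p = likelihood_ratio_test(lnl_null=-1000.0, lnl_alt=-995.0, df=2)
print(stat, p)
print(supports_positive_selection(omega_m8=1.8, n_beb_sites=2, p_value=p))
```

The closed form is used only to keep the sketch free of external dependencies; a statistics library would normally supply the chi-square tail probability.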

Plant materials, growth conditions and treatments
Line 9930, a cucumber typical of northern China, was used throughout the study. Seeds were germinated in pots containing vermiculite, and 3-week-old seedlings were used in the following treatments. For dehydration treatment, the plants were carefully pulled out, transferred onto filter paper and allowed to dry. For salinity and cold treatments, seedlings were subjected to a 100 mM NaCl solution or incubated at 4°C, respectively. Above-ground samples for RNA extraction were collected at 0, 0.5, 1, 3, 6, 12 and 24 h after treatment. The roots, stems, leaves and cotyledons of seedlings, and the female flowers, male flowers and fruits of mature plants, were collected separately for RNA isolation and used for tissue-specific expression analysis.

RNA isolation, cloning of full-length cDNA, RT-PCR and real-time PCR analysis
Total RNA was isolated according to Zhang et al. [59]. To clone the full-length cDNAs of CsWRKY genes, we first used the cucumber EST sequences to correct the annotated CsWRKY sequences and then used Fgenesh, a web-based gene prediction tool, to re-annotate all 57 WRKY genes. Subsequently, combining the results of Fgenesh, GLEAN and EVM (GLEAN and EVM were employed to annotate the cucumber genome in the cucumber genome project), we amplified the full-length coding sequences (CDS) of the CsWRKY genes by PCR.
For RT-PCR, specific primers were designed from the WRKY gene sequences using Primer 5 software (additional file 8). A cucumber β-actin gene (ID: Csa017310), amplified with primers 5'-TCCACGAGACTACCTACAACTC-3' and 5'-GCTCATACGGTCAGCGAT-3', was used as a control. The following program was used for RT-PCR: 94°C for 2 min, followed by 35 cycles of 94°C for 10 s, 55-59°C for 10 s and 72°C for 25 s, and a final 2 min extension step at 72°C. For the actin gene, the number of PCR cycles was set to 23.
The PCR products were separated on an agarose gel and quantified using an Imaging System (Bio-Rad, USA). The experiments were repeated three times with independent RNA samples.
Real-time PCR analysis was performed on a Bio-Rad CFX96 Real-Time PCR system (Bio-Rad, USA) in 96-well format, with denaturation at 95°C for 3 min, followed by 40 cycles of denaturation at 95°C for 10 s and annealing/extension at 55 or 60°C for 1 min. Three biological replicates were carried out, and triplicate quantitative assays for each replicate were performed on 0.5 μl of each cDNA dilution using the TianGen SYBR Green PCR Master Mix kit (TianGen Biotech FP202, CHN) according to the manufacturer's protocol. The cucumber β-actin gene was used as an internal control. Relative gene expression was calculated according to Jiang et al. [68]. ΔCT and ΔΔCT were calculated by the formulas ΔCT = CT(target) - CT(reference) and ΔΔCT = ΔCT(treated sample) - ΔCT(untreated sample, 0 h treatment). The relative amount of RNA, 2^(-ΔΔCT), was used to evaluate gene expression levels and for all chart preparations. The standard errors of the mean among replicates were also calculated. All calculations were carried out automatically in Bio-Rad CFX Manager (Version 1.5.534) for the Bio-Rad CFX96. Student's t-test was used to assess the statistical significance of the difference between treated and untreated samples (0 h treatment under abiotic stress). WRKY genes with P-values < 0.01 were considered differentially expressed. The specific primers for the WRKY genes and the β-actin gene used in real-time PCR are listed in additional file 9. The data and pictures produced by the Bio-Rad CFX96 are presented in additional file 10 and additional file 11, respectively.
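The calculations in this study were performed by the instrument software. Purely as a worked illustration of the ΔΔCT formulas given above, a minimal Python sketch follows; the function names and CT values are invented, and the standard error is computed from the sample standard deviation divided by the square root of the number of replicates.

```python
import math
import statistics


def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Relative RNA amount, 2^(-ddCT), against the 0 h control sample."""
    delta_ct_treated = ct_target_treated - ct_ref_treated
    delta_ct_control = ct_target_control - ct_ref_control
    delta_delta_ct = delta_ct_treated - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


def standard_error(values):
    """Standard error of the mean across replicates."""
    return statistics.stdev(values) / math.sqrt(len(values))


# Invented CT values: the target crosses threshold 3 cycles earlier after
# treatment while the reference gene is unchanged, giving an 8-fold increase.
fold_change = relative_expression(22.0, 18.0, 25.0, 18.0)
print(fold_change)
print(standard_error([7.6, 8.0, 8.4]))
```

By construction, the 0 h sample compared with itself gives a relative amount of 1, so all later time points read directly as fold changes.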