Molecular cloning and expression analysis of WRKY transcription factor genes in Salvia miltiorrhiza

Background WRKY proteins comprise a large family of transcription factors and play important regulatory roles in plant development and defense response. The WRKY gene family in Salvia miltiorrhiza has not been characterized. Results A total of 61 SmWRKYs were cloned from S. miltiorrhiza. Multiple sequence alignment showed that SmWRKYs could be classified into 3 groups and 8 subgroups. Sequence features, the WRKY domain and other motifs of SmWRKYs are largely conserved with Arabidopsis AtWRKYs. Each group of WRKY domains contains characteristic conserved sequences, and group-specific motifs might attribute to functional divergence of WRKYs. A total of 17 pairs of orthologous SmWRKY and AtWRKY genes and 21 pairs of paralogous SmWRKY genes were identified. Maximum likelihood analysis showed that SmWRKYs had undergone strong selective pressure for adaptive evolution. Functional divergence analysis suggested that the SmWRKY subgroup genes and many paralogous SmWRKY gene pairs were divergent in functions. Various critical amino acids contributed to functional divergence among subgroups were detected. Of the 61 SmWRKYs, 22, 13, 4 and 1 were predominantly expressed in roots, stems, leaves, and flowers, respectively. The other 21 were mainly expressed in at least two tissues analyzed. In S. miltiorrhiza roots treated with MeJA, significant changes of gene expression were observed for 49 SmWRKYs, of which 26 were up-regulated, 18 were down-regulated, while the other 5 were either up-regulated or down-regulated at different time-points of treatment. Analysis of published RNA-seq data showed that 42 of the 61 identified SmWRKYs were yeast extract and Ag+-responsive. Through a systematic analysis, SmWRKYs potentially involved in tanshinone biosynthesis were predicted. Conclusion These results provide insights into functional conservation and diversification of SmWRKYs and are useful information for further elucidating SmWRKY functions. Electronic supplementary material The online version of this article (doi:10.1186/s12864-015-1411-x) contains supplementary material, which is available to authorized users.

Transcription factors are a class of significant regulators controlling plant growth and development through regulating gene expression at the transcriptional level. They bind to the specific regions, known as cis-elements, in the promoters of genes and then activate or repress the expression of regulated genes in collaboration with other regulatory factors. So far, two large transcription factor gene families, including the plant-specific SQUA-MOSA promoter-binding protein-like (SPL) transcription factor gene family and the largest plant transcription factor gene family, termed MYB, have been characterized in S. miltiorrhiza [19,20]. A total of 15 SmSPLs and 110 SmMYBs have been identified from S. miltiorrhiza. SmSPLs are involved in the regulation of developmental timing in S. miltiorrhiza and eight of them are targets of miR156/157 [19]. Similarly, a subset of SmMYBs is regulated by microRNAs, such as miR159, miR319, miR828 and miR858. Many SmMYBs are involved in the biosynthesis of bioactive compounds in S. miltiorrhiza [20].
Characterization of the identified WRKY genes showed that they were significant regulators involved in plant developmental processes and responsed to biotic and abiotic stresses [39]. The involvement of WRKYs in plant immune response against bacterial, fungal, and viral pathogens has been widely reported [40][41][42][43][44][45][46][47][48][49][50][51]. Recently, more and more evidence showed the regulatory roles of WRKY in plant response to abiotic stresses. For example, over-expression of three soybean WRKY genes (GmWRKY13, GmWRKY27 and GmWRKY54) in Arabidopsis showed that GmWRKY21transgenic Arabidopsis plants were tolerant to cold stress, GmWRKY54 conferred salt and drought tolerance, whereas transgenic plants over-expressing GmWRKY13 had increased sensitivity to salt and mannitol stresses and decreased sensitivity to abscisic acid [52]. It suggests the involvement of WRKY genes in multiple abiotic stress-associated signaling pathways and the association of different WRKY members with different abiotic stresses. Moreover, WRKYs associated with same abiotic stress may show different responses. For instance, Arabidopsis WRKY25 and WRKY26 are heat-induced, whereas WRKY33 is heat-repressed [53]. In addition to stress responses, WRKYs also regulate various developmental processes, such as seed dormancy and germination, flowing, fruit maturation, stem elongation, pith secondary cell wall formation, plant senescence, and trichome development [54][55][56][57][58]. It suggests the importance of WRKYs and the complexity of WRKY-associated regulatory networks.
The defining feature of WRKY transcription factors is their DNA-binding domain, known as WRKY domain [39].
It is approximately 60 amino acids in length and includes the conserved amino acid sequence WRKYGQK at the N-terminus and an atypical zinc-finger motif either C2H2 (C-X 4-5 -C-X 22-23 -H-X 1 -H) or C2HC (C-X 7 -C-X 23 -H-X 1 -C) at the C-terminus. The structure of the WRKY domain allows it to specifically interact with W-box and SURE (sugar responsive) cis-elements in the promoter of target genes [59][60][61]. WRKY can be divided into three groups (Groups I, II and III) based on the number of WRKY domains (two domains in Group I and one in the others) and the pattern of zinc finger motif (C2H2 in Groups I and II and C2HC in Group III) [39,40]. Additionally, Group II WRKY proteins can be further divided into subgroups, including IIa, IIb, IIc, IId and IIe, based on the primary amino acid sequence of the WRKY domain.
Although WRKYs have been identified and characterized in various plant species, no information is available for the WRKY gene family in S. miltiorrhiza. In this study, we cloned and characterized 61 S. miltiorrhiza SmWRKYs.

Results and discussion
Molecular cloning of 61 SmWRKY genes from S. miltiorrhiza It has been shown that 72 AtWRKY genes exist in the Arabidopsis genome (Additional file 1: Table S1). To identify SmWRKYs, BLAST analysis against the current assembly of the S. miltiorrhiza genome was performed using AtWRKY protein sequences as queries. A total of 61 gene models were predicted for SmWRKYs. The 5'-sequence of many SmWRKYs showed low homology with known plant WRKYs. It might cause errors in computational prediction. To verify the predicted gene models and correct errors of computation, full-length coding sequences (CDSs) of all 61 SmWRKYs were PCR-amplified using the primers listed in Additional file 2: Table S2 and then cloned and sequenced. It resulted in the identification of 61 SmWRKYs, which were named SmWRKY1-SmWRKY61, respectively. The deduced SmWRKY proteins have amino acid numbers from 129 to 706, isoelectric points (pI) from 4.76 to 9.9, and molecular weights (Mw) from 19.9 to 76.2 kDa. All of the 61 cloned CDSs have been submitted to GenBank under the accession numbers shown in Table 1. The number of identified SmWRKYs is comparable with that in Arabidopsis. Comparable gene numbers were also found for the MYB [20], SPL [19], Argonaute (AGO) [62] and RNA-dependent RNA polymerase (RDR) [63] gene families in S. miltiorrhiza and Arabidopsis. It suggests that S. miltiorrhiza and Arabidopsis may have similar number tfmkof gene members for many gene families. Thus, the identified SmWRKYs represent an almost complete set of WRKYs in S. miltiorrhiza, although it may be not a fully complete set.

Classification of the WRKY domains and the WRKY proteins
Transcription factors in a family usually share highly conserved DNA-binding domains. In order to examine the phylogenetic relationships among WRKYs, a neighborjoining (NJ) phylogenetic tree was constructed for the WRKY domain sequences of AtWRKYs, SmWRKYs, PpWRKYs, GCMa and FLYWCH using MEGA5.0 ( Figure 1). According to the classification of AtWRKYs, the WRKY domains were divided into 3 groups (groups 1, 2 and 3). Group 1 WRKY domains come from proteins containing two WRKY domains, one of which is located in the N-terminal (NTWD), while the other one is in the C-terminal (CTWD  Figure 2). Of the 61 SmWRKY, 13 are two-WRKY-domain-containing proteins and all of them have the C2H2-type zinc-finger motif (C-X 4 -C-X 22-23 -H-X 1 -H) ( Figure 2, Table 2). The other 48 SmWRKY proteins are one-WRKY-domain-containing proteins, 42 of which contain group 2 WRKY domains and have the same type of finger motif (C-X 4-5 -C 23-24 -H-X 1 -H), while the other 6 contain group 3 WRKY domains and have the C2HC zinc finger motif (C-X 7 -C 23 -H-X 1 -C) ( Figure 2, Table 2). The distribution of residues in the WRKY domains of S. miltiorrhiza WRKY proteins is quite similar to that of Arabidopsis (Figures 2 and 3), suggesting evolutionary conservation of SmWRKYs and AtWRKYs. Comparing the number of SmWRKYs and AtWRKYs in each group/subgroup showed that the number of SmWRKYs in groups 1 and 2 is similar to that of AtWRKYs in the same group; however the number 6 of SmWRKY members belonging to group 3 is significantly less than the number 14 of AtWRKY members included in the same group. It is consistent with previous results showing group 3 WRKYs to be a newly defined and most dynamic group [22] and suggests the divergence of WRKYs in S. miltiorrhiza and Arabidopsis.
In order to investigate whether the phylogenies are different between the WRKY domains and the corresponding WRKY proteins, we constructed an NJ tree based on the full-length amino acid sequences of SmWRKYs, AtWRKYs, PpWRKYs, GCMa and FLYWCH ( Figure 4). The results showed that the phylogenetic tree of WRKY proteins was quite similar to the tree of WRKY domains with little difference observed (Figures 1 and 4). For instance, AtWRKY1, AtWRKY32 and SmWRKY28 having two WRKY domains and AtWRKY10 belonging to group 1 WRKY domains form separated clades outside group 1. AtWRKY19 and FLYWCH with the WRKY domain belonging to group 1, AtWRKY16 with the WRKY domain belonging to group 2e, and AtWRKY52 and GCMa with the WRKY domain belonging to group 3 form separated clades outside groups 1, 2 and 3. These results indicate the difference between the WRKY domain and the sequence outside the WRKY domain. Based on the NJ tree constructed with full-length amino acid sequences, we identified 17 pairs of orthologous WRKY genes in S. miltiorrhiza and Arabidopsis (Table 3). It suggests that many SmWRKYs and AtWRKYs are evolutionarily conserved.

Analysis of conserved motifs
In addition to the WRKY domain, other conserved motifs could be important for the diversified functions of WRKY proteins from S. miltiorrhiza and Arabidopsis [64,65]. Using the MEME program, we identified a total of 20 conserved motifs in WRKYs from S. miltiorrhiza and Arabidopsis ( Figure 5). The length of motifs varies from 8 to 150 amino acids and the number of motifs in each WRKY varies between 2 and 11. The majority of the identified motifs were found in more than one subgroup of WRKYs. Many AtWRKYs in a subgroup contain the same motif(s) as their SmWRKYs orthologues in the subgroup. It suggests the conservation of motifs in S. miltiorrhiza and Arabidoopsis WRKYs belonging to a subgroup. Among the 20 conserved motifs, 9 motifs, including motifs 1, 2, 3, 4, 5, 6, 8, 13, and 19, are located in the WRKY domain, while the other 11 are located outside the domain. Most WRKY proteins in a group have similar  [66], commonly exists in most WRKY proteins belonging to groups 2d and 3. Group 2d WRKY proteins usually contain two motif 15 s, while the majority of group 3 proteins contain only one. Motif 20, known as the HARF motif [40,67], exists in 5 of 7 AtWRKYs and all 7 SmWRKYs in group 2d. The results indicate functional similarities of WRKY proteins belonging to a group. Group-specific motifs may attribute to functional divergence of WRKYs.

Selective constraints on SmWRKY genes
In order to preliminarily examine the evolutionary mechanism of WRKYs, we test the hypothesis of positive selection acting on SmWRKY genes using site models and branch-site models in PAML [68] developed by Nielsen and Yang [69] and Yang et al. [70]. Codon substitution models [71] M0, M1a, M2a, M3, M7 and M8 were applied to the alignments and these models assume variation in ω among sites. The parameter estimates, log-likelihood and the LRT tests for these models are shown in Table 4. To examine how dN/dS ratios differed among codon positions, we compared models M0 and M3. The log likelihood of M0 for SmWRKY sequences was l = −5119.032, with an estimate of ω = 0.038. The low ω value suggests a strong action of purifying selection in the evolution of SmWRKY analyzed. M3 (discrete) assumes a general discrete distribution with three site classes (p 0 , p 1 , p 2 ). The log likelihood of M3 was l = −4864.540, with an estimate of ω 0 = 0.00052, ω 1 = 0.02439, ω 2 = 0.08755 (Table 4). Consistent with M0, the data from M3 also suggest that all codons are under purifying selection. Additionally, the value of twice the log likelihood difference (2ΔlnL) between M3 and M0 is 508.98. It is strongly statistically significant (p < 0.01) and suggests the overall level of selective constraints fluctuated.
To test whether positive selection promoted divergence between genes, the codon substitution models that allow positive selection (M2a and M8) and that hypothesize nearly neutral selection (M1a and M7) were compared (M2a vs. M1a and M8 vs. M7; Table 4). The log likelihood of M1a and M2a for SmWRKY sequences was l = −5119.251. However, no site was positively selected at a level of 95%. M7 and M8 fitted the sequences better than M0, M3, M1a and M2a with values of l = −4857.207 and −4857.206, respectively (Table 4). In both cases, no significant evidence of positive selection was found.
Branch-site models aim to detect positive selection affecting a few sites along particular lineages and allow ω ratios to vary among sites and lineages simultaneously [68]. It seems that the branch-site models are most suitable for describing evolutionary processes of the WRKY gene family. Therefore, we analyzed positively selected amino acid sites of SmWRKYs using the improved branch-site model [72]. The branches being tested for  Table 5. A total of 19 residues were found to be under positive selection (p > 90%). It includes 6 in group 2c and 10 in group 2d. The other three residues were found in group 2b, group 2e and group 3. No residues in group 1 and group 2a were found to be under positive selection. The results suggest that different WRKY groups may have different evolutionary rates. Groups 2c and 2d could be confronted with strong positive Darwinian selection, since many highly significant positive sites were detected at the 0.01 significance level ( Table 5). The evolution in the other groups seems to be more conservative.

Functional divergence analysis (FDA) of SmWRKY proteins
Using DIVERGE 2.0 that evaluates shifted evolutionary rate and altered amino acid property after gene duplication [74,75], we carried out posterior analysis for estimation of Type-I and Type-II functional divergence between SmWRKY clusters. The estimation was based on the WRKY protein neighbor-joining tree consisting of three major groups (group 1, group 2a-e, and group 3) (Figure 4). Comparison among SmWRKY subgroups showed that all of the coefficients for the type I functional divergence (θ I ) were greater than zero (Additional file 3: Table S3). The θ I values of eight group pairs, including 1/2e, 1/3, 2a + b/2d, 2a + b/2e, 2a + b/3, 2c/2e, 2c/3 and 2d/3, were ranged from 0.219 to 0.772 and were statistically significant (P < 0.01) (Additional file 3: Table S3). It indicates that significant site-specific changes may have happened at certain amino acid sites between these group pairs, leading to a subgroup-specific functional evolution after their diversification.
Type II functional divergence (θ II ) values in six group pairs, including 1/2c, 1/2d, 2a + b/2d, 2c/2d and 2d/3, were also greater than zero and ranged from 0.017 to 0.234 (Additional file 4: Table S4). It indicates a radical shift in amino acid properties. In order to extensively reduce positive false, Q k > 0.8 and 1.0 were empirically used as cutoff in the identification of the Type-I and Type-II functional divergence-related residues between gene groups, respectively (Figures 6 and 7). Detailed analysis showed that the number and the distribution of predicted residues for functional divergence in group pairs were different (Additional file 3: Table S3 and Additional file 4: Table S4) and the residues predominantly existed in the WRKY domain. It suggests that these residues probably play important roles in functional divergence of WRKYs during the evolutionary process.

Tissue-specific expression of SmWRKYs
It has been shown that a number of WRKY proteins are involved in plant developmental processes [54,76,77]. In order to preliminarily understand the roles of WRKYs in S. miltiorrhiza development, we analyzed the expression of SmWRKYs in roots, stems, leaves and flowers of S. miltiorrhiza plants. All of the 61 SmWRKYs identified were expressed in at least a tissue analyzed and exhibited differential expression patterns (Figure 8). Of the 61 SmWRKYs, 22 (36.1%) showed predominant expression in roots, 13 (21.3%) in stems, 4 (6.6%) in leaves and 1 (1.6%) in flowers. The other 21 (34.4%) were mainly expressed in at least two tissues analyzed, indicating these genes are likely to play a more ubiquitous role in S. miltiorrhiza. Furthermore, some SmWRKYs in a group shared similar expression patterns, while the others were not. For example, SmWRKY2, SmWRKY24, SmWRKY39, SmWRKY54 and SmWRKY55, belonging to group 1, were predominantly expressed in roots, while the other group 1 members, such as SmWRKY42, SmWRKY13 and SmWRKY60, were mainly expressed in stems, leaves and flowers, respectively ( Figure 8). It suggests that SmWRKYs belonging to a group do not necessarily indicate their functions in the same tissues. However, it has been shown that the tissues-specific expression patterns appear to be consistent with their role in the tissues. For example, VvWRKY01, belonging to group 2c, is involved in the regulation of lignin biosynthesis [78]. Over-expression of VvWRKY01 in tobacco resulted in the alteration of expression patterns of genes involved in lignin biosynthesis pathway [78]. Similarly, SmWRKY12, SmWRKY19 and SmWRKY47 in group 2c were predominantly expressed in stems (Figure 8). It indicates the putative roles of these SmWRKYs in the regulation of lignin biosynthesis in S. miltiorrhiza. AtWRKY6, a member of group 2b, and AtWRKY53 and AtWRKY70, two AtWRKYs in group 3, are an important regulator in senescent leaves [55,76,[79][80][81]. Of them, AtWRKY6 acts in the upstream of SIRK during leaf senescence [55]. SmWRKY59 belonging to the same WRKY group of AtWRKY6 and SmWRKY7 included in group 3 could be regulators of leaf senescence in S. miltiorrhiza, since both of them showed predominant expression in leaves (Figure 8). It has been shown that AtWRKY44 included in group 1 is involved in trichome differentiation [54]. Several SmWRKYs in group 1, such as SmWRKY2, SmWRKY39, SmWRKY24, SmWRKY54 and SmWRKY55, were highly expressed in roots with abundant root hairs, and SmWRKY13 belonging to the same group was highly expressed in leaves with abundant of trichomes ( Figure 8).
It implies that these SmWRKY may be associated with trichome development in S. miltiorrhiza.

Methyl jasmonate (MeJA)-responsive SmWRKYs
MeJA is a key signaling molecule involved in plant response to stress and in regulating secondary metabolite production in many plant species, including S. miltiorrhiza  [3,14]. More than 50% (39 of 74) of AtWRKYs were MeJAresponsive in Arabidopsis [82]. More than 1/3 of CrWRKY were regulated by MeJA in hairy roots of Catharanthus roseus [83]. Additionally, CaWRKY27 in Capsicum annuum [84], GbWRKY1 in Gossypium barbadense [85] and GhWRKY3 in Gossypium hirsutum [86] were also regulated by MeJA. In order to test whether WRKYs were responsive to MeJA treatment in S. miltiorrhiza, the expression level of SmWRKYs in roots of plantlets treated with MeJA was analyzed using the quantitative RT-PCR method. MeJA treatment showed a wide variety of SmWRKY gene expression profiles ( Figure 9). Significant expression level changes were observed for 49 SmWRKYs, of which 26 were up-regulated, 18 were down-regulated, while the other 5, including SmWRKY1, SmWRKY15, SmWRKY17, SmWRKY20 and SmWRKY24, were either up-regulated or down-regulated at different time-points of treatment ( Figure 9). It suggests that about 80% of the  Figure 9). It suggests that the number of up-regulated SmWRKYs is slightly more than down-regulated.

Yeast extract and Ag + -responsive SmWRKYs
In order to further investigate the roles of SmWRKYs in S. miltiorrhiza, transcriptome-wide analysis of SmWRKY expression in response to yeast extract and Ag + treatment was performed. RNA-seq data of S. miltiorrhiza hairy roots treated with or without yeast extract (100 μg/ml) and Ag + (30 μM) were downloaded from GenBank under the accession number SRR924662 [12]. RNA-seq reads from non-treated (0 hpi) and treated for 12 h (12 hpi), 24 h (24 hpi) and 36 h (36 hpi) were mapped to SmWRKYs using the SOAP2 software [87]. The log-2-transformed RPKM (RNA-seq reads mapped to a SmWRKY per total million reads from a treatment per kilobases of the SmWRKY length) value of SmWRKYs varied between −3.04 and 8.38 (Additional file 5: Table S5). Using a cutoff of RPKM value >2.0, a total of 49 SmWRKYs were found to be expressed in hairy roots.
Fisher's exact test showed that 42 of the 49 SmWRKYs were differentially expressed (Additional file 5: Table S5). It includes 17 significantly up-regulated, 19 significantly down-regulated and 6 significantly up-or down-regulated at different time-points, suggesting the majority of the Note: *p < 0.05 and **p < 0.01 (x 2 test). 1 ω was estimated under model M0,M3,M7, and M8; p and q are the parameters of the beta distribution. 2 The number of amino acid sites estimated to have undergone positive selection.
identified SmWRKYs were responsive to yeast extract and Ag + treatment.

SmWRKY candidates potentially involved in tanshinone biosynthesis
Terpenoids are plant secondary metabolites with significant physiological and ecological functions and great economic values, and a class of terpenoids, known as tanshinones, is the main bioactive compounds in S. miltiorrhiza. Increasing evidence demonstrates the importance of WRKY genes in the biosynthesis of secondary metabolites, such as terpenoid indole alkaloid in Catharanthus roseus [83]. Additionally, it has been shown that Gossypium arboretum GaWRKY1, which belongs to group 2a, participates in sesquiterpene biosynthesis in cotton by controlling (+)-δ-cadinene synthase (CAD1) activity [88]. PqWRKY1, a member of group 2d, responds to MeJA treatment and is a positive regulator of osmotic stress response and triterpene ginsenoside biosynthesis in Panax quinquefolius [89]. AaWRKY1 and CrWRKY, belonging to group 3, are the other two terpenoid biosynthesis-related WRKYs. Artemisia annua AaWRKY1 was highly expressed in glandular secretory trichomes (GSTs), where the sesquiterpene artemisinin was synthesized [90]. AaWRKY1 might be strongly induced by MeJA and could bind to the W-box in the promoter of ADS gene encoding amorpha-4, 11-diene synthase, a key enzyme in the artemisinin biosynthesis pathway [90]. CrWRKY1 was preferentially expressed in roots of C. roseus and also induced by MeJA [91]. It controlled terpenoid indole alkaloid biosynthesis through positive regulation of DXS and SLS genes involved in the terpenoid pathway and AS and TDC genes involved in the indole pathway [91]. Thus, SmWRKYs included in group 2a, 2d and 3 probably have an evolutionarily conserved role in regulating terpenoid biosynthesis in S. miltiorrhiza. Terpenoid tanshinones have been mainly produced and accumulated in roots of field-grown S. miltiorrhiza during the fast growing period from June to September [92][93][94]. The process of tanshinone production may be   Table S5). SmWRKY2, SmWRKY3, SmWRKY9, SmWRKY11, SmWRKY25, SmWRKY32, SmWRKY37, SmWRKY43 and SmWRKY52 were up-regulated by both the MeJA treatment and the yeast extract and Ag + treatment, while SmWRKY23, SmWRKY26, SmWRKY33, SmWRKY41, SmWRKY47, SmWRKY53 and SmWRKY59 were down-regulated ( Figure 9, Additional file 5: Table S5). Among the sixteen SmWRKYs, eight, including six up-regulated (SmWRKY2, SmWRKY3, SmWRKY9, SmWRKY25, SmWRKY37 and SmWRKY52) and two down-regulated (SmWRKY26 and Figure 6 Site-specific prediction for type-I functional divergence between groups of SmWRKYs. The X-axis represents locations of sites. The Y-axis represents the probability of each group. The red line indicates cutoff = 0.80. SmWRKY33), were predominantly expressed in roots of field-grown S. miltiorrhiza in August (Figure 8), suggesting their specific roles in roots. SmWRKY2, SmWRKY3, SmWRKY9 and SmWRKY26 are members of groups 1, 2d, 2a and 2b, respectively, while SmWRKY25, SmWRKY33, SmWRKY37 and SmWRKY52 are members of group 2c (Figures 1 and 4). Thus, SmWRKY3 and SmWRKY9 are two SmWRKYs (1) with similar responses to the MeJA treatment and the yeast extract and Ag + treatment, (2) having root-specific expression, and (3) belonging to group 2a, 2d or 3 with members probably playing an evolutionarily conserved role in regulating terpenoid biosynthesis. It implicates that SmWRKY3 and SmWRKY9 are very likely to be activators in tanshinone production. Notably, we may not exclude the possibility that some other SmWRKYs are also involved in tanshinone biosynthesis based on the data currently available. Further analysis of transgenic S. miltiorrhiza plants with over-expressed or silenced SmWRKYs may help to elucidate their function.

Divergence of paralogous SmWRKY genes
Gene duplication is an important event for gene family expansion and functional diversity during evolution [65,96,97]. A total of 42 (68.85% of 61) SmWRKY genes appear to be duplicated (Additional file 5: Table S5). In order to preliminarily reveal the mechanism of functional diversity (nonfunctionalization, subfunctionalization and neofunctionalization [98]) of these genes after duplication, the synonymous (Ks) and non-synonymous (Ka) substitution rate were calculated using the CDS of paralogous SmWRKY genes (Additional file 6: Table S6). The Ka/Ks ratios for all of the 21 paralogous SmWRKY gene pairs were less than one with 5 pairs even close to zero. It suggests that the WRKY genes from S. miltiorrhiza have experienced strong purifying selection pressure. Some closely related gene pairs displayed different expression patterns, indicating functional divergences occurred. For example, SmWRKY13 was expressed dominantly in leaves, whereas the other member of the SmWRKY13/SmWRKY31 gene pair, SmWRKY31, was expressed mainly in roots and The expression level of SmWRKYs was analyzed by the quantitative RT-PCR method. Y-axis indicates relative expression levels. X-axis indicates different tissues. SmUBQ10 was used as the reference gene. Transcript levels in leaves were arbitrarily set to 1 and the levels in other tissues were given relative to this. Error bars represent standard deviations of mean value from three biological and three technical replicates. ANOVA (analysis of variance) was calculated using SPSS. P < 0.05 was considered statistically significant.

Conclusions
In this study, we cloned a total of 61 SmWRKY genes. The cloned genes and the deduced proteins were characterized through a comprehensive approach, including phylogenetic tree construction, WRKY domain characterization, conserved motif identification, selective constraint analysis, functional divergence analysis, and expression profiling. We showed that many SmWRKYs and AtWRKYs were evolutionarily conserved.  paralogous SmWRKY genes. Selective constraint analysis and functional divergence analysis showed that the SmWRKY subgroup genes have experienced strong positive selection and diverged in function. Gene expression profiles suggest that the majority of 61 SmWRKY genes are tissue-specific and MeJA-and yeast extract and Ag + -responsive. These results provide insights into functional conservation and diversification of SmWRKYs and are useful for further investigating SmWRKY functions in S. miltiorrhiza development and defense response.

Database search and sequence annotation
The nucleotide sequences and amino acids of 72 AtWRKY genes were obtained from the Arabidopsis Information Resource (TAIR; http://www.Arabidopsis.org/). S. miltiorrhiza WRKY (SmWRKY) genes were predicted by tBLASTn [100] search of AtWRKY [101] homologs against the current S. miltiorrhiza genome assembly, which covers about 92% of the entire genome and 96% of the proteincoding genes [18]. An e-value cut-off of 1e-10 was applied to the homologue recognition. The retrieved sequences were used for gene model prediction on the GENSCAN web server (http://genes.mit.edu/GENSCAN.html). Full-length CDSs of SmWRKYs were amplified by reverse transcription-PCR using the primers listed in Additional file 2: Table S2. PCR products were gel-purified, cloned, and then sequenced. The theoretical isoelectric point (pI) and molecular weight (Mw) were predicted using the Compute pI/Mw tool on the ExPASy server (http://web.expasy.org/compute_pi/) [102].

Multiple sequence alignment, phylogenetic analysis and motif detection
PpWRKY genes were obtained from Physcomitrella patens v3.  [3,63]. Plantlets treated with carrier solution were used as controls. Roots of plantlets with or without MeJA treatment were collected and stored in liquid nitrogen until use. Three independent biological replicates were carried out for each experiment.
RNA extraction and quantitative real-time reverse transcription-PCR (qRT-PCR) Total RNA was extracted from plant tissues using the Quick RNA Isolation Kit (Huayueyang, China). RNA integrity was analyzed on a 1.2% agarose gel. RNA quantity was determined using a NanoDrop 2000C Spectrophotometer (Thermo Scientific, USA). cDNA synthesis was carried out using Superscript III Reverse Transcriptase (TaKaRa, China). qRT-PCRs were performed using the SYBR premix Ex Taq™ kit (TaKaRa, China) and carried out in triplicate for each tissue sample. Gene-specific primers (Additional file 7: Table S7) were designed using Primer Premier 5.0. The length of amplicons is between 80 bp and 250 bp. SmUBQ10 was selected as a reference gene as described previously [3]. Three independent biological replicates were performed. Statistical analysis was carried out as described [20]. Briefly, standardization of gene expression data was performed from three biological replicates as described [106]. 2-ΔΔCq was used to achieve results for relative quantification. For statistical analysis, ANOVA (analysis of variance) was calculated using SPSS (Version 19.0, IBM, USA).
Analysis of SmWRKY expression in response to yeast extract and Ag + treatment RNA-seq data for S. miltiorrhiza hairy roots treated with yeast extract (100 μg/ml) and Ag + (30 μM) were downloaded from GenBank under the accession number SRR924662 [12]. RNA-seq reads from non-treated (0 hpi) and treated for 12 h (12 hpi), 24 h (24 hpi) and 36 h (36 hpi) were mapped to SmWRKYs using the SOAP2 software [87] and analyzed as described previously [107]. The parameter v cutoff of 3 and parameter r cutoff of 2 were applied. SmWRKYs with the RPKM value greater than 2 were analyzed for differential expression using Fisher's exact test. P < 0.05 was considered as differentially expressed.

Ka and Ks calculation
Paralogous SmWRKY genes were inferred from phylogenetic analysis. Non-synonymous (Ka) and synonymous (Ks) substitution of each paralogous gene pair were also determined by PAL2NAL program (http://www.bork. embl.de/pal2nal/) [108], which is based on the codon model program in PAML [68].

Tests of positive selection
To determine whether the WRKY gene family exhibited evidence of positive selection under the site model and branch-site model [71], we applied the codeml program in PAML v4.8 to test the hypothesis of positive selection. An unrooted phylogenetic tree of SmWRKYs was reconstructed using the maximum likelihood method. In the site model, M0 (one ratio), M1a (neutral), M2a (selection), M3 (discrete), M7 (beta) and M8 (beta + ω > 1) were applied to the alignments, and we detected variation in the ω parameter among sites using the LRT for M1a vs. M2 a, M0 vs. M3 and M7 vs. M8. Branch-site model [72] was used to compare the non-synonymous/ synonymous substitution rate ratio (Ka/Ks) between clades or sequences. The ratio of nonsynonymous-tosynonymous for each branch under model A was calculated. Posterior probabilities (Qks) were calculated using the BEB method [68].

Estimation of functional divergence
The software DIVERGE2 was used to detect the functional divergence among members of SmWRKY subgroups [74]. The method is based on maximum likelihood procedures to estimate significant changes in the site-specific shift. The coefficients of Type-I and Type-II functional divergence (θ I and θ II ) between two clusters were calculated. The coefficients of Type-I and Type-II functional divergence (θ I and θ II ) greater than 0 indicates that site specific altered selective constraints or a radical shift of amino acid physiochemical property occurred after gene duplication and/or speciation [74]. Large posterior probability (Qk) indicates a high possibility that the functional constraint (or the evolutionary rate) and/or the radical change in the amino acid property of a site is different between two clusters [74].