Genome-Wide Identification and Characterization of WRKY Gene Family in Camelina sativa

Background: The WRKY gene family is one of the largest transcription factor families, and WRKY proteins (WRKYs) have complex biological functions in regulating plant metabolic processes. Although WRKY genes have been identified in many species and their functions verified, there have been no reports of Camelina sativa WRKY genes. Results: In this investigation, a total of 202 CsWRKY genes encoding 242 CsWRKYs were identified. The CsWRKYs were further classified into three major groups according to their structure and phylogeny. Comprehensive analysis showed that the characteristic sequences of CsWRKYs were conserved during evolution. In addition, 137 segmental duplication events were the major force expanding the CsWRKY family. Compared with other reported plant species, the CsWRKY family had the most members and was the largest WRKY gene family. Furthermore, expression profiling indicated that different CsWRKY members were expressed differently in shoots and roots, and some CsWRKY genes were up-regulated to varying degrees in shoots under salt stress. Conclusions: This research provides a detailed overview of the CsWRKY family genes and their expression patterns, offering valuable information for understanding the potential evolutionary process and biological functions of CsWRKY genes. This will be useful for further characterization of CsWRKY genes and for the development of high-quality Camelina sativa varieties.

proteins to activate or inhibit the expression of downstream defense-related genes, thereby forming a chain of stress responses. Moreover, transcription factors (bHLH [11], MYB [12], bZIP [13], NAC [14], WRKY [15], etc.) play a significant role in stress responses by binding cis-elements and regulating gene expression together with other regulatory factors [16]. Among them, the WRKY transcription factors form a large gene family with complex biological functions that is found in species ranging from single-celled algae to higher plants. For instance, in Arabidopsis thaliana (A. thaliana), overexpression of AtWRKY50 promoted the production of sinapic derivatives [17]. AtWRKY46, AtWRKY54 and AtWRKY70 play important roles in activating brassinosteroid-mediated gene expression and restraining the drought response [18]. In Taxus chinensis, TcWRKY8 and TcWRKY47 regulated the expression of taxol biosynthesis-related genes [19]. TaWRKY33 significantly enhanced drought tolerance in wheat [20]. PtrWRKY18 and PtrWRKY35 increased resistance to Melampsora in Populus [21].
All known WRKYs contain at least one conserved WRKY domain. The WRKY domain (around 60 amino acid residues) comprises a highly conserved amino acid sequence (WRKYGQK) at the N-terminus (NT) and an atypical zinc finger structure (C-X4-5-C-X22-23-H-X1-H or C-X7-C-X23-H-X1-C) at the C-terminus (CT) [16]. The conserved sequence WRKYGQK sometimes occurs as a single-residue variant, such as WRKYGKK or WRKYGEK, and in some WRKY proteins the WRKY residues are replaced by WRRY, WSKY, WKRY, WVKY or WKKY [22,23]. The WRKY domain specifically binds to the W-box (T)(T)TGAC(C/T), whose core sequence TGAC determines the binding ability and the resulting function [24]. Depending on the number of WRKY domains and the type of zinc finger structure, WRKYs are generally classified into three main groups (1, 2 and 3). Group 1 members have two WRKY domains and a C-X4-C-X22-23-H-X1-H zinc finger, whereas group 2 and group 3 members have one WRKY domain, with zinc finger structures of C-X4-C-X22-23-H-X1-H and C-X4-5-C-X23-24-H-X1-H, respectively [25]. Group 2 can be further divided into subgroups 2a, 2b, 2c, 2d and 2e. In addition, some special R-protein WRKYs have been found in certain species; these may increase signal diversity and may even shorten signal transmission with other components of signaling pathways. There are three R-protein WRKYs in A. thaliana (AtWRKY16, AtWRKY19 and AtWRKY52) and one each in Glycine max (GmWRKY1) [26] and pineapple (AcWRKY23) [27].
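The sequence signatures described above can be written as regular expressions. The following is only a minimal sketch, not the procedure used in this study: the function names, the 60-residue search window and the toy sequences are assumptions made for illustration, and only the canonical heptapeptide with the two single-residue variants named in the text is matched.

```python
import re

# Canonical heptapeptide plus the single-residue variants named in the text
# (WRKYGQK, WRKYGKK, WRKYGEK).
HEPTAPEPTIDE = re.compile(r"WRKYG[QKE]K")

# Zinc finger C-X4-5-C-X22-23-H-X1-H, where X is any residue.
ZINC_FINGER = re.compile(r"C.{4,5}C.{22,23}H.H")

# W-box core TGAC followed by C or T; the optional leading T's are ignored here.
W_BOX = re.compile(r"TGAC[CT]")


def find_wrky_domains(protein, window=60):
    """Return (heptapeptide, start) for each heptapeptide that is followed,
    within `window` residues (an assumed search span), by a zinc finger."""
    hits = []
    for match in HEPTAPEPTIDE.finditer(protein):
        downstream = protein[match.end():match.end() + window]
        if ZINC_FINGER.search(downstream):
            hits.append((match.group(), match.start()))
    return hits


def find_w_boxes(dna):
    """Return 0-based start positions of W-box cores in a DNA string."""
    return [m.start() for m in W_BOX.finditer(dna.upper())]


# Synthetic toy sequences written for this example, not real CsWRKY data.
toy_protein = "DGYNWRKYGQKQVKGSENPRSYYKCTYPNCPVKKKVERSLDGQITEIVYKGSHNH"
print(find_wrky_domains(toy_protein))
print(find_w_boxes("AATTGACCGGTGACTAA"))
```

A pattern match of this kind only flags candidate domains; domain databases such as SMART or InterPro remain the appropriate check for domain completeness.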
In recent years, much research has focused on the WRKY gene families of many species. Since the first WRKY gene (SPF1) was studied in Ipomoea batatas [28], WRKY genes have been identified in many species (Additional file 1: Table S1). Many WRKY genes have been functionally verified in response to salt stress. For example, up-regulated expression of Glycine max WRKY92, WRKY144 and WRKY165 was associated with salinity stress [29]. VvWRKY30 from grape was confirmed to confer tolerance to salinity stress [30]. GmWRKY45 had a positive effect on the response to phosphorus and salinity stress and was related to fertility [31]. At present, there are no reports of C. sativa WRKY proteins (CsWRKYs). In this study, a total of 242 CsWRKYs were identified and divided into three major groups. The CsWRKYs were then analyzed comprehensively, including their structural characteristics, physicochemical properties, phylogenetic tree, conserved motifs, chromosomal locations and synteny. Expression profiling of CsWRKYs was used to investigate several specific WRKY members in different tissues and in response to salt stress (SS). This research offers valuable resources for understanding the biological roles of CsWRKYs.

Identification of CsWRKYs
A total of 243 candidate WRKYs were initially obtained. All putative WRKYs were then checked for the presence of a complete WRKY-specific domain. One predicted WRKY protein (XP_019082498.1) was removed because of an incomplete WRKY domain. The remaining 242 WRKYs were renamed CsWRKY1 to CsWRKY242 (Additional file 2: Table S2). The listed characteristics include the group and subgroup, gene locus ID, start and end positions on the chromosome, SL, MW and PI. The annotations of the 242 WRKYs were checked against the available C. sativa genome database. In total, 202 CsWRKY genes encoded the 242 CsWRKYs: 70 CsWRKYs arose from alternative splicing of 30 CsWRKY genes, and each of the remaining CsWRKY genes encoded a single protein (Additional file 3). CsWRKYs were aligned using BioEdit software. Because of the large number of sequences, only some randomly selected CsWRKYs are shown in Figure 1 (the multiple sequence alignment of all 242 CsWRKYs is shown in Additional file 6: Figure S1).
The CsWRKYs contained the highly conserved heptapeptide sequence (WRKYGQK) and the zinc-finger motif. Only 11 CsWRKYs, all in group 2c, differed by a single amino acid change in the WRKYGQK sequence: CsWRKY202 contained a WRKYGXK sequence, and the other ten had a WRKYGKK sequence (CsWRKY127, CsWRKY128, CsWRKY135, CsWRKY223, CsWRKY228, CsWRKY229, CsWRKY230, CsWRKY232, CsWRKY233 and CsWRKY235).
Each WRKY domain had one highly conserved intron, except the N-terminal WRKY domain of the group 1 WRKYs, and the intron position was also well conserved (Figure 1). There were two kinds of intron. The PR intron, at a conserved amino acid position, was located between the WRKY sequence and the zinc-finger domain in some groups (group 1-CT, 2c, 2d, 2e and 3). The other, the VQR intron, was located inside the zinc-finger structure (C-X4-5-C-X5-VQR-X18-19-H-X1-H) in groups 2a and 2b [32].
The conserved introns made it easier to identify the correct WRKY domain.
Some specific WRKYs contained both R-protein domains and WRKY domains and belonged to group R, one of the most significant features of the WRKY gene family in flowering plants [25]. For example, A. thaliana had three R protein-WRKY genes (AtWRKY16, AtWRKY19 and AtWRKY52) [26]. In the phylogenetic tree, AtWRKY19 and CsWRKY47 clustered in group 1, and the other four sequences (AtWRKY16, AtWRKY52, CsWRKY186 and CsWRKY220) clustered in group 2d. The classification standard for R protein-WRKY genes was described by Rinerson et al. [26]. Further protein architecture analysis of CsWRKY47, CsWRKY186 and CsWRKY220 showed that CsWRKY47 contained the typical PAH-WRKY(1-NT)-WRKY(1-CT)-NB-ARC domain arrangement and a protein kinase domain at the C-terminal end of the sequence; it belonged to group RW3 and was its only member. CsWRKY186 had the typical NB-ARC-LRR-WRKY domain arrangement, and CsWRKY220 had the TIR-NB-ARC-LRR-WRKY arrangement, which indicated that CsWRKY186 and CsWRKY220 were unusual and did not belong to any group. These three CsWRKYs possibly arose as an adaptation to particular changes during growth.

Motif analysis and chromosomal mapping
A total of 10 motifs in the CsWRKYs were identified using MEME. The motifs ranged from 15 to 50 residues in width (Figure 3) and are shown as colored boxes drawn to scale (Additional file 7: Figure S2). The CsWRKYs clustered into groups according to the similar number and structure of their motifs. Information on each motif is displayed in Figure 3. Motifs 1, 2 and 3 encoded the conserved WRKY domain. All CsWRKYs had more than two motifs, and some CsWRKYs had up to 7 kinds of motif. The motifs differed among groups. For example, group 1 had 9 motifs (motifs 1, 2, 3, 4, 5, 6, 8, 9 and 10), with two copies each of motifs 1, 2 and 3; motifs 6 and 9 were unique to members of group 1, which may be significant for the specific functions of these WRKYs. Motif 10 appeared only in groups 1 and 2c. Motifs 4 and 8 were present in groups 1, 2b and 2c. Motif 7 was present in groups 2d, 2e and 3. Motif 5 was absent only from groups 2c and 2e. All members of group 2a had two copies of motif 5. Overall, the motif analysis showed that each group had its own conserved motif composition, corresponding to the grouping of the phylogenetic tree.
All identified CsWRKY genes were mapped across all C. sativa chromosomes using MapChart (Chr1-Chr20, Figure 4). Because some genes undergo alternative splicing, several protein names can correspond to one gene name (in the Additional file 7: Figure [34], Kiwifruit [24] and Vitis vinifera [35].

Gene duplication, synteny analysis and selection pressure analyses
Gene duplication, including tandem and segmental duplication, is a vital driving force in gene evolution. These two main evolutionary patterns play a prominent role in the generation of new genes and the evolution of new functions, and are also the pivotal reason for the expansion of gene families in plants [22,29,36-38]. To analyze the expansion patterns of CsWRKY genes, tandem duplication events (two or more genes located within 200 kb on the same chromosome) were identified using the method of Guo et al. [35]. The results indicated that the CsWRKY genes contained no tandem duplications. Meanwhile, gene duplication of CsWRKYs among C. sativa chromosomes was identified using BLASTP and MCScanX. This analysis also found no tandem duplication, consistent with the above result, but identified 137 segmental duplication events (Figure 5, Additional file 4: Table S4). This revealed that the CsWRKYs arose to a great extent through segmental duplication events. By comparison, 17 and 55 segmental duplication events have been reported in pineapple and kiwifruit chromosomes, respectively [24,27]. The above results indicated that segmental duplication was the predominant mode of evolution of the CsWRKY genes.
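The tandem duplication criterion used above (family members within 200 kb of each other on the same chromosome) can be illustrated with a short sketch. This is not the method of Guo et al. as implemented in the study: the function name, the choice to measure distance as the gap between the end of one gene and the start of the next, and the gene coordinates are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def tandem_pairs(genes, max_dist=200_000):
    """Return pairs of gene IDs lying within `max_dist` bp on the same chromosome.

    `genes` is an iterable of (gene_id, chromosome, start, end) tuples.
    Distance is taken as the gap between the end of the upstream gene and the
    start of the downstream gene (an assumed definition); overlaps count as 0.
    """
    by_chrom = defaultdict(list)
    for gene_id, chrom, start, end in genes:
        by_chrom[chrom].append((start, end, gene_id))

    pairs = []
    for items in by_chrom.values():
        items.sort()  # order genes by start coordinate
        for (s1, e1, g1), (s2, e2, g2) in combinations(items, 2):
            gap = max(0, s2 - e1)
            if gap <= max_dist:
                pairs.append((g1, g2))
    return pairs


# Hypothetical coordinates, not taken from the C. sativa annotation.
toy_genes = [
    ("geneA", "Chr1", 1_000, 3_000),
    ("geneB", "Chr1", 150_000, 152_000),
    ("geneC", "Chr1", 900_000, 903_000),
    ("geneD", "Chr2", 1_200, 2_500),
]
print(tandem_pairs(toy_genes))
```

An empty result for the real gene set would correspond to the finding of no tandem duplication reported here.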
Comparative synteny maps between genome pairs (C. sativa and A. thaliana, C. sativa and B. rapa) were constructed to explore the origin and evolution of CsWRKY genes (Figure 6, Additional file 5: Table S5). Orthologs were confirmed between 173 CsWRKY genes and 65 AtWRKY genes, forming 173 gene pairs. Orthologs between 166 CsWRKY genes and 111 BrWRKY genes formed 282 gene pairs. Multiple CsWRKY genes corresponded to a single AtWRKY gene. These syntenic relationships showed that the expansion of CsWRKY genes possibly occurred after the divergence from A. thaliana.

Expression profiling analysis
The transcript expression levels of the 202 CsWRKY genes are displayed in Figure 7. The proportion of CsWRKY genes expressed was highest in R (89.11%), followed by F (88.61%), IF (81.68%), S (80.69%), OL (78.71%), ESD (77.23%), EMSD (76.73%) and GS (75.25%). The proportions in the other tissues were 68.32% (C), 67.82% (LMSD) and 64.85% (YL), and the lowest was 52.48% in LSD. Seventy CsWRKY genes (34.65%) were expressed in all twelve tissues, including 28 genes in group 1, 41 in group 2 and 1 in group 3. In addition, 24 CsWRKY genes were highly expressed in every tissue. Furthermore, CsWRKY36 and CsWRKY227 were not expressed in any tissue. Some CsWRKY genes also showed tissue-specific expression patterns (Figure 7). In group 1, two genes (CsWRKY25 and CsWRKY34) were co-expressed in IF and LSD. CsWRKY174 in group 2b was co-expressed in GS and R. In group 2c, CsWRKY182 and CsWRKY202 were expressed specifically in F, CsWRKY204 was expressed only in ESD, CsWRKY73 was co-expressed in F and OL, and CsWRKY203 was co-expressed in F and GS. CsWRKY118 in group 2e was co-expressed in GS and S. In group 3, three genes (CsWRKY111, CsWRKY208 and CsWRKY226) were expressed only in R, and CsWRKY100 was expressed in both ESD and R.
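The per-tissue percentages above are simple proportions of expressed genes. The sketch below shows one way such values could be computed; the function name, the expression threshold (a value above 0 counts as expressed, which the text does not specify) and the toy expression table are assumptions.

```python
def percent_expressed(expression, threshold=0.0):
    """Return {tissue: percent of genes with expression above `threshold`}.

    `expression` maps gene ID -> {tissue: expression value}. The threshold
    defining "expressed" is an assumption, not a value from the study.
    """
    n_genes = len(expression)
    if n_genes == 0:
        raise ValueError("expression table is empty")
    counts = {}
    for per_tissue in expression.values():
        for tissue, value in per_tissue.items():
            counts.setdefault(tissue, 0)
            if value > threshold:
                counts[tissue] += 1
    return {tissue: round(100.0 * n / n_genes, 2) for tissue, n in counts.items()}


# Hypothetical expression values for four genes in two tissues.
toy_expression = {
    "gene_1": {"R": 12.0, "LSD": 3.1},
    "gene_2": {"R": 0.8, "LSD": 0.0},
    "gene_3": {"R": 5.5, "LSD": 0.0},
    "gene_4": {"R": 0.0, "LSD": 1.2},
}
print(percent_expressed(toy_expression))
```

As a consistency check on the reported figures, 70 of 202 genes gives 34.65%, and a count of 180 of 202 genes would give 89.11%.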

The functions of most AtWRKY genes have been verified. It has been reported that AtWRKY25 and AtWRKY33, which belong to group 1, were sensitive to NaCl, and that overexpression of either gene enhanced tolerance to NaCl [39]. According to the structural features and phylogenetic analysis of CsWRKYs and AtWRKYs, AtWRKY33 was the ortholog of CsWRKY8, CsWRKY9 and CsWRKY10, and AtWRKY25 was the ortholog of CsWRKY48, CsWRKY49 and CsWRKY50. These CsWRKY genes also belonged to group 1. To provide further clues to the function of CsWRKY genes under SS, the expression levels of these six CsWRKY genes in roots and shoots under SS and normal conditions (NC) are displayed in Figure 8. These CsWRKY genes had higher expression levels in shoots than in roots under the same conditions. Compared with NC, three CsWRKY genes (CsWRKY8, CsWRKY9 and CsWRKY10) were significantly down-regulated in roots under SS, whereas the other three (CsWRKY48, CsWRKY49 and CsWRKY50) showed no obvious change in roots. In shoots, these CsWRKY genes were expressed at higher levels under SS, to varying degrees; in particular, CsWRKY50 was significantly up-regulated. The expression analyses implied that these CsWRKY genes play an important part in the response to SS. These results suggested that most genes were expressed in a spatially and temporally regulated manner and were therefore functional. They may have undergone functional differentiation during evolution to cope with different environmental changes.

Large number of WRKYs in Camelina sativa
WRKYs are widely present in plants, and a large number of WRKYs have been identified. In this study, a total of 202 CsWRKY genes encoding 242 CsWRKYs were obtained. The distribution of the CsWRKYs among groups was very similar to that of AtWRKYs. Comprehensive analysis showed that the characteristic sequences of CsWRKYs were conserved during evolution. Compared with other reported plant species, the CsWRKY family had the most members and was the largest WRKY gene family (Additional file 1: Table S1). The number of CsWRKYs in groups 2 and 3 was approximately threefold that of AtWRKYs, and the number of group 1 members was also much larger than in A. thaliana; these differences might result from expansion of the CsWRKY family during evolution.

Structure and classification analysis
The phylogeny and motifs of the 242 CsWRKYs were analyzed (Additional file 7: Figure S2). The identification of motifs might also provide supporting evidence for the duplication-driven evolution of the WRKY gene family and help explore the mechanism of functional conservation. All CsWRKYs were classified into groups 1, 2a, 2b, 2c, 2d, 2e and 3, and each group or subgroup had a largely similar structure and a specific domain combination. The group-specific motifs might be due to functional differentiation of CsWRKYs. The CsWRKYs were also similar to those of other species, such as chickpea [34] and common bean [33], in average amino acid length, MW and PI. The CsWRKYs in group 1 contained two WRKY domains and were the ancient members in WRKY evolution. The WRKYs in groups 2a and 2b came from an algal single WRKY domain or another group 1-derived lineage [26,34]. The members of group 2c evolved from group 1 CsWRKYs through loss of the N-terminal domain, and group 2c contained the most CsWRKYs of all groups. The same result has been observed in other species, such as Salvia miltiorrhiza, Panicum miliaceum L. and common bean [16,33,40]. This large increase in number might be an adaptation to various changes over the course of evolution. There were eleven variant CsWRKYs in group 2c, carrying WRKYGXK or WRKYGKK; different variants (WRKYGKK, WRKYGEK, WKKYEDK, etc.) have also been found in many species [16,35,40,41], and these variants might give WRKYs multiple biological functions [33]. In addition, the R-type CsWRKY47 belonged to RW3, and CsWRKY186 and CsWRKY220 were new R-type proteins in C. sativa. Previous research showed that an Arabidopsis WRKY gene encoded an NBS-LRR-WRKY protein that worked as a chimeric protein and whose WRKY domain had DNA-binding activity [42]. The interaction of AtWRKY52/RRS1 with the R protein RPS4 protected plants from infection by fungal and bacterial pathogens [38]. These findings indicated that the various R-type CsWRKYs possibly gave C. sativa diverse disease resistance.

Duplication events play an important role in the evolution of CsWRKY genes
The expansion of gene families is mainly caused by gene duplication [34]. CsWRKY genes had no tandem duplication events and 137 segmental duplication events, which revealed that segmental duplication was the major force expanding the CsWRKY family. Similar results were found in chickpea, pineapple and kiwifruit [24,27,34]. Thus, it is speculated that segmental duplication events were the primary reason that the CsWRKY gene family reached the largest size. Evolutionary history can be reconstructed by comparing gene sequences within the same genome or among different genomes [43]. Comparative synteny maps (C. sativa and A. thaliana, C. sativa and B. rapa) were therefore constructed. According to the syntenic map of C. sativa and A. thaliana, 173 CsWRKY genes had 65 corresponding Arabidopsis orthologs, with multiple CsWRKY genes corresponding to a single AtWRKY gene. According to the syntenic map of C. sativa and B. rapa, 166 CsWRKY genes had 111 corresponding B. rapa orthologs. Thus, the expansion of CsWRKY genes probably occurred after the divergence from A. thaliana and B. rapa. These CsWRKY genes probably acquired new functions to adapt to various changes, so that the family expanded. Most Ka/Ks ratios of the above syntenic WRKY gene pairs were less than 1, which indicated that CsWRKY genes had experienced strong purifying selection. Purifying selection (Ka/Ks ratio < 1) usually eliminates deleterious mutations during evolution, which indicated that the key conserved sequences of WRKY genes were beneficial to plant survival and growth [8,34].
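The interpretation of the Ka/Ks ratio used above can be made explicit with a small sketch. It assumes the conventional reading (ratio below 1 for purifying selection, equal to 1 for neutral evolution, above 1 for positive selection); the function name and the example Ka and Ks values are hypothetical and are not results from this study.

```python
def selection_mode(ka, ks):
    """Return (Ka/Ks ratio, inferred selection mode) for one gene pair."""
    if ks <= 0:
        raise ValueError("Ks must be positive to form a Ka/Ks ratio")
    ratio = ka / ks
    if ratio < 1:
        mode = "purifying"
    elif ratio > 1:
        mode = "positive"
    else:
        mode = "neutral"
    return ratio, mode


# Hypothetical (Ka, Ks) values for three gene pairs.
toy_pairs = {
    "pair_1": (0.05, 0.50),
    "pair_2": (0.30, 0.20),
    "pair_3": (0.25, 0.25),
}
for name, (ka, ks) in toy_pairs.items():
    ratio, mode = selection_mode(ka, ks)
    print(f"{name}: Ka/Ks = {ratio:.2f} ({mode})")
```

In practice the Ka and Ks values themselves come from a dedicated tool such as KaKs Calculator; the sketch covers only the final classification step.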

Vital regulatory role of CsWRKYs
AtWRKY25 and AtWRKY33, which respond to SS, each had three orthologs in C. sativa, owing to the expansion of CsWRKY members during evolution. To further investigate the response of CsWRKYs to SS, six CsWRKY genes were monitored in roots and shoots and were found to respond to SS to varying degrees (Figure 8). The results revealed that CsWRKY genes may perform complex functions and play an important role in the response to SS. Previous studies have indicated that high expression of WRKYs has a vital influence on plant growth and development through transcriptional activation of downstream target genes [44]. The tissue-specific expression of WRKY genes might have a great effect on plant growth and development by regulating transcription [15]. Therefore, the comprehensive analyses and stress response of CsWRKYs lay a solid foundation for further study of WRKY gene function in transcriptional regulation.

Conclusion
WRKYs are widely present in plants, and the CsWRKY family identified here was greatly expanded compared with the AtWRKYs. Comprehensive analysis revealed that the expansion of the CsWRKY family had multiple causes, of which segmental duplication events were the major force; this improves understanding of the functional conservation and evolution of CsWRKY genes. The expression patterns establish a foundation for future exploration of the functional characteristics of CsWRKY genes and of the mechanism by which C. sativa responds to salt stress.

Sequence identification
The complete genome, proteome and CDS sequence files of C. sativa were downloaded from NCBI (ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF_000633955.1_Cs). The WRKY domain HMM (Hidden Markov Model) profile (PF03106) was extracted from the Pfam protein family database (http://pfam.xfam.org/) [33]. Candidate WRKY protein sequences were identified in the C. sativa whole-genome protein database by combining HMMER searches (E-value cut-off < 1E-5) and BLAST analyses (75 AtWRKYs as queries) [45]. The candidate CsWRKY sequences were verified by checking for the complete conserved WRKY domain with SMART (http://smart.embl-heidelberg.de/) and InterPro (http://www.ebi.ac.uk/interpro/), and redundant sequences were removed manually. The physicochemical properties of the confirmed CsWRKYs, including protein sequence length (SL), molecular weight (MW) and isoelectric point (PI), were calculated with the ExPASy online tool (http://web.expasy.org/protparam/).
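The E-value filtering step of the HMMER search can be sketched as follows. The sketch assumes the search output is in the tabular format written by `hmmsearch --tblout`, where the first whitespace-separated column is the target name and the fifth is the full-sequence E-value; the function name and the example protein IDs are hypothetical, and the study does not state which output format was parsed.

```python
def filter_hmmer_hits(tblout_lines, max_evalue=1e-5):
    """Return {target name: best full-sequence E-value} for hits below the cut-off.

    Assumes `hmmsearch --tblout` layout: column 1 is the target name and
    column 5 is the full-sequence E-value. Comment lines start with '#'.
    """
    hits = {}
    for line in tblout_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        target, evalue = fields[0], float(fields[4])
        if evalue < max_evalue:
            # Keep the smallest E-value if a target appears more than once.
            hits[target] = min(evalue, hits.get(target, evalue))
    return hits


# Hypothetical rows shortened to the first seven columns.
toy_tblout = [
    "# target name  accession  query name  accession  E-value  score  bias",
    "prot_A - WRKY PF03106.18 1.2e-30 105.3 0.1",
    "prot_B - WRKY PF03106.18 3.0e-04 12.1 0.0",
]
print(filter_hmmer_hits(toy_tblout))
```

Candidates passing this filter would still need the domain completeness check with SMART and InterPro described above.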

Multiple sequence alignment and the construction of comparative phylogenetic tree
Multiple sequence alignment of the CsWRKY domains was performed using ClustalW in BioEdit. Based on the aligned WRKY domains of CsWRKYs and AtWRKYs, the phylogenetic tree was constructed with MEGA 7.0 using the Neighbor-Joining method with the following parameters: Poisson model, pairwise deletion and 1000 bootstrap replications. All identified CsWRKYs were divided into groups according to the classification of the AtWRKY sequences. The AtWRKY sequences were obtained from the TAIR database (http://www.arabidopsis.org/).
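To make the distance settings concrete, the sketch below computes a Poisson-corrected distance, d = -ln(1 - p), between two aligned protein sequences with pairwise deletion of gapped sites, where p is the proportion of differing sites among those compared. It illustrates the distance measure only and is not MEGA's implementation; the function name and the toy sequences are assumptions.

```python
import math


def poisson_distance(seq_a, seq_b, gap="-"):
    """Poisson-corrected distance between two aligned protein sequences.

    Sites where either sequence has a gap are skipped (pairwise deletion).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # pairwise deletion: ignore this site for this pair only
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped sites to compare")
    p = differing / compared
    if p >= 1.0:
        return math.inf  # the correction is undefined when every site differs
    return -math.log(1.0 - p)


# Toy aligned fragments, not real CsWRKY or AtWRKY sequences.
print(poisson_distance("WRKYGQK-A", "WRKYGKKAA"))
```

A matrix of such pairwise distances is the input that a Neighbor-Joining algorithm would then use to build the tree.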

Motif analysis
MEME (http://meme.nbcr.net/meme/intro.html) was used to search the CsWRKYs for 10 conserved motifs [46]. The parameters were as follows: any number of repetitions, a maximum of 20 motifs and a motif width between 5 and 50 residues. The MEME results were displayed with TBtools [47].

Chromosomal distribution
The chromosomal positions of all CsWRKY genes were determined from the genome annotation file. The physical positions of CsWRKY genes, from the short-arm to the long-arm telomere of each chromosome, were mapped using MapChart [48].

Gene duplication and selection pressure analyses
Gene duplication events within the C. sativa genome were studied using MCScanX (Multiple Collinearity Scan toolkit) with default settings. The syntenic relationships between genome pairs (C. sativa and A. thaliana, C. sativa and Brassica rapa (B. rapa)) were identified and analyzed. The B. rapa data were downloaded from the Phytozome database (https://phytozome.jgi.doe.gov/pz/portal.html). The synonymous substitution rate (Ks) and nonsynonymous substitution rate (Ka) of each identified CsWRKY gene pair were calculated using KaKs Calculator 2.0, and the Ka/Ks ratio was calculated to determine whether selective pressure had acted on the protein-coding CsWRKY genes [49].

Expression profiling of CsWRKY genes in different tissues
The transcriptome data for twelve C. sativa tissues were obtained from Kagale et al. [7]. The expression levels of CsWRKY genes were analyzed in these twelve tissues at various developmental stages: cotyledon (C), early-mid seed development (EMSD), early seed development (ESD), flower (F), germinating seed (GS), inflorescence (IF), late seed development (LSD), late-mid seed development (LMSD), mature leaf (OL), root (R), stem (S) and young leaf (YL). Hierarchical clustering and heatmap-based expression profiling of CsWRKY genes were performed with HemI 1.0. To further explore the function of CsWRKY genes, we examined whether some CsWRKY genes respond to SS. Salt stress was imposed by treating 21-day-old seedlings with 192 mM NaCl, and the RNA-Seq data for the C. sativa response to SS came from Heydarian et al. [4].

Abbreviations
WRKYs, WRKY proteins; NT, N-terminal; CT, C-terminal; SL, protein sequence length; MW, molecular weight; PI, isoelectric point; Ks, synonymous substitution rates; Ka, non-synonymous substitution rates; C, cotyledon; EMSD, early-mid seed development; ESD, early seed development; F, flower; GS, germinating seed; IF, inflorescence; LSD, late seed development; LMSD, late-mid seed development; OL, mature leaf; R, root; S, stem; YL, young leaf

Declarations

Authors' contributions
YNS and HLC conceived of the study, participated in all the bioinformatics analysis, including the sequence alignment and phylogeny analysis, and drafted the manuscript; Ying Shi helped in bioinformatics analysis. JNX, CLJ and CHZ designed the study. RZL guided the writing of the manuscript.
All authors read and approved the final manuscript.

Figure caption (fragment): conditions. Data are means ± SE calculated from three biological replicates. The expression levels of CsWRKY genes in different tissues were compared to NC with one-way ANOVA at significance levels of ** P ≤ 0.01 and * P ≤ 0.05.
