The WRKY transcription factor family in Brachypodium distachyon
© Tripathi et al; licensee BioMed Central Ltd. 2012
Received: 25 January 2012
Accepted: 29 May 2012
Published: 22 June 2012
A complete assembled genome sequence of wheat is not yet available. Therefore, model plant systems for wheat are very valuable. Brachypodium distachyon (Brachypodium) is such a system. The WRKY family of transcription factors is one of the most important families of plant transcriptional regulators with members regulating important agronomic traits. Studies of WRKY transcription factors in Brachypodium and wheat therefore promise to lead to new strategies for wheat improvement.
We have identified and manually curated the WRKY transcription factor family from Brachypodium using a pipeline designed to identify all potential WRKY genes. 86 WRKY transcription factors were found, a total higher than all other current databases. We therefore propose that our numbering system (BdWRKY1-BdWRKY86) becomes the standard nomenclature. In the JGI v1.0 assembly of Brachypodium with the MIPS/JGI v1.0 annotation, nine of the transcription factors have no gene model and eleven gene models are probably incorrectly predicted. In total, twenty WRKY transcription factors (23.3%) do not appear to have accurate gene models. To facilitate use of our data, we have produced The Database of Brachypodium distachyon WRKY Transcription Factors. Each WRKY transcription factor has a gene page that includes predicted protein domains from MEME analyses. These conserved protein domains reflect possible input and output domains in signaling. The database also contains a BLAST search function where a large dataset of WRKY transcription factors, published genes, and an extensive set of wheat ESTs can be searched. We also produced a phylogram containing the WRKY transcription factor families from Brachypodium, rice, Arabidopsis, soybean, and Physcomitrella patens, together with published WRKY transcription factors from wheat. This phylogenetic tree provides evidence for orthologues, co-orthologues, and paralogues of Brachypodium WRKY transcription factors.
The description of the WRKY transcription factor family in Brachypodium that we report here provides a framework for functional genomics studies in an important model system. Our database is a resource for both Brachypodium and wheat studies and ultimately projects aimed at improving wheat through manipulation of WRKY transcription factors.
Keywords: WRKY transcription factor; Brachypodium distachyon; Wheat; Comparative genomics; Database
Grasses (the Poaceae) are one of the most important plant families: from the very beginning of human civilization they have been a major source of nutrition and sustainable energy, and they are of huge economic and ecological importance. Wheat is the most widely grown cereal in Europe and the second overall in the world after another grass, rice. Genomic analyses have divided the grasses into several economically important subfamilies, such as the Ehrhartoideae (rice), the Panicoideae (maize, sorghum, sugarcane and millets), and the Pooideae [1, 2]. The first available plant genome sequence, from the dicot model plant Arabidopsis, was not particularly useful for studying the grass family [3, 4]. The low level of synteny between dicot and monocot plants makes Arabidopsis a poor model system for exploring cereals. Even the advent of the first grass genome sequence, that of rice, was of limited use for studying the traits of temperate crops because rice does not exhibit all of the agronomically important traits that these temperate grasses exhibit. Clearly, dynamic changes in genome sequences have occurred over the 40–54 million years (Myr) of evolution that separate rice from wheat [7, 8].
Common or bread wheat (Triticum aestivum L.) has the largest genome of the three major agricultural cereal crops. The hexaploid nature of the bread wheat genome, consisting of the A, B and D genomes, creates technical problems with the sequencing and assembly of the genome. The three homeologous genomes share ~95% sequence similarity. This not only causes problems in assembling wheat genomic sequences but also means that functional redundancy is very likely for any given gene. It is clear, therefore, that a suitable model system would be a major tool for both pure and applied projects in wheat. Brachypodium distachyon (Brachypodium) promises to be just such a system. It is a small temperate grass that is phylogenetically closer to “core Pooideae” species than rice is, and it exhibits higher co-linearity and synteny. Brachypodium has many features that make it an excellent model species for temperate grass crops. It is a self-pollinating diploid species with a small number of chromosomes (n = 5), a small genome of about 272 Mb, and a low amount of repetitive DNA. It also has useful biological and physiological features such as short height, a short life cycle, and favorable inbreeding traits, and it is easy to grow and maintain [6, 7, 11–14]. It is estimated that the last common ancestor of Brachypodium and wheat lived about 32–39 Myr ago, whereas rice and wheat diverged 40–54 Myr ago and 45–60 Myr separates sorghum and wheat. This is supported by both chloroplast-based phylogenetic analysis and nuclear gene-based approaches. These estimates support the use of Brachypodium as a model for wheat, as the two share a more recent common ancestor than wheat shares with rice or sorghum. Anatomically, based on cell wall type, vegetative branching pattern, root development, and inflorescence branching, Brachypodium is a typical grass [12, 16].
The advantages of Brachypodium as a grass model system have already been utilized in deciphering processes such as vernalization and flowering time, seed storage proteins, fatty acid turnover and plant-pathogen interactions [12, 17]. Taken together, these features make Brachypodium potentially a monocot equivalent of Arabidopsis as a model system.
Recently, the genome sequence of Brachypodium has become available, and T-DNA mutant populations have also now been generated that will provide new horizons in gene discovery and gene functionality [18–20]. Not surprisingly, wheat lags behind in comparison, although a draft genome sequence of Chinese Spring wheat at 5x genome coverage has recently been announced. However, at 5x genome coverage we can only expect to have at least one read for about 95% of the genome. This is inadequate to cover the complete genome, and the depth of coverage is also inadequate to provide an accurate assembly of the complete set of sequences. We have therefore used published wheat WRKY transcription factors in our analyses until such time as a good assembly of the wheat genome is available.
WRKY transcription factors are one of the ten largest transcription factor families across the green lineage and are involved in signaling webs that regulate important plant processes. These include responses to biotic and abiotic stress, senescence, and seed development [22–29]. Reports from wheat have already shown the importance of the WRKY family [30–33] and our database will be a useful tool for further studies. Just over ten years ago, the first detailed analysis of the WRKY transcription factor family in Arabidopsis was performed. This study not only named the members of the Arabidopsis WRKY family but also subdivided them into Groups I, IIa, IIb, IIc, IId, IIe and III based on their phylogenetic positions and the structures of their WRKY domains. With the advent of a model system for the grasses, we have likewise performed a detailed analysis of the complete WRKY transcription factor family in Brachypodium. We have also produced a database to facilitate further research and to enable comparisons of the Brachypodium WRKY gene family with both the known gene family members in wheat and also other WRKY transcription factors across the green tree of life.
Identification and manual curation of the WRKY transcription factor family from Brachypodium
To produce a robust dataset of WRKY transcription factors from the Brachypodium genome, a modification of the pipeline that was developed to identify transcription factor genes in tobacco gene space sequences was used [34, 35]. The TOBFAC pipeline is a general pipeline that can be used for the identification of all WRKY sequences in a dataset. It was first used with the gene space sequence from tobacco, but is equally good at identifying the WRKY family in any genome sequence. The logic behind this strategy was to develop a method to identify every sequence in the genome that codes for at least part of a WRKY domain (this could be a functional gene or even gene fragments caused by transposon insertion or genome rearrangements). Unlike other methods that typically strive to avoid false positives, this approach seeks to avoid any false negatives and filters out false positives at a later manual curation step. Using this approach, in most genomes a larger number of WRKY sequences are identified than there are current gene models that contain WRKY domains (data not shown). Some of these additional sequences represent what appear to be fully functional WRKY transcription factors, whereas other sequences show the hallmarks of pseudogenes either because they contain in frame stops or frame shifts or because they only encode part of a WRKY domain.
To identify the WRKY family in Brachypodium we used v7.0 of Phytozome. tblastn searches were performed against the JGI 8x assembly release v1.0 of strain Bd21 using a representative WRKY domain from each of the subfamilies of WRKY transcription factors (I, IIa, IIb, IIc, IId, IIe, and III) [34, 35]. This multiple search strategy was combined with a permissive cut-off e-value of 10 in order to ensure that all possible WRKY domain-encoding sequences, however fragmentary, were found. All positive sequences were combined into a single dataset and redundant sequences were removed. Each sequence was then manually curated. For each positive, about 20 kb of genomic sequence around the WRKY domain-encoding region was used in gene prediction programs to validate the gene as a bona fide WRKY gene. We used FGENESH with the monocot plant setting for all potential genes and additionally GENSCAN with the maize setting for any genes where FGENESH failed to predict a protein with a complete WRKY domain. Each WRKY transcription factor was given a name and the predicted amino acid and cDNA sequences were incorporated into the data set. We also recorded the genomic coordinates, any gene model associated with the gene, and whether the gene model appeared to be correctly predicted. Gene models were only scored as incorrect if the genome contains nucleotide sequences that code for a complete WRKY domain that was not part of the gene model, or if the gene model differed drastically from the predictions of both FGENESH and GENSCAN. Only gross differences in exon prediction (in most cases these gene models predicted short proteins that are unlikely to represent full length WRKY transcription factors) were regarded as a mis-prediction. Differences in the predictions of the position of the first ATG codon were common and were not scored as a mis-prediction.
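The pooling and redundancy-removal step described above can be illustrated with a minimal Python sketch. It is not the TOBFAC pipeline itself. It assumes the tblastn hits have been saved in the standard 12-column tabular BLAST format, and the function names, the example rows, and the 10 kb flank on each side (chosen to give roughly 20 kb for gene prediction) are all hypothetical.

```python
from collections import defaultdict


def parse_hits(lines, max_evalue=10.0):
    """Return (scaffold, start, end) for every tabular hit with e-value <= max_evalue."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        sstart, send, evalue = int(fields[8]), int(fields[9]), float(fields[10])
        if evalue <= max_evalue:
            # tblastn reports minus-strand hits with sstart > send, so normalise.
            hits.append((fields[1], min(sstart, send), max(sstart, send)))
    return hits


def merge_loci(hits):
    """Merge overlapping hits on the same scaffold into non-redundant loci."""
    by_scaffold = defaultdict(list)
    for scaffold, start, end in hits:
        by_scaffold[scaffold].append((start, end))
    loci = []
    for scaffold in sorted(by_scaffold):
        intervals = sorted(by_scaffold[scaffold])
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:  # overlaps the current locus
                cur_end = max(cur_end, end)
            else:
                loci.append((scaffold, cur_start, cur_end))
                cur_start, cur_end = start, end
        loci.append((scaffold, cur_start, cur_end))
    return loci


def curation_window(locus, flank=10_000):
    """Genomic window handed to gene prediction; the flank size is an assumption."""
    scaffold, start, end = locus
    return scaffold, max(1, start - flank), end + flank


# Hypothetical rows: two query domains hitting the same locus, one weak hit
# that is kept, and one hit above the e-value cut-off that is dropped.
example_rows = [
    "dom_I\tBd1\t60.0\t60\t24\t0\t1\t60\t1000\t1179\t1e-20\t90.0",
    "dom_IIc\tBd1\t55.0\t47\t21\t0\t5\t51\t1150\t1010\t2e-08\t50.0",
    "dom_III\tBd1\t40.0\t30\t18\t0\t1\t30\t50000\t50089\t5.0\t25.0",
    "dom_III\tBd2\t35.0\t20\t13\t0\t1\t20\t200\t259\t50\t18.0",
]
loci = merge_loci(parse_hits(example_rows))
windows = [curation_window(locus) for locus in loci]
```

Merging by genomic interval rather than by query keeps a locus that was found by several subfamily domains as a single candidate for manual curation.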
The WRKY transcription factor family from Brachypodium
The WRKY transcription factor family in Brachypodium
Group IIc. Gene model incorrect.
Group IIe. Gene model incorrect.
Group IIa. Prediction using 40 kb sequence for Bradi4g30360 and Bradi4g30370 together.
Group I. Second domain is truncated.
Group IIc. Gene model incorrect.
Bd2: 52664751 - 52666466
Group IIc. No gene model.
Group IId. Gene model incorrect.
Group IId.
Group III. Gene model incorrect.
Group IIc. No gene model.
Group III. Gene model short.
Group IIc. No gene model.
Group IIc. Gene model incorrect.
Group I. Gene model short.
One and a half WRKY domains followed by a FAR1-s domain and a MULE transposon.
Group IIe. No gene model.
Group IIc. WKKY group.
Group IIe. No gene model.
Retrotransposon with N-terminal part of WRKY domain.
Group IIc. Gene model incorrect.
Group IIe. No gene model.
Group IIe. No gene model.
Group IIe. No gene model.
Group III. Gene model incorrect.
Group III. Gene model incorrect.
Bd2: 52628591 - 52629373
Group III. Gene model incorrect.
Group III. No gene model.
Brachypodium does not appear to contain any genes encoding chimeric intracellular type-R proteins and WRKY transcription factors (NBS-LRR-WRKY proteins). This is in contrast to several plant species such as Arabidopsis, rice, tobacco and soybean, which each contain at least one such chimeric protein (data not shown).
The Database of Brachypodium distachyon WRKY Transcription Factors
We have constructed a publicly accessible database of Brachypodium WRKY sequences to facilitate research into the roles of the WRKY transcription factor family in Brachypodium (http://www.igece.org/WRKY/BrachyWRKY/BrachyWRKYIndex.html). The database provides a portal to sequence and phylogeny data for the 86 identified WRKY transcription factors. One of the main functions of the database is to aid research in Brachypodium by leveraging information from other plant systems to give insights into the possible roles of Brachypodium WRKY transcription factors. To this end, the database contains a BLAST server that can be used to help identify orthologues of Brachypodium WRKY transcription factors in other plant species. MEME has also been used to identify conserved protein domains in each of the WRKY transcription factors in Brachypodium. This promises to reveal both input and output domains in signaling and facilitate comparisons with functional genomics studies of WRKY transcription factors in other plant systems.
The main page
The individual gene pages
Additional information concerning the gene and the protein that it codes for is also presented. This includes the group to which it belongs, the length, molecular weight and isoelectric point of the predicted protein, the chromosomal location, the gene model, and the cDNA and amino acid sequences. The gene model is a link to the gene model at brachypodium.org. One of the major functions of the database is to facilitate functional studies of the WRKY transcription factors in Brachypodium and to that end both general (regulation of transcription) and specific gene ontology classifications are listed where known.
The identification of orthologues in other species where extensive research has been performed, such as rice, might give important clues as to the function of each Brachypodium WRKY transcription factor. We have constructed a large dataset of manually curated WRKY transcription factors from the following twenty-two sequenced genomes: Brachypodium distachyon, Soybean, Rice (japonica), Arabidopsis thaliana, Medicago truncatula, Physcomitrella patens, Populus trichocarpa, Selaginella moellendorffii, Chlamydomonas reinhardtii, Chlorella vulgaris, Coccomyxa sp. C-169, Micromonas pusilla, Ostreococcus tauri, Ostreococcus lucimarinus, Ostreococcus RCC809, Volvox carteri, Phycomyces blakesleeanus, Rhizopus oryzae, Mucor circinelloides, Dictyostelium discoideum, Dictyostelium purpureum, and Giardia lamblia. This data set is available to search on the WRKY BLAST server and can be used to identify orthologues of each Brachypodium WRKY transcription factor. This will facilitate the integration of data about related WRKY transcription factors from across the green tree of life.
Wheat orthologues of Brachypodium WRKY transcription factors
One of the main reasons for studying Brachypodium is its value as a model system. It is much easier to perform many types of experiments using Brachypodium than it is with other grasses such as wheat. When using Brachypodium as a model system, classification of genes within the grasses based on homologous relationships is important, in particular the identification of orthologues and paralogues [45, 46].
Orthologues are genes that evolved via vertical descent from a single ancestral gene in the last common ancestor of the compared species. Paralogues are genes that have evolved by duplication of an ancestral gene. Orthology and paralogy are intimately linked because, if a duplication (or a series of duplications) occurs after speciation, orthology becomes a relationship between sets of paralogues rather than individual genes (in which case, such genes are called co-orthologues). The identification of orthologues between Brachypodium and wheat WRKY transcription factors is important because orthologues typically have similar functions. Paralogues, however, often exhibit functional diversification after duplication [47–49].
We therefore sought to identify wheat orthologues of the Brachypodium WRKY transcription factors using GenBank wheat accessions. There are currently 71 wheat WRKY transcription factors in the GenBank protein sequence database from various sources. The WRKY BLAST server was used to query the Brachypodium WRKY transcription factor family with each of the wheat sequences to identify possible orthologues. A combined phylogenetic tree of the 86 Brachypodium and 71 wheat proteins was also initially constructed and suggested possible orthologous/paralogous groups (data not shown). To better resolve the homologous relationships between the WRKY transcription factors, the phylogram in Figure 3 was produced that contains the complete WRKY transcription factor families from Brachypodium, rice, Arabidopsis, and Physcomitrella patens, together with the published WRKY transcription factors from wheat (Figure 3 and Additional file 1: Figure S1). The WRKY domain from a WRKY transcription factor found in a fungus belonging to the Zygomycete class, Mucor circinelloides (scaffold_3:4086226–4087418 fgeneshMC_pg.3_#_1249), was included as a distant root. The phylogram facilitates the identification of orthologues, paralogues, and in some cases co-orthologues. Some caution is, however, required when interpreting these data because the coverage of wheat WRKY transcription factors is incomplete and some available sequences are fragmentary. In addition, the hexaploid nature of the wheat genome compared to the diploid Brachypodium genome also complicates interpretation. Figure 3 and Additional file 1: Figure S1 suggest that most wheat WRKY transcription factors have clear orthologues or co-orthologues in Brachypodium. One exception is the wheat protein TaWRKY8, which forms a distinct clade with rice OsWRKY6. These two WRKY transcription factors appear to represent early branching Group IId genes (Additional file 1: Figure S1). No Brachypodium orthologue is present in this clade.
Comparison of Brachypodium WRKY transcription factor data sets from various databases
Several groups have attempted to characterize the WRKY transcription factor family in Brachypodium. Compared to our data set of 86 transcription factors, our analyses show that the Plant Transcription Factor Database (PlantTFDB) predicts a total of 72 genes including two pseudogenes (Additional file 2: Figure S2, Additional file 3: Table S1). The PlantTFDB database lists 78 genes but there are some duplicates. The Grass Regulatory Information Server (Grassius) predicts 82 WRKY genes. This actually represents 81 individual WRKY transcription factors, as one gene appears to be duplicated (Additional file 4: Table S2). The five transcription factors missing from Grassius are BdWRKY52, BdWRKY69, BdWRKY73, BdWRKY75, and BdWRKY83. It appears that these missing genes are hard to identify because none of the five is represented by a gene model. In the case of BdWRKY75, this lack of detection could be because the genome in this region does not code for a complete WRKY domain. BdWRKY75 is an apparent pseudogene in which the sequences that code for the C-terminal part of the WRKY domain are absent. Retrotransposon sequences are adjacent to the gene, suggesting a mechanism whereby a functional gene has become non-functional as a result of retrotransposon activity and concomitant genome rearrangements. Recently, 81 Brachypodium WRKY transcription factors have been annotated using the NCBI automated computational analysis pipeline. The pipeline annotates genes using both (1) reference sequence (RefSeq) transcript alignments and (2) Gnomon prediction in those regions not covered by RefSeq alignments. Using this approach, 75 WRKY transcription factors were annotated (Additional file 3: Table S1). The eleven missing genes are BdWRKY8, BdWRKY9, BdWRKY15, BdWRKY27, BdWRKY29, BdWRKY43, BdWRKY44, BdWRKY62, BdWRKY66, BdWRKY76, and BdWRKY86.
Interestingly, no WRKY transcription factor is missing in both the Grassius and NCBI data sets, showing that there is independent validation of all of the genes in our data set in at least one other database.
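The claim that the two sets of missing genes do not overlap can be checked directly from the gene numbers reported for Grassius and NCBI. The short Python sketch below does only that bookkeeping; the variable names are illustrative and no data beyond the BdWRKY numbers given in the text are used.

```python
# All 86 genes in the BdWRKY1-BdWRKY86 numbering system.
all_genes = {f"BdWRKY{i}" for i in range(1, 87)}

# Genes reported in the text as absent from each external data set.
missing_from_grassius = {f"BdWRKY{i}" for i in (52, 69, 73, 75, 83)}
missing_from_ncbi = {
    f"BdWRKY{i}" for i in (8, 9, 15, 27, 29, 43, 44, 62, 66, 76, 86)
}

in_grassius = all_genes - missing_from_grassius
in_ncbi = all_genes - missing_from_ncbi

# Genes absent from both data sets; an empty set means every gene is
# independently supported by at least one of the two databases.
missing_from_both = missing_from_grassius & missing_from_ncbi

print(len(in_grassius), len(in_ncbi), sorted(missing_from_both))
```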
In conclusion, our pipeline has produced the most comprehensive set of WRKY transcription factors that is currently available in Brachypodium. It was able not only to identify genes that are not represented by gene models, but also fragmentary pseudogenes and all members of tandemly repeated WRKY genes.
The WRKY transcription factor family in Brachypodium
The WRKY transcription factor family in Brachypodium (Figure 3) is similar to the typical WRKY family in flowering plants with a division into Groups I, IIa + IIb, IIc, IId + IIe and III (Additional file 4: Table S2). Over the last dozen years, the original phylogenetic classification of Eulgem et al. has proven to be robust. The one major modification came from the work of Zhang and Wang, who modified the original Groups I, IIa, IIb, IIc, IId, IIe and III into Groups I, IIa + IIb, IIc, IId + IIe and III. This accurately reflects the evolution of the WRKY family and has been verified in a number of species including several monocots such as maize, barley, and rice [22, 56]. These analyses are also consistent with some of the findings of Babu et al., who used a larger data set. The Brachypodium WRKY family also shows characteristics of other monocot species such as rice, with a lineage-specific radiation in Group III. For example, Arabidopsis and Brachypodium both contain three Group IIa WRKY transcription factors but, in contrast, Brachypodium has over one and a half times the number of Group III WRKY transcription factors (23 compared to 14 in Arabidopsis). The mechanisms responsible for this lineage-specific expansion are unclear, but our studies of the BdWRKY10/BdWRKY15/BdWRKY29/BdWRKY86 cluster on chromosome 4 and the BdWRKY8/BdWRKY9/BdWRKY84/BdWRKY85 cluster on chromosome 2 (Figures 4 and 5) suggest that this expansion is at least partly due to the formation of tandem repeats of paralogous Group III genes. Interestingly, BdWRKY10, BdWRKY15, BdWRKY29, and BdWRKY86 are atypical Group III WRKY transcription factors as they all contain a 9–10 amino acid extended region in the zinc finger part of the WRKY domain (Figure 6). A small number of similar WRKY transcription factors with extended WRKY domains in this region of the zinc finger are also found in rice and sorghum, suggesting that this is a feature of some monocot species (data not shown).
The WRKY transcription factor family in wheat
The currently available data set of wheat WRKY transcription factors is fragmentary, but comparisons with the WRKY family in Brachypodium are already informative and have consequences for both the identification of orthologous genes and the use of Brachypodium as a model system for wheat. From our data, it is clear that most wheat WRKY transcription factors have an orthologue in Brachypodium (Figure 3 and Additional file 1: Figure S1). However, the identification of orthologues or co-orthologues is complicated by the incomplete coverage of wheat WRKY transcription factors and the fragmentary nature of some available sequences (TaWRKY10 and TaWRKY11 are not full length sequences, for example). In addition, the hexaploid nature of the wheat genome compared to the diploid Brachypodium genome also complicates interpretation. A good example of this is the clade of wheat Group III WRKY transcription factors consisting of TaWRKY10, TaWRKY45A, TaWRKY45B, TaWRKY45D, and TaWRKY11 (Figure 9). It is clear from domain structure and the phylogram that these five WRKY transcription factors, together with the other members of this clade, probably descended from an ancestral gene with a motif 3-9-7-1-2-4-5-5-like domain structure at the protein level. The presence of only a single transcription factor of this type in the genomes of maize, rice, switchgrass, foxtail millet, and sorghum suggests that the genes in these species all descended from the last common ancestor by vertical inheritance. After lineage-specific radiation in wheat and Brachypodium, a set of orthologues and co-orthologues was formed in these species. Given that orthologues typically have similar functions, it is likely that many of the thirteen WRKY transcription factors in this clade play similar roles in plants.
Interestingly, OsWRKY45 is up-regulated by several different abiotic stresses, including high salt, water stress, and heat, suggesting that one role of these WRKY transcription factors may be in the regulation of abiotic stress responses. Recently, direct information about the possible roles of TaWRKY10 and TaWRKY11 was presented. TaWRKY10 is up-regulated by cold and wounding, whereas TaWRKY11 is up-regulated by cold, wounding and ABA. This gives further support to the suggestion that this clade of grass WRKY transcription factors regulates abiotic stress responses. By contrast, Additional file 1: Figure S1 shows an example of lineage-specific radiation in Arabidopsis. The ABA-hypersensitive mutant, abo3, is caused by a T-DNA insertion in AtWRKY63 (At1g66600). The abo3 mutant is hypersensitive to ABA in both seedling establishment and seedling growth. In addition, stomatal closure is less sensitive to ABA. However, finding orthologues of AtWRKY63 in other plants, such as soybean, is not possible because the transcription factor forms part of a lineage-specific radiation that appears specific to either the Brassicaceae family or indeed to Arabidopsis itself (Additional file 1: Figure S1). AtWRKY63 is found in a separate clade within Group III that consists only of the Arabidopsis WRKY transcription factors AtWRKY38, AtWRKY62, AtWRKY63, AtWRKY64, AtWRKY66, and AtWRKY67. The situation with these six WRKY transcription factors is obviously complex as two are found on chromosome 5 and the remaining four on chromosome 1.
The Database of Brachypodium distachyon WRKY Transcription Factors
The major output of our analyses of the Brachypodium WRKY transcription factor family is The Database of Brachypodium distachyon WRKY Transcription Factors (http://www.igece.org/WRKY/BrachyWRKY/BrachyWRKYIndex.html). Our aim is to make this knowledgebase a repository for all information pertaining to WRKY transcription factor research in Brachypodium. The database has tools to facilitate the identification of wheat orthologues of each of the Brachypodium WRKY transcription factors with a BLAST server allowing the Brachypodium data set to be queried with new wheat sequences as they become available. These tools will facilitate cross species analyses of WRKY transcription factor function in the grasses.
The BLAST server also allows searching of a large dataset of manually curated WRKY transcription factors that we have constructed from twenty-two sequenced genomes from the green tree of life and beyond. This will allow the integration of wet lab data from well-established systems such as Arabidopsis and rice into experimental design and data analysis in Brachypodium. These comparisons, as well as being useful tools for designing experimental strategies, will also start to provide answers concerning the similarities and differences in WRKY transcription factor function across the plant kingdom.
The description of the WRKY family in Brachypodium that we report here not only provides a framework for functional genomics studies of WRKY transcription factors in an important model system, but also identifies orthologues and co-orthologues in wheat. This will facilitate translational genomics where orthologous Brachypodium WRKY transcription factors will give insights into transcription factor function in wheat. Our database will be a resource for both Brachypodium and wheat studies and ultimately projects aimed at improving wheat through manipulation of WRKY transcription factor function. The total of 86 WRKY transcription factors presented here is higher than in other databases and is likely to be close to the true number of WRKY transcription factors in the genome. We therefore propose that the numbering system that we have established (BdWRKY1-BdWRKY86) becomes the standard nomenclature for future work on the Brachypodium WRKY transcription factor family.
Identification and manual curation of the Brachypodium WRKY transcription factor family
To identify the WRKY family in Brachypodium, a modification of the TOBFAC pipeline was used. tblastn searches were performed against the JGI 8x assembly release v1.0 of strain Bd21 with JGI/MIPS PASA annotation using a representative WRKY domain from each of the subfamilies of WRKY transcription factors (I, IIa, IIb, IIc, IId, IIe, and III) [34, 35]. The e-value cut-off was set to 10 to ensure that all potential WRKY domain-encoding sequences, however diverse or fragmentary, were discovered. All hits were obtained in October 2011 and were pooled into a single data set before duplicate sequences were removed. Each potential gene was then manually curated using both FGENESH and GENSCAN gene predictions and also BLAST searches against published WRKY transcription factors. Compared with many of the short gene models, the two gene prediction programs and the BLAST searches gave not only better predictions of the intron-exon boundaries in the WRKY domain-encoding sequences but also more reliable predictions of the ATG start codon (although an accurate prediction of the start of translation remains difficult in some cases in the absence of reliable EST data). Neither gene prediction program was consistently better, and sometimes the two programs disagreed. We used the result or results that included a complete WRKY domain, because a prediction that lacks a complete domain will normally be wrong except in the case of a frame shift. Adjacent transposons and pseudogenes were also identified by this pipeline, and false positives were removed. The final list of WRKY transcription factors was then tabulated and predicted full length cDNA and amino acid sequences were produced. The genome location of each gene was carefully recorded to facilitate future modifications to the gene predictions.
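Whether a predicted protein contains a complete WRKY domain was judged here by manual curation. As a rough automated pre-screen, one could test predictions against the commonly cited WRKY consensus: the WRKYGQK heptapeptide followed by a C2H2 (C-X4-5-C-X22-23-H-X-H) or C2HC (C-X7-C-X23-H-X-C) zinc finger. The Python sketch below is an assumption-laden illustration, not the authors' procedure. The patterns come from the general WRKY literature rather than this paper, the 8 to 20 residue spacer between the heptapeptide and the first cysteine is a guess, and variants such as WKKY heptapeptides or the extended Group III zinc fingers described above would be reported only as partial.

```python
import re

# Consensus patterns (assumptions, see the lead-in). The spacer range between
# the heptapeptide and the first cysteine is an illustrative guess.
C2H2 = re.compile(r"WRKYGQK.{8,20}C.{4,5}C.{22,23}H.H")
C2HC = re.compile(r"WRKYGQK.{8,20}C.{7}C.{23}H.C")


def classify_wrky_domain(protein):
    """Return 'C2H2', 'C2HC', 'partial' or None for a predicted protein."""
    seq = protein.upper()
    if C2H2.search(seq):
        return "C2H2"
    if C2HC.search(seq):
        return "C2HC"
    if "WRKYGQK" in seq:
        # Heptapeptide present but no canonical zinc finger downstream:
        # a candidate truncated prediction, pseudogene or atypical domain.
        return "partial"
    return None


# Illustrative, made-up domain-like strings (not real database sequences).
complete = "DGYRWRKYGQKVVKGNPYPRSYYKCTTPGCGVRKHVERAATDPKAVVTTYEGKHNH"
truncated = complete[:30]
```

A prediction flagged as partial would still need manual inspection, since the text notes that frame shifts and atypical domains occur in this family.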
Phylogenetic analysis of the Brachypodium WRKY family
Phylogenetic and molecular evolutionary analyses of the WRKY family were conducted using MEGA versions 4 and 5 [37, 40]. The amino acid sequences of the WRKY domains were used to construct multiple sequence alignments using CLUSTAL. Where necessary, multiple sequence alignments were manually adjusted to optimize the alignments. Short partial domains from possible pseudogenes were discarded. Phylogenetic trees were produced by the neighbor-joining method (settings: gaps/missing, pairwise deletion; model, amino acid number of differences; substitutions to include, all; pattern among lineages, same; rates among sites, uniform). Statistical support for the nodes in the phylogenetic trees (bootstrap values from 1,000 trials) was obtained for each tree. For each figure, the bootstrap consensus tree is presented. For the phylogenetic analysis of the Group III WRKY transcription factors (Figure 9), the complete amino acid sequences of the proteins were used.
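The distance setting used for the neighbor-joining trees (number of amino acid differences with pairwise deletion of gaps and missing data) can be illustrated with a small Python sketch. This is not MEGA's implementation. The function names and toy alignment are hypothetical, and the choice of "-" and "?" as gap and missing-data symbols is an assumption.

```python
GAP_OR_MISSING = {"-", "?"}  # assumed symbols for gaps and missing data


def n_differences(seq_a, seq_b):
    """Count differing sites, skipping sites with a gap in either sequence."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    return sum(
        1
        for a, b in zip(seq_a, seq_b)
        if a not in GAP_OR_MISSING and b not in GAP_OR_MISSING and a != b
    )


def distance_matrix(alignment):
    """Pairwise number-of-differences distances for a {name: sequence} alignment."""
    names = list(alignment)
    return {
        (p, q): n_differences(alignment[p], alignment[q])
        for p in names
        for q in names
    }


# Toy alignment of three short, made-up aligned fragments.
toy_alignment = {
    "seqA": "WRKYGQK",
    "seqB": "WRKYGKK",
    "seqC": "WKKY-QK",
}
distances = distance_matrix(toy_alignment)
```

With pairwise deletion, a gapped site is ignored only for the pairs in which it is gapped, so fragmentary sequences still contribute their remaining sites to every comparison.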
Analysis for conserved motifs in the WRKY proteins was carried out using MEME (http://meme.sdsc.edu/meme/cgi-bin/meme.cgi). It was observed that most conserved domains are limited to a single subfamily of WRKY transcription factors, and therefore MEME analyses were run for the members of each subfamily using the full length proteins. The settings were: any number of repetitions of a single motif; minimum motif width, six amino acids; maximum motif width, eighty amino acids; maximum number of motifs to find, twelve.
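The analyses reported here were run on the MEME web server. For readers who prefer a local installation, the same settings map onto the MEME command-line options as sketched below in Python. This mapping is an assumption about an equivalent local run, not a record of what was executed, and the input file and output directory names are hypothetical.

```python
import shlex


def meme_command(fasta_path, out_dir):
    """Build a MEME command line mirroring the settings described in the text."""
    return [
        "meme", fasta_path,
        "-protein",          # amino acid sequences
        "-mod", "anr",       # any number of repetitions of a motif
        "-minw", "6",        # minimum motif width
        "-maxw", "80",       # maximum motif width
        "-nmotifs", "12",    # maximum number of motifs to find
        "-oc", out_dir,      # output directory (overwritten if present)
    ]


# Hypothetical per-subfamily run, since the analyses were run for each subfamily.
command = meme_command("group_III_full_length.fasta", "meme_group_III")
print(shlex.join(command))
```

The list form can be passed to `subprocess.run(command, check=True)` on a machine where the MEME Suite is installed.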
The test instance of the database is located at: http://nim.vbi.vt.edu/BrachyWRKY/, and the developmental instance of the database is located at: http://systemsbiology.usm.edu/BrachyWRKY/. These instances will be continually improved over time, with the production instance being the most mature version of the knowledgebase systems.
Comparison with GRASSIUS, PlantTFDB and NCBI databases
Annotation and comparison of wheat WRKY transcription factors
The seventy-one published wheat WRKY accessions were downloaded from NCBI (November 2011). After eliminating redundant sequences, seventy-one transcription factors were left, and the amino acid sequences of those that contained complete WRKY domains were used to construct a combined phylogenetic tree containing the WRKY transcription factor families from Brachypodium, Arabidopsis, rice, soybean, and Physcomitrella patens, together with the published WRKY transcription factors from wheat. Potential wheat orthologues of Brachypodium WRKY transcription factors were also validated by BLAST searches against our dataset of Brachypodium genes using the BLAST server at The Database of Brachypodium distachyon WRKY Transcription Factors. The Group III WRKY transcription factors from maize, sorghum, switchgrass, and foxtail millet were identified by searching the genome sequences in Phytozome.
We are thankful to Drs Senthil Subramanian and Yajun Wu for critical reading of the manuscript. This project was supported by National Research Initiative grants 2008-35100-04519 and 2008-35100-05969 from the USDA National Institute of Food and Agriculture. Research in the Rushton laboratory is also supported by The United Soybean Board, The Consortium for Plant Biotechnology Research, The South Dakota Soybean Research and Promotion Council and The North Central Soybean Research Program. We thank Dr. Chaoyang Zhang at the School of Computing, the University of Southern Mississippi, and Dr. Josep Bassaganya-Riera at the Virginia Bioinformatics Institute at Virginia Tech for providing computational infrastructure for this project’s development and execution.
- The International Brachypodium Initiative: Genome sequencing and analysis of the model grass Brachypodium distachyon. Nature. 2010, 463 (7282): 763-768. 10.1038/nature08747.
- Goff SA, Ricke D, Lan TH, Presting G, Wang R, Dunn M, Glazebrook J, Sessions A, Oeller P, Varma H: A draft sequence of the rice genome (Oryza sativa L. ssp. japonica). Science. 2002, 296 (5565): 92-100. 10.1126/science.1068275.
- Bennetzen JL, SanMiguel P, Chen M, Tikhonov A, Francki M, Avramova Z: Grass genomes. Proc Natl Acad Sci USA. 1998, 95 (5): 1975-1978. 10.1073/pnas.95.5.1975.
- Devos KM, Beales J, Nagamura Y, Sasaki T: Arabidopsis-rice: will colinearity allow gene prediction across the eudicot-monocot divide?. Genome Res. 1999, 9 (9): 825-829. 10.1101/gr.9.9.825.
- Spannagl M, Mayer K, Durner J, Haberer G, Frohlich A: Exploring the genomes: from Arabidopsis to crops. J Plant Physiol. 2011, 168 (1): 3-8. 10.1016/j.jplph.2010.07.008.
- Draper J, Mur LA, Jenkins G, Ghosh-Biswas GC, Bablak P, Hasterok R, Routledge P: Brachypodium distachyon. A new model system for functional genomics in grasses. Plant Physiol. 2001, 127 (4): 1539-1555. 10.1104/pp.010196.
- Kumar S, Mohan A, Balyan HS, Gupta PK: Orthology between genomes of Brachypodium, wheat and rice. BMC Res Notes. 2009, 2: 93. 10.1186/1756-0500-2-93.
- Garvin DF: Brachypodium: a new monocot model plant system emerges. J Sci Food Agric. 2007, 87 (7): 1177-1179. 10.1002/jsfa.2868.
- Bhalla PL: Genetic engineering of wheat–current challenges and opportunities. Trends Biotechnol. 2006, 24 (7): 305-311. 10.1016/j.tibtech.2006.04.008.
- Fu D, Uauy C, Blechl A, Dubcovsky J: RNA interference for wheat functional gene analysis. Transgenic Res. 2007, 16 (6): 689-701. 10.1007/s11248-007-9150-7.
- Wolny E, Lesniewska K, Hasterok R, Langdon T: Compact genomes and complex evolution in the genus Brachypodium. Chromosoma. 2011, 120 (2): 199-212. 10.1007/s00412-010-0303-8.
- Bevan MW, Garvin DF, Vogel JP: Brachypodium distachyon genomics for sustainable food and fuel production. Curr Opin Biotechnol. 2010, 21 (2): 211-217. 10.1016/j.copbio.2010.03.006.
- Huo N, Gu YQ, Lazo GR, Vogel JP, Coleman-Derr D, Luo MC, Thilmony R, Garvin DF, Anderson OD: Construction and characterization of two BAC libraries from Brachypodium distachyon, a new model for grass genomics. Genome. 2006, 49 (9): 1099-1108. 10.1139/g06-087.
- Faris JD, Zhang Z, Fellers JP, Gill BS: Micro-colinearity between rice, Brachypodium, and Triticum monococcum at the wheat domestication locus Q. Funct Integr Genomics. 2008, 8 (2): 149-164. 10.1007/s10142-008-0073-z.
- Bortiri E, Coleman-Derr D, Lazo GR, Anderson OD, Gu YQ: The complete chloroplast genome sequence of Brachypodium distachyon: sequence comparison and phylogenetic analysis of eight grass plastomes. BMC Res Notes. 2008, 1: 61. 10.1186/1756-0500-1-61.
- Doust A: Architectural evolution and its implications for domestication in grasses. Ann Bot. 2007, 100 (5): 941-950. 10.1093/aob/mcm040.
- Opanowicz M, Vain P, Draper J, Parker D, Doonan JH: Brachypodium distachyon: making hay with a wild grass. Trends Plant Sci. 2008, 13 (4): 172-177. 10.1016/j.tplants.2008.01.007.
- Pacurar DI, Thordal-Christensen H, Nielsen KK, Lenk I: A high-throughput Agrobacterium-mediated transformation system for the grass model species Brachypodium distachyon L. Transgenic Res. 2008, 17 (5): 965-975. 10.1007/s11248-007-9159-y.
- Vain P, Worland B, Thole V, McKenzie N, Alves SC, Opanowicz M, Fish LJ, Bevan MW, Snape JW: Agrobacterium-mediated transformation of the temperate grass Brachypodium distachyon (genotype Bd21) for T-DNA insertional mutagenesis. Plant Biotechnol J. 2008, 6 (3): 236-245. 10.1111/j.1467-7652.2007.00308.x.
- Vogel J, Hill T: High-efficiency Agrobacterium-mediated transformation of Brachypodium distachyon inbred line Bd21-3. Plant Cell Rep. 2008, 27 (3): 471-478. 10.1007/s00299-007-0472-y.
- CerealsDB.uk.net. http://www.cerealsdb.uk.net/
- Rushton PJ, Somssich IE, Ringler P, Shen QXJ: WRKY transcription factors. Trends Plant Sci. 2010, 15 (5): 247-258. 10.1016/j.tplants.2010.02.006.
- Eulgem T, Rushton PJ, Robatzek S, Somssich IE: The WRKY superfamily of plant transcription factors. Trends Plant Sci. 2000, 5 (5): 199-206. 10.1016/S1360-1385(00)01600-9.
- Ren XZ, Chen ZZ, Liu Y, Zhang HR, Zhang M, Liu QA, Hong XH, Zhu JK, Gong ZZ: ABO3, a WRKY transcription factor, mediates plant responses to abscisic acid and drought tolerance in Arabidopsis. Plant J. 2010, 63 (3): 417-429. 10.1111/j.1365-313X.2010.04248.x.
- Li S, Zhou X, Chen L, Huang W, Yu D: Functional characterization of Arabidopsis thaliana WRKY39 in heat stress. Mol Cells. 2010, 29 (5): 475-483. 10.1007/s10059-010-0059-2.
- Qiu D, Xiao J, Xie W, Cheng H, Li X, Wang S: Exploring transcriptional signalling mediated by OsWRKY13, a potential regulator of multiple physiological processes in rice. BMC Plant Biol. 2009, 9: 74. 10.1186/1471-2229-9-74.
- Narusaka Y, Narusaka M, Seki M, Umezawa T, Ishida J, Nakajima M, Enju A, Shinozaki K: Crosstalk in the responses to abiotic and biotic stresses in Arabidopsis: analysis of gene expression in cytochrome P450 gene superfamily by cDNA microarray. Plant Mol Biol. 2004, 55 (3): 327-342. 10.1007/s11103-004-0685-1.
- Mare C, Mazzucotelli E, Crosatti C, Francia E, Stanca AM, Cattivelli L: Hv-WRKY38: a new transcription factor involved in cold- and drought-response in barley. Plant Mol Biol. 2004, 55 (3): 399-416. 10.1007/s11103-004-0906-7.
- Rushton DL, Tripathi P, Rabara RC, Lin J, Ringler P, Boken AK, Langum TJ, Smidt L, Boomsma DD, Emme NJ: WRKY transcription factors: key components in abscisic acid signalling. Plant Biotechnol J. 2012, 10 (1): 2-11. 10.1111/j.1467-7652.2011.00634.x.
- Proietti S, Bertini L, Van der Ent S, Leon-Reyes A, Pieterse CM, Tucci M, Caporale C, Caruso C: Cross activity of orthologous WRKY transcription factors in wheat and Arabidopsis. J Exp Bot. 2011, 62 (6): 1975-1990. 10.1093/jxb/erq396.
- Talanova VV, Titov AF, Topchieva LV, Malysheva IE, Venzhik YV, Frolova SA: Expression of WRKY Transcription Factor and Stress Protein Genes in Wheat Plants during Cold Hardening and ABA Treatment. Russ J Plant Physl. 2009, 56 (5): 702-708. 10.1134/S1021443709050173.
- Wu HL, Ni ZF, Yao YY, Sun QX, Guo GG: Cloning and expression profiles of 15 genes encoding WRKY transcription factor in wheat (Triticum aestivem L.). Prog Nat Sci. 2008, 18 (6): 697-705. 10.1016/j.pnsc.2007.12.006.
- Talanova VV, Titov AF, Topchieva LV, Malysheva IE, Venzhik YV, Frolova SA: Expression of genes encoding the WRKY transcription factor and heat shock proteins in wheat plants during cold hardening. Dokl Biol Sci. 2008, 423: 440-442. 10.1134/S0012496608060215.
- Rushton PJ, Bokowiec MT, Laudeman TW, Brannock JF, Chen X, Timko MP: TOBFAC: the database of tobacco transcription factors. BMC Bioinforma. 2008, 9: 53. 10.1186/1471-2105-9-53.
- Rushton PJ, Bokowiec MT, Han S, Zhang H, Brannock JF, Chen X, Laudeman TW, Timko MP: Tobacco transcription factors: novel insights into transcriptional regulation in the Solanaceae. Plant Physiol. 2008, 147 (1): 280-295. 10.1104/pp.107.114041.
- Tamura K, Dudley J, Nei M, Kumar S: MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Mol Biol Evol. 2007, 24 (8): 1596-1599. 10.1093/molbev/msm092.
- The GENSCAN Web Server at MIT. http://genes.mit.edu/GENSCAN.html
- Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S: MEGA5: Molecular Evolutionary Genetics Analysis using Maximum Likelihood, Evolutionary Distance, and Maximum Parsimony Methods. Mol Biol Evol. 2011, 28 (10): 2731-2739. 10.1093/molbev/msr121.
- The Gene Index Project. http://compbio.dfci.harvard.edu/cgi-bin/tgi/gimain.pl?gudb=wheat
- wDBTF. inventory Database of Wheat Transcription Factor. http://wwwappli.nantes.inra.fr:8180/wDBFT/
- The MEME Suite. http://meme.sdsc.edu/meme/intro.html
- Koonin EV, Fedorova ND, Jackson JD, Jacobs AR, Krylov DM, Makarova KS, Mazumder R, Mekhedov SL, Nikolskaya AN, Rao BS, Rogozin IB, Smirnov S, Sorokin AV, Sverdlov AV, Vasudevan S, Wolf YI, Yin JJ, Natale DA: A comprehensive evolutionary classification of proteins encoded in complete eukaryotic genomes. Genome Biol. 2004, 5 (2): R7. 10.1186/gb-2004-5-2-r7.
- Koonin EV: Orthologs, paralogs, and evolutionary genomics. Annu Rev Genet. 2005, 39: 309-338. 10.1146/annurev.genet.39.073003.114725.
- Pauling L, Zuckerkandl E: Chemical paleogenetics. Molecular "restoration studies" of extinct forms of life. Acta Chem Scand. 1963, 17: S9-S16.
- Ohno S: Evolution by Gene Duplication. 1970, Berlin-Heidelberg-New York: Springer.
- Lynch M, Force A: The probability of duplicate gene preservation by subfunctionalization. Genetics. 2000, 154: 459-473.
- National Center for Biotechnology Information. http://www.ncbi.nlm.nih.gov/
- Plant Transcription Factor Database. http://planttfdb.cbi.edu.cn/index.php?sp=Bd
- Zhang Y, Wang L: The WRKY transcription factor superfamily: its origin in eukaryotes and expansion in plants. BMC Evol Biol. 2005, 5: 1. 10.1186/1471-2148-5-1.
- Wei KF, Chen J, Chen YF, Wu LJ, Xie DX: Molecular phylogenetic and expression analysis of the complete WRKY Transcription factor family in maize. DNA Res. 2012, 10.1093/dnares/dsr048.
- Wanke D: Phylogenetic and comparative gene expression analysis of barley (Hordeum vulgare) WRKY transcription factor family reveals putatively retained functions between monocots and dicots. BMC Genomics. 2008, 9: 194. 10.1186/1471-2164-9-194.
- Wu KL, Guo ZJ, Wang HH, Li J: The WRKY family of transcription factors in rice and Arabidopsis and their origins. DNA Res. 2005, 12 (1): 9-26. 10.1093/dnares/12.1.9.
- Babu MM, Iyer LM, Balaji S, Aravind L: The natural history of the WRKY-GCM1 zinc fingers and the relationship between transcription factors and transposons. Nucleic Acids Res. 2006, 34 (22): 6505-6520. 10.1093/nar/gkl888.
- Qiu Y, Jing S, Fu J, Li I, Yu D: Cloning and analysis of expression profile of 13 WRKY genes in rice. Chin Sci Bull. 2004, 49 (20): 2159-2168.
- Niu C-F, Wei W, Zhou Q-Y, Tian A-G, Hao Y-J, Zhang W-K, Ma B, Lin Q, Zhang Z-B, Zhang J-S, Chen S-Y: Wheat WRKY genes TaWRKY2 and TaWRKY19 regulate abiotic stress tolerance in transgenic Arabidopsis plants. Plant Cell Environ. 2012, 35 (6): 1156-1170. 10.1111/j.1365-3040.2012.02480.x.
- The Apache Software Foundation. http://www.apache.org/
- Yilmaz A, Nishiyama MY, Fuentes BG, Souza GM, Janies D, Gray J, Grotewold E: GRASSIUS: a platform for comparative regulatory genomics across the grasses. Plant Physiol. 2009, 149 (1): 171-180. 10.1104/pp.108.128579.
- Zhang H, Jin J, Tang L, Zhao Y, Gu X, Gao G, Luo J: PlantTFDB 2.0: update and improvement of the comprehensive plant transcription factor database. Nucleic Acids Res. 2011, 39 (Database issue): D1114-D1117.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.