- Research article
- Open Access
Classification and phylogenetic analyses of the Arabidopsis and tomato G-type lectin receptor kinases
BMC Genomics volume 19, Article number: 239 (2018)
Pathogen perception by plants is mediated by plasma membrane-localized immune receptors that have varied extracellular domains. Lectin receptor kinases (LecRKs) are among these receptors and are subdivided into 3 classes, C-type LecRKs (C-LecRKs), L-type LecRKs (L-LecRKs) and G-type LecRKs (G-LecRKs). While C-LecRKs are represented by one or two members in all plant species investigated and have unknown functions, L-LecRKs have been characterized in a few plant species and have been shown to play roles in plant defense against pathogens. Whereas Arabidopsis G-LecRKs have been characterized, this family of LecRKs has not been studied in tomato.
This investigation updates the current characterization of Arabidopsis G-LecRKs and characterizes the tomato G-LecRKs, using LecRKs from the monocot rice and the basal eudicot columbine to establish a basis for comparisons between the two core eudicots. Additionally, we suggest revisiting the parameters established for the Arabidopsis LecRK nomenclature, for both Arabidopsis and tomato. Moreover, using phylogenetic analysis, we show the relationships among members of G-LecRKs from all three eudicot plant species. Furthermore, by investigating the presence of motifs in G-LecRKs, we identified conserved motifs among members of G-LecRKs in tomato and Arabidopsis, with five present in at least 30 of the 38 Arabidopsis members and in at least 45 of the 73 tomato members.
This work characterized tomato G-LecRKs and added members to the currently characterized Arabidopsis G-LecRKs. Additionally, protein sequence analysis showed an expansion of this family in tomato as compared to Arabidopsis, and the existence of conserved common motifs in the two plant species as well as conserved species-specific motifs.
In the constant war against pathogens, plants are equipped with a surveillance system that relies on pattern-recognition receptors (PRRs), proteins localized at the plasma membrane with ectodomains, that screen the environment for conserved microbial or pest- and damage-associated signals. In addition to the ectodomain, a subgroup of these PRRs has intracellular kinase domains and are therefore known as receptor kinases (RKs). Plant RKs have undergone a recent expansion, with the Arabidopsis thaliana (Arabidopsis) genome encoding more than 600 RKs . According to their ectodomains, RKs can be further classified into specific subgroups, such as leucine-rich repeat RKs (LRR-RKs) and lectin RKs (LecRKs). Receptor kinases are involved in several cellular processes, from adaptation to abiotic stresses to defense responses against pathogens and pests and interactions with microbial symbionts [2,3,4,5,6,7,8,9,10,11,12,13,14]. Several RKs and their cognate elicitor pairs have been described, mostly engaging the recognition of bacteria and fungi by plants [14,15,16].
The best-characterized PRR-elicitor pair is the Arabidopsis LRR-RK FLS2 (FLAGELLIN SENSITIVE2) and the peptide flg22, consisting of a stretch of 22 amino acids of the N-terminal bacterial flagellin . In addition to Arabidopsis, FLS2 orthologs have been identified in several plant species including tomato (Solanum lycopersicum), grapevine (Vitis vinifera), rice (Oryza sativa) and citrus (Citrus paradisi, C. reticulata and Fortunella margarita) [18,19,20,21]. Interestingly in tomato, a flagellin-derived peptide distinct from flg22, flgII-28, is perceived by the LRR-RK FLS3, and similar to FLS2, its perception and downstream signaling requires a second LRR-RK, BAK1 (BRASSINOSTEROID INSENSITIVE 1-ASSOCIATED KINASE 1) [22, 23]. Other receptor-ligand pairs include chitin perception by the lysin-motif RK (LysM-RKs) LYK5 and xylanase perception by the LRR-RK EIX2 [24, 25]. Although a co-receptor has not been characterized for xylanase perception, chitin perception requires participation of the LysM-RK CERK1 (CHITIN ELICITOR RECEPTOR KINASE 1) . Interestingly, chitin perception in rice is mediated by the LysM-receptor like protein CEBiP (CHITIN ELICITOR BINDING PROTEIN), which lacks a kinase domain and relies on its co-receptor CERK1 for kinase signaling of chitin perception .
The LecRKs are a second type of receptor known for their role in binding various carbohydrates. Based on their ectodomains, LecRKs can be classified into C-type, L-type or G-type (Fig. 1). The C-type (calcium-dependent) LecRKs (C-LecRK) contain the C-type motif that is commonly found in several proteins from mammals, and these proteins have been shown to have a role in innate immunity [10, 28, 29]. Interestingly, in plants this LecRK group is represented by only a single gene in Arabidopsis, rice, and tomato, and two genes in wheat (Triticum aestivum) [28, 30, 31].
The L-type (legume-like) LecRKs (L-LecRKs) consist of a large number of family members and have well-characterized roles in plant defense. Investigations in Arabidopsis, rice, tomato, Nicotiana benthamiana and wheat revealed 45, 72, 22, 37 and 84 members of this family, respectively [28, 30, 31]. Several reports link genes of this family to defense against pathogens; for example, AtLecRK-I.9 against the bacterial pathogen Pseudomonas syringae pv. tomato (Pst), AtLecRK-IX.1 and LecRK-IX.2 against the pathogenic oomycetes Phytophthora brassicae and P. capsici, AtLecRK-I.9 against P. infestans, and AtLecRK-VI.2 against the pathogenic bacteria P. syringae and Pectobacterium carotovorum [13, 35]. Additionally, the L-LecRK AtLecRK-I.9 has been implicated in perception of extracellular ATP, a danger molecule [36, 37].
The G-type LecRKs (G-LecRKs) are proteins with an ectodomain that resembles the Galanthus nivalis agglutinin (GNA) mannose-binding motif [10, 38]. Since the discovery of the first G-LecRK, GNA, in 1987, this group has been renamed and refined several times . The specificity of GNA for binding mannose differentiated it from the mannose/glucose specificity of L-LecRKs and led other proteins of this group to be generically called alpha-D-mannose-specific lectins [39, 40]. Soon after the discovery of GNA, several studies targeting other members of Amaryllidaceae, along with Orchidaceae and Alliaceae discovered more mannose-specific lectins [41, 42]. Based on the apparently narrow taxonomic occurrence of the proteins at the time, the group was then renamed monocot mannose-binding lectins. Early amino acid sequence-based analyses also began to suggest that these proteins might be encoded by a large multigene family . Subsequently, similar proteins were discovered in both liverwort (Marchantia polymorpha) and yew (Taxus media) necessitating another renaming to the more general GNA-related lectins [44, 45]. Because GNA was first isolated from Galanthus nivalis bulbs and other G-LecRKs were present in high amounts in bulbs of other species, this group has also been called Bulb- or B-type lectins. This name has been widely adopted because it is used to identify this domain in both the PROSITE and Pfam databases (PS50927 and PF01453, respectively). Despite the widespread use of the “B-type” descriptor in databases and genome annotations, GNA-related or G-type lectin is currently the preferred term for these proteins.
Previous investigations identified 32 members of G-LecRKs in Arabidopsis, 100 in rice and 177 in wheat [28, 31]. The best-known members of this group are the S-locus (S-locus glycoprotein/SLG containing) RKs, known for their role in self-incompatibility in flowering plants [46, 47]. Besides the G-type lectin and the kinase domains, G-LecRKs can have additional domains such as a cysteine-rich domain (Fig. 1), known as the epidermal growth factor (EGF) domain, which is thought to play a role in disulfide bond formation . Additionally, family members may contain the plasminogen-apple-nematode (PAN) motif, which likely has a role in protein-protein or protein-carbohydrate interactions (Fig. 1) .
Typically, members of large families do not have consistent nomenclature, as frequently not all members are identified at the same time, and gains and losses in different species make identification of orthologous genes and gene groups difficult. While the Arabidopsis L-LecRK family members have a clear systematic nomenclature based on chromosome location and amino acid and nucleotide identity , currently members of the G-LecRKs do not have such nomenclature. Similarly, although L-LecRKs have been characterized in different plant species [28, 30, 31], tomato G-LecRKs have not been described to date. Given the role of these proteins in defense, an understanding of their diversity in tomato is critical to improving crop resistance to diseases and pests.
For this work, we searched the genomes of Arabidopsis and tomato to identify and characterize G-LecRKs, and performed phylogenetic analyses on the aligned sequences, using C-LecRKs and L-LecRKs as outgroups. To draw inferences regarding expansion vs loss within gene clades, we included G-LecRKs from the monocot rice and the basal eudicot columbine (Aquilegia coerulea), which diverged prior to the origin of the core eudicot clade that includes both Arabidopsis and tomato and thus serves to polarize the evolutionary trends. The analyses allowed identification of incorrect gene annotations in genome databases, and evaluation of sequence similarity between G-LecRKs and identification of instances of gene clade expansion or gene loss in Arabidopsis and tomato. Based on this investigation, we suggest a nomenclature for members of this gene family from both Arabidopsis and tomato.
Database searches, protein domain and genome organization
To identify Arabidopsis (Arabidopsis thaliana) G-LecRKs, a first search was performed on The Arabidopsis Information Resource (TAIR) website (http://arabidopsis.org) using the lectin domain of At1g61550 as the query, followed by a second search using the lectin domains of At1g61400, At2g19130, At4g21390 and At5g60900. Results with an e-value < 0.1 were considered G-LecRK candidates. The localization of G-LecRKs on the Arabidopsis genome was visualized using the chromosomal map tool from TAIR (http://arabidopsis.org/jsp/ChromosomeMap/tool.jsp). Arabidopsis predicted kinase domain sequences were aligned using ClustalW and the alignment was manually checked to identify the kinase subdomains using AtLecRK-VI.2 as a reference [13, 30].
To identify tomato (Solanum lycopersicum) G-LecRKs, the At1g61550 lectin domain was used as the query at the Sol Genomics Network (SGN) (https://solgenomics.net) and the National Center for Biotechnology Information (NCBI) (https://www.ncbi.nlm.nih.gov/) websites. Results with an e-value < 0.1 were considered G-LecRK candidates. After the initial search using the At1g61550 G-lectin domain, a second search was performed in NCBI and both searches were cross-analyzed to compile a list of all possible G-LecRK candidates. Tomato G-LecRK sequences were mapped according to Shearer et al. To map tomato sequences onto chromosomes, gene position information was acquired from Phytozome (https://phytozome.jgi.doe.gov/pz/portal.html) and JBrowse at SGN. Mapping of G-LecRKs onto tomato chromosomes was performed manually using NCBI Map Viewer (https://www.ncbi.nlm.nih.gov/genome/gdv/). Tomato predicted kinase domain sequences were aligned with the kinase domain of Solyc03g006720 using ClustalW and the alignment was manually checked to identify the kinase subdomains.
To identify columbine (Aquilegia coerulea) G-LecRKs, the same Arabidopsis gene (At1g61550) was used in a BLASTp search of the genome sequence hosted by Phytozome. Three hits from this search were then chosen for another round of searching with BLASTp. Only BLAST hits with an e-value < 0.1 were retained. A search was also conducted using the keyword “lectin”. The resulting sequences were passed to IPRscan on the University of California at Riverside (UCR) High-Performance Computing Center (HPCC) cluster for protein domain identification using Pfam and PROSITE.
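The e-value cutoff applied in the searches above can be illustrated with a short sketch. It assumes the hits were exported in the standard 12-column BLAST tabular layout (e-value in the eleventh column), which the text does not state. The function name `filter_hits` and the example rows are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

E_VALUE_CUTOFF = 0.1  # threshold stated in the Methods


def filter_hits(lines: Iterable[str],
                cutoff: float = E_VALUE_CUTOFF) -> List[Tuple[str, float]]:
    """Return (subject_id, best e-value) pairs for hits below the cutoff.

    Assumes tab-separated BLAST tabular rows:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    best: Dict[str, float] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        subject, evalue = fields[1], float(fields[10])
        # keep only the lowest e-value seen for each subject sequence
        if evalue < cutoff and evalue < best.get(subject, float("inf")):
            best[subject] = evalue
    return sorted(best.items(), key=lambda item: item[1])


# Hypothetical rows, for illustration only
rows = [
    "q\tgeneA\t60.0\t140\t50\t2\t1\t140\t30\t170\t1e-40\t150",
    "q\tgeneB\t25.0\t80\t55\t3\t1\t80\t10\t90\t0.5\t20",
    "q\tgeneA\t40.0\t100\t55\t3\t1\t100\t10\t110\t1e-5\t50",
]
candidates = filter_hits(rows)
```

A permissive cutoff such as 0.1 favors sensitivity, which is why the candidates are subsequently screened by domain content.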
For G-LecRK identification in rice (Oryza sativa), in addition to the 87 proteins listed in Vaid et al., protein sequences annotated with the Pfam code PF01453 were retrieved from the Rice Genome Annotation Project (http://rice.plantbiology.msu.edu/). All sequences were passed to IPRscan using the same parameters as for columbine.
Full-length protein sequences were downloaded as above from TAIR (Arabidopsis), Sol Genomics Network (tomato), Phytozome (columbine), or the Rice Genome Annotation Project (rice) and aligned using MUSCLE on the UCR HPCC cluster . Eight outgroup protein sequences representing one C-LecRK and L-LecRK each from Arabidopsis, tomato, rice and columbine were included in the alignments. These sequences were from the single Arabidopsis C-LecRK gene, At1g52310, and its top BLASTp hits from tomato (Solyc02g068370.2), rice (Os01g0104000.1), and columbine (Aqcoe2G393700); and an Arabidopsis L-LecRK (At1g52310) and its top BLASTp hits from tomato (Solyc02g068370.2), rice (Os04g0531500.1) and columbine (Aqcoe2G393700). The initial sequence alignment was generated using data from Arabidopsis, tomato, columbine, and rice. Subsequently, a three-species alignment was generated using only sequences from Arabidopsis, tomato, and columbine.
The JTT model of protein sequence evolution was determined to best fit the data based on Akaike Information Criteria using the web server version of SMS, and phylogenetic trees were subsequently constructed from both the four-species and three-species amino acid alignments using this evolutionary model in RAxML v8 on the UCR HPCC cluster with 1000 bootstraps [53,54,55]. A collapsed tree was also constructed from the three- and four-species trees in TreeGraph 2 by collapsing any node with < 70% bootstrap support into a polytomy.
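The collapsing step can be sketched in a few lines. This is an illustration of the general idea only, not the TreeGraph 2 procedure: the `Node` class, the function names, and the toy tree are all assumptions introduced here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str = ""
    support: Optional[float] = None  # bootstrap % for the clade rooted here
    children: List["Node"] = field(default_factory=list)


def collapse(node: Node, threshold: float = 70.0) -> Node:
    """Return a copy in which internal nodes below the threshold are removed.

    Children of a removed node are attached to its parent, which
    turns the weakly supported split into a polytomy.
    """
    new_children: List[Node] = []
    for child in node.children:
        child = collapse(child, threshold)
        weak = (bool(child.children)
                and child.support is not None
                and child.support < threshold)
        if weak:
            new_children.extend(child.children)
        else:
            new_children.append(child)
    return Node(node.name, node.support, new_children)


def to_newick(node: Node) -> str:
    """Serialize to Newick (without the trailing semicolon)."""
    if not node.children:
        return node.name
    inner = ",".join(to_newick(c) for c in node.children)
    label = "" if node.support is None else str(int(node.support))
    return f"({inner}){label}"


# Toy tree ((A,B)30,(C,D)95): the 30% node is collapsed, the 95% node is kept
toy = Node(children=[
    Node(support=30, children=[Node("A"), Node("B")]),
    Node(support=95, children=[Node("C"), Node("D")]),
])
collapsed = to_newick(collapse(toy))
```

In the toy tree, A and B end up attached directly to the root, next to the retained (C,D) clade.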
Domain and motif identification
Protein domains were investigated using multiple online programs, including InterPro (https://www.ebi.ac.uk/interpro/) and TMHMM (http://www.cbs.dtu.dk/services/TMHMM/). Investigation of conserved motifs in the ectodomains of Arabidopsis and tomato G-LecRKs was performed using the default settings at MEME (Multiple EM for Motif Elicitation) Suite 4.11.2 (http://meme-suite.org/tools/meme).
Subcellular localization prediction
Multiple protein subcellular localization tools were used to localize the Arabidopsis and tomato G-LecRKs. Arabidopsis gene identifiers were used to query “The SUBcellular localization database for Arabidopsis proteins”, SUBA3 (http://suba3.plantenergy.uwa.edu.au/) [58, 59]. Additionally, amino acid sequences of both Arabidopsis and tomato G-LecRKs were analyzed using TargetP 1.1 Server (http://www.cbs.dtu.dk/services/TargetP/) and “subCELlular LOcalization predictor” CELLO v.2.5 (http://cello.life.nctu.edu.tw/) [60, 61].
Annotation of Arabidopsis G-LecRKs
To identify the Arabidopsis G-LecRKs, a BLASTp analysis was performed at the TAIR website using the region comprising the predicted G-type lectin domain, amino acids 24-170 of At1g61550. The search resulted in 44 sequences. From these sequences, four (At1g61400, At2g19130, At4g21390 and At5g60900) were chosen for use as new queries to retrieve additional candidates. These analyses resulted in a total of 49 proteins with a G-lectin domain (Table 1). The majority of these proteins had SLG (75%) and PAN (82%) domains but only 10 had an EGF domain. Of these 49 sequences, 38 proteins also had kinase domains and were considered for further analyses (Table 1).
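The selection rule used throughout the annotation (a candidate counts as a G-LecRK only if it has both a G-type lectin domain and a kinase domain) can be written as a simple set test. The sketch below is illustrative: the domain labels, the function name `select_g_lecrks`, and the three toy proteins are assumptions, not entries from Table 1.

```python
from typing import Dict, List, Set

# Domains that must both be present for a protein to count as a G-LecRK
REQUIRED_DOMAINS: Set[str] = {"G-lectin", "kinase"}


def select_g_lecrks(domains: Dict[str, Set[str]]) -> List[str]:
    """Return the sorted IDs of proteins carrying every required domain."""
    return sorted(
        protein for protein, found in domains.items()
        if REQUIRED_DOMAINS <= found  # subset test: all required domains found
    )


# Toy domain predictions (hypothetical proteins)
predicted = {
    "protA": {"G-lectin", "SLG", "PAN", "TM", "kinase"},  # retained
    "protB": {"G-lectin", "PAN"},                         # lectin only
    "protC": {"SLG", "PAN", "TM", "kinase"},              # no lectin domain
}
selected = select_g_lecrks(predicted)
```

The third toy protein has SLG, PAN, TM and kinase domains but no lectin domain, so the rule rejects it.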
Previous characterization of Arabidopsis G-LecRKs included 31 sequences, all also identified in our search. This same study also identified a protein, At1g61460, that was not detected in our search. To confirm the identity of this protein, it was used in a domain search using InterPro. Domain predictions showed that At1g61460 has SLG, PAN, transmembrane (TM) and kinase domains, but not a lectin domain. Because of the absence of the lectin domain, this protein was not considered a G-LecRK and was not used in further analyses.
Annotation of tomato G-LecRKs
Using the same strategy used to retrieve the Arabidopsis G-LecRKs, the tomato genome was queried for G-type lectin homologs using the lectin domain of At1g61550. Two databases, SGN and NCBI, were searched. The search against SGN resulted in 21 sequences. The search against NCBI resulted in numerous hits, including a number of different isoforms of variable lengths of the same protein. The combined results from these two searches yielded 88 distinct sequences with a G-type lectin domain (Table 2). To assure a comprehensive search, three random tomato G-type lectins were chosen to query the tomato genome again using their predicted G-lectin domain. This resulted in two additional candidates, Solyc07g053220 and Solyc05g008310.
Three of the identified G-type lectin-containing sequences were misannotated. Solyc03g006730, Solyc04g008400, and Solyc07g055640 each contained two G-LecRKs in tandem and were therefore each split into two (Solyc03g006730.A and Solyc03g006730.B; Solyc04g008400.A and Solyc04g008400.B; and Solyc07g055640.A and Solyc07g055640.B) (Table 2). Thus, a total of 93 tomato sequences were identified with G-type lectin domains. The majority (72%) of these tomato sequences had an SLG domain and nearly two-thirds (63.5%) had a PAN domain. However, similar to Arabidopsis, the great majority lacked the EGF domain, with only seven proteins containing this domain (Table 2). Of the 93 tomato sequences, 73 proteins had both G-type lectin and kinase domains and were considered G-LecRKs for further analyses.
Annotation of columbine G-LecRKs
For columbine G-LecRK identification, the same Arabidopsis G-LecRK sequence, At1g61550, was used as a query sequence for a BLASTp search of the genome sequence. Three hits from this search were then chosen for another round of searching with BLASTp. Taking advantage of the functional genome annotations available, a keyword search was also conducted of the columbine genome using the keyword “lectin”. After merging duplicates, this yielded 166 unique protein sequences. Of these, two could not be annotated at all, while 43 others lacked a G-type lectin domain, kinase domain, or both (Additional file 1). Of the 121 columbine sequences, 59 proteins had both G-type lectin and kinase domains and were used in the phylogenetic analysis.
Annotation of rice G-LecRKs
For G-LecRK identification in rice, the 87 previously published proteins were all included. In addition, protein sequences annotated with the Pfam code PF01453 (B-type lectin, synonymous with G-type lectin) were retrieved from the genome sequence. This generated 143 sequences, including the 87 previously reported, but also including different isoforms of some proteins. In the case of isoforms, only the longest was retained. Additionally, LOC_Os09g37840.1 appeared to be a misannotation containing three G-LecRKs in tandem. This locus was split into three sequences (LOC_Os09g37840.1.A, .B, and .C). This resulted in 145 protein sequences with a G-type lectin domain (Additional file 2). Of these 145, 122 rice sequences also had a kinase domain and were used in the phylogenetic analysis.
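Retaining only the longest isoform per locus is a small bookkeeping step that can be sketched as follows. The sketch assumes that isoforms share a locus name and differ only in a numeric suffix after the final dot (as in `LOC_Os09g37840.1`). The function name and the toy sequences are hypothetical.

```python
from typing import Dict, Tuple


def longest_isoforms(sequences: Dict[str, str]) -> Dict[str, str]:
    """Keep one protein per locus: the longest isoform.

    Assumes IDs of the form '<locus>.<isoform number>'. On ties,
    the isoform encountered first is kept.
    """
    best: Dict[str, Tuple[str, str]] = {}
    for seq_id, seq in sequences.items():
        locus = seq_id.rsplit(".", 1)[0]  # strip the isoform suffix
        if locus not in best or len(seq) > len(best[locus][1]):
            best[locus] = (seq_id, seq)
    return {seq_id: seq for seq_id, seq in best.values()}


# Toy input (hypothetical loci and sequences)
toy_proteins = {
    "locusA.1": "MKT",
    "locusA.2": "MKTLLV",
    "locusB.1": "MA",
}
kept = longest_isoforms(toy_proteins)
```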
Alignment of the 38 Arabidopsis, 73 tomato, 59 columbine, and 122 rice putative G-LecRK proteins was used to construct a phylogenetic tree with 1000 bootstrap replicates using RAxML. Orthologous copies of a single C- and L-LecRK from Arabidopsis, tomato, rice, and columbine were used as outgroups to root the tree. In this four-species analysis, the C-LecRKs form a sister-clade to a large clade that includes all the other genes, including the L-LecRKs (Additional file 3). The clade containing all of the putative G-LecRKs and the intended outgroup L-LecRKs is further divided into two large clades, although with weak support. The L-LecRKs are included as the first branch in one of these two clades, suggesting that in a phylogenetic context, they should be considered G-LecRKs. In general, support in this tree for deep nodes is weak, with much stronger support towards the tips, suggesting rapid diversification of this gene group, and making interpretation of clade relationships difficult. In addition, the numerous rice genes largely cluster into clades that are nearly or entirely rice-specific (Additional file 3). This indicates extensive expansion of the G-LecRKs independently in rice relative to the eudicots, and makes interpretation of the relationships between Arabidopsis and tomato genes more difficult. For that reason, we performed a second phylogenetic analysis including only Arabidopsis, tomato, and columbine.
The results of this three-species analysis (Additional file 4) mirror those of the four-species analysis, with the C-LecRK outgroup as sister to a large clade that includes both the L- and G-LecRKs. As in the first analysis, this clade is further subdivided into two major clades, one of which (Clade A) includes the L-LecRKs, again suggesting they are not a group distinct from the G-LecRKs. Also consistent with the first analysis, support for deep nodes is weak. For instance, the first branch of clade A (Additional file 4) consists of the three L-LecRK genes that were intended to serve as outgroup with the C-LecRKs. The next branch, with 30% support, includes only one gene each from Arabidopsis and tomato, and the following branch, with 24% support, includes 8 genes from columbine only. This implies that a columbine gene has been lost from the first clade, and that Arabidopsis and tomato genes have been lost from the second. With such low support, however, it is not possible to rule out a topology in which all these genes are members of a single clade, with one Arabidopsis, one tomato, and 8 columbine genes, indicating diversification in columbine rather than multiple losses.
Because of the weak support at many key nodes, for analysis of clade relationships and membership, we used a tree in which all nodes with less than 70% support were collapsed into polytomies (Fig. 2). Although the resulting topology includes fewer resolved relationships, those represented are more robustly supported in the data, providing a stronger basis for exploring expansions and losses in Arabidopsis and tomato. The two large clades remain, although their relationship to the outgroup is unresolved. This suggests an early duplication, prior to the divergence of monocots, producing two independently diverging clades of G-LecRKs. Available data do not show a clear distinction between these groups in function or expression; members of both groups show response (functional or regulatory) to various biotic and abiotic stresses [47, 64,65,66,67,68,69]. Several members of clade A (Fig. 2) are implicated in growth and development processes, which have so far not been reported for clade B members; however, relatively few clade B members have been characterized at any level [70, 71].
A glance at the phylogenetic tree shows that within clades, there has been dramatically differential expansion of specific G-LecRK lineages in specific species. For the most part, Arabidopsis, tomato, and columbine genes form distinct species-specific groups within larger clades, suggesting that there have been multiple duplications leading to a large number of gene clades, but that after these duplications, there has been separate diversification of gene groups within each species. Expression and functional data are currently available only for Arabidopsis genes, and the diversity of functions, even within a single clade, suggests that this is a rapidly evolving group [66, 72,73,74,75,76,77,78,79,80]. This makes prediction of function in tomato difficult in most cases.
Clade A1, with 88% support, includes a total of 7 columbine, 7 Arabidopsis, and 28 tomato sequences. This clade is further subdivided into two successive lineages with multiple Arabidopsis and tomato genes but no columbine gene, suggesting loss of columbine genes and additional duplications separately within both Arabidopsis and tomato. The first clade (A1a) includes the Arabidopsis gene At1g11340.1, the expression of which has been shown to be suppressed by mevalonic acid . The second (A1b) includes the S-locus/ARK3 gene At4g21380.1, involved in self incompatibility, which is sister to the sister-gene pair At1g65790.1, upregulated by Fusarium, salt, and flg22, and At1g65800.1 (SD1-6/ARK2), involved in lateral root formation [78, 79, 81, 82]. Clade A1a is sister to a large clade (A1c) with poor internal resolution that includes 7 columbine, 2 Arabidopsis, and 24 tomato genes, indicating dramatic relative expansion within tomato relative to the other two species. Of the two Arabidopsis genes found in this clade, At4g27300.1 is upregulated by osmotic stress and trehalose, and downregulated by sucrose, whereas At4g27290.1 is induced by cold [72, 83]. The wide variation in reported functions for these Arabidopsis genes, and the lack of data for the other species, makes it difficult to generalize regarding the function of genes of individual clades and therefore to extrapolate and predict the functions of the tomato genes.
Clade A includes one other large clade with 97% support (A2) that is further subdivided into two clades with one columbine gene (Aqcoe1g490600) that does not fall into either. Clade A2a, with 100% support, includes 5 genes: one columbine and two each Arabidopsis and tomato. Because the two tomato genes form a sister-pair, as do the two Arabidopsis genes, this indicates independent duplications in those two species. At1g61610.1 has been shown to be upregulated by flg22, bacterial lipopolysaccharide (LPS), and Pst strain DC3000, suggesting a role in defense against bacterial pathogens [69, 84]. Its paralog At4g21390.1 is also implicated in defense and is upregulated in the presence of fungal elicitors . Although it is clear the function of these genes is highly labile, overall it suggests a role in defense for this clade.
Clade A2b, with 81% support, includes two subclades, both with 99% support. Both clades include sequences from both Arabidopsis and tomato, but only one (A2b1) also includes columbine, indicating a duplication that produced the two clades but loss of the columbine gene from one of the clades (A2b2). Both of these sister-clades have experienced expansion in both tomato and Arabidopsis, but A2b1 has a somewhat greater number in tomato (7 vs 4 in Arabidopsis), whereas A2b2 has the reverse, with substantially more in Arabidopsis (13 vs 3 in tomato). The presence of distinct Arabidopsis and tomato genes again suggests, in these two clades, independent expansion in the two species. Clade A2b1 includes Arabidopsis EGM1 (ENHANCED GROWTH ON MANNITOL1; At1g11300) and EGM2 (At1g11305), paralogs implicated in plant shoot growth and mannitol stress, as well as CBRLK1 (CALMODULIN-BINDING RECEPTOR-LIKE CYTOPLASMIC KINASE1; At1g11350), a negative regulator of immunity against Pst, and At1g1130.2, which likely also plays a role in bacterial immunity [65, 70, 72]. The topology of the clade suggests that the role for shoot growth is derived. Expression of the Arabidopsis genes in clade A2b2 has been shown to respond to a wide variety of factors, and these genes, as in other clades, play roles in plant development and defense responses [71, 75, 76, 84, 85]. Among these is the LPS receptor LORE (LIPOPOLYSACCHARIDE-SPECIFIC REDUCED ELICITATION/ SD1-29; At1g61380), known to mediate LPS sensitivity in Brassicaceae. This LPS sensitivity does not appear to be present in Solanaceae.
Patterns of diversification are more varied and complex in clade B, and it includes only 7 genes from Arabidopsis, along with 23 from tomato and 37 from columbine. This clade is divided into two subclades, one of which (B1, 100% support) shows diversification in columbine, with 9 genes compared to one each in Arabidopsis and tomato. Although this Arabidopsis gene (At2g19130) has not been characterized, its ortholog in rice (OsSIK2; Os07g0186200) is implicated in salt and drought response . The other clade, B2 (82% support), is further subdivided into one small clade that includes only one gene from each species, and a large clade with multiple subclades. This small clade B2a (100% support), one of the very few clades in the tree that has only one gene per species, has no evidence of diversification in any of the species. The Arabidopsis gene (At4g00340.1) has not been characterized, but its ortholog in strawberry (M2F10) is upregulated in response to infection by the fungus Colletotrichum acutatum . Given that many other G-LecRKs respond to fungal pathogens, it is difficult to hypothesize why this particular clade has not undergone the type of expansion seen in nearly every other clade.
The sister clade to B2a (B2b, with 95% support) is subdivided into 5 clades that are unresolved relative to each other. One of these (B2b1) also shows relatively little expansion: it consists of two sister-clades, each of which has genes from all three species. This indicates a duplication before the diversification of the eudicots, but the only further expansion is a single columbine duplication. The two Arabidopsis genes in clade B2b1 respond to abiotic stress: At2g41890.1 is downregulated in response to gravity, and At1g34300.1 responds to water loss and decreased dry weight [72, 74]. This suggests a role in abiotic stress response for the tomato and columbine genes as well.
Clade B2b contains two larger subclades that show greater diversification. Clade B2b2 (100% support) has one Arabidopsis member, At5g60900.1, which is implicated in both biotic (upregulated by Fusarium and LPS) and abiotic (downregulated by cold) responses [72, 78, 88]. Although resolution within this clade is poor, the nested position of the Arabidopsis gene suggests that the low number of genes from this species may be the result of loss. However, further resolution could show instead independent diversification in tomato and columbine. Clade B2b3 (99% support) is subdivided into two sister clades, one of which consists entirely of tomato and columbine genes, strongly suggesting loss in Arabidopsis. The other subclade consists of two sister-clades each of which has one gene from each species, another example of a single duplication followed by no additional diversification in any species. One of the Arabidopsis genes, SD2-5 (At4g32300.1), is not characterized, but the rice homolog, Pi-d2, confers resistance to the fungal pathogen Magnaporthe grisea . The other Arabidopsis gene, At5g35370.1, is downregulated in response to nematode infection and is important in salt and drought tolerance, thereby showing roles in both biotic and abiotic responses [90, 91].
Kinase domain analysis of Arabidopsis G-LecRKs
The presence of the 11 known kinase subdomains and sites essential for the catalytic activity of the G-LecRKs was investigated by aligning the amino acid sequences of the kinase domains to the LecRK-VI.2 kinase domain. The alignment revealed overall conservation of the ATP binding (consensus motif G-x-G-x-x-G-x-V) and the catalytic sites (consensus motif H-R-D-L-K-x-x-N), with a few substitutions in the other kinase subdomains (Additional file 5). Two of the G-LecRK sequences, At1g67520 and At5g24080, revealed incomplete kinase domains, lacking four (VIII – XI) and seven (I-VII) of the 11 kinase subdomains, respectively. Additionally, At2g41890 has several mutations and deletions, notably in the ATP binding site in subdomain I, as well as amino acid changes in its catalytic loop in subdomain VI (Additional file 5). The essential arginine and aspartic acid residues are replaced by glycine and asparagine, respectively. Therefore, At1g67520, At5g24080, and At2g41890 are likely inactive kinases.
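The two consensus motifs quoted above translate directly into regular expressions, with each "x" standing for any residue. The sketch below is a quick screen under that assumption and is not the alignment-based inspection described in the text. The function name and both toy sequences are hypothetical.

```python
import re
from typing import Dict

# Consensus motifs from the text; x = any amino acid
ATP_BINDING = re.compile(r"G[A-Z]G[A-Z]{2}G[A-Z]V")  # G-x-G-x-x-G-x-V
CATALYTIC = re.compile(r"HRDLK[A-Z]{2}N")            # H-R-D-L-K-x-x-N


def kinase_motifs(sequence: str) -> Dict[str, bool]:
    """Report whether each consensus motif occurs in a protein sequence.

    Alignment gap characters ('-') are removed before searching.
    """
    residues = sequence.upper().replace("-", "")
    return {
        "atp_binding": ATP_BINDING.search(residues) is not None,
        "catalytic": CATALYTIC.search(residues) is not None,
    }


# Toy sequences (hypothetical): the second carries R->G and D->N
# substitutions in the catalytic motif
intact = "MAGSGGFGSVYKAAAHRDLKPSNILL"
altered = "MAGSGGFGSVYKAAAHGNLKPSNILL"
intact_result = kinase_motifs(intact)
altered_result = kinase_motifs(altered)
```

An exact-match screen of this kind would also flag conservative substitutions that may not abolish activity, so any sequence it flags still calls for inspection of the alignment.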
Nomenclature for the Arabidopsis G-LecRKs
The L-LecRKs were previously classified and a nomenclature was established based on the amino acid and nucleotide sequences of the 45 members of the family . In that system, clades, defined as groups of genes with at least 50% identity among homologs at both the nucleotide and amino acid level, were designated by Roman numerals . Following a similar approach, but basing clade membership on the results of our phylogenetic analysis rather than percent sequence identity, we classified the 38 members of the Arabidopsis G-LecRK group into eight clades (I-VIII) (Fig. 3, Table 3, Additional file 6). Within each clade, genes are numbered progressively, across chromosomes and clades as in Bouwmeester et al. ; therefore, gene #1 in each clade would be the gene on the lowest numbered chromosome with the lowest numerical location (for clade I, At1G34300, designated G-LecRK-I.1) and the final gene would be the one on the highest numbered chromosome with the highest numerical location (for clade I, At5G60900, designated G-LecRK-I.8). The two largest identified clades (clades I and VIII) contain eight members, followed by clades with five and four members (VI and V, respectively) and four clades with two members each (Fig. 3, Table 3). Five genes were not placed in any clade, behaving as singletons. Chromosomal location was not predictive of clade membership, with genes from chromosome 1 being found in all eight clades; in addition, three of the singleton genes are located on chromosome 1 where the vast majority of G-LecRKs are localized.
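The within-clade numbering rule can be sketched as a sort on chromosome and then on position. The sketch assumes that the numeric part of an Arabidopsis locus identifier increases with position along the chromosome and can therefore stand in for "numerical location". The clade label "X" and its three-gene membership are illustrative and do not correspond to any clade in Table 3.

```python
import re
from typing import Dict, List, Tuple

# Arabidopsis locus identifiers such as At1g34300 (case-insensitive)
AGI_PATTERN = re.compile(r"^At(\d)g(\d+)$", re.IGNORECASE)


def locus_key(locus: str) -> Tuple[int, int]:
    """Return (chromosome, locus number) for sorting."""
    match = AGI_PATTERN.match(locus)
    if match is None:
        raise ValueError(f"not an Arabidopsis locus identifier: {locus}")
    return int(match.group(1)), int(match.group(2))


def name_clade(clade: str, loci: List[str]) -> Dict[str, str]:
    """Number genes within a clade by chromosome, then by position."""
    ordered = sorted(loci, key=locus_key)
    return {
        locus: f"G-LecRK-{clade}.{rank}"
        for rank, locus in enumerate(ordered, start=1)
    }


# Illustrative membership only
names = name_clade("X", ["At5g60900", "At1G34300", "At2g41890"])
```

Because the names depend on clade membership, any change to the underlying tree would renumber the genes in the affected clades.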
Chromosomal location and prediction of Arabidopsis G-LecRK subcellular localization
The 38 Arabidopsis G-LecRKs were mapped onto the five chromosomes using the chromosomal map tool from TAIR. A single G-LecRK locus, At1g11305, was not present on TAIR because it was created when At1g11300 was found to be a misannotated gene and was split into At1g11300 and At1g11305. Therefore, the chromosomal location of At1g11305 was based on the location of At1g11300. Unlike the Arabidopsis L-LecRKs, most of which are localized on chromosomes 5 and 3 [28, 30], the vast majority of the Arabidopsis G-LecRKs are localized on chromosome 1 (24 members), followed by chromosome 4 (eight members), chromosome 5 (three members), chromosome 2 (two members) and chromosome 3 (one member) (Additional file 7).
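The counts reported for the Arabidopsis family can be cross-checked with a few lines of arithmetic. This is only a consistency check on the numbers stated in the text (per-chromosome counts, clade sizes, and singletons); the variable names are arbitrary.

```python
# Counts as stated in the text for the 38 Arabidopsis G-LecRKs.
TOTAL_MEMBERS = 38
per_chromosome = {1: 24, 2: 2, 3: 1, 4: 8, 5: 3}
clade_sizes = {"I": 8, "VIII": 8, "VI": 5, "V": 4}
two_member_clades = 4  # the text gives their number, not their names
singletons = 5

in_clades = sum(clade_sizes.values()) + 2 * two_member_clades
chromosome_total = sum(per_chromosome.values())
chr1_share = round(100 * per_chromosome[1] / TOTAL_MEMBERS, 1)

print("genes in clades:", in_clades)
print("clades + singletons:", in_clades + singletons)
print("sum over chromosomes:", chromosome_total)
print("share on chromosome 1 (%):", chr1_share)
```

Both tallies add up to the 38 members, and chromosome 1 carries roughly 63% of them.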
Localization of the Arabidopsis G-LecRKs was predicted using SUBA3 [58, 59]. This tool predicted all of the Arabidopsis proteins to be localized at the plasma membrane, consistent with the existence of a TM domain. The SUBA predictions were further investigated with TargetP 1.1. This tool predicts protein localization by analyzing cleavage site predictions and, therefore, predicts localization to the chloroplast, mitochondria or secretory pathway. Most of the Arabidopsis G-LecRKs were predicted to have a secretion signal peptide (Additional file 8). Two proteins, At1g61390 and At1g61400, were predicted to localize at the mitochondrial membrane, and TargetP 1.1 did not predict a localization for one protein, At1g11280.
To validate the localization predictions, the subCELlular LOcalization tool CELLO was used. CELLO predictions mostly confirmed the predictions obtained by SUBA, but additionally revealed possible specific subcellular localization of two G-LecRKs, At4g27290 and At5g60900 (Additional file 8). These encode proteins without TMs, based on a domain search performed using InterPro, although both proteins were predicted to localize at the plasma membrane by SUBA. Interestingly, the CELLO predictions added the possibility that these proteins could also localize to the nucleus and cytoplasm (Additional file 8).
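Comparing several predictors amounts to asking, for each protein, which compartments all tools agree on and which are proposed by only some of them. The sketch below encodes a few of the calls described in the text as sets; the data structure and function name are hypothetical, the CELLO entries assume that its nucleus and cytoplasm calls came in addition to a plasma membrane call, and an empty set stands for "no prediction".

```python
def compare_calls(calls):
    """Return (compartments shared by all tools that made a call, remaining compartments).

    `calls` maps a tool name to the set of compartments it predicted;
    an empty set means the tool made no prediction and is ignored.
    """
    made = [c for c in calls.values() if c]
    if not made:
        return set(), set()
    agreed = set.intersection(*made)
    extra = set.union(*made) - agreed
    return agreed, extra


# Calls transcribed from the text; only the tools mentioned for each protein are listed.
predictions = {
    "At4g27290": {
        "SUBA": {"plasma membrane"},
        "CELLO": {"plasma membrane", "nucleus", "cytoplasm"},  # assumption, see lead-in
    },
    "At1g61390": {
        "SUBA": {"plasma membrane"},
        "TargetP": {"mitochondrion"},
    },
    "At1g11280": {
        "SUBA": {"plasma membrane"},
        "TargetP": set(),  # no localization predicted
    },
}

for protein, calls in predictions.items():
    agreed, extra = compare_calls(calls)
    print(protein, "agreed:", sorted(agreed), "other:", sorted(extra))
```

An empty agreed set, as for At1g61390, flags a protein whose predictions conflict and that may merit experimental localization.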
Kinase domain analysis of the tomato G-LecRKs
As for Arabidopsis, the presence of the 11 kinase subdomains and of the sites essential for catalytic activity was investigated for the tomato G-LecRKs. The alignment of the tomato G-LecRK kinase domains revealed overall conservation of the ATP-binding and catalytic sites, with a few substitutions in the other kinase subdomains (Additional file 9). The search also revealed ten genes with incomplete kinase domains, with various amino acid modifications and indels in the subdomains (Table 4, Additional file 9). Additionally, Solyc07g063810 shows conservation of subdomains VI to XI, which include the catalytic site, but displays several amino acid modifications in subdomains I to V, including the ATP-binding site, suggesting it is likely an inactive kinase. Solyc03g063650 has a substitution of aspartic acid to asparagine in the kinase catalytic site, in subdomain VI, and lacks essential amino acids of subdomains I to IV, suggesting it is also likely an inactive kinase (Additional file 9).
Nomenclature for the tomato G-LecRKs
Following a similar nomenclature as for the Arabidopsis G-LecRKs, the clades containing the 73 tomato G-LecRK members were used as the basis for naming the genes. This methodology resulted in the grouping of tomato G-LecRKs into 13 clades, within which genes are numbered progressively across chromosomes and clades (Fig. 4, Table 5, Additional file 10). Clades range in size from the largest, clade III, with 21 members, to four clades with two members each. Three genes do not fall in any of these clades and are designated singletons, two of which are on chromosome 2 and one on chromosome 7. As with Arabidopsis, chromosomal location is not correlated with clade membership.
Chromosomal location and prediction of tomato G-LecRKs subcellular localization
The 73 G-LecRK loci were mapped onto the 12 tomato chromosomes. As for Arabidopsis, the locations of genes that were misannotated and split into two were based on the location of the original locus. Members of the tomato G-LecRKs are distributed throughout the 12 tomato chromosomes, with over half (54 of the 73) localized on chromosomes 7, 4, 2, and 3, which encompass 18, 14, 11, and 11 members, respectively (Additional file 11). In contrast, tomato L-LecRKs are mostly localized on chromosomes 9 and 10, with members located on 8 of the 12 chromosomes.
Tomato G-LecRK protein sequences were used to predict subcellular localization with TargetP 1.1. The localization of tomato C-LecRKs and L-LecRKs has not been investigated. However, L-LecRKs possess TM domains and are predicted to localize mostly at the plasma membrane, with a few members predicted to localize to mitochondria or chloroplasts. TargetP predicted that the great majority of tomato G-LecRKs have secretory pathway signals and are localized to the plasma membrane (Additional file 12). A single protein (Solyc02g079630) was predicted to have chloroplast localization. Four proteins (Solyc03g006730.B, Solyc07g055640.A, Solyc07g063810 and Solyc11g005630) were predicted to localize to mitochondria. Eight proteins (Solyc07g055640.B, Solyc04g077380, Solyc08g076060, Solyc07g055650, Solyc07g055630, Solyc04g008400.B, Solyc02g030300 and Solyc08g076070), for which a signal peptide could not be predicted using this tool, were not assigned to a specific subcellular compartment. Additionally, the subcellular localization tool CELLO was used to investigate the localization of the tomato G-LecRKs; its plasma membrane predictions overlapped with the presence of a TargetP secretory pathway signal. Interestingly, CELLO was able to predict subcellular localization for proteins for which TargetP could not, and also predicted membrane localization for proteins that did not have a predicted TM domain, suggesting a different membrane-associated signal for these proteins. CELLO predictions also suggested multiple localizations for a few tomato G-LecRKs (such as Solyc01g006530 and Solyc07g055630) and contradicted a few predictions by TargetP (such as Solyc02g079630 and Solyc03g006730.B; Additional file 12).
Comparison of conserved motifs in ectodomains of tomato and Arabidopsis G-LecRKs
The predicted cytoplasmic regions of G-LecRKs consist of the highly conserved kinase domains. To investigate the presence of conserved motifs in the ectodomains of the Arabidopsis and tomato G-LecRKs, the amino acid sequences of the ectodomains were submitted to MEME. Despite the high variability of the ectodomains, five motifs present in at least 30 of the 38 Arabidopsis sequences and in at least 45 of the 73 tomato sequences were identified (Fig. 5).
The most highly conserved ectodomain motif (Fig. 5a) was present in all Arabidopsis and tomato G-LecRKs and was previously shown to be present in 96% of the rice G-LecRKs. One of the motifs, a cysteine-rich region within the PAN domain (Fig. 5b), is present in 34 and 66 of the Arabidopsis and tomato G-LecRKs, respectively. Interestingly, a third motif (Fig. 5c) is also conserved in 76 out of 100 rice G-LecRKs, and was previously identified in 27 Arabidopsis G-LecRKs. The two remaining ectodomain motifs are novel and have not been reported previously (Fig. 5d, e). The conservation of these motifs in the ectodomains of both Arabidopsis and tomato G-LecRKs is remarkable considering that these extracellular domains harbor the lectin domain, which is known to have low conservation among members of this family from a single plant species.
Two previous studies have reported Arabidopsis to have 32 [28, 93] G-LecRK members, a number lower than the 38 members identified in this study. One possible reason for the discrepancy with Shiu and Bleecker is the improved current annotation of the Arabidopsis genome. Their study also used the presence of a TM domain as a criterion for their analysis, which was not used in our study. As for the lower number identified by Vaid et al., this could be because their analysis relied on sequence similarity to a single gene sequence, At1g61610, whereas in our analysis we used a number of candidate sequences from our initial search results to fish out additional candidates. Nevertheless, their overall criterion for candidates was the same as the one used here, namely the presence of both lectin and kinase domains.
Our search retrieved all genes identified by Vaid et al. and an additional 16 sequences that have a G-type lectin domain. Their gene list included a single gene, At1g61460, that is not present on our list; it does not encode a G-type lectin domain and so was not recovered in our BLASTp searches. Of the 16 new sequences with a G-type lectin domain, nine do not encode a kinase domain and would not have been retrieved by Vaid et al. These were also excluded from our analysis. Taken together, our results added seven proteins to the previous list of Arabidopsis G-LecRKs. Of these seven additional genes, At1g67520 and At5g24080 encode proteins with atypical kinase domains lacking several subdomains, while At2g41890 lacks essential amino acids at the ATP-binding site and the catalytic loop, suggesting they are defective kinases. While kinase activity could be crucial for the function of these proteins, lacking kinase activity may not abolish function, as functions for kinase-inactive receptor-like kinases have been reported previously [94, 95].
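The relationship between the earlier count of 32 and the present count of 38 can be written out as simple bookkeeping. The reconciliation below is one reading of the numbers given in the text, not an accounting provided by the authors; the variable names are arbitrary.

```python
# Figures stated in the text.
previous_list = 32       # members reported by Vaid et al.
not_g_type = 1           # At1g61460, which lacks a G-type lectin domain
new_with_g_lectin = 16   # additional sequences with a G-type lectin domain
new_without_kinase = 9   # of those, sequences lacking a kinase domain

# Assumed reconciliation: drop the one non-G-type gene, add the new kinase-containing ones.
added = new_with_g_lectin - new_without_kinase
retained = previous_list - not_g_type
current_list = retained + added

print("newly added G-LecRKs:", added)
print("retained from previous list:", retained)
print("current total:", current_list)
```

Under this reading, the seven added proteins and the 31 retained genes account for the 38 members analyzed here.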
The search methodology used to identify members of this family in Arabidopsis was also successful in tomato. Investigation of the tomato genome identified 73 genes encoding proteins with both a kinase domain and a G-lectin domain and revealed an expansion of this family in tomato relative to Arabidopsis. As in Arabidopsis, several of the 73 tomato G-LecRKs have mutations and deletions in their kinase domains and seem to be inactive kinases, indicative of functional diversification.
Phylogenetic analysis of G-LecRK sequences consistently shows the genes falling into two clades, whether one, three, or four species are included. One of these clades includes the L-LecRK sequences intended as the outgroup, along with the C-LecRKs. This configuration would indicate that the L-type genes should be merged into the G-type, because sequence analysis shows they are not distinct. However, Arabidopsis alone has 45 L-LecRKs and this analysis included only one; therefore, additional analyses are needed to elucidate the relationship between G- and L-LecRKs. The division of the G-type genes into two clades, each with high support, also suggests that this group could be subdivided into two different LecRK groups. However, based on current evidence, there does not seem to be functional differentiation between the two clades, so it is reasonable to consider them a single group for purposes of understanding LecRK function and evolution.
There is no discernible pattern in the expression and function data currently available for G-LecRKs. Whereas all recorded functions and expression data are consistent with roles in biotic and abiotic stress responses or developmental processes, which involve similar pathways, no clade seems specialized for specific functions within these categories. It is not possible to say “there is a clade of genes that responds to trehalose, and it has expanded in species X, therefore trehalose response is important in species X.” Rather, we find closely related genes that appear to have taken on different functions related to stress response and development. Clade A2b2, which includes the largest cluster of Arabidopsis genes, includes members with putative functions ranging from gravitropism to pollen and root hair development to resistance to bacterial infection [71, 75, 76, 84, 85, 96]. Clades with fewer Arabidopsis sequences, such as B2b1 with two, still show variation, in this case one gene that responds to gravity and one that responds to water loss [72, 74]. Two points are important to keep in mind when evaluating these expression and function data, however. First, data are only available for genes from Arabidopsis; we currently do not have data for tomato or columbine genes. Second, the current analyses are based on published reports, and it is highly likely that no study tested all possible biotic and abiotic factors. Therefore, the fact that At1g11340.1 is suppressed by mevalonic acid does not rule out that it is up- or downregulated in response to other factors.
Analysis of the clades in the phylogenetic tree shows that there has been species-specific gene expansion in different clades across the tree, and examination of chromosomal locations of the genes suggests combinations of tandem and possible whole genome duplications. Notably, clade B has only 7 Arabidopsis genes, and no subclade has more than two. In contrast, clade B has 23 tomato genes, with one subclade that has a single Arabidopsis gene (B2b2) having 7 tomato genes. This same subclade has 5 columbine genes. Clade B1 has one gene each from Arabidopsis and tomato, but 9 from columbine. Clade A has a greater number of genes from Arabidopsis, but again we see species-specific expansion in different subclades in all three species. Clade A2b2 has 3 tomato genes, which form a sister-group to 13 Arabidopsis genes; this is the largest Arabidopsis gene group in the tree. The tomato genes are all located on chromosome 3, and the Arabidopsis genes are all located on chromosome 1, suggesting a single ancestral gene that underwent tandem duplications in each species independently, and to a greater extent in Arabidopsis. Clade A1 consists of three subclades, two of which have genes only from Arabidopsis and tomato, and one of which has genes from all three. This topology implies three ancestral genes, with loss of columbine genes from the first two clades. These clades (A1a, A1b) further show evidence of additional duplications within tomato and Arabidopsis, although in this case these appear potentially to be a combination of tandem and whole genome duplications based on chromosomal locations. The third clade, A1c, again shows dramatic expansion in columbine and especially tomato, but not Arabidopsis. Because there is no clear pattern of differential function or expression among clades, it is possible that the differential diversification of clades in different species is essentially stochastic. 
Duplication appears to be extremely common, and the extent in each clade in each species may not be specifically under selection. In fact, the independent expansion and relatively small number of clades with genes from both tomato and Arabidopsis suggest that the base number of core eudicot G-LecRKs is low, and that the difference in numbers between the two species is more a result of expansion than of loss. This is also consistent with the role of these genes in stress responses.
While clustering of G-LecRK members on chromosomes, such as Arabidopsis chromosome 1 and tomato chromosome 7, suggests duplications, our analysis indicates that chromosomal location is not predictive of clade membership for either plant species. This suggests that G-LecRKs are rapidly evolving and diversifying, consistent with their known functional roles in biotic and abiotic stress responses and development.
Clade membership was also not indicative of specific ectodomain configurations, specifically the presence or absence of three domains: SLG, EGF and PAN. The importance of each of these domains, as well as their contributions to G-LecRK activity, has not been investigated to date. Nevertheless, it is to be expected that relevant regions of the ectodomain, outside of the region that confers substrate-binding specificity, would be conserved among different members of the same family. Consistent with this hypothesis, a motif search among members from Arabidopsis and tomato revealed the presence of a single motif (Fig. 5a) in all G-LecRKs from both plant species. This motif was also identified in a previous investigation in both Arabidopsis and rice (96% of rice G-LecRKs). Interestingly, the second motif identified in our search (Fig. 5b), present in 34 Arabidopsis G-LecRKs and 66 tomato G-LecRKs, was also identified in 76% of the rice G-LecRKs by the same authors. The presence of conserved motifs in the ectodomain, despite the low conservation of the lectin domain in G-LecRKs, shows that specific motifs are maintained and might constitute essential site(s) for protein activity. The two newly identified motifs, observed in the lectin domains of 30 of the 38 Arabidopsis and 45 of the 73 tomato G-LecRKs, are less common. Whether these new motifs are present in additional plant species, and how widespread they are, remains to be investigated.
We present here the results of an analysis of G-LecRK gene lineage evolution in Arabidopsis, a member of the mustard family (Brassicaceae, in the rosid clade), and tomato, a member of the nightshade family (Solanaceae, in the asterid clade). Given that these are both members of the derived angiosperm clade, the core eudicots, we also included an evaluation of G-LecRKs in columbine (Aquilegia coerulea), a member of the basal eudicots that diverged before the rosid-asterid split in the core eudicots, to polarize the tree and allow analysis of duplication, expansion, and loss of G-LecRK genes and gene clades.
The present investigation added to the number of currently known Arabidopsis G-LecRKs and characterized the tomato G-LecRKs for the first time. We proposed a nomenclature for both Arabidopsis and tomato G-LecRKs and identified possible essential sites for G-LecRK activity. Additionally, prediction of protein localization by different tools enriched the initial prediction of G-LecRK plasma membrane localization and raised the possibility that the modes of action of a number of G-LecRKs depend on their specific subcellular localization patterns. Given their putative roles in plant defense, and the importance of tomato as a crop, an understanding of the structure and evolution of these proteins in tomato may shed light on defense strategies that can be leveraged to produce hardier plants and improve yield.
EGF: Epidermal growth factor
G-type: Galanthus nivalis agglutinin type
LecRKs: Lectin receptor kinases
Pst: Pseudomonas syringae pv. tomato
Lehti-Shiu MD, Shiu S-H. Diversity, classification and function of the plant protein kinase superfamily. Philos Trans R Soc Lond Ser B Biol Sci. 2012;367:2619–39.
Gilardoni PA, Hettenhausen C, Baldwin IT, Bonaventure G. Nicotiana attenuata LECTIN RECEPTOR KINASE1 suppresses the insect-mediated inhibition of induced defense responses during Manduca sexta herbivory. Plant Cell. 2011;23:3512–32.
Bouwmeester K, de Sain M, Weide R, Gouget A, Klamer S, Canut H, et al. The lectin receptor kinase LecRK-I.9 is a novel Phytophthora resistance component and a potential host target for a RXLR effector. PLoS Pathog. 2011;7:e1001327.
Arnaud D, Desclos-Theveniau M, Zimmerli L. Disease resistance to Pectobacterium carotovorum is negatively modulated by the Arabidopsis lectin receptor kinase LecRK-V.5. Plant Signal Behav. 2012;7:1070–2.
Desclos-Theveniau M, Arnaud D, Huang T-Y, Lin GJ-C, Chen W-Y, Lin Y-C, et al. The Arabidopsis lectin receptor kinase LecRK-V.5 represses stomatal immunity induced by Pseudomonas syringae pv. tomato DC3000. PLoS Pathog. 2012;8:e1002513.
Singh P, Kuo Y-C, Mishra S, Tsai C-H, Chien C-C, Chen C-W, et al. The lectin receptor kinase-VI.2 is required for priming and positively regulates Arabidopsis pattern-triggered immunity. Plant Cell. 2012;24:1256–70.
Armijo G, Salinas P, Monteoliva MI, Seguel A, García C, Villarroel-Candia E, et al. A salicylic acid–induced lectin-like protein plays a positive role in the effector-triggered immunity response of Arabidopsis thaliana to Pseudomonas syringae Avr-Rpm1. Mol Plant-Microbe Interact. 2013;26:1395–406.
Cheng X, Wu Y, Guo J, Du B, Chen R, Zhu L, et al. A rice lectin receptor-like kinase that is involved in innate immune responses also contributes to seed germination. Plant J. 2013;76:687–98.
Singh P, Zimmerli L. Lectin receptor kinases in plant innate immunity. Front Plant Sci. 2013;4:124.
Lannoo N, Van Damme EJM. Lectin domains at the frontiers of plant defense. Front Plant Sci. 2014;5:397.
Macedo MLR, Oliveira CFR, Oliveira CT. Insecticidal activity of plant lectins and potential application in crop protection. Molecules. 2015;20:2014–33.
Bigeard J, Colcombet J, Hirt H. Signaling mechanisms in pattern-triggered immunity (PTI). Mol Plant. 2015;8:521–39.
Singh P, Chien C-C, Mishra S, Tsai C-H, Zimmerli L. The Arabidopsis LECTIN RECEPTOR KINASE-VI.2 is a functional protein kinase and is dispensable for basal resistance to Botrytis cinerea. Plant Signal Behav. 2013;8:e22611.
Zipfel C. Plant pattern-recognition receptors. Trends Immunol. 2014;35:345–51.
Malinovsky FG, Fangel JU, Willats WGT. The role of the cell wall in plant immunity. Front Plant Sci. 2014;5:178.
Böhm H, Albert I, Fan L, Reinhard A, Nürnberger T. Immune receptor complexes at the plant cell surface. Curr Opin Plant Biol. 2014;20:47–54.
Felix G, Duran JD, Volko S, Boller T. Plants have a sensitive perception system for the most conserved domain of bacterial flagellin. Plant J. 1999;18:265–76.
Trdá L, Fernandez O, Boutrot F, Héloir M-C, Kelloniemi J, Daire X, et al. The grapevine flagellin receptor VvFLS2 differentially recognizes flagellin-derived epitopes from the endophytic growth-promoting bacterium Burkholderia phytofirmans and plant pathogenic bacteria. New Phytol. 2014;201:1371–84.
Shi Q, Febres VJ, Jones JB, Moore GA. A survey of FLS2 genes from multiple citrus species identifies candidates for enhancing disease resistance to Xanthomonas citri ssp. citri. Hortic Res. 2016;3:16022.
Robatzek S, Bittel P, Chinchilla D, Köchner P, Felix G, Shiu S-H, et al. Molecular identification and characterization of the tomato flagellin receptor LeFLS2, an orthologue of Arabidopsis FLS2 exhibiting characteristically different perception specificities. Plant Mol Biol. 2007;64:539–47.
Takai R, Isogai A, Takayama S, Che F-S. Analysis of flagellin perception mediated by flg22 receptor OsFLS2 in rice. Mol Plant-Microbe Interact. 2008;21:1635–42.
Hind SR, Strickler SR, Boyle PC, Dunham DM, Bao Z, O’Doherty IM, et al. Tomato receptor FLAGELLIN-SENSING 3 binds flgII-28 and activates the plant immune system. Nat Plants. 2016;2:16128.
Heese A, Hann DR, Gimenez-Ibanez S, Jones AME, He K, Li J, et al. The receptor-like kinase SERK3/BAK1 is a central regulator of innate immunity in plants. Proc Natl Acad Sci U S A. 2007;104:12217–22.
Ron M, Avni A. The receptor for the fungal elicitor ethylene-inducing xylanase is a member of a resistance-like gene family in tomato. Plant Cell. 2004;16:1604–15.
Cao Y, Liang Y, Tanaka K, Nguyen CT, Jedrzejczak RP, Joachimiak A, et al. The kinase LYK5 is a major chitin receptor in Arabidopsis and forms a chitin-induced complex with related kinase CERK1. Elife. 2014;3. Available from: https://doi.org/10.7554/eLife.03766
Shinya T, Motoyama N, Ikeda A, Wada M, Kamiya K, Hayafune M, et al. Functional characterization of CEBiP and CERK1 homologs in arabidopsis and rice reveals the presence of different chitin receptor systems in plants. Plant Cell Physiol. 2012;53:1696–706.
Shimizu T, Nakano T, Takamizawa D, Desaki Y, Ishii-Minami N, Nishizawa Y, et al. Two LysM receptor molecules, CEBiP and OsCERK1, cooperatively regulate chitin elicitor signaling in rice. Plant J. 2010;64:204–14.
Vaid N, Pandey PK, Tuteja N. Genome-wide analysis of lectin receptor-like kinase family from Arabidopsis and rice. Plant Mol Biol. 2012;80:365–88.
Vaid N, Macovei A, Tuteja N. Knights in action: lectin receptor-like kinases in plant development and stress responses. Mol Plant. 2013;6:1405–18.
Wang Y, Weide R, Govers F, Bouwmeester K. L-type lectin receptor kinases in Nicotiana benthamiana and tomato and their role in Phytophthora resistance. J Exp Bot. 2015;66:6731–43.
Shumayla SS, Pandey AK, Singh K, Upadhyay SK. Molecular characterization and global expression analysis of lectin receptor kinases in bread wheat (Triticum aestivum). PLoS One. 2016;11:e0153925.
Balagué C, Gouget A, Bouchez O, Souriac C, Haget N, Boutet-Mercey S, et al. The Arabidopsis thaliana lectin receptor kinase LecRK-I.9 is required for full resistance to Pseudomonas syringae and affects jasmonate signalling. Mol Plant Pathol. 2017;18:937–48.
Wang Y, Cordewener JHG, America AHP, Shan W, Bouwmeester K, Govers F. Arabidopsis lectin receptor kinases LecRK-IX.1 and LecRK-IX.2 are functional analogs in regulating Phytophthora resistance and plant cell death. Mol Plant-Microbe Interact. 2015;28:1032–48.
Bouwmeester K, Govers F. Arabidopsis L-type lectin receptor kinases: phylogeny, classification, and expression profiles. J Exp Bot. 2009;60:4383–96.
Huang P-Y, Yeh Y-H, Liu A-C, Cheng C-P, Zimmerli L. The Arabidopsis LecRK-VI.2 associates with the pattern-recognition receptor FLS2 and primes Nicotiana benthamiana pattern-triggered immunity. Plant J. 2014;79:243–55.
Choi J, Tanaka K, Cao Y, Qi Y, Qiu J, Liang Y, et al. Identification of a plant receptor for extracellular ATP. Science. 2014;343:290–4.
Cao Y, Tanaka K, Nguyen CT, Stacey G. Extracellular ATP is a central signaling molecule in plant stress responses. Curr Opin Plant Biol. 2014;20:82–7.
Van Damme EJM, Nakamura-Tsuruta S, Smith DF, Ongenaert M, Winter HC, Rougé P, et al. Phylogenetic and specificity studies of two-domain GNA-related lectins: generation of multispecificity through domain duplication and divergent evolution. Biochem J. 2007;404:51–61.
Van Damme EJM, Allen AK, Peumans WJ. Isolation and characterization of a lectin with exclusive specificity towards mannose from snowdrop (Galanthus nivalis) bulbs. FEBS Lett. 1987;215:140–4.
Hester G, Kaku H, Goldstein IJ, Wright CS. Structure of mannose-specific snowdrop (Galanthus nivalis) lectin is representative of a new plant lectin family. Nat Struct Biol. 1995;2:472–9.
Van Damme EJM, Allen AK, Peumans WJ. Related mannose-specific lectins from different species of the family Amaryllidaceae. Physiol Plant. 1988;73:52–7.
Van Damme EJM, Goldstein IJ, Peumans WJ. A comparative study of mannose-binding lectins from the amaryllidaceae and alliaceae. Phytochemistry. 1991;30:509–14.
Van Damme EJM, Goldstein IJ, Vercammen G, Vuylsteke J, Peumans WJ. Lectins of members of the Amaryllidaceae are encoded by multigene families which show extensive homology. Physiol Plant. 1992;86:245–52.
Peumans WJ, Barre A, Bras J, Rougé P, Proost P, Van Damme EJM. The liverwort contains a lectin that is structurally and evolutionary related to the monocot mannose-binding lectins. Plant Physiol. 2002;129:1054–65.
Kai G, Zhao L, Zheng J, Zhang L, Miao Z, Sun X, et al. Isolation and characterization of a new mannose-binding lectin gene from Taxus media. J Biosci. 2004;29:399–407.
Sherman-Broyles S, Boggs N, Farkas A, Liu P, Vrebalov J, Nasrallah ME, et al. S locus genes and the evolution of self-fertility in Arabidopsis thaliana. Plant Cell. 2007;19:94–106.
Kusaba M, Dwyer K, Hendershot J, Vrebalov J, Nasrallah JB, Nasrallah ME. Self-incompatibility in the genus Arabidopsis: characterization of the S locus in the outcrossing A. lyrata and its autogamous relative A. thaliana. Plant Cell. 2001;13:627–43.
Shiu SH, Bleecker AB. Plant receptor-like kinase gene family: diversity, function, and signaling. Sci STKE. 2001;2001:re22.
Tordai H, Bányai L, Patthy L. The PAN module: the N-terminal domains of plasminogen and hepatocyte growth factor are homologous with the apple domains of the prekallikrein family and with a novel domain found in numerous nematode proteins. FEBS Lett. 1999;461:63–7.
Shearer LA, Anderson LK, de Jong H, Smit S, Goicoechea JL, Roe BA, et al. Fluorescence in situ hybridization and optical mapping to correct scaffold arrangement in the tomato genome. G3. 2014;4:1395–405.
Mitchell A, Chang H-Y, Daugherty L, Fraser M, Hunter S, Lopez R, et al. The InterPro protein families database: the classification resource after 15 years. Nucleic Acids Res. 2015;43:D213–21.
Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004;32:1792–7.
Jones DT, Taylor WR, Thornton JM. The rapid generation of mutation data matrices from protein sequences. Comput Appl Biosci. 1992;8:275–82.
Lefort V, Longueville J-E, Gascuel O. SMS: Smart Model Selection in PhyML. Mol Biol Evol. 2017;34:2422–4.
Stamatakis A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 2014;30:1312–3.
Stöver BC, Müller KF. TreeGraph 2: combining and visualizing evidence from different phylogenetic analyses. BMC Bioinformatics. 2010;11:7.
Bailey TL, Boden M, Buske FA, Frith M, Grant CE, Clementi L, et al. MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res. 2009;37:W202–8.
Tanz SK, Castleden I, Hooper CM, Vacher M, Small I, Millar HA. SUBA3: a database for integrating experimentation and prediction to define the SUBcellular location of proteins in Arabidopsis. Nucleic Acids Res. 2013;41:D1185–91.
Hooper CM, Tanz SK, Castleden IR, Vacher MA, Small ID, Millar AH. SUBAcon: a consensus algorithm for unifying the subcellular localization data of the Arabidopsis proteome. Bioinformatics. 2014;30:3356–64.
Emanuelsson O, Brunak S, von Heijne G, Nielsen H. Locating proteins in the cell using TargetP, SignalP and related tools. Nat Protoc. 2007;2:953–71.
Yu C-S, Chen Y-C, Lu C-H, Hwang J-K. Prediction of protein subcellular localization. Proteins. 2006;64:643–51.
Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–402.
Marchler-Bauer A, Derbyshire MK, Gonzales NR, Lu S, Chitsaz F, Geer LY, et al. CDD: NCBI’s conserved domain database. Nucleic Acids Res. 2015;43:D222–6.
Guo Y-L, Fitz J, Schneeberger K, Ossowski S, Cao J, Weigel D. Genome-wide comparison of nucleotide-binding site-leucine-rich repeat-encoding genes in Arabidopsis. Plant Physiol. 2011;157:757–69.
Kim HS, Jung MS, Lee SM, Kim KE, Byun H, Choi MS, et al. An S-locus receptor-like kinase plays a role as a negative regulator in plant defense responses. Biochem Biophys Res Commun. 2009;381:424–8.
Ranf S, Gisch N, Schäffer M, Illig T, Westphal L, Knirel YA, et al. A lectin S-domain receptor kinase mediates lipopolysaccharide sensing in Arabidopsis thaliana. Nat Immunol. 2015;16:426–33.
ten Hove CA, Bochdanovits Z, Jansweijer VMA, Koning FG, Berke L, Sanchez-Perez GF, et al. Probing the roles of LRR RLK genes in Arabidopsis thaliana roots using a custom T-DNA insertion set. Plant Mol Biol. 2011;76:69–83.
Swain S, Singh N, Nandi AK. Identification of plant defence regulators through transcriptional profiling of Arabidopsis thaliana cdd1 mutant. J Biosci. 2015;40:137–46.
Ko J-H, Kim JH, Jayanty SS, Howe GA, Han K-H. Loss of function of COBRA, a determinant of oriented cell expansion, invokes cellular defence responses in Arabidopsis thaliana. J Exp Bot. 2006;57:2923–36.
Trontin C, Kiani S, Corwin JA, Hématy K, Yansouni J, Kliebenstein DJ, et al. A pair of receptor-like kinases is responsible for natural variation in shoot growth response to mannitol treatment in Arabidopsis thaliana. Plant J. 2014;78:121–33.
Lan P, Li W, Schmidt W. Genome-wide co-expression analysis predicts protein kinases as important regulators of phosphate deficiency-induced root hair remodeling in Arabidopsis. BMC Genomics. 2013;14:210.
Chae L, Sudat S, Dudoit S, Zhu T, Luan S. Diverse transcriptional programs associated with environmental stress and hormones in the Arabidopsis receptor-like kinase gene family. Mol Plant. 2009;2:84–107.
Kumari S, Shridhar S, Singh D, Priya P, Farmer R, Hundal J, et al. The role of lectins and HD-ZIP transcription factors in isoprenoid based plant stress responses. Proc Indian Natl Sci Acad. 2012;78:671–91.
Kalladan R, Lasky JR, Chang TZ, Sharma S, Juenger TE, Verslues PE. Natural variation identifies genes affecting drought-induced abscisic acid accumulation in Arabidopsis thaliana. Proc Natl Acad Sci U S A. 2017;114:11536–41.
Qutob D, Kemmerling B, Brunner F, Küfner I, Engelhardt S, Gust AA, et al. Phytotoxicity and innate immune responses induced by Nep1-like proteins. Plant Cell. 2006;18:3721–44.
Tischner R, Koltermann M, Hesse H, Plath M. Early responses of Arabidopsis thaliana to infection by Verticillium longisporum. Physiol Mol Plant Pathol. 2010;74:419–27.
Ma S, Gong Q, Bohnert HJ. Dissecting salt stress pathways. J Exp Bot. 2006;57:1097–107.
Zhu Q-H, Stephen S, Kazan K, Jin G, Fan L, Taylor J, et al. Characterization of the defense transcriptome responsive to Fusarium oxysporum-infection in Arabidopsis using RNA-seq. Gene. 2013;512:259–66.
Deb S, Sankaranarayanan S, Wewala G, Widdup E, Samuel MA. The S-Domain receptor kinase Arabidopsis Receptor Kinase2 and the U Box/Armadillo repeat-containing E3 Ubiquitin Ligase9 module mediates lateral root development under phosphate starvation in Arabidopsis. Plant Physiol. 2014;165:1647–56.
Mishiba K-I, Nagashima Y, Suzuki E, Hayashi N, Ogata Y, Shimada Y, et al. Defects in IRE1 enhance cell death and fail to degrade mRNAs encoding secretory pathway proteins in the Arabidopsis unfolded protein response. Proc Natl Acad Sci U S A. 2013;110:5713–8.
Samuel MA, Mudgil Y, Salt JN, Delmas F, Ramachandran S, Chilelli A, et al. Interactions between the S-domain receptor kinases and AtPUB-ARM E3 ubiquitin ligases suggest a conserved signaling pathway in Arabidopsis. Plant Physiol. 2008;147:2084–95.
Birkenbihl RP, Kracher B, Somssich IE. Induced genome-wide binding of three Arabidopsis WRKY transcription factors during early MAMP-triggered immunity. Plant Cell. 2017;29:20–38.
Schluepmann H, van Dijken A, Aghdasi M, Wobbes B, Paul M, Smeekens S. Trehalose mediated growth inhibition of Arabidopsis seedlings is due to trehalose-6-phosphate accumulation. Plant Physiol. 2004;135:879–90.
New S-A, Piater LA, Dubery IA. In silico characterization and expression analysis of selected Arabidopsis receptor-like kinase genes responsive to different MAMP inducers. Biol Plant. 2015;59:18–28.
Blaum BS, Mazzotta S, Nöldeke ER, Halter T, Madlung J, Kemmerling B, et al. Structure of the pseudokinase domain of BIR2, a regulator of BAK1-mediated immune signaling in Arabidopsis. J Struct Biol. 2014;186:112–21.
Chen L-J, Wuriyanghan H, Zhang Y-Q, Duan K-X, Chen H-W, Li Q-T, et al. An S-domain receptor-like kinase, OsSIK2, confers abiotic stress tolerance and delays dark-induced leaf senescence in rice. Plant Physiol. 2013;163:1752–65.
Amil-Ruiz F. Molecular mechanisms of strawberry plant defence against Colletotrichum acutatum. Caballero Repullo JL, Muñoz Blanco J, editors. Universidad de Córdoba; 2013.
Sanabria NM, van Heerden H, Dubery IA. Molecular characterisation and regulation of a Nicotiana tabacum S-domain receptor-like kinase gene induced during an early rapid response to lipopolysaccharides. Gene. 2012;501:39–48.
Chen X, Shang J, Chen D, Lei C, Zou Y, Zhai W, et al. A B-lectin receptor kinase gene conferring rice blast resistance. Plant J. 2006;46:794–804.
Fuller VL, Lilley CJ, Atkinson HJ, Urwin PE. Differential gene expression in Arabidopsis following infection by plant-parasitic nematodes Meloidogyne incognita and Heterodera schachtii. Mol Plant Pathol. 2007;8:595–609.
Sun X, Sun M, Luo X, Ding X, Ji W, Cai H, et al. A Glycine soja ABA-responsive receptor-like cytoplasmic kinase, GsRLCK, positively controls plant tolerance to salt and drought stresses. Planta. 2013;237:1527–45.
Hanks SK, Hunter T. Protein kinases 6. The eukaryotic protein kinase superfamily: kinase (catalytic) domain structure and classification. FASEB J. 1995;9:576–96.
Shiu SH, Bleecker AB. Receptor-like kinases from Arabidopsis form a monophyletic gene family related to animal receptor kinases. Proc Natl Acad Sci U S A. 2001;98:10763–8.
Kim J-G, Li X, Roden JA, Taylor KW, Aakre CD, Su B, et al. Xanthomonas T3S effector XopN suppresses PAMP-triggered immunity and interacts with a tomato atypical receptor-like kinase and TFT1. Plant Cell. 2009;21:1305–23.
Niu D, Lii YE, Chellappan P, Lei L, Peralta K, Jiang C, et al. miRNA863-3p sequentially targets negative immune regulator ARLPKs and positive regulator SERRATE upon bacterial infection. Nat Commun. 2016;7:11324.
Liang X, Shen K, Lichtenberg J, Wyatt SE, Welch LR. An integrated bioinformatics approach to the discovery of cis-regulatory elements involved in plant gravitropic signal transduction. Inter J Comp Biosci. 2010;1:33–54.
Funding
MAT was funded by the Coordination for the Improvement of Higher Education of Brazil. OGC was funded by a grant from the National Science Foundation Research Experiences for Undergraduates (1461297).
Availability of data and materials
Datasets were derived from public resources and are made available with the article.
Ethics approval and consent to participate
This study has not directly involved humans, animals or plants.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Members of the columbine G-lectins and their domains. (XLSX 57 kb)
Members of the rice G-lectins and their domains. (XLS 33 kb)
Maximum likelihood tree of amino acid sequences from G-LecRK, L-LecRK intended outgroups, and C-LecRK outgroups from tomato, Arabidopsis, columbine, and rice. Bootstrap support from 1000 replicates is shown above nodes. Brackets on the right indicate intended outgroup clades. (PDF 20 kb)
Maximum likelihood tree of amino acid sequences from G-LecRK, L-LecRK intended outgroups, and C-LecRK outgroups from tomato, Arabidopsis, and columbine. Bootstrap support from 1000 replicates is shown above nodes. Brackets on the right indicate the major clades as defined in the text along with the two intended outgroup clades. (PDF 19 kb)
Alignment of predicted amino acid sequences of Arabidopsis G-LecRK kinase domains with L-LecRK-VI.2 using ClustalW. Lines above the alignment mark subdomains I and II (the ATP binding site, GxGxxGxV) and subdomain VI (the serine/threonine kinase active site, HRDLKxxN). (PDF 3979 kb)
Maximum likelihood tree of amino acid sequences from G-LecRK, L-LecRK outgroups, and C-LecRK outgroups from Arabidopsis. Bootstrap support from 1000 replicates is shown above nodes. (PDF 11 kb)
Genetic map of the Arabidopsis G-LecRKs. Arrangement of G-LecRKs on the five Arabidopsis chromosomes. The figure was prepared using the Chromosome Map Tool in TAIR. Locus At1g11305 was added manually. (TIFF 337 kb)
Subcellular localization of the Arabidopsis G-LecRKs. Localization was predicted using the SUBA, TargetP and CELLO tools. (PDF 55 kb)
Alignment of predicted amino acid sequences of tomato G-LecRK kinase domains with Solyc03g006720 using ClustalW. Lines above the alignment mark subdomains I and II (the ATP binding site, GxGxxGxV) and subdomain VI (the serine/threonine kinase active site, HRDLKxxN). (PDF 7110 kb)
Maximum likelihood tree of amino acid sequences from G-LecRK, L-LecRK outgroups, and C-LecRK outgroups from tomato. Bootstrap support from 1000 replicates is shown above nodes. (PDF 7 kb)
Genetic map of the tomato G-LecRKs. Arrangement of G-LecRKs on the 12 tomato chromosomes. The figure was prepared manually using the Map Viewer in NCBI. (TIFF 309 kb)
Subcellular localization of the tomato G-LecRKs. Localization was predicted using the TargetP and CELLO tools. (PDF 56 kb)
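The kinase-domain alignment captions above quote two consensus patterns, GxGxxGxV and HRDLKxxN, in which "x" stands for any residue. As a minimal sketch of how such patterns could be located in a protein sequence, the Python snippet below translates each consensus string into a regular expression and reports 0-based match positions. The function names and the short test peptide are hypothetical and serve only as illustration; this is not the procedure used by the authors, who marked the motifs on ClustalW alignments.

```python
import re

# Consensus patterns quoted in the alignment captions; "x" means any residue.
MOTIFS = {
    "ATP binding site": "GxGxxGxV",
    "Ser/Thr kinase active site": "HRDLKxxN",
}


def motif_to_regex(motif):
    """Translate a consensus string into a compiled regex ('x' -> any one residue)."""
    return re.compile("".join("." if ch == "x" else re.escape(ch) for ch in motif))


def find_motifs(sequence):
    """Return {motif name: [0-based start positions]} for one protein sequence."""
    seq = sequence.upper()
    return {
        name: [m.start() for m in motif_to_regex(motif).finditer(seq)]
        for name, motif in MOTIFS.items()
    }


if __name__ == "__main__":
    # Hypothetical peptide made up for illustration, not a real G-LecRK sequence.
    peptide = "MKLGEGGFGSVYKAAAHRDLKPENILL"
    for name, positions in find_motifs(peptide).items():
        print(name, positions)
```

A sequence lacking either pattern yields an empty list for that motif, which is one simple way to flag kinase domains that may have lost a conserved site and deserve a closer look in the alignment.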
About this article
Cite this article
Teixeira, M.A., Rajewski, A., He, J. et al. Classification and phylogenetic analyses of the Arabidopsis and tomato G-type lectin receptor kinases. BMC Genomics 19, 239 (2018). https://doi.org/10.1186/s12864-018-4606-0
Keywords
- Solanum lycopersicum
- Arabidopsis thaliana
- Lectin receptor
- Receptor kinase
- Aquilegia coerulea
- Phylogenetic analysis