Comprehensive analysis and identification of drought-responsive candidate NAC genes in three semi-arid tropics (SAT) legume crops

Background Chickpea, pigeonpea, and groundnut are the primary legume crops of semi-arid tropics (SAT) and their global productivity is severely affected by drought stress. The plant-specific NAC (NAM - no apical meristem, ATAF - Arabidopsis transcription activation factor, and CUC - cup-shaped cotyledon) transcription factor family is known to be involved in majority of abiotic stresses, especially in the drought stress tolerance mechanism. Despite the knowledge available regarding NAC function, not much information is available on NAC genes in SAT legume crops. Results In this study, genome-wide NAC proteins – 72, 96, and 166 have been identified from the genomes of chickpea, pigeonpea, and groundnut, respectively, and later grouped into 10 clusters in chickpea and pigeonpea, while 12 clusters in groundnut. Phylogeny with well-known stress-responsive NACs in Arabidopsis thaliana, Oryza sativa (rice), Medicago truncatula, and Glycine max (soybean) enabled prediction of putative stress-responsive NACs in chickpea (22), pigeonpea (31), and groundnut (33). Transcriptome data revealed putative stress-responsive NACs at various developmental stages that showed differential expression patterns in the different tissues studied. Quantitative real-time PCR (qRT-PCR) was performed to validate the expression patterns of selected stress-responsive, Ca_NAC (Cicer arietinum - 14), Cc_NAC (Cajanus cajan - 15), and Ah_NAC (Arachis hypogaea - 14) genes using drought-stressed and well-watered root tissues from two contrasting drought-responsive genotypes of each of the three legumes. Based on expression analysis, Ca_06899, Ca_18090, Ca_22941, Ca_04337, Ca_04069, Ca_04233, Ca_12660, Ca_16379, Ca_16946, and Ca_21186; Cc_26125, Cc_43030, Cc_43785, Cc_43786, Cc_22429, and Cc_22430; Ah_ann1.G1V3KR.2, Ah_ann1.MI72XM.2, Ah_ann1.V0X4SV.1, Ah_ann1.FU1JML.2, and Ah_ann1.8AKD3R.1 were identified as potential drought stress-responsive candidate genes. Conclusion As NAC genes are known to play role in several physiological and biological activities, a more comprehensive study on genome-wide identification and expression analyses of the NAC proteins have been carried out in chickpea, pigeonpea and groundnut. We have identified a total of 21 potential drought-responsive NAC genes in these legumes. These genes displayed correlation between gene expression, transcriptional regulation, and better tolerance against drought. The identified candidate genes, after validation, may serve as a useful resource for molecular breeding for drought tolerance in the SAT legume crops. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-021-07602-5.

(Continued from previous page) Conclusion: As NAC genes are known to play role in several physiological and biological activities, a more comprehensive study on genome-wide identification and expression analyses of the NAC proteins have been carried out in chickpea, pigeonpea and groundnut. We have identified a total of 21 potential drought-responsive NAC genes in these legumes. These genes displayed correlation between gene expression, transcriptional regulation, and better tolerance against drought. The identified candidate genes, after validation, may serve as a useful resource for molecular breeding for drought tolerance in the SAT legume crops.
Keywords: Chickpea, cis-acting regulatory elements (CARE), Drought tolerance, Groundnut, Legumes, NACs, Phylogenetics, Pigeonpea Background Leguminosae, the legume family, is the third-largest family of angiosperms, which is constituted of 800 genera and 20,000 species [1]. Many grain legume crops providẽ 20-40% of dietary proteins to the world [2]. Among grain legumes, chickpea (Cicer arietinum), pigeonpea (Cajanus cajan), and peanut or groundnut (Arachis hypogaea) are the important food legumes grown predominantly by resource-poor farmers in the semi-arid tropic (SAT) regions of the world. Chickpea, a diploid legume crop species (2n = 2x = 16; genome size of 738.09 Mb), is the second most extensively grown legume with an annual production of~17. 19 Mt [3] globally after soybean and provides a rich source of proteins, carbohydrates, vitamins, and minerals for human consumption [4,5]. Pigeonpea (2n = 2x = 22; genome size of~833 Mb), is another major legume food crop grown on approximately 5 million hectares (ha) with a production of~5.96 Mt annually [3], and is the sixth most important food legume globally. In the developing world, pigeonpea is the primary source of protein to more than a billion people and is the means of sustenance for millions of underprivileged farmers in Asia, Africa, South America, Central America, and the Caribbean [6,7]. Groundnut, on the other hand, is one of the leading legumes and oilseed crops with high protein content. It is grown widely in the tropics and subtropics with an annual production of~45.95 Mt [3]. Cultivated groundnut (A. hypogaea L.) is an allotetraploid (AABB; 2n = 4x = 40;~2.7 Gb genome size), having genome from its diploids ancestors A. duranensis (AA) and A. ipaensis (BB) [8,9]. The growth and productivity of these legumes are hugely affected by different biotic and abiotic stresses, which have emphasized the necessity of developing stress tolerant legume cultivars. In the case of chickpea, drought is one of the major constraints which limit crop production [10]. Despite pigeonpea being drought-tolerant and hardy, the crop has limitations under drought stress conditions which lead to yield stagnation. Similarly, groundnut is an oleaginous crop with broad adaptation to tropical and semi-arid climates. However, yield is often compromised when the crop faces water irregularities during the reproductive phase. Furthermore, ominous climate change characterized by enhanced prevalence and severity of drought has spotlighted the adverse impact on plant productivity [11]. Thus, an in-depth understanding of the underlying mechanisms of drought stress tolerance is required to improve the yield potential of these crops.
Over the years, extensive research has been carried out to discover and characterize genes and molecular mechanisms controlling drought responses in both model plants and crops that cope with drought stress conditions [12]. Several transcription factors (TFs) and their DNA binding sites (cis-acting regulatory elements), act as molecular switches for stress-responsive altered gene expression, allowing plants to better adapt under adverse conditions [13]. Legumes vary in their response/sensitivity to drought stress. Considering the nutritional and economic benefits, it is important to study the mechanism of drought tolerance in legumes and identify drought-associated genes in these SAT legume crops.
The plant-specific NAC (NAMno apical meristem, ATAF -Arabidopsis transcription activation factor, and CUCcup-shaped cotyledon) family genes are TFs that constitute one of the largest of plant-specific TF families characterized by a highly conserved NAC domain comprising of approximately 160 amino acid residues at the N-terminus and is further classified into five subdomains assigned A-E [14]. The N-terminal regions of NAC TFs consist of large number of positively and negatively charged amino acid residues. Sub-domains C and D are rich in basic amino acids and exhibit positive charge. Sub-domains A, C and D are highly conserved domains and are involved in DNA binding attribute. The C-terminus of NAC proteins are variable and can act as either a transcriptional activator or a repressor [15]. The C-terminus region of NAC TFs can also influence oligomerization feature. The NAC TF family was first discovered in Petunia more than 22 years ago [16], since then a number of studies have documented the role of NAC genes in a variety of biological processes. For instance, NACs play an important role in lateral root formation [17], seed development [18], leaf senescence [19], stress-inducible flowering induction [20], regulation of secondary cell wall synthesis, cell division [21], plant biotic [22] and abiotic stress responses [23], etc. Furthermore, it was reported that NACs play a significant role in drought stress response and tolerance [24]. Overexpression of OsSNAC1, a rice NAC TF, has shown improvement of salt and drought tolerance in wheat cultivars [25]. Similarly, OsNAC14 caused increased drought resistance in transgenic rice plants by repairing the damaged DNA and defense mechanism [26]. Furthermore, OsSND2 is known to regulate SCW biosynthesis in rice [27]; ONAC020, ONAC026, and ONAC023 genes are involved in seed development [28]; OsY37 (Oryza sativa Yellow37/ONAC011) is known to be involved in promoting senescence [29]. TaNAC29, a wheat NAC TF, caused improved tolerance against salt and drought [30], while TaNAC47 displayed enhanced resistance towards PEG, salinity, and freezing stresses in transgenic Arabidopsis plants [31]. GmNAC109, a soybean NAC TF, accelerated the formation of lateral roots in transgenic Arabidopsis plants [32]. According to plant transcription factor database (Plant TFDBV4.0) [33], most number of NAC genes reported in plant species are: 138 NAC genes in Arabidopsis (Arabidopsis thaliana), 158 in rice (Oryza sativa), 189 in maize (Zea mays), 165 in foxtail millet (Setaria italica L.), 269 in soybean (Glycine max), 411 in rapeseed (Brassica napus), 289 in poplar (Populus trichocarpa), 350 in camelina (Camelina sativa), and 200 in eucalyptus (Eucalyptus grandis), till now.
In this context, the available draft genome sequences of chickpea [5], pigeonpea [7], and groundnut [8] are important resources and provide an excellent opportunity for a comparative genome survey of novel TFs. Towards this direction, in the present study, comprehensive genome-wide analysis has been performed to identify NAC domain TFs in three SAT legume crops viz., chickpea, pigeonpea, and groundnut. Detailed analyses on their genomic distribution, gene structure, regulatory elements, protein-protein interactions, conserved motifs, and expression patterns under various developmental stages were conducted. As a result, a total of ten, six, and five potential drought-responsive candidate NAC genes were identified in chickpea, pigeonpea and groundnut, respectively. The identified candidate genes serve as valuable resources in the legume breeding program targeting better drought-stress adaptation in these three legume crops.
A total of 62 out of the 72 (86%) identified NAC genes were distributed across eight chromosomes (Ch01-Ch08) in chickpea (Fig. 1a), 49/96 (51%) were distributed among 11 chromosomes (Ch01-Ch11) in pigeonpea (Fig.  1b), whereas 166/166 (100%) of the NAC genes were located across all 20 chromosomes in groundnut (Fig. 1c). However, 10 genes in chickpea and 47 genes in pigeonpea were anchored on unmapped scaffolds. In chickpea, the minimum number of NAC genes were found distributed on Ch07 (3); whereas the maximum number of NAC genes (14) were identified on Ch06, followed by Ch01 with 12 NAC genes. Similarly, in pigeonpea, only two NAC genes (Cc_06648, and Cc_07217) were identified on Ch02 and the maximum number of genes (10) was identified on Ch11. However, in the case of groundnut, 17 NAC genes are distributed on Ch13 and 15 genes each on Ch18 and Ch03, followed by 12 genes each on Ch05, Ch07 and Ch08. Two closely-related legumes, Medicago and soybean were used to identify orthologs of NAC proteins of chickpea, pigeonpea, and groundnut. Chickpea and groundnut share the maximum orthologs with Medicago, whereas pigeonpea with soybean ( Fig. 2a, b, c) using parameters mentioned in methodology.

Phylogenetic relationships and identification of putative stress-responsive NAC genes
To discover phylogenetic relationships between NAC proteins/genes in chickpea, pigeonpea, and groundnut, an unrooted phylogenetic tree with full NAC protein sequences was constructed. Neighbor-Joining (NJ) method was used with bootstrap values (1000 replicates) (Fig. 3a, b, c). A total of 72, 96, and 166 protein sequences were used. Based on phylogenetic analysis, the Ca_NACs and Cc_NACs were classified into 10 major groups (Fig. 3a, b) and Ah_NACs were classified into 12 broad groups (Fig. 3c). In chickpea, Group IV is the largest clade, with 18 proteins and accounts for 25% of all NAC proteins, followed by group VIII, which has 15 proteins (20.83%). Group VII is the smallest and has only one NAC protein (Ca_04337). Groups I and VI contain nine; groups II, and III include five; and groups V, and X have three proteins each. Additionally, groups IV and VIII both contain two subgroups. Likewise, in pigeonpea, group IV is the largest with 25 proteins (26%), followed by group III with 22 proteins (22.9%) which also has different subgroups. Further, groups II and X have six proteins each. In groundnut, among twelve major groups, group VIII is the largest (21.7%) with 36 proteins and group VII contains 22 proteins (13.25%), whereas group I has only two proteins (Ah_ann1.FD63AG.1 and Ah_ann1.7J37F0.1). In addition to this, Groups VI, VII, and VIII also contain two major subgroups (Fig. 3c).
For the prediction of putative stress-responsive NAC genes, a phylogenetic analysis involving complete protein sequences of all identified NAC genes from the three legumes studied and most well-known stress-responsive     Fig. S1, S2, and S3), using known stress-responsive NAC genes from Arabidopsis (19 ANACs), Oryza (7 ONACs), Medicago (9 MtNACs) and Glycine max (8 GmNACs). A complete list of the putative stress-responsive genes identified for all the three legumes along with their description is given in Table 2. In addition to this, details regarding the known stress-responsive NACs from the model and crop plants included in the analysis are also provided in Additional file 1: Table S2. As the primary purpose of this study is to discover and explore the potential stress-responsive candidate NAC genes in the three legume crops, further downstream analysis was performed on the putative stress-responsive NAC genes.
Conserved motifs and gene structure analysis of putative stress-responsive NACs All the NAC genes shared highly conserved DNA binding NAC domain consisting of five sub-domains (A-E) at the N-termini, and a variable C-terminal transcriptional regulation domain/region (TRR). Conservation of amino acid residues in NAC sub-domain (A-E) across these legumes is shown in Fig. 5a, b, c. Thus, the conserved motifs of NAC proteins were analyzed from the three SAT legumes using the MEME program (Additional file 1: Table S3). A total of 1 to 20 motifs were identified. In addition, a detailed analysis of conserved motifs in each of the legume crop was studied (Additional file 1: Table S4). The NAC protein motifs distribution analysis revealed that 38/72 (52.8%) chickpea NAC proteins contain all five sub-domains, domains A, B, C, D and E (Additional file 1: Table S4). Interestingly, for the predicted stress-responsive genes/proteins identified, 16/22 (72.7%) contain all five NAC sub-domains A, B, C, D, E (Table 2). However, the number of motifs observed ranged from five to eight for stress-related chickpea NAC proteins. The genes Ca_08372, Ca_04187, Ca_ 09673, Ca_12660, Ca_04233, and Ca_16946 contain the highest number of motifs (8) among the stress-related Fig. 4 Phylogenetic relationship of putative stress-responsive NAC genes of chickpea (22), pigeonpea (31), and groundnut (33) with well-known stress-responsive NAC genes (43) from Arabidopsis thaliana, Oryza sativa, Medicago truncatula and Glycine max using MEGA7.0. The bar indicates the relative divergence of the sequences examined. Stress-responsiveness of each NAC gene from model crops species is shown next to its name in parentheses. Ddehydration/drought; S-salt stress; C-cold stress; H-heat stress; ABAabscisic acid; JA-jasmonic acid; SA-salicylic acid; MMS-methyl methane sulfonate Table 2 Identified stress-responsive NAC genes/proteins from phylogenetic analysis with known NAC genes (stress-responsive) from model crop species using MEGA 7.0 along with their description and distribution of conserved motifs domains in chickpea, pigeonpea, and groundnut using MEME standalone version 5 Table  S4). Among the stress-responsive proteins, 20/31 (64.5%) NAC proteins have all five motifs (A, B, C, D, and E) ( Table 2). The number of motifs identified in stress-responsive proteins ranged between two to nine in pigeonpea. Sixteen proteins (51.6%) contain seven motifs, 22.5% contain six motifs and 16% have five motifs. Cc_01567 has the highest number of motifs observed (9) and Cc_48539 has the least number of motifs identified (2). Six proteins lack one NAC sub-domain (A/C/E); four proteins lack two sub-domains (A and B or C and E); and only one protein,  Table 2). For gene structure analysis of putative stressresponsive NAC genes in selected legume crops, the exon/intron organization of individual NAC genes was analyzed in the coding sequences of chickpea, pigeonpea and groundnut using GSDS 2.0 ( Fig. 6a, b, c). Gene structure prediction revealed that the number of introns ranges from one (Ca_04233) to six (Ca_07077) in chickpea, zero (Cc_48539) to six (Ca_22429) in pigeonpea, and one to three in the groundnut NAC gene family.

Promoter analysis of putative stress-responsive NACs
The promoter regions of NAC genes (1500-bp sequences upstream of the translational start site) were examined using the PlantCARE database to investigate transcriptional regulation and the probable functions of these putative stress-responsive NACs in chickpea, pigeonpea, and groundnut. Several cis-acting regulatory elements (CAREs) involved in response to drought, light, wound, developmental processes, biotic stress, tissuespecific, hormones, and other functions were discovered in the promoter regions of these NAC genes (Additional file 1: Table S5). Promoters of essential elements, such as a TATA box and a CAAT box, were predicted among all the three legumes. Of these CAREs, several regulatory elements related to tissue-specific expression, such as root-specific expression (AS1), meristem expression (CAT-box), vascular-specific expression (AC-I and AC-II motifs), and F-box (plant vegetative and reproduction growth and development; cell death and defense); and light-responsive were found widely distributed among chickpea, pigeonpea, and groundnut NAC gene promoters. Numerous CAREs involved in plant hormones, such as gibberellin-responsive elements, ABA-responsive elements (ABREa possible ABA-dependent regulation  for abiotic stress), an ethylene-responsive element (ERE), auxin-responsive elements, MeJA-responsive elements, and salicylic acid-responsive elements (TCA) were also identified. In particular, several stress-responsive CAREs important in abiotic stress, including drought-responsive elements (MYB, MBS-MYB binding site, MYC), stressresponsive elements (STRE), dehydration-responsive elements (C repeat/DRE), and low-temperature elements (LTR) were detected. Some CAREs which function in biotic stress, including wound-responsive elements (WRE3, and WUN motif), defense-and stress-response (TC-rich repeats), and elicitor-responsiveness (W box) were also identified. In addition, promoters having zein metabolism regulation elements (O 2 -site), anaerobic induction elements (ARE element), and APETALA1 (AP1) for inducible-flowering, were observed. The above results indicate that these NAC genes might respond to abiotic stresses and have potential roles in enhancing abiotic stress tolerance. In chickpea, thirteen NAC genes namely, Ca_04233, Ca_16946, Ca_ 16379, Ca_18090, Ca_05696, Ca_22941, Ca_05696, Ca_ 08693, Ca_20988, Ca_07077, Ca_09673, Ca_05989 and Ca_04337 were found to have drought-responsive elements (DRE core/MYB) (Additional file 1: Table S5). The genes, Ca_16946, Ca_02365, Ca_07077, Ca_04187 and Ca_04337 were identified as having STRE; Ca_ 04233, Ca_16946, Ca_07077, Ca_08372, Ca_05989 and Ca_05227 contain ABRE; Ca_04187 and Ca_04337 have LTR; Ca_27204, Ca_09673 and Ca_20988 (TC-rich repeats) have cis-regulatory element for defense and stress-responsiveness; Ca_22941 and Ca_20988 contain AE; Ca_01414, Ca_07077, and Ca_04337 had elicitor responsiveness and disease resistance element (W box); Ca_20988 contains wound-responsive element (WRE3). Furthermore, Ca_07077 had five types of abiotic stressresponsive CAREs viz., DRE core, STRE, MYB, ABRE, and W box. Ca_04337 contains W box, STRE, MYB, LTR, and MYC types of cis-elements. In general, almost all the putative Ca_NACs contain at least two or more different types of stress-responsive CAREs. Some tissuespecific CAREs, such as AS1 were identified in Ca_ 16946, Ca_22941, Ca_07077, Ca_08372, Ca_04187, Ca_ 09673, and Ca_05227 NAC genes; AC-I (vascular-specific expression) was detected in Ca_07077; AP1 (flowering inducible) was found in Ca_08372.
Among pigeonpea NAC genes, nine genes had MYB binding site, eight had ABRE, nine had ARE, 10 had STRE, five had W box, three had WUN motif/WRE3, and two had LTR (Additional file 1: Table S5). For instance, Cc_26125, Cc_41044 and Cc_01567 genes had up to five different types of abiotic stress-related CAREs. Furthermore, STRE, ARE, MBS, W box, and LTR were found in Cc_26125. Similarly, Cc_41044 contains DRE core, ARE, STRE, MBS/MYB and ABRE; while W box, Fig. 6 Representation of exon/intron structures of putatively predicted stress-associated NAC genes from (a) chickpea (b) pigeonpea (c) groundnut using GSDS 2.0 (Gene Structure Display Server). Exons and introns are represented by colored boxes and black lines, respectively. The sizes of exons and introns can be estimated using the scale below ARE, MYB, MBS, and MYC CAREs were predicted in Cc_ 01567. In addition, some of these genes have three types of stress-associated CAREs. Cc_04140 had W box, STRE, and ABRE; Cc_22489 and Cc_15921 had ARE, LTR, and ABRE abiotic stress-associated motifs. Moreover, 12 of the 31 genes were predicted to have two types of stressassociated cis-elements. Cc_22870 and Cc_38151 had WRE3 and ABRE; Cc_22430 possessed STRE and W box; Cc_17157 and Cc_20225 contained STRE and ARE; Cc_ 29871 and Cc_48539 had ABRE and STRE; and contain WRE3 and ABRE; Cc_23518 contains DRE core and MYB; Cc_42082 and Cc_37971 contained ARE and MYB recognition site; Cc_04140 had WUN-motif and MYB/ MBS/MYC; and Cc_26764 had STRE and MBS/MYB abiotic stress-associated CAREs. With regard to tissuespecific expression, AS1 element (root-specific expression) was noted in Cc_22870, Cc_29871, Cc_01567, Cc_04140 and Cc_48539; CAT-box (meristem-specific expression) was reported in Cc_23518, Cc_04140, Cc_48539, Cc_ 26125, and Cc_29871 genes; and AC-II, AC-I (vascular expression) were reported in Cc_04140.

Validation of predicted stress-responsive NAC genes under induced drought treatment
To assess the potential and response of these stressresponsive NACs under drought stress (PEG 8000 exposure), two contrasting genotypes for each crop were selected and analyzed the expression patterns of these genes in root tissues using quantitative real time PCR (qRT-PCR). Chickpea genotypes -ICC 4958 (tolerant) and ICC 1882 (sensitive); pigeonpea genotypes -ICPL 227 (tolerant) and ICPL 151 (sensitive); and groundnut genotypes -CSMG 84-1 (tolerant) and ICGS 76 (sensitive) were selected for validation of expression profiles of identified candidate NACs. Details of the selected primer pairs are provided in Additional file 1: Table S9. These results indicated that majority of these genes (12 of 15) showed up-regulation with a maximum fold change of 4.3 (Ca_18090) in tolerant genotype, ICC 4958, while only two genes, Ca_07077 (0.87 folds) and Ca_05989 (0.4 folds) showed downregulation with respect to their controls, under drought stress in chickpea. However, a few genes viz., Ca_05989 (1.19 folds), Ca_04337 (1.16 folds), Ca_ 04069 (1.29 folds), Ca_04187 (1.65 folds) and Ca_ 27204 (1.27 folds) were found slightly up-regulated in susceptible genotype (ICC 1882), though the expression was not higher than the tolerant genotype (ICC 4958) except for Ca_04187 gene which showed higher expression than the tolerant one (Fig. 8a). Further, for pigeonpea, all the selected 15 NAC genes examined were found to be up-regulated with a maximum of 3.1 folds (Cc_15921) in the case of ICPL 151, except Cc_22489 gene which was down-regulated in both the genotypes under drought stress (Fig. 8b). However, the genes Cc_26125, Cc_43030, Cc_43785, Cc_43786, Cc_22429, and Cc_22430 were found up-regulated for ICPL 227 (more drought-tolerant) genotype with a maximum of 10 folds up-regulation (Cc_43030). In the case of groundnut, the relative fold change expression was up-regulated in 10 genes for CSMG 84-1 genotype, while 12 genes displayed up-regulation in ICGS 76 genotype with respect to their control (Fig.  8c)

Discussion
Chickpea, pigeonpea, and groundnut are important food legumes, particularly in SAT regions. The seeds of these legumes are an essential food source, while the crop plants also contribute to the fertility of the soil. Furthermore, genome sequences have been available for several food legumes, including pigeonpea [7], chickpea [5], mung bean [38], common bean [39], adzuki bean [40], and groundnut [8,9]. The genome sequence of chickpea was from CDC Frontier, a Canadian 'kabuli' variety [5] and 'desi' ICC 4958 cultivar [41], pigeonpea genome was from the genotype ICPL 87119, popularly known as Asha [7], and groundnut genome from A. hypogaea cv Tifrunner (CV-93, PI 644011), a runner-type groundnut habituated to the southeast of the United States of America [8]. The progress in genome sequencing has provided valuable genomic resources for comparative genomic analyses in these sequenced food legume crops [42]. Being one of the largest among plant-specific TFs, the NAC protein family has a role in plant development, abiotic stress and defense responses. In many plant species NAC proteins have been functionally characterized, including those of Arabidopsis thaliana, Oryza sativa, Zea mays, Triticum aestivum, Glycine max, Populus trichocarpa and other plants [15,[43][44][45]. However, the functions for majority of the NAC genes in legume crops remain unknown. In the present study, genomewide identification of NAC domain TFs has been performed to identify and characterize drought-responsive NAC proteins encoded in chickpea, pigeonpea, and groundnut genome.
Similar studies were conducted in other plant species, for example, 163 NAC genes in Populus [44], 140 in Oryza [43], 105 in Arabidopsis [15], and 101 in Glycine max [45] were analyzed. In this study, a total of 72, 96 and 166 non-redundant NAC genes were analyzed from chickpea, pigeonpea and groundnut, respectively. The number of NAC genes identified in chickpea and pigeonpea is low when compared to those assessed for other plant species such as, Arabidopsis, rice, maize and soybean. Typically, chickpea (~738.09 Mb) and pigeonpea genomes (~833 Mb) are much larger than that of Arabidopsis (125 Mb), and rice (480 Mb), indicating that the number of NAC genes is not directly correlated with genome size.
The identified NAC genes varied greatly in protein length, from 56 AA to 740 AA residues in these three legumes. Average length of the identified NAC proteins was 321.3 AA in chickpea, 330.9 AA in pigeonpea, 321.2 AA in groundnut. As mentioned above, NAC domain is approximately 160 amino acid in length, despite these few small NAC TFs/genes were found such as, Ca_ 15515 (106 AA), Cc_48539 (56 AA), Ah_ann1.8AKD3R.1 (62 AA), those still encodes NAC domain in the three legumes. Furthermore, Ca_15515 consists of only A subdomain, while Ah_ann1.8AKD3R.1 and Cc_48539 genes contain A and B sub-domains. Similarly, NAC protein with 106 amino acid residues has been reported by in case of CaNAC2 gene in Capsicum [46]. Mohanta et al. [47] reported the smallest NAC TF of only 25 amino acids in Fragaria × ananassa (strawberry) that codes NAC domain.
Subcellular localization prediction revealed 62/96 of identified NAC genes in pigeonpea, 53/72 genes in chickpea, and 96/166 genes in groundnut were potentially located in the nucleus. Rest of the NAC genes were localized extracellularly or secreted in these legumes. Transcription factors need to be localized to nucleus to execute their function, either independently or by interacting with other partners. For instance, ATAF1 is localized to nucleus [48], whereas ONAC020 and ONAC023 completely gets localize to nucleus after interacting with ONAC26 [28]. Similarly, NTL4 is targeted to the nucleus only upon heat stress after processing [49]. There are numerous reports which shows the localization of NAC genes in different organelles other than nucleus, as in case of MaNAC6 which gets localized to the cell membrane, cytoplasm and nucleus [50], and ONAC023 is localized to the cytoplasm [28]. Sometimes, TFs gets localized to nucleus after splicing of membrane bound TFs or upon proteolytic cleavage [51,52].
Ten major groups for legumeschickpea (Fig. 3a), pigeonpea (Fig. 3b) and 12 groups for groundnut NACs have been identified (Fig. 3c) based on phylogenetic tree analysis. Earlier studies reported 12 groups in chickpea using 71 NAC proteins [53] and seven major groups in pigeonpea using 88 NAC proteins [54]. Similarly, eight major groups were reported in common bean NACs [55], 15 groups in tartary buckwheat NACs, 16 groups in cassava NACs, and 14 in pepper NACs, 12 NAC groups in broomcorn millet [56]. Furthermore, a comprehensive study of 11 different species with a total of 1232 NAC proteins classified them into eight subfamilies [57]. By analyzing gene structures of the putative stress-responsive NAC proteins, it was observed that predicted NAC genes contained one to six introns in chickpea, zero to six in pigeonpea, and one to three in groundnut. Similarly, introns of common bean NAC genes ranged from one to five [55] and Glycine max NAC genes from one to seven [58]. However, in rice, poplar and cotton, NAC genes introns vary from 0 to 16 [59], 0-8 [44] and 0-9 [60], respectively. Interestingly, the majority of the chickpea (13), pigeonpea (15), and groundnut (18), stress-responsive NACs have two introns. In general, highly similar NAC gene structures were clustered in the same group of their respective phylogenetic trees. However, the distribution of the conserved motif in NAC genes of SAT legumes was similar to that of other species, including common bean, rice, soybean, and Arabidopsis. The sub-domain A has a role in dimer formation, while sub-domain D contains the nuclear localization signal. The most conserved subdomains are C and D and positively charged, whereas the relatively divergent sub-domains are B and E which may be contributing to functional diversity along with the C-terminal domains of NAC proteins [14]. Among the stress-responsive NACs, the highly conserved subdomain observed is E, whereas sub-domain A is relatively highly conserved in chickpea. In pigeonpea subdomain E is least conserved, while sub-domain A and B are most conserved. Interestingly, 73, 64.5, and 75.8% of stress-responsive NACs had all five sub-domains in chickpea, pigeonpea, and groundnut. Despite that, the diversity of gene structures and conserved motifs also implies that these legume NAC proteins are functionally diverged, having roles in meristem development, root development, flowering-inducible, embryo development, vascular-specific expression, hormone signaling, abiotic stresses and defense responses.
To identify putative abiotic stress-responsive NAC genes/proteins in the selected legume crops, we proceeded with the fact that similar protein sequences have similar functions [61]. Thus, the predicted abiotic stressresponsive NAC genes were identified and the functions were analyzed based on the phylogenetic analysis. For this, a total of 107, 139, and 209 NAC protein sequences were used for chickpea, pigeonpea, and groundnut, respectively, including 43 well-known stress-responsive NACs from model and crop plants (Arabidopsis thaliana, Oryza sativa, Medicago truncatula and Glycine max). As most of the ANACs [62][63][64], ONACs [43,65,66], MtNACs [67], and GmNACs [58] included in the phylogenetic analysis have known functions in stress responses, 22, 31, and 33 abiotic stress-responsive NAC genes in chickpea, pigeonpea, and groundnut, respectively, were identified in the present study. There could be more stress-responsive NACs, dispersed on different branches if more stress-responsive NAC proteins from model plants and crops (ANAC, ONAC, MtNAC, and GmNAC) were usedas demonstrated in soybean [61].
Considering this tree-based approach, the possible role of Ca_16946 and Cc_38151 in cold-and drought stress response has been predicted as they are clustered into one subgroup with Medtr8g094580.1 [67]. Similarly, Gm_NAC066, Ca_12660, and Cc_01304 clustered into the same subgroup; Gm_NAC065 and Cc_29871 in one group; Cc_48539 and LOC_Os03g60080.1 in one subclass indicate that these genes may be involved in drought stress response [58]. Some genes may have a role in response to cold stress, like Medtr8g059170.1 and Ca_21186. Genes Ca_18090 and Medtr5g041940.1 contribute to many stresses such as cold, drought, salicylic acid and ABA induced abiotic stress response. Genes, such as Ca_05696, Cc_30687, and Medtr8g099750.1; and Cc_04140, Ca_04187, Ah_ann1.G1V3KR.2, Ah_ ann1.MI72XM.2, and Medtr3g096140.2 are closely related and have been supposed to be involved in salt and drought stress [67]. Furthermore, LOC_Os01g15640.1 and Ah_ann1.UEI6NJ.1 have been reported to be involved in multiple abiotic stresses, such as drought, cold, salinity, and heat [43,65]. Detailed characterization of the gene composition in legumes using comparative genomics is feasible for deriving functional insights of key candidate genes.
Cis-acting regulatory elements (CAREs) are among the most critical gene structures, which determine the transcriptional initiation and consist of short conserved motifs (5 to 20 nucleotides) found in the upstream of the transcriptional start codon [68]. In this study, 13, 9, and 17 drought stress-responsive CAREs were identified in chickpea, pigeonpea, and groundnut, respectively. Another important CAREs detected is Abscisic acid Response Element (ABRE) for abiotic stress regulation which was 06 in chickpea, 08 in pigeonpea, and 05 in groundnut. Also 5, 10, and 10 stress-responsive elements (STRE) in chickpea, pigeonpea and groundnut, respectively, were observed. Besides these, several other promoter elements were also identified which have a role in various plant development and stress response. CAREs are necessary for stress-responsive transcriptional regulation [69]. The existence of different cis-regulatory elements indicates the transcription of several stressresponsive genes via a variety of TFs. Moreover, the significance of the association between CAREs has already been documented for stress-responsive transcription [70]. Hence, the availability of diverse stress-associated elements in the putative stress-responsive NACs deduced from phylogeny might have a role in conferring drought stress tolerance in these legume crops. In several reports, various cis-motifs as DNA-binding sites for the NAC TFs have been identified, which include NACR S (NAC-recognition sequence for drought response) [62], IDE2 motif (iron deficiency-responsive) [71], SNBE (secondary wall NAC binding element) [21], and calmodulin-binding (CBNAC) [72]. As NAC TFs are multiple functional proteins, they can use their DNA binding NAC domains for mediating protein-protein interactions as well [73]. This study showed strong protein-protein interactions between Cc_42082, CZF1, SZF1 (salt-stress response), BZIP60 (ER-stress response), and NTL (protein transporter activity); Cc_01567, Ca_ 02365, Ah_ann1.V0X4SV.1, VND7, MYB46, and MYB83 (regulation of secondary wall biosynthesis).
Furthermore, candidate NAC genes were identified, especially the drought-related NACs. Transcript abundance analysis for particular NAC genes (15 from each legume) was performed upon drought exposure in root tissues. For expression analysis, genes were selected based on their expression patterns in root tissues from different developmental stages of the plant available from gene expression atlases of chickpea [35], pigeonpea [36] and groundnut [37]. Real-time qRT PCR-based gene expression was also analyzed among drought-tolerant and sensitive genotypes. In chickpea, drought-tolerant genotype (ICC 4958) exhibited higher transcript levels when compared to the sensitive genotype (ICC 1882). However, Ca_04337, Ca_04187, Ca_04069 and Ca_ 27204 were found up-regulated irrespective of the drought sensitivity of the genotype, though the expression levels were observed as being lower (except Ca_ 04187) than the tolerant genotype under stressed conditions. In pigeonpea, seven genes (Cc_29871, Cc_26125, Cc_43030, Cc_43785, Cc_43786, Cc_22429, and Cc_ 22430) were found to be induced in both the genotypes ICPL 227 and ICPL 151, under drought stress. Interestingly, a total of 13 genes (out of the 15 examined) were found to be up-regulated in ICPL 151, less droughttolerant genotype against drought stress, and have greater expression levels than ICPL 227 for majority of the genesconfirming that pigeonpea is a relatively drought-tolerant crop. Similarly, in the case of groundnut, nine genes were up-regulated in both tolerant (CSMG 84-1) and susceptible (ICGS 76) genotypes. Comparing the expression of these genes (homologs) in crops, such as Arabidopsis and Oryza revealed their strong induction in several abiotic stresses including salinity, drought, cold, heat, and were mostly up-regulated during high drought, salinity and heat stresses (Additional file 1: Table S4). Moreover, experimental evidences showed that they are expressed in roots, rosette leaves, cauline leaves, shoot apex, stems and flowers [62,74]. However, Medtr8g094580.1 showed downregulation in response to drought stress in Medicago [67]. Similarly, lesser transcript level of Ca_16946 (Medtr8g094580) was observed in sensitive cultivar (ICC 1882) in drought response. Interestingly, Ca_04337, Ca_ 04069, Ca_27204, Cc_26125, Cc_22429, Cc_42082, and Cc_40311 are membrane-bound NAC proteins. These proteins are known to be primarily localized in plasma membrane/endoplasmic reticulum membrane in dormant form and processed into a transcriptionally active and nuclear form after proteolytic cleavage via regulated intramembrane proteolysis, upon specific stress [75,76].
Drought stress has altered the expression of many NAC genes in these legumes. Thus, based on droughtinduced expression of 15 genes examined in each of the three legumes, the possible role of 10 (Ca_06899, Ca_ 18090, Ca_22941, Ca_04337, Ca_04069, Ca_04233, Ca_ 12660, Ca_16379, Ca_16946, and Ca_21186), 06 (Cc_ 26125, Cc_43030, Cc_43785, Cc_43786, Cc_22429, and Cc_22430), and 05 (Ah_ann1.G1V3KR.2, Ah_ ann1.MI72XM.2, Ah_ann1.V0X4SV.1, Ah_ ann1.FU1JML.2, and Ah_ann1.8AKD3R.1) potential NAC genes in drought stress response of chickpea, pigeonpea, and groundnut, respectively, was confirmed. Better understanding of these NAC gene family and identifying the particular function of the specific NACs is most important in agriculture. A detailed regulatory mechanism of these potential stress-related individual NAC genes and their possible interactions provides an opportunity to understand the molecular basis of drought tolerance in these legume crops, that could allow improved varieties to be developed with ample accuracy. Therefore, these genes are valuable resources for further gene function validation and their subsequent use in genetic engineering and molecular breeding for addressing drought stress in legume crops.
It is crucial to mention here that the present study provides very detailed analysis of the identified NAC genes in pigeonpea and chickpea as compared to the previous reports by Satheesh et al. [54] and Ha et al. [53], respectively. We have carried out rigorous motif analysis, promoter analysis, and protein-protein interaction studies of putative stress-related NAC proteins which were lacked in previous reports and therefore, provides much deeper understanding of mechanisms involved in drought stress tolerance in chickpea. Similarly, the findings of Satheesh et al. [54] involved only in silico analysis, whereas NAC transcription factors have not been systematically researched in groundnut, till date. Hence, the present work generates large data sets that further can be used as base for more sophisticated and targeted studies in future. It managed to put attention on the importance of further understanding the potential of legume NAC genes (Ca_ NAC, Cc_NAC, and Ah_NAC), for the purpose of improving abiotic stress tolerance in general.

Conclusions
It is well known that NAC genes play important roles during developmental and abiotic stress responses. Though, few NAC genes have been identified in chickpea and pigeonpea that are involved in drought response. Therefore, to find such notable genes in chickpea, pigeonpea and groundnut was not the only aim of this study. It further aimed to obtain insight into the transcription patterns and putative functions of NAC genes in these legumes. Based on the genome sequence, we have comprehensively identified NAC genes in three SAT legumes viz., chickpea, pigeonpea, and groundnut. A non-redundant set of 72, 96, and 166 NAC genes were detected in chickpea, pigeonpea, and groundnut, respectively. Detailed analyses revealed phylogenetic association, conserved domains, gene structure, transmembrane helices, promoter analysis, gene interaction networks, and expression profiles of NAC genes among these three legumes. Based on data gathered during this investigation, we could identify 21 potential NAC genes for drought tolerance in legumes. This study has furthered our knowledge of legume NAC genes and provided insight into their functions. Furthermore, expression analyses for putative NAC genes during developmental stages and drought exposure confirmed our findings, and have built a robust framework for researchers to select candidates to engineer chickpea, pigeonpea, and groundnut cultivars for enhanced tolerance against drought stress.

Plant material and drought stress imposition
Two contrasting drought-responsive genotypes each for chickpea -ICC 4958 (tolerant) and ICC 1882 (sensitive), pigeonpea -ICPL 227 (more tolerant) and ICPL 151 (less tolerant), and groundnut -CSMG 84-1 (tolerant) and ICGS 76 (sensitive) were selected for the study. The seeds of selected cultivated genotypes (approx. 10-12 seeds) were procured from the Chickpea Breeding unit, Research Program -Asia of the International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Patancheru, India. Seeds were thoroughly rinsed with distilled water and germinated on moist filter paper. Germinating seedlings were then transferred to pots filled with autoclaved soil under controlled glasshouse conditions after the emergence of radicle and cotyledonary leaves. Drought stress was imposed on 30day-old chickpea, pigeonpea, and groundnut plants using polyethylene glycol (PEG) induced treatment (20% PEG 8000). Root tissues were collected six days after PEG treatment and stored in − 80°C until RNA isolation.
Identification and data analyses of NAC family genes/ proteins in chickpea, pigeonpea, and groundnut Previously-identified NAC protein sequences from other plant species such as Arabidopsis, Medicago, Lotus, Glycine max, etc., were searched against the predicted gene models of three legumes (chickpea, pigeonpea, and groundnut) using blastp program at a cutoff threshold E-value of ≤1E-05. In addition to this, the HMM profile of the NAC family was extracted from the Pfam database [77], and NAC HMM profile was scanned against the predicted gene models of legumes under study for target hits with the NAC domain by HMMER v2.1.1 [78]. The identified proteins/genes were further confirmed for the presence of NAC domain using SMART and Pfam searches. The physio-chemical properties of the identified NAC proteins, such as the number of amino acids in the open reading frame (ORF), molecular weight (MW), isoelectric point (pI), and length of each gene was determined using ExPASy (http://www.expasy.ch/ tools/pi_tool.html). Softberry (http://linux1.softberry. com/) was used to predict subcellular localization of the identified NAC family proteins using ProtComp (Program for predicting protein sub-cellular location). Subcellular localization predictions were based on significant similarity in Potential Location database by DBSCAN (database homology search program similar to BLAST). MapChart 2.32 software (https://www.wur.nl/ en/show/Mapchart.htm) was used to represent the chromosomal distribution of these identified NAC genes.
Transmembrane domains prediction and orthologs distribution TMHMM v2.0 (http://www.cbs.dtu.dk/services/ TMHMM/) was used to determine the transmembrane helices. To identify orthologs, chickpea, pigeonpea, and groundnut NAC proteins were searched against the whole set of Medicago and soybean proteins using blastp program applying a threshold E-value of 1E− 10, 80% similarity, and 80% query coverage. Further, circos [34] was used to represent these orthologous relationships.
Phylogenetic analysis and identification of putative stress-responsive NAC genes MEGA (V7.0) software (http://www.megasoftware.net/) was used to perform phylogenetic relationships. Neighbor-Joining (NJ) method with bootstrap values more than 30% was used to construct unrooted phylogenetic tree/s. Further, for identifying putative stressresponsive NAC genes, a total of 43 abiotic stressresponsive NAC protein sequences (from Arabidopsis, Oryza sativa, Medicago truncatula and Glycine max) were included along with the NAC protein sequences of legume crops, and sequence alignments were performed.
Conserved motifs and gene structure analysis of putative stress-responsive NAC genes The motif prediction was done at different motif widths using MEME standalone version 5.0.2 [79]. MEME for conserved motifs with parameters like 20 number of motifs, 10-50 motif width, and 2-72 motif sites (2-96 for pigeonpea and 2-166 for groundnut), with E-value threshold of 0.05 were used. GSDS 2.0 (Gene Structure Display Server) was used to visualize the exon/intron organization of the putatively identified stress-responsive NAC genes [80].
Prediction of cis-acting regulatory elements (CAREs) in putative stress-responsive NACs The upstream promoter sequences of identified stressresponsive NAC genes (1500-bp sequences upstream of the translation initiation codon) were analyzed for the presence of putative CAREs using the PlantCARE (Plant Cis-Acting Regulatory Elements) database [81]. Cis-elements with matrix score above five were considered.

STRING analysis for protein-protein interaction studies
Web-based STRING (Search Tool for the Retrieval of Interacting Genes/Proteins) database version 11.0 (https://string-db.org/) was used to carry out proteinprotein network studies. Stress-related chickpea, pigeonpea, and groundnut NAC protein sequences were searched against Arabidopsis thaliana proteins for the identification of best corresponding hits. Thus, the resulting hits were used for protein-protein interaction analysis.

RNA isolation
Total RNA was isolated using the Nucleospin RNA plant kit (Macherey-Nagel, Duren, Germany) as per the manufacturer's instructions from root tissues collected from both stressed and well-watered plants of contrasting drought-responsive genotypes of the three legumes. First-strand cDNA synthesis was performed using SuperScript®III RT enzyme (Invitrogen, CA, USA).

Quantitative real-time PCR (qRT-PCR) analysis
Quantitative real-time PCR (qRT-PCR) was performed on Applied Biosystems 7500 Real Time PCR System using SYBR Green-chemistry (Applied Biosystems, USA). The glyceral-dehyde-3-phosphate dehydrogenase, actin, and alcohol dehydrogenase genes were used as an endogenous control for chickpea, pigeonpea, and groundnut, respectively. The reactions were performed with three biological and two technical replicates. 2 -△△CT method was used to calculate relative expression levels [82]. Specific primers for qRT-PCR were designed using PrimerQuest tool (https://eu.idtdna.com/scitools/ Applications/RealTimePCR/Default.aspx).
Additional file 1: Table S1 Putative NAC family genes identified in chickpea (72), pigeonpea (96) and groundnut (166).  Table  S6 Possible matches of chickpea, pigeonpea, and groundnut stressresponsive NAC protein sequences with Arabidopsis proteins. Table S7 STRING analysis for protein-protein interaction analysis of stressresponsive NACs from SAT legumes. Table S8 Corresponding transcript ids of groundnut with Arachis hypogaea gene expression atlas (AhGEA). Table S9 List of NAC-specific primers used for qRT-PCR analysis in selected legumes.
Additional file 2: Fig. S1 Prediction of stress-responsive Ca_NAC genes based on phylogenetic analysis using MEGA7.0. A total of 107 protein sequences were used which included 72 from chickpea and 43 well-known stress-responsive NAC genes from Arabidopsis thaliana, Oryza sativa, Medicago truncatula and Glycine max. Bootstrap values are displayed next to the branch nodes. Fig. S2 Prediction of stress-responsive CcL_NAC genes based on phylogenetic analysis using MEGA7.0. A total of 139 protein sequences were used which included 96 from pigeonpea and 43 wellknown stress-related NAC genes from Arabidopsis thaliana, Oryza sativa, Medicago truncatula and Glycine max. Bootstrap values are displayed next to the branch nodes. Fig. S3 Prediction of stress-responsive Ah_NAC genes based on phylogenetic analysis using MEGA7.0. A total of 209 protein sequences were used which included 166 from groundnut and 43 well-known stress-related NAC genes from Arabidopsis thaliana, Oryza sativa, Medicago truncatula and Glycine max. Bootstrap values are displayed next to the branch nodes. Fig. S4 Representation of proteinprotein interactions among predicted stress-responsive chickpea, pigeonpea, and groundnut proteins using STRING database v11.0.