Comprehensive Analysis of NAC Transcription Factor Family in Pearl Millet Uncovers Drought and Salinity Stress Responsive NAC in Pearl Millet (Pennisetum Glaucum)

Background: Pearl millet (Pennisetum glaucum) is a cereal crop that possesses the ability to withstand drought, salinity and high temperature stresses. The NAC [NAM (No Apical Meristem), ATAF1 and (Arabidopsis thaliana Activation Factor 1), and CUC2 (Cup-shaped Cotyledon)] transcription factor family is one of the largest transcription factor families in plants. NAC family members are known to regulate plant growth and abiotic stress tolerance. Currently, no reports are available on the functions of the NAC family in pearl millet. Results: Our genome-wide analysis found 151 NAC transcription factor genes (PgNACs) in the pearl millet genome. Phylogenetic analysis divided these NAC transcription factors into 11 groups (A-K). Three PgNACs (-073, -29, and -151) were found to be membrane-associated transcription factors (MTF). Seventeen other conserved motifs were found in PgNACs. Based on the similarity of PgNACs to NAC proteins in other species, the functions of PgNACs were predicted. In total, 88 microRNA target sites were predicted in 59 PgNACs. A previously performed transcriptome analysis suggests that the expression of 30 and 42 PgNACs are affected by salinity stress and drought stress, respectively. The expression of 36 randomly selected PgNACs was examined by quantitative reverse transcription-PCR. Many of these genes showed diverse salt- and drought-responsive expression patterns in roots and leaves. These results conrm that PgNACs are potentially involved in regulating abiotic stress tolerance in pearl millet. Conclusion: The pearl millet genome contains 151 NAC transcription factor genes that can be classied into 11 groups. Many of these genes are either upregulated or downregulated by either salinity or drought stress and may therefore contribute to establishing stress tolerance in pearl millet.


Background
Pearl millet [Pennisetum glaucum (L.) R. Br.] is the sixth most important staple food crop. It is a nutritionally superior crop for people living in arid and semi-arid regions of Sub-Saharan Africa and the Indian subcontinent. It can withstand harsh environmental conditions such as drought, salinity and high temperature [1]. Transcriptomic analyses has identi ed functional genes and pathways involved in pearl millets stress response [2][3][4][5][6]. The pearl millet genome has been sequenced by the International Pearl Millet Genome Sequencing Consortium [7]. The availability of pearl millet transcriptome and genome data helps to identify genes that contribute to stress tolerance in pearl millet. Among abiotic stresses, drought and salinity cause severe yield losses in major staple crops [8,9]. Current climate prediction models forecast the deterioration of annual precipitation and an increase in salinization [10]. Breeding abiotic stress-tolerant crops such as pearl millet is therefore important to secure the food supply even under these conditions. To cope with these environmental stresses, plants activate defense responses, including the activation of sets of metabolic pathways and genes. Stress-responsive genes are classi ed into two types [11]: "functional" genes encoding proteins such as late embryogenesis-associated (LEA) proteins, detoxi cation enzymes, heat shock proteins and molecular chaperones, which directly protect plants from abiotic stress, and "regulatory" genes encoding proteins such as protein kinases and transcription factors (TFs), which have roles in the perception and transduction of stress signals. TFs can interact with the promoter regions of gene and thereby alter gene expression patterns. Plant TFs are divided into different families [12]. Like many TFs, the NAC (NAM, ATAF1 and 2, and CUC2) family has versatile functions in plants [13]. The NAC domain was deduced from three previously characterized proteins, petunia NAM (no apical meristem), Arabidopsis thaliana ATAF1/2 (Arabidopsis thaliana activation factor 1/2) and CUC2 (cup-shaped cotyledon) [14,15]. Genome-wide investigations of NAC transcription factor genes suggest that Arabidopsis thaliana has 117 NAC TFs, Setaria italica L. has 147 NAC TFs, Oryza sativa has 151 NAC TFs, and Zea mays has 152 NAC TFs [16][17][18]. However, NAC TF genes in pearl millet have not been analyzed thus far.
The objective of this study was to identify pearl millet NAC TFs and characterize their expression patterns. In this study, 151 NAC TF genes were identi ed in pearl millet (annotated as Pennisetum glaucum NAC genes; PgNACs). We have also analyzed their genomic distribution, phylogenetic relationships, gene structure, conserved motifs, microRNA targeting sites and expression pro les under drought and salinity-stressed conditions.

Identi cation and annotations of NAC genes in pearl millet
The HMMER Search with the HMM pro le identi ed 151 NAC genes among 35,757 pearl millet genes. All the identi ed NAC genes were named by adding the pre x "Pg", for Pennisetum glaucum, and were numbered according to their chromosomal position, yielding PgNAC001 to PgNAC151. Deduced PgNAC protein sequences exhibited a diverse range of amino acid lengths: the smallest PgNAC was 98 amino acids (PgNAC011) long, whereas the largest was 750 amino acids (PgNAC134) long. The CDD search and the SMART program con rmed the presence of NAC domains in each PgNAC.
Chromosomal distribution and structure of PgNAC proteins Physical mapping of PgNACs on all 7 pearl millet chromosomes revealed an uneven distribution for the rst 134 PgNACs. Among the chromosomes, chromosome 3, which is the largest in size (300.9 Mb), had the largest number of PgNACs (25; 18%). Chromosomes 1 and 6 had the second largest number (15%) of PgNACs. Chromosome 5 had the smallest number (11%) of PgNACs, and its size is 158.7 Mb. PgNACs located on chromosomes 1 and 5 appear to congregate at one end of the arms, while chromosomes 2, 3, and 4 show clusters of PgNACs at both ends. Chromosome 7 has a non-clustered distribution of PgNACs. The remaining PgNACs (PgNAC135 to PgNAC151) mapped to 10 different scaffolds. Among these, scaffold 2474 had the largest number (5) of PgNACs (PgNAC141 to PgNAC145), while scaffolds 1622, 2427, 3470, 3477, 7552 and 8799 each had 1 PgNAC. Twenty-seven (15.56%) PgNACs were found to be tandemly duplicated ( Fig. 1 and Additional File 2).
All PgNACs show great variation in size, with the smallest gene being 0.5 kb and the longest being 9 kb. The number of introns in PgNACs ranged from zero to twelve. Thirty-eight PgNACs consist of three exons and two introns. PgNAC101 and PgNAC007 have 13 and 10 exons, respectively. Twenty-nine (40.5%) PgNACs have only one exon (Additional File 3). Three PgNACs (PgNAC073, PgNAC029, and PgNAC151) were predicted to have a single transmembrane helix at the N-terminus (Fig. 2). These three PgNACs are more like membrane-associated NAC proteins from Setaria italica than those from Brachypodium distachyon and Zea mays. Each membrane associated PgNAC was placed in a different NAC protein clade (Fig. 3).

Phylogenetic analysis of PgNACs and conservation of motifs
Comprehensive phylogenetic analyses were performed by aligning the sequences of 145 SiNACs (Setaria italica NAC proteins) and 151 PgNACs. All NACs were grouped into 11 groups (A to K). Group G was the largest group with 32 PgNAC members, while the smallest was group H with 5 members (Fig. 4). Most PgNAC-encoding genes in the same group shared similar exon-intron structures and/or duplication patterns. For instance, all the PgNAC-encoding members in groups D and E have 2 or 3 introns. Group Q has the largest number (12) of duplicated genes.
The MEME suite found 17 motifs in PgNACs. PgNAC N-terminal regions were found to be more conserved than C-terminal regions. Generally, PgNACs in the same groups showed similar motif compositions (Table  1 and Additional File 4), supporting the idea that their functions are also similar. Table 1. Conserved motifs present in pearl millet PgNACs. Conserved motifs were identi ed using the online tool MEME with the default parameters.
GO term analysis suggested that the majority of PgNACs are involved in DNA binding (90%). Eight PgNACs (PgNAC009, PgNAC021, PgNAC085, PgNAC125, PgNAC029, PgNAC112, PgNAC118, and PgNAC137) were associated with the GO term "homodimerization". Approximately 86.25% of PgNACs were associated with the GO term "Nucleus". Nine PgNACs were associated with the GO term "external and internal stimuli", and 29 were associated with "response to stress" (Fig. 5).

miRNAs target sites in PgNACs
The results of analyzing miRNAs targeting PgNACs found a total of 88 miRNA target sites in the 59 PgNACs. Among them, PgNAC023 had the most (six) miRNA target sites. PgNAC023 is targeted by miRNA162, miRNA167h, miRNA394a and miRNA394b. While, PgNAC092 had four miRNA target sites, and is targeted by miR165a, miR165b, miR166b and miR166p. In some PgNACs, for example PgNAC117 and PgNAC120 one miRNA target site found. (Additional le: 5). These results suggested that PgNACs function is regulated by multiple miRNAs.

Discussion
The chromosomal distribution of PgNACs was uneven and clustered. This pattern is similar to the distribution of NAC TF genes in Setaria italica, Oryza sativa, Solanum tuberosum, Brachypodium distachyon and other species [17,[19][20][21]. Structurally diversi ed multigene family are known to develop during evolution to cope with changing environmental conditions [22]. The numbers of introns in PgNACs ranges from 0-12, which is comparable to NAC TF genes in rice (0-16) [19]. However, banana and cassava NAC TFs contain 0-5 and 0-9 introns, respectively [23,24]. During gene duplication, the loss of introns occurs faster than the gain of introns [19]. The 29 PgNACs that have no introns may represent the original genes. Gene duplication, whether tandem or segmental, plays an important role in the diversi cation of gene function. An Arabidopsis NAC TF, ANAC096, is known to interact with speci c bZIP TF family members and regulate plant responses to the stress-related phytohormone abscisic acid [25]. In miRNA-PgNACs association studies 88 miRNAs binding sites were found in 59 PgNACs. This association studies con rms that miRNAs are master regulators of PgNACs. Most of these pearl millet miRNAs targeting PgNACs are salinity and drought stress responsive. Previous studies have proved that miRNAs and NAC TFs together are key players of stress responses [26,27]. However, further studies are necessary to elucidate the roles of these miRNAs and their association with pearl millet NACs.
Using qRT-PCR, we identi ed 36 PgNACs that respond to either drought or salinity stress. In our analysis, many PgNACs were upregulated in roots rather than in leaves. This difference may be because roots are the primary organ exposed to stress [2]. Similar results were obtained for NAC TF genes in Brachypodium distachyon when their expression levels were assessed after plants were exposed to cadmium stress [28]. In common bean (Phaseolus vulgaris), 11 NAC TF genes were upregulated by drought stress, while 8 were downregulated [29]. JUNGBRUNNEN 1 (JUB1), a NAC factor, acts as an important regulator of drought tolerance in Solanum lycopersicum [30]. Another Arabidopsis NAC TF, ANAC092, regulates senescence, seed germination, and tolerance to salt stress [31]. Several other NAC TFs, such as OsNAC6 [32][33][34] in rice; ANAC019, ANAC056 and ANAC072 [35] in Arabidopsis; NAC67 in nger millet [36]; and SNAC1 in cotton [37], are involved in establishing stress tolerance in plants. Similar to these NAC TFs, PgNACs are likely to regulate drought and salinity tolerance in pearl millet.

Conclusions
In our study, a comprehensive analysis including gene structure, phylogenetic analysis, chromosomal location, miRNA binding and expression analysis under abiotic stresses of the NAC gene family in pearl millet was rst performed. We identi ed 151 NAC genes in the pearl millet genome, these genes provide preliminary information for the future functional characterization studies. Furthermore, expression analysis of PgNAC genes in root and leaf tissues and under salinity and drought stresses implied that PgNACs participate in the stress responses of the pearl millet. Our systematic analysis of the NAC TF genes in pearl millet lays foundation for the follow-up study of the functional characteristics of PgNAC genes and the cultivation of stress tolerant pearl millet varieties.

Sequence retrieval, database searches and domain searches
To identify the PgNACs, 35,756 coding gene sequences of pearl millet were retrieved from the GIGA database (http://gigadb.org/dataset/100192) and converted to all 6 standard reading frames with the standalone JEMBOSS software [38,39]. Nucleotide sequences for the 1016 NAC TFs from Arabidopsis thaliana, Oryza sativa, Setaria italica, Sorghum bicolor and Zea mays were retrieved from the Plant Transcription Factor Database v4.0 (http://planttfdb.cbi.pku.edu.cn/family.php?fam=NAC) [12] and used as subjects to perform BLAST searches against the pearl millet coding sequences. The hidden Markov model (HMM) pro le of NAC and NAM domains was extracted from the pFAM database [40][41][42], and NAC and NAM HMM pro les were used to search the pearl millet genome sequences for speci c domains using HMMER [43]. Hits with an expected value less than 1.0 were selected, and redundancy was minimized using the decrease redundancy tool (https://web.expasy.org/decrease_redundancy). The presence of NAC domains in the ltered sequences was con rmed by SMART (http://smart.emblheidelberg.de) and pFAM (http://pfam.xfam.org/). For the transmembrane motifs, the sequences were searched using the TMHMM server ver 2.0 (http://www.cbs.dtu.dk/services/TMHMM/).

Chromosome positioning and gene duplication
The BLASTN program in the BLAST+ suite [44] was performed using PgNAC sequences as queries and the pearl millet genome sequence (Taxid: 4543) as the database to determine the positions of PgNACs in the pearl millet genome. The positions of PgNACs in the genome were displayed using MapChart [45]. Gene duplication was con rmed using two criteria: a) the shorter aligned sequence covered >70% of the longer sequence and b) the similarity of the aligned sequences was >70% [46,47]. Tandem duplicated genes were de ned as adjacent homologous genes on a single chromosome, with no more than one intervening gene.

Phylogenetic analysis and conserved motif identi cation
For phylogenetic analysis, the sequences of all NAC transcription factors were imported into MEGA7, and multiple sequence alignment was performed using CLUSTALW with the default's parameters. A le with the extension ".MEG" was used to construct an unrooted phylogenetic tree based on the neighbor-joining method using bootstrap analysis with 1000 replicates [48]. The MEGA7 output tree was edited using the Interactive Tree of Life online tool (iTOL; https://itol.embl.de/). Conserved motifs in the NAC transcription factor sequences were identi ed using multiple EM for motif elicitation (MEME suite). The input parameters for motif identi cation were as follows: number of repetitions, any; maximum number of motifs, 20; and width of motif, 50. The discovered motifs were annotated using InterProScan in the InterPro database [49,50].

Prediction of microRNAs target sites in PgNACs
In pearl millet, microRNAs (miRNAs) have been reported to be involved in stress responses [3,51].
MiRNAs negatively regulate target genes by pairing to the corresponding gene and facilitating its cleavage. [52] The psRNATarget tool was used to identify the target sites of pearl millet miRNAs in PgNACs (http://plantgrn.noble.org/psRNATarget/) [53]. To avoid false-positives, only gene sequences with a 3.0 points cut-off threshold were considered miRNAs target genes.

NAC TF GO annotation and expression analysis using transcriptome data
Gene Ontology (GO) terms were assigned to PgNACs using the Blast2GO program based on their similarities to Oryza sativa proteins (National Center for Biotechnology Information (NCBI) taxid: 4530) [54].
Previously available RNA sequencing (RNA-Seq) data for drought and salinity conditions were primarily used to examine the expression of the PgNACs. These data were retrieved from the NCBI Sequence Read Archive under accession numbers SRR6327851 to SRR6327856; SRR6327857 to SRR6327862 for drought-stressed samples and SRP128956 for salinity-stressed samples. [2,55] Quantitative reverse transcription-PCR (qRT-PCR) Drought-(ICMB843) and salinity-tolerant (ICMB01222) pearl millet genotypes were obtained from the International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), India. Eexperiments were conducted under controlled greenhouse conditions. Approximately, 20 seeds from each genotype were sown in equal volumes of soil and vermiculite in perforated terracotta pots. Each growth experiment was performed in triplicate. Drought stress was imposed on 21-day-old plants by incubating the plants in 20% (w/v) PEG for 6 and 24 hrs, while control plants were maintained in normal water. For salinity stress, plants were incubated in 250 mM NaCl solution for 6 and 24 hrs. Total RNA was isolated from root and leaf tissues with Trizol. For cDNA synthesis, 1 µg of total RNA was reverse transcribed with oligo (dT) primers and PrimeScript Reverse Transcriptase. To examine PgNAC expression, real-time PCR was performed using a step-one real-time instrument and the TB Green I kit (Roche, Basel, Switzerland).
Reactions were performed in triplicate and each reaction mixture contained 100 ng of cDNA, 1 μL of each primer (10 μM/μL) and 10 μL of TB green master mix in a nal volume of 20 μL. The primers sequences used for qRT-PCR are provided in (Additional File 1). The PCR conditions were as follows: initial denaturation at 95 °C for 30 s, 40 cycles of 95 °C for 5 s and 60 °C for 30 secs, and melt-curve analysis. The PgActin gene was used as a reference [56]. Relative fold differences for each sample in each experiment were calculated using the ΔΔCt method [57].

Declarations
Availability of data and materials All data generated or analyzed during this study are included in this published article [and its supplementary information les].

Competing interests
We con rm that none of the authors have any competing interests.    Phylogenetic relationship among membrane-associated NAC TFs from pearl millet, Setaria italica, Brachypodium distachyon and maize. Multiple sequence alignment of all putative membrane-associated NAC TFs in pearl millet (proteins with "PgNAC" in their names), Setaria italica (proteins with "Seita"), Brachypodium distachyon (proteins with "Bradi") and maize (proteins with "GRMZM") was conducted using ClustalW. MEGA 7.0 was used to create phylogenetic trees using the neighbor-joining method with 1000 bootstrap replicates and the p distance method. Phylogenetic analysis of pearl millet and Setaria italica. A phylogenetic tree was constructed using the neighbor-joining method with 1000 bootstrap replicates as described in the legend for Fig. 3. The letters A-K correspond to the 11 NAC TF subfamilies. Gene Ontology (GO) terms associated with PgNACs. A) GO terms for the category "Biological process". B) GO terms for the category "Molecular functions". C) GO terms for the category "Cellular components".  Expression of selected PgNACs under drought-stressed conditions. Pearl millet plants were exposed to drought stress for 0, 6, and 24 hrs, and their roots and leaves were sampled. The relative mRNA abundance of 21 selected PgNACs in these samples was determined by quantitative reverse transcription-PCR, with PgActin as an internal control. Data are presented as the means standard deviations (SD) of three biological replicates. Expression analysis of selected PgNACs for salinity. Pearl millet plants were exposed to salinity stress for 0, 6, and 24 hrs, and their roots and leaves were sampled. The relative mRNA abundance of 16 selected PgNACs was determined in these samples using quantitative reverse transcription-PCR, with PgActin as an internal control. Data are presented as the means standard deviations (SD) of three biological replicates.

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download.