Genome-wide identi cation and expression patterns analysis of the RPD3/HDA1 gene family in cotton

Jingjing Zhang Chinese Academy of Agricultural Sciences Cotton Research Institute Aimin Wu Chinese Academy of Agricultural Sciences Cotton Research Institute Hengling Wei Chinese Academy of Agricultural Sciences Cotton Research Institute Pengbo Hao Chinese Academy of Agricultural Sciences Cotton Research Institute Qi Zhang Chinese Academy of Agricultural Sciences Cotton Research Institute Miaomiao Tian Chinese Academy of Agricultural Sciences Cotton Research Institute Xu Yang Chinese Academy of Agricultural Sciences Cotton Research Institute Shuaishuai Cheng Chinese Academy of Agricultural Sciences Cotton Research Institute Xiaokang Fu Chinese Academy of Agricultural Sciences Cotton Research Institute Liang Ma Chinese Academy of Agricultural Sciences Cotton Research Institute Hantao Wang (  w.wanghantao@163.com ) Chinese Academy of Agricultural Sciences Cotton Research Institute Shuxun Yu (  ysx195311@163.com ) Chinese Academy of Agricultural Sciences Cotton Research Institute https://orcid.org/0000-00029715-3462

Background DNA is combined with nuclear proteins to constitute the Chromatin that is in charge of storing genetic and directive information in eukaryotic cells. Chromatin is highly arranged and mainly composed of nucleosomes, which is formed by approximately 147 bp of DNA and an octamer organized by four core histone proteins -H3, H4, H2A, and H2B [1]. Gene expression in eukaryotes involves a complicated interaction, which is controlled not only by the DNA sequence but also by epigenetic events. Epigenetic mechanisms mainly containing histone modi cation and DNA methylation play an important role in the regulation of gene expression. In general, histone post-translational modi cations occur at the N-terminal of histones, including methylation, acetylation, phosphorylation, ADP-ribosylation and ubiquitination [2] and these changes facilitate the binding of other proteins to DNA, resulting in synergistic or antagonistic regulation of gene transcription [3,4]. Among the several histone modi cations, histone acetylation is an invertible process and plays essential roles in the epigenetic regulation. The acetylation of core histones is catalyzed by histone acetyltransferases (HATS), promoting the transcriptional activation, while deacetylation is regulated by histone deacetylases (HDACs) related to the transcriptional suppression [5].
HDACs deacetylate the lysine residues of N-terminal histone tails, resulting in repression of gene expression [6].
RPD3/HDA1-type histone deacetylases, homologous to yeast RPD3 and HDA-1, belong to a large family and they require zinc ions to catalyze activity, while an HDAC depressor TSA (trichostatin A) or sodium butyrate can inhibit their enzyme activities [7]. The Arabidopsis RPD3/HDA1 gene family is further classi ed into three groups. Class I includes HDA6, HDA7, HDA9 and HDA19, class II includes HDA5, HDA15 and HDA18 and class III includes HDA2 only [7,8]. The other genes of PRD3/HDA1 family are classi ed to unclassi ed.
Over the past twenty years, RPD3/HDA1-type HDACs (call RPD3 for short below) have been studied extensively as global regulatory factors playing essential roles in a series of growth and development processes of plants and responding to various environmental stresses [8, [12][13][14]. In Arabidopsis, it has been reported that AtHDA19 was involved in various developmental processes including owering time, circadian clocks, seed development and so on [15,16]. Otherwise, AtHDA19 might regulate gene expression related to jasmonic acid and ethylene signaling pathways in response to wounding and pathogen infection [17]. In maize, the expression patterns of three ZmPRD3 genes, ZmRpd3/101, ZmRpd3/102 and ZmRpd3/108, showed that they expressed widely in all investigated corn organs. Furthermore, their products could be detected in all cellular parts at some speci c stages such as kernel, shoot and anther developmental periods [18]. In rice, HDA705 could respond to ABA and abiotic stresses and its expression was induced by JA. In addition, the expression of HDA702 and HDA704 was also signi cantly induced by SA, JA and ABA [19,20]. These ndings indicated that the RPD3 members played an important regulatory role in plant development and response to various stresses and plant hormone.
Cotton is one of the most important economic crops in our country and plays an essential role in China's national economy. Early maturity and stress resistance are vital target traits of cotton breeding. Over the past two decades, the RPD3 genes have been intensively studied and made some progress in Arabidopsis and some other crops. However, there is a lack of systematic research on RPD3 gene family in cotton.
Hence, it is necessary to explore the potential functions of RPD3 genes in cotton. In our study, the protein sequences of cotton RPD3-type HDACs were predicted by genome-wide identi cation and their phylogenetic tree, gene structure, conserved motifs, protein domains, expression pro les and preliminary functions were comprehensively analyzed. The information about GhRPD3 provides a reference for further exploration of the possible functions of RPD3 genes in cotton growth and development.

Results
Identi cation of RPD3 genes in nine species In this study, a total of 108 RPD3 protein sequences from nine species were determined after eliminating redundant sequences and they are named by the position on the chromosome. The corresponding relationship between gene ID number and their names was shown in Additional le 1: Table S1. A total of 18 genes (GhHDA1-GhHDA18) containing Hist_deacetyl (PF00850) domains were identi ed from G. hirsutum, among which 9 genes were located on the At genome and 9 genes were mapped on the Dt genome. Further, 18 genes (GbHDA1-GbHDA18) from G. barbadense, 9 genes (GaHDA1-GaHDA9) from G. aboreum and 9 genes (GrHDA1-GrHDA9) from G. raimondii were detected, repectively. Tetraploid cotton possessed twice as many RPD3 genes as diploid cotton, indicating that no RPD3 cotton gene was lost in the process of polyploidy. The numbers of RPD3 genes in the other ve species were 10 (Arabiposis), 14 (Oryza satica L.), 11 (Populus trichocarpa), 8 (Theobroma cacao) and 11 (Zea mays L.), respectively. The GhRPD3 proteins length ranged from 232 to 635 aa with an average of 459aa. The physicochemical parameters analysis showed that the isoelectric point (pl) of GhRPD3 proteins varied from 4.47 to 8.65 with an average value of 5.68 and the molecular weight of GhRPD3 proteins varied from 25.79 to 73.01 KDa with an average value of 51.21 KDa. The subcellular localization results indicated that most of the GhRPD3 genes were located in cytoplasmic (10) and nuclear (8), suggesting that GhRPD3 genes might possess multiple regulatory functions ( Table 1). The predicted length, pI, Mw and subcellular localization of the RPD3 proteins in other eight species were shown in Additional le 1: Table S1.  (14), Zea Mays (11) and P. trichocarpa (11) were employed to construct an unrooted phylogenetic tree using the neighbor-joining method for investigating the evolutionary relationships of RPD3 proteins. The RPD3 proteins were phylogenetically classi ed into 4 subfamilies (Class I, Class II, Class III and unclassi ed) according to the formulated subfamilies in Arabidopsis [7]. The Class subgroup was the largest subfamily with 49 RPD3 genes, while the Class subgroup has the fewest members, only containing one gene in seven diploid species genomes and including two genes in two tetraploid cotton genomes (Fig. 1). Among these four classes, each subfamily contained all nine species RPD3 genes, indicating this gene family was relatively conserved in different species during evolutionary process.

Exon-intron structure and conserved motif analysis
The domains of RPD3 sequences in cotton were investigated and exhibited according to the results of SMART database using TBtools, showing that all cotton RPD3 genes contained a Hist_deacetyl domain (Additional le 2: Table S2 and Additional le 3: Figure S1). An unrooted phylogenetic tree with the predicted cotton RPD3 genes was constructed (Fig. 2a), then exon-intron structure (Fig. 2b) and conserved motifs (Fig. 2c) were analyzed to better understand the similarity and difference of cotton RPD3 members. The results showed that the gene length of RPD3 cotton genes was relatively conserved in Class and Class , but there were twelve longer sequences existing in Class and unclassi ed group. The RPD3 cotton genes included lots of exons varying from 3 to 17 and most RPD3 genes (48/54) contained more than ve exons (Additional le 4: Table S3), which might be associated with diversi cation of their functions. In terms of distributions of motifs, most RPD3 cotton genes belonging to the same subfamily performed a similar motif composition except for the unclassi ed group (Fig. 2c). Most Class subfamily members contained 9 motifs in addition to GrHDA5 and GhHDA4, only including 4 and 6 motifs respectively. Class subfamily genes had three or four motifs and most Class subfamily members possessed 7 motifs except for GhHDA12, one motif less than other genes. There were differences existing in the exon-intron structure and motif arrangement among four categories, while they were highly conserved on the same branches, indicating that the RPD3 members classi ed into the same branch might perform a relatively conserved function in cotton growth and development.
Chromosomal distribution, gene duplication and selection pressure The chromosomal distributions of GrRPD3, GaRPD3, GbRPD3 and GhRPD3 genes were exhibited according to the genomic position of 54 cotton RPD3 genes (Additional le 5: Table S4 and Fig. 3). In G. hirsutum, 18 GhRPD3 genes were unevenly mapped on 13 chromosomes. A03 contained the most GhRPD3 genes (3), while the other 12 chromosomes only contained one or two GhRPD3 genes (Fig. 3a). The distribution of 18 GbRPD3 genes on chromosomes in G. barbadense was similar with that of GhRPD3 genes in G. hirsutum (Fig. 3b). In G. arboreum, 9 GaRPD3 genes were unevenly located on 6 chromosomes. Chr01 and Chr13 contained three and two GaRPD3 genes, respectively, while the other 4 chromosomes contained only one GaRPD3 genes (Fig. 3c). In G. raimondii, the chromosomal distribution of 9 GrRPD3 genes were highly consistent with the corresponding D sub-genome of G. hirsutum (Fig. 3d), showing conserved numbers and chromosomal distribution of RPD3 genes between diploid and tetraploid cotton species. In addition, the lopsided chromosomal distribution of the cotton RPD3 genes indicated that genetic variation occured during the progress of evolution. Notably, most of the RPD3 genes were distributed on the opposite ends of the chromosomes in four cotton species (Fig. 3).
In general, tandem and segmental duplication were two main reasons of gene family generation in the evolutionary process of gene family [21]. The analysis of gene duplication indicated that all RPD3 family members were ampli ed only through segmental duplication (Additional le 6: Table S5), suggesting that segmental duplication played a vital role in the evolution of the RPD3 gene family. The homologous gene pairs obtained by collinearity analysis among RPD3 genes in G. arboretum, G. raimondii, and G. hirsutum was visualized using circular maps (Fig. 4). The Ka/Ks ratios of most homologous gene pairs were lower than one, indicating that puri ed selection was essential during the evolution of cotton RPD3 genes, while the Ka/Ks ratios of gene pairs (GhHDA2/GaHDA3 and GhHDA6/GaHDA6) were more than 1, suggesting that these two gene pairs might have experienced positive selection pressure. The study also predicted the occurrence time of segmentally duplicated RPD3 gene pairs by the formula " t = Ks/2r" (r = 2.6 × 10 − 9 ) [22]. Except for gene pair (GhHDA6/GaHDA6), the other segmental duplication events of three cotton species might have occurred 0.6 to 144.44 million years ago with an average time of 18.39 million years (Additional le 6: Table S5).

Analysis of cis-elements in predicted promoter regions of GhRPD3
In order to explore the possible regulatory functions of GhRPD3 genes under various environmental stresses and hormone regulation pathway, the 2000 bp promoter regions of 18 GhRPD3 genes were employed to the PlantCARE database for identi cation of putative stress-associated and plant hormonerelated cis-elements. A total of 9 kinds of elements related to plant hormone: AuxRR-core (auxin), TGAelement (auxin), P-box (gibberellin), TATC-box (gibberellin), GARE-motif (gibberellin), CGTAC-motif (MeJA), TGACG-motif (MeJA), TCA-element (SA) and ABRE (ABA), and 4 kinds of elements responding to stresses: TC-rich repeats (defense and stress responsiveness), MBS(drought), WUN-motif (wound) and LTR (cold stress), were detected in the predicted promoters of GhRPD3 genes. As shown in Fig. 5, the promoters of some GhRPD3 genes contained various hormone-responsive and stress-responsive components, such as GhHDA2 (2 MBS, 2 LTR, 2 TC-rich repeats, 1 GARE-motif, 2 ABRE, 1 TGACG-motif) and GhHDA13 (1 MBS, 1 LTR, 1 TC-rich repeats, 1 AuxRR-core, 2 GARE-motif, 1 TCA-element, 4 ABRE, 2 TGACG-motif). Among all the 18 GhRPD3 genes, there are large numbers of light-responsive elements distributing in their promoter regions (Additional le 7: Table S6), in addition, MeJA-responsive and ABAresponsive elements are more common than other hormone-related elements (Additional le 8: Figure  S2). The results revealed that GhRPD3 genes might be involved in MeJA and ABA hormone signaling pathways as well as response to environmental stresses.

Expression pro les of GhRPD3 genes in different tissues and under different abiotic stresses
To understand the potential functions of GhPRD3 genes in the growth and development of cotton, we studied their expression in various cotton tissues containing anther, pistil, bract, sepal, petal, lament, torus, root, leaf, stem, ovules and bers using publicly available transcriptomic data provided by Hu et al. [23]. Transcripts of all the GhRPD3 genes were detected in at least three tissues with fragments per kilobase million (FPKM) ≥ 1. Furthermore, ten genes exhibited high expression in all the selected tissues (Additional le 9: Table S7). The results indicated that GhRPD3 genes were widely expressed in both reproductive organs and vegetative organs and might have multiple biological functions. After log2conversion of FPKM values, the expression pro les of GhRPD3 genes in different tissues were shown in Fig. 6a. Seven GhRPD3 genes exhibited relative high expression in at least eight tissues (log2-transformed FPKM value ≥ 2.6), especially a pair of homologous genes (GhHDA1/GhHDA10), performing higher expression level at all the tissues and showing the same expression pattern. Nevertheless, three GhRPD3 genes (GhHDA4, GhHDA14, GhHDA18) revealed relative low expression in at least eight tissues (log2-transformed FPKM value < 1), of which GhHDA14 showed the lower expression at all tissues except for pistil. These homologous gene pairs (GhHDA1/GhHDA10, GhHDA4/GhHDA11, GhHDA2/GhHDA13, GhHDA6/GhHDA15, GhHDA7/GhHDA16 and GhHDA9/GhHDA18) were located on At and Dt subgenomes, and exhibited similar expression patterns. For example, homologous gene pairs (GhHDA4/GhHDA11 and GhHDA9/GhHDA18) showed relative low expression at all these twelve tissues. Gene pair (GhHDA2/GhHDA13) exhibited relative high expression in torus and ovule, while showed relatively low expression in petal (Fig. 6a).
Based on the analysis of cis-elements in promoter regions and previous reports on RPD3 genes in other plants, GhRPD3 gens might respond to abiotic stresses. To verify this hypothesis, we investigated the expression characteristics of 18 GhRPD3 genes under cold, heat, PEG and salt treatments using available transcriptomic data [23] (Fig. 6b). The results indicated that most GhRPD3 genes could be induced by all four stresses to varying degrees, especially these six genes (GhHDA1, GhHDA2, GhHDA6, GhHDA10, GhHDA12 and GhHDA18), signi cantly up-regulated at every stage of the treatment. However, only one gene (GhHDA4) exhibited markedly down-regulated expression under four abiotic stresses. Some genes were induced by a particular treatment, for example, GhHDA13 and GhHDA16 was signi cantly induced by PEG treatment because of containing drought-responsive elements in their predicted promotors. Four genes (GhHDA7, GhHDA17, GhHDA11, GhHDA5) showed up-regulated expression under heat treatment. The expression of GhHDA9 was signi cantly up-regulated under cold and salt treatments. According to the results, we can make a conclusion that GhRPD3 genes play an essential role in response to abiotic stresses.

Characterization of GhRPD3 genes expression during ower bud differentiation period
To explore the expression difference of GhRPD3 genes between the early-maturity and late-maturity cottons in ower bud differentiation period, we selected nine genes showing relatively high expression in tissues (anther, pistil, bract, sepal, petal, lament and torus) that made up the oral organs for qRt-PCR. The buds of early-maturity variety (CCRI50) and late-maturity variety (GX11) from one-leaf stage to veleaf stage were used for qRt-PCR respectively (Fig. 7). The results revealed that more than half of these genes (5/9) possessed relatively higher expression in early-maturity cotton than that in late-maturity cotton during ower bud differentiation period. GhHDA5 showed marked difference at two-leaf stage and three-leaf stage and these two stages were regarded as the important period of ower bud differentiation. A homologous gene pair (GhHDA6/GhHDA15) located on At and Dt respectively showed same expression trend that both of them presented the highest expression at three-leaf stage and then exhibited downregulated expression at next two stages in CCRI50. In addition, all the nine genes performed relatively high expression at two-leaf stage or three-leaf stage in CCRI50 compared with GX11. The results showed that GhRPD3 genes were associated with early maturity of cotton.
Responses of GhRPD3 genes to MeJA and ABA treatment MeJA and ABA play important roles in plant stress resistance. To further explore the possible functions of GhRPD3 genes, we selected the GhRPD3 genes containing MeJA and ABA responsiveness elements in the predicted promotors to analyze the expression characteristics of those genes under MeJA and ABA treatment by qRT-PCR ( Fig. 8 and Fig. 9). Most GhRPD3 genes (8/13) were markedly up-regulated at 9 h after MeJA treatment. Three genes (GhHDA7, GhHDA13 and GhHDA18) exhibited signi cantly upregulated expression at least three time points, while four genes (GhHDA2, GhHDA8, GhHDA9 and GhHDA11) showed marked transcriptional down-regulation at least three time points after MeJA treatment (Fig. 8). More than half of GhRPD3 genes (6/11) were signi cantly up-regulated at 9 h after ABA treatment. Three GhRPD3 genes (GhHDA14, GhHDA15 and GhHDA18) performed relatively high expression at least three time points, while three GhRPD3 genes (GhHDA10, GhHDA11, GhHDA17) showed early down-regulated and then up-regulated expression patterns under ABA treatment (Fig. 9).
The results showed that exogenous application of MeJA and ABA signi cantly induced the transcriptional levels of most GhRPD3 genes containing MeJA-responsiveness and ABRE elements in their promoter regions.

Discussion
Among the several histone modi cations, histone acetylation plays an essential role in plant growth and development [24]. Histone acetylation contains acetylation and deacetylation, which were catalyzed by histone acetyltransferases (HATs) and histone deacetylases (HDACs), respectively [20]. In plants, HDACs are involved in a variety of biological processes associated with plant growth and development [25]. As the superfamily of HDACs, RPD3 gene family was most studied and crucial in plant development and physiological processes, including owering time, abiotic stress response, female gametophyte and embryo development, senescence, seed germination, plant hormone signals response and so on [12-14, 19, 26-28]. To date, although there were a few studies on function analysis of the RPD3 family members in G. hirsutum, it mainly focused on roles GhHDA5 played in ber initiation, lack of systematic report [29]. In order to explore characteristics of RPD3 family members and understand the roles played by cotton RPD3 genes in cotton growth and development, we conducted an integrated analysis of RPD3 gene family in cotton, containing their phylogenetic relationships, exon-intron structures, conserved motifs, chromosomal distributions, duplication events, expression patterns in different tissues as well as under abiotic stresses and response to MeJA and ABA treatment.
Phylogeny, gene structure and expansion of RPD3 genes in cotton A total of 18, 18, 9 and 9 RPD3 genes were determined by genome-wide identi cation in G. hirsutum, G.
barbadense, G. aboreum and G. raimondii, respectively. The number of RPD3 cotton genes in diploid cotton was half of that in tetraploid cotton, suggesting that the deletion of RPD3 genes did not happen in allotetraploid, which was not agreement with the higher ratio of gene deletion in allotetraploids [30].
According to the AtRPD3 genes, 108 RPD3 genes from nine species were classi ed into four groups (Fig. 1), similar to the previous classi cation of Arabidopsis [7].
The conservation of biological functions might be related to the conservation of gene structure [31]. In order to investigate the conservation of the RPD3 gene sequences, exon-intron structure and conserved motifs were analyzed. Exon-intron structure analysis showed that the exon numbers of RPD3 genes in cotton was highly diverse, ranging from 3 to 17 (Additional le 4: Table S3), which might be associated with diversi cation of their functions. Notably, the gene structure and motif arrangement were different among four subfamilies, while they were highly conserved on the same branch, indicating that the RPD3 genes (especially the members on the same branch) might preform conserved functions in the growth of cotton. The cotton RPD3 genes exhibited similarities and differences in exon-intron structures and motifs, which might be associated with conservation and subfunctionalization caused by gene duplication during the evolution of cotton RPD3 gene family.
According to the evolutionary history of cotton, tetraploid cotton was formed by the hybridization of two diploid cottons and the following polyploidization [31]. To investigate the evolutionary relationships of predicted RPD3 genes between two diploid genomes and sub-genomes in allotetraploid, we analyzed their chromosomal distributions and gene duplication events. The results showed that the chromosomal distribution of RPD3 genes in G. arboreum and the corresponding A sub-genome of G. hirsutum was not identical while the chromosomal location of RPD3 genes in G. raimondii and the corresponding D subgenome of G. hirsutum was highly consistent (Fig. 3), illustrating the high conservation during the evolution of RPD3 gene family. The analysis of gene duplication showed that segmental duplication was essential to expansion of RPD3 family members (Additional le 6: Table S5). According to previous genomic studies of cotton, A and D genome diploid cottons began to differentiate from a common ancestor 5-10 MYA [23]. Subsequently, G. hirsutum evolved from the hybridization of two diploid cottons at approximately 1-2 MYA [22]. In G. hirsutum, the deduced divergence times of most RPD3 homologous gene pairs varied from 5.66 to 11.06 MYA (Additional le 6: Table S5), accompanying with the divergence of A and D ancestral genomes. Ka/Ks ratios analysis indicated that the segmentally duplicated gene pairs might preform similar functions on account of puri ed selection in functional segregation [32].

Functions analysis of GhRPD3 genes in upland cotton
Flowering time, an important indicator of early-maturity cotton, was also in uenced by the time of ower bud differentiation, which is a physiological and morphological marker of the transformation from vegetative growth to reproductive growth [33,34]. The previous studies showed that ower bud differentiation period of early-maturity cotton variety was earlier than that of late-maturity cotton variety, and early-maturity cotton generally begins ower bud differentiation when two true leaves are completely attened [34]. In Arabidopsis, at least 4 RPD3 genes have been reported to be associated with owering time, AtHDA6 has been identi ed to be associated with the autonomous pathway of four oweringpromoting pathways and regulated owering time by interacting with FLD (Flowering LOCUS D ) [13,26]; HDA5 regulated owering time by repressing the expression of FLC (FLOWERING LOCUS C) and MAF1. In addition, HDA5 and HDA6 might form a HDAC complex with FLD and FVE to control owering time in Arabidopsis [27]; In short days, AtHDA9 represses the owering promoting gene AGL19 (AGAMOUS-LIKE 19) regulated by photoperiod [35] and the down-regulation of AtHDA19 caused delayed owering, ower abnormalities, embryonic defects and seed set reduction [36]. In recent studies, GhHDA5, similar to AtHDA5 in Arabidopsis, exhibited higher expression at -1 and 0 DPA and the RNAi lines for that performed delayed owering, suggesting that it is a potential candidate gene related to cotton ber initiation and owering time [29]. As the homologous genes of these four AtRPD3 genes related to owering time in cotton, gene pair GhHDA2/GhHDA13, homologous to AtHDA6, and GhHDA6/GhHDA15, homologous to AtHDA9, performed marked higher expression in early-maturity cotton variety than that in late-maturity variety during all ve stages of ower bud differentiation. Beyond that, the other ve GhRPD3 genes we selected possessed relative higher expression at two-leaf stage or three-leaf stage between two cotton varieties (Fig. 7), indicating that GhRPD3 genes are helpful for improving the molecular breeding of early maturity cotton.
Cis-elements in promoter regions play important roles in gene expression regulation. In general, gene expression depends on the presence or absence of these elements [37]. In the previous studies, multiple evidences have revealed that RPD3 members played essential roles in response to various stresses or plant hormone regulation pathway in Arabidopsis, rice, maize, Populus trichocarpa and so on [13,17,38]. To further understand the regulation of GhRPD3 genes under different environment conditions, we investigated the cis-elements in their promoter regions. Nine kinds of plant hormone-related elements and 4 kinds of stress-responsive regulatory elements were identi ed in the presumed promoters of GhRPD3 genes (Fig. 5). Such a wide range of cis-acting elements was consistent with the published studies on the multifunctional roles RPD3 genes played in plants growth [7,8]. Based on the expression patterns of GhRPD3 genes under four abiotic stresses, we could see that almost all of 18 genes were signi cantly induced by all four stresses or a speci c treatment (Fig. 6b), fully illustrating the consequence that GhRPD3 genes could respond to abiotic stresses.
MeJA and ABA not only regulates plant growth and development, but also participates in plant defense response to environmental stress such as mechanical injury, disease and osmotic stress [39,40].
According to the analysis of cis-elements in predicted GhRPD3 promotors, we selected the GhRPD3 genes containing MeJA and ABA responsiveness elements in the predicted promoters to analyze the expression characteristics of those genes under MeJA and ABA treatment by qRT-PCR. The results revealed that exogenous application of MeJA and ABA signi cantly induced the transcription of most GhRPD3 genes at different points. The expression of GhHDA13, in particular, was markedly up-regulated after treatment with MeJA and ABA, otherwise, it could be signi cantly induced under cold and PEG treatment, similar to AtHDA6, which was related to ABA and JA signal pathway and stress response [13,38,41]. As the homologous gene of AtHDA19, which was involved in JA and ABA signaling response [17,38], GhHDA4 and GhHDA11 were signi cantly induced down-regulated expression by exogenous application of MeJA (Fig. 8). GhHDA1, GhHDA10 and GhHDA11 performed early down-regulated and then up-regulated expression trend under ABA treatment (Fig. 9).
On the basis of our expression pro les analysis of RPD3 genes in upland cotton, GhHDA1, GhHDA2, GhHDA6, GhHDA10 and GhHDA13 showed relative higher expression in most of investigated tissues and in early-maturity cotton variety during ower bud differentiation period, also played essential roles in response to MeJA, ABA and abiotic stresses, constant with the previous extensive evidence showing that plant histone deacetylases played vital roles in a round of plant developmental processes and responses to various environmental stresses [7,17,26,27,41,42]. Besides, other GhRPD3 genes also performed a speci c function in cotton development, laying the foundation for further functional veri cation of RPD3 genes in upland cotton.

Conclusions
In this study, a total of 108 RPD3 genes were detected in nine species by genome-wide identi cation.
These genes were divided into four subgroups according to the classi cation of Arabidopsis. The exonintron structure and conserved motifs analysis of 54 cotton RPD3 genes showed that the signi cant difference existed among the four subfamilies, while they were highly conserved on the same branch, indicating the cotton RPD3 genes on the same branch might perform the similar functions in cotton growth and development. The chromosomal distributions of cotton RPD3 genes exhibited the conserved gene numbers and chromosomal location between diploid and tetraploid cotton species. Characteristics of gene expression showed that most GhRPD3 genes performed relative high expression in oral organs and possessed the higher expression in CCRI50 compared with GX11 during ower bud differentiation period. In addition, the expression of GhRPD3 genes could be signi cantly induced by one or more abiotic stresses as well as exogenous application of MeJA and ABA. The results revealed that GhRPD3 genes might be involved in ower bud differentiation and resistance to abiotic stresses of cotton, which provides a basis for further functional veri cation of GhRPD3 genes in cotton development and a foundation for breeding better early-maturity cotton cultivars in the future.

Plant materials and treatments
Two G. hirsutum cultivars (CCRI50, GX11) were grown under standard led environments (5 rows, each 8 m long and 0.8 m width) in Anyang, Henan province, China. CCRI 50 is an early-maturity cotton variety with initial owering time in 60 days and GX11 is a late-maturity cotton cultivar with initial owering time in 70 days. The buds of these two cotton cultivars were taken from one-leaf stage to ve-leaf stage.
TM-1 was planted in a climate-controlled green house with suitable growing environment (light/dark cycle: 16 h at 28 °C/8 h at 22 °C). Four-week seedings appearing two at true leaves were sprayed with 100 mM MeJA,200 mM ABA and water as a blank control to explore the responses to MeJA and ABA treatments. After exogenous application of plant hormones and water, we took the leaves of three seedlings in every treatment at 3 h, 6 h, 9 h and 12 h respectively and promptly freeze these samples in liquid nitrogen.
Identi cation and sequence retrieval of RPD3 family members The HMM le (PF00850) of the conserved Hist_deacetyl domain was downloaded from Pfam (https://pfam.xfam.org/) [43]. Putative PRD3 proteins of G. arboreum (CRI_1.0) [44], G. raimondii (https://phytozome.jgi.doe.gov/pz/portal.html) [50] using the Hist_deacetyl HMM le. After obtaining the ID number of the possible genes in these nine genome databases, the protein sequences of RPD3 genes of different species were extracted from the formatted protein databases using blast program (ncbi-blast-2.6.0+-x64-win64.tar). The SMART database (http://smart.embl-heidelberg.de/) was employed to verify each predicted RPD3 protein with a Hist_deacetyl domain and the proteins that did not contain the conserved domain were removed [51]. These RPD3 genes were named following a rule that short for species names.

Multiple alignment and phylogenetic analysis of RPD3 proteins
Multiple alignment of all the presumed RPD3 protein sequences from the nine plant species including G. raimondii, G. arboretum, G. hirsutum, G. barbadense, A. thaliana, T. cacao, P. trichocarpa, Oryza sativa and Zea mays was carried out using the ClustalW program [53]. The neighbor-joining (NJ) method was employed to construct an unrooted phylogenetic tree using MEGA 7.0 program with 1000 bootstrap repetitions [54].
Chromosomal distribution, gene structure and conserved motif analysis The positions of GhRPD3, GbRPD3, GaRPD3 and GrRPD3 genes on chromosomes were identi ed according to the GFF les obtained from the CottonFGD website [47]. The coding sequences of RPD3 genes in four cotton species were aligned with their genomic DNA sequences to analyze the exon-intron structure, which were visualized using the online toolkit GSDS 2.0 (http://gsds.cbi.pku.edu.cn/) [55]. The online program MEME 5.0.5 (http://meme-suite.org/tools/meme) was employed to detect the conserved motifs of cotton RPD3 proteins with the following optimized parameters: maximum number of motifs, 10; the optimum width of each motif, 6-50 aa and setting E value at 1e-5 [56, 57].
Gene duplication events and selection pressure This study used BLASTp search (E-value < 1e-10) and MCScanX program in TBtools to perform genome collinearity analysis and detect orthologous and paralogous gene pairs [58,59]. The circular maps of identi ed gene pairs were visualized using the circos program [60]. The adjacent RPD3 family members on a single chromosome were considered to be tandem duplicated genes [61]. The ratios of nonsynonymous (Ka) substitutions and synonymous (Ks) substitutions of homologous gene pairs were calculated using TBtools with NG methods to evaluate the selection pressure of these gene pairs [59,62]. Normally, Ka/Ks < 1 demonstrated purifying selection; Ka/Ks = 1 demonstrated neutral selection; and Ka/Ks > 1 demonstrated positive selection. Then, the divergence times of the homologous gene pairs were estimated using the formula "t = ks/2r", with r(2.6 × 10 − 9) representing neutral substitution [63].

Analysis of cis-elements in GhRPD3 promoter regions
The promoter regions of all GhRPD3 genes containing 2000 bp DNA sequences of upstream of the initiation codon (ATG) were extracted from the G. hirsutum genome database downloaded from CottonFGD (https://cottonfgd.org/), employed to the website PlantCARE (http://bioinformatics.psb.ugent.be/webtools/plantcare/html/) [64] to determine the cis-elements of the GhRPD3 genes and TBtool was used for visualization [59]. Depending on the operating instructions, total RNA of these samples was extracted using the Tiangen RNAprep Pure Plant kit (Tiangen, China) and then 1ug of total RNA was reverse transcribed to synthesize rst-strand cDNA using the PrimeScript RT Reagent kit (Takara, Japan), which was diluted ve times for the further experiments. The cotton histone-3 gene (AF024716) was used as an internal control [69] and speci c primers of GhRPD3 genes for qRT-PCR were designed using Oligo 6.0 software, shown in Additional le 10: Table S8. The qRT-PCR were conducted on an ABI 7500 real-time PCR system (AppliedBiosystems, USA) using UltraSYBR Mixture (Low ROX) (Cwbio, China) with three technical repetitions and three biological replicates. The following was the detailed run method: step 1: primal denaturation of 10 min at 95 °C, step 2: 40 cycles of 10 s at 95 °C, 30 s at 60 °C, 32 s at 72 °C; step 3: paper; AMW and PBH helped the data analysis; SSC and QZ revised the manuscript; All authors read and approved the nal manuscript. Figure 1 Neighbor-joining phylogenetic tree of RPD3 gene family. The 108 predicted RPD3 proteins from G. hirsutum, G. arboreum, G. barbadense, G. raimondii, A. thaliana, P. trichocarpa, T. cacao, Oryza satica and Zea mays were aligned using ClustalW program and the neighbor-joining (NJ) method was employed to construct this unrooted phylogenetic tree using MEGA 7.0 program with 1000 bootstrap repetitions. Four subfamilies were represented by different colored lines.     Cis-elements of GhRPD3 genes in promoter regions. The numbers of different cis-elements were presented in the form of bar graphs and similar cis-elements were exhibited with same colors.     Supplementary Files This is a list of supplementary les associated with this preprint. Click to download.