Characterization of cotton ARF factors and the role of GhARF2b in fiber development

Background Cotton fiber is a model system for studying plant cell development. At present, the functions of many transcription factors in cotton fiber development have been elucidated, however, the roles of auxin response factor (ARF) genes in cotton fiber development need be further explored. Results Here, we identify auxin response factor (ARF) genes in three cotton species: the tetraploid upland cotton G. hirsutum, which has 73 ARF genes, and its putative extent parental diploids G. arboreum and G. raimondii, which have 36 and 35 ARFs, respectively. Ka and Ks analyses revealed that in G. hirsutum ARF genes have undergone asymmetric evolution in the two subgenomes. The cotton ARFs can be classified into four phylogenetic clades and are actively expressed in young tissues. We demonstrate that GhARF2b, a homolog of the Arabidopsis AtARF2, was preferentially expressed in developing ovules and fibers. Overexpression of GhARF2b by a fiber specific promoter inhibited fiber cell elongation but promoted initiation and, conversely, its downregulation by RNAi resulted in fewer but longer fiber. We show that GhARF2b directly interacts with GhHOX3 and represses the transcriptional activity of GhHOX3 on target genes. Conclusion Our results uncover an important role of the ARF factor in modulating cotton fiber development at the early stage. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-021-07504-6.


Background
Cotton is the most important natural and renewable material for the textile industry in the world [1]. The primary cultivated species upland cotton (G. hirsutum L.) is grown in over 80 countries and accounts for more than 90% of global cotton fiber output. Cotton fibers are unusually long, single-celled epidermal seed trichomes and a model for plant cell growth research [2]. Fiber development can be divided into four overlapping stages: initiation, elongation, secondary cell wall biosynthesis and maturation [3]. The fiber length and density are both key traits that determine cotton quality and yield.
The study of cotton fiber development regulation provides not only valuable knowledge to understanding plant cell growth and cell wall biosynthesis, but also candidate genes for cotton molecular breeding [4]. To date a number of genes that function in cotton fiber cells have been identified, including homeodomain transcription factor GaHOX1, GhHOX3 and GhHD1 [5][6][7], bHLH transcription factor GhPRE1 [8], KNOX transcription factor knl1 [9], the sterol carrier gene [10], MYB transcription factors GhMYB25, GhMYB25-like, GhMML3 and GhMML4 [11][12][13][14], NAC transcription factor fsn1 [15], transcription factor WLIM1a gene [16], sucrose synthase gene [17], cotton actin1 gene [18], cotton BURP domain protein GhRDL1 [19], ethylene pathway related genes [20], fasciclin-like arabinogalactan protein, Ghfla1 [21], and TCP transcription factor GhTCP4 [22] etc. Among recent progresses are the characterizations of transcription factors which regulate the major events of cotton fiber development, such as MYBs and HD-ZIP IVs involved in cotton fiber initiation and elongation, as well as a number of other types of factors. The MIXTA type MYB transcription factors (GhMYB25, GhMYB25-like and GhMML4_D12) are master regulators of cotton fiber initiation [11,13,14] and lint fiber development [12], whereas the HD-ZIP IV transcription factor GhHOX3 plays a pivotal role in controlling fiber elongation [5], whose activity is regulated by the phytohormone gibberellin. In addition, NAC (GhFSN1) and TCP4 transcription factors positively regulates secondary cell wall biosynthesis [15,22]. However, cotton fiber growth and development are complex processes involving cell differentiation, cell skeleton orientation growth, cell wall synthesis, and so on [23]. Currently the picture of the regulation network of cotton fiber is far from complete.
Auxin response factors (ARFs), a group of plant transcription factors, are composed of a conserved Nterminal DNA binding domain (DBD), a most case conserved C-terminal dimerization domain (CTD) and a non-conserved middle region (MR) [24]. The MR region has been proposed to function as a repression or an activation domain [25]. Arabidopsis thaliana contains 23 ARF genes and Oryza sativa has 25 [26,27]. It has been reported that ARF2 negatively modulates plant growth in A. thaliana [26,[28][29][30] and tomato [31], yet functions of transcription factors can vary with tissues and more diversified in polyploid species, to date the role ARF2 in cotton fiber cells has not been explored.
In this study, we conducted a genome-wide analysis ARF genes in three cotton species (G. hirsutum, G. arboreum and G. raimondii), and classified them into four clades. In G. hirsutum most ARF genes were expressed in multiple cotton tissues, among which GhARF2b exhibited a preferential expression in developing cotton fiber cells, and it negatively affects cotton fiber elongation but plays a role in promoting fiber initiation.

ARF transcription factors in G. arboreum and G. hirsutum
The genome sequences of G. raimondii and G. arboreum provide us data resources to conduct a genome-wide screen of the ARF genes in the extent diploid progenitors of the allotetraploid G. hirsutum. In the previous studies, Sun et al., (2015) identified 35 ARF genes in G. raimondii [32]. To mine more ARF transcription factors in cottons the conserved domain (Pfam ID: PF06507) was used to hmmersearch against the G. arboreum and G. hirsutum genome databases, which resulted in 36 and 73 genes in G. arboreum and G. hirsutum genomes, respectively. The 36 G. arboreum ARF genes were designated GaARF1-GaARF20, and the 73 G. hirsutum ARF genes in A-and D-subgenomes were designated as GhARF1A/D-GhARF21A/D ( Table 1). As those of Arabidopsis, cotton ARF proteins are composed of three domain regions, including DBD (DNA-binding Domain), MI (Middle Region) and CTD (C-terminal Domain) (Additional file 1: Figure S1).

Phylogenetic analysis of Gossypium ARF proteins
To illustrate the evolutionary relationships among the cotton ARFs, a phylogenetic tree was constructed using the protein sequences of 144 cotton ARFs, which were clustered into four clades (I-IV). The highest number of Gossypium ARFs are found in clade III and I, followed by clade IV and II (Fig. 1).
Overall, the expected diploid-polyploid topology is reflected in the tree for each set of orthologous/homoeologous genes, indicating general preservation during divergence of diploids and through the polyploid formation. We found that the number of ARF genes in G. hirsutum are approximately twice that in G. raimondii and G. arboreum, with one A t or D t homoeologous copy corresponding to one ortholog in each of the diploid cottons. Further, as shown in Fig. 1, the orthologous paired genes of the A genome (G. arboreum) and A t sub-genome, or from the D genome (G. raimondii) and D t sub-genome, tend to be clustered together and share a sister relationship.

Divergence of ARF genes in allotetraploid G. hirsutum and its diploid progenitors
The ARF genes in the two diploid species were then compared with G. hirsutum A t -and D t -subgenome homoeologs (Table 1). To explore the evolutionary relationship and possible functional divergence of ARF genes between the allotetraploid cotton and its extend diploid progenitors, the nonsynonymous substitution (Ka) and synonymous substitution values (Ks) and the Ka/Ks ratios for each pair of the genes were calculated (Table 1). By comparing the Ka and Ks values of 66 orthologous gene sets between the allotetraploid and its diploid progenitor genomes, we found that the Ka and Ks values are higher in the D t subgenome than in the A t subgenome (Fig. 2). These results indicate that GhARF genes in the D t subgenome tend to have experienced faster sequence divergence than their A t counterparts, suggesting an inconsistent evolution of ARF genes in the two subgenomes (Fig. 2).
In addition, the Ka/Ks ratios of one D t -subgenome genes (GhARF3b_D) and five A t -subgenome gene (GhARF2e_A, GhARF3c_A, GhARF4b_A, GhARF16b_A and GhARF17b_ A) are greater than 1 ( Table 1), suggesting that these genes have under positive selections after divergence of G. hirsutum from diploid ancestors, and may have gained new functions.

Expression analysis of GhARF genes in different cotton tissues
The expression profile of a gene family can provide valuable clues to possible functions of each genes. Analysis of 73 GhARF genes showed that most genes have different spatial expression patterns. For instance, GhARF1, GhARF2a, GhARF2b and GhARF2c were expressed in all the tissues of cotton examined (Additional file 2: Figure  S2), whereas GhARF3a and GhARF3c were expressed preferentially in the pistils and ovules. Compared to GhARF5b, GhARF5a showed higher expressions in the root, pistil and ovule organs. Transcripts of GhARF3c and GhARF4a, GhARF9a and GhARF9b were most abundant in stem and root, respectively. Over half of GhARF genes showed a relatively high level of transcript accumulation in leaf. Notably, there are more than 10 genes (including GhARF1, GhARF2a, GhARF2b, GhARF8a, GhARF9a, GhARF10b, GhARF11, GhARF16a, GhARF18 and GhARF19) that were highly expressed in cotton fiber cells at the fast elongation stage (5 dpa). Among them, GhARF2 genes showed the highest expression in fiber (5 dpa) and were located in the Clade I of phylogenetic tree (Fig. 1), suggesting that they may function in cotton fiber development. Previous studies have demonstrated that ARF2 plays a role in transcriptional regulation in auxin-mediated cell division [30], leaf longevity [33], response to stress [34], regulation of fruit ripening [31] and so on. As GhARF2s shown pleiotropic effects on plant development [35], we decided to identify the major GhARF2s in regulation of cotton fiber elongation in subsequent experiments.
GhARF2 had a high expression pattern during fiber elongation process There are nine ARF2 genes in G. hirsutum (GhARF2c_ At not annotated), we first examined their expression profiles in different tissues in cotton (Fig. 3). Based on the RNA-seq data , GhARF2a, GhARF2b and GhARF2c genes had higher expression levels in various tissues than GhARF2d or GhARF2e (Fig.  3a). Among them, in 5 dpa fiber, the expressions of GhARF2b were 1.1-37 folds to other four GhARF2 genes. Whereas in ovule (0dpa), GhARF2b showed 1.2-15 folds higher expressions than others. Thus, the transcripts of GhARF2b homoeologs (GhARF2b_At and GhARF2b_Dt) were enriched and abundant in cotton fiber and ovule cells (Fig. 3a). Subsequent quantitative RT-PCR (qRT-PCR) confirmed the expression pattern, and GhARF2b showed 3.6-9 folds higher expressions in fiber (3dpa) or ovule (0dpa) than other tissues (Fig. 3b).
The highly up-regulated expression in fiber cell suggested that GhARF2b has been recruited to act primarily in cotton fiber.

GhARF2b overexpression represses cotton fiber elongation
To test the function of GhARF2b, we constructed the vectors to over-express and down-regulate GhARF2b_Dt in G. hirsutum by using the fiber-specific GhRDL1 promoter [8,19,36]. The expression levels of GhARF2b in transgenic cotton were clearly elevated in the overexpression lines according to qRT-PCR analysis; for example, the GhARF2b transcript abundance was about two-fold higher in the OE-3 than in the wild-type cotton fiber cells (Fig. 4a). However, GhARF2b did not stimulate fiber cell elongation, rather, it resulted in shorter fiber (Fig. 4b, c). On the contrary, suppressing GhARF2b expression by RNAi resulted in longer fibers (Fig. 5a, b). The expression levels of GhARF2b in RNAi cottons in the RNAi lines were about 3~5-fold down-regulated in cotton fiber of 0DPA, 6DPA and 12DPA (Fig. 5c-e). Together, these data suggest that GhARF2b acted as a negative regulator of fiber cell elongation, at least when its expression exceeded the threshold. Alternatively, it may function in other aspects of cotton fiber development.

GhARF2b interacted with GhHOX3
The homeodomain-leucine zipper (HD-ZIP) transcription factor, GhHOX3, plays a determinant role in controlling cotton fiber elongation [5]. We used the yeast two-hybrid system (Y2H) to screen a cotton fiber cDNA library for GhHOX3 interacting proteins. GhARF2 was among the top five interacting factors of the target proteins. In further yeast two-hybrid assays, GhARF2b and GhARF2b middle region strongly interacted with GhHOX3 (Fig. 6a, b). We also used bimolecular fluorescence complementation (BiFC) assays to confirm the interaction between GhARF2b and GhHOX3 (Fig. 6c).

The transcriptional activities of GhHOX3 target genes were repressed by GhARF2b protein interactions
Given the fact that GhARF2b represses cotton fiber elongation, we tested the two protein interactions would affect the transcriptional activation of GhHOX3 target genes. Two cell wall protein coding genes [19,36], GhRDL1 and GhEXPA1, are direct targets of GhHOX3 in promoting the fiber elongation [5]. We used a dualluciferase assay system to study the effect of GhARF2b on activity of GhHOX3 protein (Fig. 7a). The level of the luciferase activity driven by GhRDL1 and GhEXPA1 promoters was significantly increased when GhHOX3 was expressed (Fig. 7b, c). In contrast, activation of GhHOX3 to GhRDL1 or GhEXPA1 promoters was significantly repressed by GhARF2b (Fig. 7b, c). These results further supported that interaction of GhARF2b with GhHOX3 results in a much lower activity of targets gene activation, thus cotton fiber elongation was disturbed.

GhARF2b overexpression enhances cotton fiber initiation
Next, we examined the effects of GhARF2b up-regulation on cotton fiber initiation. The over-expression line OE-3 and RNAi line ds-2 were selected for analyses. The SEM with 60 × magnification of ovules of WT-R15, OE-3 and ds-2 collected at − 1, 0, 1 DPA were observed (Fig. 8). The cotton fiber initiation of the − 1-DPA ovules did not present differences among the three types of cottons, however, the 0-and 1-DPA ovules of OE-3 and ds-2 lines showed higher and lower densities of fiber initials compared to the wild-type control (Fig. 8). Further, we magnified the SEM views of ovules to 500-700× (Fig. 9). Obviously, at the fiber initiation stage (0, 1 DPA), the fiber initial density of the OE-3 was increased by about 1.5-fold compared with that of the wild-type, in contrast, the fiber initial density of the ds-2 line was reduced (Fig. 9a-c). These results support a role of GhARF2b in promoting cotton fiber cell initiation.
After whole genome duplication, the amplified genes generally undergo the events of functional loss, or neofunctionalization or subfunctionalization [50]. In this study, we found that six GhARF genes (five from A t subgenome) have experienced relatively faster positive selection compared to its diploid progenitors. Thus, duplicated genes from A t and D t subgenomes might be functionally diverged in the allotetraploid cotton after the merge of the two genomes. In addition, the GhARF genes expression profiles analyzed from the RNA-seq data showed subgenome-biased expression that might undergone functional divergence during the evolution. For instance, unequal expressions were observed in A and D-subgenome genes, including GhARF3c, GhARF16c, GhARF18a and GhARF20. These massive alterations in gene expression can cause distinct function and may just be one of the important features emerging  [8,51]. During the evolution of allopolyploid, some duplicate gene pairs (homoeologs) are expressed unequally, as also proved in the allopolyploid cotton genome with the features of asymmetrical evolution [42]. The above results indicated that this suite of unequally expressed genes may be a fundamental feature of allopolyploids.
Previous studies showed that ARF family genes have been identified in many plant species, including 23 ARF genes in Arabidopsis thaliana [26], 25 in Oryza sativa [27], 39 in Populus trichocarpa [52], 31 in Zea mays [53], 15 in Cucumis sativus [54] and 35 in G. raimondii [32]. Auxin response factors (ARFs) are important in plant development as they play crucial roles in regulating a variety of signaling pathways [24,25]. According to their functions, ARF proteins are divided into two classes: transcriptional activators and transcriptional repressors [24]. Many studies have revealed their regulatory roles in regulating various aspects of cellular activities [35,[55][56][57]. As transcriptional repressors, ARF2 was involved in the regulation of K + uptake by repressing HAK5 transcription in Arabidopsis [34]. In addition, ARF2 is regulated by a variety of upstream factors at the transcription and protein levels, and participated in the pathways of auxin, gibberellin, oleoresin, ethylene and abscisic acid [28,29,31,58]. In cotton, Zhang et al. uncovered that expression of the IAA biosynthetic gene, iaaM, can significantly increase IAA levels in the epidermis of cotton ovules at the fiber initiation stage, and increased the number of lint fibers and lint percentage in a 4-year field trial. They proved that the lint percentage of the transgenic cotton was increased in transgenic plants with a 15% increase in lint yield [59]. Han et al. found that the auxin response factor gene (GhARF3) was highly correlated with fibre quality by using the haplotype analysis and transcriptomic data. Above all, auxin signaling plays an essential role in regulating fibre development. In addition, Xiao et al. showed that G. hirsutum ARF genes promoted the trichome initiation in transgenic Arabidopsis plants [60]. They identified 56 GhARF genes in their study, including three GhARF2 genes [60]. They showed that GhARF2-1 could be exclusively expressed in trichomes, and overexpression of GhARF2-1 in Arabidopsis can enhance trichome initiation. But their study did not perform the cotton transformation to test the function of GhARF2-1 in cotton fiber cell.

Conclusions
In our study, we reported 73 GhARF genes in Gossypium hirsutum genome, including 9 GhARF2 genes. Among them, GhARF2b, was specifically higher expressed in developing fibers. Overexpression of GhARF2b represses fiber elongation, and RNAi silencing of GhARF2b promotes the fiber longer. Through yeast two-hybrid assays and the Dual-LUC experiment, GhARF2b plays a negative role in controlling cotton fiber elongation by interacting with GhHOX3. Further, GhARF2b was shown to promote the production of fiber initials, suggesting that auxin is an important player in controlling cotton fiber development. The auxin signaling pathways in developing cotton fiber cells deserve further investigation.

Identification of Gossypium species ARF factors
G. raimondii [37], G. arboreum [39], G. hirsutum [42] genome sequences were acquired from the CottonGen database [61]. We developed a Hidden Markov Model [62] profile matrix of ARF factors (Pfam ID: PF06507) via the hmmbuild program [63] with default parameters to identify Gossypium ARF transcription factor proteins. SMART conserved domain search tool [64] and Pfam databases [65] were used to identify the conserved domain.

Sequence alignment, Ka, Ks analyses and phylogenetic analyses
Gossypium ARF factor amino acids and nucleotide sequences were aligned by MAFFT software with the G-INS-i algorithm [66]. Ka, Ks and Ka/Ks values for each gene pairs between diploid and allotetraploid were calculated by DnaSP v5 [67]. The Neighbor-Joining (NJ) phylogenetic tree was drawn by MEGA 5.03 [68] by sampling 1000 bootstrap replicates based on the ARF whole protein sequences.

Gene expression analyses based on transcriptome
Raw RNA-Seq data were downloaded from the NCBI Sequence Read Archive (https://www.ncbi.nlm.nih.gov/ bioproject/PRJNA248163) [42], including G. hirsutum seed, root, stem, leaf, torus, petal, stamen, ovary, calyx, ovule (− 3 dpa, − 1 dpa, 0 dpa, 1 dpa, 3 dpa, 5 dpa, 10 dpa, 20 dpa, 25dpa, 35dpa) and fiber (5 dpa, 10 dpa, 20 dpa, 25dpa). The method of gene expression analyses based on transcriptome was same to our previous study [69]. Differentially expressed genes were determined based on the following criteria: more than two-fold change and p-value less than 0.05. Multiple Experiment Viewer (MeV) [70] was used to display the gene expression values. were collected for expression analyses. Fibers were collected by scraping the ovule in liquid nitrogen. All these tissues were frozen in liquid nitrogen immediately after sampling and stored at − 80°C until RNA extraction. Three times were repeated for all these treatments.

qRT-PCR analyses
All cotton samples were ground in liquid nitrogen and total RNAs of these cotton tissues were extracted using the RNAprep pure plant kit (TIANGEN, Shanghai, China) following the manufacturer's protocol. The method of qRT-PCR analyses was same to our previous study [69]. The forward and reverse primers of specific gene for quantitative real-time PCR (qRT-PCR) analyses, were designed using the Primer5 software (Additional file 3: Table S1). Analyses were performed with SYBR-Green PCR Mastermix (TaKaRa) on a cycler (Mastercycler RealPlex; Eppendorf Ltd., Shanghai, China). The internal gene was G. hirsutum histone-3 (GhHIS3, AF024716), and the 2-ΔΔCt method was used to calculate the relative amount of amplified product [71]. Relative expression levels among different organs of G. hirsutum samples were normalized by calibrating with the WT samples.

Cotton transformation and fiber length analysis
The open reading frame (ORF) of GhARF2b was PCRamplified from a G. hirsutum cv R15 fiber cDNA library with PrimeSTAR HS DNA polymerase (Takara Biomedical Technology Co. Ltd., Beijing, China) and inserted into the pCAMBIA2301 vector to construct RDL1::GhARF2b. For 35S::dsGhARF2b, sense and antisense: GhARF2b fragments, separated by a 120-bp intron of the RTM1 gene from A. thaliana, were cloned into pCAMBIA2301. Primers used in this investigation are listed in Additional file 3: Table S1. The binary constructs were transferred into Agrobacterium tumefaciens.
Cotton transformation was conducted as reported in Shangguan et al. [72]. Transgenic cotton plants were grown in glasshouse or field. β-glucuronidase (GUS) staining and PCR amplification were performed to identify the transgenic lines of T 0 and subsequent generations. Thirty seeds from each plant were harvested to statistics fiber length.
BiFC and dual-luciferase (dual-LUC) assays We performed the BiFC assays following previous reports [73,74]. In summary, CDSs of GhARF2b and GhHOX3 were amplified and cloned into JW771 and JW772 vectors, respectively. Each gene was fused to the carboxyl-terminal half (cLUC-GhARF2b/GhHOX3) and the amino-terminal half (GhARF2b/GhHOX3-nLUC) of luciferase (LUC), respectively. cLUC and nLUC were used as controls. Assays were finished as described [5,75]. The Dual-LUC assay was performed as reported [5,76]. Briefly, the promoters containing intact L1-boxes of GhRDL1 and GhEXPA1 were inserted into pGreen-LUC vector with a firefly LUC reporter gene. Then, the constructs were transferred into Agrobacterium tumefaciens cell with a co-suppression repressor plasmid pSoup-P19. Transient transformation was conducted by infiltrating the A. tumefaciens cells into N. benthamiana leaves. The total protein was extracted from the infected area after 3 days. The Dual-Luciferase Reporter Assay System (Promega) was used to detect the fluorescent values of LUC and REN with a luminometer (BG-1, GEM Biomedical Inc.). The value of LUC was normalized to that of REN. Three biological replicates were measured for each experiment.

Microscope observation
Images were generated with an optical microscope (BX51, Olympus). For scanning electron microscope images, cotton ovules (− 1, 0, 1DPA) were attached with colloidal graphite to a copper stub, frozen under vacuum and visualized with a scanning electron microscope (JSM-6360LV, JEOL).