Systematic analysis of the Capsicum ERF transcription factor family uncovers new insights into the regulation of species-specific metabolites

Background: ERF transcription factors (TFs) belong to the Apetala2/Ethylene Responsive Factor (AP2/ERF) TF family and play a vital role in plant growth and development. Capsorubin and capsaicinoids have relatively high economic and nutritional value and are found specifically in Capsicum. However, little is known about how ERFs participate in the regulatory networks of capsorubin and capsaicinoids biosynthesis. Results: In this study, a total of 142 ERFs were identified in the Capsicum annuum genome. Phylogenetic analysis divided the ERFs into DREB (dehydration responsive element binding protein) and ERF subfamilies, which were further classified into 11 groups with several subgroups. Expression analysis of biosynthetic pathway genes and CaERFs facilitated the identification of candidate genes related to the regulation of capsorubin and capsaicinoids biosynthesis; the candidates were concentrated in clusters C9 and C10 and in clusters L3 and L4, respectively. The expression patterns of CaERF82, CaERF97, CaERF66, CaERF107 and CaERF101, which were found in clusters C9 and C10, were consistent with the accumulation of carotenoids (β-carotene, zeaxanthin and capsorubin) in the pericarp. In clusters L3 and L4, the expression patterns of CaERF102, CaERF53, CaERF111 and CaERF92 were similar to the accumulation pattern of capsaicinoids. Furthermore, CaERF92, CaERF102 and CaERF111 were found to be potentially involved in temperature-mediated capsaicinoids biosynthesis. Conclusion: This study provides a useful foundation for the study of candidate ERFs in the regulation of carotenoids and capsaicinoids biosynthesis in peppers. The ERF family was characterized using the newly sequenced Capsicum annuum genome. Characteristic analysis was carried out to identify the involvement of specific ERF family members in carotenoids and capsaicinoids biosynthesis.
Overall, this study contributed to the understanding of the function of ERF family members in the carotenoids and capsaicinoids biosynthetic pathways in peppers. Capsaicinoids biosynthesis is affected by environmental factors. Therefore, the function of candidate ERF TFs associated with capsaicinoids biosynthesis was also analysed in response to different temperatures.
ACL:Acyl FatA:Acyl-ACP thioesterase; HMM:Hidden markov MG:mature 7:Breaker days.


Results
Identification and multiple sequence alignment of CaERF proteins in pepper
A total of 142 ERF genes were obtained from the Capsicum annuum genome after excluding redundant sequences, candidates containing an AP2 plus a B3 domain, and candidates containing more than two AP2 domains (Table S3). The 142 candidate genes were renamed consecutively according to their chromosomal positions (Table S3; Fig. S1). The identified ERF members encoded proteins of 44-672 residues. The molecular weight (Mw) of the CaERF proteins ranged from 7.19 kDa to 74.91 kDa, and the theoretical pI varied from 4.24 to 11.10. Most of these proteins were predicted to be unstable; only fifteen CaERF proteins were stable (instability index < 40) (Table S3).
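The exclusion rules above can be written down as a small filter. The following is a minimal sketch, not the pipeline used in this study: the `Candidate` record, its field names and the toy entries are hypothetical, and domain counts are assumed to come from an upstream domain search. The stability rule uses the threshold stated in the text (instability index < 40).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    gene_id: str      # hypothetical identifier, not a real gene ID
    sequence: str     # protein sequence, used only to detect redundancy
    ap2_domains: int  # number of predicted AP2 domains
    has_b3: bool      # True if a B3 domain was also predicted


def filter_erf_candidates(candidates):
    """Apply the exclusion rules stated in the text, in order."""
    seen_sequences = set()
    kept = []
    for cand in candidates:
        if cand.ap2_domains < 1:
            continue  # no AP2 domain: not a candidate at all
        if cand.has_b3:
            continue  # AP2 plus B3 domain: excluded
        if cand.ap2_domains > 2:
            continue  # more than two AP2 domains: excluded
        if cand.sequence in seen_sequences:
            continue  # redundant sequence: excluded
        seen_sequences.add(cand.sequence)
        kept.append(cand)
    return kept


def is_stable(instability_index):
    """A protein is treated as stable when its instability index is below 40."""
    return instability_index < 40


# Toy input (entirely made up) to show the behaviour of the filter.
toy = [
    Candidate("toy1", "MAAA", 1, False),
    Candidate("toy2", "MAAA", 1, False),  # duplicate of toy1
    Candidate("toy3", "MBBB", 1, True),   # AP2 + B3
    Candidate("toy4", "MCCC", 3, False),  # more than two AP2 domains
    Candidate("toy5", "MDDD", 1, False),
]
kept_ids = [c.gene_id for c in filter_erf_candidates(toy)]
```

In this toy run only `toy1` and `toy5` survive the filter.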
Before phylogenetic analysis, a multiple sequence alignment was performed using the amino acid sequences of the AP2 domains. The classification of all identified CaERFs is shown in Fig. 3, as described later. The alignment indicated that the DREB subfamily possesses a completely conserved WLG motif (Fig. 2A; Fig. S2), while more than 95% of the members of the ERF subfamily, except for those in groups X and XI, had a WLGT motif (Fig. 2B; Fig. S2). The DREB subfamily was completely conserved at V15 and E20, and more than 95% of the members of groups V to IX in the ERF subfamily contained A14 and D19 (Fig. 2AB; Fig. S2). The shaded residues shown for the 37 DREB subfamily members indicate complete conservation in the AP2 domain (Fig. 2A; Fig. S2). In contrast, the N-terminal regions of the AP2 domains in the ERF subfamily were highly homologous, while the C-terminal regions showed very low conservation (Fig. 2B; Fig. S2). Moreover, groups X and XI showed very low conservation at the 15th and 20th amino acids, which made these groups difficult to classify. Nevertheless, taking into account the topology of the tree in Fig. 3, groups X and XI were preliminarily classified as the 'ERF-like subfamily'.
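Statements such as "completely conserved at V15" or "more than 95% of members contained A14" correspond to per-column residue frequencies in the alignment. The sketch below illustrates that calculation under stated assumptions: the function name is ours, positions are 1-based alignment columns, gaps are written as "-", and the toy sequences are artificial strings, not real AP2 domains.

```python
from collections import Counter


def column_conservation(aligned_seqs, position):
    """Return (most common residue, its fraction) at a 1-based alignment column.

    Gap characters ("-") are ignored when computing the fraction.
    """
    residues = [seq[position - 1] for seq in aligned_seqs if seq[position - 1] != "-"]
    if not residues:
        return None, 0.0
    residue, count = Counter(residues).most_common(1)[0]
    return residue, count / len(residues)


# Artificial 20-column "alignment": V at column 15 and E at column 20 in every row.
toy_dreb_like = ["K" * 14 + "V" + "K" * 4 + "E" for _ in range(4)]

# Artificial rows where A at column 14 occurs in three of four sequences.
toy_erf_like = [
    "K" * 13 + "A" + "K" * 6,
    "K" * 13 + "A" + "K" * 6,
    "K" * 13 + "A" + "K" * 6,
    "K" * 13 + "S" + "K" * 6,
]

v15 = column_conservation(toy_dreb_like, 15)
e20 = column_conservation(toy_dreb_like, 20)
a14 = column_conservation(toy_erf_like, 14)
```

A column would count as "completely conserved" when the returned fraction equals 1.0, and as "more than 95%" when it exceeds 0.95.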

Phylogenetic analysis of the ERF family in four plant species
To clarify the phylogenetic relationships, an unrooted neighbour-joining phylogenetic tree was constructed from the alignment of all identified CaERF sequences with those of Arabidopsis. Based on the classification of AtERFs in the studies of Nakano and Sakuma [15,16], the putative CaERF proteins were divided into two large subfamilies corresponding to the DREB and ERF subfamilies (Fig. 3; Fig. S3). According to the cited study [15] and taking into account the topology of the tree, the two subfamilies were further divided into 11 groups, named group I to group XI (Table 1; Fig. S3). Notably, groups IX and X were subdivided into IXa, IXb, Xa and Xb because the members of groups IXb and Xb were found only in pepper. The members of group XI were also present only in pepper, whereas group V-Like (V-L) was absent in pepper (Table 1). These results indicated that IXb, Xb and XI might be pepper-specific groups. To determine whether these three groups were specific to pepper, all CaERF genes were used to construct a neighbour-joining phylogenetic tree with those from tomato (137), rice (138) and Arabidopsis (Fig. S4). The topology of this phylogeny was mostly similar to that of the tree obtained using only protein sequences from pepper and Arabidopsis (Fig. S3). The number of ERF proteins in each group is listed in Table 1. Groups IXb and Xb contained a significantly higher number of ERF TFs from pepper. In contrast, group XI also included members from rice and tomato, and no significant differences were observed in the other investigated species (Table 1). Therefore, groups IXb and Xb were designated as putative 'pepper-specific groups' (Fig. 3).
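The two-step reasoning above (a group looks pepper-specific against Arabidopsis alone, then is re-checked after adding tomato and rice) can be illustrated with a per-group tally. This is a minimal sketch with invented counts; the real numbers are in Table 1 and are not reproduced here. The function tests strict exclusivity, which is a stricter criterion than the "significantly higher number" used in the text for the four-species comparison.

```python
def groups_specific_to(counts, species):
    """Return the groups whose members all come from a single species.

    counts maps a group name to a dict of {species name: number of members}.
    """
    specific = []
    for group, per_species in counts.items():
        present = {name for name, n in per_species.items() if n > 0}
        if present == {species}:
            specific.append(group)
    return sorted(specific)


# Invented counts for illustration only (not the values in Table 1).
two_species = {
    "group_a": {"pepper": 5, "arabidopsis": 4},
    "group_b": {"pepper": 3, "arabidopsis": 0},
    "group_c": {"pepper": 2, "arabidopsis": 0},
}
four_species = {
    "group_a": {"pepper": 5, "arabidopsis": 4, "tomato": 6, "rice": 3},
    "group_b": {"pepper": 3, "arabidopsis": 0, "tomato": 0, "rice": 0},
    "group_c": {"pepper": 2, "arabidopsis": 0, "tomato": 1, "rice": 2},
}

before = groups_specific_to(two_species, "pepper")
after = groups_specific_to(four_species, "pepper")
```

In this toy case `group_c` appears pepper-specific with two species but drops out once tomato and rice are included, which mirrors what the text reports for group XI.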
To evaluate the biological functions of the CaERF proteins in each group, the functional characteristics of ERFs from Arabidopsis, tomato and pepper reported in the literature were investigated. As shown in Table S4, the members of the same group possessed similar biological functions, and group VIII members were found to be likely involved in alkaloid biosynthesis. Because of the importance of capsaicinoids and capsorubin in pepper, we investigated whether the Capsicum annuum genome (version 2.0) contains putative ERF homologs involved in secondary metabolism. A previous study demonstrated that Erf and Jerf in peppers were involved in the regulation of the pungency phenotype [31]. Erf and Jerf were mapped to CaERF53 and CaERF101, respectively, in the Capsicum annuum genome (version 2.0) (Table S5). Moreover, CaERF101 was identified as the putative orthologue of both CaPF1 and JERF1, and it was shown to be associated not only with the regulation of polyamine biosynthesis but also with ABA biosynthesis (Table S5). It was therefore likely that the members of group VII, which contained CaERF53 and CaERF101, were related to secondary metabolite biosynthesis.

Conserved motif analysis of CaERF
Conserved amino acid motifs represent functional areas maintained during the evolutionary process.
The conserved motifs within the 142 CaERF sequences were analysed and compared using MEME. A total of 15 significantly conserved motifs (E-value < 10^-32) of 11-41 residues were identified and named motif 1 to motif 15 (Table S6). Five conserved amino acid motifs, motif 1 to motif 5, were located in the AP2 domain region; they were present in the majority of CaERF proteins and were designated "general motifs" (Fig. 4). However, motif 2 and motif 5 were mainly shared within group VIII of the ERF subfamily (Fig. 4B). The remaining motifs (motif 6 to motif 15) were distributed outside the AP2 domain and were classified as "specific motifs". Motif 9 and motif 12 were primarily restricted to group IV of the DREB subfamily (Fig. 4A). Motif 10 and motif 11 were found specifically in group VIII. Motifs 6 and 13 were found in group X, and motif 14 was found in group V (Fig. 4B). Further, motif 15 was specifically present in group VII. Overall, members of the same group in the tree harboured similar motif patterns (Fig. 4).
were also gradually expressed after this stage. This result indicated that these ERF TFs may regulate different genes involved in capsorubin biosynthesis. Thus, the members of cluster C9 and cluster C10 were candidates for the regulation of capsorubin biosynthesis.
The expression of genes involved in capsaicinoids synthesis tended to increase rapidly from 6 DPA to 25 DPA and then gradually decreased, which was consistent with the abundant production of capsaicinoids from 13 DPA to 25 DPA (Fig. 5C). A total of 38 CaERFs (26%) were not expressed at any developmental stage of the placenta. The placenta-expressed genes were hierarchically clustered based on similar expression patterns, yielding 10 clusters (Fig. 5D). Generally, CaERFs in the same phylogenetic group showed distinct expression patterns. Among the ten clusters, only the expression of members in cluster L3 and cluster L4 exhibited good agreement with the stages of
in pericarp tissue not only maintained a good agreement with the tendency of carotenoids biosynthesis (β-carotene, zeaxanthin and capsorubin), but also exhibited a lower level of transcription in other tissues (roots, flowers, stems, placentas, leaves and seeds) (Fig. 6B). Thus, it was likely that the members of cluster C9 and cluster C10 were involved in carotenoids biosynthesis.

Validation of capsaicinoids biosynthesis-related ERF TFs
The capsaicin and dihydrocapsaicin content increased significantly in placental tissue from 10 DPA to 25 DPA, after which it increased slowly (Fig. 7A). The expression patterns of CaERF102, CaERF53, CaERF111 and CaERF92 in placental tissue were similar to the pattern of capsaicinoids biosynthesis, while CaERF28 expression did not show a developmental stage-regulated pattern. With the exception of CaERF53, these genes were also highly expressed in certain tissues (Fig. 7B).
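One way to make "similar to the pattern of capsaicinoids biosynthesis" quantitative is to correlate an expression profile with the metabolite content across the same developmental stages. The sketch below uses the Pearson coefficient as an assumption; the text does not state which similarity measure was applied, and the function name and toy values are ours.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sd_x * sd_y)


# Toy profiles over four stages (made-up numbers, arbitrary units).
toy_content = [1.0, 2.0, 3.0, 4.0]
toy_gene_following = [2.0, 4.0, 6.0, 8.0]  # rises with the content
toy_gene_opposite = [8.0, 6.0, 4.0, 2.0]   # falls as the content rises

r_following = pearson(toy_content, toy_gene_following)
r_opposite = pearson(toy_content, toy_gene_opposite)
```

A coefficient close to 1 would support describing a gene's expression as tracking capsaicinoid accumulation, while a flat or negative value would not.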
Additionally, we aimed to obtain a preliminary understanding of whether capsaicinoids biosynthesis is regulated by CaERF genes in pepper to enable adaptation to different temperatures. As shown in Fig. 7C, the capsaicin and dihydrocapsaicin content increased markedly with increasing temperature, but the capsaicin content at T25 was significantly higher than that at T33. The expression of CaERF53, CaERF92 and CaERF28 was highest at T25, which was consistent with the accumulated level of capsaicin, while the expression of CaERF102 and CaERF111 decreased with increasing temperature (Fig. 7D). Therefore, these results indicated that CaERF102, CaERF53, CaERF111 and CaERF92 might be associated with capsaicinoids biosynthesis in pepper, but that they perform different functions in response to temperature to control capsaicinoids biosynthesis.
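The two response types described above (expression peaking at an intermediate temperature versus decreasing steadily) can be separated with a simple rule. This is an illustrative sketch only: the function name, labels and values are hypothetical, and the lowest temperature key (17) is a placeholder, since only T25 and T33 are named in the text.

```python
def temperature_trend(expression_by_temp):
    """Label an expression profile measured at several temperatures.

    expression_by_temp maps a temperature (number) to an expression value.
    """
    temps = sorted(expression_by_temp)
    values = [expression_by_temp[t] for t in temps]
    if len(values) < 2:
        raise ValueError("need at least two temperatures")
    steps = list(zip(values, values[1:]))
    if all(a < b for a, b in steps):
        return "increasing"
    if all(a > b for a, b in steps):
        return "decreasing"
    peak = values.index(max(values))
    if 0 < peak < len(values) - 1:
        return "intermediate peak"
    return "mixed"


# Made-up relative expression values; 17 is a placeholder temperature.
toy_peak_gene = {17: 1.0, 25: 3.0, 33: 2.0}
toy_falling_gene = {17: 3.0, 25: 2.0, 33: 1.0}

label_peak = temperature_trend(toy_peak_gene)
label_falling = temperature_trend(toy_falling_gene)
```

Here `toy_peak_gene` is labelled "intermediate peak" and `toy_falling_gene` "decreasing", corresponding to the two behaviours reported for the candidate genes.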

Discussion
The AP2/ERF superfamily is one of the largest TF families in the plant kingdom, and it has been identified and investigated in many plant species with sequenced genomes [47][48][49].
The AP2/ERF superfamily in peppers was previously reported by Jin et al. [50], whose work indicated that CaAP2/ERFs might be involved in the response to P. capsici in peppers. Capsorubin and capsaicinoids are unique to Capsicum spp., and they possess high economic and nutritional value. This study placed more emphasis on the relationship between Capsicum-specific secondary metabolites and the ERF family (the largest branch of the AP2/ERF superfamily). Study of the Capsicum genome contributes to understanding the structure of gene families and predicting their biological functions.
In this study, a total of 142 non-redundant ERF genes were identified from the Capsicum annuum genome. The ERF families of Arabidopsis (122) [15], watermelon (120) [48], rice (143) [51], Chinese cabbage (248) [52], cauliflower (146) [48] and Bryum argenteum (75) [53] have also been identified and investigated. These results indicate that the number of ERF genes differs among plant species. Additionally, alignment analyses showed that the members of the ERF and DREB subfamilies possessed a specific WLG motif, as observed in the report of Cui et al. [49]. The ERF and DREB subfamilies are distinguished by the DNA motifs with which they interact. The ERF subfamily typically binds to the GCC-box in promoter regions, whereas the DREB subfamily binds the dehydration-responsive element, which contains a core motif of CCGAC [29,54]. According to the studies of Nakano and Sakuma [15,16], this DNA-binding specificity is mainly determined by the 14th and 19th amino acids in the AP2 domain (V14 and E19 for the DREB subfamily but A14 and D19 for the ERF subfamily); in this study, however, the DREB subfamily was completely conserved at V15 and E20, and the ERF subfamily was highly conserved at A14 and D19 (Fig. 2).
All CaERF members were used to construct a phylogenetic tree with matched proteins from tomato, rice and Arabidopsis. The tree was classified and annotated based on the classification proposed by Nakano et al. [15], which ultimately defined 11 groups. This result was similar to that of Jin's study in peppers [50] in both the topology and the classification of the tree. However, in this study, both groups X and IX were subdivided, and a new group XI was identified. Group XI showed very low conservation of certain amino acids, which made classification difficult. This group was classified as the 'ERF-like subfamily'. It was likely that many gene signature motifs underwent divergent evolution after duplication from a common ancestor. Moreover, groups IXb and Xb were regarded as putative 'pepper-specific groups' (Fig. 3), and we cannot completely rule out the possibility that the members of the putative 'pepper-specific groups' were related to capsorubin and capsaicinoids biosynthesis. However, the members of these groups were rarely expressed in either the pericarp or the placenta throughout the different developmental stages. Therefore, it seems that these 'pepper-specific groups' are not the master regulators of capsorubin and capsaicinoids biosynthesis.
Numerous studies have indicated that the members of a group in large families of plant TFs, such as MYB, WRKY and NAC, generally possess similar conserved amino acid motifs or domains [55][56][57]. In most cases, proteins with similar amino acid motifs are likely to share a similar function. Motifs 1 to 5, which are mainly located in the AP2 domain region, were defined as "general motifs" (Fig. 4B). Motifs 6 to 15 were distributed outside the AP2 domain and were designated "specific motifs" (Fig. 4B); they are potentially related to nuclear localization and transcription regulation [58]. Some reports suggested that the D(I/V)QAA sequences are a basic characteristic of the DREB family in cauliflower [48,59], whereas motif 8, which contained these conserved sequences, was primarily restricted to groups VI and X of the ERF family (Fig. 4). This is likely because TFs have undergone divergent evolution in different species. Indeed, groups VI and X in the phylogenetic tree were near the branch of the DREB family (Fig. 3).
In some cases, members of the same phylogenetic subgroup had similar transcript levels [60], implying that members of the same phylogenetic subgroup might perform similar functions. SlERF6, which was located in group VII (Fig. S4), was involved in the regulation of carotenoids biosynthesis and fruit ripening in tomato (Table S4) [18]. However, in this study, the genes of cluster C9 and cluster C10 were from different groups (except for CaERF101, which was in group VII). They were regarded as candidates for the regulation of capsorubin biosynthesis because their expression patterns exhibited good agreement with the transcriptional level of the capsorubin synthesis gene (Fig. 5AB), and because the highly pericarp-expressed members of these two clusters (CaERF82, CaERF97, CaERF66, CaERF107 and CaERF101) maintained good agreement with the increase in carotenoids biosynthesis (β-carotene, zeaxanthin and capsorubin) in pericarp tissue (Fig. 6B). These results indicated that the genes of the same phylogenetic subgroup exhibited distinct expression patterns, which is consistent with the observations of previous studies [60,61] Fig. 5C and 5D). These results implied that CaERF101 may perform multiple functions in addition to capsaicinoids biosynthesis. Moreover, the expression of four CaERFs (CaERF102, CaERF53, CaERF111 and CaERF92) showed a positive correlation with the level of capsaicinoids biosynthesis (Fig. 7AB). In addition, capsaicinoids biosynthesis is regulated by environmental factors. ERF TF transcription is influenced by temperature, and ERF TFs have been shown to enhance plant tolerance to stress, in part by increasing the levels of certain metabolites [45,62]. For example, overexpression of DREB1A can cause the accumulation of monosaccharides, disaccharides, trisaccharides and sugar alcohols, improving tolerance to freezing and dehydration stress in transgenic plants [63].
In this study, capsaicin and dihydrocapsaicin accumulated significantly in the placenta following the higher temperature treatment. The expression of CaERF53 and CaERF92 increased, but that of CaERF102 and CaERF111 decreased with increasing temperature (Fig. 7C and 7D). Therefore, the members of cluster L3 and cluster L4 may be related to temperature-mediated capsaicinoids biosynthesis. However, CaERF53, CaERF92, CaERF102 and CaERF111 might play different roles in the regulation of capsaicinoids biosynthesis when exposed to different temperatures.
Conclusion
A total of 142 members of the ERF family were identified in pepper, and they were divided into DREB and ERF subfamilies. The DREB subfamily is completely conserved at V15 and E20, while the ERF subfamily is highly conserved at A14 and D19. The phylogenetic analysis of the ERF family resulted in a distribution of 11 groups, of which the DREB subfamily included group I to group IV, and the ERF subfamily contained group V to group XI. Generally, members of the same group in the tree possessed similar motif patterns. Motifs 1 to 5 are present in the largest number of CaERF proteins and were thus designated "general motifs", whereas the other motifs, distributed outside the AP2 domain, were classified as "specific motifs". The members of cluster C9 and cluster C10 might be involved in capsorubin biosynthesis, especially those with high expression: CaERF82, CaERF97, CaERF66, CaERF107 and CaERF101. These five genes not only showed a trend similar to that of the accumulation of carotenoids (β-carotene, zeaxanthin and capsorubin) in pericarp tissue, but were also expressed at low levels in other tissues. The genes in cluster L3 and cluster L4 were likely associated with the regulation of capsaicinoids biosynthesis.
CaERF102, CaERF53,  [65], and those of Arabidopsis were obtained from Nakano et al (Table S1) [15].
The trees were constructed and visualized using Evolview (http://www.evolgenius.info/evolview).

RNA-seq data analysis
The pepper RNA-seq data (GenBank: AYRZ01000000) were downloaded from the SRA database  Table S2.
The qPCRs were carried out in a Bio-Rad CFX384 Touch™ system with qPCR SYBR Green Master Mix (Q131-02, Vazyme, China). The reaction mix consisted of 1 µL of cDNA template, 0.2 µL of each primer (10 µmol/L), 5 µL of SYBR Green Master Mix and 3.6 µL of nuclease-free water. The PCR amplification conditions were as follows: 95℃ for 5 min, then 40 cycles of 95℃ for 5 s and 60℃ for 30 s. A melting-curve analysis was performed at 95℃ for 5 s, followed by a temperature increase from 60℃ to 95℃. CA00g52149 and CA12g20490 (IDs in version 1.55 of the Capsicum annuum genome) were used as housekeeping genes; they were identified in the pepper genome, and the data are unpublished. The relative expression of each ERF gene was calculated with the 2^−ΔΔCt method [69]. The qPCRs using the placenta were performed with biological triplicates.
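For readers unfamiliar with the 2^−ΔΔCt calculation, a minimal sketch follows. How the two housekeeping genes were combined is not stated in the text; averaging their Ct values is our assumption, and the function name and all Ct values are illustrative. The last lines simply check that the listed reaction components (two primers at 0.2 µL each) add up to a 10 µL reaction.

```python
def relative_expression(ct_target_sample, ct_refs_sample,
                        ct_target_control, ct_refs_control):
    """Relative expression by the 2^-ddCt method.

    ct_refs_* are lists of Ct values for the reference (housekeeping) genes.
    Assumption: reference genes are combined by averaging their Ct values.
    """
    ref_sample = sum(ct_refs_sample) / len(ct_refs_sample)
    ref_control = sum(ct_refs_control) / len(ct_refs_control)
    delta_ct_sample = ct_target_sample - ref_sample      # dCt in the sample
    delta_ct_control = ct_target_control - ref_control   # dCt in the control
    delta_delta_ct = delta_ct_sample - delta_ct_control  # ddCt
    return 2 ** (-delta_delta_ct)


# Illustrative Ct values (made up): the target crosses threshold two cycles
# earlier in the sample than in the control, with unchanged references.
fold_change = relative_expression(22.0, [18.0, 18.0], 24.0, [18.0, 18.0])

# Reaction volume check from the stated recipe, in microlitres.
reaction_volumes = {
    "cDNA template": 1.0,
    "forward primer": 0.2,
    "reverse primer": 0.2,
    "SYBR Green Master Mix": 5.0,
    "nuclease-free water": 3.6,
}
total_volume = sum(reaction_volumes.values())
```

With these toy values ΔΔCt is −2, so the target is estimated to be four-fold more abundant in the sample than in the control.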
The results were analysed statistically using SPSS 22 with Dunnett's t-test to determine significant differences.

Quantification of carotenoids and capsaicinoids content
Oven-dried placental tissue from pepper fruits was ground into fine powder with a mortar and pestle.
A 0.1 g sample of the powder was mixed with 5 ml of methanol and tetrahydrofuran (1:1, HPLC grade) in a 15 ml centrifuge tube and ultrasonicated for 30 min. The samples were then extracted for 12 h at room temperature. One millilitre of the supernatant was collected and filtered through a 0.22 µm Millipore membrane, and the capsaicinoids content was determined with an HPLC system (Alliance E2695, Waters, USA).
Table S1. ERF family genes in Arabidopsis, tomato and rice.
Carotenoids (A) and capsaicinoids (B) biosynthetic pathways. A. Geranylgeranyl diphosphate, which is synthesized in the prenyl lipid biosynthesis pathway, is converted by phytoene synthase into phytoene, which represents the first step in the carotenoid pathway. Then, after the synthesis of lycopene, the pathway divides into two branches: α-carotene is finally converted into lutein, and β-carotene is ultimately converted into capsorubin or capsanthin by capsanthin/capsorubin synthase (CCS
Neighbour-joining phylogeny of the pepper ERF family in relation to Arabidopsis. The groups were named and classified according to Arabidopsis [15]. The DREB and ERF subfamilies are divided by the dashed red line. Both groups X and XI possessed very low conservation in the 15th and 20th amino acids, and they were near the dashed red line. Groups X and XI were tentatively defined as the 'ERF-like subfamily'.
