Systematic analysis of the basic/helix-loop-helix (bHLH) transcription factor family in pummelo (Citrus grandis) and identification of the key members involved in the response to iron deficiency

Background Iron (Fe) deficiency is a common problem in citrus production. As the second largest superfamily of transcription factors (TFs), the basic/helix-loop-helix (bHLH) proteins have been shown to participate in the regulation of Fe homeostasis and a series of other biological and developmental processes in plants. However, this family of members in citrus and their functions in citrus Fe deficiency are still largely unknown. Results In this study, we identified a total of 128 CgbHLHs from pummelo (Citrus grandis) genome that were classified into 18 subfamilies by phylogenetic comparison with Arabidopsis thaliana bHLH proteins. All of these CgbHLHs were randomly distributed on nine known (125 genes) and one unknown (3 genes) chromosomes, and 12 and 47 of them were identified to be tandem and segmental duplicated genes, respectively. Sequence analysis showed detailed characteristics of their intron-exon structures, bHLH domain and conserved motifs. Gene ontology (GO) analysis suggested that most of CgbHLHs were annotated to the nucleus, DNA-binding transcription factor activity, response to abiotic stimulus, reproduction, post-embryonic development, flower development and photosynthesis. In addition, 27 CgbHLH proteins were predicted to have direct or indirect protein-protein interactions. Based on GO annotation, RNA sequencing data in public database and qRT-PCR results, several of CgbHLHs were identified as the key candidates that respond to iron deficiency. Conclusions In total, 128 CgbHLH proteins were identified from pummelo, and their detailed sequence and structure characteristics and putative functions were analyzed. This study provides comprehensive information for further functional elucidation of CgbHLH genes in citrus.


Background
Citrus is the largest fruit crop in the world, which provides not only necessary nutrition for human but also substantial economic income for farmers. In 2015, the global citrus area was 13.5 million hectares and the yield reached 178.2 million tons (FAO statistics, http://faostat. fao.org/default.aspx). However, citrus production is also continuously influenced by many environmental factors, such as diseases, cold, drought, heat, and nutrient disorders, among which iron (Fe) deficiency is a common problem that can cause severe chlorosis of leaves, impaired tree vigor, and reduction of fruit yield and quality in citrus production [1]. In particular, in calcareous soils, citrus plants are highly sensitive to low Fe availability because bioavailable forms of ferrous Fe (II) are oxidized into insoluble ferric Fe (III) in a high-pH and oxygen-rich environment [2,3]. To cope with this issue, plants have developed two major adaptive mechanisms for efficient Fe uptake from soils. All dicot and non-graminaceous monocot plants, such as Arabidopsis thaliana (A. thaliana), use a reduction-based strategy (strategy I), while graminaceous plants such as rice employ a chelation-based strategy (strategy II) [3][4][5]. In both strategy I and strategy II, one class of transcription factors (TFs), the basic/helix-loophelix genes (bHLHs), is found to play the core regulatory role [3,4,[6][7][8][9].
The first discovery of bHLH was during the study of murine muscle development [10]. Thereafter, the proteins containing this conserved domain have been widely identified in all three eukaryotic kingdoms and constitute one of the largest families of TFs [11][12][13][14][15]. In plants, the bHLH proteins are recognized as the second largest superfamily of TFs [12]. The bHLH domain consists of approximately 60 amino acids with two functionally distinct regions, the basic region and the HLH region [12,14]. The basic region, containing 13 to 17 amino acids, is located at the N-terminus of the bHLH domain and functions in specific recognition and binding to the DNA motif of the target gene promoter [16]. In contrast, the HLH region is located at the C-terminal end and consists of approximately 40-50 amino acids with two amphipathic α helices that are linked by a loop of variable length. This region often functions in domain dimerization and allows the formation of homodimers or heterodimers to promote protein-protein interactions [12,16,17]. Outside of the conserved bHLH domain, the sequence of bHLH proteins is considerably divergent [18].
Although many bHLH genes in response to Fe deficiency have been well documented in model plants, it is still poorly understood in citrus. Pummelo (Citrus grandis) is a major cultivated species of citrus, which also belongs to the progenitor species that contributes to the generation of hybridized mandarins and sweet orange [37]. Both directly cultivated pummelos and the varieties grafted on pummelo rootstocks readily show Fe deficiency. Recently, high quality genome information of pummelo [38] was published, which promoted us to consider that we can use pummelo as material to systematically analyze the bHLH family genes and identify the essential members involved in the response to Fe deficiency in citrus. Therefore, in the present study, we identified 128 bHLH family members from pummelo at the whole genome level. We also carried out a detailed analysis and prediction of their sequence characteristics, phylogeny, gene duplication, chromosomal distribution, gene structure, protein motif, and protein-protein interaction. The bHLH members in response to Fe deficiency were identified by performing gene ontology (GO) annotation and time-course expression analysis under Fe deficiency. Our results provide valuable clues for functional elucidation of the bHLH genes in citrus, especially for revealing the regulatory mechanisms of bHLHs in citrus under Fe deficiency in the future.

Identification and classification of CgbHLH members in pummelo
Based on the methods in "Materials and methods," we finally identified 128 bHLH proteins from pummelo (Table 1 and Additional file 2: Table S1). These bHLHs showed 21.48% to 73.16% sequence identity with bHLHs  Table S1).
To explore the evolutionary relationship and the classification of CgbHLHs, a neighbor-joining phylogenetic tree was constructed with conserved sequences of 128 CgbHLHs and 136 AtbHLHs. Based on this phylogenetic analysis and the previous reported classification of the AtbHLHs, we classified 128 CgbHLHs into 18 subfamilies using Arabic numerals 1-18 ( Fig. 1). Group 1 was the largest subfamily with 17 CgbHLH members, while the smallest, group 3, contained only 2 members.
Multiple sequence alignment and analysis of the gene structure, conserved motif and domain of CgbHLHs As shown in Additional file 1: Figure S1, multiple sequence alignment of 128 CgbHLHs showed that most of them were highly conserved in their bHLH domains, except that CgbLH37, CgbLH87.1, CgbLH88, CgbLH29.1, CgbLH45 and CgbLH162.9 were absent in the basic region, and CgbLH96.2 and CgbLH25.1 were short of the second helix. There were 19 conserved amino acid residues with a consensus ratio higher than 50% Among 98 E-box CgbHLHs, 58 proteins (containing His/Lys-9, Glu-13 and Arg-17) were predicted to bind the G-box motif (CACGTG), while 40 proteins were predicted to recognize other types of E-boxes (CANNTG) and were defined as non-G-box binders (Fig. 1).
Gene structure analysis showed that 127 CgbHLHs contained 1 to 14 exons, and one CgbHLH (CgbHLH73.2) contained 21 exons (Additional file 2: Table S1). A total of 6 members (CgbHLH37/43/87.2/88 in subfamily 11 and CgbHLH6/14.1 in subfamily 18) were intronless, and 51 members had no untranslated region (UTR). Conserved motif prediction showed the constitutions of motif 1 to motif 20 in each CgbHLH protein (other possible motifs are not shown). Details of the 20 motifs are shown in Additional file 2: Table S1 and Fig. 3. Motifs 1, 2 and 3, located in bHLH domains, were found in almost all CgbHLHs, while the other motifs existed only in certain members. The members that phylogenetically clustered together or in the same subfamily often showed a similar gene structure and motif pattern. For example, the closely clustered pair CgbHLH93.1 and CgbHLH93.2 had identical motifs 1, 2, 3, 4 and 15. Most of the members of subfamily 1 contained motifs 1, 2, 3 and 4, and those of subfamily 2 contained motifs 4, 7 and 15. In addition, we found that motifs 9 and 10 were identified only in subfamily 5 and subfamily 6, respectively. Five types of motifs (motif 1, 3, 6, 18 and 19) were repeated twice or thrice in certain members, such as twice for motif 3 in CgbHLH48/ 50/57/137.1/137.2, twice for motif 6 in CgbHLH75.2, and thrice for motif 1 in CgbHLH155. As shown in Fig. 3, conserved domain analysis showed that all 128 CgbHLHs had an HLH domain, but there were also 36 CgbHLH genes with other domains, which might be fusion genes.

Chromosomal distribution and gene duplication of CgbHLHs
According to the genome annotation information, we found that 125 CgbHLH genes were distributed on nine chromosomes, while the chromosomal location of three CgbHLHs (CgbHLH6/110.2/121) could not be determined (Fig. 4). Chromosome 5 (chr5) contained the most CgbHLHs (25 genes), followed by chr2 (22 genes), chr8 (18 genes) and chr1 (14 genes). Although chr4 was longer than chr7 and chr8, it contained the fewest CgbHLHs (only 4 genes). The gene duplication and collinear correlations analysis showed that a total of 12 CgbHLHs were identified to be tandem duplicated genes, distributed on chr1, chr2, chr8 and chr9 (Fig. 4). In addition, 47 CgbHLHs were predicted to be segmental duplicated genes, accounting for about 37% of all  CgbHLH genes, and the identified collinear genes showed great overlapping with these segmental duplicated genes (Fig. 4). These results suggested that the tandem and segmental duplication events that occurred during citrus evolution might have played an essential role in CgbHLH family expansion.

GO annotation and protein interactions of CgbHLHs
To determine potential functions of each CgbHLH gene, GO annotation was performed for all CgbHLHs. As shown in Fig. 5a, except for two genes that were not annotated a GO term, the other 126 CgbHLHs were annotated in three functional categories: biological process (BP), cellular component (CC) and molecular function (MF). In the BP category, these CgbHLHs were further annotated to respond to abiotic stimulus (86 genes), reproduction (88 genes), post-embryonic development (84 genes), flower development (62 genes) and photosynthesis (93 genes). Under CC and MF categories, we found that all 126 CgbHLHs were annotated to the nucleus, protein or DNA binding, and transcription factor activity, which agreed well with TF property of these CgbHLHs. Proteins in the bHLH family often interact with each other by forming homodimers or heterodimers, which are essential for binding and regulating downstream target gene expression. Thus, prediction of protein interactions of CgbHLHs was performed by using orthologous bHLHs of A. thaliana. The result showed that a total of 27 CgbHLH proteins were predicted to have protein interaction relationships (Fig. 6). For example, CgbHLH29 (FIT ortholog), CgbHLH39, CgbHLH47 (PYE ortholog), CgbHLH104, CgbHLH105 (ILR3 ortholog) and CgbHLH110 were predicted to have direct or indirect interactions with each other. CgbHLH33 (ICE2 ortholog) was predicted to interact with CgbHLH45 (MUTE ortholog), CgbHLH97 (FMA ortholog) and CgbHLH98 (SPCH) directly. Moreover, interactions among several phytochrome interacting factor (PIF) orthologous proteins (CgbHLH8, CgbHLH9, CgbHLH15 and CgbHLH72) were predicted. Overall, the result provides an important reference for identifying true interactions of CgbHLHs with biochemical experiments.

Candidate CgbHLH members in response to Fe deficiency
As described in the "Introduction," bHLH genes play an essential regulatory role in plant Fe homeostasis. To identify candidate CgbHLH members in response to Fe deficiency in pummelo, their expression levels in the root were evaluated by analyzing the previous RNAseq data and performing qRT-PCR confirmation. Based on analysis of previous RNAseq data published by Guo et al. (2017) [39], a total of 95 CgbHLH genes were determined to be transcribed in the root of normal cultured pummelo (Fig. 5b). Their transcription levels (expressed as the TPM value) ranged from 0.05 (CgbHLH85.2) to 281.08 (CgbHLH105.1). These root-expressed genes are preliminarily considered to respond to Fe deficiency, as the root is directly responsible for ion uptake. To further narrow the range of candidates, we searched for root-expressed genes among the GO annotated abiotic stimulus responsive genes; as a result, 66 CgbHLHs were found to overlap (Fig. 5c). Due to lower sensitivity of qRT-PCR than RNAseq in determining gene expression, we finally selected 39 CgbHLHs with TPM values higher than 3.00 from these 66 genes for qRT-PCR confirmation. In addition, three genes (CgbHLH104, CgbHLH105.1 and CgbHLH105.2) that are orthologous to known Fe-related bHLHs but that were not included in the 39 CgbHLHs were also determined by qRT-PCR in this study. Except for eight genes were still undetectable, the other 34 genes were determined to be up-or downregulated in roots of pummelo after 0.5 d, 1.5 d, 2 d, 7 d, and 12 d of Fe-deficient (−Fe) treatments when comparing with those at corresponding time points of CK (Fig. 7). In general, the tested genes showed three types of expression patterns. The first type, such as CgbHLH3, CgbHLH6, CgbHLH13, CgbHLH14.2, CgbH LH16, CgbHLH29.4, CgbHLH30, CgbHLH39, CgbHLH 48, CgbHLH63, CgbHLH68, CgbHLH73.1, CgbHLH79, CgbHLH80, CgbHLH102.1, CgbHLH104, CgbHLH10 5.1, CgbHLH107.2, CgbHLH122, CgbHLH123 and Cgb HLH128, was up-regulated from an early period (0.5 d), which indicated an early response of these genes to Fe deficiency. The second type, such as CgbHLH14.1, CgbHLH33.1, CgbHLH49, CgbHLH69.1, CgbHLH91, (See figure on previous page.) Fig. 5 Functional annotation and tissue expression of CgbHLH genes. a All annotated GO terms including biological process (BP), cellular component (CC) and molecular function (MF) of 126 CgbHLHs (two other CgbHLHs were not annotated a GO term). b Expression heatmap of CgbHLHs. The TPM values were generated from the RNAseq data of pummelo roots in a public database. c Venn diagram shows the number of the genes that respond to abiotic stimulus and expresssed in roots Fig. 6 Predicted protein-protein interactions of CgbHLHs according to their orthologs in A. thaliana. In the network, only the pairs with higher than 40% sequence identity between CgbHLHs and AtbHLHs and with an interaction score > 0.7 are shown. Line and node colors indicate the different kinds and degrees of interactions, respectively. The filled or empty nodes represent known or unknown 3D structures, respectively. The abbreviated names are the genes that have been reported in A. thaliana

Discussion
The genomes of several citrus varieties have been released, including the sweet orange (Citrus sinensis) genome published by Xu et al. (2011) [40], the clementine mandarin genome published by Wu et al. (2014) [41], and the pummelo, papeda (Citrus ichangensis) and citron (Citrus medica) genomes published by   [38]. After the completion of these genomes, systematic identification and analysis of important gene families have been widely reported for citrus [1,[42][43][44]. These studies have provided us a comprehensive understanding of potential gene functions and have promoted related research progress. However, no such study has been done on the citrus bHLH family, except that Geng and Liu (2018) [45] identified 56 putative bHLH genes from sweet orange. In this study, we used a genomewide approach to identify 128 CgbHLHs from pummelo. This number possibly indicates the real quantity of bHLH genes in most citrus varieties because the ratio of CgbHLHs in the pummelo genome (0.42%) is very similar to that of tomato (0.46%) [46], rice (0.44%) [13], poplar (0.40%) [11], apple (0.42%) [27], and grape (0.40%) [47]. With respect to the bHLHs of these plants, a high proportion of tandem duplications and segmental duplications has been determined, which indicates that bHLH expansion was possibly derived from gene duplication during evolution [48]. In citrus, an ancient whole-genome duplication event (WGD) occurred [40], and it could have led to this chromosome segmental duplication.
In the CgbHLH family, 19 amino acid residues are highly conserved, with a consensus ratio higher than 50%. These residues are also highly conserved in bHLHs of A. thaliana [15], maize [49] and grape [47], indicating conservation of bHLH families among different plants.
Evidence has shown that at least five basic residues in the basic region of the bHLH domain determine DNA binding activity of bHLHs; Glu-13 is critical in specific recognition of the E-box DNA binding motif, while Arg-16 functions in fixing and stabilizing the position of Glu-13; moreover, His/Lys-9 and Arg-17 confer specificity to Gbox (CACGTG) recognition [15]. Based on these findings, 128 CgbHLHs can be classified into four types: G-Box, non-G-Box, non-E-Box and non-DNA binding (Fig. 1). This classification indicates the basic functional property of each CgbHLH. We also noticed that Leu-27 was conserved in all CgbHLH proteins, and a similar result was found in AtbHLHs [15]. A previous study showed that the Leu residue at position 27 is necessary for dimer formation [50]. Thus, Leu-27 of CgbHLHs suggests their dimerization capacity.
To date, only very limited citrus bHLH genes have been functionally characterized, including PtrbHLH, an ortholog of AtbHLH33 (ICE2), that was isolated from trifoliate orange (Poncirus trifoliata) and proved to confer cold tolerance by regulating both POD-and CATmediated reactive oxygen species (ROS) scavenging [29,45]. CsbHLH18 of sweet orange was also proved to modulate cold tolerance and ROS homeostasis [51]. Moreover, a CubHLH1 of Satsuma mandarin (Citrus unshiu Marc.) was overexpressed in transgenic tomato fruit, resulting in modulation of carotenoid metabolism [52]. Generally, the potential functions of the unknown genes could be predicted through their orthologous genes. Based on phylogenetic and orthologous analyses, the CgbHLH family was classified into 18 subfamilies (Fig. 1). The closely clustered CgbHLHs and AtbHLHs in the same subfamily/group may have similar functions. In group 1, AtbHLH29 (FIT), AtbHLH33 (ICE2), AtbHLH116 (ICE1), AtbHLH93 (NFL), AtbHLH61, and AtbHLH21 (AMS) have been shown to be involved in Fe deficiency, cold tolerance and flower development [32,[53][54][55], suggesting that there are similar functions for the CgbHLHs of this group. In the other groups, many AtbHLH genes such as AtbHLH70 (MYC70) and AtbHLH45 (MUTE) in group 2, AtbHLH10, AtbHLH89 and AtbHLH91 in group 3, AtbHLH38, AtbHLH39, AtbHLH100 and AtbHLH101 in group 4, etc. have also been functionally characterized [9,33,[56][57][58]. Based on these previous studies, the potential functions of most CgbHLH genes could be predicted. Furthermore, GO annotation and protein-protein interaction analysis of the CgbHLHs could help us to understand their possible functions.
Fe deficiency-regulated bHLHs are the main concerns in this study. qRT-PCR showed that most of the tested CgbHLH genes were significantly differentially expressed at two or more time points under Fe deficiency. Among them, CgbHLH29.4, CgbHLH39, CgbHLH104, CgbHLH105.1 and CgbHLH105.2 are orthologs of known Fe deficiencyresponsive AtbHLHs [32,33,36], and their expression patterns further supported the putative function on Fe deficiency. CgbHLH6, CgbHLH13, CgbHLH14.1 and CgbHLH14.2 are orthologs of AtHLH6 (MYC2), AtHLH13 (JAM2) and AtHLH14, respectively, and these three AtHLHs of A. thaliana have been shown to participate in jasmonic acid (JA) signaling [6,59]. Because JA is a negative regulator of Fe uptake, it suggests that these four CgbHLH genes may also be involved in the response to Fe deficiency. Interestingly, several of the significantly expressed CgbHLHs such as CgbHLH105.1, CgbHLH104, CgbHLH13 and CgbHLH130.2 also showed a high expression level in roots based on RNAseq data, which further indicates their possible functions in Fe deficiency. Moreover, we found that CgbHLH104 and CgbHLH105.1 belong to the same subfamily (Group 6) and have a similar gene structure, pI and exon number. Their orthologous bHLHs of A. thaliana, AtbHLH104 and AtbHLH105, as well as all the other members in group 6 (AtbHLH47 and AtbHLH121) have been functionally identified in Fe deficiency [7,8,[34][35][36]. Herein, CgbHLH104 and CgbHLH105.1 also showed significant up-regulation under Fe deficiency. Thus, we speculate that these two CgbHLHs most likely have similar functions with AtbHLH104 and AtbHLH105 under Fe deficiency. Apart from these known Fe deficiencyresponsive bHLHs, there were also significantly differentially expressed CgbHLHs, but direct evidence could not be found regarding their orthologs that have a function in plant Fe deficiency, such as CgbHLH16, CgbHLH48, CgbHLH62, CgbHLH63, CgbHLH79, CgbHLH80, CgbHL H123, CgbHLH128 and CgbHLH130.2. New and essential bHLHs that regulate Fe homeostasis are expected to be identified from these candidates in future. In particular, CgbHLH16 was continuously up-regulated at all tested times of -Fe treatment, and it was also predicted to interact with an important Fe deficiency-responsive TF, PYE (bHLH47), based on protein-protein interaction prediction (Fig. 6). In addition, CgbHLH62 and CgbHLH63 were predicted to interact with each other while their expression levels showed the opposite tendency at the same time points, which indicates that there may be negative regulation relationships between them under Fe deficiency. Overall, these results provide us abundant information for understanding of the citrus bHLH genes, but more work including transgenic and biochemical experiments needs to be done to further confirm our speculations and illustrate their functions.

Conclusions
In summary, 128 CgbHLH proteins were identified from pummelo (Citrus grandis) genome and were classified into 18 subfamilies based on the phylogenetic relationship with AtbHLH proteins. The detailed analysis of Mw, pI, sequence alignment, phylogeny, gene duplication, chromosomal distribution, gene structure and protein motif suggested basic properties of each CgbHLH. GO annotation and protein-protein interaction analysis further revealed the potential functions of CgbHLHs. qRT-PCR showed that 34 of the tested CgbHLH genes were up-or down-regulated under Fe deficiency, and some of them such as CgbHLH6, CgbHLH14.2, CgbHLH16, CgbHLH48, CgbHLH63, CgbHLH79, CgbHLH80, CgbHLH104, CgbHL H105.1, CgbHLH123, CgbHLH128, CgbHLH93.1, CgbHL H62, CgbHLH130.2, CgbHLH3, CgbHLH29.4, CgbHLH 14.1, CgbHLH69.1, CgbHLH153 and CgbHLH77 were considered the key candidates that respond to Fe deficiency. This study lays the foundation for further functional elucidation of CgbHLH genes in citrus.

Plant material and treatment
Seeds of pummelo (Citrus grandis) were collected from the Citrus Research Institute, Southwest University, Chongqing, China. After surface disinfection with 2% sodium hypochlorite, seed coats were removed, and seeds were then germinated on wet filter papers that filled 96-cell plastic boxes with lids. After 2 weeks germination, uniform seedlings were transferred into modified Hoagland's solution for hydroponics as described by   [1]. When the seedlings had grown four leaves, half of them were renewed with Fe deficient (0.5 μM Fe-EDTA) solution (−Fe), while the others received normal (50 μM Fe-EDTA) solution (CK). The culture conditions were a temperature of 28°C and a relative humidity of 70% under a 16-h photoperiod (50 μmol m − 2 s − 1 ). Both -Fe and CK groups had three biological replicates, and each replicate contained 14 seedlings. After 0.5 d, 1.5 d, 2 d, 7 d, and 12 d of treatment, roots of -Fe and CK were sampled and then frozen in liquid nitrogen immediately.

Identification of bHLH members in pummelo
The genome data of pummelo were downloaded from the Orange (Citrus sinensis) Genome Annotation Project (http://citrus.hzau.edu.cn/orange/download/index. php). Then, we used two methods to search for all bHLHs of pummelo. Firstly, the hidden Markov model (HMM) file of the HLH domain (PF00010) obtained from the Pfam database (http://pfam.xfam.org/) was used as the query to scan the pummelo genome with HMMER software (http://hmmer.org/). Secondly, the bHLH domain sequences of A. thaliana (AtbHLHs) were used as queries to blast against the pummelo genome. After removal of redundant sequences, all putative bHLHs obtained from the two methods were submitted to the NCBI (National Center of Biotechnology Information) conserved domain database (CDD, https://www.ncbi. nlm.nih.gov/Structure/bwrpsb/bwrpsb.cgi) to further verify the existence of the conserved bHLH domain. All the identified bHLH proteins of pummelo were named as CgbHLHs.
Multiple sequence alignment and phylogenetic tree construction Molecular weights (MWs) and isoelectric points (pIs) of CgbHLH proteins were analyzed with ExPASy (http://web. expasy.org/compute_pi/). Conserved domain sequences of CgbHLHs were submitted to ClustalW to perform multiple sequence alignment, and the BioEdit was used to edit the aligned sequences and shade the conserved sites. To visualize the conserved motifs, the sequences were analyzed with WEBLOGO programs (http://weblogo.berkeley.edu). The neighbor-joining phylogenetic tree was constructed by using MEGA7 and 1000 replicates in bootstrap analysis. In addition, the full-length protein sequences of all CgbHLHs and known AtbHLHs were also aligned in Clustal Omega (https://www.ebi.ac.uk/Tools/msa/clustalo/) to obtain their sequence identity (%) data, and the CgbHLH-AtbHLH pairs with highest percent identity were considered orthologs.
Gene structure, conserved motifs, chromosomal location and gene duplication analysis Gene structure information of each CgbHLH was obtained from the generic feature format (GFF) file downloaded from the pummelo genome database, and TBtools [60] was used to display their intron-exon structures. Conserved motifs of each CgbHLH protein were predicted in the MEME [61] web tool with the maximum number of motifs set at 20, and the other parameters were the default values. The chromosomal locations of the CgbHLH genes were provided by the pummelo genome database, and TBtools software was used to draw the location images. To predict the gene duplication, the BLASTP program was first used to blast against the pummelo genome using all peptide sequences of pummelo as the queries; then, the top matches (e value ≤1e− 05) and their GFF files were inputted into MCScanX software [62] to identify tandem duplications, segmental duplications, and collinear correlations.
Gene ontology (GO) annotation and RNA-sequencing (RNAseq) data analysis GO annotation of all the CgbHLH proteins was performed with the Blast2GO program, and A. thaliana was chosen as the reference. To evaluate the expression levels of CgbHLH genes in pummelo, the published RNAseq raw data (SRR4050288/SRR4050289) of pummelo [39] were downloaded from the NCBI database (https://www.ncbi.nlm.nih.gov/sra/). Then, the raw data were analyzed with the linux system of the fastqc and trimommatic programs to remove adaptors and low quality sequences. The kallisto program was used to map sequence data to the pummelo genome, and the featurecount program was used to calculate the values of the transcripts per million reads (TPM).

Prediction of the protein-protein interaction network
The interaction network of CgbHLHs was predicted in STRING (version 11.0, http://string-db.org) website that contains known and predicted proteinprotein interactions of different organisms [63]. In detail, all the CgbHLH protein sequences were first submitted to the STRING website as queries, and A. thaliana was chosen as the reference organism for blasting because pummulo has no protein-protein interaction database and is not included in STRING. After blasting, the matched homologs of A. thaliana with the highest scores (Bitscore) and higher than 40% identity were used to construct the network. To ensure reliability, only the predicted interactions with interaction scores > 0.7 (high confidence level was set in STRING) were shown in the network.

RNA extraction and quantitative real-time PCR (qRT-PCR) analysis
Total RNA was extracted from the -Fe and CK roots of pummelo by using the RNAprep pure plant kit (Cat#DP432, Tiangen Biotech Co., Ltd., Beijing, China), and the RNA concentration and quality were determined with a Nanodrop 2000 spectrophotometer (Thermo Scientific, Waltham, MA, USA). Then, 1 μg of high-quality RNA was used for cDNA synthesis with HiScript II Q RT Super Mix (Cat#R222-01, Vazyme Biotech Co., Ltd., Nanjing, China) according to the manufacturer's instructions. qRT-PCR was performed on the Bio-Rad CFX Connect RealTime system by using ChamQ™ Universal SYBR® qPCR Master Mix (Cat#Q711, Vazyme Biotech Co., Ltd., Nanjing, China). Each PCR reaction contained 5.0 μL SYBR mix, 0.2 μM primers, and 1.0 μL diluted cDNA in a final volume of 10 μL. The 2 -ΔΔCT method was used to normalize and calculate the expression level of each tested gene relative to an internal reference gene, actin (Cs1g05000.1). All the primers of CgbHLH genes are listed in Additional file 3: Table S2. Three biological replicates and three technical replicates were performed for each treatment. The presented data herein are means ± standard errors (SE) of three biological replicates. Student's t-test was used to analyze the statistical differences between -Fe and corresponding CK samples, taking P < 0.05 as significantly different.