Genome-wide identification and expression analysis of the AT-hook Motif Nuclear Localized gene family in soybean

Soybean is an important legume crop with significant agricultural and economic value. Previous research has shown that the AT-Hook Motif Nuclear Localized (AHL) gene family is highly conserved in land plants and plays crucial roles in plant growth and development. To date, however, the AHL gene family has not been studied in soybean. To investigate the roles played by the AHL gene family in soybean, we performed genome-wide identification and analyzed expression patterns and gene structures. We identified a total of 63 AT-hook motif genes in soybean, characterized by the presence of the AT-hook motif and the PPC domain. The AT-hook motif genes were distributed on 18 chromosomes and formed two distinct clades (A and B), as shown by phylogenetic analysis. All the AHL proteins were further classified into three types (I, II and III) based on the AT-hook motif. Type-I belonged to Clade-A, while Type-II and Type-III belonged to Clade-B. Our results also showed that segmental duplication was the main type of duplication in the soybean AHL gene family. To discern whether the AHL gene family is involved in stress responses in soybean, we performed cis-acting element analysis and found that AHL genes were associated with light-responsiveness, anaerobic-induction, MYB and gibberellin-responsiveness elements. This suggests that AHL genes may participate in plant development and mediate stress responses. Moreover, a co-expression network analysis showed that the AHL genes were also involved in energy transduction and were associated with the gibberellin pathway and nuclear entry signal pathways in soybean. Transcription analysis revealed that AHL genes in Jack and Williams82 have a common expression pattern and are mostly expressed in roots, showing greater sensitivity under drought and submergence stress.
Hence, the AHL gene family appears to act mainly in mediating stress responses in the roots, and these results provide comprehensive information for further understanding of AT-hook motif gene family-mediated stress responses in soybean. Sixty-three AT-hook motif genes were identified in the soybean genome. These genes formed two distinct phylogenetic clades and belonged to three different types. Cis-acting element and co-expression network analyses suggested that AHL genes participate in significant biological processes. This work provides an important theoretical basis for understanding the biological functions of AHLs in soybean.


Background
The AT-Hook Motif Nuclear Localized (AHL) gene family is highly conserved across all land plants, and AHL transcription factors have been described in mosses and flowering plants [1]. It has been demonstrated that some conserved transcription factor families, including the bHLH and NAC gene families, were essential to plant growth and stress tolerance during plant evolution [2][3][4][5][6][7]. However, some of the transcription factor families that have played important roles in plant evolution remain understudied. The AT-hook motif gene family is one of them: it is highly conserved across plant species and plays relevant roles during plant development.
The AT-hook motif gene family is involved in important biological processes in plants. For example, AHL genes are associated with the regulation of plant reproductive development and the formation of ears in maize [8]. In rice, the DP1 gene, encoding an AT-hook DNA-binding protein, plays an important role in flower development [9]. Moreover, the AT-hook motif gene family is also able to regulate the expression of cell-specific genes. The overexpression of the GIANT KILLER (GIK) gene, which encodes an AHL protein, leads to serious defects in the reproductive organs and reduced expression levels of associated genes [10]. In Arabidopsis, the AHL gene BoMF2 is preferentially expressed in the stamens, and its overexpression results in significantly shorter siliques and a decrease in pollen vigor relative to the wild type [11]. Importantly, the AHL gene family has also been found to regulate hormone balance in plants, especially gibberellin [12], jasmonic acid and auxin-related genes [13][14][15]. This is also illustrated by a previous transcriptomic analysis showing that AtAHL13 is a key factor regulating jasmonic acid biosynthesis, signal transduction and pathogen immunity [16]. AHL proteins can also regulate the chromatin state. The AT-hook motif protein AHL22 regulates flowering time by interacting with histone deacetylases at the FLOWERING LOCUS T (FT) locus. Overexpression of AHL22 in Arabidopsis delays flowering, significantly decreases transcription activity and acetylation of histone H3 at FT, and increases the dimethylation of H3 lysine 9 [17]. It has also been reported that the TEK (TRANSPOSABLE ELEMENT SILENCING VIA AT-HOOK) protein, which is encoded by an AHL gene, is involved in the silencing of transposable elements (TEs). Specifically, knocking down TEK leads to increased histone acetylation and decreased H3K9me2 and DNA methylation levels at the target loci [18].
Recently, a total of 37 AHL genes were identified in maize. Their transcription levels in different tissues suggest that AHL proteins are involved in maize pollen development, drought response and senescence [19]. As many as 48, 51 and 99 AHL genes have also been found in three different cotton genomes, and gene expression analysis indicated that the majority of AHL genes in Clade-B were expressed in the stem, whereas the Clade-A genes were expressed in the ovules [20]. Furthermore, the 20 AHL genes uncovered in rice may all be functional and exhibited three different expression patterns [21]. The overexpression of OsAHL1 improved rice tolerance to multiple stresses, especially drought [22].
These studies suggest that the AT-hook motif gene family not only plays important roles in plant growth and development, but also affects plant responses to stress and hormonal stimuli. However, they still lack a systematic investigation of how the AT-hook motif gene family regulates plant stress responses. Hence, this study evaluated plant responses to drought and submergence stress mediated by AHL genes.
AHL proteins contain two conserved domains, the AT-hook motif and the Plant and Prokaryote Conserved (PPC) domain, also known as Domain of Unknown Function #296 (DUF296) [23]. The PPC domain contains 120 amino acids and has the same secondary or tertiary structure from prokaryotes to higher plants [23]. The hydrophobic region at the C-terminus of the PPC domain plays an important role in nuclear localization and protein interaction [1,24], indicating that AHLs may have a role in regulating plant transcriptional activity [25]. The AT-hook motif contains one or two conserved Arg-Gly-Arg motifs that are used to bind AT-rich DNA regions. This binding has been confirmed in both prokaryotic and eukaryotic organisms, including for the High Mobility Group A (HMGA) proteins in mammals [24]. The binding of the AT-hook motif to AT-rich DNA forms a concave structure and results in the insertion of the two arginines [26]. The AT-hook motif gene family therefore regulates plant growth and development through DNA-protein interactions and the formation of protein homo/hetero-trimeric complexes [25,26].
Phylogenetic analysis of land plants showed that the AHL proteins can be divided into two categories, Clade A and Clade B, based on differences in the PPC domain [1]. The conserved amino acid sequence of Clade A is Leu-Arg-Ser-His, whereas the equivalent in Clade B is Phe-Thr-Pro-His [1]. In addition, the amino acid sequence Gly-Arg-Phe-Glu-Ile-Leu is sometimes part of the PPC domain and is essential for the function of some AHL proteins [25]. Differences in the AT-hook motif make it possible to classify AHL proteins into three different types (I, II, and III). Type-I belongs to Clade-A, whereas Type-II and Type-III belong to Clade-B. The AT-hook motif of Type-I has a conserved Gly-Ser-Lys-Asn-Lys sequence on the C-terminal side of the Arg-Gly-Arg core, while Types II and III instead contain Arg-Lys-Tyr. In angiosperms, phylogenetic analysis divided Clades A and B into five and four subfamilies, respectively [1]. The similar expression patterns observed within each clade suggest that AHLs retained their biological functions in the course of evolution [1].
Soybean (Glycine max (L.) Merr.) is a major leguminous species and an important source of protein worldwide, playing a vital role in human survival and development [27]. The known functions of AT-hook motif genes provide the basis for our research, yet a detailed genome-wide analysis of the AT-hook motif gene family in soybean has not been performed. In this study, following the findings on the AT-hook motif gene family in maize and cotton, we annotated the AT-hook motif gene family in the soybean genome and identified 63 AHL genes. We then analyzed the functions of these genes and the structural features of their proteins, as well as their chromosome locations, gene duplication events, Gene Ontology annotations, phylogenetic relationships, collinearity, co-expression network and expression patterns. Our results will foster understanding of the biological functions of the AHL family in soybean.

Phylogenetic analysis of the AT-hook motif gene family in soybean
We predicted a total of 63 AHL proteins containing the AT-hook motif and PPC domain in soybean, named GmAHL1 to GmAHL63 (Fig. 1, Table 1). To infer the evolutionary relationships among the AHL proteins in soybean, phylogenetic analysis was performed on the full-length AHL protein sequences. Our results showed that AHL proteins in soybean can be divided into two clades, Clade-A (with 34 proteins) and Clade-B (with 29 proteins), as previously described in other land plants [1]. Multiple sequence alignments allowed Clade-A and Clade-B to be further divided into Type-I (54%), Type-II (27%) and Type-III (19%). The higher abundance of Type-I in soybean is also consistent with observations in other land plants [1], and shows that AHL proteins have been conserved in the course of evolution.
We found that Clade-A, which contained the conserved PPC domain sequences Leu-Arg-Ser-His and Leu-Arg-Ala-His, was more variable than Clade-B, whose PPC domain contained Phe-Thr-Pro-His. At the same time, we also observed that the variability of the PPC domain in soybean AHL proteins is higher than that in maize [19]. It is possible that the increase in PPC domain variability extends the range of biological functions of AHL proteins.
The Type-I AT-hook motif contains four conserved amino acid residues at the N-terminus (Arg-Gly-Arg-Pro) and eight conserved amino acid residues at the C-terminus (Gly-Ser-Lys-Asn-Lys-Pro-Lys-Pro). This contrasts with the seven and ten conserved amino acid residues observed at the N-terminus and C-terminus of Type-II, respectively. Type-III and Type-II share the same PPC domain and the same conserved N-terminal structure of the AT-hook motif, but Type-III lacks the conserved amino acid residues at the C-terminus of the AT-hook motif. The observed diversity in the AT-hook motif and PPC domains across soybean AHL proteins is likely to result in diverse biological functions.

Gene structure and motif prediction analysis in the AT-hook motif gene family in soybean
We performed a gene structure analysis and estimated the length of AHL genes and the variability in the number of CDS and UTRs (Fig. 2, Table 1). AHL genes range in length from 585 bp to 7968 bp. A total of 12 genes (mostly from Clade-A) lack UTRs, and the number of introns and exons varies among genes (Types II and III usually have more introns). Type-I genes were the shortest and contained the lowest number of CDS, which began to increase from Glyma.20G202300. Type-II and Type-III genes have two or more introns, more than Type-I genes. Thus, we propose that Type-II and Type-III evolved from Type-I. This result is consistent with the report on the maize AHL gene family [19]. In eukaryotes, genes are formed by alternating introns and exons. In plants, up to 60% of the genes undergo splicing, most of which occurs in introns [28]. With intron-mediated enhancement (IME) in Arabidopsis, mRNA accumulation increased 24-fold and the activity of the reporter enzyme increased 40-fold, indicating that introns have an important influence on the regulation of gene expression in plants [29]. This was also observed in maize, where introns increased the expression level of the genes Zm00001d018515 and Zm00001d051861 [19]. The alternative splicing of introns results in a diverse range of encoded proteins and thus in abundant biological functions. It is therefore possible that the increased number of introns in soybean AHLs expands the repertoire of AHL proteins. In maize, only one Type-I gene has a UTR, whereas most soybean genes have UTRs [19], indicating that AHL gene structure differs between species.
In summary, we suspect that Type-II and Type-III introns enable plants to acquire more complex and diverse biological functions, and at the same time lay the foundation for the further expansion of intron-carrying AHLs.
Next, the MEME website was used to predict protein motifs (Fig. 3). A total of ten conserved motifs were identified in the AHL proteins (Table 2); motif length ranged from 8 to 32 amino acids, and the number of sites ranged from 8 to 62.
Motifs 3 and 6 share a conserved Arg-Gly-Arg core and thus likely correspond to AT-hook motifs. Motif 3 is defined as the type I AT-hook motif, and motif 6 as the type II AT-hook motif. Type-I AHL proteins contain a type I AT-hook motif, Type-II proteins contain both type I and type II AT-hook motifs, and Type-III proteins have only a type II AT-hook motif. The sequences downstream of the Arg-Gly-Arg core share conserved residues that play an important role in AHL proteins [1]. Interestingly, there is also a conserved Gly-Arg-Phe-Glu-Ile-Leu sequence (motif 2) in the PPC domain. This motif is found not only in soybean but also in other land plants, and a previous study has shown that it has an important influence on the PPC domain [1]. It is worth noting that all AHL proteins contain motif 1, motif 4 and motif 5, indicating the consistency of the AHL protein sequences.
In summary, the results of our gene structure and motif prediction analyses indicate that the AHL gene family in soybean shows both conservation and evolutionary diversity, consistent with other land plants [1], including maize [19] and cotton [20].
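The clade and type rules described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline used in this study (which relied on Pfam, MEME and multiple sequence alignment): the regular expressions, the function name classify_ahl and the toy sequences are hypothetical, and the allowed spacing between the Arg-Gly-Arg core and the downstream residues is a guess. Short signatures such as Leu-Arg-Ser-His can also occur by chance outside the PPC domain, so a real analysis would first restrict the search to annotated domains.

```python
import re

# Assumed patterns, built only from residues quoted in the text.
# The 0-2 residue gap after R-G-R-P is an illustrative guess.
TYPE1_HOOK = re.compile(r"RGRP.{0,2}GSKNK")  # type I AT-hook (motif 3)
TYPE2_HOOK = re.compile(r"RGRPRKY")          # type II AT-hook (motif 6)
CLADE_A_PPC = re.compile(r"LR[SA]H")         # Leu-Arg-Ser-His / Leu-Arg-Ala-His
CLADE_B_PPC = re.compile(r"FTPH")            # Phe-Thr-Pro-His


def classify_ahl(protein):
    """Return (clade, type) for a protein sequence in one-letter code."""
    seq = protein.upper()
    has_type1 = bool(TYPE1_HOOK.search(seq))
    has_type2 = bool(TYPE2_HOOK.search(seq))

    # Type-I: type I hook only; Type-II: both hooks; Type-III: type II hook only.
    if has_type1 and has_type2:
        ahl_type = "Type-II"
    elif has_type1:
        ahl_type = "Type-I"
    elif has_type2:
        ahl_type = "Type-III"
    else:
        ahl_type = "unclassified"

    if CLADE_A_PPC.search(seq):
        clade = "Clade-A"
    elif CLADE_B_PPC.search(seq):
        clade = "Clade-B"
    else:
        clade = "unclassified"
    return clade, ahl_type


# Toy sequences (hypothetical, not real GmAHL proteins).
toy_type1 = "MSSRGRPPGSKNKPKPAAAALRSHGG"
toy_type2 = "MRGRPPGSKNKTTRGRPRKYQQFTPH"
toy_type3 = "MRGRPRKYEEFTPH"

for name, seq in [("toy1", toy_type1), ("toy2", toy_type2), ("toy3", toy_type3)]:
    print(name, classify_ahl(seq))
```

Checking the AT-hook type and the PPC signature independently makes it easy to flag proteins where the two disagree with the expected pairing (Type-I with Clade-A, Types II and III with Clade-B).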

Evolutionary relationships of the AT-hook motif gene family in different species
To further explore the evolutionary relationships between AHLs in different species, we selected Arabidopsis thaliana, sorghum (Sorghum bicolor L.) and soybean and constructed a phylogenetic tree (Fig. 4). Patterns of different colors are used to represent different species. The phylogeny includes 29, 63 and 25 full-length AHL proteins from Arabidopsis, soybean and sorghum, respectively. Our analysis showed that the AHL genes of these species can be divided into two distinct clades, A and B. A total of 15 and 14 proteins belonged to Clade-A in Arabidopsis and sorghum, respectively, compared with 14 and 11 in Clade-B (Table 3). While Type-I was the most conserved of all types, the lack of a new subgroup between Types II and III in Clade-B indicates that the divergence of these proteins occurred relatively late. To sum up, the phylogenetic tree highlights the consistency of the evolution of AHLs among different species and, together with the homology relationships between species, provides insights for the future analysis of the biological functions of these proteins.
We then examined the arrangement of the 63 AHL genes on the 20 chromosomes of the soybean genome (Fig. 5a). The gene location information is given in Table 1.
The 63 AT-hook motif genes are distributed on 18 of the 20 soybean chromosomes. There are 9 AHLs on chromosome 20, 1 AHL on chromosome 19 and no AHLs on chromosomes 12 and 15. The distribution of these genes on chromosomes was independent of chromosomal length.
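A per-chromosome tally of this kind can be reproduced directly from gene identifiers, because the gene IDs used in this article follow the Glyma.<chromosome>G<number> naming, in which the two digits after "Glyma." give the chromosome. The sketch below is an illustration only: the helper names chromosome_of and tally_by_chromosome are hypothetical, and the ID list is just a handful of identifiers quoted in this article, not the full set of 63 genes in Table 1.

```python
import re
from collections import Counter

# Two-digit chromosome, the letter G, then a six-digit gene number.
GENE_ID = re.compile(r"^Glyma\.(\d{2})G(\d{6})$", re.IGNORECASE)


def chromosome_of(gene_id):
    """Extract the chromosome number from a Glyma gene ID."""
    cleaned = gene_id.replace(" ", "")  # tolerate stray spaces from text extraction
    match = GENE_ID.match(cleaned)
    if match is None:
        raise ValueError(f"unrecognized gene ID: {gene_id!r}")
    return int(match.group(1))


def tally_by_chromosome(gene_ids):
    """Count genes per chromosome."""
    return Counter(chromosome_of(g) for g in gene_ids)


# A few IDs quoted in this article (illustrative subset only).
example_ids = [
    "Glyma.20G202300", "Glyma.20G212200", "Glyma.16G204400",
    "Glyma.09G260600", "Glyma.04G091600", "Glyma.06G093400",
    "Glyma.18G036200", "Glyma.02G285500", "Glyma.14G181200",
    "Glyma.14G066800",
]

counts = tally_by_chromosome(example_ids)
# Chromosomes 1..20 that carry none of the listed genes.
without_genes = [c for c in range(1, 21) if counts[c] == 0]
print(dict(sorted(counts.items())))
print(without_genes)
```

Applied to the full gene list, the same tally would give the number of chromosomes carrying at least one AHL gene as len(counts), which can then be compared against chromosome lengths.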
We then used GO enrichment analysis to predict the potential biological functions of AHLs. As shown in Fig. 5b and Table 4, AHLs are involved in functions in the biological process (BP), molecular function (MF) and cellular component (CC) categories. Among the enriched BP terms, we detected an association with flower development, indicating that the AHL gene family is involved in the growth and development of floral organs in soybean, which is consistent with the data published in Arabidopsis [17]. The CC category was the most abundant, and most of the cellular components were located in the nucleus. In the MF category, we identified DNA binding (GO:0003677), sequence-specific DNA binding transcription factor activity (GO:0003700) and protein binding (GO:0005515).
Gene duplication is a common process in plant evolution that leads to the expansion of gene families, and tandem and segmental gene duplication events are the most common in angiosperms [30][31][32][33]. To further examine the evolution of AHLs in soybean, we analyzed gene duplication events in the AT-hook motif gene family (Fig. 5c and Table 5). The analysis showed that 84% of AHL genes result from segmental duplication events, 13% from tandem duplication events, and the remaining 3% from proximal duplication. These results suggest that segmental duplication events may be the main driver of AHL gene family evolution.
The collinearity between soybean AHLs and those of two dicotyledonous plants (poplar and Medicago) and two monocotyledonous plants (rice and maize) was investigated to explore potential evolutionary relationships (Fig. 6). The results revealed higher homology between soybean and Medicago and Populus than between soybean and rice and maize. Compared with monocots, more AHL homologous genes are found in dicots. Some soybean AHL genes are collinear with AHL genes in other plants, particularly in Populus and Medicago, which suggests that these genes may play important roles in plant evolution. These results can be useful for subsequent comparative studies of AHL genes with known functions.

Promoter sequence analysis of the AT-hook motif gene family in soybean
The promoter region is located upstream of a gene, and the sequences within it that bind transcription factors are called cis-regulatory elements, which play an important role in regulating gene expression under stress [34]. We identified cis-regulatory elements for light responsiveness, anaerobic induction, MYB binding and gibberellin responsiveness in the 2100 bp promoter region upstream of the AHL genes (Fig. 7). Approximately 43.5% of the selected genes contained a MYB binding site, and previous studies have shown that the MYB gene family can regulate anther development and function formation [35,36]. In addition, more than 198 and 183 MYB members directly or indirectly involved in responses to drought stress were described in Arabidopsis and rice, respectively [37], and an AHL gene in rice is also involved [22]. However, there are few studies on the effects of the AHL gene family on plant stress and hormone responses. It is therefore possible that the AHL gene family can also mediate responses to drought stress in soybean. All selected AHL promoters contain the light-responsiveness element, suggesting that the AHL genes participate in photomorphogenesis in soybean. Approximately 91.3% of the selected AHLs had the anaerobic-induction element. Under anaerobic conditions, plant disease resistance is reduced, root morphological formation is imperfect, and root tip epidermal cells are damaged or die, leading to pathogen invasion [38]. Hemoglobin is an intracellular signal of hypoxia in plants, and the amount of symbiotic hemoglobin in legumes is relatively high [39]. Higher plants perceive O2 molecules through hemoglobin under anaerobic conditions, and the changes in hemoglobin concentration are regulated by the partial pressure of O2 [39]. Our results predict that AHLs play significant roles in soybean anaerobic induction.
Gibberellin plays an important role in the growth cycle of plants, promoting cell division and elongation [40], controlling seed germination and enabling root formation [41,42]. The gibberellin-responsiveness element was present in 17.4% of the selected AHLs, suggesting that AHLs may participate in the regulation of growth and development in soybean and supporting the variety of functions played by AHLs in soybean growth. Similarly, a study of grape AHL genes found that all grape AHL genes contain cis-elements related to light response, stress response and hormone response, indicating that AHL genes may affect plant growth and development not only in soybean but also in other species [43].

Co-expression network analysis of the AT-hook motif gene family in soybean
A co-expression network was used to represent the upstream and downstream genes that interact with AHLs of the three different Types (Fig. 8). We picked out representative genes from the co-expression network, and the annotated gene functions are available in Table 6. Our analysis suggests that some AHLs are associated with genes related to energy binding, such as Glyma.11G179200 and Glyma.09G196600, that might be involved in soybean energy transduction. The co-expression network indicates that, in addition to interacting with other genes, AT-hook motif genes also interact to some extent with each other. For example, the Type-II gene Glyma.20G212200 interacted with four AT-hook motif genes to jointly regulate the expression of other genes. We also found that AT-hook motif genes are involved in histone binding and ATP binding in soybean, and AHL genes are likewise involved in histone modification in Arabidopsis thaliana [17]. We speculate that some AHL genes, mainly of Type-II, are related to nuclear entry signals and may thereby regulate the nuclear entry of other proteins in soybean. The reported DELLA (LeGAI) gene is expressed in both vegetative and reproductive tissues in tomato, and this gene family is also involved in GA signal transduction [44]. In our analysis, the AHL gene Glyma.20G212200 was co-expressed with two DELLA genes, Glyma.05G140400 and Glyma.08G095800. Similarly, Glyma.16G204400 interacted with the DELLA genes Glyma.08G095800 and Glyma.05G140400 to regulate the gibberellin transduction pathway in soybean. Therefore, we consider that the AT-hook motif gene family is involved in the gibberellin signal transduction pathway in soybean.
Together, our results show that the AHL gene family is involved in regulating biological processes such as energy transduction, the gibberellin pathway and the nuclear entry signal pathway in soybean.

Expression profiles of the AT-hook motif gene family in soybean
To address the expression patterns of the AT-hook motif gene family, we selected two representative soybean cultivars, Jack and Williams82 (W82), and examined different tissues at the VC stage. The transcription data are available from NCBI (accession number: SRP285849) [45]. W82 and Jack were used to investigate whether there were differences in the expression profiles of the AT-hook motif gene family between different soybean varieties (Fig. 9a and b). The expression results showed that AHLs were mostly expressed in roots and meristems, and that these patterns were similar in W82 and Jack. There are 35 and 31 genes with high expression levels in Jack and W82 roots, respectively. Of the 35 highly expressed genes in Jack roots, 22 were expressed in the same way in W82. Of the remaining 13 genes with inconsistent expression, 9 had higher expression in Jack. In the meristem, 26 and 24 genes are highly expressed in Jack and 21 in W82, respectively. The results show that the expression of the same gene differs between varieties. For example, the expression level of Glyma.09G260600 is higher in Jack and lower in W82. The expression levels in the leaves of both Jack and W82 are very low, with the exception of 5 genes in Jack and 4 genes in W82. This corroborates previous results in maize [19].
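The comparison between cultivars amounts to a set overlap: genes highly expressed in both cultivars, and genes highly expressed in only one. The following is a minimal sketch of that bookkeeping, assuming a simple expression cutoff. The text does not state how "high expression" was defined, so the threshold, the function names and the toy values (with placeholder gene names) are all hypothetical.

```python
def highly_expressed(expression, threshold):
    """Return the set of genes whose expression value meets the threshold."""
    return {gene for gene, value in expression.items() if value >= threshold}


def compare_cultivars(expr_a, expr_b, threshold):
    """Split highly expressed genes into shared and cultivar-specific sets."""
    high_a = highly_expressed(expr_a, threshold)
    high_b = highly_expressed(expr_b, threshold)
    return {
        "shared": high_a & high_b,
        "only_a": high_a - high_b,
        "only_b": high_b - high_a,
    }


# Toy root expression values (hypothetical units and gene names).
jack_roots = {"geneA": 12.0, "geneB": 9.5, "geneC": 0.4, "geneD": 7.1}
w82_roots = {"geneA": 10.2, "geneB": 1.3, "geneC": 8.8, "geneD": 6.0}

result = compare_cultivars(jack_roots, w82_roots, threshold=5.0)
print(sorted(result["shared"]))  # high in both cultivars
print(sorted(result["only_a"]))  # high only in the first cultivar
print(sorted(result["only_b"]))  # high only in the second cultivar
```

By construction, the shared and cultivar-specific sets for one cultivar add up to its total number of highly expressed genes, which is the same relationship as the 22 shared and 13 remaining genes among the 35 highly expressed in Jack roots.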
In the Jack epicotyl, we found 5 highly expressed genes, similar to W82. In the hypocotyl, Glyma.04G091600 and Glyma.06G093400 are both highly expressed, and their expression is consistent between the two varieties. However, the expression level of Glyma.18G036200 in the hypocotyl is higher in W82 than in Jack. Interestingly, the genes showing high levels of expression in meristematic tissues are mainly of Type-II, while those highly expressed in the roots mainly belong to Type-I. These results indicate that although the AHL genes in Jack and W82 had similar expression patterns in different tissues, individual genes were expressed differently between the two varieties. Hence, different AHL genes may have different functions in the two varieties, and may play important roles in plant development. To verify the RNA-seq data, RT-qPCR was performed to evaluate the expression pattern of three genes in the roots, leaves, meristem, epicotyl and hypocotyl of W82 (Fig. 9c). The results are consistent with the transcriptome data.

The expression of the AT-hook motif gene family under drought and submergence
Both drought and submergence have adverse effects on plant growth, and a previous study has shown that AHLs mediate plant responses to drought stress [22]. In a study of grape AHLs, the AHL genes responded to PEG treatment to different degrees [43]. We therefore hypothesized that AHLs may also affect drought stress responses in soybean. Hence, we tested the expression of genes in the leaves and roots of W82 under submergence and drought conditions (PRJNA574626) at the V1 stage (Fig. 10a and b). The RNA transcription data are from NCBI. In both the control and the treatments, a higher number of AHLs were expressed in roots than in leaves, which is consistent with the results in Fig. 9a and b. After 5-6 days of drought treatment, the expression of highly expressed genes, such as Glyma.02G285500, was considerably reduced. However, the expression of Glyma.14G181200 increased, especially after 6 days of drought treatment in leaves. In the roots, drought treatment led to a significant reduction in gene expression compared with the control group. Similar patterns were observed under submergence treatment, where some genes, such as Glyma.14G066800, showed significantly higher expression in leaves than in controls. Overall, the expression levels of most genes decreased in roots after submergence. We used roots and leaves of W82 at the V1 stage to verify the expression of AHL genes under drought and submergence stresses (Fig. 10d). After 1 day of submergence stress, the expression level of AHL genes in leaves increased significantly, and it decreased significantly after 3 days of submergence. After 1 day of recovery, the expression level of AHL genes was the same as that of the control. The expression level in roots decreased after submergence stress. In the leaves, the expression of AHL genes increased significantly after 1 day of drought stress and decreased after 6 days of drought. A similar pattern has been reported in rice: after mannitol stress treatment, the expression of OsAHL1 increased at first and then decreased over time [22]. In the roots, the expression level decreased compared with the control as the duration of drought stress increased. At the same time, we recorded the phenotype of soybean under submergence and drought stress (Fig. 10c). As the stress time increased, the stressed soybean plants were shorter and more wilted than the control, but the phenotypic difference was not particularly obvious.

Table 5 Types of gene replication
Gene Name | Gene Name | Duplication Type
GmAHL63 | GmAHL62 | segmental

These results suggest that under stress conditions, gene expression overall increases in the leaves and decreases in the roots. Furthermore, we also found that after 1 day of recovery, the levels of gene expression were restored, and were sometimes even higher than

Identification of the AT-hook motif gene family in soybean
Soybean is well documented as a staple crop worldwide and provides a major source of protein for human populations. Previous studies in Arabidopsis thaliana, maize and cotton have provided comprehensive information and the basis for our research on soybean, revealing the multiple functions of AHLs, particularly in regulating plant growth and stress responses [19,20,25]. We decided to further study the AHL gene family in soybean as this may provide the molecular basis for high stress tolerance in plants and shed light on the improvement of environmental adaptation.
We identified soybean AHL genes from the JGI Phytozome website [46]. These genes were predicted based on the presence of a PPC domain and the AT-hook motif, as included in the Pfam website [47]. In this study, 63 AT-hook motif genes were identified in soybean, and a phylogenetic tree was generated using the MEGA7 software [48]. According to the phylogenetic tree, the AT-hook motif gene family is divided into two clades, Clade-A and Clade-B, on the basis of the PPC domain. Clade-B is further classified into two types, Type-II and Type-III, on the basis of the AT-hook motif. Clade-A is also referred to as Type-I. The PPC domain of Clade-A shows more variation, which is consistent with the results in maize [19]. Our results indicate that greater variation in the PPC domain may contribute to adaptation in plants. The flanking sequences of the AT-hook motif in soybean are similar to those in other land plants [1], and most AHL genes belonged to Clade-A, suggesting that this clade contains richer and more conserved functions that are essential for plant survival. In our analysis, the AHL gene family was distributed on 18 chromosomes, independently of chromosome size and location. We also found that segmental duplication events are the main form of duplication in the AHL gene family in soybean, in contrast to observations in maize, where dispersed duplication is more common [19]. This illustrates that the AHL gene family expanded in different ways in different species.

Conservation of the AT-hook motif gene family in soybean
The AHL gene family is conserved across land plants, and all AHL genes share a PPC/DUF domain. In Clade-A, this PPC/DUF domain contains the conserved L-R-S-H motif, while Clade-B displays F-T-P-H. We also observed that the diversity of the AHL gene family in soybean extends beyond the amino acid sequences of the PPC/DUF domain and is also present in the AT-hook motif sequences, which have an R-G-R core. While the sequence of this core in Clade-A is R-G-R-P, in Clade-B it is R-G-R-P-R-K-Y. It has been previously suggested that Clade-B evolved from Clade-A [1]. The gene structures of the AT-hook motif gene family include UTR-less genes and genes with multiple CDS. Twelve genes in Clade-A lack UTRs, and in Type-II and Type-III the number of introns is increased. So we speculate that

Expression patterns in soybean
The expression patterns based on cis-elements found in the promoter regions show that AHL genes may participate in plant light morphology, growth and development, and also stress response. Co-expression analysis indicates that AHL proteins may be involved in the gibberellin pathway, which is involved in plant responses to drought and excess water. Previous study has shown that gibberellin can be involved in plant drought and water flooding stress [49]. Overexpression of CBF/DREB2 in Arabidopsis thaliana can reduce the content of active GAs and improve drought tolerance [50], and the CYP96B4/ SD37 in the amycin synthesis pathway is related to the drought tolerance in rice [49]. The drought tolerance of the dss1 mutant is significantly higher than that of the wild type, which is due to the decrease of GA 1 [51].
The stress caused by long-term flooding in rice reduces the levels of ethylene and the amount of active GAs, and thus inhibits the elongation of the internodes [52,53]. Because AHL genes may be involved in the gibberellin pathway, the AHL gene family may also regulate gene expression in response to drought and flood stress in soybean. Therefore, we analyzed AHL expression in W82 under drought and flood conditions. Our results indicated that, under these stress conditions, the expression of AHL genes decreased in the roots. At the same time, the expression of AHLs in different tissues from distinct soybean varieties indicated that the expression of AHLs was higher in the roots. We also used W82 leaves and roots at the V1 stage for verification. Interestingly, gene expression levels in the leaves increased significantly on the first day of stress treatment and then decreased. The mechanism underlying this phenomenon needs further study. To further explore the AHL gene family, we performed a correlation analysis between the number of introns and the gene expression level in W82 (Table 7). The analysis showed that, in all tissues except the roots, the number of introns was positively correlated with expression (p < 0.05). Similarly, under stress conditions, the correlation was positive in leaves (p < 0.05), whereas no correlation was found in roots. The specific mechanism has not yet been resolved and will be addressed in future research, but these results indicate that the number of introns affects the expression of AHL genes in soybean to a certain extent. Accordingly, the AHL gene family plays an important role in soybean resilience, providing a theoretical basis for future breeding of this important crop.

Conclusion
We characterized 63 AHL genes in soybean and analyzed their motif composition. The phylogenetic tree divided these genes into two clades based on the PPC domain. We also investigated the cis-acting elements in the promoter regions of AHL genes and their co-expression network, and systematically studied AHL expression profiles in different tissues and varieties, as well as their response to stress conditions. The systematic exploration of AHL genes in soybean lays the foundation for future work in soybean breeding.

Identification of the AT-hook motif gene family
The AT-hook motif gene family of Arabidopsis thaliana was obtained from the TAIR database (https://www.arabidopsis.org/) [54]. The amino acid sequences of the AT-hook motif genes of soybean and sorghum were obtained from the JGI Phytozome website (https://phytozome.jgi.doe.gov/pz/portal.html) and Ensembl Plants (https://plants.ensembl.org/index.html) [46,55]. We used Pfam (https://pfam.xfam.org) to predict the genes containing the PPC domain, and then retained the genes containing both the PPC domain and the AT-hook motif [47]. A homology comparison of the amino acid sequences of Arabidopsis thaliana, soybean and sorghum was performed. We used the online ExPASy program (http://www.expasy.org/tools/) to determine the biochemical properties of each AHL protein, including the number of amino acids, the molecular weight (MW) and the predicted isoelectric point (pI) [56].
Fig. 10 Expression patterns of the AHL genes under (a) drought and (b) submergence conditions in Williams82. DRO and SUB represent drought and submergence, respectively. D represents day. CT represents control treatment. L and R are the leaves and roots, respectively. DRO_REC_L/R means 1 day of recovery following 6 days of drought in leaves/roots. SUB_REC_L/R means 1 day of recovery following 3 days of submergence in leaves/roots. (c) The growth of soybeans under submergence and drought stresses; the left is the treatment group, the right is the control group. (d) Expression of Glyma.18G231300, Glyma.07G072300, Glyma.20G087200, Glyma.05G111500 and Glyma.17G155400 in leaves and roots at the V1 stage
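The selection of genes carrying both a PPC domain and an AT-hook motif can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the set of PPC-domain hits is assumed to come from a separate Pfam search, the AT-hook motif is reduced to the R-G-R-P core as a plain regular expression, and the function name, protein IDs and sequences are hypothetical placeholders.

```python
import re

# Simplified AT-hook core (R-G-R-P). A real Pfam/HMM search is more
# permissive; this regular expression is an assumption for illustration.
AT_HOOK_CORE = re.compile(r"RGRP")


def select_ahl_candidates(proteins, ppc_hits):
    """Return sorted IDs of proteins with a PPC-domain hit AND an AT-hook core.

    proteins : dict mapping protein ID -> amino acid sequence
    ppc_hits : set of protein IDs reported to contain a PPC domain
               (assumed to be pre-computed by a separate Pfam search)
    """
    selected = []
    for protein_id, sequence in proteins.items():
        has_ppc = protein_id in ppc_hits
        has_at_hook = AT_HOOK_CORE.search(sequence.upper()) is not None
        if has_ppc and has_at_hook:
            selected.append(protein_id)
    return sorted(selected)


# Toy input: invented placeholders, not soybean data.
toy_proteins = {
    "protA": "MSSGRGRPRKYAAA",  # AT-hook core present
    "protB": "MSSAAAAAAAKLLL",  # no AT-hook core
    "protC": "MKKRGRPPGSAAAA",  # AT-hook core present, but no PPC hit below
}
toy_ppc_hits = {"protA", "protB"}

print(select_ahl_candidates(toy_proteins, toy_ppc_hits))
```

Only the protein that satisfies both criteria is kept, which mirrors the two-step filter described above.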

Phylogenetic analysis
We used a Neighbor-Joining tree to represent the phylogenetic relationships among the AHL genes [57]. The amino acid sequences of Arabidopsis thaliana, Glycine max and sorghum were used to construct the phylogenetic tree in the MEGA7 software [48]. We used 1000 bootstrap replicates to assess support for the inferred evolutionary history [58].
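The idea behind the bootstrap can be sketched in Python as follows. This is a hedged illustration, not a reproduction of MEGA7: it assumes a simple p-distance as the distance measure, omits the Neighbor-Joining step itself, and uses an invented three-sequence alignment. Each replicate resamples alignment columns with replacement and recomputes the pairwise distance matrix that a Neighbor-Joining algorithm would take as input.

```python
import random


def p_distance(seq_a, seq_b):
    """Proportion of differing sites, skipping columns where either has a gap."""
    compared = differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differing += 1
    return differing / compared if compared else 0.0


def bootstrap_alignment(alignment, rng):
    """Build one bootstrap replicate by sampling columns with replacement."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}


def distance_matrix(alignment):
    """Pairwise p-distances keyed by (name_a, name_b) with name_a < name_b."""
    names = sorted(alignment)
    return {(a, b): p_distance(alignment[a], alignment[b])
            for a in names for b in names if a < b}


# Toy aligned sequences (hypothetical, equal length, '-' marks a gap).
toy_alignment = {"s1": "MKRGRP-A", "s2": "MKRGRPLA", "s3": "MRRGKPLA"}

rng = random.Random(0)  # fixed seed so the sketch is repeatable
replicates = [distance_matrix(bootstrap_alignment(toy_alignment, rng))
              for _ in range(1000)]

# Fraction of replicates in which s1 is closer to s2 than to s3.
support = sum(d[("s1", "s2")] < d[("s1", "s3")] for d in replicates) / 1000
print(support)
```

In a full analysis, a tree would be built from every replicate matrix and the support of a branch would be the fraction of replicate trees that contain it.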

Gene structure analysis
We used MEME (http://meme-suite.org/) to predict conserved motifs in the soybean AHL gene family with an e-value threshold of 10^-5 [59], and obtained a total of 10 conserved motifs. The final file was generated with TBtools [60]. The gene structure of the AT-hook motif genes was also analyzed using the TBtools software [60]. Gene structures were mapped using the CDS and genomic sequences. We used the SMART website (http://smart.embl-heidelberg.de/) to evaluate the accuracy of the selected proteins [61].

Chromosome location analysis, collinearity analysis and GO annotation analysis
Chromosome mapping information for the AT-hook motif genes was obtained from JGI Phytozome and Ensembl Plants. The map of chromosome locations was drawn using the TBtools software [60]. We selected full-length amino acid sequences from four species to perform collinearity analysis with soybean. The collinear relationships were estimated using the MCScanX and TBtools software [60,62]. We used the SoyBase website (https://www.soybase.org) to conduct GO analysis on the 63 AT-hook motif genes.

Cis-acting elements analysis and co-expression network
We obtained 2100 bp genomic sequences spanning the promoter regions of the AT-hook motif gene family of Glycine max from NCBI. The cis-acting elements were analyzed using TBtools [60]. Co-expression analysis of the AT-hook motif gene family was performed with the "Find new members of a pathway" function in SoyNet (www.inetbio.org/soynet) [63]. The resulting sif files were downloaded and visualized with Cytoscape to construct the co-expression network [64].
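How the 2100 bp window was delimited is not described here, so the following Python sketch shows only one common way to extract an upstream region from a chromosome sequence. The function names, the 1-based inclusive coordinate convention and the toy chromosome are assumptions made for illustration. For genes on the minus strand, the upstream region lies after the gene end on the plus strand and is returned as its reverse complement.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def upstream_region(chrom_seq, gene_start, gene_end, strand, length=2100):
    """Return up to `length` bp upstream of a gene, in the gene's orientation.

    gene_start and gene_end are 1-based inclusive plus-strand coordinates
    (an assumed convention). The window is truncated at chromosome ends.
    """
    if strand == "+":
        begin = max(0, gene_start - 1 - length)
        return chrom_seq[begin:gene_start - 1]
    if strand == "-":
        end = min(len(chrom_seq), gene_end + length)
        return reverse_complement(chrom_seq[gene_end:end])
    raise ValueError("strand must be '+' or '-'")


# Toy chromosome (hypothetical 16 bp sequence) with a 4 bp window.
toy_chromosome = "ACGTTTGACCATGGCA"
print(upstream_region(toy_chromosome, 9, 12, "+", length=4))
print(upstream_region(toy_chromosome, 5, 8, "-", length=4))
```

The extracted sequences could then be passed to a cis-element prediction tool. Strand handling is the step most easily overlooked when promoter windows are cut manually.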

Expression pattern analysis
The transcriptome data were obtained from the NCBI database (https://www.ncbi.nlm.nih.gov). We processed the transcriptome data and constructed the heat map in R. The fragments per kilobase of transcript per million mapped fragments (FPKM) value was used to quantify gene expression. The heat map was built according to the observed expression levels.
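For reference, the FPKM normalization can be written as a short Python function. This is a minimal sketch of the standard definition, not the processing script used in this study, and the example counts are invented.

```python
def fpkm(fragment_count, gene_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments.

    FPKM = fragments * 1e9 / (gene length in bp * total mapped fragments)
    """
    if gene_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    return fragment_count * 1e9 / (gene_length_bp * total_mapped_fragments)


# Hypothetical example: 500 fragments on a 2000 bp transcript in a
# library of 20 million mapped fragments.
example_value = fpkm(500, 2000, 20_000_000)
print(example_value)  # 12.5
```

Dividing by both transcript length and library size makes values comparable across genes of different lengths and across samples sequenced to different depths.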

Quantitative RT-PCR (qRT-PCR) for AHL genes
Williams82 was used as the plant material and grown in a greenhouse at 26°C under a 14 h/10 h light/dark cycle. The meristem, leaves, epicotyl, hypocotyl and roots were collected separately at the VC stage, with three independent replicates per sample. Three treatments were applied at the V1 stage: control, submergence and drought. For the drought treatment, plants were stressed for 6 days and rehydrated for 1 day, and the leaves and roots were sampled for RNA extraction on the first day, the sixth day and 1 day after rehydration. For the submergence treatment, plants were submerged for 3 days and allowed to recover for 1 day, and the leaves and roots were sampled for RNA extraction on the first day, the third day and the recovery day. Fresh plant materials were immediately frozen in liquid nitrogen for RNA extraction. We used the SYBR Green I Master mixture (Roche, Basel, Switzerland) as the qRT-PCR reagent. The qRT-PCR primers are shown in Table 8. The 2^-ΔΔCT method was used to calculate the relative gene expression levels [65].
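The 2^-ΔΔCT calculation can be sketched in Python as follows. The reference gene is not named in this section, so the Ct values and argument names below are hypothetical. The sketch also assumes that the target and reference amplify with about 100% efficiency, as the method requires.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Relative expression by the 2^-ddCt method.

    dCt  = Ct(target) - Ct(reference), computed per condition
    ddCt = dCt(treated) - dCt(control)
    """
    delta_ct_treated = ct_target_treated - ct_ref_treated
    delta_ct_control = ct_target_control - ct_ref_control
    delta_delta_ct = delta_ct_treated - delta_ct_control
    return 2 ** (-delta_delta_ct)


# Hypothetical Ct values: the target reaches threshold two cycles earlier
# under treatment while the reference gene is unchanged.
fold_change = relative_expression(24.0, 18.0, 26.0, 18.0)
print(fold_change)  # 4.0
```

A value above 1 indicates higher expression in the treated sample than in the control, and a value below 1 indicates lower expression.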