Genome-wide identification and expression analysis of the AT-hook Motif Nuclear Localized gene family in soybean

Background: Soybean is an important legume crop with significant agricultural and economic value. Previous research has shown that the AT-Hook Motif Nuclear Localized (AHL) gene family is highly conserved in land plants and plays a crucial role in plant growth and development. To date, however, the AHL gene family has not been studied in soybean. Results: To investigate the roles played by the AHL gene family in soybean, we performed genome-wide identification and analyzed expression patterns and gene structures. We identified a total of 63 AT-hook motif genes in soybean, characterized by the presence of the AT-hook motif and the PPC domain. The AT-hook motif genes were distributed on 18 chromosomes and formed two distinct clades (A and B), as shown by phylogenetic analysis. All the AHL proteins were further classified into three types (I, II and III) based on the AT-hook motif. Type-I belonged to Clade-A, while Type-II and Type-III belonged to Clade-B. Our results also showed that segmental duplication was the main type of duplication in the soybean AHL gene family. To

The AT-hook motif gene family is involved in important biological processes in plants. For example, AHL genes are associated with the regulation of reproductive development and ear formation in maize [8]. In rice, the DP1 gene, which encodes an AT-hook DNA-binding protein, plays an important role in flower development [9]. Moreover, the AT-hook motif gene family can regulate the expression of cell-specific genes. Overexpression of the GIANT KILLER (GIK) gene, which encodes an AHL protein, leads to serious defects in the reproductive organs and reduced expression of associated genes [10]. In Arabidopsis, the AHL gene BoMF2 is preferentially expressed in the stamens, and its overexpression results in significantly shorter siliques and decreased pollen vigor relative to the wild type [11]. The AHL gene family has also been shown to regulate hormone balance in plants, especially gibberellin [12], jasmonic acid and auxin-related genes [13][14][15]. This is also illustrated by a previous transcriptomic analysis showing that AtAHL13 is a key factor regulating jasmonic acid biosynthesis, signal transduction and pathogen immunity [16]. AHL proteins can also regulate the chromatin state. The AT-hook motif protein AHL22 regulates flowering time by interacting with the deacetylase at the FLOWERING LOCUS site. Overexpression of AHL22 in Arabidopsis leads to delayed flowering, significantly decreased transcription activity and acetylation of histone H3 at the FLOWERING LOCUS, and an increased demethylation rate of H3 lysine 9 [17]. It has also been reported that TEK (TRANSPOSABLE ELEMENT SILENCING VIA AT-HOOK), a protein encoded by an AHL gene, is involved in the regulation of silent transposable elements (TEs). Specifically, knocking down TEK leads to increased histone acetylation and decreased H3K9me2 and DNA methylation levels at the target loci [18].
Recently, a total of 37 AHL genes were identified in maize. Their transcription levels in different tissues suggest that AHL proteins are involved in maize pollen development, drought response and senescence [19]. Similarly, 48, 51 and 99 AHL genes were found in three different cotton genomes, and gene expression analysis indicated that the majority of AHL genes in Clade-B were expressed in the stem, whereas the Clade-A genes were expressed in the ovules [20]. Furthermore, the 20 AHL genes uncovered in rice exhibited three expression patterns, and all OsAHL genes may be functional [21]. Overexpression of OsAHL1 improved rice tolerance to multiple stresses, especially drought [22].
These studies suggest that the AT-hook motif gene family not only plays important roles in plant growth and development, but also affects plant responses to stress and hormonal stimuli. However, they lack a systematic investigation of how the AT-hook motif gene family regulates plant stress responses. Hence, this study evaluated the AHL-mediated plant response to drought and submergence stress.
AHL proteins contain two conserved domains, the AT-hook motif and the Plant and Prokaryote Conserved (PPC) domain, also known as Domain of Unknown Function 296 (DUF296) [23]. The PPC domain contains 120 amino acids and has the same secondary or tertiary structure from prokaryotes to higher plants [23]. The hydrophobic region at the C-terminus of the PPC domain plays an important role in nuclear localization and protein interaction [1,24], indicating that AHLs may have a role in regulating plant transcriptional activity [25]. The AT-hook motif contains one or two conserved Arg-Gly-Arg motifs that bind AT-rich DNA regions. This binding has been confirmed in both prokaryotic and eukaryotic organisms, including the High Mobility Group A (HMGA) proteins in mammals [24]. Binding of the AT-hook motif to AT-rich DNA forms a concave structure and results in the insertion of two arginines [26]. The AT-hook motif gene family thus regulates plant growth and development through DNA-protein interactions and the formation of homo- or hetero-trimeric protein complexes [25,26].
Phylogenetic analysis of land plants showed that AHL proteins can be divided into two clades, Clade-A and Clade-B, based on differences in the PPC domain [1]. The conserved amino acid sequence of Clade-A is Leu-Arg-Ser-His, whereas the equivalent in Clade-B is Phe-Thr-Pro-His [1]. In addition, the amino acid sequence Gly-Arg-Phe-Glu-Ile-Leu is sometimes part of the PPC domain and is essential for the function of some AHL proteins [25]. Differences in the AT-hook motif make it possible to classify AHL proteins into three types (I, II and III). Type-I belongs to Clade-A, whereas Type-II and Type-III belong to Clade-B. The AT-hook motif of Type-I has a conserved Gly-Ser-Lys-Asn-Lys sequence C-terminal to the Arg-Gly-Arg core, while Types II and III instead contain Arg-Lys-Tyr. In angiosperms, phylogenetic analysis divided Clades A and B into five and four subfamilies, respectively [1]. The similar expression patterns observed within each clade suggest that AHLs retained their biological functions in the course of evolution [1].
Soybean (Glycine max (L.) Merr.) is a major leguminous species and an important source of protein worldwide, playing a vital role in human survival and development [27]. The known functions of AT-hook motif genes provide the basis for our research, but a detailed genome-wide analysis of the AT-hook motif gene family has not been performed in soybean. In this study, building on the findings for the AT-hook motif gene family in maize and cotton, we annotated the AT-hook motif gene family in the soybean genome and identified 63 AHL genes. We then analyzed the function of these genes and the structural features of the respective proteins, as well as their chromosome locations, gene duplication events, Gene Ontology annotations, phylogenetic relationships, collinearity, co-expression network and expression patterns. Our results will foster understanding of the biological functions of the AHL family in soybean.

Results
Phylogenetic analysis of the AT-hook motif gene family in soybean
We predicted a total of 63 AHL proteins containing the AT-hook motif and PPC domain in soybean (Fig. 1, Table 1). To infer the evolutionary relationships among the AHL proteins in soybean, phylogenetic analysis was performed on the full-length AHL protein sequences. Our results show that the AHL proteins in soybean can be divided into two clades, Clade-A (with 34 proteins) and Clade-B (with 29 proteins), as previously described in other land plants [1]. Multiple sequence alignments allowed us to further divide Clade-A and Clade-B into Type-I (54%), Type-II (27%) and Type-III (19%). The higher abundance of Type-I in soybean is also consistent with observations in other land plants [1], and suggests that AHL proteins were conserved in the course of evolution.
We found that Clade-A, which contained the conserved PPC domain sequences Leu-Arg-Ser-His and Leu-Arg-Ala-His, was more variable than Clade-B, whose PPC domain contained Phe-Thr-Pro-His. We also observed that the variability of the PPC domain in soybean AHL proteins is higher than that in maize [19]. It is possible that the increased PPC domain variability extends the range of biological functions of AHL proteins.
The Type-I AT-hook motif contains four conserved amino acid residues at the N-terminus (Arg-Gly-Arg-Pro) and eight conserved amino acid residues at the C-terminus (Gly-Ser-Lys-Asn-Lys-Pro-Lys-Pro). This contrasts with the seven and ten conserved amino acid residues observed at the N-terminus and C-terminus of Type-II, respectively. Type-III and Type-II share the same PPC domain and the same conserved N-terminal structure of the AT-hook motif, but Type-III lacks the conserved amino acid residues at the C-terminus of the AT-hook motif. The observed diversity in the AT-hook motif and PPC domains across soybean AHL proteins is likely to result in diverse biological functions.
Gene structure and motif prediction analysis of the AT-hook motif gene family in soybean
We performed a gene structure analysis and estimated the length of the AHL genes and the variability in the number of CDS and UTRs (Fig. 2, Table 1). The length of the AHL genes ranges from 585 bp to 7968 bp. A total of 12 genes (mostly from Clade-A) lack UTRs, and the number of introns and exons varies among genes, with Types II and III usually showing a higher number of introns. Type-I genes were the shortest and contained the lowest number of CDS, which began to increase from Glyma.20G202300. Type-II and Type-III genes have two or more introns, more than Type-I genes. We therefore propose that Type-II and Type-III evolved from Type-I, which is consistent with the report on the maize AHL gene family [19]. In eukaryotes, genes consist of alternating introns and exons. In plants, up to 60% of the genes undergo splicing, most of which occurs in introns [28]. When intron-mediated enhancement (IME) was introduced into Arabidopsis, mRNA accumulation increased 24-fold and the activity of the reporter enzyme increased 40-fold, indicating that introns have an important influence on the regulation of gene expression in plants [29]. This was also observed in maize, where introns increased the expression level of the genes Zm00001d018515 and Zm00001d051861 [19]. Alternative splicing of introns produces a diverse range of encoded proteins and thus abundant biological functions. It is therefore possible that the increased number of introns in soybean AHLs expands the diversity of AHL proteins. In maize, only one Type-I gene has a UTR, whereas most soybean genes have UTRs [19], indicating that AHL gene structure differs among species.
In summary, we suspect that Type-II and Type-III introns enable plants to acquire more complex and diverse biological functions, and at the same time lay the foundation for the further expansion of intron-carrying AHLs.
Next, the MEME website was used to predict the protein motifs (Fig. 3). A total of ten conserved motifs were identified in the AHL proteins (Table 3); their lengths range from 8 to 32 amino acids, and their numbers of sites range from 8 to 62.
Motifs 3 and 6 share a conserved Arg-Gly-Arg core and therefore likely belong to the AT-hook motif family. Motif 3 is defined as the type I AT-hook motif, and motif 6 as the type II AT-hook motif. Type-I AHL proteins contain a type I AT-hook motif, Type-II proteins contain both type I and type II AT-hook motifs, and Type-III proteins have only a type II AT-hook motif. The sequences downstream of the Arg-Gly-Arg core share conserved residues that play an important role in AHL proteins [1]. Interestingly, there is also a conserved Gly-Arg-Phe-Glu-Ile-Leu sequence (motif 2) in the PPC domain. This motif is found not only in soybean but also in other land plants, and a previous study has shown that it has an important influence on the PPC domain [1]. It is worth noting that all AHL proteins contain motif 1, motif 4 and motif 5, indicating the consistency of the AHL protein sequences.
In summary, the results of our gene structure and motif prediction analyses indicate that the AHL gene family shows both conservation and evolutionary diversity in soybean and other land plants [1], including maize [19] and cotton [20].
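The clade and type rules described in this section can be illustrated with a short Python sketch. It is not the procedure used in this study (which relied on MEME, Pfam and multiple sequence alignment); the regular expressions are assumptions written from the residues named in the text, and the function name `classify_ahl` and the allowed gap between Arg-Gly-Arg-Pro and Gly-Ser-Lys-Asn-Lys are hypothetical.

```python
import re

# Assumed patterns, written from the residues named in the text. The gap
# allowance in TYPE_I_HOOK is an illustrative choice, not taken from the paper.
TYPE_I_HOOK = re.compile(r"RGRP.{0,3}GSKNK")  # Arg-Gly-Arg-Pro ... Gly-Ser-Lys-Asn-Lys
TYPE_II_HOOK = re.compile(r"RGRPRKY")          # Arg-Gly-Arg-Pro-Arg-Lys-Tyr
CLADE_A_PPC = re.compile(r"LR[SA]H")           # Leu-Arg-Ser-His or Leu-Arg-Ala-His
CLADE_B_PPC = re.compile(r"FTPH")              # Phe-Thr-Pro-His


def classify_ahl(protein_seq):
    """Return (clade, ahl_type) for one protein sequence.

    Either element is None when no rule matches. Short patterns such as
    LRSH can occur by chance outside the PPC domain, so a real analysis
    should restrict the search to the annotated domain.
    """
    seq = protein_seq.upper()
    has_type_i_hook = bool(TYPE_I_HOOK.search(seq))
    has_type_ii_hook = bool(TYPE_II_HOOK.search(seq))

    if has_type_i_hook and has_type_ii_hook:
        ahl_type = "Type-II"    # both AT-hook motif types present
    elif has_type_i_hook:
        ahl_type = "Type-I"     # only the type I AT-hook motif
    elif has_type_ii_hook:
        ahl_type = "Type-III"   # only the type II AT-hook motif
    else:
        ahl_type = None

    # Simplification: Clade-A is tested first if both patterns occur.
    if CLADE_A_PPC.search(seq):
        clade = "Clade-A"
    elif CLADE_B_PPC.search(seq):
        clade = "Clade-B"
    else:
        clade = None
    return clade, ahl_type
```

For instance, a made-up toy sequence such as `"MRGRPRKYTTTFTPH"` (not a real soybean protein) carries only the type II hook and the Clade-B PPC pattern, so the sketch labels it Clade-B, Type-III.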
Evolutionary relationship of the AT-hook motif gene family in different species
To further explore the evolutionary relationship between AHLs in different species, we selected Arabidopsis thaliana, sorghum (Sorghum bicolor L.) and soybean and constructed a phylogenetic tree using MEGA7 (Fig. 4) [1]. Patterns of different colors are used to represent the different species. The phylogeny includes 29, 63 and 25 full-length AHL proteins from Arabidopsis, soybean and sorghum, respectively. Our analysis showed that the AHL genes of these species can be divided into two distinct clades, A and B. A total of 15 and 14 proteins belonged to Clade-A in Arabidopsis and sorghum, respectively, compared to 14 and 11 in Clade-B (Table 2). Type-I was the most conserved of all types, while the lack of a new subgroup between Types II and III in Clade-B indicates that these proteins diverged relatively late. To sum up, the phylogenetic tree highlights the consistency of AHL evolution among different species and, together with the homology relationships between species, provides insights for future analysis of the biological functions of these proteins.

Chromosome location, duplication, GO annotations and collinearity analysis of the AT-hook motif gene family in soybean
We studied the arrangement of the 63 AHL genes across the 20 chromosomes of the soybean genome (Fig. 5A). The gene location information is given in Table 1. The 63 AT-hook motif genes are distributed on 18 of the 20 soybean chromosomes: there are 9 AHLs on chromosome 20, 1 AHL on chromosome 19 and none on chromosomes 12 and 15. The distribution of these genes on chromosomes was independent of chromosome length.
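As a rough illustration of how such a per-chromosome tally can be obtained, the Python sketch below reads the chromosome number from gene identifiers of the form `Glyma.20G202300`, where the two digits after `Glyma.` denote the chromosome. The function names are hypothetical, and the example uses only a few AHL identifiers quoted in this paper rather than the full list in Table 1.

```python
import re
from collections import Counter

# Gene IDs such as "Glyma.20G202300": two digits give the chromosome.
# The "G" is matched case-insensitively because some IDs are written "08g...".
GENE_ID = re.compile(r"^Glyma\.(\d{2})[Gg]\d+$")


def chromosome_of(gene_id):
    """Return the chromosome number encoded in a Glyma gene identifier."""
    match = GENE_ID.match(gene_id)
    if match is None:
        raise ValueError(f"unexpected gene identifier: {gene_id!r}")
    return int(match.group(1))


def genes_per_chromosome(gene_ids, n_chromosomes=20):
    """Count genes on each chromosome, keeping chromosomes with zero genes."""
    counts = Counter(chromosome_of(gene_id) for gene_id in gene_ids)
    return {chrom: counts.get(chrom, 0) for chrom in range(1, n_chromosomes + 1)}


# A handful of AHL identifiers mentioned in the text (not the complete family).
example_ids = [
    "Glyma.20G202300", "Glyma.20G212200", "Glyma.16G204400",
    "Glyma.14G181200", "Glyma.14G066800", "Glyma.02G285500",
]
example_counts = genes_per_chromosome(example_ids)
```

Keeping the zero-count chromosomes in the result makes gaps such as chromosomes 12 and 15 visible directly in the tally.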
We then used GO enrichment analysis to predict the potential biological functions of AHLs. As shown in Fig. 5B and Table 4, AHLs are involved in functions across the biological process (BP), molecular function (MF) and cellular component (CC) categories. Among the enriched BP terms, we detected terms related to flowering development, indicating that the AHL gene family is involved in the growth and development of floral organs in soybean, which is consistent with the data published in Arabidopsis [17]. Cellular component was the most abundant category, and most of the annotated components are located in the nucleus. In the molecular function (MF) category, we identified DNA binding (GO:0003677), sequence-specific DNA binding transcription factor activity (GO:0003700) and protein binding (GO:0005515).
Most AHL proteins evolved to bind DNA and are able to specifically target DNA to perform different biological processes, suggesting that AHLs can regulate the expression of other genes.
Gene duplication is a common process in plant evolution that leads to the expansion of gene families; tandem and segmental duplication events are the most common in angiosperms [30][31][32][33]. To further examine the evolution of AHLs in soybean, we analyzed gene duplication events in the AT-hook motif gene family (Fig. 5C, Table 6). The analysis showed that 84% of AHL genes result from segmental duplication events, 13% from tandem duplication events, and the remaining 3% from proximal duplication. These results suggest that segmental duplication events may be the main driver of AHL gene family evolution.
The collinearity of soybean AHLs with those of two dicotyledonous plants (poplar and medicago) and two monocotyledonous plants (rice and maize) was investigated to explore potential evolutionary relationships (Fig. 6). The results revealed higher homology between soybean and medicago or poplar than between soybean and rice or maize. More AHL homologous genes were found in dicots than in monocots. Some soybean AHL genes are collinear with AHL genes in other plants, particularly in poplar and medicago, which suggests that these genes may play important roles in plant evolution. These results can be useful for subsequent comparative studies of AHL genes with known functions.
Promoter sequence analysis of the AT-hook motif gene family in soybean
The promoter region is located upstream of a gene; the sequences within it that bind transcription factors are called cis-regulatory elements, which play an important role in regulating gene expression under stress [34]. We identified cis-regulatory elements for light responsiveness, anaerobic induction, MYB binding and gibberellin responsiveness in the 2100 bp promoter region upstream of the AHL genes (Fig. 7). Approximately 43.5% of the selected genes contained a MYB binding site, and previous studies have shown that the MYB gene family can regulate anther development and function formation [35,36]. In addition, more than 198 and 183 MYB members directly or indirectly involved in responses to drought stress were described in Arabidopsis and rice, respectively [37], and an AHL gene in rice has also been linked to drought [22]. Therefore, it is possible that the AHL gene family can also mediate responses to drought stress in soybean. Together, these elements indicate that AHLs participate in the regulation of growth and development in soybean, supporting the variety of functions played by AHLs in soybean growth.
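Extracting a fixed-length upstream window depends on the strand of the gene, which the following Python sketch makes explicit. It is an illustration under stated assumptions rather than the workflow used here (the sequences in this study were obtained from NCBI and analyzed with TBtools): coordinates are assumed to be 1-based and inclusive, as in GFF annotations, and the function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(_COMPLEMENT)[::-1]


def upstream_region(chrom_seq, gene_start, gene_end, strand, length=2100):
    """Return up to `length` bp upstream of a gene, oriented 5'->3' like the gene.

    gene_start and gene_end are assumed to be 1-based inclusive coordinates
    on the forward strand of chrom_seq. The window is truncated at the
    chromosome ends.
    """
    if strand == "+":
        # Upstream lies before gene_start on the forward strand.
        begin = max(0, gene_start - 1 - length)
        return chrom_seq[begin:gene_start - 1]
    if strand == "-":
        # Upstream lies after gene_end on the forward strand, so the window
        # is reverse-complemented to read in the gene's own orientation.
        return reverse_complement(chrom_seq[gene_end:gene_end + length])
    raise ValueError("strand must be '+' or '-'")
```

The default of 2100 bp mirrors the window size stated above; the toy inputs used to check the sketch are arbitrary strings, not soybean sequence.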
Co-expression network analysis of the AT-hook motif gene family in soybean
A co-expression network was used to represent the upstream and downstream genes that interact with AHLs of the three types (Fig. 8). We picked representative genes from the co-expression network; their annotated functions are available in the supplementary material (Table 5). Our study shows that some AHLs are associated with genes related to energy binding, such as Glyma.11G179200 and Glyma.09G196600, which might be involved in soybean energy transduction. The co-expression network indicates that, in addition to interacting with other genes, AT-hook motif genes also interact to some extent with each other. For example, the Type-II gene Glyma.20G212200 interacted with four AT-hook motif genes to jointly regulate the expression of other genes. We also found that AT-hook motif genes are involved in biological processes such as histone binding and ATP binding in soybean, and that the same gene is involved in histone modification in Arabidopsis thaliana [17]. We speculate that some AHL genes are related to nuclear entry signals and are mainly distributed in Type-II, whereby AHL genes regulate the nuclear entry of other proteins in soybean. The reported DELLA gene LeGAI is expressed in both vegetative and reproductive tissues in tomato, and this gene family is also involved in GA signal transduction [43]. In our study, the AHL gene Glyma.20G212200 was co-expressed with two DELLA genes, Glyma.05G140400 and Glyma.08g095800. Similarly, Glyma.16G204400 interacts with the DELLA genes Glyma.08g095800 and Glyma.05G140400 to regulate the gibberellin transduction pathway in soybean. Therefore, we consider that the AT-hook motif gene family is involved in the gibberellin signal transduction pathway in soybean.
Together, our results show that the AHL gene family is involved in regulating biological processes such as energy transduction, the gibberellin pathway and the nuclear entry signal pathway in soybean.
Expression profiles of the AT-hook motif gene family in soybean
To address the expression patterns of the AT-hook motif gene family, we selected two representative soybean cultivars, Jack and Williams 82 (W82), and examined different tissues at the VC stage. The transcription data are available from NCBI (accession number: SRP285849) [44]. W82 and Jack were used to investigate whether the expression profiles of the AT-hook motif gene family differ between soybean varieties (Fig. 9A and Fig. 9B). The results showed that AHLs were mostly expressed in roots and meristems, and that these patterns were similar in W82 and Jack. There are 35 and 31 highly expressed genes in Jack and W82 roots, respectively. Of the 35 genes highly expressed in Jack roots, 22 showed the same expression in W82. Of the remaining 13 genes with inconsistent expression, 9 had high expression in Jack. In meristem, 26 and 24 genes are highly expressed in Jack and 21 in W82, respectively. The expression of the same gene can differ between varieties. For example, the expression level of Glyma.09G260600 is higher in Jack and lower in W82. Expression levels in the leaves of both Jack and W82 are very low, with the exception of 5 genes in Jack and 4 genes in W82. This corroborates previous results in maize [19]. In the Jack epicotyl, we found 5 highly expressed genes, similar to W82. In the hypocotyl, Glyma.04G091600 and Glyma.06G093400 are both highly expressed, and their expression is consistent between the two varieties, whereas the expression level of Glyma.18G036200 in the hypocotyl is higher in W82 than in Jack. Interestingly, the genes showing high levels of expression in meristematic tissues are mainly distributed in Type-II, while those highly expressed in the roots mainly belong to Type-I.
These results indicate that although the AHL genes in Jack and W82 had similar expression patterns across tissues, individual genes were expressed differently between the two varieties. Hence, different AHL genes may have different functions in the two varieties and may play important roles in plant development. To verify the RNA-seq data, RT-qPCR was performed to evaluate the expression patterns of three genes in the roots, leaves, meristem, epicotyl and hypocotyl of W82 (Fig. 9C). The results show that genes with high expression levels in one tissue have low expression levels in other tissues, indicating that AHL gene expression is tissue-specific in soybean.

The expression of the AT-hook motif gene family under drought and submergence
Both drought and submergence have adverse effects on plant growth, and a previous study has shown that AHLs mediate plant responses to drought stress [22]. Because the cis-acting element analysis showed that some AHLs contain a MYB element, we hypothesised that AHLs may also affect drought stress responses in soybean. Hence, we examined the expression of AHL genes in the leaves and roots of W82 under submergence and drought conditions at the V1 stage (Fig. 10A and Fig. 10B), using RNA transcription data from NCBI (PRJNA574626). In both the control and the treatments, a higher number of AHLs were expressed in roots than in leaves, which is consistent with the results in Fig. 9A and B.
After 5-6 days of drought treatment, the expression of highly expressed genes, such as Glyma.02G285500, decreased considerably. However, the expression of Glyma.14G181200 increased, especially in leaves after 6 days of drought treatment. In the roots, drought treatment led to a significant reduction in gene expression compared to the control group. Similar patterns were observed under submergence treatment, where some genes, such as Glyma.14G066800, showed significantly higher expression in leaves than in controls. Overall, the expression levels of most genes decreased in roots after submergence.
We used roots and leaves of W82 at the V1 stage to verify the expression of AHL genes under drought and submergence stresses (Fig. 10D). After one day of submergence stress, the expression level of AHL genes in leaves increased significantly, and it decreased significantly after three days of submergence. After one day of recovery, the expression level of AHL genes was the same as that of the control. The expression level in roots decreased after submergence stress. In the leaves, the expression of AHL genes increased significantly after one day of drought stress and decreased after six days of drought. In the roots, the expression level decreased relative to the control as the duration of drought stress increased. We also recorded the phenotype of soybean under submergence and drought stress (Fig. 10C). As the stress time increased, the stressed plants became shorter and more wilted than the control, but the phenotypic difference was not particularly obvious.
These results suggest that, under stress conditions, gene expression overall increases in the leaves and decreases in the roots. Furthermore, after 1 day of recovery, the levels of gene expression were restored and were sometimes even higher than those of the control group. The different expression patterns indicate that AHLs are more highly expressed in the roots and are involved in responses to drought and submergence stress.

Discussion
Identification of the AT-hook motif gene family in soybean
It is well documented that soybean is a staple crop worldwide and provides a major source of protein for human populations. Previous studies in Arabidopsis thaliana, maize and cotton have provided comprehensive information and the basis for our research on soybean, revealing the multiple functions of AHLs, particularly in regulating plant growth and stress responses [19,20,25]. We decided to study the AHL gene family in soybean, as this may provide a molecular basis for high stress tolerance in plants and shed light on the improvement of environmental adaptation.
We identified soybean AHL genes from the JGI Phytozome website [45]. These genes were predicted based on the presence of a PPC domain and the AT-hook motif, as annotated in Pfam [46]. In this study, 63 AT-hook motif genes were identified in soybean, and a phylogenetic tree was generated using the MEGA7 software. According to the phylogenetic tree, the AT-hook motif gene family is divided into two clades on the basis of the PPC domain, Clade-A and Clade-B. Clade-B is further classified into two types on the basis of the AT-hook motif, Type-II and Type-III, while Clade-A is also referred to as Type-I. The PPC domain of Clade-A is more variable, which is consistent with the results in maize [19]. Our results suggest that greater variation in the PPC domain may contribute to adaptation in plants. The flanking sequences of the AT-hook motif in soybean are similar to those of other land plants [1], and most AHL genes belong to Clade-A, suggesting that this clade contains richer and more conserved functions that are essential for plant survival. In our study, the AHL gene family was distributed on 18 chromosomes, independently of chromosome size and location. We also found that segmental duplication events are the main form of duplication in the AHL gene family in soybean, in contrast to maize, where dispersed duplication is more common [19]. This illustrates that the AHL gene family expanded in different ways in different species.

Conservation of the AT-hook motif gene family in soybean
The AHL gene family is conserved across land plants, and all AHL genes share a PPC/DUF domain. In Clade-A, this PPC/DUF domain contains the conserved L-R-S-H motif, while Clade-B displays F-T-P-H. We also observed that the diversity of the AHL gene family in soybean extends beyond the amino acid sequences of the PPC/DUF domain and is also present in the AT-hook motif sequences, which have an R-G-R core. While the sequence around this core is R-G-R-P in Clade-A, in Clade-B it is R-G-R-P-R-K-Y. It has been previously suggested that Clade-B evolved from Clade-A [1]. The gene structures of the AT-hook motif gene family include UTR-less and multiple-CDS genes: 12 genes in Clade-A lack UTRs, and the number of introns is increased in Type-II and Type-III. We therefore speculate that the increase in introns leads to the diversity of protein structures.
The collinearity analysis showed that soybean AHLs have high degrees of homology with those of other species, as shown by comparisons with four plant species: Oryza sativa, Zea mays, Populus trichocarpa and Medicago sativa.

Expression Patterns In Soybean
The cis-elements found in the promoter regions suggest that AHL genes may participate in plant photomorphogenesis, growth and development, and stress responses. Co-expression analysis indicates that AHL proteins may be involved in the gibberellin pathway, which is involved in plant responses to drought and excess water. A previous study has shown that gibberellin can be involved in plant drought and flooding stress [47]. Overexpression of CBF/DREB2 in Arabidopsis thaliana can reduce the content of active GAs and improve drought tolerance [48], and CYP96B4/SD37 in the amycin synthesis pathway is related to drought tolerance in rice [49]. The drought tolerance of the dss1 mutant is significantly higher than that of the wild type, owing to a decrease in GA1 [49].
Long-term flooding stress in rice inhibits the levels of ethylene, reduces the amount of active GAs, and thus inhibits the elongation of the internodes [50,51]. Since we found that AHL genes are involved in the gibberellin pathway in soybean, it is possible that the AHL gene family also regulates the expression of genes mediating responses to drought and flood stress. Therefore, the expression of AHLs in W82 under drought and flood conditions was analyzed. Our results indicated that, under these stress conditions, the expression of AHL genes decreased in the roots. At the same time, the expression of AHLs in different tissues from distinct soybean varieties indicated that AHL expression was higher in the roots. This shows that the expression of AHLs decreased under abiotic stress, and that the soybean response to stress involves gibberellin and other biological pathways. We also used W82 leaves and roots at the V1 stage for verification. Interestingly, gene expression levels in the leaves increased significantly on the first day of stress treatment and then decreased. The mechanism underlying this phenomenon needs further study. Accordingly, the AHL gene family plays an important role in soybean resilience, providing a theoretical basis for future breeding of this important crop.

Conclusion
We characterized 63 AHL genes in soybean and analyzed their respective motif composition. The phylogenetic tree divided these genes into two clades based on the PPC domain. We also investigated the cis-acting elements of the promoter regions of AHL genes and their co-expression network, and systematically studied the AHL expression profiles in different tissues and varieties, as well as the response to stress conditions. The systematic exploration of AHL genes in soybean lays the foundation for future work in soybean breeding.

Methods
Identification of the AT-hook motif gene family
The AT-hook motif gene family of Arabidopsis thaliana was obtained from the TAIR database (https://www.arabidopsis.org/) [52]. The amino acid sequences of the AT-hook motif genes of soybean and sorghum were obtained from the JGI Phytozome website (https://phytozome.jgi.doe.gov/pz/portal.html) and Ensembl Plants (https://plants.ensembl.org/index.html) [45,53]. We used Pfam (https://pfam.xfam.org) to predict the genes containing the PPC domain, and then filtered for the genes containing both the PPC domain and the AT-hook motif [54]. A homology comparison of the amino acid sequences of Arabidopsis thaliana, soybean and sorghum was performed. We used the online ExPASy program (http://www.expasy.org/tools/) to determine the biochemical properties of each AHL protein, including the number of amino acids, the molecular weight (MW) and the predicted isoelectric point (pI) [55].
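The filtering step, keeping only proteins that carry both the PPC domain and the AT-hook motif, amounts to a set-containment check over domain annotations. The Python sketch below assumes the annotations have already been exported as (protein, domain) pairs; the input layout, the domain labels `"PPC"` and `"AT-hook"`, and the function name are hypothetical and do not reflect the actual Pfam output format.

```python
def proteins_with_both_domains(domain_hits, required=("PPC", "AT-hook")):
    """Return the sorted IDs of proteins annotated with every required domain.

    domain_hits is an iterable of (protein_id, domain_name) pairs, for
    example parsed from a domain-search results table (assumed layout).
    """
    domains_by_protein = {}
    for protein_id, domain_name in domain_hits:
        domains_by_protein.setdefault(protein_id, set()).add(domain_name)

    required_set = set(required)
    return sorted(
        protein_id
        for protein_id, domains in domains_by_protein.items()
        if required_set <= domains  # all required domains are present
    )
```

Proteins with only one of the two annotations are dropped, which matches the criterion stated above that AHL candidates must contain both features.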

Phylogenetic Analysis
The amino acid sequences of Arabidopsis thaliana, Glycine max and sorghum were selected to construct the phylogenetic tree using the MEGA7 software [46]. We used a Neighbor-Joining tree to represent the phylogenetic relationship between the AHL genes [56]. We implemented a total of 1000 bootstrap replicates to represent the evolutionary history [57].

Gene Structure Analysis
We used MEME (http://meme-suite.org/) to predict the conserved motifs of the AHL gene family in soybean with an E-value of 10^-5 [58], and obtained a total of 10 conserved motifs. The final file was generated by TBtools [59]. The gene structure of the AT-hook motif genes was analyzed using the TBtools software [59]. The gene structures were mapped from the CDS and genomic sequences. We used the SMART website (http://smart.embl-heidelberg.de/) to evaluate the accuracy of the selected proteins [60].
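The bootstrap step of the phylogenetic analysis can be illustrated with a short sketch: alignment columns are resampled with replacement, and a pairwise distance matrix is recomputed for each replicate, from which a Neighbor-Joining tree would then be rebuilt. The snippet below makes several assumptions. It starts from an already aligned set of sequences (the toy alignment and its names are invented), it uses a simple p-distance rather than whichever substitution model was selected in MEGA7, and it stops at the distance matrices instead of building trees.

```python
import random


def p_distance(a, b):
    """Proportion of differing sites, skipping columns with a gap in either sequence."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(alignment):
    """Pairwise p-distances keyed by (name_i, name_j) with name_i < name_j."""
    names = sorted(alignment)
    return {
        (i, j): p_distance(alignment[i], alignment[j])
        for i in names for j in names if i < j
    }


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement, same columns for every sequence."""
    n_cols = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


def bootstrap_distances(alignment, n_replicates=1000, seed=0):
    """One distance matrix per bootstrap replicate."""
    rng = random.Random(seed)
    return [
        distance_matrix(bootstrap_alignment(alignment, rng))
        for _ in range(n_replicates)
    ]


# Hypothetical toy alignment (equal-length rows, '-' marks a gap).
toy_alignment = {
    "seqA": "MRGRPPGSKN",
    "seqB": "MRGRPAGSKN",
    "seqC": "MKGRP-GAKN",
}

observed = distance_matrix(toy_alignment)
replicates = bootstrap_distances(toy_alignment, n_replicates=1000)
print(observed)
print(len(replicates))
```

In a full analysis, each replicate matrix would be passed to a Neighbor-Joining routine, and the support for a branch would be the fraction of replicate trees that contain it.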

Chromosome Location Analysis, Collinearity Analysis And GO Annotation Analysis
Chromosome mapping information for the AT-hook motif genes was obtained from JGI Phytozome and Ensembl Plants. The map of chromosome locations was drawn using the TBtools software [59]. We selected the full-length amino acid sequences of four species to perform collinearity analysis with soybean.
The collinear relationship was estimated using the MCScanX and TBtools software [59,61]. We used the SoyBase (https://www.soybase.org) website to conduct GO analysis on the 63 AT-hook motif genes.

Cis-acting Elements Analysis And Co-expression Network
We obtained 2100 bp genomic sequences spanning the promoter regions of the AT-hook motif gene family of Glycine max from NCBI. The cis-acting elements were analyzed using TBtools [59].

Gene structure analysis of the AT-hook motif gene family in soybean. The x-axis shows the inferred length of the different genes (5' to 3') and their respective CDS (green) and UTR (yellow).

Figure 3
Conserved motif prediction of the AT-hook motif gene family. All motifs were identified using the MEME website. A total of ten different motifs are represented by different colors, with the motif sequences shown below. The amino acid sequence length is indicated by the ruler at the bottom. Different letter colors represent different kinds of amino acid residues, and the size of each letter represents the frequency of occurrence of that amino acid. Most of the genes in the same clade contain similar motifs.

The cis-acting elements of the promoter sub-region. The four elements contained in the AT-hook motif gene family include light responsiveness, anaerobic induction, MYB and gibberellin-responsiveness elements. Different colors represent different elements.

Figure 8
Co-expression network involving AHL genes in soybean. The whole networks for Type-I (A), Type-II (B) and Type-III (C) are drawn with brown ellipses. The genes interacting with AHLs are shown as pink circles, and the selected AHL genes correspond to the orange circles.

Figure 9
The expression levels of AT-hook motif genes in Jack (A) and Williams82 (B). The colors going from blue to red indicate an increasing level of expression. The cluster tree on the left was built based on expression levels. The horizontal axis represents the expression level of the same gene in different tissues.

The ordinate represents the level of expression of different genes in the same tissue. Tissue-specific expression of the AT-hook motif genes and expression patterns of three genes in Williams82 (C).
Expression of Glyma.05G111500, Glyma.20G087200 and Glyma.06G093400 in leaves, meristem, roots, epicotyl and hypocotyl at the VC stage. M: Meristem; U: Unifoliate leaves; R: Roots; E: Epicotyl; H: Hypocotyl. DRO and SUB represent drought and submergence, respectively. D represents day. CT represents control treatment. L and R are the leaves and roots, respectively. DRO_REC_L/R means 1 day of recovery following 6 days of drought in leaves/roots. SUB_REC_L/R means 1 day of recovery following 3 days of submergence in leaves/roots. The growth of soybeans under submergence and drought stresses (C); the treatment group is on the left and the control group is on the right. Expression of Glyma.18G231300, Glyma.07G072300, Glyma.20G087200, Glyma.05G111500 and Glyma.17G155400 in leaves and roots at the V1 stage (D).
