Evolutionary significance of amino acid permease transporters in 17 plants from Chlorophyta to Angiospermae

Nitrogen is an indispensable nutrient for plant growth, and in living organisms it is largely used and transported in the form of amino acids. Moving amino acids to the various parts of a plant requires dedicated transport proteins, such as the amino acid permeases (AAPs) that are the focus of this study. We found 5 AAP genes in Chlorophyte species, and more AAP genes were predicted in Bryophyta and Lycophytes. Two main groups were defined, and group I comprised 5 clades. Our phylogenetic analysis indicated that clades 2, 3, and 4 originated in Gymnospermae and are closely related, whereas the members of clade 1 ranged from Chlorophyta to Gymnospermae. Group II, a new branch consisting of non-seed plants, is proposed for the first time in this study. Our results also indicated that the AAP family was already present in Chlorophyta and then expanded with the development of vasculature. Concurrently, the AAP family underwent multiple duplication events that promoted the emergence of new functions and the differentiation of sub-functions. Our findings suggest that the AAP gene originated in Chlorophyta, that some non-seed AAP genes clustered in one group, and that a second group, containing plants of all evolutionary stages, reflects the evolution of AAPs. These findings can guide future research.


Background
Because the evolution of organisms is shaped by local conditions, it provides key information for understanding the appearance and reproductive characteristics of plants. The transition from aquatic to terrestrial environments posed challenges that were accompanied by physiological and genetic adaptations [1]. As the ancestors of land plants, algae play an important role in plant evolution; they typically live in water and are closely related to land plants [2]. Over the evolutionary history of plants, the number of transcription factor gene families increased significantly [3]. This explosive growth is attributed to dramatic environmental changes, which led to the appearance of new transcription factor families or to the expansion of existing families as plants adapted to new ecosystems [4,5].
Thus far, research has largely been limited to the evolution of transcription factors in the plant kingdom [6]. To improve our understanding of the evolution of plant genes, however, early plants and their ancestors should also be investigated. Fortunately, several early plant species have been sequenced, including various algae, mosses, and other species. Interestingly, the early land plant Marchantia polymorpha shows a different level of transcription factor diversity compared with other land plants [7]. By following the evolutionary history of transcription factor families, we can also speculate on their earliest functions and importance in plants.
The basic requirements for plant growth and development are sunlight, water, and soil. Leaves carry out photosynthesis to produce organic matter, while roots absorb the water and nutrients needed for development. Nitrogen is one of the most important nutrients for plant growth and is required for many different compounds. In plants, nitrogen exists mainly in the form of amino acids, which are assimilated in roots and leaves and transported in the phloem to other organs [8]. To achieve this, amino acids are loaded into the phloem of the minor veins in leaves, whereas in roots they are transported through the xylem [9]. The uptake of amino acids by root cells depends on integral membrane transporter proteins [10], and many annotated proteins may facilitate amino acid transport in plants.
As one of the amino acid transporter subfamilies, the AAP subfamily has been analyzed in Arabidopsis thaliana (8 proteins), Oryza sativa (19 proteins), and other plants [15]. Each protein contains an amino acid transporter domain (Aa_trans; PF01490) and a solute carrier families 5 and 6-like superfamily domain, which includes the solute-binding domain of SLC5 proteins, SLC6 proteins, and NCS1 transporters [16]. AtAAP1 regulates the absorption of amino acids in the endosperm [17], AtAAP2 transports amino acids from the xylem to the phloem [8], AtAAP3 is mainly responsible for the absorption and transport of amino acids in the root vascular tissue [18], and AtAAP6 and AtAAP8 effectively transport neutral and acidic amino acids [19]. All AtAAPs are located in the plasma membrane [20]. The functions of AAP genes have also been reported in other plants, such as Solanum tuberosum and Vicia narbonensis [21][22][23].
The function of AAP genes in A. thaliana has been thoroughly investigated, but only Tegeder and Ward have described the molecular evolution of plant AAPs and LHTs [13]. Their research incorporated many early plant species, including red algae, green algae (Chlorophytes and Charophytes), a basal non-vascular plant (Physcomitrella patens), a non-seed vascular plant (Selaginella moellendorffii), and vascular land plants (eudicots and monocots) [13]. In that study, AAPs from 14 species were identified to determine homologs, and a phylogenetic tree was constructed to explain their evolutionary relationships. In the present study, we include several newly sequenced species from Chlorophyta, Bryophyta, lycophytes, Gymnospermae, and Angiospermae. We identify the AAPs at each evolutionary stage and analyze their protein characteristics, structures, phylogenetic relationships, and gene ontology (GO) annotations to explain the evolution of AAPs in the plant kingdom. The characteristics of AAPs are then explored and discussed.
In total, 210 proteins were retrieved by BLAST; where a gene had more than one transcript, we selected only the primary one. Of these predicted proteins, 154 contained the Aa_trans domain or the SLC5-6-like_sbd superfamily domain, which we used as the main criterion for recognizing AAP proteins (Additional file 13). Of the 7 chlorophyte species we searched, AAP-like proteins (5 in total) were predicted only in C. subellipsoidea, and S. fallax contained more AAP proteins than the other species. AAP proteins were also predicted in every tracheophyte species. To visualize the groups of AAP proteins in plants at various stages, we used 7 different colors to distinguish plant species and recorded the species (Fig. 1) and the number of AAP proteins (Table 1) in each group.
We provide information about the AAP proteins, including protein length, domain location, and the numbers of transmembrane domains and exons (Additional file 1). Although most genes had 6-8 exons, only 1 exon was identified in some species, and more than 10 exons were identified in C. subellipsoidea. In general, the number of exons was relatively stable across all plants. A greater number of exons meant that the sequence was assembled from more, shorter segments, but sequence length was not correlated with the number of exons.
As amino acid transporters, AAP family proteins share specific repeated sequences. We predicted the location of the main motif (the Aa_trans domain) and the number of transmembrane domains in each protein. The E-value threshold was set to 1e-5 to ensure that both domain types were detected in all proteins. Most proteins had one main Aa_trans domain, except for Pp3c21_14080V3.1, 413158, pa_MA_889393g0010, ZmAAAP17, ZmAAAP64, and OsAAP19, which each had 2 incomplete domains, and pa_MA_101691g0010, which had 3 segments. Six to twelve transmembrane domains were predicted per protein: SmAAP9A contained 12, whereas 413158, 426884, and ZmAAAP17 each contained 6 (Additional files 1 and 10). All transmembrane domains are shown in Fig. 2.
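The helix counts above come from TMHMM predictions. As a minimal sketch, assuming the standard TMHMM 2.0 one-line ("short") output with `PredHel=` and `Topology=` fields, the counts and helix intervals can be tabulated, and proteins outside the 6-12 range reported here can be flagged. The protein IDs and values in `EXAMPLE_LINES` are hypothetical, not data from this study.

```python
import re

# Assumed TMHMM 2.0 short-format fields: "PredHel=N" and "Topology=i10-32o47-69i..."
PRED_HEL = re.compile(r"PredHel=(\d+)")
TOPOLOGY = re.compile(r"Topology=(\S+)")
HELIX = re.compile(r"(\d+)-(\d+)")


def parse_tmhmm_short(lines):
    """Return {protein_id: (helix_count, [(start, end), ...])}."""
    results = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        protein_id = line.split()[0]
        topo = TOPOLOGY.search(line)
        helices = [(int(a), int(b)) for a, b in HELIX.findall(topo.group(1))] if topo else []
        pred = PRED_HEL.search(line)
        count = int(pred.group(1)) if pred else len(helices)
        results[protein_id] = (count, helices)
    return results


def outside_range(results, low=6, high=12):
    """IDs whose predicted helix count falls outside [low, high]."""
    return sorted(pid for pid, (count, _) in results.items() if not low <= count <= high)


# Hypothetical example lines (invented IDs and coordinates)
EXAMPLE_LINES = [
    "protA\tlen=300\tExpAA=44.1\tFirst60=20.2\tPredHel=2\tTopology=i10-32o47-69i",
    "protB\tlen=480\tExpAA=160.3\tFirst60=18.0\tPredHel=7\t"
    "Topology=o5-27i40-62o80-102i120-142o160-182i200-222o240-262i",
]

parsed = parse_tmhmm_short(EXAMPLE_LINES)
print(parsed["protA"])         # (2, [(10, 32), (47, 69)])
print(outside_range(parsed))   # ['protA']
```

The same table can then be joined with the Aa_trans coordinates to check where each helix lies relative to the domain.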

Phylogenetic analysis of AAP
To perform a comprehensive phylogenetic analysis of AAP proteins in plants, we selected representative plant sequences from different evolutionary stages. In total, 154 proteins from 5 plant stages, from chlorophytes to angiosperms, were used to construct a phylogenetic tree with the Neighbor-Joining method. We chose this method because it is especially well suited to datasets comprising lineages with widely varying rates of evolution, and it can be combined with methods that correct for superimposed substitutions [28]. The unrooted tree was clearly divided into 2 main groups (Fig. 1). Group I showed more branching events, and group II could be divided into 2 parts supported by bootstrap values. We used the group I proteins to construct a separate phylogenetic tree, in which bootstrap values separated group I into 5 clades (Fig. 3). Clade 1 contained non-seed plants and Gymnospermae and was separated into 2 clusters based on bootstrap values. The other 4 clades comprised seed plants, and Gymnospermae were located in clades 3, 4, and 5. We partly followed the grouping method of Tegeder and Ward [13] to classify these proteins. In group I, the P. patens and S. moellendorffii AAP proteins were identical to those identified by Tegeder and Ward [13]. Group II mainly included early plant species from Chlorophyta, Bryophyta, and lycophytes. A. trichopoda, the sister group of the remaining flowering plants, also appeared in this group. The other early plant AAP proteins appeared mainly in clade 1, and most of them belonged to clade 1B. However, no angiosperm proteins appeared in clade 1 (Table 1).
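For readers unfamiliar with the Neighbor-Joining method, the following is a minimal pure-Python sketch of the standard NJ algorithm, not the MEGA7 implementation used in this study. At each step it joins the pair of clusters that minimizes the Q-criterion and, once three clusters remain, joins them at a central node to give an unrooted tree. The five-taxon distance matrix and the labels a-e are a hypothetical textbook example unrelated to the AAP data.

```python
def neighbor_joining(names, matrix):
    """Build an unrooted NJ tree from a symmetric distance matrix; returns Newick."""
    if len(names) < 3:
        raise ValueError("neighbor joining needs at least 3 taxa")
    # D[x][y]: current distance between clusters x and y (keys are Newick strings)
    D = {a: {b: matrix[i][j] for j, b in enumerate(names)} for i, a in enumerate(names)}
    while len(D) > 3:
        nodes = list(D)
        n = len(nodes)
        r = {x: sum(D[x][y] for y in nodes) for x in nodes}  # net divergence
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                a, b = nodes[i], nodes[j]
                q = (n - 2) * D[a][b] - r[a] - r[b]  # Q-criterion
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from the new internal node u to a and b
        la = 0.5 * D[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        lb = D[a][b] - la
        u = f"({a}:{la:.4f},{b}:{lb:.4f})"
        D[u] = {u: 0.0}
        for c in nodes:
            if c not in (a, b):
                D[u][c] = D[c][u] = 0.5 * (D[a][c] + D[b][c] - D[a][b])
        for x in (a, b):  # remove the joined clusters
            del D[x]
        for row in D.values():
            row.pop(a, None)
            row.pop(b, None)
    # Join the last three clusters at a central node
    a, b, c = list(D)
    la = 0.5 * (D[a][b] + D[a][c] - D[b][c])
    lb = 0.5 * (D[a][b] + D[b][c] - D[a][c])
    lc = 0.5 * (D[a][c] + D[b][c] - D[a][b])
    return f"({a}:{la:.4f},{b}:{lb:.4f},{c}:{lc:.4f});"


# Hypothetical textbook distance matrix
NAMES = ["a", "b", "c", "d", "e"]
MATRIX = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(neighbor_joining(NAMES, MATRIX))
# (d:2.0000,e:1.0000,(c:4.0000,(a:2.0000,b:3.0000):3.0000):2.0000);
```

Because NJ only needs pairwise distances, it scales to large sets such as the 154 AAP proteins; the choice of distance model (for example the Poisson model named in the Methods) is made before this step.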

Investigation of gene duplication events and annotations
Gene duplication is potentially advantageous as a primary source of genes with new or modified functions [29]. We analyzed all predicted proteins from each species and found that C. subellipsoidea, P. patens, and P. abies exhibited no duplication events. S. fallax had the highest number of tandem duplication events and Z. mays the highest number of segmental duplication events, while O. sativa had the highest total number of duplication events (Additional file 1). Combined with the phylogenetic information, this indicates that the duplication events of non-seed plants occurred in both main groups. Only M. polymorpha had a tandem duplication event in group II. All angiosperm duplication events belonged to group I, except for those in A. trichopoda, and S. fallax also had a duplication event in group I (Fig. 4). Analysis with the Plant Genome Duplication Database (PGDD) [30] and MCScanX [31] also identified 8 collinear gene pairs, that is, homologous gene pairs in different plants. One of these involved SmAAP9C in S. moellendorffii, which had homologous genes in early plants; the others all occurred in angiosperms (Additional file 3).
To better understand gene evolution, we calculated the ratios of non-synonymous to synonymous nucleotide substitutions (Ka/Ks). We selected the coding sequences (CDSs) of all duplicated genes, deleted the termination codons, and analyzed Ka/Ks ratios using DnaSP6 [32] and the PGDD website databases. First, the target genes were aligned using the ClustalX2 'align codons' function; Ka and Ks values were then calculated in DnaSP6. In total, 48 gene pairs were analyzed, and Ks values could not be determined for 3 collinear gene pairs. Ka/Ks ratios were slightly above 1.0 in only 2 gene pairs (Sphfalx0007s0031.1/Sphfalx0007s0033.1 and Sphfalx0362s0005.1/Sphfalx0362s0007.1), and no ratio was much greater than 1.0. Collinear genes between Z. mays and O. sativa showed Ka/Ks ratios below 1.0, whereas Ks values could not be determined between A. trichopoda and O. sativa or between S. moellendorffii and A. thaliana (Additional file 3).
[Figure caption fragment: AAPs from different stages. The protein distribution can be divided into 2 main parts, shown by green-yellow and violet dashed lines, and group II might be divided into 2 subgroups, indicated by gray and light-gray lines, respectively.]
We used the same method to calculate Ka/Ks ratios for the AAPs within each plant species (Additional file 4). The highest Ka/Ks value was again found for Sphfalx0007s0031.1/Sphfalx0007s0033.1, and for the OsAAP15/OsAAP16 and 174/1275 gene pairs the Ka value was 0 while the Ks value could not be calculated (Additional file 4). Overall, the Ka/Ks values of 16 gene pairs were greater than 1, most of them in monocots and 2 in S. fallax, which were duplication pairs (Additional file 6).
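The Ka/Ks values in this study were computed with DnaSP6. To illustrate what such a calculation involves, the sketch below implements a simplified Nei-Gojobori counting method with a Jukes-Cantor correction for two aligned CDSs whose termination codons have already been removed. It is an illustration under stated assumptions, not DnaSP6's code: changes to stop codons are counted as non-synonymous sites, mutational pathways through stop codons are skipped, codons with gaps are ignored, and the toy sequences are hypothetical.

```python
import itertools
import math

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order
CODE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def synonymous_sites(codon):
    """Fraction of single-base changes at each position that keep the amino acid.
    Changes to stop codons are counted as non-synonymous (an assumption)."""
    aa, sites = CODE[codon], 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                if CODE[codon[:pos] + base + codon[pos + 1:]] == aa:
                    sites += 1 / 3
    return sites


def codon_differences(c1, c2):
    """Average synonymous/non-synonymous differences over mutational pathways."""
    positions = [p for p in range(3) if c1[p] != c2[p]]
    if not positions:
        return 0.0, 0.0
    syn_total = non_total = 0.0
    paths = 0
    for order in itertools.permutations(positions):
        current, syn, non, ok = c1, 0, 0, True
        for pos in order:
            step = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[step] == "*":
                ok = False  # skip pathways through a stop codon
                break
            if CODE[step] == CODE[current]:
                syn += 1
            else:
                non += 1
            current = step
        if ok:
            syn_total, non_total, paths = syn_total + syn, non_total + non, paths + 1
    if paths == 0:
        return 0.0, float(len(positions))  # fallback assumption: all non-synonymous
    return syn_total / paths, non_total / paths


def jukes_cantor(p):
    return float("inf") if p >= 0.75 else -0.75 * math.log(1 - 4 * p / 3)


def ka_ks(cds1, cds2):
    """Return (Ka, Ks, Ka/Ks or None) for two aligned CDSs without stop codons."""
    if len(cds1) != len(cds2) or len(cds1) % 3:
        raise ValueError("CDSs must be aligned and a multiple of 3 long")
    S = N = Sd = Nd = 0.0
    for i in range(0, len(cds1), 3):
        c1, c2 = cds1[i:i + 3].upper(), cds2[i:i + 3].upper()
        if c1 not in CODE or c2 not in CODE:
            continue  # skip codons with gaps or ambiguous bases
        if "*" in (CODE[c1], CODE[c2]):
            raise ValueError("remove termination codons first")
        s = (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        S, N = S + s, N + 3 - s
        sd, nd = codon_differences(c1, c2)
        Sd, Nd = Sd + sd, Nd + nd
    if N == 0:
        raise ValueError("no comparable codons")
    ka = jukes_cantor(Nd / N)
    ks = jukes_cantor(Sd / S) if S > 0 else float("nan")
    return ka, ks, (ka / ks if ks > 0 else None)


print(ka_ks("GGGGGGGGGTTT", "GGGGGGGGGTTC"))  # one synonymous change: Ka = 0
print(ka_ks("GGGGGGGGGTTT", "GGGGGGGGGTTA"))  # one non-synonymous change: ratio None
```

A Ka/Ks below 1 is conventionally read as purifying selection, values near 1 as neutral evolution, and values above 1 as possible positive selection. When Ks is 0 the ratio is undefined, which is one situation in which a Ka/Ks value cannot be reported.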
The 154 proteins were annotated with Gene Ontology (GO) terms for biological process (BP), molecular function (MF), and cellular component (CC). Four CC categories were annotated among the 154 genes, and 46 proteins were predicted to be related to CC, most of them from non-seed plants. Seven proteins, all group II members, were located in plastids, and only AtAAP3 was found in the nuclear envelope. Most proteins were located in the plasma membrane. Four MF categories were annotated to 103 proteins linked to transmembrane transporter activity. OsAAP13, ZmAAAP09, and ZmAAAP69 were also associated with ion binding, ATPase activity, and helicase activity. Four BP categories were annotated to 7 genes. Five Bryophyta proteins participated in transport processes, two S. moellendorffii AAPs were related to transmembrane transport, and OsAAP13, ZmAAAP09, and ZmAAAP69 were associated with DNA metabolic processes and stress response (Fig. 5, Additional file 5).

Analysis of AAP proteins
AAP proteins belong to the AAAP family, and some of them take up amino acids in roots and leaves and transport them to other organs through the phloem. These findings were based only on vascular plants, and Tegeder and Ward's research showed that this protein family was also predicted in Bryophyta [13]. In the present study, we expanded the range of plant species used to predict the function of AAP proteins. We searched for the target proteins in Chlorophytes by BLAST, results that had not previously been reported, and then selected representative plants from various evolutionary stages to explain the evolution of AAP proteins.
The FPKM protein families with biased distribution in Coccomyxa reported by Blanc et al. [33] showed that all 9 chlorophytes they studied contained the Aa_trans domain. However, in the present study, AAPs were found only in C. subellipsoidea, which belongs to the class Trebouxiophyceae. From this finding, we inferred that AAPs might have originated in Chlorophyta, although we could not find further supporting evidence. On the other hand, the study of Tegeder and Ward [13] traced AAPs back only to Bryophyta, and Bowman et al. showed that the GH3 protein from M. polymorpha, which could belong to group I according to Zhang et al. [6], was in fact not functionally related [7]. Thus, these hypotheses rest only on protein prediction and structural analysis. Although Chlorophyta are single-celled aquatic eukaryotes with no vascular structure, Blanc et al. identified several protein families that were overrepresented in C. subellipsoidea, including those involved in lipid metabolism, transporters, cellulose synthases, and short alcohol dehydrogenases [33]. Both Tegeder and Ward and the present study identified AAP proteins in Bryophyta. We predicted 154 AAP proteins and analyzed the Aa_trans and transmembrane domains in each protein.
Fig. 2 The division of whole AAP proteins. The tree shows the 2 main groups: group I is represented by violet and group II by green. It can be inferred from the phylogenetic tree that the two groups are genetically. Eleven plants from 5 main evolutionary stages were used to build the phylogenetic tree. The main domain, Aa_trans, and transmembrane structures: the blue bar in each protein marks the location and number of Aa_trans domains, and the red boxes are transmembrane structures. There are no distinct differences between group members.
Not only in early plants but also in other plant species, transmembrane domains could be located in the Aa_trans domain. This was more common in A. trichopoda, S. fallax, and C. subellipsoidea, and we labeled these proteins as 'Beyond' in Additional file 10. In addition, we used the MEME website to obtain the distribution of motifs in each protein. Non-vascular and vascular plants all contained the same 10 motifs in the same positions and order (Additional files 2, 11, and 12). This structural information supports the existence of these predicted proteins.
Exons and introns make up a gene sequence, and exons, which form part of the transcript, play an important role in gene function. Based on the number of exons in each plant's AAPs, it can be inferred that some introns present in Chlorophyta may have been lost in subsequent evolutionary stages. Introns can be lost or gained over evolutionary time, as shown by many comparative studies of orthologous genes [34]. Because the AAP genes in Chlorophyta all display the same transcript sequences, their protein structures did not vary greatly. We therefore suggest that the differences in the number of introns/exons between species are due to numerous intron losses during plant evolution. This phenomenon has been confirmed by Roy and Penny [35].

Evolution of AAP proteins
The phylogenetic results showed that group II was composed mainly of non-vascular plants (Chlorophyta and Bryophyta) and A. trichopoda. Interestingly, A. trichopoda, an angiosperm that is sister to the other flowering plants, was the only angiosperm in group II; this may be because the A. trichopoda mitochondrial genome contains six foreign genomes: one from a moss, three from green algae, and two from other flowering plants [36]. We did not find any other angiosperm AAP proteins in group II. Group II could be divided into 2 closely related clusters. The phylogenetic tree also suggests that chlorophytes could be the origin of this protein. Because the proteins in this group belong to non-seed plants, the function of this group is likely unrelated to amino acid transport in seeds. This suggests that the function of this protein group may have been lost during evolution, and the reason needs to be verified before the function of these genes can be further explained. On the other hand, most duplication events of these plant genes occurred in this group, which could mean that some functionally redundant proteins were also predicted.
The classification of group I suggests that the functional differentiation of AAP proteins might have occurred in Gymnospermae, and the distribution of A. thaliana AAP proteins among the clades also supports this hypothesis. This group might contain the primary proteins associated with amino acid transport. The phylogenetic tree of group I also indicated that clade 2 was closely related to clades 3 and 4 (Fig. 6). Compared with the phylogenetic trees of Tegeder and Ward, some of our proteins were placed in different groups. These differences might be due to several factors, including the use of a different website to download the protein sequences, the addition of Gymnospermae and A. trichopoda AAP proteins to the analysis, and the use of a different website/program to analyze phylogenetic relationships. From our tree, we infer that the functional AAP proteins originated in Chlorophyta.
The phylogenetic tree indicated that bryophytes and vascular plants might share a common ancestor inherited from the C. subellipsoidea AAP protein in group I (Additional file 1, Fig. 3). All non-vascular plants and mosses clustered together, and the familial division started from P. abies. In addition, we found one duplication event each in S. fallax and S. moellendorffii. The history of gene duplication events in mosses and lycophytes was independent of that in seed plants. It was not until A. trichopoda that duplicated genes appeared that were conserved in angiosperms. Two additional duplication events were inferred to have occurred before or early in the evolution of flowering plants, since they were already present in the genome of A. trichopoda, which is considered a basal flowering plant [24]. No angiosperm proteins were found in clade 1 in our analysis (Fig. 6), and NCBI BLAST searches for these proteins returned no angiosperm matches. Moreover, this clade was not closely related to the other 4 clades, and the specialization of P. abies AAPs might have led to this division. Based on these phylogenetic inferences, we conclude that AAP group I genes have a complex evolutionary history with several specific duplication and loss events. Gene duplication increased with plant evolution, as AAP genes went from one copy in Chlorophyta to dozens in eudicots. With the development of vascular plants, the number of AAP members increased drastically (Fig. 4).

Gene duplication events, Ka/Ks values, and GO annotation information
Gene duplication is a common phenomenon in all life forms and provides resources for novel gene functions [37]. The most obvious contribution of gene duplication to evolution is the provision of new genetic material for mutation, leading to specialized or new gene functions, and it has contributed to species divergence and the origin of species-specific features [38]. Our analysis showed that AAP gene duplications were already present in bryophytes. As plants evolved, duplication events appeared at every evolutionary stage except in P. abies (Gymnospermae). BLAST searches of other gymnosperms in the NCBI database returned no results. This may be because few sequences are available for gymnosperm species, and their duplication events could be analyzed in future research. Analysis of duplication events in group I revealed that the evolution of AAPs was also driven by gene duplication. As plants evolved, duplication of AAPs gradually increased, suggesting an increasingly important role for this family in plant evolution. There was one duplication event in non-vascular plants, and with the development of vascular plants, duplication events increased drastically, consistent with the important role of AAPs as transport-related proteins.
Calculation of Ka and Ks for the duplicated gene pairs in S. fallax showed that the Ka/Ks values of 3 gene pairs were close to 1, meaning that these genes were evolving approximately neutrally rather than under strong selection pressure. The Ka/Ks values of the other duplicated genes were all less than 1, consistent with purifying selection. This is because a mutation that changes a protein is much less likely to become a fixed difference between two species than a silent one; that is, selection usually eliminates deleterious mutations and keeps the protein unchanged [39]. In general, AAP duplications did not change proteins within a species, as suggested by Arcadi and Barton [40]. None of the collinear gene pairs came from group II, and their Ka/Ks values also indicated stable evolution (Additional file 3). Group I and group II showed no significant evolutionary relationship.
The Ka/Ks ratios within each species showed that most genes were stable and were under purifying selection or neutral evolution (Additional file 6). Although some species exhibited distinct Ka/Ks values, most did not, and these differences may have been affected by variable sequence alignment. To eliminate this effect, we compared the CDS sequences separately to calculate their Ka/Ks ratios, which produced very similar results to the original analysis. In general, the AAPs were a relatively stable gene family throughout plant evolution.
Functional annotation of sequences is a key requirement for functional genomics research. GO annotation is one way to predict gene function in terms of cellular components, molecular function, and biological processes [41]. Many plant species in our study are not model organisms, so some GO information could not be obtained from website databases; the Blast2GO software helped to address this problem. The results showed that many proteins were located in the plasma membrane and that the main molecular function of AAP proteins was transmembrane transporter activity. This is consistent with AAPs being integral membrane proteins involved in transporting amino acids into the cell. Interestingly, OsAAP13, ZmAAAP09, and ZmAAAP69 were associated with stress response, and only 2 proteins participated in transmembrane transport. The protein structures and phylogenetic tree confirm that these proteins belong to the AAP family (Fig. 5).

Conclusion
In recent years, improvements in plant sequencing technology have supported the study of basal lineages and provided abundant data for the evolutionary study of gene families. Here, we used these databases to identify AAPs in the plant kingdom. First, we predicted and analyzed the structure of AAP members. Compared with previous research [13], we newly found AAPs in chlorophyte species, and more AAP members were predicted in Bryophyta and Lycophytes. The phylogenetic relationships among all AAP family members showed that they were clearly divided into two main groups. Group I contained members from all plant stages and reflects the origin and evolution of functional AAP genes. Group II was enriched in non-seed plant members, which might have special functions. The AAP genes of Chlorophyta were predicted in the other group, which might push the origin of AAP proteins back from Bryophyta to Chlorophyta.
We found that each member had the same motifs, with Aa_trans as the main domain. Transmembrane structure prediction showed that members had similar numbers and locations of transmembrane domains. These results indicate that the structures of AAP members were relatively conserved during plant evolution. Only the numbers of exons and introns varied, and intron losses during plant evolution might have driven this difference. The duplication events indicated that the expansion of AAPs was associated with the emergence of vascular bundles [42].

Analysis of AAP proteins in 17 plant species
The protein, genome, and CDS sequences of the 17 plants were downloaded from the Phytozome V12 website.¹ Arabidopsis thaliana, O. sativa, Z. mays, and S. tuberosum proteins were obtained from previously published studies [25][26][27]. We used the AtAAP protein sequences as queries in BLAST searches against the other plants (E-value = 1e-10). To ensure that each protein belonged to the AAP subfamily, all target proteins were checked with NCBI-CDD² and Pfam³ for an amino acid transporter (Aa_trans) domain alignment. To ensure that candidate proteins contained the complete functional regions of AAPs, all proteins were aligned using the multiple sequence alignment tool ClustalX2.⁴ After excluding a small number of proteins considerably shorter than 341.30 aa, the average length of the Aa_trans domain,⁵ the remaining sequences were considered putative AAP proteins.
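The domain and length screen described above can be sketched as a simple filter. This is a hypothetical illustration: the `Candidate` record, the domain labels used for matching, and the `min_fraction=0.8` cutoff are assumptions (the text only says "considerably shorter than 341.30 aa"), and the protein IDs are invented.

```python
from dataclasses import dataclass

AA_TRANS_MEAN_LENGTH = 341.30  # average Aa_trans domain length quoted in the text
# Aa_trans (Pfam PF01490) and the SLC5/6-like superfamily label (assumed naming)
AAP_DOMAINS = {"PF01490", "SLC5-6-like_sbd"}


@dataclass(frozen=True)
class Candidate:
    protein_id: str
    length: int          # protein length in amino acids
    domains: frozenset   # domain accessions/names reported by CDD or Pfam


def putative_aaps(candidates, min_fraction=0.8):
    """Keep candidates that carry an AAP-type domain and are not much shorter
    than the average Aa_trans domain. min_fraction is a hypothetical cutoff."""
    cutoff = min_fraction * AA_TRANS_MEAN_LENGTH
    return [
        c.protein_id
        for c in candidates
        if c.domains & AAP_DOMAINS and c.length >= cutoff
    ]


hits = [
    Candidate("prot1", 480, frozenset({"PF01490"})),
    Candidate("prot2", 150, frozenset({"PF01490"})),         # too short
    Candidate("prot3", 500, frozenset()),                    # no AAP-type domain
    Candidate("prot4", 460, frozenset({"SLC5-6-like_sbd"})),
]
print(putative_aaps(hits))  # ['prot1', 'prot4']
```

Keeping the cutoff as a parameter makes it easy to report how sensitive the final protein count is to the length threshold.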
Protein motifs were analyzed using the Pfam website [5] and the MEME website⁶ with default parameters. A combined transmembrane topology and signal peptide prediction was performed with the TMHMM website.⁷

Investigation of gene duplication events, Ka/Ks ratio values and annotation information
According to Zhang et al. (2018), tandemly duplicated gene pairs must satisfy two requirements: the sequence similarity between the two genes should exceed 50%, and the genes should be located on the same chromosome within 50 kb of each other [6]. The PGDD website⁸ was used to search for segmentally duplicated gene pairs, and the MCScanX program⁹ was used for species not included in the website database. DnaSP6 software¹⁰ was used to calculate the Ka/Ks ratios of gene pairs to describe evolutionary pressure. Each CDS sequence was obtained from Phytozome V12, and termination codons were deleted before calculating Ka/Ks ratios.
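The two tandem-duplication criteria can be expressed as a short filter. In this sketch, similarity is assumed to be precomputed as a fraction for each gene pair, and the 50 kb limit is interpreted as the intergenic gap between the two loci; both are assumptions, and all gene IDs and coordinates are hypothetical.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class Locus:
    gene_id: str
    chromosome: str
    start: int  # bp
    end: int    # bp


def intergenic_distance(g1, g2):
    """Gap in bp between two loci on the same chromosome (0 if they overlap)."""
    return max(0, max(g1.start, g2.start) - min(g1.end, g2.end))


def tandem_pairs(loci, similarity, min_similarity=0.5, max_distance=50_000):
    """Return pairs with similarity > 50% on the same chromosome and < 50 kb apart.
    `similarity` maps frozenset({id1, id2}) -> fraction (assumed precomputed)."""
    pairs = []
    for g1, g2 in itertools.combinations(loci, 2):
        sim = similarity.get(frozenset((g1.gene_id, g2.gene_id)))
        if sim is None or sim <= min_similarity:
            continue
        if g1.chromosome == g2.chromosome and intergenic_distance(g1, g2) < max_distance:
            pairs.append((g1.gene_id, g2.gene_id))
    return pairs


loci = [
    Locus("geneA", "chr1", 1_000, 3_000),
    Locus("geneB", "chr1", 10_000, 12_000),
    Locus("geneC", "chr2", 5_000, 7_000),
    Locus("geneD", "chr1", 100_000, 102_000),
]
similarity = {
    frozenset(("geneA", "geneB")): 0.72,  # same chromosome, 7 kb apart
    frozenset(("geneA", "geneC")): 0.80,  # different chromosomes
    frozenset(("geneA", "geneD")): 0.90,  # 97 kb apart
    frozenset(("geneB", "geneD")): 0.40,  # below the similarity threshold
}
print(tandem_pairs(loci, similarity))  # [('geneA', 'geneB')]
```

Segmental duplications, by contrast, are detected from collinear blocks (as with PGDD and MCScanX), so they are not covered by this simple pairwise rule.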
Gene annotation was carried out by searching gene ontology (GO) terms with the Blast2GO software.¹¹ After the amino acid sequences are uploaded to the software, the associated molecular functions, cellular components, and biological processes are obtained. This was done separately for each species because the software cannot analyze several species simultaneously. Because Blast2GO is based on the NCBI database, all genes of a species can be analyzed at the same time.

Phylogenetic analysis of AAP
Phylogenetic inference was carried out using the MEGA7 software.¹² Seventeen plant species were included in the tree. The Neighbor-Joining (NJ) method [43] was used to build the tree from genetic distances computed under a Poisson substitution model, with the pairwise deletion option to handle missing data. To assess the reliability of the analysis, the number of bootstrap replicates was set to 1000. The classification of family members was based on the multiple sequence alignment and the genetic distances in the phylogenetic tree.
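The distance settings above (Poisson model, pairwise deletion, bootstrap resampling) can be illustrated with a small sketch. It assumes the common Poisson-correction formula for amino-acid p-distances, d = -ln(1 - p), treats '-', '?', and 'X' as missing data, and uses a hypothetical three-sequence alignment; it is not MEGA7's code. In a full analysis, each bootstrap replicate would be converted to a distance matrix and an NJ tree, and clade frequencies across the 1000 replicates would give the bootstrap values.

```python
import math
import random

MISSING = set("-?X")  # gap / unknown symbols treated as missing data (assumption)


def poisson_distance(seq1, seq2):
    """Poisson-corrected amino-acid distance d = -ln(1 - p), with pairwise deletion."""
    compared = differences = 0
    for x, y in zip(seq1, seq2):
        if x in MISSING or y in MISSING:
            continue  # pairwise deletion: drop this site for this pair only
        compared += 1
        differences += x != y
    if compared == 0:
        raise ValueError("no sites left after pairwise deletion")
    p = differences / compared
    return float("inf") if p >= 1 else -math.log(1 - p)


def distance_matrix(alignment):
    """Return (names, matrix) of pairwise Poisson distances."""
    names = list(alignment)
    return names, [[poisson_distance(alignment[a], alignment[b]) for b in names] for a in names]


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement (one bootstrap pseudo-alignment)."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


aln = {"seqA": "MKV-L", "seqB": "MRVAL", "seqC": "MKVAI"}  # hypothetical alignment
names, dm = distance_matrix(aln)
print(round(dm[0][1], 4))  # 1 difference over 4 shared sites: -ln(0.75) = 0.2877
replicate = bootstrap_replicate(aln, random.Random(42))
```

Pairwise deletion keeps more sites than removing every gapped column from the whole alignment, at the cost of each distance being computed over a slightly different set of sites.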