Evolution and structural variations in chloroplast tRNAs in gymnosperms

Background Chloroplast transfer RNAs (tRNAs) can participate in various vital processes. Gymnosperms have important ecological and economic value, and they are the dominant species in forest ecosystems in the Northern Hemisphere. However, the evolution and structural changes in chloroplast tRNAs in gymnosperms remain largely unclear. Results In this study, we determined the nucleotide evolution, phylogenetic relationships, and structural variations in 1779 chloroplast tRNAs in gymnosperms. The numbers and types of tRNA genes present in the chloroplast genomes of different gymnosperms did not differ greatly, where the average number of tRNAs was 33 and the frequencies of occurrence for various types of tRNAs were generally consistent. Nearly half of the anticodons were absent. Molecular sequence variation analysis identified the conserved secondary structures of tRNAs. About a quarter of the tRNA genes were found to contain precoded 3′ CCA tails. A few tRNAs have undergone novel structural changes that are closely related to their minimum free energy, and these structural changes affect the stability of the tRNAs. Phylogenetic analysis showed that tRNAs have evolved from multiple common ancestors. The transition rate was higher than the transversion rate in gymnosperm chloroplast tRNAs. More loss events than duplication events have occurred in gymnosperm chloroplast tRNAs during their evolutionary process. Conclusions These findings provide novel insights into the molecular evolution and biological characteristics of chloroplast tRNAs in gymnosperms. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-021-08058-3.

of plant plastid genome made it a good material for evolutionary and genomics research [19,20]. At the same time, tRNA acts as a bridge in the process of gene expression. Therefore, the analyses of tRNA genes in chloroplast can provide a theoretical basis for the further study of tRNA structure, function and evolutionary relationship, which has great biological signi cance.
At present, systematic studies on the chloroplast tRNA gene of gymnosperms are still lacking. Due to the global widespread distribution of gymnosperms and their important value in many aspects, they have important research signi cance. In this study, we selected 54 species belonging to 54 different genera in the gymnosperm phylum, and systematically analyzed their chloroplast tRNA. Based on the extraction and re-annotation of tRNA genes in the chloroplast genome of each species to discuss the differences in the composition, conservation and structural changes of tRNA in chloroplast of different plants, the evolutionary relationship and the main events that tRNA experienced in the evolutionary process. This study mainly aims at the following questions: (1) How about the distribution rules and conservativeness of different types of tRNA? (2) Why does certain tRNAs always contain precoded 3'CCA tails? (3) How does the minimum free energy affect the stability of tRNA's secondary structure? and (4) the main types of events that tRNA has experienced during its evolutionary history.

Results
Gene compositions of gymnosperm chloroplast tRNA In 54 chloroplast genomes of gymnosperms (Table S1), 1,779 tRNA genes were totally annotated, encoding 20 essential amino acids. The chloroplast tRNA gene content of plants was relatively uniform [8]. In this study, the number of chloroplast tRNA genes in each species was approximately 33 in average. Very few species encode only 27 tRNAs, and very few species encode up to 39 tRNAs. Callitris rhomboidea, Dacrycarpus imbricatus and Pseudotaxus chienii encoded 27 tRNAs; Gnetum parvifolium and Macrozamia mountperriensis encoded 39 tRNAs (Table 1).
In the chloroplast genome of each species, almost every tRNA was encoded, but some tRNAs were not encoded in some species (Table 1). tRNA Ala was found to be missing in 8 species; tRNA Thr , tRNA Glu , tRNA Phe and tRNA Leu were found to be missing in 1 species; tRNA Val was found to be missing in 2 species; tRNA Lys were found to be missing in 14 species; tRNA Gln were found to be missing in 5 species.
There were more tRNA Ser , tRNA Arg and tRNA Leu in the chloroplast genome of all species. tRNA Ser appeared three times in most species and two or four times in some species, and appeared six times in Nothotsuga longibracteata; tRNA Arg and tRNA Leu commonly appeared in various species for 2-3 times, tRNA Leu appeared six times in Ephedra equisetina. tRNA Gly , tRNA Pro and tRNA Thr were the second largest groups, occurring twice in most species. In contrast, suppressor tRNA and selenocysteine were completely absent in the chloroplast genome of gymnosperm.
In addition, the length of the chloroplast tRNAs ranged from 56 nt to 90 nt while the average length was about 82 nt. tRNA Gly (UCC) of Cunninghamia lanceolate is the smallest gene found in this study, only contains 56 nucleotides. The sequence length of tRNA Leu , tRNA Ser and tRNA Tyr were all more than 80 nt. A few tRNA Ser were found to reach 90 nt, and tRNA Gly (UCC) of Sequoia sempervirens was also 90 nt in length. The length of other tRNA groups was all about 73 nt, but a few were shorter than 70 nt.

Gymnosperm chloroplast tRNAs contain 34 anti-codons
The genetic code was degenerate, in that 20 amino acids were encoded by 61 triplet codes [21], but analyses showed gymnosperm tRNA contain 34 different anti-codons in 1,779 tRNAs, the rest of the 27 anti-codons were not found in any of the tRNAs of the investigated gymnosperm chloroplast genomes ( Table 2). The anti-codons encoded in this study include: tRNA Ala (UGC), tRNA Gly (GCC, UCC), tRNA Pro Conservation of gymnosperm chloroplast tRNAs Different tRNAs can transport different amino acids due to their different nucleotide compositions and structures. tRNA sequences were analyzed to nd their conserved regions (Table 3). Through comparative analysis of the nucleotide composition of tRNA loops and arms, it was found that there were conserved nucleotides or nucleotide sequences in multiple positions. Analyses revealed that at the rst position in acceptor arm, tRNA Ala (UGC), tRNA Gly Table 3). The second position of anti-codon loop was found to contain a conserved T nucleotide. The last position of anti-codon loop was mostly a conserved A nucleotide. In addition, it can be noted that the anti-codon loop had a high preference for uracil and adenine. Moreover, nucleotide conservation was very low in the variable region due to its structural variability. But at the last position of the variable region, there were still many tRNAs possess a conserved C nucleotide. The Ψ-arm and Ψ-loop were the most conservative regions whether in nucleotide number or composition of nucleotides. The Ψ-arms all contained ve nucleotides, at the last two positions of this region, the vast majority of tRNAs were G nucleotides. The Ψ-loops all contain seven nucleotides, there was found a totally conserved sequence U-U-C; and vast majority of tRNAs had a conserved U nucleotide at the last position.
Possession of an intact CCA was a basic prerequisite for tRNAs to participate in the decoding process [22]. The 3' terminal of eukaryotic tRNAs generally did not have a CCA sequence, so adding a 3' CCA tail is an important step in tRNA biosynthesis. In the gymnosperms of this study, tRNA Ala , tRNA Arg , tRNA Glu , tRNA Leu , tRNA Tyr and tRNA Lys were found to contain a 3' CCA tail ( Figure. 1), most tRNAs did not have 3' CCA tails.
Nucleotide variation in the arms and loops of tRNA Conservation was also re ected in the number of nucleotides in each loop arm of tRNA. Out of all 1,779 tRNAs in this study, the number of nucleotides in the acceptor arm ranges from 0 to 8 (Table 4) We calculated the minimum free energy (△G) of each type of novel structural tRNA as well as some normal structural tRNAs ( Table 5). The result shows, the minimum free energy of type 1 tRNAs was -12.6 kcal/mol on average. tRNA Gly (UCC) of Cunninghamia lanceolate was very high (△G = -9.6 kcal/mol), it shows that the absence of the acceptor arm leads to a very large impact on the stability of its structure. The minimum free energy of all tRNAs in type 1 was much higher than normal structural tRNAs (△G = -26.5 kcal/mol). Therefore, the absence of the acceptor arm can reduce the structural stability of tRNA in general. The minimum free energy of type 2 was around -19.3 kcal/mol. In type 2, tRNA Gly (GCC) of Sequoia sempervirens possess the lowest value (△G = -28.3 kcal/mol), indicates that the presence of extra nucleotides at the 3' end greatly improves the stability of the structure. On the contrary, tRNA Met (CAU) of Cephalotaxus oliveri possess a highest value (△G = -11.8 kcal/mol), the stability was greatly reduced due to the presence of atypical nucleotides at the 3' end. The minimum free energy of type 3 was -33.2 kcal/mol on average. The minimum free energy values of these tRNAs were basically below -30.0 kcal/mol. tRNA Tyr (GUA) always possess a very low value. Therefore, the loops and arms that appear in the variable region work together with the structure of other regions to create an extremely stable tRNA structure. However, compared with other tRNAs in type 3, tRNA Leu (CAA) was found to be special because of its higher minimum free energy value. The value of tRNA Leu (CAA) was around -26.1 kcal/mol, it's much higher than the average of type 3 (△G = -32.8 kcal/mol) and close to the normal structural tRNAs (△G = -26.5 kcal/mol). So for tRNA Leu (CAA), the structural changes of the variable region did not bring any obvious effects. The minimum free energy of type 4 was -28.3 kcal/mol on average, and the values of each tRNA were quite different, some were above the average value and some were below. Therefore, when the structural changed at the 3'end and the variable region coexist, multiple in uences may be caused. Moreover, took the value of normal structural tRNAs (△G = -26.5 kcal/mol) as a reference, type 1 and type 2 were much higher than the reference value; type 3 was lower than the reference value. As can be seen, various changes in the structure of tRNA can impact its stability.
Gymnosperm tRNA were evolved from multiple common ancestors Constructed a phylogenetic tree used the maximum likelihood method to discuss the evolutionary relationship of all gymnosperm tRNAs ( Figure. 3). Phylogenetic tree contained 2 large clusters and 32 small groups. Cluster I was much larger than cluster II as can be seen contained 28 groups while Cluster II (GUG) had the same case. We can also see from the phylogenetic tree that tRNA Phe (GAA) and tRNA Ile (GAU) were grouped separately, they each occupied a small branch instead of grouping together with the other types of tRNAs.
The rate of transition was higher than transversion Transition refers to a change from one purine to another purine (A to G or G to A) or one pyrimidine to another pyrimidine (C to U/T or U/T to C). Transversion refers to a change from one purine to a pyrimidine (A or G to U/T or C) or the opposite (U/T or C to A or G) [23]. Analyzing patterns of base mutations can help understand the molecular basis of evolution. Table 6 showed the transition and transversion rates for each tRNA as well as the overall level of gymnosperm investigated. As can be seen from

Duplication and loss events of gymnosperms chloroplast tRNA
After a gene duplication event, a copy of each replicated gene pair tends to happen a loss event. And the gene loss events have been happening all the time [24]. We calculated the duplication and loss events of gymnosperms chloroplast tRNA ( Figure. 4). The results showed that 1,333 genes were duplicated whereas 3,657 genes were found to be lost. There were 314 genes happened conditional duplication events. The loss events were far more than duplication events, majority of chloroplast tRNAs underwent loss events during the course of evolution.

Discussion
Distribution of tRNA Analysis of gymnosperm chloroplast tRNAs revealed that tRNA genes were conserved whether in quantity or composition. The number of tRNA genes in the chloroplast genome of each species was not much different, and the rules of the frequency of each tRNA gene were basically the same, with only slight differences. Some tRNAs may be lost occasionally in a few species. For these missing tRNA genes, there may be tRNA genes from the nucleus or other organelles to replace their functions [25]. It has been found that tRNA Ser , tRNA Arg and tRNA Leu always have higher occurrence frequency in chloroplast genome of gymnosperms. Also, the sequence length of tRNA Ser and tRNA Leu can be obviously longer due to the different nucleotides in the variable region. And it has been proven that tRNA Leu had a large variable region [26]. The main function of the variable region of tRNA has not been thoroughly studied. However, it has been shown that larger variable regions can increase the a nity of tRNA for ribosomes and keep the tRNA-ribosomal complex stable in various environments to promote good interaction between tRNA and ribosome [27]. This may indicate that these types of tRNA were more commonly used in plant chloroplasts and were associated with more biological processes.
Suppressor tRNA was a mutated form of tRNAs, it read mRNA in a new way and allow insertion of appropriate amino acids at the mutation site of the protein-coding gene to suppresses the phenotypic effect of coding mutation, which affect the production of functional cellular protein [28, 29,30,31]. So suppressor tRNA was not found in the chloroplast genome of gymnosperms. At the same time, selenocysteine was completely lacked. Selenocysteine was a kind of atypical amino acids [32]. It was the 21st amino acid in ribosome-mediated synthesis of proteins with a codon of UGA. And selenocysteine was found in both prokaryotes and eukaryotes [33]. However, selenocysteine was an oxygen-labile amino acid and had certain toxicity [34], so it wasn't found in chloroplast genome of gymnosperms. In addition, it was found that there was at least one tRNA Met and one tRNA fMet in each species, both corresponding to the anti-codon CAU. It is currently known that tRNA fMet is necessary for the initiation of the protein translation process in prokaryotes [35,36,37,38]. The initiator tRNA was fairly conservative at all times [39]. And in plants, tRNA fMet (CAU) and tRNA Met (CAU) were both indispensable [40]. Interestingly, tRNA Met and tRNA Ile contained the same anti-codon CAU. There was a complex relationship between their identi cation and matching with codons. Previous studies [41,42,43] have shown that, in bacteria, when the C nucleotide in the anti-codon CAU was modi ed, tRNA Ile can recognize isoleucine; while the unmodi ed tRNA Ile with the CAU anti-codon will interact with methionine. And it was proved that the same situation exists in plant chloroplasts [40]. These events can be explained by the prokaryotic origin of the chloroplast.

Distribution of anti-codon
There were totally 64 codons in the genetic code, 61 of them can encode amino acids and 3 were stop codons, but they usually did not all show up together. In gymnosperm chloroplast tRNA genes, there were only 34 kinds of anti-codons found. Some anti-codons occured in every species, and some occured only occasionally in a few species. These 34 kinds of anticodons were able to perform all 61 kinds of anticodons that were responsible for all protein translation in the chloroplast. One reason is the degeneracy of genetic code, in other words, the "Wobble Hypothesis". The rst and second bases of the codon can be rmly paired with the anti-codon, but the third base can form a non-Watson-Crick base pair with the anti-codon [44]. So for some tRNA species, it corresponded to multiple anti-codon types, that is, one amino acid can be carried by multiple tRNAs. Substitutions in protein-coding genes were usually distributed according to the codon structure, and substitutions often occured at the third position of the codon. Moreover, multiple anti-codons corresponding to one tRNA have the same "evolution potential" [45,21]. Another reason is the organism's preference for codon usage. The use of synonymous codons was non-random and had a certain preference, which was mainly determined by the translation process [46]. There was a strong correlation between codon use and tRNA content, and the selection pattern of codon showed a high conservatism in the evolutionary process. Genes with high expression levels often have codons corresponding to more abundant tRNA species [47,48,49]. That is also to say gene expression levels were largely related to codon usage preference [50,51]. This is consistent with the results in Table 1 and Table 2: the total frequency of codons contained in tRNA Ser , tRNA Arg , and tRNA Leu was higher. The selectivity of codons in organisms was due to the fact that when the tRNA usage was high, the presence of richer codons can greatly reduce the risk of depleting the translation mechanism [52].
tRNA secondary structure was very conservative The secondary structure of tRNA was a shape of clover leaf, with the acceptor arm containing 7 nucleotides, the D-arm containing 3-4, the D-loop containing 4-12, the anti-codon arm containing 5, the anti-codon loop containing 3, the variable region containing 4-23, the Ψ-loop containing 5, and the Ψ-arm containing 7 [53,54]. However, a few of the chloroplast tRNAs in gymnosperms were different, and not all tRNAs fully conform to the traditional secondary structure. Moreover, the difference in the number of nucleotides in different regions of tRNA was largely related to the type of tRNA, and even varies with the corresponding anti-codon. For example, in the variable region, tRNA Gly (GCC) contained 4 nucleotides, but tRNA Gly (UCC) contained 5 nucleotides (Table 3). Not only the number of nucleotides, but also the composition of nucleotides in different regions, have similar rules. It can be seen that the sequence of tRNA was very conservative, there were common nucleotides or common sequences in almost every region, and the conservation was largely related to the type of tRNA. From the results of tRNA sequence alignment, the Ψ-loop was the most conserved without any changes, and the Ψ-arm was also extremely conserved, only a small part of the tRNA was mutated in this region. And in Ψ-loop there was a common sequence U-U-C. And it was reported that in thermophilic bacteria, the conserved bases in the Ψ-loop were the determinant of the stability of the tRNA [55]. Anticodon loops were also highly conserved, with most of them contained seven nucleotides. Because the anticodon loop was the region that matches the codon of the mRNA, it required higher accuracy. Under the mediation of tRNA nucleotidyltransferase, adding a conservative C-C-A sequence to the 3' end of tRNA was necessary for tRNA maturation. Only when the CCA tail was obtained can it carry amino acids [56,57]. However, not all cells need to add CCA tails with the help of tRNA nucleotidyltransferase. It has been reported that in bacteria, CCA tails were sometimes included in tRNA gene templates. It has been reported that the templated 3'CCA sequence in bacteria was very common in the initial tRNA (tRNA fMet ), and it was also common in tRNA Tyr [58]. Coincidentally, in our gymnosperm plant chloroplast tRNA gene, tRNA fMet and tRNA Tyr of each species all carried the encoded 3'CCA sequence. This suggests that during the evolution of chloroplast, it retained part of the prokaryotic translation mechanism. In addition, the biggest in uencing factor of protein synthesis was the initiation of translation [59,60], so the templating of 3'CCA can greatly improve the rate of protein expression, because it accelerates the maturation of tRNA.

Phylogenetic relationships
In phylogenetic analysis of all tRNA genes, it was found that tRNA Leu (CAA) was the evolutionary ancestor of them. The same results were obtained in the phylogenetic analysis of chloroplast tRNA in monocots In thermophiles, tRNA folding undergoes adaptive changes to improve its stability. That is, changes in tertiary structure can have a certain effect on the stability of tRNA [67]. In this study, there were some changes in the structure of tRNA, which can be roughly divided into four categories, and there were obvious rules in the corresponding minimum free energy values. Compared with the normal structural tRNAs, the structural changes will have two effects on the minimum free energy of tRNA (make it higher or lower). It is reported that changes in the acceptor arm will increase the free energy of tRNA [68]. This is consistent with the results of this study: the free energy is higher when the acceptor arm is lacking or redundant nucleotides are present. tRNAs with large variable regions were not rare, and even large variable regions have evolved into conserved structures in some types of tRNAs, such as tRNA Leu .
So it indicates this kind of structural change greatly reduced the free energy of tRNA, which improved the stability of tRNA structure. We have learned from table 1 that the occurrence frequency of tRNA Leu was relatively high, which may indicate that this kind of structural change of tRNA Leu was bene cial to its utilization by plants.

Evolution of substitution rate
There were 8 forms of transversion and 4 forms of transition. From the perspective of probability, transversion should be more frequent, but the statistical results show a high transition rate. This phenomenon was called "transition bias" [69]. The reason is that the transition has less effect on the protein than the transversion [70]. Because conversion was the substitution of bases of the same type, and inversion was the substitution of bases of different types. Within the respective family members of purines and pyrimidines, there was little structural difference, while the structural differences between purines and pyrimidines were large. So transition has less effect on protein structure. In addition, the transversion of tRNA Asp was 0. One possibility is that it has not undergone transversion during its evolution. Another possibility is that if a transversion occurs in this gene, the synthesis of this tRNA will be terminated, resulting in an undetectable transversion rate. Overall, the chloroplast tRNA of gymnosperms mainly underwent base transversion during evolution.

Evolution of duplication and loss
In plant genomes, gene duplication and loss occur very frequently and they were important factors in evolution [71,72,73]. Throughout the evolutionary history of chloroplasts, the size of its genome has been shrinking, and it continues to experience gene loss events [74]. Also, it can be seen from Table 1 and Table 2 that some tRNAs were absent in certain species, and nearly half of the anti-codons were also absent. Combined with the analysis of tRNA gene duplication and loss, it can be concluded that the chloroplast tRNA gene of gymnosperm mainly experienced loss events during evolution.

Materials And Methods
Acquisition of chloroplast tRNA genes and analysis of secondary structure Chloroplast genomes of 54 gymnosperms (Table S1) were downloaded from the public database of the National Center for Biotechnology Information (NCBI, https:// www.ncbi.nlm.nih.gov/). Annotation of the chloroplast genomes and extraction of tRNA genes were carried out in geneious [75]. All the sequences obtained of gymnosperms chloroplast tRNA genes were uploaded to the tRNAscan-Se server to predict the secondary structure and record other related results [76]. The free energy calculation of tRNAs with structural changes were performed using the RNAalifold webserver with default parameters.

Multiple sequence alignment
All tRNAs were classi ed according to their different types to nd the consensus sequences of each regions. Similarly, the consensus sequences of each regions were observed at the overall level of tRNA.
Multiple sequence alignment of tRNA gene was performed in the Multalin server [77]. All of the sequences, were used in the alignment analysis with the following parameters in FASTA format; sequence input format, auto; display of sequence alignment, colored; alignment matrix, Blosum61-12-2; gap penalty at opening and extension, default; gap penalty at extremities, none and one iteration only, none; highest consensus value, 90% (default); whereas, low consensus value, 50% (default). In the displayed alignments, red indicates a similarity/conservation of 90% or more; blue indicates a sequence conservation less than 90%. Alignments displayed in black indicates no conservation.

Construction of phylogenetic tree
To reveal the phylogenetic relationship of all the tRNAs, a phylogenetic tree was constructed using

Analysis of transition and transversion
The tRNA gene sequences used to construct the phylogenetic tree were also used to calculate the rate of transition and transversion. All tRNA gene sequences were classi ed according to different types, and then calculate the transition and transversion rates respectively. Similarly, the same calculations were performed at the overall level of tRNA. Use the following parameters to calculate the rates of transition and transversion: analysis, substitution pattern estimation (ML); tree to use, automatic (neighbor-joining tree); statistical method, maximum likelihood; substitution type, nucleotide; model/method, Kimura2parameter model; rates among sites, Gamma distributed (G); no. of discrete Gamma categories, 5; gaps/missing data treatment, partial deletion, site coverage cutoff 95%, and branch swap lter, very strong.

Analysis of gene duplication and loss
To calculate duplication and loss events of tRNA genes, the phylogenetic tree of the tRNA genes and the species tree need to be reconciled. Species tree of 54 gymnosperms was constructed in NCBI taxonomy server (https://www.ncbi.nlm.nih.gov/Taxonomy/CommonTree/wwwcmt.cgi). Reconcile the species tree and gene tree in software Notung2.9 [79].

Declarations
Ethics approval and consent to participate Not applicable.

Consent for publication
Not applicable.

Availability of data and material
All the genomic tRNA sequences used in this study a can be found at NCBI with accession numbers in Table S1.
Declaration of interest