Identification of SPL genes in C. quinoa
A total of 23 CqSPL genes were identified in quinoa using two BLAST methods. These were named CqSPL1-CqSPL23 based on their chromosomal locations (Additional file 2: Table S1). The general characteristics of all CqSPLs, including coding sequence length, molecular weight (MW), isoelectric point (pI), and subcellular localization, were determined; subcellular localization was predicted using CELLO version 2.5 (http://cello.life.nctu.edu.tw/).
Among the 23 CqSPL proteins, CqSPL11 and CqSPL12 were the smallest, each containing only 119 amino acids. In contrast, CqSPL17 was the largest, containing 1190 amino acids. Protein molecular mass ranged from 21.3 kDa (CqSPL12) to 132.135 kDa (CqSPL17), and pI values ranged from 5.74 (CqSPL15) to 10.24 (CqSPL1 and CqSPL12), with a mean of 6.69. We also found that four of the 23 CqSPL proteins contained the ANK domain. Subcellular localization results showed that all CqSPL proteins were located in the nucleus, with seven also present in the endoplasmic reticulum, eight in the cytoplasm and plasmid, nine in the chloroplast, and one (CqSPL9) in the plasmid (Table S1). We also found that C. quinoa contained more SPL genes (23) than A. thaliana (15), S. lycopersicum (15), V. vinifera (17), or S. bicolor (19), but fewer than O. sativa and Z. mays, each of which has 29 SPL genes [37,38,39,40].
Multiple sequence alignment, phylogenetic analysis, and classification of CqSPL proteins
The 23 CqSPL proteins were then divided into eight phylogenetic clades (groups 1–8) based on the previously proposed classification method. Their consensus with the classification groups of Arabidopsis SPL proteins suggests that SPL genes are strongly conserved during molecular evolution (Fig. 1; Additional file 2: Table S1).
Among the eight subfamilies, subfamily II had the most members (6 CqSPLs), while subfamily VI contained only one CqSPL. Subfamilies I, III, V, and VIII had two CqSPL genes each, and subfamilies IV and VII each contained four CqSPLs. The phylogenetic tree also showed that some CqSPLs clustered closely with AtSPLs (bootstrap support ≥ 70) (Fig. 1), which suggests that these proteins might be orthologous and therefore may possess similar biological functions.
Multiple sequence alignment of AtSPLs with the eight CqSPL subfamilies
Previous studies have reported that all SPL proteins contain conserved SBP domains. These include two zinc fingers (Zn-1 and Zn-2) and a bipartite nuclear localization signal (NLS) motif. The basic region consists of 14 conserved amino acids in a span of 70–80 amino acids (Fig. 2, Table S1). In the present study, only subfamily I was found to be not fully conserved between C. quinoa and Arabidopsis. The Zn-1 (Cys3His-type) finger of CqSPL6 (subfamily I) lacked a Cys residue, and the Zn-2 (Cys2HisCys-type) finger of the same protein lacked C2H; these sequences are still conserved in Arabidopsis (Fig. 2). Conversely, the NLS motif was relatively conserved in quinoa but contained a mutation in one of the R residues of the RRRK sequence located at the C-terminus of the SBP domain in Arabidopsis. Finally, we found that the SBP domains of Arabidopsis and C. quinoa were highly similar, which suggests that the SBP structural domain was established at an early stage of plant evolution.
Conserved motifs and structural analysis of CqSPL genes
The exons and introns of CqSPL genes were identified by comparing the coding sequences with their corresponding genomic DNA sequences. These results revealed that the 23 CqSPL genes contained different numbers of exons, ranging from 3 to 17. We also found that the SBP domain was present in most (17 or ~69.5%) CqSPL genes (Fig. 3, Additional files 2 and 3: Tables S1 and S2). Furthermore, CqSPL1, CqSPL12, and CqSPL18 showed identical intron and exon structures, each containing three exons and two introns (Fig. 3B). Six CqSPL genes had four introns, while CqSPL13 and CqSPL17, both of which belong to subfamily II, had the most introns (16) (Fig. 3A, B). Generally, we found that CqSPL genes from the same subfamily had similar gene structures, but subfamily II showed greater differences in the number of introns. This may reflect evolutionary diversification toward more varied functional roles.
Further structural analysis of CqSPL genes identified ten diverse motifs (denoted motifs 1–10). As shown in Fig. 3C, motifs 3 and 4 were widely distributed and were located adjacent to each other in the CqSPLs. CqSPL genes from the same subfamily usually possessed similar motif compositions. For instance, subfamily I genes contained motifs 2, 3, 4, 6, 7, and 9 (except for CqSPL13), while subfamily II contained all motifs (1–10). We also found that subfamilies III, IV, V, VI, VII, and VIII all contained the same motifs (1, 3, and 4). Furthermore, some motifs were found only in specific positions. For example, motifs 3 and 7 were always found at the start and the end of the series of unique motifs, while motif 1 was always located between motifs 3 and 4 in subfamily I (Fig. 3C, Table S2). In general, we found that genes from the same subfamily had similar structural compositions and clustered together, a finding that was consistent with the classification based on the phylogenetic tree.
Chromosomal distribution and gene duplication of CqSPL genes
Using the latest genome database, our analysis of the chromosomal localization of SPL genes demonstrated that the 23 CqSPL genes were unevenly distributed across chromosomes (Chr) 1 to 18 (Fig. 4, Additional file 4: Table S3). Each SPL gene was named according to its physical location on the chromosomes. No CqSPL genes were found on Chr2, Chr4, Chr5, Chr13, Chr17, or Chr18. Chr11 contained the most CqSPL genes (four, or ~17.39% of the total), followed by Chr6, Chr7, and Chr14, each of which contained three (~13.04%), and Chr8 and Chr10, each of which contained two (~8.70%). Finally, Chr1, Chr3, Chr9, Chr12, Chr15, and Chr16 each contained a single CqSPL gene (~4.35%). Almost all SPL genes were located near one end of their chromosomes; Chr7 was an exception. Only one SPL gene duplication event was evident in C. quinoa, which featured CqSPL16 and CqSPL17 on Chr11 (Fig. 4, Table S3).
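The per-chromosome shares quoted above follow directly from the gene counts. The short Python sketch below recomputes them as a consistency check; the counts are taken from the text, while the variable names are illustrative only.

```python
# Number of CqSPL genes per chromosome, as reported in the text above.
counts = {
    "Chr11": 4,
    "Chr6": 3, "Chr7": 3, "Chr14": 3,
    "Chr8": 2, "Chr10": 2,
    "Chr1": 1, "Chr3": 1, "Chr9": 1, "Chr12": 1, "Chr15": 1, "Chr16": 1,
}

# Total number of genes across all chromosomes that carry at least one CqSPL.
total = sum(counts.values())

# Share of the family on each chromosome, in percent, rounded to two decimals.
shares = {chrom: round(100 * n / total, 2) for chrom, n in counts.items()}

for chrom, pct in shares.items():
    print(f"{chrom}: {counts[chrom]} gene(s), {pct:.2f}%")
```

The counts sum to 23, and the twelve chromosomes listed here together with the six chromosomes lacking CqSPL genes account for all 18 chromosomes.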
Gene duplication events, which mainly include tandem repeat events and segmental duplications, play an essential role in gene amplification and the generation of new functions [41]. Tandem repeat events refer to the co-occurrence of two or more genes within a chromosomal region of ~200 kb [42]. Therefore, we performed a duplication event analysis of CqSPL genes to explore the evolutionary conservation of this gene family. We found that the quinoa genome exhibited seven pairs of duplicated fragments but no tandem repeat events (Fig. 5, Additional file 5: Table S4). The 14 paralogs that resulted from the seven pairs of duplicated fragments were denoted LG1-14, and their existence suggests an evolutionary relationship among the CqSPL genes. LG6 had the most CqSPLs (n = 3), followed by LG7, LG10, and LG14 (n = 2 each), while LG1, LG3, LG8, LG9, and LG14 each contained only one. As expected, all genes were linked within their subfamilies. Subfamily II had the most linked genes (four SPL genes), while subfamilies III, IV, V, VII, and VIII had two SPL genes each (Table S4). These results suggest that some CqSPL genes may have arisen through fragment duplication and that these duplication events may have acted as a main evolutionary driver of the neofunctionalization of CqSPL genes.
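The ~200 kb criterion for tandem repeat events can be illustrated with a minimal Python sketch. It assumes that each gene is represented only by its chromosome and start coordinate and that two genes form a candidate tandem pair when they lie on the same chromosome no more than 200 kb apart. The function name, gene names, and coordinates are hypothetical and are not taken from Table S3 or from the software used in this study.

```python
from itertools import combinations


def find_tandem_pairs(genes, window=200_000):
    """Return pairs of genes on the same chromosome whose start
    positions are at most `window` base pairs apart."""
    pairs = []
    for (name_a, chrom_a, start_a), (name_b, chrom_b, start_b) in combinations(genes, 2):
        if chrom_a == chrom_b and abs(start_a - start_b) <= window:
            pairs.append((name_a, name_b))
    return pairs


# Hypothetical coordinates for illustration only.
example_genes = [
    ("geneA", "Chr1", 1_000_000),
    ("geneB", "Chr1", 1_150_000),  # 150 kb from geneA: candidate tandem pair
    ("geneC", "Chr1", 5_000_000),  # same chromosome, but too far away
    ("geneD", "Chr2", 1_100_000),  # close in coordinate, different chromosome
]

tandem = find_tandem_pairs(example_genes)
print(tandem)
```

A dedicated synteny tool would normally also consider sequence similarity and the number of intervening genes; this sketch covers the distance criterion alone.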
Evolutionary analysis of the CqSPL and SPL genes of different species
We selected three monocotyledonous plants (Z. mays, O. sativa, and S. bicolor) and three dicotyledonous plants (A. thaliana, S. lycopersicum, and V. vinifera) for comparisons of SPL genes with CqSPLs. We used sequence data from the 23 CqSPLs and the SPL genes from the six other plants to construct a phylogenetic tree with ten conserved motifs (identified by the MEME web server) using the NJ method implemented in Geneious R11. The CqSPL genes exhibited an uneven distribution in the phylogenetic tree because genes from the same subfamily have the same motifs and therefore cluster together. Almost all SPL genes from the seven plants studied here contained motifs 1, 2, 4, and 5, but the first subfamily in quinoa (CqSPL6 and CqSPL15) did not (Fig. 6, Additional file 2: Table S1). Subfamilies I and II contained the most diverse motifs, and motifs 10 and 7 were almost always distributed at the beginning and the end of the motif patterns, respectively. We also found that motif 9 was always distributed at the end of the pattern in subfamilies III, IV, VII, and VIII. Overall, we found that CqSPL genes from groups I and III showed a high degree of homology with SPL gene clusters from S. lycopersicum. In contrast, most SPL genes in other groups clustered with SPLs from A. thaliana, S. lycopersicum, and V. vinifera, implying that they may be closely related and may therefore have similar functions.
To further understand the phylogenetic relationships among the SPL genes, we constructed comparative syntenic maps of quinoa with the six other representative species. The 23 CqSPL genes showed collinear relationships with various SPLs found in A. thaliana (15), S. lycopersicum (15), V. vinifera (17), S. bicolor (19), O. sativa (29), and Z. mays (29) (Additional file 6: Table S5). The numbers of identified homologous pairs between quinoa and Z. mays, O. sativa, S. bicolor, A. thaliana, S. lycopersicum, and V. vinifera were 3, 3, 6, 16, 20, and 25, respectively (Fig. 7, Table S5).
We found at least one gene from each of the six plants that was collinear with a CqSPL, such as CqSPL21, which was collinear with Solyc05g015840/EER97011/AT5G50670.2/VIT_14s0068g01780/BGIOSGA005075/Zm00001d021056. This suggests that these orthologous genes were more highly conserved before divergence. We therefore speculate that they might have played an essential function in the evolution of the quinoa SPL gene family. Interestingly, some gene pairs collinear with 12 CqSPL genes were identified in A. thaliana, S. lycopersicum, and V. vinifera but not in S. bicolor, O. sativa, or Z. mays. This suggests that these orthologous pairs might have been formed via gene duplication events during the differentiation of dicotyledonous and monocotyledonous plants.
Expression patterns of CqSPL genes in different plant organs
The relative expression levels of 15 representative genes (selected from the eight subfamilies) were then analyzed in four organs (root, stem, leaf, and flower) by qRT-PCR to evaluate the potential function of CqSPL genes. We found that the CqSPL genes exhibited different expression patterns in roots, stems, leaves, and flowers, suggesting that these genes might play different regulatory roles. Three genes (CqSPL3, CqSPL7, and CqSPL19) showed the highest expression levels in stems, while eight genes (CqSPL2, CqSPL5, CqSPL6, CqSPL9, CqSPL11, CqSPL14, CqSPL15, and CqSPL20) showed the highest expression levels in leaves. Finally, CqSPL1, CqSPL12, CqSPL18, and CqSPL20 were highly expressed in flowers (Fig. 8A) (p < 0.05). Most genes from the same subfamily exhibited similar expression patterns, suggesting that their functions might also be similar. In general, we found that CqSPL genes were expressed in root tissue to a lesser extent than in stems, leaves, or flowers. Therefore, we speculated that SPL genes might be more closely associated with stem, leaf, and flower development. The qRT-PCR analysis also showed differential expression patterns of SPL genes in different tissues and provided preliminary confirmation of the biological functions of SPL genes in quinoa.
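Relative expression from qRT-PCR data is commonly derived from cycle threshold (Ct) values. The Python sketch below assumes the widely used 2^-ΔΔCt calculation with a single reference gene and a calibrator sample; this section does not state which quantification method was applied, so both the method and the Ct values shown are assumptions for illustration only.

```python
def relative_expression(ct_target, ct_ref, ct_target_cal, ct_ref_cal):
    """Fold change of a target gene relative to a calibrator sample,
    computed with the 2^-ddCt approach (assumed here, not confirmed by the text).

    ct_target, ct_ref         : Ct of target and reference gene in the test sample
    ct_target_cal, ct_ref_cal : Ct of target and reference gene in the calibrator
    """
    delta_ct_sample = ct_target - ct_ref              # normalize test sample
    delta_ct_calibrator = ct_target_cal - ct_ref_cal  # normalize calibrator
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2 ** (-delta_delta_ct)


# Hypothetical Ct values: the target amplifies 3 cycles earlier in the
# test sample than in the calibrator, with an unchanged reference gene.
fold_change = relative_expression(22.0, 18.0, 25.0, 18.0)
print(fold_change)
```

Under this assumption, a difference of three cycles corresponds to an eightfold higher relative expression in the test sample.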
Next, we reasoned that some CqSPLs might regulate fruit development in quinoa, thereby affecting its nutritional composition and development rate [3, 4]. We then analyzed the expression of 15 CqSPL genes at five different post-anthesis intervals (i.e., 7 DPA, 14 DPA, 21 DPA, 28 DPA, and 35 DPA) to identify CqSPLs that may regulate genes related to fruiting. Our results showed that most CqSPL genes exhibited different expression patterns at the five stages of fruit development. We found a significant increase in the expression of two genes (CqSPL2 and CqSPL15) and a decrease in the expression of another two genes (CqSPL7 and CqSPL18) in quinoa fruit. Interestingly, we also found that CqSPL1, CqSPL3, CqSPL5, CqSPL11, and CqSPL20 showed the highest expression on day 21 of fruit development, while the expression of most CqSPL genes (i.e., CqSPL5, CqSPL11, CqSPL12, CqSPL14, CqSPL18, CqSPL19, and CqSPL20) was the highest at 28 days (Fig. 8C) (p < 0.05). These findings suggest that SPL genes play an essential role in fruit development and provide a theoretical basis for studying the nutritional value of quinoa. Furthermore, we observed notable correlations between patterns of CqSPL gene expression (Fig. 8). In general, we observed positive correlations between the expression levels of most CqSPL genes. However, we also found significant negative correlations between the expression levels of several CqSPL genes, such as CqSPL6 with CqSPL21/CqSPL1 and CqSPL1 with CqSPL9 (p < 0.05).
Expression patterns of CqSPL genes under abiotic stress conditions
To determine whether different abiotic stresses affected the expression of CqSPL genes, we then evaluated the expression of 15 CqSPL genes in root, leaf, and stem tissue after subjecting plants to one of six abiotic stress treatments. Our results showed that some CqSPL genes were significantly upregulated, while others were significantly downregulated, under different stress treatments. Most CqSPL genes also showed significant differences in expression levels among different tissues, and this effect often increased with treatment time, depending on the stress treatment [43]. For example, the expression of most SPL genes was upregulated by cold stress treatment in stems, and the expression of CqSPL11 and CqSPL12 was initially upregulated but later downregulated in roots, leaves, and stems. Moreover, in stems under flooding stress, CqSPL1 and CqSPL5 were significantly upregulated, while CqSPL2 was significantly downregulated. In general, most genes exhibited different patterns in plants subjected to different treatments and were significantly downregulated during the early phases of the treatments. CqSPL1, CqSPL7, CqSPL5, CqSPL18, and CqSPL20 demonstrated similar expression patterns under different conditions. Moreover, we also found that in all tissue types many SPLs were upregulated after prolonged treatment times, indicating that their expression can be rapidly inhibited by abiotic stress. However, the expression patterns of some SPLs, including CqSPL2, CqSPL19, and CqSPL20, showed the opposite trend. For example, their expression was upregulated by heat stress but downregulated by cold stress in stem samples (Fig. 9) (p < 0.05). Notably, we found that CqSPL1 was highly expressed in all plant tissues under all six stress treatments. Thus, it may be generally responsible for abiotic stress responses in quinoa.
The expression patterns of CqSPL genes showed instances of coordinated expression in response to several abiotic stress treatments (Fig. 9B). Moreover, we observed positive correlations between the expression levels of most CqSPL genes. For example, ten genes (i.e., CqSPL12, CqSPL15, CqSPL2, CqSPL3, CqSPL18, CqSPL6, CqSPL19, CqSPL11, CqSPL9, and CqSPL14) were significantly positively correlated with each other, and CqSPL1 and CqSPL5 were also significantly positively correlated with each other. On the other hand, we also identified pairs of CqSPL genes (e.g., CqSPL5 and CqSPL20) whose expression levels were significantly negatively correlated (p < 0.05).
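Pairwise correlations of the kind described above can be computed from expression profiles measured across the same set of conditions. The Python sketch below assumes a Pearson correlation coefficient; the correlation measure actually used is not stated in this section, and the gene names and expression values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(var_x * var_y)


# Hypothetical relative expression values across five treatments.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.0, 4.0, 6.0, 8.0, 10.0],  # rises together with geneA
    "geneC": [5.0, 4.0, 3.0, 2.0, 1.0],   # falls as geneA rises
}

r_ab = pearson(profiles["geneA"], profiles["geneB"])
r_ac = pearson(profiles["geneA"], profiles["geneC"])
print(r_ab, r_ac)
```

A coefficient near +1 indicates coordinated expression, whereas a coefficient near -1 indicates opposing expression; a significance test would be needed to support statements such as p < 0.05.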