De novo assembly and analysis of Polygonatum cyrtonema Hua and identification of genes involved in polysaccharide and saponin biosynthesis

Background Investigating the molecular mechanisms of polysaccharide and saponin metabolism is critical for the genetic engineering of Polygonatum cyrtonema Hua to raise the content of its major active ingredients. Although transcript sequences are available for different tissues of P. cyrtonema, a broad survey of temporal transcript changes in rhizomes of different ages has been lacking. Results Transcriptome sequencing was performed for rhizomes of different ages. A total of 62,635 unigenes were generated by assembling transcripts from all samples. A total of 89 unigenes encoding key enzymes involved in polysaccharide biosynthesis and 56 unigenes encoding key enzymes involved in saponin biosynthesis were identified. The content of total polysaccharide and total saponin was positively correlated with the expression patterns of mannose-6-phosphate isomerase (MPI), GDP-L-fucose synthase (TSTA3), UDP-apiose/xylose synthase (AXS), UDP-glucose 6-dehydrogenase (UGDH), hydroxymethylglutaryl CoA synthase (HMGS), mevalonate kinase (MVK), 2-C-methyl-D-erythritol 2,4-cyclodiphosphate synthase (ispF), (E)-4-hydroxy-3-methylbut-2-enyl-diphosphate synthase (ispG), 4-hydroxy-3-methylbut-2-enyl diphosphate reductase (ispH) and farnesyl diphosphate synthase (FPPS). Finally, a number of key genes were selected and quantitative real-time PCR was performed to validate the transcriptome analysis results. Conclusions These results link polysaccharide and saponin biosynthesis to gene expression, provide insight into the underlying key active substances, and reveal novel candidate genes, including TFs, whose functions and value merit further exploration. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-022-08421-y.


Background
Polygonatum cyrtonema Hua (Asparagaceae) is a renowned traditional Chinese herb and also an edible plant. It has been widely applied in the treatment of many conditions, such as dizziness and coughs [1]. In the Chinese Pharmacopoeia, "polygonati rhizoma" is prescribed as the dried rhizome of Polygonatum cyrtonema Hua, Polygonatum kingianum Coll. et Hemsl and Polygonatum sibiricum Red [2]. A variety of medicinally effective ingredients have been isolated from "polygonati rhizoma", including polysaccharides, saponins and flavonoids, and these ingredients exhibit a variety of important pharmacological activities, such as antioxidant, immunomodulatory and anti-inflammatory activities [3][4][5]. Previous research demonstrated that the content of these effective ingredients, including total polysaccharides and total saponins, in P. cyrtonema plants changes with growth environment, cultivation technique and growth years [6,7]. This variation is of great significance for understanding the biosynthesis and metabolism of polysaccharides and saponins.
A large number of transcriptomic, proteomic and metabolomic studies have been carried out in the post-genomic era [18]. In particular, because it allows accurate quantification of gene expression even when a reference genome is lacking, transcriptome sequencing (RNA-Seq) has proven to be one of the most useful and cost-effective techniques for studying metabolic pathways and identifying functional genes related to effective ingredients [19].
In this study, we conducted a comprehensive analysis of the transcriptomes of P. cyrtonema rhizomes of different growth years and identified numerous candidate genes related to polysaccharide and triterpene saponin biosynthesis. The quality of our dataset was verified by quantitative real-time PCR (qRT-PCR). Our results provide a foundation for future research on the molecular mechanisms of polysaccharide and triterpene saponin biosynthesis in this species.

Total polysaccharide content of P. cyrtonema samples
We extracted polysaccharides from P. cyrtonema rhizomes of different growth years. Total polysaccharide content increased with rhizome age up to the third year: it was highest in three-year rhizomes (16.004%) and then decreased from three-year to four-year rhizomes. The lowest value was found in one-year rhizomes (7.76%) (Additional file 1: Fig. S1).

Total saponin content of P. cyrtonema samples
Total saponins were extracted from P. cyrtonema rhizomes of different growth years. Total saponin content increased with rhizome age up to the third year: it was highest in three-year rhizomes and then decreased from three-year to four-year rhizomes (Additional file 2: Fig. S2).

Illumina sequencing and de novo transcriptome assembly
Sequencing data quality is summarized in Additional file 3: Table S1. All data sets had Q30 ≥ 94.79%. A total of 62,635 unigenes were generated. These unigenes had a mean length of 1007.11 bp and an N50 value of 1456 bp; 34.14% (21,388) and 63.47% (39,752) of them exceeded 1000 bp and 500 bp in length, respectively (Additional file 4: Fig. S3).  Fig. S4). A total of 18,283 of these unigenes were matched with one or more GO terms and fell into 50 functional groups (Additional file 6: Fig. S5). 'Cellular process' and 'metabolic process' were the most abundant categories within biological processes, while 'binding' and 'catalytic activity' were the most abundant within molecular function. Unigenes with FPKM > 1 were counted in each sample. On average, 44,403, 43,030, 41,245 and 45,076 unigenes were expressed in one-year, two-year, three-year and four-year rhizome samples, respectively (Fig. 1a). The overall gene expression level was highest in four-year rhizomes (Fig. 1b).
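The mean length and N50 reported above are standard assembly summary statistics. The following is a minimal sketch of how they are commonly computed from a list of unigene lengths; the function name and the toy lengths are hypothetical and are not taken from the study's pipeline.

```python
def assembly_stats(lengths):
    """Return (mean_length, n50) for a list of sequence lengths in bp.

    N50 is the length L such that sequences of length >= L together
    contain at least half of all assembled bases.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    n50 = ordered[-1]
    for length in ordered:
        running += length
        # Stop at the first sequence that pushes the cumulative sum
        # to at least half of the total assembled bases.
        if running * 2 >= total:
            n50 = length
            break
    return total / len(ordered), n50


# Toy lengths for illustration only (not data from the paper).
toy_lengths = [2000, 1500, 1000, 500, 300, 200]
mean_len, n50 = assembly_stats(toy_lengths)
print(f"mean = {mean_len:.2f} bp, N50 = {n50} bp")
```

Because N50 weights long sequences more heavily than the mean does, an N50 larger than the mean length (as reported here) is the usual pattern for transcriptome assemblies.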

Identification of genes involved in polysaccharide biosynthesis
To identify the most noteworthy biological processes in P. cyrtonema, a total of 12,015 unigenes were annotated and assigned to 125 pathways (20 subcategories) (Additional file 7: Fig. S6 and Additional file 8: Table S2). The 'carbohydrate metabolism' subcategory comprised 14 pathways, of which glycolysis/gluconeogenesis contained the largest number of unigenes (235). In addition, 588 unigenes were assigned to polysaccharide biosynthesis pathways, including amino and nucleotide sugar metabolism, fructose and mannose metabolism, glycolysis/gluconeogenesis, and pentose and glucuronate interconversions (Fig. 2a). A total of 10 pathways were assigned to the biosynthesis of other secondary metabolites, and the largest number of unigenes within this set belonged to the phenylpropanoid biosynthesis pathway (Fig. 2b).

Identification of genes involved in saponins biosynthesis
To improve our understanding of triterpene saponin biosynthesis, we also annotated unigenes involved in the terpenoid backbone biosynthesis (Ko00900) and carotenoid biosynthesis (Ko00906) pathways based on the KEGG database. A total of 56 unigenes encoding key enzymes were identified (Table 3). These data enabled the identification of genes encoding enzymes involved in triterpene saponin biosynthesis using the FPKM approach (Fig. 5).

Validation and expression analysis of genes encoding key enzymes
To validate the reliability of the transcriptome sequencing data, the expression levels of genes encoding beta-fructofuranosidase (sacA), fructokinase (scrk), mannose-6-phosphate isomerase (MPI), phosphoglucomutase (PGM), UDP-apiose/xylose synthase (AXS), hydroxymethylglutaryl CoA synthase (HMGS), mevalonate diphosphate decarboxylase (MVD), isopentenyl-diphosphate delta-isomerase (IDI), farnesyl diphosphate synthase (FPPS), squalene synthase (SS) and others were tested by qRT-PCR assays. The qRT-PCR data for these 16 genes were largely consistent with the RNA-Seq data (Fig. 6; the numerical values of the error bars are presented in Additional file 9: Table S3). Overall, these results indicate that our transcriptome data are reliable for analyzing temporal gene expression during rhizome development in P. cyrtonema.

Identification of DEGs
Differentially expressed genes (DEGs) were identified among rhizomes of all developmental stages using the FPKM values of unigenes. When one-year rhizomes were set as the control, 8850, 13,361 and 23,107 DEGs (p-value < 0.05 and fold change > 1.5) were identified in two-year, three-year and four-year rhizomes, respectively. When two-year rhizomes were set as the control, a total of 9101 and 18,067 DEGs were identified in three-year and four-year rhizomes, respectively. When three-year rhizomes were set as the control, a total of 23,332 DEGs were identified in four-year rhizomes (Fig. 7).
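The thresholds quoted in this section (p-value < 0.05 and fold change > 1.5) can be expressed as a simple filter. The sketch below is illustrative only: it computes the fold change directly from two FPKM values with a small pseudocount, which is an assumption made here to avoid division by zero, and it counts both up- and down-regulated genes. The study itself reports using DESeq, which models read counts rather than FPKM ratios, so this is not the authors' procedure.

```python
import math


def is_deg(fpkm_control, fpkm_treated, p_value,
           min_fold_change=1.5, max_p_value=0.05, pseudocount=1e-3):
    """Return True if a unigene passes the fold-change and p-value cutoffs.

    The fold change is treated symmetrically on the log2 scale, so a gene
    that is 1.5-fold down-regulated passes just like one 1.5-fold up.
    The pseudocount is a hypothetical choice, not a value from the paper.
    """
    log2_fc = math.log2((fpkm_treated + pseudocount) /
                        (fpkm_control + pseudocount))
    return p_value < max_p_value and abs(log2_fc) > math.log2(min_fold_change)


# Toy values for illustration only (not data from the paper).
print(is_deg(10.0, 20.0, 0.01))  # 2-fold up, significant
print(is_deg(10.0, 12.0, 0.01))  # 1.2-fold up, below the fold-change cutoff
print(is_deg(10.0, 40.0, 0.20))  # large change, but p-value too high
```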

Discussion
P. cyrtonema is a well-known medicinal and edible plant with a variety of biological activities, such as anti-aging, nourishing yin, anti-inflammatory and immunomodulatory activities [3][4][5]. Although polysaccharides and saponins are its major effective constituents, no genomic data are available to date, and only one transcriptome dataset, without biological replicates, is available for three tissues of P. cyrtonema [20]. This is clearly inadequate for elucidating the molecular mechanisms underlying the biosynthesis of active constituents such as polysaccharides and saponins. In this study, we obtained a more reliable and higher-quality assembly (unigenes with an average length of 1007.11 bp) than the previous transcriptome data (mean length 710 bp) for P. cyrtonema. Our data also enrich the available gene expression data, facilitate the selection of key candidate genes involved in polysaccharide and saponin biosynthesis, and narrow down the number of candidate genes to be verified. A large number of unigenes participating in polysaccharide and saponin biosynthesis were identified (Figs. 4 and 5). In the polysaccharide biosynthesis pathway, the genes encoding MPI, AXS, TSTA3, UER1, GALE and UGDH were highly expressed in three-year rhizomes compared with rhizomes of other ages, and their expression pattern is consistent with the accumulation pattern of polysaccharide during rhizome development from one to four years (Fig. 4, Additional file 1: Fig. S1), whereas the genes encoding HK and scrk showed an expression pattern opposite to polysaccharide accumulation. Similar phenomena were observed in previous studies [20][21][22][23]. We speculate that MPI, AXS, TSTA3, UER1, GALE and UGDH are key enzyme genes that play vital roles in regulating the polysaccharide content of P. cyrtonema rhizomes, and that HK and scrk mainly participate in other pathways, such as sugar signaling and carbohydrate metabolism [24].
In the saponin biosynthesis pathway, the genes encoding HMGS, MVK, ispF, ispG, ispH and FPPS were highly expressed in three-year rhizomes, and their expression pattern is consistent with the accumulation pattern of total saponin during rhizome development (Fig. 5, Additional file 2: Fig. S2). It therefore seems that both the MEP and MVA pathways participate in saponin biosynthesis [15,17].
Many TFs have been isolated and shown to participate in diverse plant biological processes, including the biosynthesis of polysaccharides, saponins and other secondary metabolites. In our results, a total of 380 candidate TFs were assigned to the AP2/ERF-ERF, WRKY, NAC, bHLH, C2H2, C3H and MYB-related families; these TFs probably play roles in regulating polysaccharide and saponin biosynthesis. Previous studies revealed that GubHLH3 positively regulates soyasaponin biosynthetic genes in Glycyrrhiza uralensis [25] and that the bHLH transcription factors TSAR1 and TSAR2 regulate triterpene saponin biosynthesis in Medicago truncatula [26]. In this study, a total of 85 candidate unigenes encoding bHLH TFs were identified, of which 18 and 9 were up-regulated in the three-year rhizome compared with other-year rhizomes, respectively (Table 4). Over-expression of the AtMYB46 gene can enhance the mannan content of hemicellulose polysaccharides [27]. A total of 67 candidate unigenes encoding MYB TFs were identified, of which 19 and 4 were up-regulated in the three-year rhizome, respectively. These up-regulated unigenes are valuable targets for subsequent studies of the regulation of polysaccharide and saponin biosynthesis in P. cyrtonema. Characterizing them will help to clarify the molecular mechanisms underlying polysaccharide and saponin biosynthesis.

Ethics statement
Experimental materials were harvested across China, and the field studies did not involve endangered or protected species. This study was conducted at the Guizhou Key Laboratory of Propagation and Cultivation on Medicinal Plants in Southwest China, Guiyang, China.

Extraction and determination of total polysaccharide and saponins
Total polysaccharides were extracted from freeze-dried rhizome samples of P. cyrtonema and quantified as described in the Chinese Pharmacopoeia [2]. Total saponins were extracted and quantified by colorimetry. Three replicates were performed, and statistical analysis was carried out with SPSS 22.0 software.

Total RNA extraction, cDNA library construction and sequencing
Total RNA of one-year, two-year, three-year and four-year rhizomes, with three biological replicates each, was isolated using an E.Z.N.A. Plant RNA Kit (Omega Biotech Co. Ltd., USA) (Additional file 10: Table S4). RNA quality, including integrity and concentration, was evaluated

De novo assembly and unigene function annotation
Low-quality reads were removed before data analysis, and the high-quality clean reads were assembled using Trinity software [29]. For unigenes with no BLAST hits, coding sequences (CDS) were predicted with ESTScan [30]. Functional annotation of unigenes was performed by sequence similarity against seven databases: NCBI non-redundant (Nr), Swiss-Prot, KEGG (Kyoto Encyclopedia of Genes and Genomes), KOG, eggNOG, GO and Pfam. In addition, GO functional annotations were obtained from the Nr annotation using Blast2GO (version 2.5.0) [31]. KEGG Orthology annotations were further assigned using the BLASTX algorithm against the KEGG database.

Differential expression analysis
The expression levels of unigenes in the four groups of rhizomes with different growth years were analyzed using the Expression Analyzer and DisplayER (EXPANDER) software [32]. The abundance of the corresponding unigene transcripts was determined by the FPKM method. We compared unigene expression levels between pairs of rhizome groups (e.g., one-year rhizome vs. two-year rhizome) using DESeq software [33]. Unigenes with FDR ≤ 0.001 and fold change (FC) ≥ 2 were identified as DEGs.
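For readers unfamiliar with the FPKM method mentioned above, the following sketch shows the standard definition: fragments per kilobase of transcript per million mapped fragments. The function name and the example numbers are hypothetical; the study's own quantification software may apply additional corrections (for example, an effective transcript length).

```python
def fpkm(fragments, length_bp, total_mapped_fragments):
    """Fragments Per Kilobase of transcript per Million mapped fragments.

    fragments: fragments mapped to this unigene
    length_bp: unigene length in base pairs
    total_mapped_fragments: all mapped fragments in the library
    """
    if length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    # 1e9 = 1e3 (bp per kilobase) * 1e6 (fragments per million)
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


# Toy numbers for illustration only (not data from the paper):
# 500 fragments on a 1000 bp unigene in a library of 20 million fragments.
example_value = fpkm(500, 1000, 20_000_000)
print(example_value)
```

Normalizing by both length and library size is what makes values such as the FPKM > 1 expression cutoff comparable across unigenes and samples.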

Analysis of transcription factors (TFs)
For the P. cyrtonema transcriptome data, open reading frames (ORFs) were determined with the getorf software [34]. These ORFs were then aligned to all TF protein domains in the plant transcription factor database (PlnTFDB) via BLASTX (e-value ≤ 1e-5) [35].
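The e-value cutoff used for TF assignment can be applied as a post-filter on BLAST results. The sketch below assumes the search was written in BLAST's standard 12-column tabular format (`-outfmt 6`), which the paper does not state, and keeps the highest-bit-score hit per query. The function name and the sequence identifiers in the example are hypothetical.

```python
def best_tf_hits(blast_lines, max_evalue=1e-5):
    """Keep the best hit per query from BLAST tabular (outfmt 6) lines.

    Columns assumed: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore.
    Returns {query_id: (subject_id, evalue, bitscore)}.
    """
    best = {}
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue  # hit fails the e-value cutoff
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Hypothetical hits for illustration only (not data from the paper).
example_lines = [
    "orf_1\tbHLH_domain\t62.5\t80\t30\t0\t1\t80\t5\t84\t2e-30\t110.0",
    "orf_1\tMYB_domain\t40.0\t60\t36\t0\t1\t60\t3\t62\t1e-08\t55.0",
    "orf_2\tWRKY_domain\t35.0\t40\t26\t0\t1\t40\t2\t41\t0.003\t30.0",
]
hits = best_tf_hits(example_lines)
print(hits)
```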

Real-time PCR (qRT-PCR) analysis
Total RNA was isolated from P. cyrtonema rhizomes (one-year, two-year, three-year and four-year) using the E.Z.N.A. Total RNA Kit I (Omega, USA) and reverse-transcribed to cDNA with TaKaRa reverse transcription reagents (TaKaRa Bio, Dalian, China). The elongation factor 1-ɑ gene (EF1ɑ, TRINITY_DN27092_c0_g5_i1_1) was selected as the endogenous reference for normalization on the basis of its expression level and stability in the transcriptome data. Specific primers were designed with Primer 3.0 (Additional file 11: Table S5). Real-time PCR was performed with the QuantiNova SYBR Green PCR kit (Qiagen). The expression of each target gene relative to the reference gene was calculated by the 2^-ΔΔCt method [36]. Data are presented as the mean ± standard deviation (SD) of three reactions performed in different 96-well plates. The data were analyzed using CFX Manager™ v3.0.
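The 2^-ΔΔCt calculation referred to above can be sketched as follows. This is a minimal illustration under the usual assumption of roughly 100% amplification efficiency for both target and reference genes; the choice of calibrator sample (for instance, one-year rhizome) and all Ct values shown are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev


def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_calibrator, ct_ref_calibrator):
    """Relative expression by the 2^-ddCt method.

    dCt  = Ct(target) - Ct(reference), computed within each sample
    ddCt = dCt(sample) - dCt(calibrator)
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_calibrator = ct_target_calibrator - ct_ref_calibrator
    return 2 ** -(delta_sample - delta_calibrator)


# Hypothetical Ct values for three replicate reactions:
# (target in sample, reference in sample, target in calibrator,
#  reference in calibrator)
replicates = [
    (22.0, 18.0, 24.0, 18.0),
    (22.5, 18.0, 24.0, 18.0),
    (21.5, 18.0, 24.0, 18.0),
]
values = [relative_expression(*r) for r in replicates]
print(f"{mean(values):.2f} ± {stdev(values):.2f}")
```

A result of 4 for the first replicate means the target transcript is four times as abundant in the sample as in the calibrator after normalization to the reference gene.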

Conclusion
A comprehensive transcriptome analysis of one-year, two-year, three-year and four-year rhizomes of P. cyrtonema, each with three replicates, was carried out, and abundant genes and TFs related to the biosynthesis and regulation of polysaccharides and saponins were identified. In addition, SSR markers were found in the transcriptome data, which will facilitate the identification of P. cyrtonema plants. We used qRT-PCR to validate the transcriptome sequencing results. Our results help to elucidate the polysaccharide and saponin biosynthesis pathways and will facilitate future research on the accumulation of secondary metabolites in P. cyrtonema.