Skip to main content

Transcriptome analysis of Clinopodium gracile (Benth.) Matsum and identification of genes related to Triterpenoid Saponin biosynthesis



Clinopodium gracile (Benth.) Matsum (C. gracile) is an annual herb with pharmacological properties effective in the treatment of various diseases, including hepatic carcinoma. Triterpenoid saponins are crucial bioactive compounds in C. gracile. However, the molecular understanding of the triterpenoid saponin biosynthesis pathway remains unclear.


In this study, we performed RNA sequencing (RNA-Seq) analysis of the flowers, leaves, roots, and stems of C. gracile plants using the BGISEQ-500 platform. The assembly of transcripts from all four types of tissues generated 128,856 unigenes, of which 99,020 were mapped to several public databases for functional annotation. Differentially expressed genes (DEGs) were identified via the comparison of gene expression levels between leaves and other tissues (flowers, roots, and stems). Multiple genes encoding pivotal enzymes, such as squalene synthase (SS), or transcription factors (TFs) related to triterpenoid saponin biosynthesis were identified and further analyzed. The expression levels of unigenes encoding important enzymes were verified by quantitative real-time PCR (qRT-PCR). Different chemical constituents of triterpenoid saponins were identified by Ultra-Performance Liquid Chromatography coupled with quadrupole time-of-flight mass spectrometry (UPLC/Q-TOF-MS).


Our results greatly extend the public transcriptome dataset of C. gracile and provide valuable information for the identification of candidate genes involved in the biosynthesis of triterpenoid saponins and other important secondary metabolites.


Clinopodium gracile (Benth.) Matsum (C. gracile), known as the tower flower, is a traditional Chinese herb, which belongs to Lamiaceae and grows on wasteland, roadsides, and hillsides [1, 2]. Approximately 20 Clinopodium species are found in Europe, Central Asia, and East Asia [3]. According to the results of previous studies, triterpenoid saponins in C. gracile exhibit several pharmacological effects, as these possess anti-inflammatory [4], anti-hepatoma [4, 5], cardioprotective [6], anti-tumor [7], and immunoregulatory [8] properties. Triterpenoid saponins were synthesized from two C5 isoprene units, isopentenyl pyrophosphate (IPP) and dimethylallyl diphosphate (DMAPP). These components were condensed in a sequential manner by prenyltransferases, resulting in the formation of prenyl diphosphates, such as geranyl pyrophosphate (GPP) and farnesyl pyrophosphate (FPP), which are further transformed into the carbocyclic skeleton of triterpenoid saponins by the action of squalene synthase (SS) and squalene epoxidase (SE). Finally, the backbone is chemically modified by cytochrome P450 monooxygenase (CYP450) and UDP-glycosyltransferase (UGT), resulting in the production of different types of triterpenoid saponins [9,10,11,12]. However, genes encoding key triterpenoid saponin biosynthesis enzymes in C. gracile are largely unknown.

Squalene synthase (SS; EC is a key bifunctional enzyme in the terpenoid biosynthesis pathway; SS first catalyzes the formation of presqualene diphosphate (PSPP) from two molecules of FPP and then converts PSPP to squalene with NADPH and Mg2+ ions [13, 14]. The level of SS gene expression shows a positive correlation with the amount of triterpenes [15]. Although SS cDNAs have been cloned and analyzed from a number of herbal plant species, such as Glycyrrhiza glabra [16], Siraitia grosvenorii [17], and Lotus japonicas [18], no sequence or structural information is available on the SS gene in C. gracile.

RNA sequencing (RNA-Seq) analysis has been used to capture both coding and non-coding sequences and to quantify gene expression not only in a heterogeneous mixture of cells, tissues, and organs but also at the whole organism level [19]. Thus, RNA-Seq is a powerful tool for acquiring gene expression information and is of great significance in the mining of functional genes, analysis of gene expression profiles, and discovery of genetic metabolic networks [20]. At present, transcriptome sequencing has been applied to medicinal plants of the Labiatae family, including Scutellaria baicalensis Georgi [21], Ocimum sanctum and Ocimum basilicum [22], Mentha piperita and Mentha arvensis [23], and Clinopodium chinense [24]; however, the use of RNA-Seq analysis has not been reported in C. gracile to date.

In this study, we performed deep transcriptome analysis of four different tissues (flowers, leaves, roots, and stems) of C. gracile plants and identified genes potentially involved in triterpenoid saponin biosynthesis. This work lays a foundation for further exploration of the molecular mechanism of triterpenoid saponin biosynthesis in C. gracile.


Total saponin content in different tissues of C. gracile

Saponins were extracted from approximately 0.1 g dried powder of flowers, leaves, stems, and roots of C. gracile. Total saponin content was the highest in leaves (0.29%), followed by stems (0.23%), flowers (0.21%), and roots (0.18%), with a standard error of 0.0005, 0.0004, 0.0033, and 0.0011, respectively. The results of analysis of variance (ANOVA) of C. gracile data showed that the differences among the total saponin content of the four tissues were statistically significant (Fig. 1, Additional file 1: Table S1).

Fig. 1

The total saponin content of leaves, stems, flowers, and roots of Clinopodium gracile (Benth.) Matsum plants. The X-axis indicates the type of C. gracile tissues, and the Y-axis indicates the total saponin content. Significant differences were determined by analysis of variance (ANOVA)

Analysis of triterpenoid saponins in C. gracile by UPLC/Q-TOF-MS

Qualitative analysis of triterpenoid saponin metabolites of C. gracile was conducted by UPLC/Q-TOF-MS. The results confirmed the presence of Buddlejasaponin IV in C. gracile. Additionally, Saikosaponin a, Clinoposaponin III, and Clinoposaponin V probably also exist in C. gracile, according to the retention time (tR), maximum ultraviolet absorption wavelength (λmax), molecular ion peak, and ESI-MS data (Fig. 2, Additional file 2: Table S2).

Fig. 2

MS/MS spectra of Saikosaponin a (a), Buddlejasaponin IV (b), Clinoposaponin III (c) and Clinoposaponin V (d) of C. gracile by UPLC/Q-TOF-MS in negative ion mode

RNA sequencing and the de novo assembly of C. gracile transcriptome

RNA-Seq analysis of C. gracile flowers, leaves, stems, and roots generated 43.81 Gb of high quality reads. Full-length transcripts were reconstructed, and a total of 128,856 unigenes, which refer to a uniquely assembled transcript or a cluster of genes that perform a particular function, were generated using Trinity and TGICL, with a mean length of 1354 bp and an N50 value of 2268 bp. Of the 128,856 unigenes, 47.37% (61,043 unigenes) were longer than 1000 bp, and 34.87% (44,929 unigenes) were longer than 1500 bp (Additional file 3: Figure S1).

Functional classification and expressed overview of unigenes

Of the 128,856 unigenes, 38.99, 57.16, 58.94, 72.57, 54.23, 54.96 and 55.67% mapped to GO, KEGG, KOG, NR, NT, Pfam, and SwissProt databases, respectively. Additionally, 99,020 (76.85%) unigenes were annotated in at least one public database, while a total of 25,975 (20.16%) unigenes were co-annotated in five databases (Table 1). Species distribution analysis showed that C. gracile unigenes showed the highest homology to Sesamum indicum sequences (40,961 unigenes; 43.81%), followed by Erythranthe guttata (14,690 unigenes; 15.71%) and Daucus carota subsp. sativus (8866 unigenes; 9.48%) (Additional file 4: Figure S2).

Table 1 Clinopodium gracile unigenes annotated using seven databases

The GO database divided 57.72% of unigenes into “cellular component”, 79.93% into “molecular function”, and 49.12% into “biological process”, and 50,235 unigenes were categorized into at least one GO term. Furthermore, “membrane part” (15,180 unigenes [43.72%]) and “cell” (12,907 unigenes [37.18%]) were the most enriched GO terms within the “cellular component” category; “catalytic activity” (24,182 unigenes [43.45%]), and “binding” (24,147 unigenes [43.48%]) were the most enriched GO terms under “molecular function”; “cellular process” (13,531 unigenes [43.45%]), and “biological regulation” (4852 unigenes [15.58%]) were the most enriched GO terms under “biological process” (Additional file 5: Figure S3).

A total of 80,140, 79,085, 96,382, and 73,919 unigenes were counted in the RNA-Seq data sets of flower, leaf, stem, and root tissues, of which 10,888, 8128, 7145 and 9085 unigenes with FPKM ≥10 showed high expression, 27,832, 26,890, 26,903 and 21,844 unigenes with FPKM = 1–10 showed medium expression, and 41,420, 44,067, 62,334 and 42,950 unigenes with FPKM ≤1 showed low expression, respectively (Fig. 3a) [25, 26]. The overall expression level of unigenes was the highest in flowers, followed by stems, leaves, and roots (Fig. 3b).

Fig. 3

Gene expression profiles in the four tissues of C. gracile plants. (a) unigenes expression distribution in all four tissues (leaves, flowers, stems, and roots). The X-axis represents the sample name, and Y-axis represents the transcript levels. The level of gene expression is indicated by varying color intensity: low expression (FPKM ≤1), medium expression (FPKM 1–10), and high expression (FPKM ≥10). (b) Expressed unigenes in the four tissues by a boxplot. The X-axis indicates the type of C. gracile tissues and the Y-axis indicates the log10 (FPKM + 1) values of unigenes. Significant differences were determined by the Kruskal–Wallis nonparametric test

Identification of candidate genes involved in the triterpenoid biosynthesis pathway

Gene function annotation using the KEGG database assigned 73,648 unigenes to 136 pathways (20 subcategories), 14 of which were related to the biosynthesis of secondary metabolites (Additional file 6: Figure S4 and Additional file 7: Table S3). A total of 1406 unigenes were enriched in the phenylpropanoid biosynthesis pathway (Fig. 4a). Terpenoid metabolism involves six pathways; the largest number of unigenes (360) mapped to “Terpenoid backbone biosynthesis,” followed by “Carotenoid biosynthesis,” “Ubiquinone and other terpenoid quinone biosynthesis,” “Diterpenoid biosynthesis,” and “Sesquiterpenoid and triterpenoid biosynthesis” (Fig. 4b).

Fig. 4

KEGG annotation of C. gracile unigenes (a and b) The number of unigenes involved in the biosynthesis of other secondary metabolites (a) and terpenoids (b)

A total of 523 unigenes were enriched in the “terpenoid backbone biosynthesis” (ko00900) and “sesquiterpenoid and triterpenoid biosynthesis” (ko00909) pathways. Additionally, 281 unigenes encoded key enzymes involved in these pathways, including 3-hydroxy-3-methylglutaryl coenzyme A reductase (HMGR; 17 unigenes), 1-deoxy-D-xylulose-5-phosphate synthase (DXS; 36 unigenes), 1-deoxy-D-xylulose-5-phosphate reductase (DXR; 2 unigenes), 4-hydroxy-3-methylbut-2-en-1-yl diphosphate reductase (HDR; 11 unigenes), isopentenyl-diphosphate Delta-isomerase (IDI; 12 unigenes), SS (19 unigenes), squalene epoxidase (SE, 14 unigenes), and beta-amyrin synthase (β-AS; 32 unigenes). A total of 108 unigenes encoded 6 key enzymes are involved in the MVA pathway; 74 unigenes encoded 8 key enzymes of the MEP pathway; and 99 unigenes encoded 5 key enzymes involved in the conversion of IPP to β-Amyrin (Table 2). The expression level of unigenes encoding key enzymes involved in the triterpenoid biosynthesis pathways is shown in the heat map. Most of the unigenes encoding 4-diphosphocytidyl-2-C-methyl-D-erythritol kinase (CMK) and HDR showed the highest expression in leaves, while most of the unigenes encoding acetyl-CoA acetyltransferase (AACT), hydroxymethylglutaryl-CoA synthase (HMGS), HMGR, diphosphomevalonate decarboxylase (PMD), SE, and β-AS showed the highest expression in roots. Unigenes encoding DXR and 4-hydroxy-3-methylbut-2-enyl diphosphate synthase (HDS) were highly expressed in either roots or leaves (Fig. 5).

Table 2 Distribution of triterpenoid saponin biosynthesis unigenes in four C. gracile tissues
Fig. 5

Proposed pathways for triterpenoid saponin biosynthesis in C. gracile. Expression levels of unigenes encoding enzymes that catalyze each step of the triterpenoid saponin biosynthesis pathway are shown. “CL” and “Un” indicates a cluster of transcripts and unigenes, respectively. F, flowers; L, leaves; S, stems; R, roots. The higher and lower expression level of unigenes are indicated in red and green, respectively

Three unigenes (CL12163–1, CL12163–4, CL6527–11) encoding the SS enzyme were identified, with a sequence identity of 88.14%. We analyzed the structure of the CL12163–4 unigene, as it showed the highest expression in leaves. The CgSS (C. gracile squalene synthase) ORF with 1272 bp is expected to encode a protein of 423 amino acids. The amino acid sequence of CgSS showed high sequence similarity to SS proteins from other herbs, namely, Bacopa monniera (ADX01171.1; 83.02%), Eleutherococcus senticosus (AEA41712.1; 77.91%), Panax ginseng (ACV88718.1; 78.14%), Panax notoginseng (ABA29019.1; 77.91%), and Panax quinquefolius (CAJ58418.1; 77.91%). In the multiple sequence alignment of these SS amino acid sequences, six domains (I–VI) and two aspartate-rich regions, important for the catalytic activity of SS enzymes, are highlighted. Two domains (III and IV) of SS amino acid sequences were highly conserved among plant species, three domains (I, II, and V) were fairly well conserved, whereas domain VI was the least conserved. The secondary structure of CgSS contained 18 alpha helixes, which were the main component of the SS enzyme (Fig. 6).

Fig. 6

Sequence alignment and secondary structure of CgSS. Amino acid sequences of SS enzymes were retrieved from GenBank. BmSS, Bacopa monniera (ADX01171.1); EsSS, Eleutherococcus senticosus (AEA41712.1); PgSS, Panax ginseng (ACV88718.1); PnSS, Panax notoginseng (ABA29019.1); and PqSS, Panax quinquefolius (CAJ58418.1). The six domains in SS protein are outlined in blue. DXXXD domains are indicated using ‘’. White letters on a red background represent identical amino acids, red letters on a white background represent similar amino acids, and black letters represent different amino acids

The 3D model of CgSS was constructed on the basis of the crystal structure of SS from Trypanosoma cruzi (PDB ID 3wca.3.A), which shares 46.02% sequence identity with CgSS [27]. Five domains (I–V) of CgSS are highlighted in different colors in Fig. 7a, and amino acids D77, D81, Y168, D213, and D217 comprising the active site of CgSS are indicated in Fig. 7b.

Fig. 7

Tertiary structure model and schematic diagram of the CgSS active site. a Cartoon display of the three-dimensional structure of CgSS and five conserved domains (I: red, II: green, III: blue, IV: yellow, V: magenta). b Schematic diagram showing the active site of CgSS (green ellipse) and five amino acid residues (D77, D81, Y168, D213, and D217; red dots).) The figures were performed using the Swiss Model ( and the PyMOL software based on the the crystal structure of Trypanosoma cruzi SS (Template 3wca.3.A, with sequence identity of 46.02%, the ranges from E34–S370)

Validation of RNA-Seq data by qRT-PCR

To independently verify the transcriptome data and differential gene expression among different tissues, six unigenes involved in triterpenoid saponin biosynthesis were selected for qRT-PCR analysis. The expression levels of Un 41,982 (DXR), CL10352–1 (HMGR), and Un 5223 (HMGR) were the highest in roots. Similarly, the expression levels of CL12163–4 (SS), and CL1648–1 (HDR) were the highest in leaves, while that of Un 17,275 (HMGR) was the highest in flowers. These data were consistent with the FPKM values of these genes determined from RNA-Seq analysis (Fig. 8).

Fig. 8

Quantitative real-time PCR (qRT-PCR) analysis of six unigenes encoding enzymes involved in the triterpenoid biosynthesis pathway in C. gracile. a Un 41,982 encoding DXR. b CL1648-1 encoding HDR. c CL10352–1 encoding HMGR. d Un 5223 encoding HMGR. e Un 17, 275 encoding HMGR. f CL12163–4 encoding SS. Blue bars show the results of the qRT-PCR analysis, and red lines show the FPKM values determined by RNA-Seq analysis. The left Y-axis indicates the relative expression level of the qRT-PCR detection gene, and the right Y-axis indicates the FPKM value in the RNA-Seq data. F, flowers; L, leaves; S, stems; R, roots. Data represent mean ± standard error (SE; n = 3)

Identification of DEGs

A total of 52,646 unigenes were co-expressed in all four tissues, whereas 3216 unigenes were expressed only in leaves (Fig. 9a). DEGs were identified based on the comparison of expression profiles of unigenes between leaves and the other tissues (flowers, stems, and roots) (Fig. 9b). A comparison of gene expression between leaves and flowers revealed 37,744 DEGs, of which 13,899 were up-regulated and 23,845 were down-regulated in leaves. Comparison of gene expression between leaves and roots revealed 59,154 DEGs, of which 31,166 were up-regulated and 27,988 were down-regulated in leaves. Comparison of gene expression between leaves and stems revealed 36,467 DEGs, of which 20,104 were up-regulated and 16,363 were down-regulated in leaves. A total of 239, 397, and 332 unigenes were up-regulated in leaves compared with flowers, roots, and stems, respectively (Table 3).

Fig. 9

Expression of unigenes in all four tissues of C. gracile. a The number of unigenes expressed in four tissues by veen diagram. b Number of DEGs identified in the leaves compared with roots, flowers, and stems of C. gracile. Genes highly expressed in leaves compared with roots, flowers, and stems were defined as “up-regulated,” whereas genes with lower expression levels in leaves were defined as “down-regulated”

Table 3 KEGG pathway annotations of terpenoid metabolic genes up-regulated in leaves compared with other tissues

The 37,744 DEGs identified in the leaves vs. flowers comparison were mainly enriched in “Indole alkaloid biosynthesis,” “Photosynthesis,” “Taurine and hypotaurine metabolism,” “Riboflavin metabolism,” and “Glycosphingolipid biosynthesis globo and isoglobo series” (Fig. 10a). The 59,154 DEGs identified in the leaves vs. roots comparison were primarily enriched in “Photosynthesis,” “Circadian rhythm plant,” “Photosynthesis antenna proteins,” “Brassinosteroid biosynthesis,” and “Biosynthesis of unsaturated fatty acids” (Fig. 10b). The 36,467 DEGs identified in the leaves vs. stems comparison were primarily enriched in “Riboflavin metabolism,” “Glucosinolate biosynthesis,” “Photosynthesis,” “Cyanoamino acid metabolism,” and “Sesquiterpenoid and triterpenoid biosynthesis” (Fig. 10c).

Fig. 10

The DEGs significantly enriched in KEGG pathways. (ac) Significantly enriched pathways in the DEGs identified in the comparison between leaves vs. flowers (a), leaves vs. roots (b) and leaves vs. stems (c)

Identification of leaf-specific expression unigenes

A total of 9094 up-regulated genes showing leaf-specific expression (log2FC > 1) were mapped to 132 pathways in the KEGG database; these DEGs were mainly enriched in “Photosynthesis,” “Photosynthesis-antenna proteins,” “Riboflavin metabolism,” “Benzoxazinoid biosynthesis,” and “Linoleic acid metabolism” (Fig. 11a). In the GO database, unigenes with rhizome-specific expression were mapped to 49 subcategories in three functional categories, with the highest enrichment in “metabolic process,” “cellular process,” and “biological regulation” under “biological process”; “membrane,” “membrane part,” and “cell” under “cellular component”; and “catalytic activity,” “binding,” and “transporter” activity under “molecular function” (Fig. 11b).

Fig. 11

KEGG and GO annotations of unigenes expressed specifically in the leaves of C. gracile. a Significantly enriched pathways among unigenes showing leaf-specific expression. The triterpenoid saponin biosynthesis pathways are indicated in red. b Classification of leaf-specific unigenes annotated using the GO database

Analysis of genes encoding TFs related to the biosynthesis of terpenoids and other secondary metabolites

Transcription factor (TF) families play key roles in regulating the activity of genes involved in triterpenoid saponin biosynthesis and other secondary metabolic processes by specifically binding to cis-regulatory elements in their promoter regions. A total of 3536 unigenes encoding TFs were identified in the C. gracile transcriptome. These included 517, 1128, and 742 TF-encoding unigenes that were up-regulated in leaves compared with flowers, roots, and stems, respectively (Table 4). Most of these unigenes encoded TFs belonging to the MYB (404 unigenes), AP2-EREBP (268 unigenes), bHLH (273 unigenes), WRKY (200 unigenes), C3H (175 unigenes), and NAC (156 unigenes) families. Moreover, unigenes encoding FHA (4 unigenes), Trihelix (4 unigenes), FAR1 (2 unigenes), and MYB (1 unigenes) TFs were involved in terpenoid metabolism. A total of 23 unigenes encoding TF participated in the biosynthesis of secondary metabolites (Additional file 8: Table S4, Fig. 12a and b).

Table 4 Number of transcription factor (TF) genes showing differential expression in different tissues of C. gracile
Fig. 12

The number of unigenes encoding TFs involved in the metabolic pathways in C. gracile. a TF families and unigene numbers related to the metabolism of terpenoids. b TF families and unigene numbers related to “metabolism of biosynthesis of other secondary metabolites”


Triterpenoid saponins are the major bioactive compounds in C. gracile, with extensive pharmacological effects. However, genomic information on saponin biosynthesis is limited. In this study, we aimed to identify putative genes involved in triterpenoid saponin biosynthesis in C. gracile. Therefore, we conducted deep transcriptome sequencing on different tissues of C. gracile plants using RNA-Seq. Analysis of RNA-Seq data revealed 128,856 unigenes, with a mean length of 1354 bp and an N50 value of 2268 bp. Of the 128,856 unigenes, 99,020 were annotated, whereas 23.15% of the unigenes remained unannotated, probably due to the lack of public data of plant transcriptome and genome. The assembly quality of C. gracile transcriptome was better than that of other medicinal plants in the Labiatae family such as Red Perilla frutescens var. crispa (no. of unigenes = 47,788; mean unigene length = 876 bp; N50 = 1349 bp) [28], Salvia guaranitica L. (no. of unigenes = 61,400; mean unigene length = 731 bp; N50 = 1334 bp) [29], and Salvia miltiorrhiza (no. of unigenes = 50,778; mean unigene length = 868.75 bp; N50 = 1618 bp) [30]. These results demonstrate the reliability of the C. gracile transcriptome.

We performed transcriptome analysis of the flowers, leaves, roots, and stems of C. gracile to facilitate the dissection of genes involved in the tissue-specific biosynthesis of triterpenoid saponins. This approach is widely used for the identification of novel genes involved in the biosynthesis of secondary metabolites and analysis of the molecular mechanism of triterpenoid saponin biosynthesis [31].

In the transcriptome data sets of C. gracile, the up-regulated unigenes in leaves were mainly enriched in “Terpenoid backbone biosynthesis” and “Sesquiterpenoid and triterpenoid biosynthesis” in the KEGG database and were mainly annotated under “metabolic process” and “catalytic activity” in the GO database. These up-regulated genes may explain the molecular basis of the medicinal value of C. gracile leaves.

The sponin content of different C. gracile tissues was determined by UV-spectrophotometry, with buddlejasaponin IV as a standard, which was identified and confirmed by UPLC/Q-TOF-MS. Saponins include steroid and triterpenoid saponins in most of plant species. Triterpenoid saponins are generally predominant in dicotyledons, while steroidal saponins are mainly identified in monocotyledons [32]. Triterpenoid saponins are some of the most important components of C. gracile, although the relationship between the expression of triterpenoid biosynthesis genes and accumulation of triterpenoid saponins in C. gracile is unclear.

In this study, UPLC/Q-TOF-MS analysis was performed to identify the different constituents of triterpenoid saponins. UV-spectrophotometry analysis confirmed that the content of total saponin in C. gracile was the highest in leaves compared with other tissues. Higher expression levels of some of the unigenes encoding SS, HDR, HDS, DXS, and DXR enzymes in leaves compared with stems, flowers, and roots were consistent with the higher saponin accumulation in C. gracile leaves (Fig. 1 and Fig. 5). This suggests that overexpression of genes encoding these key enzymes could increase the production of triterpenoid saponins in C. gracile plants. A similar approach, i.e., overexpression of the SS gene has been previously used to enhance the production of total saponins in Medicago truncatula [33].

Roots showed the highest expression of unigenes encoding most of the key enzymes involved in the biosynthesis of triterpenoid saponins (Fig. 5); however, this result contradicts the high levels of triterpenoid saponins in leaves. Expression levels of genes do not always correlate with enzyme activity [34]. Post-transcriptional modifications influence features of mRNAs, thus affecting protein synthesis. For example, in a previous study, the mismatch between ginsenoside content and HMGR gene transcript levels was reportedly affected by post-transcriptional modifications and feedback regulation [35]. Post-translational modifications such as phosphorylation and ubiquitination can also determine the rate of protein degradation [36].

The expression levels of CL10352–1 (HMGR), Un 5223 (HMGR), Un 17,275 (HMGR), Un 41,982 (DXR), CL12163–4 (SS), and CL1648–1 (HDR) determined by qRT-PCR were consistent to the FPKM values determined by RNA-Seq, thus confirming the reliability of transcriptomic data, which will be helpful for understanding the biosynthesis pathway of triterpenoid saponins in C. gracile.

Multiple sequence alignment of SS amino acid sequences suggested the presence of six domains (I–VI) required for the functional activity of the enzyme. Domains I, II, and III are involved in the first step of SS biosynthesis; domain IV participates in the second step; domain VI is a hydrophobic region that binds to the endoplasmic reticulum. The Mg2+-binding aspartate-rich regions and amino acids in the active site (Asp77, Asp81, Tyr168, Asp213, and Asp217) were highly conserved in SS amino acid sequences among different plant species [37,38,39]. Previously, it has been shown that overexpression of the SS (AB115496) gene significantly increases the ginsenoside content [40] in Panax ginseng. Therefore, our findings will be beneficial for further investigation into the role of SS in triterpenoid saponin biosynthesis.

TFs are sequence-specific DNA-binding proteins that modulate target gene expression. In the C. gracile transcriptome, only one unigene (Un 15,683) encoding an MYB TF was annotated to play a role in the metabolism of terpenoids. Previous studies have shown that the overexpression of MYB9b enhances tanshinone concentration in Salvia miltiorrhiza hairy roots [41], and MYB1 increases artemisinin content and trichome proliferation in Artemisia annua [42]. Additionally, eight unigenes encoding MYB TFs were mapped to “metabolism of biosynthesis of other secondary metabolites” in C. gracile. Overexpression of the gene encoding MYB1 and MYB2 TFs enhances the synthesis and accumulation of anthocyanins in Fagopyrum tataricum [43]. Moreover, MYB5, MYB26, and MYB31 participate in flavonoid biosynthesis in Ginkgo biloba under adverse conditions [44]. These TFs may also play a significant role in the regulation of triterpenoid saponins and other secondary metabolites in C. gracile.


Overall, this is the first report of a comprehensive RNA-Seq analysis of C. gracile tissues to identify genes involved in the biosynthesis of triterpenoid saponins. The transcriptome data will facilitate further examination of the molecular mechanism of triterpenoid saponin biosynthesis in C. gracile and promote the study of C. gracile functional genomics.


Plant material and RNA isolation

Ten plants of C. gracile (height = ~ 18 cm) were harvested from the herbal garden of the Anhui University of Chinese Medicine under the permission of managers and professionals on April 18, 2018 (Additional file 9: Figure S5), which were identified by Prof. Qingshan Yang (Anhui University of Chinese Medicine). Flower, leaf, root, and stem tissues were separated from three C. gracile plants, cleaned with ultrapure water, dried on filter paper, and then frozen in liquid nitrogen.

Total RNA was isolated from four tissues (flowers, leaves, roots, and stems) of C. gracile plants in three replicates using the ethanol precipitation protocol and RNA Plant Kit (Aidlab Biotech, Beijing, China), according to the manufacturer’s instructions. The quality and quantity of total RNA were verified using NanoDrop 8000 and Agilent 2100 Bioanalyzer (Thermo Fisher Scientific, MA, USA), respectively (Additional file 10: Table S5).

Determination of total saponin content

Saponins were extracted from dried C. gracile tissues (flowers, leaves, roots, and stems) and detected, as described previously [45], with slight modifications. Briefly, dried powder (0.1 g) of each tissue was extracted in 2 ml of 50% methanol by ultrasonication for 30 min. The filtrate obtained by vacuum filtration was concentrated at 60 °C to near dryness and then resuspended in 25 ml of 50% methanol. The absorbance of the solution was determined by UV-spectrophotometry (JASCO Company, Japan). A curve of concentration versus absorbance was established using buddlejasaponin IV, as a standard, to determine the saponin content of C. gracile tissue (Additional file 11: Figure S6a and b). Total saponin yield (%) was calculated as follows:

$$ \mathrm{Yield}\ \left(\%\right)=\left[\left(\frac{A+0.0081}{0.0741}\right)\times \left(\frac{1}{40m}\right)\right]\times 100\%. $$

where A and m represent the solution absorbance of sample and weight of sample dried powder, respectively.

UPLC/Q-TOF-MS analysis

Triterpenoid saponin metabolite profiling was performed on a Waters ACQUITY UPLC /Xevo G2 Q-TOF (Waters Corporation, Milford, MA, USA). Samples were separated on Agilent ZORBAX Eclipse Plus C18 (2.1 mm × 50 mm, 1.8 μm). The column temperature was maintained at 35 °C, and the flow rate was 0.20 mL/min. The mobile phase consisted of acetonitrile and 0.1% formic acid. The gradient was initiated with 10% acetonitrile for 5 min, 10% acetonitrile linearly increased to 20% acetonitrile within 1 min and held at 20% for 4 min, then 20% acetonitrile increased to 30% acetonitrile within 1 min and held at 30% for 4 min, and 30% acetonitrile increased to 45% acetonitrile within 1 min and held at 45% for 4 min. The Xevo G2-S Q-TOF mass spectrometer was run in the negative mode. Mass spectra were obtained with a scanning mass range of 50 to 1500 Da. High-purity nitrogen (N2) was used as nebulizing gas, and ultra-high pure helium (He) was used as the collision gas. Other parameters were as follows: Capillary voltage, 2.00 kV; sampling cone voltage, 40.0 V; ion source temperature, 120 °C; desolvation temperature, 350 °C; cone gas flow, 50 L/h; desolvation gas flow rate, 600 L/h; collision energy (CE), 40–60 V; and leucine enkephalin was used as lock mass (m/z 554.2615).

cDNA library preparation and RNA-Seq analysis

Total RNA isolated from plant tissues was treated with RNase-free DNase I (TaKaRa, China) to eliminate any traces of contaminating DNA, which was then mixed with oligo (dT) magnetic beads to purify mRNA. The purified mRNA was fragmented at an appropriate temperature. First-strand cDNA was generated by reverse transcription PCR (RT-PCR) using random hexamer primers. This was followed by second-strand cDNA synthesis. Subsequently, A-Tailing Mix (Enzymatics, USA) and RNA Index Adapters were added to perform end-repair, and the resulting cDNA was amplified by PCR. The PCR products were purified by Ampure XP Beads and eluted in EB solution. The quantity and quality of cDNA libraries were evaluated using Agilent 2100 Bioanalyzer (ABI, New York, NY, USA). The double-stranded PCR products were heated denatured and circularized by the splint oligo sequence to obtain the final library. The single-stranded circular DNA was amplified using phi29 (Thermo Scientific, USA) to generate DNA nanoball (DNB). The DNBs were loaded into the patterned nanoarray and single-end 50 bp reads were generated on the BGISEQ-500 platform (Beijing Genomics Institute, Wuhan, China) [46].

De novo transcriptome assembly

To obtain high-quality reads, low quality reads (i.e., reads with > 20% nucleotides with base quality < 10) and reads with adaptor sequences and/or unknown nucleotides (> 5%) were filtered using SOAPnuke [47] and Trimmomatic. De novo assembly of the high-quality reads was performed using Trinity [48]; PCR duplicates were removed prior to assembly to improve assembly efficiency. The assembled transcripts were clustered using the TGI clustering tool (TGICL) [49] to remove redundant sequences and obtain non-redundant sequences, termed unigenes.

Annotation of unigenes

The functional annotation of unigenes was performed by searching for homologous sequences in public databases, including NT (NCBI nucleotide;, NR (NCBI non-redundant protein sequence;, KOG (clusters of euKaryotic Orthologous Groups;, KEGG (Kyoto Encyclopedia of Genes and Genome;, GO (Gene Ontology;, Pfam (protein families;, and SwissProt (a manually annotated and reviewed protein sequence database; Sequences in the NT database were searched using Blastn [50], while those in the NR, KOG, SwissProt, and KEGG databases were searched using Blastx [51]. With NR annotation, the Blast2GO (version 2.5.0) was used to obtain the GO annotations of unigenes [52], while unigenes were mapped to the Pfam database using Hmmscan.

The full-length open reading frame (ORF) of C. gracile SS (CgSS) was translated using the ExPASy translation tool (, and multiple sequence alignment was implemented using DNAMAN and Clustalx 1.83 software. Conserved domains in the amino acid sequence of CgSS were detected using the Conserved Domains Database ( The secondary structure of CgSS was predicted using ESPript 2.2 ( Three-dimensional (3D) homologous modeling of CgSS was performed using the Swiss Model (, and the 3D structure was described using the PyMOL software.

Differentially expressed genes analysis

High-quality reads of each sample were queried against the genome sequence of C. gracile using Bowtie2 (version 2.2.5) [53], and gene expression level in each sample was calculated using RSEM (version 1.2.8) [54]. Unigenes showing significant differences in expression levels (fold-change [FC] ≥ 2.00; false discovery rate [FDR] ≤ 0.001) between leaves and other tissues (root, flowers, and stems) were identified by the PoissonDis method [55,56,57]. These DEGs were then used for GO functional analysis and KEGG pathways enrichment.

Analysis of transcription factor (TF) encoding genes

The ORF of unigenes was identified by Getorf (EMBOSS: [58], and then annotated to the TF domains in the plant transcription factor database (PlantTFDB) using Hmmsearch (version 3.0) [59].

Analysis of the key genes expression level in triterpenoid saponin biosynthesis by qRT-PCR

To validate the C. gracile transcriptome data sets, qRT-PCR was performed in triplicate using GoTaq qPCR Master Mix (Promega) on a Real-time Thermal Cycler 5100 System (Thermo Scientific, Waltham, MA, USA). Gene-specific primers for the actin gene (Un 11,691) and six unigenes (CL10352–1, Un 17,275, Un 41,982, Un 5223, CL12163–4, and CL1648–1) involved in triterpenoid biosynthesis were designed using Primer Premier (version 5.0) (Additional file 12: Table S6). Each PCR reaction was prepared in a 15-μl mixture volume containing diluted cDNA (2 μl), forward primer (1 μl), reverse primer (1 μl), qPCR Mixture (2X, 7.5 μl), and RNase-free water (3.5 μl) using the following conditions: initial denaturation at 95 °C for 2 min, followed by 40 cycles of denaturation at 95 °C for 15 s, and annealing at 60 °C for 30 s. The relative expression level of each selected unigene was normalized with the CgActin gene (Un 11,691) and detected by the 2-ΔΔCt method [60].

Availability of data and materials

RNA-Seq data sets of C. gracile have been deposited in the NCBI Sequence Read Archive (SRA) database under the accession number SRP194041.



Acetyl-CoA acetyltransferase

C. gracile :

Clinopodium gracile (Benth.) Matsum


C. gracile squalene synthase


4-diphosphocytidyl-2-C-methyl-D-erythritol kinase


Cytochrome P450 monooxygenase


Differentially expressed genes


1-deoxy-D-xylulose-5-phosphate reductase


1-deoxy-D-xylulose-5-phosphate synthase




False discovery rate


Farnesyl pyrophosphate


Farnesyl diphosphate synthase


Gene ontology


Geranyl pyrophosphate


Geranyl diphosphate synthase


4-hydroxy-3-methylbut-2-en-1-yl diphosphate reductase


4-hydroxy-3-methylbut-2-enyl diphosphate synthase


3-hydroxy-3-methylglutaryl coenzyme A reductase


Hydroxymethylglutaryl-CoA synthase


Isopentenyl-diphosphate delta-isomerase


Isopentenyl pyrophosphate


Kyoto encyclopedia of genes and genome


2-C-methyl-D-erythritol 4-phosphate cytidylyltransferase


2-C-methyl-D-erythritol 2,4-cyclodiphosphate synthase


The 2-C-methyl-D-erythrirtol-4-phosphate


Mevalonate kinase


The mevalonic acid


Next-generation sequencing


NCBI non-redundant protein sequence


Open reading frame


The plant transcription factor database


Diphosphomevalonate decarboxylase


Phosphomevalonate kinase


Presqualene diphosphate


Quantitative real-time PCR


RNA sequencing


Squalene epoxidase


Squalene synthase


Transcription factors




Beta-amyrin synthase


  1. 1.

    Dai JRSD. Morphology, anatomy and chemical constituents of five species of Clinopodium. Yao Xue Xue Bao. 1984;19(6):425–30.

    CAS  PubMed  Google Scholar 

  2. 2.

    Chen YY, Huang YL, Wen YX, Dian-Peng LI, Chen WJ. GC-MS analysis of the volatile oil from Clinopodium Gracile. Fine Chemicals. 2009;26:770–2.

    CAS  Google Scholar 

  3. 3.

    Wang SN, Shi-Chun YU, Xu-Dong XU, She SJ, Fan SH. Triterpenoid Saponins and flavonoids from ClinopodiumLinn.:chemical component analyses and NMR spectral features. Chin J Magn Reson. 2013;30(3):447–60.

    CAS  Google Scholar 

  4. 4.

    Seungbin P, Sanghyun K, Kyoungho S, Hyunshik L, Taegkyu K, Ju MG, Hoon J, Daekeun K, Jongpil L, Taeyong S. Clinopodium gracile inhibits mast cell-mediated allergic inflammation: involvement of calcium and nuclear factor-κB. Exp Biol Med. 2010;235(5):606.

    Article  CAS  Google Scholar 

  5. 5.

    Wu WS, Hsu HY. Involvement of p-15(INK4b) and p-16(INK4a) gene expression in saikosaponin a and TPA-induced growth inhibition of HepG2 cells. Biochem Biophys Res Commun. 2001;285(2):183–7.

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  6. 6.

    Hu YX, Zhang W, Zhang W, Zhu YD, Ma GX, Zhu NL, Sun W, Ma ZX, Yu SC, Xu XD, et al. Oleanane triterpene saponins with cardioprotective activity from Clinopodium polycephalum. J Asian Nat Prod Res. 2017;19(7):697–703.

    CAS  PubMed  Article  Google Scholar 

  7. 7.

    Balik D, Sashka D, Adriana M, Nikola P. In vitro screening for antitumour activity of Clinopodium vulgare L. (Lamiaceae) extracts. Biol Pharm Bull. 2002;25(4):499–504.

    Article  Google Scholar 

  8. 8.

    Haridas V, Higuchi M, Jayatilake GS, Bailey D, Mujoo K, Blake ME, Arntzen CJ, Gutterman JU. Avicins: Triterpenoid Saponins from Acacia victoriae (Bentham) induce apoptosis by mitochondrial perturbation. Proc Natl Acad Sci U S A. 2001;98(10):5821–6.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  9. 9.

    Thimmappa R, Geisler K, Louveau T, O'Maille P, Osbourn A. Triterpene biosynthesis in plants. Annu Rev Plant Biol. 2014;65(1):225–57.

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  10. 10.

    Kumar S, Kalra S, Kumar S, Kaur J, Singh K. Differentially expressed transcripts from leaf and root tissue of Chlorophytum borivilianum: a plant with high medicinal value. Gene. 2012;511(1):79–87.

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  11. 11.

    Chang LZ, Xiu MC, Yan PC, Quan L. Key enzymes of triterpenoid saponin biosynthesis and the induction of their activities and gene expressions in plants. Nat Prod Commun. 2010;5(7):1147–58.

    Google Scholar 

  12. 12.

    Zhan C, Ahmed S, Hu S, Dong S, Cai Q, Yang T, Wang X, Li X, Hu X. Cytochrome P450 CYP716A254 catalyzes the formation of oleanolic acid from β-amyrin during oleanane-type triterpenoid saponins biosynthesis in Anemone flaccida. Biochem Biophys Res Commun. 2017;495(1):1271–7.

    PubMed  Article  CAS  PubMed Central  Google Scholar 

  13. 13.

    Zhang P, Cao X, Li C, Zheng Z, Yong S, Jiang JH. Cloning and characterization of a Squalene synthase gene from the Chaga medicinal mushroom, Inonotus obliquus (Agaricomycetes). Int J Med Mushrooms. 2016;18(5):445–55.

    PubMed  Article  PubMed Central  Google Scholar 

  14. 14.

    Nguyen HTM, Neelakadan AK, Quach TN, Valliyodan B, Kumar R, Zhang Z, Nguyen HT. Molecular characterization of Glycine max squalene synthase genes in seed phytosterol biosynthesis. Plant Physiol Biochem. 2013;73(41):23–32.

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  15. 15.

    Xu JW, Xu YN, Zhong JJ. Production of individual ganoderic acids and expression of biosynthetic genes in liquid static and shaking cultures of Ganoderma lucidum. Appl Microbiol Biotechnol. 2010;85(4):941.

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  16. 16.

    Hayashi H, ., Hiraoka N, ., Ikeshiro Y, . Molecular cloning and functional expression of cDNAs for Glycyrrhiza glabra squalene synthase. Biol Pharm Bull 1996, 19(10):1387–1389.

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  17. 17.

    Zhao H, Tang Q, Mo C, Bai L, Tu D, Ma X. Cloning and characterization of squalene synthase and cycloartenol synthase from Siraitia grosvenorii. Acta Pharm Sin B. 2017;7(2):215–22.

    PubMed  Article  PubMed Central  Google Scholar 

  18. 18.

    Akamine S, Nakamori K, Chechetka SA, Banba M, Umehara Y, Kouchi H, Izui K, Hata S. cDNA cloning, mRNA expression, and mutational analysis of the squalene synthase gene of Lotus japonicus. Biochim Biophys Acta. 2003;1626(1–3):97–101.

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  19. 19.

    Jiang Z, Zhou X, Li R, Michal JJ, Zhang S, Dodson MV, Zhang Z, Harland RM. Whole transcriptome analysis with sequencing: methods, challenges and potential solutions. Cell Mol Life Sci. 2015;72(18):3425–39.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  20. 20.

    Lo H-Y, Li C-C, Huang H-C, Lin L-J, Hsiang C-Y, Ho T-Y. Application of transcriptomics in Chinese herbal medicine studies. J Tradit Complement Med. 2012;2(2):105–14.

    PubMed  PubMed Central  Article  Google Scholar 

  21. 21.

    Liu J, Hou J, Jiang C, Li G, Lu H, Meng F, Shi L. Deep sequencing of the Scutellaria baicalensis Georgi Transcriptome reveals flavonoid biosynthetic profiling and organ-specific gene expression. PLoS One. 2015;10(8):e0136397.

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  22. 22.

    Rastogi S, Meena S, Bhattacharya A, Ghosh S, Shukla RK, Sangwan NS, Lal RK, Gupta MM, Lavania UC, Gupta V. De novo sequencing and comparative analysis of holy and sweet basil transcriptomes. BMC Genomics. 2014;15(1):588.

    PubMed  PubMed Central  Article  Google Scholar 

  23. 23.

    Akhtar MQ, Qamar N, Yadav P, Kulkarni P, Kumar A, Shasany AK. Comparative glandular trichome transcriptome-based gene characterization reveals reasons for differential (−)-menthol biosynthesis in Mentha species. Physiol Plant. 2017;160(2):128–41.

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  24. 24.

    Shi Y, Zhang S, Peng D, Wang C, Zhao D, Ma K, Wu J, Huang L. Transcriptome Analysis of Clinopodium chinense (Benth.) O. Kuntze and Identification of Genes Involved in Triterpenoid Saponin Biosynthesis. Int J Mol Sci. 2019;20(11).

    CAS  PubMed Central  Article  Google Scholar 

  25. 25.

    Tsai CC, Wu KM, Chiang TY, Huang CY, Chou CH, Li SJ, Chiang YC. Comparative transcriptome analysis of Gastrodia elata (Orchidaceae) in response to fungus symbiosis to identify gastrodin biosynthesis-related genes. BMC Genomics. 2016;17:212.

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  26. 26.

    Brooks MJ, Rajasimha HK, Roger JE, Swaroop A. Next-generation sequencing facilitates quantitative analysis of wild-type and Nrl−/− retinal transcriptomes. Mol Vis. 2011;17(327–30):3034–54.

    CAS  PubMed  PubMed Central  Google Scholar 

  27. 27.

    Shang N, Li Q, Ko T-P, Chan H-C, Li J, Zheng Y, Huang C-H, Ren F, Chen C-C, Zhu Z. Squalene synthase as a target for Chagas disease therapeutics. PLoS Pathog. 2014;10(5):e1004114.

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  28. 28.

    Fukushima A, Nakamura M, Suzuki H, Saito K, Yamazaki M. High-Throughput Sequencing and De Novo Assembly of Red and Green Forms of the Perilla frutescens var. crispa Transcriptome. PLoS One. 2015;10(6):e0129154.

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  29. 29.

    Mohammed A, Hussain RM, Ur RN, Guangbiao S, Penghui L, Xiaochun W, Liang G, Jian Z. De novo transcriptome sequencing and metabolite profiling analyses reveal the complex metabolic genes involved in the terpenoid biosynthesis in blue Anise sage (Salvia guaranitica L.). DNA Res.

  30. 30.

    Zhang X, Dong J, Liu H, Wang J, Qi Y, Liang Z. Transcriptome sequencing in response to salicylic acid inSalvia miltiorrhiza. PLoS One. 2016;11(1):e0147849.

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  31. 31.

    Mohammed A, HR M, Ur RN, Guangbiao S, Penghui L, Xiaochun W, Liang G, Jian Z. De novo transcriptome sequencing and metabolite profiling analyses reveal the complex metabolic genes involved in the terpenoid biosynthesis in Blue Anise Sage (Salvia guaranitica L.). DNA Res. 2018.

  32. 32.

    Sparg SG, Light ME, Jv S. Biological activities and distribution of plant saponins. 243. 94(2–3):219.

  33. 33.

    Kang J, Zhang Q, Jiang X, Zhang T, Long R, Yang Q, Wang Z. Molecular Cloning and Functional Identification of a Squalene Synthase Encoding Gene from Alfalfa (Medicago sativa L.). Int J Mol Sci. 2019;20(18):4499.

    PubMed Central  Article  Google Scholar 

  34. 34.

    Wojtyla A, Gladych M, Rubis B. Human telomerase activity regulation. Mol Biol Rep. 2011;38(5):3339–49.

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  35. 35.

    Kim YJ, Lee OR, Oh JY, Jang MG, Yang DC. Functional analysis of 3-Hydroxy-3-Methylglutaryl coenzyme a Reductase encoding genes in Triterpene Saponin-producing ginseng. Plant Physiol. 2014;165(1):373–87.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  36. 36.

    Nelson CJ, Millar AH. Protein turnover in plant biology. Nature plants. 2015;1:15017.

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  37. 37.

    Vishwakarma RK, Patel K, Sonawane P, Kumari U, Singh S, Ruby, Abbassi S, Agrawal DC, Tsay HS, Khan BM: Squalene synthase gene from medicinal HerbBacopa monniera: molecular characterization, differential expression, comparative modeling, and docking studies. Plant Mol Biol Report 2015, 33(6):1675–1685.

    CAS  Article  Google Scholar 

  38. 38.

    Haudenschild C, Hartmann M-A. Inhibition of sterol biosynthesis during elicitor-induced accumulation of furanocoumarins in parsley cell suspension cultures. Phytochemistry. 1995;40(4):1117–24.

    CAS  Article  Google Scholar 

  39. 39.

    Devarenne PT. Regulation of Squalene synthase, a key enzyme of sterol biosynthesis, in tobacco. Plant Physiol. 2002;129(3):1095–106.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  40. 40.

    Mi-Hyun L, Jae-Hun J, Jin-Wook S, Cha-Gyun S, Young-Soon K, Jun-Gygo I, Deok-Chun Y, Jae-Seon Y, Yong-Eui C. Enhanced triterpene and phytosterol biosynthesis in Panax ginseng overexpressing squalene synthase gene. Plant Cell Physiol. 2004;45(8):976–84.

    Article  Google Scholar 

  41. 41.

    Zhang J, Zhou L, Zheng X, Zhang J, Yang L, Tan R, Zhao S. Overexpression of SmMYB9b enhances tanshinone concentration in Salvia miltiorrhiza hairy roots. Plant Cell Rep. 2017;36(7):1297–309.

    PubMed  Article  CAS  PubMed Central  Google Scholar 

  42. 42.

    Matías-Hernández L, Jiang W, Yang K, Tang K, Brodelius PE, Pelaz S. AaMYB1 and its orthologue AtMYB61 affect terpene metabolism and trichome development in Artemisia annua and Arabidopsis thaliana. Plant J Cell Mol Biol. 2017;(3):90.

    PubMed  Article  CAS  PubMed Central  Google Scholar 

  43. 43.

    Bai YC, Li CL, Zhang JW, Li SJ, Luo XP, Yao HP, Chen H, Zhao HX, Park SU, Wu Q. Characterization of two tartary buckwheat R2R3-MYB transcription factors and their regulation of proanthocyanidin biosynthesis. Physiol Plant. 2014;152(3):431–40.

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  44. 44.

    Liu X, Yu W, Zhang X, Wang G, Cao F, Cheng H. Identification and expression analysis under abiotic stress of the R2R3 - MYB genes in Ginkgo biloba L. Physiol Mol Biol Plants. 2017;23(3):503.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  45. 45.

    Yue-yuan C, Dian-peng L, Feng-lai L, Jin-lei L, Yong-xin W. Determination of Buddlejasaponin IV in Clinopodium gracile by HPLC-ELSD. Agric Res Appl. 2011;3:22–4.

    Google Scholar 

  46. 46.

    Zhang B, Zhang W, Nie R-E, Li W-Z, Segraves KA, Yang X-K, Xue H-J. Comparative transcriptome analysis of chemosensory genes in two sister leaf beetles provides insights into chemosensory speciation. Insect Biochem Mol Biol. 2016;79:108–18.

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  47. 47.

    Chen Y, Chen Y, Shi C, Huang Z, Zhang Y, Li S, Li Y, Ye J, Yu C, Li Z, et al. SOAPnuke: a MapReduce acceleration-supported software for integrated quality control and preprocessing of high-throughput sequencing data. GigaScience. 2018;7(1):1–6.

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  48. 48.

    Grabherr MG, Haas BJ, Moran Y, Levin JZ, Thompson DA, Ido A, Xian A, Lin F, Raktima R, Qiandong Z. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol. 2011;29(7):644.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  49. 49.

    Pertea G, Huang X, Liang F, Antonescu V, Sultana R, Karamycheva S, Lee Y, White J, Cheung F, Parvizi B. TIGR gene indices clustering tools (TGICL): a software system for fast clustering of large EST datasets. Bioinformatics. 2003;19(5):651–2.

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  50. 50.

    Chen Y, Ye W, Zhang Y, Xu Y. High speed BLASTN: an accelerated MegaBLAST search tool. Nucleic Acids Res. 2015;43(16):7762–8.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  51. 51.

    Yang Y, Jiang XT, Zhang T. Evaluation of a hybrid approach using UBLAST and BLASTX for metagenomic sequences annotation of specific functional genes. PLoS One. 2014;9(10):e110947.

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  52. 52.

    Conesa A, Gotz S, Garcia-Gomez JM, Terol J, Talon M, Robles M. Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics. 2005;21(18):3674–6.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  53. 53.

    Langdon WB. Performance of genetic programming optimised Bowtie2 on genome comparison and analytic testing (GCAT) benchmarks. BioData mining. 2015;8(1):1.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  54. 54.

    Li B, Dewey CN. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics. 2011;12(1).

  55. 55.

    Dembélé D, Kastner P. Fold change rank ordering statistics: a new method for detecting differentially expressed genes. Bmc Bioinformatics. 2014;15(1):14.

    PubMed  PubMed Central  Article  Google Scholar 

  56. 56.

    Kim KI, MAvd W. Effects of dependence in high-dimensional multiple testing problems. Bmc Bioinformatics. 2008;9(1):114.

    PubMed  PubMed Central  Article  Google Scholar 

  57. 57.

    AUDIC S. The significance of digital gene expression profiles. Genome Res. 1997;7(10):986–95.

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  58. 58.

    Rice P, Longden I, Bleasby A. EMBOSS: the European molecular biology open software suite. Trends Genet. 2000;16(6):276–7.

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  59. 59.

    Mistry J, Finn RD, Eddy SR, Bateman A, Punta M. Challenges in homology search: HMMER3 and convergent evolution of coiled-coil regions. Nucleic Acids Res. 2013;41(12):e121.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  60. 60.

    Livak KJ, Schmittgen TD. Analysis of relative gene expression data using real-time quantitative PCR and the 2 −ΔΔ C T method. Methods. 2001;25(4):402–8.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

Download references


We thank the Beijing Genomics Institute for assistance with experiments.


The design of the study was supported by the Sustainable Utilization Project of Chinese Medicine Resources (Grant No. 2060302), the collection, analysis, and interpretation of data were supported by the National Key Research and Development Program (2017YFC1701600), the Natural Science Foundation of Anhui Province of China (1408085QH182) and National project cultivation fund of Anhui University of Chinese Medicine (2020py02). Writing the manuscript and this publication were supported by Natural Science Research Grant of Higher Education of Anhui Province (Grant No. KJ2018ZD028).

Author information




Project design: JWW. Experiments and data analysis: CMS, CKW, SXZ, YYS, KLM and QSY. Manuscript preparation: CMS and CKW. Manuscript revision: JWW. Sample preparation: JWW and QSY. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Jiawen Wu.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no conflict of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Additional file 1: Table S1.

Total saponin content of leaves, stems, flowers, and roots of Clinopodium gracile (Benth.) Matsum.

Additional file 2: Table S2.

Identification of the constituents of triterpenoid saponins in C.gracile by UPLC/Q-TOF-MS.

Additional file 3: Figure S1.

Length distribution of C. gracile unigenes.

Additional file 4: Figure S2.

Percent homology between the sequences of C. gracile and other plant species determined using the NR database.

Additional file 5: Figure S3.

GO function annotation of C. gracile transcriptome.

Additional file 6: Figure S4.

KEGG functional classification of the annotated unigenes in C. gracile.

Additional file 7: Table S3.

KEGG annotations of all unigenes.

Additional file 8: Table S4.

Number of unigenes encoding TFs involved in terpenoid metabolism.

Additional file 9: Figure S5.

Photograph of C. gracile plants. The picture was photographed for the plants of C. gracile in the herbal garden of the Anhui University of Chinese Medicine on April 18, 2018.

Additional file 10: Table S5.

Characteristics of RNA isolated from different tissues of C. gracile.

Additional file 11: Figure S6.

(a) The ultraviolet Absorption Spectrum of the buddlejasaponin IV. (b) Standard curve of buddlejasaponin IV at 250 nm.

Additional file 12: Table S6.

List of genes amplified using the indicated primers by qRT-PCR.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated.

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Shan, C., Wang, C., Zhang, S. et al. Transcriptome analysis of Clinopodium gracile (Benth.) Matsum and identification of genes related to Triterpenoid Saponin biosynthesis. BMC Genomics 21, 49 (2020).

Download citation


  • Clinopodium gracile (Benth.) Matsum
  • Transcriptome
  • RNA-Seq
  • Triterpenoid saponin biosynthesis
  • Differentially expressed genes