De novo transcriptome and phytochemical analyses reveal differentially expressed genes and characteristic secondary metabolites in the original oolong tea (Camellia sinensis) cultivar ‘Tieguanyin’ compared with cultivar ‘Benshan’
De novo transcriptome and phytochemical analyses reveal differentially expressed genes and characteristic secondary metabolites in the original oolong tea (Camellia sinensis) cultivar ‘Tieguanyin’ compared with cultivar ‘Benshan’
The two original plants of the oolong tea cultivar (‘Tieguanyin’) are “Wei shuo” ‘Tieguanyin’—TGY (Wei) and “Wang shuo” ‘Tieguanyin’—TGY (Wang). Another cultivar, ‘Benshan’ (BS), is similar to TGY in its aroma, taste, and genetic make-up, but it lacks the “Yin Rhyme” flavor. We aimed to identify differences in biochemical characteristics and gene expression among these tea plants.
The results of spectrophotometric, high performance liquid chromatography (HPLC), and gas chromatography-mass spectrometry (GC-MS) analyses revealed that TGY (Wei) and TGY (Wang) had deeper purple-colored leaves and higher contents of anthocyanin, catechins, caffeine, and limonene compared with BS. Analyses of transcriptome data revealed 12,420 differentially expressed genes (DEGs) among the cultivars. According to a Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis, the flavonoid, caffeine, and limonene metabolic pathways were highly enriched. The transcript levels of the genes involved in these three metabolic pathways were not significantly different between TGY (Wei) and TGY (Wang), except for two unigenes encoding IMPDH and SAMS, which are involved in caffeine metabolism. The comparison of TGY vs. BS revealed eight up-regulated genes (PAL, C4H, CHS, F3’H, F3H, DFR, ANS, and ANR) and two down-regulated genes (FLS and CCR) in flavonoid metabolism, four up-regulated genes (AMPD, IMPDH, SAMS, and 5′-Nase) and one down-regulated XDH gene in caffeine metabolism; and two down-regulated genes (ALDH and HIBADH) in limonene degradation. In addition, the expression levels of the transcription factor (TF) PAP1 were significantly higher in TGY than in BS. Therefore, high accumulation of flavonoids, caffeine, and limonene metabolites and the expression patterns of their related genes in TGY might be beneficial for the formation of the “Yin Rhyme” flavor.
Transcriptomic, HPLC, and GC-MS analyses of TGY (Wei), TGY (Wang), and BS indicated that the expression levels of genes related to secondary metabolism and high contents of catechins, anthocyanin, caffeine, and limonene may contribute to the formation of the “Yin Rhyme” flavor in TGY. These findings provide new insights into the relationship between the accumulation of secondary metabolites and sensory quality, and the molecular mechanisms underlying the formation of the unique flavor “Yin Rhyme” in TGY.
Camellia sinensis (L.) is an important economic crop which originated from the southwest of China. Its product, tea, is one of the three major non-alcoholic beverages worldwide . Fujian Province (southeast China) is one of the most important tea production and marketing areas in China, and numerous national and provincial improved tea cultivars have contributed to the importance of tea in the local economy. C. sinensis cv. ‘Tieguanyin’ (TGY) originated from Xiping town, Anxi county, Fujian Province, China, and was identified as a Chinese national improved cultivar in 1985. The characteristic compounds of tea are catechins, caffeine, theanine, and volatiles, and these secondary metabolites are also important indicators of tea quality . The TGY cultivar is the highest grade of oolong tea due to its unique “Yin Rhyme” flavor, which can be described as an elegant, gentle floral flavor, mellow and thick taste, and a sweet aftertaste. There are some reports that specific secondary metabolites may be related to the formation of the “Yin Rhyme” flavor in TGY tea [3, 4]. In the core area in which TGY originated, two of the earliest original tea plants of TGY are preserved; “Wei shuo” ‘Tieguanyin’—TGY (Wei) and “Wang shuo” ‘Tieguanyin’—TGY (Wang). These two original TGY tea plants are very similar in their physiological characteristics, but the differences in their biochemical characteristics and secondary metabolites are not yet clear. Another oolong tea national improved cultivar, C. sinensis cv. ‘Benshan’ (BS), also originated from Xiping town, Anxi county, Fujian Province, China. The aroma and taste characteristics of BS tea are very similar to those of TGY tea, with a strong aroma, endurable fragrance, and mellow, thick, and brisk taste, but it lacks the “Yin Rhyme” flavor. “Yin Rhyme” is the typical aroma and taste characteristics of TGY cultivar. The TGY and BS cultivars have similar phenotypes and quality characteristics, and have a close genetic relationship [5, 6]. These two tea cultivars and their tea products are sometimes confused. To date, there have been no reports revealing the major differences in metabolites between TGY and BS. In addition, the factors contributing to the formation of the “Yin Rhyme” flavor in TGY are not fully understood.
Transcriptome sequencing using next-generation sequencing technologies is a fast and cost-effective approach to generate genome-scale sequence resources [7, 8]. Consequently, this method is being used to analyze an increasing number of plant species and cultivars. Illumina sequencing technology, sequencing by synthesis, achieves high sequencing coverage at a lower cost, and has been used extensively for de novo transcriptome studies [9, 10]. High-throughput sequencing analyses have revealed many genes related to the flavor and quality of tea [11,12,13]. An RNA-seq study to identify differentially expressed genes (DEGs) among several tea cultivars revealed that many DEGs were involved in secondary metabolic pathways . Tai et al. reported that DEGs between tea and oil-tea plants were related to major secondary metabolites (catechins, caffeine, and theanine), especially key genes at branch points in these three metabolic pathways, which might explain the differences in the contents of these three metabolites between tea and oil-tea. To date, however, little is known about the differences in secondary metabolites and DEGs between TGY and BS, and their related molecular mechanisms.
To reveal the varietal characters and quality characteristics of TGY (Wei), TGY (Wang), and BS, we used the two original tea plants of TGY and the original tea plant of BS as materials, and used custom RNA-seq to investigate DEGs between TGY and BS. We also analyzed a set of physiological indicators (anthocyanin, total carotenoids, catechins, caffeine, theanine, and limonene contents). These analyses provide important insights into the molecular mechanisms underlying secondary metabolite biosynthesis in TGY (Wei), TGY (Wang), and BS, and describe the phytochemical characteristics of the main metabolites. We also analyzed the relationship between differentially accumulated metabolites and their related DEGs between TGY and BS, and explored the formation of the “Yin Rhyme” flavor in TGY.
Plant materials and sample preparation
The original tea plants of TGY (Wei) (C. sinensis cv. ‘Tieguanyin’), TGY (Wang) (C. sinensis cv. ‘Tieguanyin’), and BS (C. sinensis cv. ‘Benshan’) used in this study were grown in Xiping town, Anxi county, Fujian province, China (Additional file 1: Figure S1). The first new sprouting fresh shoots and two leaves were picked from each plant and mixed separately. The tissues were frozen in liquid nitrogen immediately and kept in polyethylene bags at − 80 °C until further experiments.
Investigation of main biological characteristics
The methods used to investigate the main biological characteristics of the three tea samples followed the national agriculture industry standard and are described in the following technical papers: “Technical code for evaluating crop germplasm tea plant (Camellia sinensis)” (NY/T 1312–2007) and “Guidelines for the conduct of tests for distinctness, uniformity and stability-tea (Camellia sinensis)” (NY/T 2422–2013).
Determination of chlorophylls, carotenoids, and anthocyanin contents
Chlorophyll (Chl) and total carotenoids (Car) were extracted from tea leaves using acetone solution. Equal weights (0.1 g) of fresh leaves collected from three tea samples were immersed in 45% acetone (acetone: ethanol: distilled water = 4.5: 4.5: 1, 10 mL) solution for 12 h in the dark. After centrifugation at 10,000 g for 10 min, the absorbance of the extracts was read at 440 nm, 645 nm, and 663 nm using a spectrophotometer. Each sample was analyzed in triplicate. The Chl and Car contents were calculated according to the following formulae described by Deng :
Where V is the volume of mixed extraction solution (mL) and m is the sample mass (g).
Anthocyanins were also extracted from the three samples according to the method described by Zhou  with minor modifications. Briefly, 0.1 g fresh leaf tissue was ground into a powder in liquid nitrogen, and then extracted with 1 mL 0.1 mol/L hydrochloric acid in a water bath at 32 °C for 4 h. After centrifugation at 6000 g for 10 min, the supernatant was filtered through a 0.22 μm organic membrane, and absorbance was measured at 530 nm, 620 nm, and 650 nm against a blank of 0.1 mol/L hydrochloric acid solution. Each sample was analyzed in triplicate. The anthocyanin content was calculated according to the following formulae:
Where ODλ is the absorbance value of the anthocyanin extract after correction at 530 nm; V is the volume of mixed extraction solution (mL); m is the sample mass (g); and M is molar mass of anthocyanin (g/mol).
Determination of total catechins, caffeine, theanine, and volatiles contents
Catechins, caffeine, and theanine were extracted and determined from three samples according to the method described by Tai et al. .
Catechins and caffeine were analyzed using a Waters 2695 high performance liquid chromatography (HPLC) system equipped with a 2489 ultraviolet (UV)-visible detector, and eluted catechins and caffeine were detected at 278 nm. The column temperature was 25 °C. Authentic standards of catechins and caffeine were purchased from Solarbio (Beijing, China). Each sample was analyzed in triplicate.
Theanine was analyzed using the same HPLC system equipped with a 2489 UV-visible detector and a 2475 fluorescence detector. The detection wavelength was set to 199 nm . The column temperature was 25 °C. Authentic standards of theanine were purchased from Solarbio (Beijing, China). Each sample was analyzed in triplicate.
Volatiles were extracted from tea leaves and analyzed as described by Hu et al. . A Clarus SQ 8 gas chromatograph-mass spectrometer (GC-MS; PerkinElmer, New York, NY, USA) fitted with an Elite-5MS capillary column (30 m × 0.25 mm × 0.25 μm) and a Turbmatix Headspace System (PerkinElmer) was used for analyses of volatile compounds. Data were analyzed using TurboMass 6.1 software (PerkinElmer). Each sample was analyzed in triplicate. Separated compounds were identified according to their retention indices and data in the NIST Mass Spectral Library and the Wiley Mass Spectral Library of Drugs, Pollutants, Pesticides, and Metabolites, using AMDIS Deconvolution software. The concentrations of volatiles in the tea cultivars were calculated as described by Hu et al. . All data are expressed as mean ± standard deviation (SD).
Total RNA extraction, library construction, and high-throughput sequencing
Total RNAs were separately extracted from TGY (Wei), TGY (Wang), and BS using a Trizol Reagent kit (Invitrogen/Life Technologies, Carlsbad, CA, USA) according to the manufacturer’s instructions. To identify the mRNAs in tea plants, a pooled RNA library was generated from the original tea plants (TGY (Wei), TGY (Wang), and BS). Extracted RNAs were quantified using a NanoDrop 2000 spectrophotometer (Thermo Scientific, Wilmington, DE, USA) and checked for integrity on an Agilent 2100 bioanalyzer (Agilent Technologies, Palo Alto, CA, USA) by denaturing agarose gel electrophoresis with ethidium bromide staining. The three libraries were constructed and sequenced by Biomarker Technologies (Beijing, China) using the Illumina HiSeq-2500 platform. Other details are as described by Lai et al. .
De novo transcriptome assembly, functional annotation, and classification
To obtain clean reads from the three samples, the raw reads were first pre-screened to remove adapter sequences, reads containing more than 5% unknown nucleotides (N), and low-quality reads (those with > 20% of bases with a quality value of < 15) using SOAPnuke software (https://github.com/BGI-flexlab/SOAPnuke; parameters: -l 15; −q 0.2; −n 0.05). The trimmed and size-selected reads were then de novo assembled using the Trinity package (https://github.com/trinityrnaseq/trinityrnaseq; parameters: -SS_lib_type FR; −min contig length 200) . According to the method described by Kim , hierarchical indexing for spliced alignment of transcripts 2 (Hisat2; https://github.com/infphilo/hisat2; parameters: -phred64; −sensitive; −no-discordant; −no-mixed; −I 1; −X 1000) was used to calculate the coverage of the three groups of tea transcriptome data relative to the two published versions of the tea genome [2, 22].
For gene annotation and protein prediction, all unigenes were searched against the Non-Redundant protein (NR), Swiss-prot, Gene Ontology (GO), Clusters of Orthologous Groups of proteins (COG), and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases by BLAST (E-value < 1.0E− 6), The results of KEGG orthology analyses in the KEGG database of unigenes were obtained using KOBAS 2.0 . Getorf  was used to find the coding DNA sequence (CDS) of each unigene longer than 200 bp. After predicting the amino acid sequence of each unigene, we used HMMER software  to compare it with the pfam database to obtain annotation information. We used HMMER software to identify transcription factor (TF) domains from the plant TF database (PlantTFDB), and TF family characteristics were identified according to the regulations described in the database (http://planttfdb.cbi.pku.edu.cn/) .
Bowtie software (https://github.com/BenLangmead/bowtie2; parameters: -q; −phred64; −sensitive; −dpad 0; −gbar 99,999,999; −mp 1,1; −np 1; −score-min L,0,-0.1; −I 1; −X 1000; −no-mixed; −no-discordant; −p 1; −k 200)  was used to align reads from each sample to the unigene library. Gene transcript levels were estimated using RSEM software (https://github.com/deweylab/RSEM; parameters: -default) . The DEGs among the three samples were identified using EBSeq software (https://github.com/lengning/EBSeq) . And P-values were adjusted using the FDR value . The thresholds for judging the significance of differential gene expression were FDR ≤ 0.01 and |log2 (fold change)| ≥ 1. The DEGs were subjected to GO and KEGG enrichment analyses. For a given unigene, fragments per kilobase of exon per million fragments mapped (FPKM) values in the three tea plants were generated from each of the three transcriptomes. Major metabolic pathways in these three tea samples were identified by enrichment factor analyses. We classified DEGs according to KEGG classifications, and conducted enrichment analyses using clusterProfiler (https://github.com/GuangchuangYu/clusterProfiler; parameters: -gene; −P-value ≤0.01; −readable TRUE). Then, we calculated the FDR for each P-value. In general, the terms with an FDR < 0.01 were defined as significantly enriched. The enrichment results were visualized using ggplot2 (https://ggplot2.tidyverse.org/reference/geom_point.html).
Identification of production of anthocyanin pigment 1 (PAP1) TF and phylogenetic tree construction
The amino acid sequence of the Arabidopsis thaliana PAP1 protein was downloaded from PlantTFDB , and used for BLAST searches of our transcriptome data. Local BLAST searches were conducted using BioEdit  with E-value < 1.0E− 6. We downloaded the Hidden Markov Model (HMM) file of the PAP1 domain (Pfam accession number: PF00249) from the Pfam database and used HMMER 3.0 software  with the default settings to perform an alignment search of PAP1 TFs. The results of these analyses were verified by local BLAST searches. We also confirmed the identity of CsPAP1 TF using tools at the InterPro database (http://www.ebi.ac.uk/interpro/).
To understand the structure of CsPAP1, the protein sequences of PAP1 from different plants were aligned by ClustalW with the default parameters. A phylogenetic tree was constructed using MEGA 7.0 with the neighbor-joining method with 1000 bootstrap replications . Evolutionary distances were computed using the Poisson correction method. The function of the CsPAP1 TF was predicted using tools at the PlantTFDB.
Relative expression analyses of selected genes
Total RNAs were reverse-transcribed using the PrimeScript RT reagent Kit (Takara, Otsu, Japan). To determine the relative transcript levels of the DEGs, the cDNA samples from the above-mentioned different tissues and samples under different treatments were diluted as the template, and then qRT-PCR analyses were conducted using gene-specific primers. The two-step qRT-PCR had a 20-μL reaction system consisting of 10 μL SYBR Premix Ex Taq II (TakaRa), 0.8 μL gene-specific primers, 1.0 μL cDNA, and ddH2O up to 20 μL. A Light Cycler 480 instrument (Roche, Switzerland) was used to detect gene transcripts. The qRT-PCR reactions included initial denaturation at 94 °C for 3 min followed by 40 cycles of denaturation at 94 °C for 10 s; annealing for 30 s, and extension at 72 °C for 15 s. Different annealing temperatures were used for different genes. Standard curves were obtained using a five-times dilution gradient of the mixed samples, and the reliability of the standard curve was determined. The GAPDH and β-actin genes were used as internal controls. Relative transcript levels were calculated using the 2-ΔCt formula . The primer sequences were designed using Primer 3 software (Additional file 2: Table S1). All qRT-PCR analyses were performed with three biological and technical replications.
Each experiment represents three independent biological replicates, and all data are expressed as mean ± standard deviation (SD). Differences among groups were tested using one-way ANOVA and Duncan’s test, and significant differences among various groups are represented by different letters. Lowercase letters indicate significant difference (P < 0.05); uppercase letters indicate highly significant difference (P < 0.01). The data were analyzed using SPSS 20 software.
Biological characteristics of tea cultivars
Both TGY (Wei) and TGY (Wang) are shrub-type, medium leaf, late-sprouting tea plants, but they showed some differences in their biological characteristics. The colors of the buds, first leaves, and second leaves of both TGY plants were green with purple (Fig. 1). The traits of leaf length, leaf width, and hundred-bud weight (one bud and three leaves) differed between TGY (Wei) and TGY (Wang). The leaf length and width were 7.96 cm and 3.09 cm, respectively, in TGY (Wei), compared with 7.77 cm and 2.92 cm, respectively, in TGY (Wang) (Additional file 3: Table S2). Therefore, the leaves of TGY (Wei) were longer and wider than those of TGY (Wei). The hundred-bud weight of TGY (Wang) was 165.0 g, which was 27.2 g heavier than that of TGY (Wei) (137.8 g).
The BS cultivar is a shrub-type, medium-leaf, medium-sprouting tea cultivar, which is very similar genetically to TGY [5, 6]. Compared with TGY, BS had earlier germinating and picking periods. The germinating and picking periods of TGY (Wei) and TGY (Wang) were similar, and differed by only 2 days. The bud color of BS was light green, unlike the purple bud of TGY (Wei) and TGY (Wang). The leaf color of BS was light green with lighter purple coloring than that of TGY (Wei) and TGY (Wang) leaves. The leaves of BS were shorter in length but wider than those of TGY (Wei) and TGY (Wang). Thus, the leaf shape of BS was different from the long oval shape of TGY leaves. The hundred-bud weight of BS was 98.2 g, lower than those of TGY (Wang) and TGY (Wei).
Analysis of chlorophylls, carotenoids, and anthocyanin contents in ‘Tieguanyin’ (Wei), ‘Tieguanyin’ (Wang), and ‘Benshan’
To explore the reasons for the differences in leaf color among TGY (Wei), TGY (Wang), and BS, we determined the contents of major pigments, including chlorophylls, carotenoids, and anthocyanin. The anthocyanin contents in TGY (Wei), TGY (Wang), and BS were 2.53 mg/g, 2.63 mg/g, and 1.91 mg/g, respectively (Fig. 2). The anthocyanin contents were not significantly different between TGY (Wang) and TGY (Wei), but were significantly higher in TGY than in BS.
The total chlorophylls, chlorophyll a, and chlorophyll b contents were also higher in TGY (Wei) and TGY (Wang) than in BS. The carotenoids contents were significantly higher in TGY (Wei) (0.46 mg/g) and TGY (Wang) (0.47 mg/g) than in BS (0.30 mg/g), but were not significantly different between TGY (Wei) and TGY (Wang). These results confirmed that the contents of anthocyanin, carotenoids, and chlorophylls were higher in TGY than in BS.
Total catechins, caffeine, theanine, and volatiles contents in ‘Tieguanyin’ (Wei), ‘Tieguanyin’ (Wang), and ‘Benshan’
Catechins, caffeine, and theanine play significant roles in the taste and flavor of tea, and the volatiles content is an important factor for determining tea aroma and quality [34, 35]. To explore the differences in these compounds among the three tea plants, the contents of catechins, caffeine, and theanine were determined by HPLC and the volatiles were analyzed by GC-MS.
The total catechins contents were 115.21, 118.35, and 87.12 mg/g in TGY (Wei), TGY (Wang), and BS, respectively (Fig. 3). Among all the gallocatechin-type catechins, including gallocatechin (GC), epigallocatechin (EGC), gallocatechin gallate (GCG), and epigallocatechin gallate (EGCG), EGCG was the most abundant at concentrations of 49.12, 49.81, and 40.19 mg/g in TGY (Wei), TGY (Wang), and BS, respectively. The main catechin, EGCG, accounted for about 42% of total catechins in the three tea samples. Apart from catechin (C) and catechin gallate (CG), other catechins including epicatechin (EC), EGC, epicatechin gallate (ECG), GCG, and EGCG showed significantly higher concentrations in TGY (Wei) and TGY (Wang) than in BS. Overall, most catechins were present at higher concentrations in TGY (Wei) and TGY (Wang) than in BS.
The caffeine contents were significantly higher in TGY (Wei) (24.16 mg/g) and TGY (Wang) (26.36 mg/g) than in BS (19.56 mg/g), but did not differ significantly between TGY (Wei) and TGY (Wang). The theanine contents in TGY (Wei), TGY (Wang), and BS were 3.73, 3.80, and 3.59 mg/g, respectively. Statistical analyses showed that there was no significant difference in theanine contents between TGY (Wei) and TGY (Wang), between TGY (Wei) and BS, and between TGY (Wang) and BS. Thus, the difference in theanine accumulation among the three tea plants was very small.
The top 10 volatile components in TGY (Wei), TGY (Wang), and BS detected by GC-MS were verbenone, terpinolene, methyl cyclopentene, hexyl hexanoate, limonene, benzene acetaldehyde, alpha-farnesene, ocimene, methyl salicylate, and cyclohexadiene (Table 1). Except for limonene, there were no significant differences in the contents of the other nine volatile aroma compounds between TGY and BS. The limonene contents in TGY (Wei) and TGY (Wang) were 2.55 and 2.47%, respectively, significantly higher than that in BS (1.43%). The high concentrations of catechins, caffeine, and limonene may be critical factors affecting tea flavor and aroma characteristics, especially the unique “Yin Rhyme” flavor of TGY.
Sequencing and de novo assembly of transcriptome data
After removing adapter sequences, low-quality reads, and reads with a high content of unknown base N (N% > 5%) from the raw data, a total of 15.03 Gb clean data were obtained by RNA-seq of the TGY (Wei), TGY (Wang), and BS libraries. We obtained 26,931,717, 21,533,093, and 26,690,143 clean reads from TGY (Wei), TGY (Wang), and BS (Table 2). For the TGY (Wei), TGY (Wang), and BS datasets, the Q30 percentages (sequencing correct rate ≥ 99.9%) were 88.30, 88.37, and 88.68%, respectively; and the GC percentages were 45.09, 45.21, and 45.13%, respectively. The coverages of TGY (Wei), TGY (Wang), and BS transcriptome data relative to the whole tea genome (C. sinensis var. assamica)  were 86.42, 86.42, and 85.99%, respectively; and relative to another tea genome (C. sinensis var. sinensis)  were 84.28, 84.35, and 82.97%, respectively. According to coverage criteria , more than 70% coverage is acceptable.
All high-quality reads were assembled using the Trinity program. After assembly, 241,903 transcripts and 118,637 unigenes were obtained (Table 3). The N50 of transcripts and unigenes was 1640 and 1044, respectively. Of the assembled unigenes, the majority (64.28%) had lengths distributed between 250 and 500 bp. The length of 21,093 genes was > 1000 bp. We also obtained read count data for the transcriptome dataset (Additional file 4: Table S3). These values indicated that the sequencing data had sufficient quality for accurate assembly and adequate transcriptome coverage.
Functional annotation and classification of unigenes in the transcriptome
For gene function annotations and protein prediction, all unigenes were aligned to the non-redundant (NR), Swiss-Prot, Cluster of Orthologous Groups (COG), Gene Ontology (GO), and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases using the BLAST program. Functional annotation analyses revealed that 16,330 (28.54%), 42,562 (74.39%), 2433 (21.73%), 36,422 (63.65%), and 56,939 (99.51%) unigenes had significant hits in the COG, GO, KEGG, Swiss-Prot, and NR databases, respectively.
In the COG function classification, unigenes were categorized into 25 clusters, The top five functions were “general function prediction only”, “replication, recombination and repair”, “transcription”, “signal transduction mechanisms”, and “posttranslational modification, protein turnover, chaperones” (Additional file 5: Figure S2). Based on GO classification analysis, 42,562 genes were annotated to three major categories: cellular component, molecular function, and biological process. The top three subgroups in the cellular component category were cell part, cell, and organelle. In the molecular function category, most unigenes were annotated into two subgroups: catalytic activity, and binding and transporter activity. In the biological process category, most unigenes were classified into three subgroups: metabolic process, cellular process, and single-organism process (Additional file 6: Figure S3).
In the functional annotation analysis, 50,077 CDS were detected by getorf prediction. Also, 1462 unigenes in 56 TF families were predicted. The five TF families with the largest number of genes were the basic helix-loop-helix (bHLH), ethylene-responsive factor (ERF), GRAS, C2H2, and MYB-related TF families, with 113, 104, 90, 88, and 84 genes, respectively (Additional file 7: Figure S4).
Analysis of DEGs in TGY (Wei) vs. TGY (Wang), TGY (Wei) vs. BS, and TGY (Wang) vs. BS based on KEGG database
To identify DEGs among the three tea samples, we compared and analyzed the three sets of tea transcriptome data. In total, 12,420 DEGs were found. As mentioned above, most DEGs showed similar transcript levels in TGY (Wei) and TGY (Wang), but different transcript levels in BS compared with TGY (Wei) and TGY (Wang) (Additional file 8: Figure S5). To clarify the roles of these DEGs, enriched metabolic pathways were identified in TGY (Wei) vs. TGY (Wang), TGY (Wei) vs. BS, and TGY (Wang) vs. BS.
In the TGY (Wei) vs. TGY (Wang) comparison, the DEGs could be divided into four major categories: metabolism, genetic information processing, environmental information processing, and organismal systems (Fig. 4). The metabolism category had the highest number of DEGs. Most of the enriched pathways were involved in metabolism and biosynthesis. The top 10 enriched pathways were stilbenoid, diarylheptanoid, and gingerol biosynthesis; limonene and pinene degradation; flavone and flavonol biosynthesis; flavonoid biosynthesis; phenylalanine metabolism; tropane, piperidine, and pyridine alkaloid biosynthesis; phenylpropanoid biosynthesis; zeatin biosynthesis; ascorbate and aldarate metabolism; and lysine degradation.
The pathways with the higher enrichment rankings were the flavonoid biosynthesis pathway; the tropane, piperidine and pyridine alkaloid biosynthesis pathway, and the limonene and pinene degradation pathway (Fig. 5). Considering the effects of flavonoids and caffeine on tea taste and the effects of terpenoids on tea aroma, and the results of the KEGG analysis of DEGs in TGY (Wei) vs. TGY (Wang), we focused on the flavonoid, caffeine, and limonene metabolic pathways for further analyses.
In the TGY (Wei) vs. BS and TGY (Wang) vs. BS comparisons, six out of the top 10 enriched pathways were consistent (Figs. 6 and 7). These six pathways were flavone and flavonol biosynthesis; thiamine metabolism; citrate cycle (TCA cycle); butanoate metabolism; valine, leucine, and isoleucine biosynthesis; and protein export. The flavone and flavonol biosynthesis pathway had the higher ranking in both comparisons. Based on these results, combined with the differences in leaf color and sensory quality between TGY and BS tea, we selected pigment-related and tea quality-related flavonoid metabolic pathways for further analyses.
Analysis of DEGs involved in flavonoid metabolic pathways in ‘Tieguanyin’ (Wei), ‘Tieguanyin’ (Wang), and ‘Benshan’
To further understand the expression of DEGs involved in flavonoid metabolic pathways in TGY (Wei), TGY (Wang), and BS, we focused on the 30 differentially expressed unigenes annotated to this pathway. More than two unigenes corresponded to PAL (encoding phenylalanine ammonia-lyase), C4H (encoding 4-coumarate CoA ligase), F3H (encoding flavanone 3-hydroxylase), FLS (encoding flavonol synthase), DFR (encoding dihydroflavonol 4-reductase), and F3’H (encoding flavonoid 3′-hydroxylase) (Table 4). Only one unigene corresponded to CHS (encoding chalcone synthase), ANS (encoding anthocyanidin synthase), ANR (encoding anthocyanidin reductase), and CCR (encoding cinnamoyl-CoA reductase). On the basis of these identified DEGs, a proposed flavonoid metabolic pathway was constructed (Fig. 8a).
Comparisons of the transcript levels (FPKM values) of these DEGs revealed no significant differences in their transcript levels between TGY (Wei) and TGY (Wang) (Fig. 8b). However, in the TGY (Wei) vs. BS and TGY (Wang) vs. BS comparisons, more than half of these 30 unigenes showed higher transcript levels in TGY (Wei) and TGY (Wang) than in BS. The genes PAL, C4H, CHS, F3’H, F3H, DFR, ANS, and ANR had higher transcript levels in TGY (Wei) and TGY (Wang) than in BS. Seven of the unigenes corresponding to 4CL, FLS, and CCR had lower transcript levels in TGY (Wei) and TGY (Wang) than in BS. These results implied that flavonoid metabolism had a stronger metabolic flux in TGY (Wei) and TGY (Wang) than in BS. The transcript level of FLS may be positively correlated with flavonol biosynthesis, but negatively correlated with anthocyanin and catechins accumulation. Hence, the high transcript level of FLS in BS may affect the accumulation of flavanols, anthocyanin, and catechins.
To confirm the gene expression patterns detected in the transcriptome dataset, we conducted qRT-PCR analyses to determine the transcript levels of PAL (c83716.graph_c0), C4H (c111306.graph_c0), 4CL (c116268.graph_c0), CHS (c116615.graph_c1), F3H (c107454.graph_c0), FLS (c114631.graph_c0), DFR (c122980.graph_c0), F3’H (c50625.graph_c0), ANS (c97715.graph_c0), ANR (c95232.graph_c0), and CCR (c86572.graph_c0) in TGY (Wei), TGY (Wang), and BS (Fig. 9). The expression trends of these unigenes detected in the qRT-PCR analyses were consistent with those detected in the RNA-seq dataset.
Analysis of DEGs involved in caffeine metabolic pathway in ‘Tieguanyin’ (Wei), ‘Tieguanyin’ (Wang), and ‘Benshan’
The KEGG analysis indicated that the caffeine metabolic pathway was one of the representative pathways in these tea plants. A total of 31 differentially expressed unigenes were assigned to genes in the caffeine metabolic pathway. Among them, only one unigene corresponded to AMPD (encoding AMP deaminase), while more than one unigene corresponded to IMPDH (encoding IMP dehydrogenase), SAMS (encoding S-adenosylmethionine synthase), 5′-Nase (encoding 5′-nucleotidase), and XDH (encoding xanthine dehydrogenase). The IMPDH gene had the largest number of corresponding unigenes (Table 5). On the basis of these identified DEGs, a proposed caffeine metabolic pathway was constructed (Fig. 10a).
The expression trends of the 31 differentially expressed unigenes involved in caffeine metabolism were compared based on transcriptome FPKM values (Fig. 10b). In the TGY (Wei) vs. TGY (Wang) comparison, except for two unigenes corresponding to IMPDH and SAMS, the key caffeine metabolic genes showed no significant difference in their expression levels between the two TGY samples.
In the TGY (Wei) vs. BS and TGY (Wang) vs. BS comparisons, most genes involved in the caffeine metabolic pathway had higher transcript levels in TGY than in BS. Four genes, AMPD (1 unigene), IMPDH (17 unigenes), SAMS (4 unigenes), 5′-Nase (2 unigenes) had higher transcript levels in TGY (Wei) and TGY (Wang) than in BS. Increased transcript levels of AMPD, IMPDH, SAMS and 5′-Nase genes may be positively correlated with the amount of precursors for caffeine metabolism. However, the up-regulation of XDH in BS could result in competition for precursors for caffeine biosynthesis and could also affect caffeine accumulation.
To confirm the gene expression trends detected in the transcriptome dataset, we conducted qRT-PCR analyses to detect the transcript levels of AMPD (c106010.graph_c0), IMPDH (c108398.graph_c0, c94772.graph_c0, c98658.graph_c0 and c107967.graph_c1), SAMS (c107073.graph_c0 and c71387.graph_c0), 5′-Nase (c108997.graph_c0), and XDH (c66037.graph_c0) in TGY (Wei), TGY (Wang), and BS (Fig. 11). The trends in gene expression were similar between the qRT-PCR and RNA-seq analyses. Together, these results confirmed that there were transcriptional-level differences in the caffeine metabolic pathway and caffeine biosynthesis between TGY and BS.
Analyses of DEGs involved in limonene degradation pathway in ‘Tieguanyin’ (Wei), ‘Tieguanyin’ (Wang), and ‘Benshan’
Next, we identified unigenes corresponding to genes in the limonene degradation pathway in the transcriptome data. Three unigenes encoding two enzymes were identified by mapping to the KEGG pathway. One unigene corresponded to HIBADH, encoding hydroxyisobutyrate dehydrogenase, and more than one unigene corresponded to ALDH, encoding aldehyde dehydrogenase (Additional file 9: Table S4). A proposed limonene degradation pathway was constructed on the basis of these identified DEGs (Fig. 12a).
The expression trends of these three unigenes were compared among TGY (Wei), TGY (Wang), and BS based on FPKM values (Fig. 12b). There was no significant difference in the expression levels of ALDH and HIBADH between TGY (Wei) and TGY (Wang), but both of these genes were significantly up-regulated in BS vs. TGY (Wei) and BS vs. TGY (Wang). The expression levels of ALDH and HIBADH are known to be negatively correlated with limonene content. Thus, high transcript levels of these two genes may lead to greater decomposition of limonene in BS than in TGY (Wei) and TGY (Wang). In other words, more limonene may be retained in TGY. A previous report indicated that limonene plays a significant role in tea flavor , so it may be a key factor in the “Yin Rhyme” flavor of TGY.
To confirm the gene expression trends detected in the transcriptome dataset, we conducted qRT-PCR analyses to determine the transcript levels of ALDH (c104957.graph_c0), ALDH (c93917.graph_c0), and HIBADH (c119421.graph_c0) in TGY (Wei), TGY (Wang), and BS (Fig. 12b). The expression patterns of the detected unigenes were consistent with the RNA-seq results. The gene expression patterns were also consistent with the differences in limonene contents among the tea plants. Thus, limonene accumulation may be a factor contributing to the unique “Yin Rhyme” flavor of TGY.
Analyses of CsPAP1 TF involved in flavonoid and anthocyanin biosynthesis in ‘Tieguanyin’ (Wei), ‘Tieguanyin’ (Wang), and ‘Benshan’
The PAP1 TF has been shown to play roles in controlling flavonoid and anthocyanin biosynthesis in Arabidopsis thaliana  and rose . However, the relationship between CsPAP1 expression and anthocyanin accumulation in tea is unclear. In our transcriptome data, among all the TF genes, PAP1 had the largest difference in FPKM values in the TGY (Wei) vs. BS and TGY (Wang) vs. BS comparisons. Therefore, we analyzed CsPAP1 in more detail. Using the HMM file of the PAP1 domain, we searched the tea transcriptome data using HMMER 3.0 software, and obtained candidate CsPAP1 TFs. The protein sequences of the candidate CsPAP1 TFs were further confirmed using tools at the InterPro database. We finally identified five unigenes corresponding to CsPAP1 (c65777.graph_c0, c112219.graph_c0, c53572.graph_c0, c121522.graph_c0, and c124394.graph_c0) in TGY (Wei), TGY (Wang), and BS.
In the phylogenetic tree (Fig. 13), the five CsPAP1s were distributed in three clusters: one cluster contained three CsPAP1 sequences (c53572.graph_c0, c112219.graph_c0, and c121522.graph_c0), while the two other clusters contained c65777.graph_c0 and c124394.graph_c0. The c65777.graph_c0 was in a cluster genetically distant from the other four CsPAP1. The c124394.graph_c0 clustered with PAP1 from Brassica oleracea, but was separate from three tea CsPAP1s (c53572.graph_c0, c112219.graph_c0, and c121522.graph_c0). Among those three tea CsPAP1s, c112219.graph_c0 and c121522.graph_c0 had closer genetic distances with each other than with c53572.graph_c0.
The expression patterns of the five CsPAP1 genes were analyzed by qRT-PCR in TGY (Wei), TGY (Wang), and BS (Fig. 14). Four of the CsPAP1 genes (c112219.graph_c0, c53572.graph_c0, c121522.graph_c0, and c124394.graph_c0) had higher transcript levels in TGY (Wei) and TGY (Wang) than in BS. The other one, c65777.graph_c0, had slightly higher transcript levels in TGY (Wei) and TGY (Wang) than in BS.
A functional prediction analysis of the CsPAP1s showed that c112219.graph_c0, c53572.graph_c0, and c121522.graph_c0 may promote the synthesis of phenylpropanoid-derived compounds such as anthocyanin and catechins, probably together with ENHANCER OF GLABRA3 (EGL3) and bHLH. There may be a positive correlation between the expression of PAP1 and the accumulation of anthocyanin and catechins. Therefore, differences in the transcript levels of CsPAP1 genes may explain differences in catechins and anthocyanin contents between TGY and BS.
Differences in anthocyanin and carotenoids contents may be responsible for deeper purple leaf color of ‘Tieguanyin’ than ‘Benshan’
Chlorophylls, anthocyanin, and carotenoids are the main plant pigments that affect leaf color formation in higher plants. Differences in leaf color among different cultivars are due to the relative contents of three main pigment groups (chlorophylls, carotenoids, and anthocyanin) [39,40,41,42]. Previous analyses of biological characteristics have confirmed that the leaf color of BS is green, while those of the two TGY plants are purple. However, the main pigment components and their contents in these tea plants had not been determined. Therefore, we determined the chlorophylls, carotenoids, and anthocyanin contents in TGY (Wang), TGY (Wei), and BS. The anthocyanin contents were higher in TGY (Wei) and TGY (Wang) than in BS. Similarly, the chlorophylls and carotenoids contents were higher in TGY than in BS. High concentrations of anthocyanin are responsible for the red, blue, and purple colors of plant leaves . Carotenoids, which are synthesized by a variety of enzymes in plastids, are responsible for yellow and red colors in leaves . The original TGY tea plants had high chlorophylls contents but still had a purple leaf color. Therefore, we inferred that anthocyanin and carotenoids at sufficiently high concentrations can mask the green color of chlorophylls. The results indicated that the leaf color of TGY mainly depends on high levels of anthocyanin and carotenoids. These observations were consistent with those of a previous study on Quercus coccifera . Together, these results indicated that high contents of anthocyanin and carotenoids, but not chlorophylls, play critical roles in the purple leaf color of TGY (Wei) and TGY (Wang).
Previous studies found that the leaves of the tea cultivar ‘Ziyan’  and eucalyptus  with high concentrations of anthocyanin also contained high levels of phenolic compounds. The results of this study suggested that TGY (Wei) and TGY (Wang), which had high anthocyanin contents, may also be rich in other flavonoid substances, especially catechins. This may contribute to the high quality of TGY tea.
Combined with the expression levels of genes related to secondary metabolism, high contents of catechins, anthocyanin, caffeine, and limonene contribute to unique “Yin Rhyme” flavor in ‘Tieguanyin’
In this study, compared with leaves of BS, the leaves of TGY (Wei) and TGY (Wang) had higher levels of catechins, caffeine, and limonene. However, there were no significant differences in the levels of these three metabolites between TGY (Wei) and TGY (Wang). To explore the mechanism underlying the high contents of these metabolites in TGY, we conducted pathway enrichment analyses. The results of KEGG analyses indicated that the flavonoid, caffeine, and limonene metabolic pathways were the representative pathways in these tea cultivars. In total, 64 differentially expressed unigenes were assigned to these three metabolic pathways. The expression patterns of these unigenes were analyzed by qRT-PCR in TGY (Wei), TGY (Wang), and BS. The transcript levels of PAL, C4H, CHS, F3’H, F3H, DFR, ANS, and ANR, all of which are involved in the flavonoid metabolic pathway, were up-regulated in TGY (Wei) and TGY (Wang) compared with BS. Two genes (FLS and CCR) showed higher transcript levels in BS than in TGY (Wei) and TGY (Wang). In the caffeine metabolism pathway, AMPD, IMPDH, SAMS, and 5′-Nase had higher transcript levels in TGY (Wei) and TGY (Wang) than in BS, but XDH had lower transcript levels in TGY (Wei) and TGY (Wang) than in BS. Two genes in the limonene degradation pathway, ALDH and HIBADH, were down-regulated in TGY (Wei) vs. BS and TGY (Wang) vs. BS.
Eight genes in the flavonoid and anthocyanin biosynthesis pathways (PAL, C4H, CHS, F3’H, F3H, DFR, ANS, and ANR) had higher transcript levels in TGY than in BS. PAL is the first limiting enzyme in the flavonoid metabolic pathway and its product is a direct or indirect precursor of many secondary metabolites [46, 47]. Many studies have shown that PAL expression is significantly positively correlated with flavonoid and anthocyanin accumulation, indicating that it plays a critical role in regulating the production of secondary metabolites [48,49,50]. Its high expression directs metabolic flux into flavonoid biosynthesis to define the size of the anthocyanin and catechins pool . There was a positive relationship between PAL expression levels and anthocyanin and catechins contents in the two TGY tea plants, and these compounds were present at significantly higher concentrations in TGY than in BS. The second step of the flavonoid metabolic pathway is catalyzed by C4H. In Arabidopsis, AtC4H and AtPAL1 were found to have similar expression patterns, both in terms of tissue-specific expression and expression under stress treatments. The expression patterns of these two genes were shown to be positively correlated . The promoter of AtC4H contains the same three cis-acting elements, like that of AtPAL . Liu et al. reported that the flavonoid and catechins contents in tea leaves were closely related to C4H expression . Consistent with this, we observed that the transcript levels of C4H and catechins contents were significantly higher in TGY (Wei) and TGY (Wang) than in BS.
In the flavonoid synthesis pathway, F3’H catalyzes the conversion of naringenin and dihydrokaempferol into eriodictyol and dihydroquercetin, which are important intermediates for the biosynthesis of anthocyanin and catechins [55, 56]. A previous study revealed a strong positive correlation between the transcript levels of F3’H and catechins contents in tea plants . Another key enzyme in the flavonoid metabolic pathway is F3H. High transcript levels of F3H were found to promote the accumulation of anthocyanin and flavonoids [57,58,59]. This enzyme co-ordinates with CHS and CHI , while DFR, ANS, and ANR play important roles in the biosynthesis of anthocyanin and catechins [61,62,63]. Vinay et al. found that the overexpression of CsDFR and CsANR in tobacco led to improved flavonoid contents, especially total catechins, EC, and EGC. Other studies have reported that DFR and ANS are key genes for increasing the catechins and anthocyanin contents in C. sinensis [64, 65]. We found that high transcript levels of DFR, ANS, and ANR were positively correlated with anthocyanin and catechins contents in TGY (Wei) and TGY (Wang). Together, our results showed that up-regulation of these eight genes in the flavonoid metabolic pathway was positively correlated with the accumulation of catechins and anthocyanin. The high transcript levels of these genes may be the key factors in the formation of the “Yin Rhyme” flavor of TGY.
A key enzyme in lignin synthesis is CCR, and its abundance directly affects the metabolism of lignin and many phenolic substances in plants . The lignin biosynthesis pathway is located downstream of, and is a branching point in, the phenylpropanoid metabolic pathway. It has parallel status with other branching pathways of phenylpropanoid metabolism such as the flavonoid metabolic pathway. A previous study showed that a high expression level of CCR is positively correlated with lignin production, while down-regulation of CCR may lead to reduced lignin content . Therefore, the higher transcript level of CCR in BS than in TGY may shift the metabolic flux to lignin synthesis, away from flavonoid synthesis. This would reduce the amount of precursors for catechins synthesis in BS, which may explain the significantly lower catechins content in BS than in TGY.
Downstream of flavonoid synthesis, FLS is a branching enzyme whose activity is related to the synthesis of flavonols and flavan-3-ols [68,69,70,71]. A recent report on the draft genome of C. sinensis var. sinensis noted that high expression levels of FLS were positively correlated with the accumulation of monomeric galloylated catechins . In this study, upregulation of FLS was negatively correlated with anthocyanin, epicatechin, and epigallocatechin contents; thus, FLS limited the size of these metabolite pools. These results indicated that high transcript levels of PAL, C4H, CHS, F3’H, F3H, DFR, ANS, and ANR and low transcript levels of CCR and FLS in TGY (Wei) and TGY (Wang) play critical roles in flavonoid metabolism. Metabolite analyses revealed that the contents of catechins (EC, EGC, ECG, GCG, and EGCG), total catechins, and anthocyanin were significantly higher in TGY (Wei) and TGY (Wang) than in BS. The high contents of catechins and anthocyanin may be correlated with the expression levels of genes related to flavonoid metabolism. This relationship may be an important factor in the formation of the unique “Yin Rhyme” flavor in ‘Tieguanyin’.
Caffeine is an indole alkaloid present in tea and coffee. It is one of the three main components that affect the sensory quality of tea [72, 73]. In this study, the contents of caffeine were significantly higher in TGY than in BS. This result may be related to the expression of DEGs involved in the caffeine metabolic pathway.
Compared with BS, TGY showed up-regulated expression of most of the DEGs in the caffeine metabolic pathway. This result indicated that there was stronger metabolic flux into the caffeine metabolic pathway in TGY than in BS. In total, 23 unigenes in the tea transcriptome corresponded to AMPD, IMPDH, SAMS, and 5′-Nase, which encode four vital enzymes in the caffeine metabolic pathway. These four genes had higher transcript levels in TGY (Wei) and TGY (Wang) than in BS. In tea plants, AMP deaminase is a key enzyme involved in caffeine biosynthesis, and it mainly catalyzes the deamination of adenosine to produce hypoxanthine nucleotides. Addition of an AMP deaminase inhibitor to tea flower buds was shown to reduce AMPD expression and caffeine content , indicating that the expression level of AMPD is positively correlated caffeine content. Another crucial enzyme is SAMS, which produces xanthine nucleosides and functions as the only methyl donor for methylation in caffeine metabolism. The reaction converting inosine monophosphate (IMP) into xanthosine monophosphate (XMP) is catalyzed by IMPDH, which promotes the synthesis of another caffeine precursor [75, 76]. Previous studies have shown that SAMS expression levels regulate the rate of SAM synthesis to provide sufficient precursors for the downstream synthesis of caffeine [76, 77]. Caffeine accumulation has also been found to be affected by 5′-Nase expression levels . Therefore, the up-regulated expression of these four genes may be positively correlated with caffeine content, which would explain the greater accumulation of caffeine in TGY (Wei) and TGY (Wang) than in BS.
The reason for the low caffeine content in BS may be the low transcript levels of AMPD, IMPDH, SAMS, 5′-Nase, consistent with the results of previous studies [79, 80]. In addition, XDH, which encodes an intermediate enzyme in caffeine metabolism that catalyzes the formation of urate from xanthine and hypoxanthine [81, 82], was expressed at higher levels in BS than in TGY. The high expression of XDH may have steered the metabolic flow to 7-methyluric acid synthesis. We found that the transcript level of XDH was negatively correlated with the accumulation of precursors for caffeine synthesis. Previous studies on Arabidopsis revealed that caffeine accumulated to high concentrations after inhibition of XDH expression [83, 84], suggesting that caffeine content is negatively correlated with XDH expression. Based on these results, the high caffeine content in TGY may have been related to low XDH expression. Two previous studies on the tea genome [2, 22] suggested that some genes in the caffeine metabolic pathway have undergone expansion events, and that high expression of caffeine metabolism-related genes is positively correlated with caffeine accumulation. Those studies also suggested that high expression levels of flavonoid and caffeine metabolism-related genes, but not theanine metabolism-related genes, are the crucial factors for improving tea taste and aroma quality. Those results are consistent with our findings that the expression levels of four caffeine metabolism-related genes (AMPD, IMPDH, SAMS, and 5′-Nase genes) were expressed at higher levels in TGY (Wei) and TGY (Wang) than in BS, but their expression levels did not differ significantly between TGY (Wei) and TGY (Wang). Combined with the caffeine content determination results, these results suggested that the high caffeine content in two original TGY tea plants may be one reason for unique “Yin Rhyme” flavor of TGY (Wei) and TGY (Wang).
Monoterpenoids are a class of terpenoids that are components of volatile aroma [35, 85]. Limonene is a monocyclic terpenoid that is a precursor for many other cyclic terpenoids; thus, it significantly affects terpenoid synthesis [86, 87]. Limonene is also an important component of tea aroma, and is mainly responsible for the floral fragrance of tea [35, 36].
We analyzed volatiles in TGY and BS, and found that except for limonene, the nine other main volatile compounds did not differ significantly in concentration between TGY and BS. To explore the differences in limonene between TGY and BS, we further analyzed the transcript level of genes related to this compound in the transcriptome data and by qRT-PCR. We found that ALDH and HIBADH were significantly up-regulated in BS vs. TGY (Wei) and BS vs. TGY (Wang). These genes encode enzymes that catalyze the decomposition of limonene into perillic acid and isopropenylpimelyl-CoA [88,89,90]. In BS, there were high transcript levels of these genes and a low limonene content. Lower expression levels of these genes in TGY would mean that less limonene would be degraded, which explains the significantly higher limonene content in TGY than in BS. Xie et al. found that the expression level of ALDH was inversely proportional to limonene content . These results indicated that a low limonene content in BS may lead to a lack of sufficient precursor substances for the synthesis of cyclic terpenoids.
In the present study, compared with BS, TGY had eight up-regulated genes (PAL, C4H, CHS, F3’H, F3H, DFR, ANS, and ANR) and two down-regulated genes (FLS and CCR) in the flavonoid metabolic pathway; four up-regulated genes (AMPD, IMPDH, SAMS, and 5′-Nase) and one down-regulated XDH gene in the caffeine metabolic pathway; and two down-regulated genes (ALDH and HIBADH) in the limonene degradation pathway. The expression patterns of these genes were related to the greater formation of catechins, anthocyanin, caffeine, and limonene in TGY (Wei) and TGY (Wang) than in BS. Combined with the expression levels of genes related to secondary metabolism, high contents of catechins, anthocyanin, caffeine, and limonene may be correlated with the formation of the “Yin Rhyme” flavor in ‘Tieguanyin’ (Fig. 15).
Apart from two unigenes corresponding to IMPDH (c107967.graph_c1) and SAMS (c107073.graph_c0), which are involved in caffeine metabolism, the expression levels of all other genes were not significantly different between TGY (Wei) and TGY (Wang). In addition, the contents of related metabolites, including flavonoids, anthocyanin, catechins, caffeine, and limonene, were not significantly different between the two TGY samples. Therefore, we inferred that the other unigenes corresponding to IMPDH and SAMS that were expressed at similar levels in TGY (Wei) and TGY (Wang) compensated for the differences in the expression of c107967.graph_c1 and c107073.graph_c0 between TGY (Wei) and TGY (Wang). On the basis of their similar secondary metabolite contents and expression levels of secondary metabolism-related genes, we speculated that these two original TGY plants may be sister strains of ‘Tieguanyin’, with similar tea aroma quality and the “Yin Rhyme” flavor.
Up-regulated CsPAP1 in ‘Tieguanyin’ compared with ‘Benshan’ promotes flavonoid and anthocyanin accumulation
A previous study has shown that the transcriptional regulation of structural genes is a major mechanism by which flavonoid and anthocyanin biosynthesis is regulated in plants . In many plants, regulation occurs through a set of TFs including MYB, bHLH, and WD-repeat proteins. Members of the MYB family are often the major determinants of variation in anthocyanin pigmentation . For example, MYB114 has been proven to coordinately regulate anthocyanin biosynthesis via its interaction of bHLH3 and ERF3 in pear , and the R2R3-MYB TF anthocyanin 1 (CsAN1) regulates anthocyanin accumulation in purple-leaf tea . PAP1 is a MYB TF that is involved in regulating flavonoid and anthocyanin biosynthesis . Transgenic rose expressing PAP1 not only showed a more intense purple color, but also had increased concentrations of terpenoid scent compounds in its flowers .
Interestingly, the TGY transcriptome dataset included unigenes corresponding to PAP1 (GenBank accession: AF325123) with E-value < 1.0E− 6. The phylogenetic tree showed that CsPAP1 (c124394.graph_c0) clustered with PAP1 in Brassica oleracea, but deviated from the other three CsPAP1s (c112219.graph_c0, c53572.graph_c0, and c121522.graph_c0). Among these three CsPAP1s, c112219.graph_c0 and c121522.graph_c0 had a closer genetic distance with each other than with c53572.graph_c0. However, the evolutionary distance of CsPAP1 (c124394.graph_c0) was further from the other four tea PAP1 proteins. The qRT-PCR results confirmed that four CsPAP1 genes (c112219.graph_c0, c53572.graph_c0, c121522.graph_c0, and c124394.graph_c0) were significantly up-regulated in TGY (Wei) and TGY (Wang) compared with BS. The products of these genes may contribute to anthocyanin accumulation. The PlantTFDB  BLAST results indicated that three CsPAP1s (encoded by c112219.graph_c0, c53572.graph_c0, and c121522.graph_c0) may coordinately regulate the biosynthesis of phenylpropanoid-derived compounds such as anthocyanin and catechins via an interaction with EGL3 in TGY. Thus, we speculated that anthocyanin biosynthesis in TGY is concurrently regulated by PAP1 TFs, and that the expression level of CsPAP1 is positively correlated with catechins production. This would explain the difference in leaf color and flavor between the two original TGY samples and BS. Further research is required to identify the mechanisms that control the interactions responsible for producing specialized metabolites in these metabolic pathways.
Transcriptome analyses and HPLC and GC-MS analyses of TGY (Wei), TGY (Wang), and BS revealed various pigments (chlorophylls, carotenoids, and anthocyanin), metabolites (catechins, caffeine, and limonene), and related metabolic pathways (flavonoids metabolism, caffeine metabolism, and limonene degradation) responsible for the leaf color and secondary metabolite profiles of TGY and BS. Our results showed that, compared with BS, TGY contains higher levels of anthocyanin, carotenoids, catechins, caffeine, and limonene. High contents of anthocyanin and carotenoids contribute to the deeper purple leaf color of TGY than BS. We also analyzed the expression profiles genes involved in flavonoid metabolism, caffeine metabolism, and limonene degradation pathways that were differentially expressed among the three tea plants. The results suggested that the expression of DEGs involved in the flavonoid, caffeine, and limonene metabolic pathways might be correlated with the accumulation of related metabolites in TGY. Further analyses of TFs revealed that the CsPAP1 TF may participate in regulating the flavonoid and anthocyanin metabolic pathways. High transcript levels of CsPAP1 corresponded to increased accumulation of anthocyanin and catechins in TGY. Therefore, the sensory quality of TGY (Wei) and TGY (Wang) might be better than that of BS because of the accumulation of flavonoids, caffeine, and limonene metabolites. These metabolites did not show significantly different concentrations between TGY (Wei) and TGY (Wang), suggesting that these two plants may be sister strains of ‘Tieguanyin’. Our study underpins future research on the relationships and molecular mechanisms that contribute to the accumulation of secondary metabolites in tea. These findings also highlight that the expression levels of genes encoding important enzymes in secondary metabolic pathways affect the formation of the “Yin Rhyme” flavor in ‘Tieguanyin’.
Differentially expressed genes
Fragments per kilobase of exon per million fragments
Gas chromatography-mass spectrometry
Hidden Markov model
High performance liquid chromatography
Quantitative real-time polymerase chain reaction
Wang L, Yue C, Cao HL, Zhou YH, Zeng JM, Yang YJ, Wang XC. Biochemical and transcriptome analyses of a novel chlorophyll-deficient chlorina tea plant cultivar. BMC Plant Biol. 2014;14:352.
Wei CL, Yang H, Wang SB, Zhao J, Liu C, Gao LP, Xia EH, Lu Y, Tai YL, She GB, et al. Draft genome sequence of Camellia sinensis var. sinensis provides insights into the evolution of the tea genome and tea quality. Proc Natl Acad Sci U S A. 2018;115(18):4151–8.
Liu SR, An YL, Li FD, Li SJ, Liu LL, Zhou QY, Zhao SQ, Wei CL. Genome-wide identification of simple sequence repeats and development of polymorphic SSR markers for genetic studies in tea plant (Camellia sinensis). Mol Breeding. 2018;38(5):59.
Vaidya K, Ghosh A, Kumar V, Chaudhary S, Srivastava N, Katudia K, Tiwari T, Chikara SK. De novo transcriptome sequencing in Trigonella foenum-graecum L. to identify genes involved in the biosynthesis of diosgenin. Plant Genome. 2012;6(2):1–11.
Liu MY, Qiao GR, Jiang J, Yang HQ, Xie LH, Xie JZ, Zhuo RY. Transcriptome sequencing and de novo analysis for ma bamboo (Dendrocalamus latiflorus Munro) using the Illumina platform. PLoS One. 2012;7(10):e46766.
Harismendy O, Ng PC, Strausberg RL, Wang X, Stockwell TB, Beeson KY, Schork NJ, Murray SS, Topol EJ, Levy S. Evaluation of next generation sequencing platforms for population targeted sequencing studies. Genome Biol. 2009;10:R32.
Hendre PS, Kamalakannan R, Varghese M. High-throughput and parallel SNP discovery in selected candidate genes in Eucalyptus camaldulensis using Illumina NGS platform. Plant Biotechnol J. 2012;10(6):646.
Shi CY, Yang H, Wei CL, Yu O, Zhang ZZ, Jiang CJ, Sun J, Li YY, Chen Q, Xia T. Deep sequencing of the Camellia sinensis transcriptome revealed candidate genes for major metabolic pathways of tea-specific compounds. BMC Genomics. 2011;12:131.
Wu ZJ, Li XH, Liu ZW, Xu ZS, Zhuang J. De novo assembly and transcriptome characterization: novel insights into catechins biosynthesis in Camellia sinensis. BMC Plant Biol. 2014;14:277.
Wang L, Cao HL, Chen CS, Yue C, Hao XY, Yang YJ, Wang XC. Complementary transcriptomic and proteomic analyses of a chlorophyll-deficient tea plant cultivar reveal multiple metabolic pathway changes. J Proteome. 2016;130:160–9.
Deng XJ, Zhang HQ, Wang Y, He F, Liu JL, Xiao X, Shu ZF, Li W, Wang GH, Wang GL. Mapped clone and functional analysis of leaf-color gene Ygl7 in a rice hybrid (Oryza sativa L. ssp. indica). PLoS One. 2014;9(6):e99564.
Tai YL, Wei CL, Yang H, Zhang L, Chen Q, Deng WW, Wei S, Zhang J, Fang CB, Ho CT. Transcriptomic and phytochemical analysis of the biosynthesis of characteristic constituents in tea (Camellia sinensis) compared with oil tea (Camellia oleifera). BMC Plant Biol. 2015;15:190.
Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, Adiconis X, Fan L, Raychowdhury R, Zeng Q, et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol. 2011;29(7):644–52.
Xia EH, Zhang HB, Sheng J, Li K, Zhang QJ, Kim C, Zhang Y, Liu Y, Zhu T, Li W, et al. The tea tree genome provides insights into tea flavor and independent evolution of caffeine biosynthesis. Mol Plant. 2017;10(6):866–77.
Xie C, Mao XZ, Huang JJ, Ding Y, Wu JM, Dong S, Kong L, Gao G, Li CY, Wei LP. KOBAS 2.0: a web server for annotation and identification of enriched pathways and diseases. Nucleic Acids Res. 2011;39(s2):316–22.
Leng N, Dawson JA, Thomson JA, Ruotti V, Rissman AI, Smits BM, Haag JD, Gould MN, Stewart RM, Kendziorski C. EBSeq: an empirical Bayes hierarchical model for inference in RNA-seq experiments. Bioinformatics. 2013;29(16):2073.
Xu QS, He YX, Yan XM, Zhao SQ, Zhu JY, Wei CL. Unraveling a crosstalk regulatory network of temporal aroma accumulation in tea plant (Camellia sinensis) leaves by integration of metabolomics and transcriptomics. Environ Exp Bot. 2018;149:81–94.
Shin DH, Choi M, Kim K, Bang G, Cho M, Choi SB, Choi G, Park YI. HY5 regulates anthocyanin biosynthesis by inducing the transcriptional activation of the MYB75/PAP1 transcription factor in Arabidopsis. FEBS Lett. 2013;587(10):1543.
Zvi MM, Shklarman E, Masci T, Kalev H, Debener T, Shafir S, Ovadis M, Vainstein A. PAP1 transcription factor enhances production of phenylpropanoid and terpenoid scent compounds in rose flowers. New Phytol. 2012;195(2):335–45.
Lightbourn GJ, Griesbach RJ, Novotny JA, Clevidence BA, Rao DD, Stommel JR. Effects of anthocyanin and carotenoid combinations on foliage and immature fruit color of Capsicum annuum L. J Hered. 2008;99(2):105–11.
Lu YF, Zhang ML, Meng XN, Wan HH, Zhang J, Tian J, Hao SX, Jin KN, Yao YC. Photoperiod and shading regulate coloration and anthocyanin accumulation in the leaves of malus crabapples. Plant Cell Tiss Org. 2015;121(3):619–32.
Li CF, Xu YX, Ma JQ, Jin JQ, Huang DJ, Yao MZ, Ma CL, Chen L. Biochemical and transcriptomic analyses reveal different metabolite biosynthesis profiles among three color and developmental stages in 'Anji Baicha' (Camellia sinensis). BMC Plant Biol. 2016;16(1):195.
Karageorgou P, Manetas Y. The importance of being red when young: anthocyanins and the protection of young leaves of Quercus coccifera from insect herbivory and excess light. Tree Physiol. 2006;26(5):613–21.
Liu M, Tian HL, Wu JH, Cang RR, Wang RX, Qi XH, Xu Q, Chen XH. Relationship between gene expression and the accumulation of catechin during spring and autumn in tea plants (Camellia sinensis L.). Hortic Res. 2015;2:15023.
Han YH, Huang KY, Liu YJ, Jiao TM, Ma GL, Qian YM, Wang PQ, Dai XL, Gao LP, Xia T. Functional analysis of two flavanone-3-hydroxylase genes from Camellia sinensis: a critical role in flavonoid accumulation. Genes. 2017;8(11):300.
Moriguchi T, Kita M, Tomono Y, Endo-Inagaki T, Omura M. Gene expression in flavonoid biosynthesis: correlation with flavonoid accumulation in developing citrus fruit. Physiol Plantarum. 2001;111(1):66–74.
Pelt JL, Downes WA, Schoborg RV, McIntosh CA. Flavanone 3-hydroxylase expression in Citrus paradisi and Petunia hybrida seedlings. Phytochemistry. 2003;64(2):435–44.
Pelletier MK, Shirley BW. Analysis of flavanone 3-hydroxylase in Arabidopsis seedlings (Coordinate regulation with chalcone synthase and chalcone isomerase). Plant Physiol. 1996;111(1):339–345.
Rothenberg DO, Yang HJ, Chen MB, Zhang WT, Zhang LY. Metabolome and transcriptome sequencing analysis reveals anthocyanin metabolism in pink flowers of anthocyanin-rich Tea (Camellia sinensis). Molecules. 2019;24(6):1064.
Sun BM, Zhu ZS, Cao PR, Hao C, Chen CM, Xin Z, Mao YH, Lei JJ, Jiang YP, Wei M, et al. Purple foliage coloration in tea (Camellia sinensis L.) arises from activation of the R2R3-MYB transcription factor CsAN1. Sci Rep. 2016;6:32534.
Guo YQ, Chang XJ, Zhu C, Zhang ST, Li XZ, Fu HF, Chen CS, Lin YL, Lai ZX. De novo transcriptome combined with spectrophotometry and gas chromatography-mass spectrometer (GC-MS) reveals differentially expressed genes during accumulation of secondary metabolites in purple-leaf tea (Camellia sinensis cv Hongyafoshou). J Hortic Sci Biotechnol. 2019;94(3):349–67.
Punyasiri PA, Abeysinghe IS, Kumar V, Treutter D, Duy D, Gosch C, Martens S, Forkmann G, Fischer TC. Flavonoid biosynthesis in the tea plant Camellia sinensis: properties of enzymes of the prominent epicatechin and catechin pathways. Arch Biochem Biophys. 2004;431(1):22–30.
Chao N, Li N, Qi Q, Li S, Lv T, Jiang XN, Gai Y. Characterization of the cinnamoyl-CoA reductase (CCR) gene family in Populus tomentosa reveals the enzymatic active sites and evolution of CCR. Planta. 2017;245(1):61–75.
Kallscheuer N, Vogt M, Bott M, Marienhagen J. Functional expression of plant-derived O-methyltransferase, flavanone 3-hydroxylase, and flavonol synthase in Corynebacterium glutamicum for production of pterostilbene, kaempferol, and quercetin. J Biotechnol. 2017;258:190–6.
Gagné S, Lacampagne S, Claisse O, Gény L. Leucoanthocyanidin reductase and anthocyanidin reductase gene expression and activity in flowers, young berries and skins of Vitis vinifera L. cv. Cabernet-sauvignon during development. Plant Physiol Bioch. 2009;47(4):282–90.
Mohanpuria P, Kumar V, Joshi R, Gulati A, Ahuja PS, Yadav SK. Caffeine biosynthesis and degradation in tea [Camellia sinensis (L.) O. Kuntze] is under developmental and seasonal regulation. Mol Biotechnol. 2009;43(2):104–11.
Li CF, Zhu Y, Yu Y, Zhao QY, Wang SJ, Wang XC, Yao MZ, Luo D, Li X, Chen L. Global transcriptome and gene regulation network for secondary metabolite biosynthesis of tea plant (Camellia sinensis). BMC Genomics. 2015;16:560.
Watanabe S, Kounosu Y, Shimada H, Sakamoto A. Arabidopsis xanthine dehydrogenase mutants defective in purine degradation show a compromised protective response to drought and oxidative stress. Plant Biotechnol. 2014;31(2):173–8.
Watanabe S, Nakagawa A, Izumi S, Shimada H, Sakamoto A. RNA interference-mediated suppression of xanthine dehydrogenase reveals the role of purine metabolism in drought tolerance in Arabidopsis. FEBS Lett. 2010;584(6):1181–6.
Lücker J, El Tamer MK, Schwab W, Verstappen FWA, Van der Plas LHW, Bouwmeester HJ, Verhoeven HA. Monoterpene biosynthesis in lemon (Citrus limon). cDNA isolation and functional analysis of four monoterpene synthases. Eur J Biochem. 2002;269(13):3160–71.
Kjonaas R, Croteau R. Demonstration that limonene is the first cyclic intermediate in the biosynthesis of oxygenated p-menthane monoterpenes in Mentha piperita and other Mentha species. Arch Biochem Biophys. 1983;220(1):79–89.
Yao GF, Ming ML, Allan AC, Gu C, Li LT, Wu X, Wang RZ, Chang YJ, Qi KJ, Zhang SL, et al. Map-based cloning of the pear gene MYB114 identifies an interaction with other transcription factors to coordinately regulate fruit anthocyanin biosynthesis. Plant J. 2017;92(3):437–51.
The authors wish to thank Zihao Zhang for technical assistance in sampling. And we also thank Jennifer Smith, PhD, from Liwen Bianji, Edanz Group China (www.liwenbianji.cn/ac), for editing the English text of a draft of this manuscript.
The fees for high-throughput sequencing and data processing were supported by the Natural Science Foundation of Fujian Province (2018 J01701) and the Earmarked Fund for China Agriculture Research System (CARS-19); the fees for quantitative real time polymerase chain reaction detection were supported by the Special Fund Project for Scientific and Technological Innovation of Fujian Agriculture and Forestry University (CXZX2017164, CXZX2018069) and the Major Science and Technology Project in Fujian Province (2015NZ0002–1), the publication fees were supported by the Rural Revitalization Tea Industry Technical Service Project of Fujian Agriculture and Forestry University (11899170102) and the Construction of Plateau Discipline of Fujian Province (102/71201801101). The funding body did not play a role in the design of the study and collection, analysis, and interpretation of data and in writing the manuscript.
Availability of data and materials
All data presented in this study are provided either in the manuscript or additional files. Raw reads data have been deposited in NCBI Sequence Read Archive (SRA) under the accession numbers SRR8491466, SRR8491467, and SRR8491468.
Yuqiong Guo and Chen Zhu contributed equally to this work.
Authors and Affiliations
College of Horticulture, Fujian Agriculture and Forestry University, Fuzhou, 350002, China
Yuqiong Guo, Chen Zhu, Shanshan Zhao, Shuting Zhang, Haifeng Fu, Xiaozhen Li, Chengzhe Zhou, Lan Chen, Yuling Lin & Zhongxiong Lai
Institute of Horticultural Biotechnology, Fujian Agriculture and Forestry University, Fuzhou, 350002, China
Chen Zhu, Shuting Zhang, Yuling Lin & Zhongxiong Lai
ZXL, YQG, and CZ designed the work; YQG, CZ, and STZ performed the experiments and wrote the paper; YQG, CZ, SSZ, HFF, XZL, and YLL analyzed the data; CZ, STZ, WJW, CZZ, and LC helped to perform the sequence analysis and revised the paper carefully. All authors have read and approved the manuscript.
The tea cultivars ‘Tieguanyin’ and ‘Benshan’ used in this study were planted and grown in Xiping town, Anxi county, Fujian Province, China. No specific permits were required for plant collection. The study did not require ethical approval or consent as no endangered or protected plant species were involved.
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Table S4. Differentially expressed genes (DEGs) related to limonene metabolic pathways in TGY (Wei), TGY (Wang), and BS. (DOC 16 kb)
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Guo, Y., Zhu, C., Zhao, S. et al. De novo transcriptome and phytochemical analyses reveal differentially expressed genes and characteristic secondary metabolites in the original oolong tea (Camellia sinensis) cultivar ‘Tieguanyin’ compared with cultivar ‘Benshan’.
BMC Genomics20, 265 (2019). https://doi.org/10.1186/s12864-019-5643-z