Phytochemical and comparative transcriptome analysis reveals different regulatory mechanisms in terpenoids pathways between Matricaria recutita L. and Chamaemelum nobile L

Background Matricaria recutita (German chamomile) and Chamaemelum nobile (Roman chamomile) belong to the botanical family Asteraceae. These two herbs are not only morphologically distinguishable, but their secondary metabolites – especially the essential oils in flowers are also different, especially the terpenoids. The aim of this project was to preliminarily reveal differential regulatory mechanisms in terpenoids biosynthetic pathway between German and Roman chamomile by performing comparative transcriptomic and metabolomics analyses. We determined the content of essential oil in disk florets and ray florets in these two chamomiles, and found that the terpenoids content of German chamomile is greater than that of Roman chamomile. In addition, comparisons between German and Roman chamomile were studied by RNA-Seq, which showed that 54% of genes shared >75% sequence identity between the two species. In particular, more highly expressed DEGs (differentially expressed genes) and TFs (transcription factors), different regulation of CYPs (cytochrome P450 enzymes) and rapid evolution of the downstream enzymes in terpenoids biosynthetic pathways in German chamomile, and maybe the reasons result to differences in the types and levels of terpenoids compounds in these two chamomiles. In addition, the phylogenetic tree of single copay genes showed that German chamomile and Roman chamomile had a high identity to C. nankingense. 1-deoxy-D-xylulose-5-phosphate synthase; DXR: 1-deoxy-D-xylulose-5-phosphate reductoisomerase; HDR: 4-hydroxy-3-methylbut-2-en-1-yl diphosphate reductase; TPS: terpene synthase; GPPS: geranyl diphosphate synthase; GGPPS: geranylgeranyl diphosphate synthase; SS: squalene synthase; PSY: phytoene synthase; FPPS: farnesyl diphosphate synthase; IPT: isopentenyl transferases; HDS: 4-hydroxy-3-methylbut-2-en-l-yl diphosphate synthase; MCS: 2-C-methyl-D-erythtitol 2,4-cyclodiphosphate synthase; CMK: 4-diphosphocytidyl-2-C-methyl-D-erythritol kinase PMK: phosphomevalonate kinase; MPD: diphosphomevalonate decarboxylase; HMGR: 3-hydroxy-3-methylglutaryl-coenzyme A reductase; GC-MS: Gas Chromatography-Mass Spectrometer; FDR: Rate; PPIN: interaction network


qRT-PCR verification of selected genes
In order to determine the accuracy of the transcriptome sequencing, qRT-PCR analysis was used to assess the quality of the transcriptomic data. Total RNA was extracted from flowers, and first-strand cDNAs used for qRT-PCR analysis were synthesized from total RNA using a Prime-Script TM 1st Strand cDNA Synthesis Kit (TaKaRa, Dalian, China). Expression of selected terpenoid pathway genes were monitored by qRT-PCR using the SYBR Green qPCR mastermix (Takara, SYBR Premix Ex TaqII™), on a Bio-Rad CFX 96™ real-time PCR system (Bio-Rad), according to the manufacturer's instructions.
Detailed information about the selected unigenes, including their unigene IDs, and all PCR primers sequences are given in Additional files 1. We used the 18S ribosomal RNA gene as an internal reference for gene expression, and the relative expression levels were calculated using the 2 ΔCt method [66]. All qRT-PCR analyses were performed using three biological replicates and three technical replicates (means±standard deviation).

Determination of essential oil in German and Roman chamomile flowers
GC-MS analysis indicated that the essential monoterpenoid and sesquiterpenoid constituents in the flowers of German chamomile and Roman chamomile are significantly different ( Figure 1 and Figure  2, Additional file 1). The relative quantities of monoterpenoids and sesquiterpenoids in disk flowers were always higher than in ray flowers in both chamomiles. The main compounds from German chamomile essential oil were sesquiterpenoids, such as α-Bisabolol oxide A, Chamazulene, α-Bisabolol oxide B, α-Bisabolol and Espatulenol. Meanwhile, the major constituents from Roman chamomile essential oil were n-Hexadecanoic acid, linoleic acid and other esters. The contents of α-Bisabolol oxide A, α-Bisabolol oxide B and Chamazulene in German chamomile were 50, 30, 10 times more than that in Roman chamomile. In addition, the content of these compounds in disk flowers were 2-10-fold greater than that in ray flowers in two chamomiles. These results are consistent with the previous findings reported by Yao et al.[27].

De novo transcriptome assembly and comparative RNA-seq analyses
After removing the terminal adaptor sequences, duplicated and ambiguous sequences, and lowquality reads, we generated approximately 106.72 Gb of Illumina RNA-seq data from mRNA extracted from ray flowers and disk flowers of German and Roman chamomile. There were 53.31Gb and 53.41Gb of clean read data used in the assemblies for German and Roman chamomile, respectively.
The Q20 and Q30 scores were greater than 98% and 95% for the German and Roman chamomile transcriptome data, respectively. The final German chamomile cDNA assembly consisted of 117,203 unigenes; the average length was 1,056 bp and the N50 length was 1,686 bp. The final assembly for Roman chamomile had 147,616 unigenes with an average length of 914 bp and an N50 length of 1,506 bp (Table 1). There were 50,881 (43.41%) and 48,957 (33.17%) unigenes >1,000 bp in length in German and Roman chamomile, respectively.

Functional annotation and classification
All unigenes from both German chamomile and Roman chamomile were annotated using several  (Table 2).
Overall, 29,222 (24.93%) unigenes in the German chamomile transcriptome were assigned to 25 COG categories ( Supplementary Figure 1 A). Among these groups, unigenes belonging to "general function prediction" occupied the largest part (8,163), followed by transcription (4,382). In addition, 35,889 of the total 117,203 Roman chamomile unigenes were classified into 25 COG categories. The assignments (10,007) were mostly enriched in the ''general function prediction'', followed by the

PPIN construction and analysis between German chamomile and Roman chamomile
A total of 477 and 505 interaction pairs involved in terpenoid biosynthesis pathway were identified from German chamomile and Roman chamomile, respectively. We selected the interaction proteins for further analysis, and the PPIN related to terpenoid biosynthesis pathway were shown in figure 5.
We found amount of CYPs interaction with protein involved in terpenoid biosynthesis pathway in PPIN of German chamomile and Roman chamomile. And there were more types and amount of CYPs interaction protein in German chamomile than that in Roman chamomile. In addition, we found transcription factors such as MYB, WD-40 repeat and Zinc finger protein in PPIN of German chamomile and Roman chamomile. And the types and amount were also different. Moreover, interactions between proteins related to terpenoid biosynthesis pathway between German chamomile and Roman chamomile. (Figure 5 and Additional file 4).

Roman chamomile
A high level of nucleotide sequence similarity was found in genes that are homologous between German and Roman chamomile, with 54% of the genes sharing >75% identity. We obtained 60,338 transcripts that were common to German Chamomile and Roman chamomile. We also detected 56,865 and 87,278 special transcripts in German and Roman chamomile, respectively (Figure 6 A). In addition, the special transcripts identified in the two species were used as queries to search the KEGG databases; we found that 2,960 (43%) and 3,861 (49%) of the unigenes are involved in secondary metabolite biosynthesis in German and Roman chamomile, respectively. Also, 1,000 unigenes (46%) were related to plant-pathogen interactions in German chamomile and 1,295 (50%) in Roman chamomile. A total of 27,886 putative ortholog pairs were identified between German and Roman chamomile. Of these ortholog pairs, 584 had a Ka/Ks value more than 1, suggesting that they were under positive selection. Among these positive selection genes, there were 1 unigenes (CMK: 4diphosphocytidyl-2-C-methyl-D-erythritol kinase) related to terpenoids backbone biosynthesis, 1 unigenes (germacrene D synthase) related to sesquiterpenoids biosynthesis, 2 unigenes ((E)-betaocimene synthase and (+)-neomenthol dehydrogenase) related to monoterpenoids biosynthesis and 1 unigenes (Artemisia annua cytochrome P450 mono-oxygenase) related to diterpenoids biosynthesis (Table 3). Since the Ka/Ks value is widely used to detect selective pressure (environmental factors) acting on protein-coding sequences, rapid evolution of the genes involved in terpenoids backbone biosynthesis, monoterpenoids biosynthesis, sesquiterpenoids biosynthesis and diterpenoids biosynthesis might be associated with adaptive selection between German and Roman chamomile.

Comparison analysis in German chamomile, Roman chamomile and other composite family plants
A total of 92,393 gene families were identified by orthofinder, among them 1,232 single copay genes were identified. There were 6 gene families specifically in German chamomile, and 8 gene families specifically in Roman chamomile. In gene families which specifically from German chamomile, we found one gene family related to disease resistance. A total of 5, 718 gene families were specifically both in German chamomile and Roman chamomile ( Figure 6B). The KEGG enrichment results showed that gene families specifically both in German chamomile and Roman chamomile were mainly enriched in 'Metabolism', and 472 gene families were enriched in 'Biosynthesis of secondary metabolites' (Additional file 5). The phylogenetic tree of single copay genes showed that German chamomile and Roman chamomile had a high identity to C. nankingense ( Figure 6C). In addition, we analyzed unigenes of German chamomile, Roman chamomile and other composite family plants based on the Pfam database, and the results showed that unigenes related to TFs such as WD-repeat and Zinc−finger were more than other composite family plants. Furthermore, there were more MATE and ABC transporter in German chamomile and Roman chamomile ( Figure 6D).

qRT-PCR analysis and correlation analysis
We

Conclusions
Previous studies have shown that essential oil compositions vary in German chamomile as compared to Roman chamomile [27,50]. In this study, we found there are more monoterpenoids and sesquiterpenoids in flowers of German chamomile compared to Roman chamomile, and these compounds were more prevalent in disk florets than in ray florets. Furthermore, we obtained a large number of unigenes by transcriptome sequencing in German and Roman chamomile. We identified many candidate unigenes related to terpenoid biosynthesis, and the majority were more highly expressed in MC-DF than in MC-RF, especially the key enzyme genes in this pathway. The DEGs, TFs and CYPs involved in the terpenoid biosynthesis pathway were more numerous in German chamomile than in Roman chamomile, meanwhile, even transcripts related to terpenoid biosynthesis pathway under accelerated evolution were identified, which could explain the differences of chemical compounds in metabolite biosynthesis between the two species. The phylogenetic tree of single copay genes showed that German chamomile and Roman chamomile had a high identity to C.
nankingense. Our study is the first to report that a comparative transcriptome analysis between German chamomile and Roman chamomile. The transcriptomic data characterized in this research will be an invaluable resource for further studies of the functional genomics, molecular biology, and breeding in German chamomile and Roman chamomile.    Pathway enrichment analysis by tissue pair comparisons The ratio between the number of DEGs mapped to a pathway and the total number of genes mapped to that pathway are indicated by rich factors. A larger rich factor indicates greater intensiveness. The Q values were calculated using a hypergeometric test with the Bonferroni Correction. The Q value is a corrected p value that ranges from 0-1, and lower Q values mean greater intensiveness.
"Gene number" refers to the number of DEGs mapped to a given pathway.