Genomic analysis based on chromosome-level genome assembly reveals Myrtaceae evolution and terpene biosynthesis of rose myrtle
BMC Genomics volume 25, Article number: 578 (2024)
Abstract
Background
Rose myrtle (Rhodomyrtus tomentosa (Ait.) Hassk) is an evergreen shrub species belonging to the family Myrtaceae, which is enriched with bioactive volatiles (α-pinene and β-caryophyllene) with medicinal and industrial applications. However, the mechanism underlying volatile accumulation in rose myrtle is still unclear.
Results
Here, we present a chromosome-level genomic assembly of rose myrtle (genome size = 466 Mb, scaffold N50 = 43.7 Mb) with 35,554 protein-coding genes predicted. Through comparative genomic analysis, we found that gene expansion and duplication had a potential contribution to the accumulation of volatile substances. We proposed that the action of positive selection was significantly involved in volatile accumulation. We identified 43 TPS genes in R. tomentosa. Further transcriptomic and TPS gene family analyses demonstrated that the distinct gene subgroups of TPS may contribute greatly to the biosynthesis and accumulation of different volatiles in the Myrtle family of shrubs and trees. The results suggested that the diversity of TPS-a subgroups led to the accumulation of special sesquiterpenes in different plants of the Myrtaceae family.
Conclusions
The high-quality chromosome-level rose myrtle genome and the comparative analysis of the TPS gene family open new avenues for obtaining a higher commercial value of essential oils in medicinal plants.
Background
Volatile compounds play important roles in nature, such as altering plant-animal interactions and the local abiotic environment. The Myrtaceae species are regarded as essential oil producers because of their high concentration of cyclic mono- and sesquiterpenes [1,2,3]. Eucalyptus oil contains 1,8-cineole as the main component [4,5,6]. The leaves of Melaleuca alternifolia are dominated by terpinen-4-ol, terpinolene, and 1,8-cineole, and these volatiles are usually called tea tree oil [7, 8]. Rose myrtle (Rhodomyrtus tomentosa (Ait.) Hassk), belonging to the family Myrtaceae, is a paradigmatic example of a terpene-rich medicinal plant [9, 10]. Rose myrtle is increasingly used in a wide range of applications, including medicine, cosmetics, healthy food, and industrial purposes [11, 12]. At present, many volatile compounds have been detected in rose myrtle [13]; the leaves in particular are enriched with (+)-α-pinene and β-caryophyllene [14, 15]. These simple and polymeric terpenoids function as photoprotectants, antifeedants, or physical barriers, playing vital roles in plant growth, development, and environmental interaction [16,17,18]. Commercially used essential oils, including myrtle oil, lavender oil and tea tree oil, are mixtures of volatile terpenes [19], with growing importance in industrial applications [20, 21].
Volatile terpenes are the largest class of natural products, which essentially originate from the C5 substrates dimethylallyl diphosphate (DMAPP) and isopentenyl diphosphate (IPP) [22]. The DMAPP and IPP precursors are produced through the mevalonate (MVA) and methylerythritol phosphate (MEP) pathways [23]. In plants, the MEP pathway typically operates in plastids while the MVA pathway operates in the cytosol [23]. Hemi-, mono-, and diterpenes, as well as carotenoids (tetraterpenes), are produced via the MEP pathway [24]. Terpene synthase (TPS) catalyzes complex carbocation cascade reactions on the prenyl diphosphate substrate, resulting in cyclic or linear terpene backbones [25, 26]. However, the biology of volatile terpene biosynthesis and accumulation in rose myrtle is still unclear.
TPS gene family members are divided into seven subgroups (-a, -b, -c, -d, -e/f, -g, and -h) [27, 28]. TPS-a and TPS-b mainly synthesize sesquiterpenes and monoterpenes, respectively [27, 28]. TPS-g can synthesize monoterpenes, sesquiterpenes and diterpenes [27]. Terpenes and terpenoids play important roles in plant resistance to herbivores and response to environmental stimuli [29]. The genetic basis of terpene synthesis has received wide attention in the family Myrtaceae. The numbers of genes encoding putative terpene synthases in Eucalyptus grandis, M. alternifolia, and Leptospermum scoparium are 113, 37, and 49, respectively [7, 30, 31]. Further research into terpene biosynthesis is needed for industrial production of essential oils.
A gap-free rose myrtle T2T genome has been reported recently [32], and that genome assembly provides a foundation for investigating the anthocyanin accumulation mechanism of R. tomentosa. However, because R. tomentosa is an undomesticated medicinal resource, the genetics of its special medicinal components and its environmental adaptation strategies require a better understanding and possible improvement. Here, we assembled a chromosome-level genome for R. tomentosa using third-generation PacBio sequencing in combination with Illumina sequencing and the Hi-C technique. Our analyses suggest that gene amplification and natural selection shaped the genetic adaptation of R. tomentosa to harsh biotopes, and that genes involved in the terpenoid synthesis pathway have been under positive selection. Our study provides a basis for exploring the genetic potential of R. tomentosa that contributes to the accumulation of essential oils.
Materials and methods
Plant sampling
For whole-genome assembly, a mature adult R. tomentosa individual was selected from a natural population in the South China National Botanical Garden (23.1817 N, 113.3671 E, Chinese Academy of Sciences, Guangzhou, China). The voucher specimen was kept at the South China Botanical Garden Herbarium (IBSC 0925721). Fresh leaves were collected for whole-genome sequencing on the Illumina HiSeq X Ten and PacBio Sequel sequencing platforms. For RNA sequencing in support of gene annotation, young leaves, petal lower lips, young stems, green fruits, and roots were sampled from the same individual.
Genome sequencing, assembly and quality assessment
We extracted and purified the total DNA from fresh leaves. For Illumina short-read sequencing, PCR-free libraries with 300 bp, 500 bp, and 10-20 kb paired-end (PE) inserts were prepared and sequenced on the Illumina HiSeq X Ten platform. SMRT long-read sequencing was performed on a PacBio Sequel platform with the Sequel Sequencing Kit 2.1. For Hi-C sequencing, young and fresh leaf tissues were preserved in 1% (vol/vol) formaldehyde, DNA was cross-linked according to protocol, and a single library (150-bp PE) was sequenced on the Illumina HiSeq X platform. More detailed information on sequencing can be found in Table S1 (see online supplementary material).
Wtdbg2 (v1.3.1) [33] and FALCON (v0.4.1) [34] were used to correct errors in the PacBio long reads with the Illumina short reads and then to generate consensus sequences. These subreads were then assembled into contigs with Flye v0.2.1. We applied SSPACE v1.2.0 [35] to generate scaffolds using Illumina mate-paired reads. Preassembled scaffolds were clustered, ordered, and oriented onto pseudo-chromosomes with ALLHiC software (v0.8.11) [36]. The genome size was estimated from k-mer distribution analysis with GenomeScope (v2.0) [37] using Illumina short reads, without flow cytometry analysis. Hi-C libraries of fresh young leaves were constructed with the NEB Next Ultra II DNA library preparation kit and DpnII enzyme (Ipswich, MA, USA).
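The idea behind a k-mer based genome size estimate can be illustrated with a minimal sketch. GenomeScope fits a statistical model to the k-mer spectrum; the code below swaps in the much simpler "total k-mers divided by peak depth" approximation, and the histogram values, the `min_depth` error cutoff, and the function name are hypothetical illustrations rather than values or code from this study.

```python
def estimate_genome_size(kmer_histogram, min_depth=5):
    """Return (peak_depth, estimated_size) from a k-mer depth histogram.

    kmer_histogram maps depth -> number of distinct k-mers seen at that depth.
    Depths below min_depth are treated as sequencing errors and ignored
    (the cutoff is an assumption of this sketch).
    """
    usable = {d: n for d, n in kmer_histogram.items() if d >= min_depth}
    # The most common depth approximates the sequencing depth of unique sequence.
    peak_depth = max(usable, key=usable.get)
    # Total k-mer occurrences divided by that depth approximates genome size.
    total_kmers = sum(d * n for d, n in usable.items())
    return peak_depth, total_kmers / peak_depth


# Hypothetical toy histogram: low-depth error k-mers plus one main peak.
toy_histogram = {1: 1000, 2: 200, 18: 50, 19: 80, 20: 100, 21: 80, 22: 50}
peak, size = estimate_genome_size(toy_histogram)
print(peak, size)
```

This simplification ignores heterozygosity and repeat peaks, which is one reason a model-based tool is used in practice.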
Benchmarking Universal Single-Copy Orthologs (BUSCO) v5.6.1 was used to evaluate the accuracy and completeness of the assembled genome. Genome completeness was assessed using the plant dataset of the BUSCO database, with an e-value < 1e-5. Single-copy embryophyta_odb10 homologous genes in BUSCO were used to predict the gene status of the existing sequences in the genome.
Finally, we used Merqury (v1.3) [38] to estimate the consensus QV of the assembly. Augustus [39] was used for de novo gene prediction, while Trinity was used to generate EST evidence from RNA-seq data of five different tissues (root, leaf, flower, stem, and green fruit). The quality of the assembled genome was evaluated by mapping RNA-seq reads from these tissues using Bowtie2 [40].
Chromosome counting and karyotype analysis
Root tips were pretreated with 0.002% hydroxyquinoline at 4 ℃ for 3 h [41]. After a thorough wash, tips were fixed in 1:3 acetic ethanol and digested in HCl (1 M) solution for 45 min in a 37 ℃ water bath. The root tips were stained with Carbol-fuchsin solution for 72 h, then cells were crushed onto a glass plate and drawn under an oil immersion lens.
Repeat and noncoding RNA annotation
We performed repeat masking using EDTA (v1.9.4 with parameters: --sensitive 1 --anno 1 --evaluate 1) with cDNA assembled from RNA-seq reads by Trinity. Four types of non-coding RNA genes, including tRNAs, rRNAs, miRNAs, and snRNAs, were predicted in the R. tomentosa genome. The tRNA genes were predicted using tRNAscan-SE with eukaryote parameters. INFERNAL with default parameters was used to annotate miRNA, snRNA, and rRNA.
Structural and functional annotation of genes
A combined strategy of homology-based search, de novo gene prediction, and RNA sequencing-aided annotation was used to annotate gene structure for the R. tomentosa genome. For homolog prediction, we used protein sequences from 13 species, including four closely related species from Myrtaceae (E. grandis, L. scoparium, Psidium guajava, Syzygium oleosum), other Myrtales species (Punica granatum, Sonneratia alba, Rhizophora apiculata, Sonneratia caseolaris), some representative species (Arabidopsis thaliana, Solanum lycopersicum, Vitis vinifera, Vaccinium corymbosum) and a monocot species (Oryza sativa). The protein sequences were aligned to the genome using tBlastn with an e-value cut-off of 1e-5. De novo gene structure identification was based on Augustus [42], SNAP [43], and Fgenesh. RNA-seq reads from different tissues were aligned to the genome using Bowtie2 (v3.2.7). Finally, putative protein-coding genes in the R. tomentosa genome were integrated using the Maker package (v 3.01.03).
Functional annotation of the protein-coding genes was conducted by performing BlastP (e-value cut-off 1e-05) searches against entries in the NCBI nr and SwissProt databases. Searches for gene motifs and domains were performed using InterProScan. The GO terms for genes were obtained from the corresponding InterPro or Pfam entry. Pathway reconstruction was performed using KOBAS (v2.0) and the KEGG database.
Phylogenetic analysis and estimation of divergence time
OrthoFinder was used to identify orthologous genes from R. tomentosa and 13 other species including A. thaliana, O. sativa, V. vinifera, E. grandis, L. scoparium, P. granatum, P. guajava, R. apiculata, S. alba, S. caseolaris, S. lycopersicum, S. oleosum, and V. corymbosum. Single-copy orthologous genes were retrieved from these 14 species and aligned using MUSCLE [44] with default parameters, and low-quality alignment regions were removed using Gblocks (v 0.91b) with default parameters. All alignments were combined to produce a super-alignment matrix, which was used to construct a maximum likelihood (ML) phylogenetic tree using RAxML (v8.2.12) with parameters: -f a -x 12345 -p 12345 -# 100 -m PROTGAMMALGX -s ex.fa.gb -n ex -T 30. Divergence times between species were calculated using r8s with the default parameters.
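Combining per-gene alignments into a super-alignment matrix amounts to concatenating, for each species, its aligned sequence from every gene in a fixed order. The following is a minimal sketch of that step, not the script used in this study; the function name, species labels, and toy sequences are hypothetical.

```python
def concatenate_alignments(alignments):
    """Join per-gene alignments (species -> aligned sequence) into one matrix.

    Every alignment must contain the same species, and all sequences
    within one alignment must have the same aligned length.
    """
    species = sorted(alignments[0])
    parts = {name: [] for name in species}
    for gene in alignments:
        lengths = {len(seq) for seq in gene.values()}
        if set(gene) != set(species) or len(lengths) != 1:
            raise ValueError("each alignment needs every species at equal length")
        for name in species:
            parts[name].append(gene[name])
    return {name: "".join(chunks) for name, chunks in parts.items()}


# Hypothetical toy input: two trimmed single-copy gene alignments.
toy_alignments = [
    {"Rtom": "MK-L", "Pgua": "MKAL", "Egra": "MRAL"},
    {"Rtom": "GG", "Pgua": "GA", "Egra": "G-"},
]
matrix = concatenate_alignments(toy_alignments)
for name, seq in matrix.items():
    print(name, seq)
```

Checking that every gene covers every species before joining keeps the columns of the final matrix aligned across taxa.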
Gene family expansion and contraction analysis
Gene family expansion and contraction analysis was conducted with CAFE (v4.2.1) [45] using the default settings. Gene families were identified by OrthoFinder. A gene family was considered expanded or contracted when the difference in gene copy number was significant at P-value < 0.01.
Comparative genome analyses
To assess the degree of collinearity, we identified syntenic blocks among R. tomentosa, P. guajava, and E. grandis using MCScanX [46]. A syntenic region was highlighted if it contained at least 30 shared genes.
Identification of TPS gene family
For the identification of TPSs, representative members of the subfamilies of M. alternifolia, P. guajava, and R. tomentosa were used as queries to perform Blastp searches against the protein database of each species with an E-value cut-off of 1e-5. Candidate sequences identified as orthologs were then aligned using Mafft to remove those that did not contain the intact domain. For phylogenetic analysis, sequences were combined to produce a super-alignment matrix, which was used to construct a maximum likelihood (ML) phylogenetic tree in RAxML (v8.2.12) with parameters: -f a -x 12345 -p 12345 -# 1000 -m PROTGAMMALGX -s ex.phy -n ex -T 30. The resulting phylogenetic tree was displayed and annotated using iTOL software. Conserved motifs were identified by MEME tools, and conserved domains were identified by NCBI Batch CD-search and visualized in TBtools-II [47].
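The E-value filtering of Blastp hits can be sketched as follows. This is an illustration only: it assumes the hits are available in the standard 12-column tabular layout (BLAST+ `-outfmt 6`, where column 2 is the subject ID and column 11 the E-value), which the text does not specify, and the query names, gene IDs, and function name are hypothetical.

```python
def filter_hits(tabular_lines, max_evalue=1e-5):
    """Return sorted subject IDs with at least one hit at or below max_evalue."""
    hits = set()
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        subject_id, evalue = fields[1], float(fields[10])
        if evalue <= max_evalue:
            hits.add(subject_id)
    return sorted(hits)


# Hypothetical toy hits: query, subject, identity, ..., E-value, bit score.
toy_lines = [
    "queryTPS1\tRt_g001\t62.5\t540\t190\t6\t12\t548\t20\t555\t3e-120\t410",
    "queryTPS1\tRt_g002\t31.0\t120\t80\t3\t40\t158\t33\t150\t0.002\t38.5",
    "queryTPS2\tRt_g001\t58.1\t530\t210\t5\t10\t536\t22\t549\t1e-98\t350",
    "queryTPS2\tRt_g003\t45.2\t500\t260\t8\t15\t510\t18\t512\t4e-30\t120",
]
candidates = filter_hits(toy_lines)
print(candidates)
```

Collecting subject IDs in a set removes duplicates when several queries hit the same candidate gene; the candidates would still need the domain check described above.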
RNA extraction, library construction, and sequencing
Total RNA was extracted using a Trizol reagent kit (Invitrogen, Carlsbad, CA, USA) according to the manufacturer’s protocol. RNA quality was assessed on an Agilent 2100 Bioanalyzer (Agilent Technologies, Palo Alto, CA, USA) and checked using RNase-free agarose gel electrophoresis. The cDNA fragments were purified and ligated to Illumina sequencing adapters. The ligation products were size selected by agarose gel electrophoresis, PCR amplified, and sequenced using Illumina HiSeq™ 4000 with PE 150 bp. The unigene expression was calculated and normalized to TPM (transcripts per million).
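TPM normalization divides each gene's read count by its length in kilobases and then rescales so that the values of one sample sum to one million. A minimal sketch is given below; the gene names, counts, and lengths are hypothetical, and the quantification software used in this study may differ in details such as the use of effective lengths.

```python
def tpm(counts, lengths_bp):
    """Convert raw read counts to transcripts per million (TPM)."""
    # Reads per kilobase corrects for gene length.
    rpk = {gene: counts[gene] / (lengths_bp[gene] / 1000) for gene in counts}
    total_rpk = sum(rpk.values())
    # Rescale so that all values in the sample sum to one million.
    return {gene: value / total_rpk * 1_000_000 for gene, value in rpk.items()}


# Hypothetical toy sample with three genes.
toy_counts = {"geneA": 100, "geneB": 300, "geneC": 600}
toy_lengths = {"geneA": 1000, "geneB": 1500, "geneC": 3000}
toy_tpm = tpm(toy_counts, toy_lengths)
for gene, value in toy_tpm.items():
    print(gene, round(value, 1))
```

In this toy sample geneB and geneC receive the same TPM even though their raw counts differ, because the longer gene is expected to collect proportionally more reads.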
Positive selection analysis
For positive selection analysis, we first identified single-copy orthologous genes from R. tomentosa and closely related species with assembled genomes: L. scoparium (Myrtaceae), E. grandis (Myrtaceae), P. guajava (Myrtaceae), S. oleosum (Myrtaceae), and P. granatum (pomegranate, Lythraceae). For these genes, based on the phylogenetic topology, we employed the branch-site model incorporated in the PAML package v4.9 [48] to detect positively selected genes (PSGs). When one of the five species of Myrtaceae was specified as the foreground branch, the other four and the pomegranate branches in the phylogenetic tree were used as background branches. We conducted likelihood ratio tests to determine whether positive selection was operating on the foreground branch. In this study, PSGs were identified only when P < 0.001.
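The likelihood ratio test compares the log-likelihood of the null model with that of the alternative model that allows positive selection on the foreground branch. The sketch below assumes the test statistic is compared against a chi-square distribution with one degree of freedom, a common convention for the branch-site test that is not stated in the text; the log-likelihood values and the function name are hypothetical.

```python
import math


def lrt_p_value(lnl_null, lnl_alt):
    """Return (statistic, p_value) for a likelihood ratio test with df = 1."""
    # Twice the log-likelihood difference; clipped at 0 for numerical noise.
    statistic = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # Chi-square survival function for df = 1: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical log-likelihoods for one gene under the two models.
stat, p = lrt_p_value(lnl_null=-10234.5, lnl_alt=-10227.0)
is_psg = p < 0.001  # threshold used in this study
print(stat, p, is_psg)
```

The closed form with `math.erfc` holds only for one degree of freedom; other degrees of freedom would need a general chi-square survival function.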
Results
De-novo genome assembly and pseudo-chromosome construction
We sequenced the genome of R. tomentosa, which commonly grows in South China, including Guangzhou (Fig. 1A). The estimated genome size was 459 Mb based on the 31-mer depth distribution analysis of the sequenced short reads (Fig. S1). We obtained around 130 Gb of high-quality data, including ~44.5 Gb of short-read sequences from three mate-pair libraries and ~85.8 Gb of PacBio sequences, which represent ~300X coverage of the genome (Table S1). As shown in Fig. S2, the karyotype consists of 2n = 2x = 22 chromosomes. We assembled a genome of 466 Mb (Table 1, S2, and S3), consisting of 1,143 contigs with an N50 of 1.01 Mb (Table S2). Using Hi-C technology, 99.56% (463.9 Mb) of the contigs could be ordered and anchored onto 11 pseudo-chromosomes (Fig. 1B, Table 1 and Fig. S3), and the final assembly consisted of 28 scaffolds (11 pseudo-chromosomes and 17 scaffolds) with an N50 of 43.7 Mb (Table 1 and Table S2). The GC content of the assembled genome is 40.59% (Table 1), which is similar to those of P. guajava and E. grandis, two species closely related to R. tomentosa.
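For readers unfamiliar with the N50 statistic reported above, it is the length of the sequence at which the cumulative length, counted from the longest sequence downwards, first reaches half of the total assembly length. A minimal sketch with hypothetical toy lengths (not the lengths of this assembly):

```python
def n50(lengths):
    """Return the N50 of a list of contig or scaffold lengths."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first sequence that brings the sum to half the total.
        if running * 2 >= total:
            return length
    raise ValueError("no sequences given")


# Hypothetical toy scaffold lengths (arbitrary units).
toy_scaffolds = [50, 45, 40, 30, 20, 10, 5]
toy_n50 = n50(toy_scaffolds)
print(toy_n50)
```

Here the total is 200, and the running sum passes 100 at the third-longest scaffold, so the N50 is 40.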
BUSCO analysis detected 1,546 (95.8%) complete and 24 (1.5%) fragmented gene models out of 1,614 BUSCO genes (Table S3). Merqury analysis showed that the completeness of the genome assembly was 86.2%, with QV = 36.7 and an error rate of only 0.021%, indicating that a genome with high completeness and accuracy was constructed (Fig. S4). Moreover, 76.11%-95.50% of RNA-seq reads generated from different tissues were successfully mapped to the assembled genome by hisat2 (Table S4). We also obtained the RNA-seq datasets of leaf samples reported by He et al. [14], and 75.33%-92.42% of these RNA-seq reads were mapped to the assembled genome. Taken together, these observations suggest the high quality and completeness of the chromosome-level reference genome assembly of R. tomentosa.
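The reported quality figures are mutually consistent, which can be checked with a short calculation. The sketch assumes that the consensus QV follows the usual Phred definition, QV = -10 log10(error rate); the function name is illustrative.

```python
def qv_to_error_rate(qv):
    """Convert a Phred-scaled consensus quality value to a per-base error rate."""
    return 10 ** (-qv / 10)


# Values taken from the text above.
error_percent = qv_to_error_rate(36.7) * 100
busco_complete = 1546 / 1614 * 100
busco_fragmented = 24 / 1614 * 100

print(round(error_percent, 3))     # error rate in percent
print(round(busco_complete, 1))    # complete BUSCOs in percent
print(round(busco_fragmented, 1))  # fragmented BUSCOs in percent
```

A QV of 36.7 corresponds to roughly two errors per 10,000 bases, matching the stated error rate of 0.021%.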
Repetitive elements and protein-coding gene annotation
Repeat sequence annotation showed that the R. tomentosa genome contained 35.21% repetitive sequences (Table 1 and Table S5). Among these sequences, long terminal repeats (LTRs) were the most abundant interspersed repeats, occupying 32.06% of the genome, including 18% Gypsy LTRs and 5.33% Copia LTRs (Table S5). TIR repeats and helitron repeats accounted for 2.12% and 0.45%, respectively (Table S5). We confidently annotated 35,554 protein-coding genes, of which 95.7% had a homolog in a suite of functional databases (Table 1 and Table S6). In addition, 2,892 noncoding RNAs, comprising 143 conserved microRNAs, 601 transfer RNAs, 1,754 ribosomal RNAs, and 394 small nuclear RNAs, were identified in the R. tomentosa genome (Table S7). These results indicated that a slightly higher number of genes was annotated in R. tomentosa than in other species (Table S8). A comparison of gene models between R. tomentosa and other species revealed that exon and intron lengths in R. tomentosa were relatively conserved, whereas introns were slightly shorter in A. thaliana (Fig. S5). However, the average gene length was slightly shorter in R. tomentosa than in other species (Fig. S5).
Gene family analysis
To identify evolutionary characteristics and gene families, the R. tomentosa genome was compared with 13 published genomes, including four closely related species from Myrtaceae (E. grandis, L. scoparium, P. guajava, S. oleosum), other Myrtales species (P. granatum, S. alba, R. apiculata, S. caseolaris), some representative species (A. thaliana, S. lycopersicum, V. vinifera, V. corymbosum), and the monocot rice (O. sativa) (Table S8). Based on gene family clustering analysis, 31,645 gene families were identified in total, of which 2,913 were shared by all 14 species, and 14 of these shared families were single-copy gene families (Table S9).
Gene family numbers were compared between R. tomentosa and other species. As shown in Fig. 1C, 10,506 gene families were shared between species, and 952 gene families were specific to R. tomentosa. Compared with P. guajava, there were more species-specific genes in R. tomentosa (Fig. 1D). Phylogenetic analysis of a concatenated sequence alignment of R. tomentosa and 13 other plant species indicated that R. tomentosa, as expected, clustered with Myrtaceae species (Fig. 1E). The divergence time between R. tomentosa and the most closely related species, P. guajava, was estimated to be ~24.4 million years ago (Fig. 1E). Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis revealed that the species-specific genes were especially enriched in terpenoid backbone biosynthesis and pyruvate metabolism (Table S10 and S11, Fig. 2A). Compared with the most recent common ancestor of the 14 plants, gene family sizes differed significantly among Myrtaceae species. Gene families in rose myrtle and eucalyptus tended to expand, whereas those in guava mainly experienced contraction. Functional analysis showed that the significantly expanded genes were over-represented in terms related to pyruvate metabolism, phenylpropanoid biosynthesis and flavonoid biosynthesis (Table S12 and S13, Fig. 2B). However, the contracted gene families showed little specificity, with only marginal enrichment in phenylpropanoid biosynthesis (Table S14 and S15). These results suggested that gene expansion correlated with terpenoid biosynthesis in R. tomentosa.
Gene duplication affected terpenoids synthesis in R. tomentosa
The distribution of synonymous substitutions per synonymous site (Ks) across all paralogous genes (regardless of gene order) showed a peak at Ks = 0.9, and similar peaks were found for P. guajava (Ks = 1.08) and E. grandis (Ks = 1.07) (Fig. S6). As shown in Fig. S6, the whole-genome duplication (WGD) event of R. tomentosa occurred later than that of P. granatum, which is consistent with a previous report on R. tomentosa [32]. These results provided additional evidence of one WGD event in Myrtaceae after the well-known paleo-hexaploidization event, γ, in the most recent common ancestor (MRCA) of all eudicots. We then analyzed the different origins of gene duplicates. All types of duplication were found; dispersed duplicates accounted for the largest proportion (32.4%), followed by proximal (7.8%), tandem (5.4%) and WGD/segmental (0.7%) duplicates (Table S16). KEGG enrichment analysis of the duplicated genes showed enrichment in monoterpenoid biosynthesis, sesquiterpenoid and triterpenoid biosynthesis, pyruvate metabolism, flavonoid biosynthesis, and phenylpropanoid biosynthesis (Fig. 3A). These results provided clues about the potential contribution of gene expansion and duplication to the accumulation of volatile substances.
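Locating a peak in a Ks distribution can be illustrated by binning the Ks values of paralogous pairs and taking the most populated bin. The sketch below is a simplified stand-in for the density or mixture-model fitting typically used; the bin width, the toy Ks values, and the function name are hypothetical and are not data from this study.

```python
import math
from collections import Counter


def ks_peak(ks_values, bin_width=0.1):
    """Return the midpoint of the most populated Ks bin."""
    # Assign each Ks value to the bin [i * width, (i + 1) * width).
    bins = Counter(math.floor(ks / bin_width) for ks in ks_values)
    modal_bin, _count = bins.most_common(1)[0]
    return (modal_bin + 0.5) * bin_width


# Hypothetical Ks values for paralogous gene pairs.
toy_ks = [0.12, 0.15, 0.52, 0.86, 0.88, 0.91, 0.93, 0.94, 0.97, 1.31, 1.62, 2.45]
peak_ks = ks_peak(toy_ks)
print(round(peak_ks, 2))
```

The result depends on the chosen bin width, so in practice a smoothed density estimate gives a more stable peak position.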
Positively selected genes (PSGs) in R. tomentosa
To better understand the evolutionary footprint in the accumulation of volatile compounds, we further analyzed the positively selected genes in R. tomentosa. Positive selection analyses for R. tomentosa, P. guajava, E. grandis, and E. citriodora were conducted using the orthologs from P. granatum as the outgroup. We identified 872 genes possibly under positive selection in R. tomentosa among the 3,923 single-copy orthologous genes (P < 0.001; Table S17). A GO functional classification of PSGs indicated that the terms associated with DNA repair, protein ligase, membrane-bounded organelle, intracellular membrane-bounded organelle, and vesicle transport were significantly over-represented (Fig. S7). We found six PSGs involved in terpenoid synthesis (Table S17). Moreover, these PSGs were detectable at the transcriptional level in various stages of development, especially in leaf (Fig. 3B, Table 2). We also identified three positively selected genes related to the stomatal development pathway (Table S17). Taken together, these results indicated that R. tomentosa exhibited a remarkable pattern of adaptive evolution in response to environmental cues.
TPS family genes probably affect terpenoids synthesis
To infer the influence of the TPS family on terpenoid biosynthesis in R. tomentosa, molecular evolutionary analysis was conducted. In total, 43 and 32 TPS genes were identified in R. tomentosa and P. guajava, respectively (Table 3, Table S18); the R. tomentosa set included 7 previously reported RtTPS genes [14]. To gain further insights into the RtTPS gene members, we surveyed the evolutionary relationships (Fig. 4A), motifs (Fig. 4B), domains (Fig. 4C), gene structure and chromosomal location of each TPS gene copy (Fig. 4D, Fig. S8). The RtTPS genes were classified into six subgroups based on their conserved domain structures. Gene structure and conserved domain analysis revealed that all TPSs had conserved domains associated with terpene biosynthesis (Fig. 4), which suggests a conserved function among these RtTPSs. These results revealed both conservation and divergence among the RtTPS subfamilies.
To better understand the evolutionary relationships and classification of the RtTPS members, an ML phylogenetic tree was generated based on amino acid sequences of the TPS domains from M. alternifolia, P. guajava, and R. tomentosa (Fig. 5). Compared with P. guajava, we found a specifically expanded TPS-a subgroup in M. alternifolia and R. tomentosa, which may be related to the accumulation of sesquiterpenoid compounds, such as β-caryophyllene. Within the TPS-a subgroup, the a1 subgroup was enriched with RtTPSs, whereas MaltTPSs predominantly clustered in the a2 subgroup. Additionally, the TPS-a3 clade was completely missing in P. guajava. This suggested that the diversity of TPS-a subgroups led to the accumulation of special sesquiterpenes in different plants of the Myrtaceae family. Compared with R. tomentosa and M. alternifolia, P. guajava had more TPS genes in the TPS-g subgroup.
To probe the underlying mechanism of the terpene accumulation pattern, we drew a predicted terpene biosynthesis pathway showing the expression of structural genes in different tissues of R. tomentosa based on transcriptome data (Fig. 6). Different structural genes participating in the cytosolic MVA pathway and the plastid MEP pathway were identified in this study, and they exhibited distinct expression patterns. Tissue-specific expression analysis revealed that the RtTPS genes were differentially expressed in various rose myrtle tissues. RtTPS family genes were predominantly expressed in leaf, with low expression in root, especially in the TPS-a1 subgroup. Additionally, we found that two TPS genes, RtTPS03 and RtTPS39, belonging to the TPS-a and -b subgroups, were highly expressed in leaf. The results suggested that RtTPS family genes affected characteristic terpene accumulation through specifically expanded subgroups and functional differentiation.
Discussion
Diversity among genome datasets contributes to comparative genomics analysis
Rose myrtle belongs to the family Myrtaceae [49], which has attracted increased attention recently because of its industrial and economic applications. A gap-free rose myrtle T2T genome was reported while we were preparing this manuscript [32]. The genome size, GC content, genome structure, and gene number of the R. tomentosa genome presented here are quite similar to those of the reported gap-free genome. These results indicate that our assembly is of high quality and will provide a useful dataset for comparative genomics. Another genome was subsequently reported, but its size (442 Mb) is smaller than that of both the gap-free genome and our genome [50].
Tandem duplication and specific subfamily expansion of TPS in R. tomentosa
Manual annotation of the rose myrtle genome revealed that genes and pseudogenes from the same TPS subfamily with high sequence similarities were frequently located in close proximity on the same chromosome. This marked clustering of TPS genes into tandem arrays in rose myrtle paralleled the tandem clusters found in M. alternifolia [51]. Our study suggests that gene duplication through unequal crossing over with subsequent sub- or neo-functionalization, together with the expansion of specific subfamilies, were critical mechanisms underpinning the evolution of TPS genes in rose myrtle. Tandem duplication and specific subfamily expansion are considered contributors to the adaptive diversification of genes [52], such as TPS family genes, as they are more likely to be retained following gene duplication due to stress pressures [53].
The distinct gene subgroups of TPS affect specific terpene accumulation
All angiosperm TPS subfamilies are represented in R. tomentosa, but variations in the size of certain subfamilies relative to the other Myrtaceae were observed [2]. The largest distinctions were evident in subfamilies that produce secondary metabolites and thus are likely to be subject to adaptive pressures. For example, R. tomentosa has twice as many TPS-a (sesquiterpenoid) genes as P. guajava, which is similar to M. alternifolia [51]. This subgroup in rose myrtle is likely to have had the same significance historically as it had in M. alternifolia, where it contributed to the abundance of aromatic compounds. P. guajava has more TPS-g subgroup genes than M. alternifolia and R. tomentosa, and these results indicate that the distinctive gene subgroups of TPS led to the biosynthesis and accumulation of different aromas. In particular, we find that TPS-a1 subgroup genes were significantly expanded and thus are key potential targets for producing β-caryophyllene in R. tomentosa [14]. The aromatic compounds and essential oils present are a key indicator in determining the economic value of R. tomentosa [54]. Thus, an in-depth understanding of terpene metabolism will help improve the potential application of secondary metabolites.
Conclusion
We presented a high-quality chromosome-level reference genome for R. tomentosa. The genome characteristics, including genome size, GC content, genome structure, gene number, genome duplication and divergence time from close relatives, were quite consistent with those of a recently reported gap-free R. tomentosa genome. The detailed genomic information on R. tomentosa illustrated the evolutionary relationships of the TPS gene family associated with terpene accumulation, especially the TPS-a subfamily, which plays an important role in synthesizing the special terpenes. Our study provides a further opportunity to research the potential application of secondary metabolites among Myrtaceae in the future.
Availability of data and materials
The R. tomentosa genome assembly and transcriptome raw reads were submitted to BIG Sub (https://ngdc.cncb.ac.cn/search/specific?db=bioproject&q=PRJCA013967).
Abbreviations
- DMAPP: Dimethylallyl diphosphate
- IPP: Isopentenyl diphosphate
- MVA: Mevalonate
- MEP: Methylerythritol phosphate
- TPS: Terpene synthase
- BUSCO: Benchmarking Universal Single-Copy Orthologs
- TPM: Transcripts per million
- PSGs: Positively selected genes
- KEGG: Kyoto Encyclopedia of Genes and Genomes
- WGD: Whole-genome duplication
- MRCA: Most recent common ancestor
References
Biffin E, Lucas EJ, Craven LA, Ribeiro da Costa I, Harrington MG, Crisp MD. Evolution of exceptional species richness among lineages of fleshy-fruited Myrtaceae. Ann Bot. 2010;106(1):79–93.
Padovan A, Keszei A, Külheim C, Foley WJ. The evolution of foliar terpene diversity in Myrtaceae. Phytochem Rev. 2014;13:695–716.
Barbosa dMÂA, Celeste dJPF, Oliveira FO, Pompeu VEL, Diniz dNL, Moraes CM, Pereira dSDR, Sandro P, Santana dOM, Helena dAAE. Myrcia paivae O.Berg (Myrtaceae) essential oil, first study of the chemical composition and antioxidant potential. Molecules. 2022;27(17):5460–5469.
Merghni A, Noumi E, Hadded O, Dridi N, Panwar H, Ceylan O, Mastouri M, Snoussi M. Assessment of the antibiofilm and antiquorum sensing activities of Eucalyptus globulus essential oil and its main component 1,8-cineole against methicillin-resistant Staphylococcus aureus strains. Microb Pathogenesis. 2018;118:74–80.
Galan DM, Ezeudu NE, García JV, Geronimo CA, Berry N, Malcolm BJ. Eucalyptol (1,8-cineole): an underutilized ally in respiratory disorders? J Essent Oil Res. 2020;32:103–10.
Kainer D, Padovan A, Degenhardt J, Krause ST, Mondal P, Foley WJ, Külheim C. High marker density GWAS provides novel insights into the genomic architecture of terpene oil yield in Eucalyptus. New Phytol. 2019;223(3):1489–504.
Calvert J, Baten A, Butler JB, Barkla BJ, Shepherd M. Terpene synthase genes in Melaleuca alternifolia: comparative analysis of lineage-specific subfamily variation within Myrtaceae. Plant Syst Evol. 2021;304:111–21.
Corona-Gómez L, Hernández-Andrade L, Mendoza-Elvira SE, Suazo FM, Ricardo-González DI, Quintanar-Guerrero D. In vitro antimicrobial effect of essential tea tree oil (Melaleuca alternifolia), thymol, and carvacrol on microorganisms isolated from cases of bovine clinical mastitis. Int J Vet Sci Med. 2022;10:72–9.
Hamid HA, Mutazah SR, Yusoff M. Rhodomyrtus tomentosa: A phytochemical and pharmacological review. Asian J Pharm Clin Res. 2017;10(1):10–6.
Vo T-S, Ngo D-H. The health beneficial properties of Rhodomyrtus tomentosa as potential functional food. Biomolecules. 2019;9:76.
Srisuwan S, Mackin KE, Hocking DM, Lyras D, Bennett-Wood V, Voravuthikunchai SP, Robins-Browne RM. Antibacterial activity of rhodomyrtone on Clostridium difficile vegetative cells and spores in vitro. Int J Antimicrob Agents. 2018;52(5):724–9.
Zhao ZF, Wu L, Xie J, Feng Y, Tian J, He X, Li B, Wang L, Wang X, Zhang Y, et al. Rhodomyrtus tomentosa (Aiton.): A review of phytochemistry, pharmacology and industrial applications research progress. Food Chem. 2020;309:125715–125724.
Pham TN, Le XT, Pham VT, Le HT. Effects of process parameters in microwave-assisted extraction on the anthocyanin-enriched extract from Rhodomyrtus tomentosa (Ait.) Hassk and its storage conditions on the kinetic degradation of anthocyanins in the extract. Heliyon. 2022;8:e09518-e09525.
He SM, Wang X, Yang SC, Dong Y, Zhao QM, Li YJ, Cong K. De novo transcriptome characterization of Rhodomyrtus tomentosa leaves and identification of genes involved in α/β-pinene and β-caryophyllene biosynthesis. Front Plant Sci. 2018;9:1231–42.
Hiranrat A, Mahabusarakam W. New acylphloroglucinols from the leaves of Rhodomyrtus tomentosa. Tetrahedron. 2008;64(49):11193–7.
Kuntorini EM, Nugroho LH, Maryani, Nuringtyas TR. Anatomical structure, flavonoid content, and antioxidant activity of Rhodomyrtus tomentosa leaves and fruits on different age and maturity level. Biodiversitas. 2019;20:12.
Uddin ABMN, Hossain F, Reza ASMA, Nasrin MS, Alam AHMK. Traditional uses, Pharmacological activities, and phytochemical constituents of the genus Syzygium: a review. Food Sci Nutr. 2022;10(6):1789–819.
Hu X, Chen Y, Dai J, Yao L, Wang L. Rhodomyrtus tomentosa fruits in two ripening stages: Chemical compositions, antioxidant capacity and digestive enzymes inhibitory activity. Antioxidants. 2022;11:1390.
Bakkali F, Averbeck S, Averbeck D, Idaomar M. Biological effects of essential oils–a review. Food Chem Toxicol. 2008;46:446–75.
Singh B, Sharma RA. Plant terpenes: defense responses, phylogenetic analysis, regulation and clinical applications. 3 Biotech. 2014;5:129–151.
Vattekkatte A, Garms S, Brandt W, Boland W. Enhanced structural diversity in terpenoid biosynthesis: enzymes, substrates and cofactors. Org Biomol Chem. 2018;16:348–62.
Oldfield E, Lin FY. Terpene biosynthesis: modularity rules. Angew Chem Int Ed Engl. 2012;51(5):1124–37.
Zeng L, Dehesh K. The eukaryotic MEP-pathway genes are evolutionarily conserved and originated from Chlaymidia and cyanobacteria. BMC Genomics. 2021;22(1):137.
Vranová E, Coman D, Gruissem W. Network analysis of the MVA and MEP pathways for isoprenoid synthesis. Annu Rev Plant Biol. 2013;64:665–700.
Degenhardt J, Köllner TG, Gershenzon J. Monoterpene and sesquiterpene synthases and the origin of terpene skeletal diversity in plants. Phytochem. 2009;70:1621–37.
Sun P, Schuurink RC, Caissard J-C, Hugueney P, Baudino S. My way: noncanonical biosynthesis pathways for plant volatiles. Trends Plant Sci. 2016;10(21):884–94.
Lanier E, Andersen TB, Hamberger BR. Plant terpene specialized metabolism: Complex networks or simple linear pathways. Plant J. 2023;114(5):1178–201.
Karunanithi PS, Zerbe P. Terpene synthases as metabolic gatekeepers in the evolution of plant terpenoid chemical diversity. Front Plant Sci. 2019;10:1066.
Boncan DAT, Tsang SSK, Li C, Lee IHT, Lam HM, Chan TF, Hui JHL. Terpenes and terpenoids in plants: interactions with environment and insects. Int J Mol Sci. 2020;21(19):7382.
Thrimawithana AH, Jones D, Hilario E, Grierson E, Ngo HM, Liachko I, Sullivan S, Bilton TP, Jacobs JME, Bicknell R, et al. A whole genome assembly of Leptospermum scoparium (Myrtaceae) for mānuka research. N Z J Crop Hortic Sci. 2019;47:233–60.
Külheim C, Padovan A, Hefer CA, Krause ST, Köllner TG, Myburg AA, Degenhardt J, Foley WJ. The Eucalyptus terpene synthase gene family. BMC Genomics. 2015;16:450.
Li FP, Xu SQ, Xiao ZT, Wang JM, Mei Y, Hu HF, Li JY, Liu JY, Hou ZW, Zhao JL, et al. Gap-free genome assembly and comparative analysis reveal the evolution and anthocyanin accumulation mechanism of Rhodomyrtus tomentosa. Hortic Res. 2023;10:uhad005.
Ruan J, Li H. Fast and accurate long-read assembly with wtdbg2. Nat Methods. 2019;17:155–8.
Wagh S, Tople S, Benhamouda F, Kushilevitz E, Mittal P, Rabin T. Falcon: honest-majority maliciously secure framework for private deep learning. Proc Priv Enhanc Technol. 2021;2021:188–208.
Boetzer M, Henkel CV, Jansen HJ, Butler D, Pirovano W. Scaffolding pre-assembled contigs using SSPACE. Bioinform. 2011;27(4):578–9.
Wang YB, Zhang X. Chromosome scaffolding of diploid genomes using ALLHiC. Bio-protoc. 2022;101:e4503.
Vurture GW, Sedlazeck FJ, Nattestad M, Underwood CJ, Fang H, Gurtowski J, Schatz MC. GenomeScope: fast reference-free genome profiling from short reads. Bioinform. 2017;33:2202–4.
Rhie A, Walenz BP, Koren S, Phillippy AM. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol. 2020;21(1):245.
Haas BJ, Papanicolaou A, Yassour M, Grabherr MG, Blood PD, Bowden JC, Couger MB, Eccles DA, Li B, Lieber M, et al. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat Protoc. 2013;8:1494–512.
Langdon WB. Performance of genetic programming optimised Bowtie2 on genome comparison and analytic testing (GCAT) benchmarks. Biodata Min. 2015;8:1.
Vijayakumar NK, Subramanian D. Cytotaxonomical studies in South Indian Myrtaceae. Cytologia. 1985;50:513–20.
Hoff KJ, Stanke M. Predicting genes in single genomes with AUGUSTUS. Curr Protoc Bioinform. 2019;65:e57.
Leskovec J, Sosič R. SNAP: A general-purpose network analysis and graph-mining library. ACM Trans Intell Syst Technol. 2016;8(1):1–20.
Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004;32(5):1792–7.
Mendes FK, Vanderpool D, Fulton B, Hahn MW. CAFE 5 models variation in evolutionary rates among gene families. Bioinform. 2021;36(22–23):5516–8.
Wang YP, Tang HB, DeBarry JD, Tan X, Li JP, Wang XY, Lee T-H, Jin HZ, Marler BS, Guo H, et al. MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res. 2012;40:e49.
Chen C, Wu Y, Li J, Wang X, Zeng Z, Xu J, Liu Y, Feng J, Chen H, He Y, et al. TBtools-II: A “One for All, All for One” bioinformatics platform for biological big-data mining. Mol Plant. 2023;16:1733–42.
Yang ZH. PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol. 2007;24(8):1586–91.
Grattapaglia D, Vaillancourt RE, Shepherd M, Thumma B, Foley WJ, Külheim C, Potts BM, Myburg AA. Progress in Myrtaceae genetics and genomics: Eucalyptus as the pivotal genus. Tree Genet Genomes. 2012;8:463–508.
Detcharoen M, Bumrungsri S, Voravuthikunchai SP. Complete genome of rose myrtle, Rhodomyrtus tomentosa, and its population genetics in Thai Peninsula. Plants. 2023;12:1582.
Voelker J, Mauleon R, Shepherd M. The terpene synthase genes of Melaleuca alternifolia (tea tree) and comparative gene family analysis among Myrtaceae essential oil crops. Plant Syst Evol. 2023;309:13.
Rathnayaka S, Kaewwongwal A, Laosatit K, Yimram T, Lin Y, Chen X, Nakazono M, Somta P. Tandemly duplicated genes encoding polygalacturonase inhibitors are associated with bruchid (Callosobruchus chinensis) resistance in moth bean (Vigna aconitifolia). Plant Sci. 2022;323:111402.
Hanada K, Zou C, Lehti-Shiu MD, Shinozaki K, Shiu S-H. Importance of lineage-specific expansion of plant tandem duplicates in the adaptive response to environmental stimuli. Plant Physiol. 2008;148:993–1003.
Salni S, Marisa H. Antibacterial activity of essential oil from rose myrtle leaves (Rhodomyrtus tomentosa (Ait.) Hassk). Molekul. 2020;15:158–165.
Acknowledgements
Not applicable.
Funding
This work was supported by the Guangdong Forestry Science and Technology Innovation Project (No. 2020-KJCX011), National Natural Science Foundation of China (No. 32070340) and Guangzhou Innovation Leading Team Project (No. 202009020004).
Author information
Contributions
SLD conceived the project. SLD, LY, PJC and FQZ designed the study. LY, JJJ, QMQ, PL, CF, HGL, GHZ, and SWL performed the sampling, experiments, and data analysis. JJJ, YL, and SWL designed and prepared the figures. LY, JJJ and SLD wrote the manuscript. All authors read and approved the final manuscript.
Authors’ information
LY and JJJ contributed equally to this work and share first authorship.
Ethics declarations
Ethics approval and consent to participate
The plant material used in this study complies with relevant institutional, national, and international guidelines and legislation. The mature adult R. tomentosa individual was cultivated in the South China National Botanical Garden (Chinese Academy of Sciences, Guangzhou, China). The voucher specimen is kept at the South China Botanical Garden Herbarium (IBSC 0925721).
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
12864_2024_10509_MOESM1_ESM.pdf
Supplementary Material 1: Fig S1. K-mer frequency distribution curve of Illumina short reads for the R. tomentosa genome by GenomeScope. Fig S2. Chromosome karyotype analysis of R. tomentosa. 2n = 2x = 22. Bar = 10 μm. Fig S3. Hi-C contact data mapped to the R. tomentosa chromosomes. Fig S4. Merqury assembly spectrum plots for evaluating k-mer completeness of the R. tomentosa chromosome assembly. Fig S5. Comparison of gene models between R. tomentosa and other species. Fig S6. Distribution of synonymous substitution levels (Ks) of paralogous (A) and orthologous genes (B). Fig S7. GO analysis of positively selected genes in R. tomentosa. Fig S8. Chromosomal location of TPS genes in R. tomentosa.
12864_2024_10509_MOESM2_ESM.xlsx
Supplementary Material 2: Table S1. Summary statistics for the raw sequencing dataset. Table S2. Summary statistics for the final genome assembly of R. tomentosa. Table S3. Evaluation of the genome assembly of R. tomentosa using Benchmarking Universal Single-Copy Orthologs (BUSCO). Table S4. Statistics of the R. tomentosa RNA-Seq data from different tissues. Table S5. Statistics of the repeat annotation results. Table S6. Statistics of gene annotation. Table S7. Statistics of non-coding gene annotation. Table S8. List of plant genome sequences used in the comparative genomic analysis. Table S9. Gene families clustered by OrthoFinder in 14 species. Genes used for OrthoFinder were proteins without splice variants. Table S10. KEGG enrichment analysis of species-specific genes in R. tomentosa. Table S11. GO enrichment analysis of species-specific genes in R. tomentosa. Table S12. KEGG enrichment analysis of significantly expanded genes in R. tomentosa. Table S13. GO enrichment analysis of significantly expanded genes in R. tomentosa. Table S14. KEGG enrichment analysis of significantly contracted genes in R. tomentosa. Table S15. GO enrichment analysis of significantly contracted genes in R. tomentosa. Table S16. Classification of different origins of duplicate genes in R. tomentosa. Table S17. Positively selected genes in R. tomentosa. Table S18. The RtTPS genes in R. tomentosa.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Yang, L., Jin, J., Lyu, S. et al. Genomic analysis based on chromosome-level genome assembly reveals Myrtaceae evolution and terpene biosynthesis of rose myrtle. BMC Genomics 25, 578 (2024). https://doi.org/10.1186/s12864-024-10509-6
DOI: https://doi.org/10.1186/s12864-024-10509-6