Genome-wide characterization and metabolite profiling of Cyathus olla: insights into the biosynthesis of medicinal compounds

Cyathus olla, a member of the genus Cyathus in the order Agaricales, is renowned for its bird's nest-like fruiting bodies and has been used in folk medicine. However, its genome remains poorly understood. To investigate genomic diversity within the genus Cyathus and elucidate the biosynthetic pathways of its medicinal compounds, we generated a high-quality genome assembly of C. olla comprising fourteen chromosomes. Comparative genome analysis revealed variation in both genome features and specific functional genes within the genus Cyathus. Phylogenomic and gene family analyses provided insights into evolutionary divergence, as well as gene family expansion and contraction, in individual Cyathus species and 36 typical Basidiomycota. Furthermore, analyses of LTR retrotransposons (LTR-RTs) and Ka/Ks ratios revealed apparent whole-genome duplication (WGD) events in its genome. Through genome mining and metabolite profiling, we identified the biosynthetic gene cluster (BGC) for cyathane diterpenes in C. olla. In addition, we predicted 32 BGCs, containing 41 core genes, that are likely involved in the biosynthesis of other bioactive metabolites. These findings provide a valuable genomic resource for understanding the genetic diversity of Cyathus species. The genome analysis of C. olla offers insights into the biosynthesis of medicinal compounds and lays a foundation for future investigations into the genetic basis of chemodiversity in this important medicinal fungus. Supplementary Information: The online version contains supplementary material available at 10.1186/s12864-024-10528-3.


Contents
Experimental section: Determination of neuroprotective activity of crude extract of C. olla
Table S1. Statistics

Experimental section: Determination of neuroprotective activity of crude extract of C. olla
The rat adrenal pheochromocytoma cell line PC-12 was obtained from the China Center for Type Culture Collection. The cells were cultured in F-12 (Ham) nutrient mixture supplemented with 10% heat-inactivated horse serum (HS), 5% heat-inactivated fetal bovine serum (FBS), 100 U/mL penicillin G, 100 μg/mL streptomycin, and 2.5 g/L sodium bicarbonate, and were maintained in a humidified atmosphere with 5% CO2 at 37 °C. Neurite-bearing PC-12 cells were analyzed morphologically and quantified by phase-contrast microscopy. The cells were seeded onto poly-L-lysine-coated 24-well plates at a density of 2 × 10^4 cells/mL in standard serum medium and incubated for 24 h. Before exposure to either the vehicle (0.1% DMSO) or the specified reagents, the medium was replaced with low-serum F-12 medium (1% HS and 0.5% FBS). The cells were then treated with 20 ng/mL nerve growth factor (NGF) and varying concentrations of the crude extract of C. olla, while a control group was treated with 20 ng/mL NGF. This treatment was repeated three times, with one concentration per well. After an additional 48 h of incubation, neurite outgrowth of PC-12 cells was recorded with a digital camera attached to an inverted microscope equipped with a phase-contrast objective, and five random fields were imaged per well. Neurite outgrowth was quantified as the proportion of cells bearing neurites equal to or longer than one cell body, expressed as a percentage of the total number of cells in the five randomly selected fields. The experiment was repeated at least three times, and the results are presented as mean ± standard deviation.

Rank is the read-length grade; ">0" includes all data. Flag is the data type: all, all sequencing data; pass, effective sequencing data; fail, filtered data. TotalBase is the number of bases.
TotalReads is the number of reads. MaxLen is the maximum read length. AvgLen is the average read length. N50: all reads are sorted from longest to shortest and their lengths are summed in that order; when the cumulative length reaches half of the total length of all reads, the length of the last read added is the N50. L50: using the same cumulative sum, L50 is the number of reads added at that point. N90 and L90 are calculated in the same way as N50 and L50, respectively, using 90% of the total length as the threshold. meanQ is the mean quality value.

Total_length is the assembly length. Total_length_without N is the assembly length excluding gaps (N). Total_number is the number of assembled sequences. GC_content is the GC content. N50: all sequences are sorted from longest to shortest and their lengths are summed; when the cumulative length reaches half of the total length, the length of the last sequence added is the N50. N90 is calculated in the same way. Average is the average length. Median is the median length. Min is the minimum length. Max is the maximum length.

The total number of gene is the total number of genes; the average of mRNA_length is the average mRNA length; the average of cds_length is the average CDS length; the average of exon_number is the average number of exons per gene; the average of exon_length is the average exon length; the average of intron_length is the average intron length; the total number of exon is the total number of exons; the total number of intron is the total number of introns; the total intron length is the total length of all introns.

Based on the Nr database matches, the ten most frequent species were counted individually, the remaining species were grouped as other species, and the species distribution was plotted according to their proportions.

The identified FPP sequences used for clustering were obtained from the published literature. Multiple sequence alignment and phylogenetic tree construction were performed as described above: the multiple sequence alignment was generated with MAFFT v7.505 [1] using the parameters --maxiterate 1000 --localpair, and the phylogenetic tree was constructed with IQ-TREE 2.
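The N50/L50 and N90/L90 definitions in the table legends above can be made concrete with a short sketch. This is an illustrative Python function, not the pipeline used to produce these tables; the name `nx_lx` and the toy lengths are hypothetical.

```python
from typing import Iterable, Tuple


def nx_lx(lengths: Iterable[int], fraction: float = 0.5) -> Tuple[int, int]:
    """Return (Nx, Lx) for read or contig lengths (fraction=0.5 gives N50/L50).

    Lengths are sorted from longest to shortest and summed in that order.
    Nx is the length of the sequence whose addition first brings the running
    total to at least `fraction` of the total length; Lx is how many
    sequences have been summed at that point.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("need at least one length")
    target = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= target:
            return length, count
    # Only reachable if fraction > 1, which is not a meaningful threshold.
    raise ValueError("fraction must be in (0, 1]")


if __name__ == "__main__":
    toy_lengths = [2, 10, 4, 8, 3, 6, 5]
    print(nx_lx(toy_lengths))       # (6, 3)  -> N50, L50
    print(nx_lx(toy_lengths, 0.9))  # (3, 6)  -> N90, L90
```

For the toy lengths (total 38), the running sum 10, 18, 24 first reaches 19 at the third sequence, so N50 = 6 and L50 = 3.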

Figure S1. Alignment of the ITS sequence of Cyathus olla SUT01 against the NCBI nr database.

Figure S2. NGF-dependent neurite outgrowth-promoting activity of the crude extract of C. olla on rat pheochromocytoma PC-12 cells.
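The neurite-outgrowth metric described in the experimental section (cells whose neurites are at least one cell body long, as a percentage of all cells pooled over the imaged fields, then mean ± standard deviation over repeats) can be sketched as follows. This is a hypothetical illustration: the function names, the per-cell (neurite length, cell-body length) representation, and the use of the sample standard deviation are assumptions, not details given in the source.

```python
import statistics
from typing import List, Sequence, Tuple

# Hypothetical per-cell record: (longest neurite length, cell-body length), same units.
Cell = Tuple[float, float]


def is_neurite_bearing(cell: Cell) -> bool:
    """A cell counts when its longest neurite is at least one cell body long."""
    neurite_length, body_length = cell
    return neurite_length >= body_length


def neurite_outgrowth_percent(fields: Sequence[List[Cell]]) -> float:
    """Percentage of neurite-bearing cells pooled over the imaged fields of one well."""
    cells = [cell for field in fields for cell in field]
    if not cells:
        raise ValueError("no cells were scored")
    positive = sum(1 for cell in cells if is_neurite_bearing(cell))
    return 100.0 * positive / len(cells)


def mean_sd(percentages: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation across independent repeats (assumed)."""
    return statistics.mean(percentages), statistics.stdev(percentages)


if __name__ == "__main__":
    # Made-up measurements for one well; two fields shown instead of five for brevity.
    well = [[(12.0, 10.0), (5.0, 10.0)], [(10.0, 10.0), (0.0, 9.0)]]
    print(neurite_outgrowth_percent(well))  # 50.0
    print(mean_sd([25.0, 30.0, 35.0]))      # (30.0, 5.0)
```

Pooling cells across fields before dividing (rather than averaging per-field percentages) follows the wording "a percentage of the total number of cells within the five randomly selected fields".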

Figure S4. Species distribution of sequence alignments to the Nr database.
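The top-10-plus-other grouping described in the Nr legend can be sketched as below. This is an illustrative Python sketch under the assumption that each sequence contributes the species of its single best Nr hit; the function name `species_distribution` and the example labels are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def species_distribution(best_hit_species: Iterable[str], top_n: int = 10) -> Dict[str, float]:
    """Percentage of sequences per species; species beyond the top_n are pooled as 'Other'."""
    counts = Counter(best_hit_species)
    total = sum(counts.values())
    if total == 0:
        return {}
    top = counts.most_common(top_n)
    distribution = {species: 100.0 * n / total for species, n in top}
    remainder = total - sum(n for _, n in top)
    if remainder:
        distribution["Other"] = 100.0 * remainder / total
    return distribution


if __name__ == "__main__":
    hits = ["A"] * 5 + ["B"] * 3 + ["C"] * 2
    print(species_distribution(hits, top_n=2))  # {'A': 50.0, 'B': 30.0, 'Other': 20.0}
```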

Figure S6. Statistics of COG functional annotation by category.

Figure S8. Domain annotation based on the Pfam database.

Figure S9. Ka and Ks comparisons among four Cyathus species.
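As a hedged illustration of how Ka/Ks values are commonly read, the sketch below labels gene pairs by the conventional thresholds (Ka/Ks < 1 purifying, = 1 neutral, > 1 positive selection) and reports the most populated Ks bin, since a pronounced Ks peak is often taken as a signature of a duplication event. It assumes Ka and Ks have already been estimated per gene pair; the function names, bin width, and Ks cutoff are hypothetical choices, not the parameters used in this study.

```python
from collections import Counter
from typing import Iterable, Tuple


def selection_label(ka: float, ks: float) -> str:
    """Conventional reading of Ka/Ks: <1 purifying, =1 neutral, >1 positive selection."""
    if ks <= 0:
        raise ValueError("Ka/Ks is undefined when Ks is 0")
    ratio = ka / ks
    if ratio < 1:
        return "purifying"
    if ratio > 1:
        return "positive"
    return "neutral"


def ks_peak(ks_values: Iterable[float], bin_width: float = 0.1,
            max_ks: float = 5.0) -> Tuple[float, float]:
    """Return the (lower, upper) edges of the most populated Ks bin.

    Ks = 0 and values above max_ks (assumed saturated) are excluded;
    ties go to the lower bin.
    """
    counts = Counter(int(ks // bin_width) for ks in ks_values if 0 < ks <= max_ks)
    if not counts:
        raise ValueError("no Ks values in range")
    best_bin, _ = max(counts.items(), key=lambda item: (item[1], -item[0]))
    # Rounding removes floating-point noise such as 0.7000000000000001.
    return round(best_bin * bin_width, 10), round((best_bin + 1) * bin_width, 10)


if __name__ == "__main__":
    print(selection_label(0.1, 0.5))                  # purifying
    print(ks_peak([0.72, 0.75, 0.78, 1.5, 0.2]))      # (0.7, 0.8)
```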

Figure S10. Molecular network analysis of metabolites from the mycelium and fruiting bodies of C. olla SUT01.

Figure S11. LC-ESI-HRMS and LC-ESI-HRMS/MS spectra of compounds isolated from C. olla SUT01. The mass spectrometry data were acquired in positive ion mode, and numbers 1-13 correspond to compounds 1-12 one by one.

Figure S13. ¹H and ¹³C NMR spectra of compound 4.

Figure S14. ¹H and ¹³C NMR spectra of compound 5.

Figure S17. Annotation of the triterpenoid biosynthetic pathway of C. olla SUT01 using KAAS.

Table S1. Statistics of the Illumina NovaSeq sequencing data for the C. olla SUT01 genome.
Q20 and Q30 are the percentages of bases with Phred quality values greater than 20 and 30, respectively, relative to the total number of bases.
Sample name is the data type. Total_reads is the number of sequencing reads. Total_bases is the total number of sequenced bases. GC_Content is the number of G and C bases as a percentage of the total number of bases.
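The Table S1 columns can be illustrated with a small sketch that summarizes (sequence, quality string) pairs. This is a hypothetical Python example, not the tool that produced the table: it assumes Phred+33 quality encoding and counts a base toward Q20/Q30 when its score is at or above the threshold, which is the common convention (a strict "greater than" reading would change only the comparison).

```python
from typing import Dict, Iterable, Tuple


def sequencing_stats(records: Iterable[Tuple[str, str]],
                     phred_offset: int = 33) -> Dict[str, float]:
    """Summarize (sequence, quality-string) pairs into Table S1-style columns.

    Assumes Phred+33 encoding; a base counts toward Q20/Q30 when its
    score is >= 20 or >= 30, respectively.
    """
    total_reads = total_bases = gc = q20 = q30 = 0
    for seq, qual in records:
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        total_reads += 1
        total_bases += len(seq)
        gc += sum(1 for base in seq.upper() if base in "GC")
        for symbol in qual:
            score = ord(symbol) - phred_offset
            if score >= 20:
                q20 += 1
            if score >= 30:
                q30 += 1
    if total_bases == 0:
        raise ValueError("no bases to summarize")
    return {
        "Total_reads": total_reads,
        "Total_bases": total_bases,
        "GC_Content": 100.0 * gc / total_bases,
        "Q20": 100.0 * q20 / total_bases,
        "Q30": 100.0 * q30 / total_bases,
    }


if __name__ == "__main__":
    # 'I' = Q40, '5' = Q20, '+' = Q10 under Phred+33.
    print(sequencing_stats([("ACGT", "II5+"), ("GGCC", "IIII")]))
```

For the two toy reads, 7 of 8 bases score at least 20 and 6 of 8 score at least 30, giving Q20 = 87.5% and Q30 = 75.0%.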

Table S7. Statistics of C. olla SUT01 protein-coding gene annotation.
Annotation is the number of genes with at least one annotation. Uniprot, Pfam, Refseq, Nr, Interproscan, GO, KEGG and COG are the numbers of genes annotated in the corresponding databases, and Pathway is the number of genes annotated in the KEGG Pathway database.
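The "Annotation" column is the size of the union of the per-database gene sets, so a gene annotated by several databases is counted once. A minimal sketch, assuming per-database sets of gene IDs are already available; the function name and gene IDs are hypothetical.

```python
from typing import Dict, Set


def annotation_summary(hits: Dict[str, Set[str]]) -> Dict[str, int]:
    """Count annotated genes per database plus the union ('Annotation')."""
    summary = {database: len(genes) for database, genes in hits.items()}
    # A gene annotated by several databases is counted once in the union.
    summary["Annotation"] = len(set().union(*hits.values()))
    return summary


if __name__ == "__main__":
    example = {"Pfam": {"g1", "g2"}, "KEGG": {"g2", "g3"}, "GO": set()}
    print(annotation_summary(example))  # {'Pfam': 2, 'KEGG': 2, 'GO': 0, 'Annotation': 3}
```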

Table S11. Sources (URLs) of the genomes of 39 Basidiomycetes and C. olla used for phylogenetic analysis.

Table S12. Statistics of repetitive sequences in C. olla SUT01.
SINE, short interspersed nuclear element. LINE, long interspersed nuclear element. LTR, long terminal repeat retrotransposon, mainly of two types, Gypsy and Copia. DNA, DNA transposon. Satellite, satellite repeat. Low_complexity, low-complexity repeat. Other, other types of repeats. Unknown, unclassified repeats. Total, all repetitive sequences.
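Because repeat annotations can overlap, the "Total" row is generally not the sum of the class rows. The sketch below computes per-class and total genome coverage by merging overlapping intervals, under the assumption (not stated in the source) that overlapping annotations are counted once. The function names, the half-open coordinate convention, and the toy data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record: (chromosome, start, end, repeat_class), half-open [start, end).
RepeatHit = Tuple[str, int, int, str]


def merged_length(spans: List[Tuple[int, int]]) -> int:
    """Total length covered by possibly overlapping intervals."""
    covered = 0
    current_start = current_end = None
    for start, end in sorted(spans):
        if current_end is None or start > current_end:
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered


def repeat_summary(hits: Iterable[RepeatHit], genome_size: int) -> Dict[str, float]:
    """Percentage of the genome covered by each repeat class and by all repeats."""
    by_class: Dict[str, Dict[str, List[Tuple[int, int]]]] = defaultdict(lambda: defaultdict(list))
    all_spans: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end, repeat_class in hits:
        by_class[repeat_class][chrom].append((start, end))
        all_spans[chrom].append((start, end))
    summary = {
        repeat_class: 100.0 * sum(merged_length(s) for s in per_chrom.values()) / genome_size
        for repeat_class, per_chrom in by_class.items()
    }
    summary["Total"] = 100.0 * sum(merged_length(s) for s in all_spans.values()) / genome_size
    return summary


if __name__ == "__main__":
    toy = [("chr1", 0, 100, "LTR"), ("chr1", 50, 150, "LTR"),
           ("chr1", 140, 200, "DNA"), ("chr2", 0, 50, "LINE")]
    print(repeat_summary(toy, genome_size=1000))
    # {'LTR': 15.0, 'DNA': 6.0, 'LINE': 5.0, 'Total': 25.0}
```

In the toy data the class rows sum to 26%, but the DNA element overlaps an LTR by 10 bp, so the merged total is 25%.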