Genome-wide analysis and prediction of genes involved in the biosynthesis of polysaccharides and bioactive secondary metabolites in a high-temperature-tolerant wild strain of Flammulina filiformis

Background: Flammulina filiformis (= Asian "F. velutipes") is a popular commercial edible mushroom. Many bioactive compounds with medicinal effects, such as polysaccharides and sesquiterpenoids, have been isolated and identified from it, but their biosynthesis and regulation at the molecular level remain unclear. In this study, we sequenced the genome of the wild strain F. filiformis Liu355, predicted its biosynthetic gene clusters (BGCs) and, by comparative transcriptomic analysis, profiled the expression of these genes between the wild and cultivar strains and among different developmental stages of the wild strain.

Results: The genome of F. filiformis was 35.01 Mbp in length and was annotated with 10396 gene models. Twelve putative terpenoid gene clusters were predicted, 12 sesquiterpene synthase genes belonging to four different groups were identified, and two type I polyketide synthase (PKS) gene clusters were identified in the F. filiformis genome. The number of genes related to terpenoid biosynthesis was higher in the wild strain (119 genes) than in the cultivar strain (81 genes), and most of them were up-regulated in the primordium and fruiting body of the wild strain, whereas PKS genes were generally up-regulated in the mycelium of the wild strain. Moreover, genes encoding UDP-glucose pyrophosphorylase and UDP-glucose dehydrogenase, which are involved in polysaccharide biosynthesis, had relatively high transcript levels in both the mycelium and fruiting bodies of F. filiformis.

Conclusions: We identified candidate genes involved in the biosynthesis of polysaccharide and terpenoid bioactive compounds and profiled the expression of these genes during the development of F. filiformis. This study expands our knowledge of the biology of F. filiformis and provides valuable data for elucidating the regulation of secondary metabolism in this special strain of F. filiformis.

glucans and heteropolysaccharides), immunomodulatory proteins (e.g. FIP-fve) and multiple bioactive sesquiterpenes have also been isolated and identified from the extracts, mycelium and fruiting bodies of F. velutipes [26]. Tang et al. [12] reviewed the compounds derived from F. velutipes and their diverse biological activities. A growing number of studies on the chemical compounds and biological activities of this mushroom support the view that F. velutipes should be exploited as a valuable source for the development of functional foods, nutraceuticals and even pharmaceutical drugs [27].
The development of "omics" technologies has provided powerful tools for understanding the biology of edible mushrooms, including the effective utilization of the cultivation substrate (lignocellulose). Genome and transcriptome sequencing led to the discovery of a new family of diterpene cyclases in fungi [39,40], and the identification by genome sequencing of a candidate cytochrome P450 gene cluster possibly related to triterpenoid biosynthesis in the medicinal mushroom Ganoderma lucidum improved the production of medicinally effective compounds [41,42].
However, although F. filiformis is a popular edible mushroom with a wide spectrum of interesting biological activities, little is known about the synthesis and regulation of its bioactive secondary metabolites. In previous experiments, we collected a wild strain, F. filiformis Liu355, from Longling, Yunnan, and demonstrated that it could tolerate a relatively high temperature during fruiting body formation (18-22 °C) in the laboratory; in this respect it is superior to the commercial strains of F. filiformis (Asian commercial F. velutipes usually produces fruiting bodies at low temperature, ≤15 °C) [16]. Thus, the wild strain is a potentially important genetic material for future breeding engineering because it can save large amounts of energy. Most interestingly, the chemical composition of the wild strain differs from that of commercially cultivated strains of F. filiformis, with more unique chemical compounds. In total, 13 new sesquiterpenes with nor-eudesmane, spiroaxane, cadinane and cuparane skeletons were isolated and identified from the wild strain Liu355 [9].
Thus, the aims of our study are to explore the genetic features of this interesting wild strain of F. filiformis at the genomic scale, to identify gene clusters associated with the biosynthesis of bioactive secondary metabolites, and to profile the expression differences of these candidate genes during the development of F. filiformis. This research will facilitate our understanding of the biology of the wild strain and provide a useful dataset for molecular breeding, as well as for improving compound production and producing novel compounds through heterologous pathway and metabolic engineering in the future.

General features of the F. filiformis genome
Prior to our study, three "genomes" assigned to "F. velutipes" were publicly available, including the relatively complete genome of KACC42780 from Korea, the genome of L11 from China and a draft genome of TR19 from Japan. In this study, we sequenced the genome of a wild strain of F. filiformis using a small-fragment library and performed a comparative genomic analysis of secondary metabolism gene clusters. The assembled genome of the wild F. filiformis strain was 35.01 Mbp, with approximately 118-fold genome coverage. In total, 10396 gene models were predicted, with an average sequence length of 1445 bp. The genome size and the number of predicted protein-encoding genes are very similar to those of the public reference genome of F. filiformis (= Asian F. velutipes) (Table 1). Functional annotation showed that more than half of the predicted genes were annotated in the NR database (6383 genes), and 1972, 2582, 837 and 5794 genes were annotated in the SwissProt, KEGG, COG and GO databases, respectively. We identified 107 cytochrome P450 family genes, 674 genes encoding secretory proteins and 287 genes in the CAZy database.
There are 17293 pan-genes among the four strains of F. filiformis, and the pan-genome core comprises 4074 genes (23.5% of the pan-genome) (Fig. 1A). This proportion of core genes (23.5%) is similar to that found in a pan-genome analysis of 23 Corallococcus spp. [43]. However, the number is possibly lower than the actual number because these genomes were not sequenced to completion. Of the orthologous genes, 3104 were annotated in the KEGG database and 2722 have annotations in the GO database; 1055 genes are specific to the wild strain Liu355.
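The pan-genome bookkeeping described above (pan-genes, core genes shared by all strains, strain-specific genes) can be illustrated with a minimal sketch. It assumes the orthologous clusters are already available as a mapping from cluster to the set of strains that contribute a member; the cluster names and toy data are hypothetical and are not taken from the study.

```python
from collections import Counter

# Hypothetical toy input: ortholog cluster id -> strains that have a member gene.
# In practice this mapping would be parsed from the clustering output.
clusters = {
    "c1": {"Liu355", "KACC42780", "TR19", "L11"},
    "c2": {"Liu355", "KACC42780", "TR19", "L11"},
    "c3": {"Liu355"},
    "c4": {"KACC42780", "L11"},
    "c5": {"Liu355", "TR19"},
}
strains = {"Liu355", "KACC42780", "TR19", "L11"}


def summarize_pan_genome(clusters, strains):
    """Count pan-genes, core genes and strain-specific genes."""
    # A cluster is "core" when every strain contributes at least one member.
    core = [cid for cid, members in clusters.items() if members >= strains]
    # A cluster is strain-specific when exactly one strain contributes to it.
    specific = Counter(
        next(iter(members)) for members in clusters.values() if len(members) == 1
    )
    return {
        "pan": len(clusters),
        "core": len(core),
        "core_fraction": len(core) / len(clusters),
        "strain_specific": dict(specific),
    }


summary = summarize_pan_genome(clusters, strains)
print(summary)
```

With real data, the core fraction is simply the number of core clusters divided by the number of pan-gene clusters, which is how a proportion such as the one reported above would be obtained.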

Functional characteristics of F. filiformis predicted genes
A KEGG enrichment analysis was performed to determine the functions of the predicted genes of F. filiformis. The results showed that the largest number of genes is involved in genetic information processing and translation (253 genes), followed by metabolism (carbohydrate metabolism, 243 genes). Twenty-one genes were found to participate in terpenoid and polyketide biosynthesis (Additional file 1: Figure S1).

Transcriptomic analysis and gene expression
We studied gene expression differences at the transcriptomic level across different developmental stages of F. filiformis, including the monokaryotic mycelium (MK), dikaryotic mycelium (DK), primordium (PD) and fruiting body (FB). In addition, the DK of the cultivar strain of F. filiformis (CGMCC 5.642) was also subjected to transcriptome sequencing. Three biological replicates were used for each sample. Each sample yielded 8.07-9.32 G of clean data. We mapped the clean reads to the genome of F. filiformis Liu355 using the HISAT software and obtained a relatively high total mapping rate (92.63%). The expression variation between samples was smallest between MK and DK and greatest between MK and FB of F. filiformis (Additional file 1: Figure S2).
Among the 10396 gene models of F. filiformis, 9931 were expressed (FPKM>5) across the four different tissues (MK, DK, PD and FB) of the wild strain and the mycelium of a cultivar strain of F. filiformis. A total of 6577 genes were commonly expressed in all tissues, 151 genes were specifically expressed in the cultivar strain, and 152, 116, 46 and 199 genes were specifically expressed in the MK, DK, PD and FB of the wild strain of F. filiformis, respectively (Fig. 1B). Tissue-specific and highly expressed transcripts in F. filiformis Liu355 are listed in Additional file 2: Table S1. Two new genes encoding ornithine decarboxylase (involved in polyamine synthesis) are highly expressed in the mycelium of the cultivar strain (Novel01369, Novel01744), and a gene encoding an oxidoreductase also has the highest expression level (gene 830, FPKM>1000). Genes encoding agroclavine dehydrogenase, acetylxylan esterase, beta-glucan synthesis-associated protein and arabinogalactan endo-1,4-beta-galactosidase are significantly highly expressed in the fruiting bodies (FB) of the wild strain, with more than 20- to 100-fold change compared with the mycelium. Agroclavine dehydrogenase is known to be involved in the biosynthesis of the fungal ergot alkaloid ergovaline [44], and beta-glucan synthesis-associated protein is probably linked to polysaccharide biosynthesis of the fungal cell wall. The high expression of these genes indicates that they probably play important roles in fruiting body development and compound enrichment.
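The classification of genes into commonly expressed and sample-specific sets can be sketched as follows. This is only an illustration under stated assumptions: the FPKM>5 cutoff is taken from the text, whereas the gene names, the sample label "cultivar_DK" and the toy values are hypothetical.

```python
FPKM_THRESHOLD = 5.0  # expression cutoff stated in the text

# Hypothetical toy matrix: gene -> mean FPKM per sample.
fpkm = {
    "geneA": {"MK": 12.0, "DK": 30.5, "PD": 8.1, "FB": 50.2, "cultivar_DK": 22.0},
    "geneB": {"MK": 0.4, "DK": 1.2, "PD": 0.0, "FB": 75.0, "cultivar_DK": 2.0},
    "geneC": {"MK": 9.3, "DK": 0.5, "PD": 1.0, "FB": 0.2, "cultivar_DK": 0.0},
    "geneD": {"MK": 1.0, "DK": 2.0, "PD": 0.3, "FB": 4.9, "cultivar_DK": 3.0},
}


def classify_expression(fpkm, threshold):
    """Split genes into common, sample-specific and unexpressed sets."""
    common, specific, unexpressed = [], {}, []
    for gene, values in fpkm.items():
        expressed_in = [sample for sample, v in values.items() if v > threshold]
        if len(expressed_in) == len(values):
            common.append(gene)  # above the cutoff in every sample
        elif len(expressed_in) == 1:
            specific.setdefault(expressed_in[0], []).append(gene)
        elif not expressed_in:
            unexpressed.append(gene)
        # Genes expressed in some but not all samples fall in none of these sets.
    return common, specific, unexpressed


common, specific, unexpressed = classify_expression(fpkm, FPKM_THRESHOLD)
print(common, specific, unexpressed)
```

Counting the entries of each set over the full gene list would give numbers of the kind shown in a Venn diagram of expressed genes.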
In total, 5131 genes (51.67%) were up- or down-regulated in at least one stage transition, such as from mycelium to primordium (PD vs DK, 3889 genes) and from primordium to fruiting body (FB vs PD, 3308 genes). KEGG enrichment analysis showed that DEGs involved in glutathione metabolism are significantly enriched in the dikaryotic mycelium of the wild strain Liu355 compared with the cultivar strain (Fig. 2). Thirty-three DEGs were found in this pathway, including genes encoding glutathione S-transferase, ribonucleoside-diphosphate reductase, 6-phosphogluconate dehydrogenase, cytosolic non-specific dipeptidase, gamma-glutamyltranspeptidase and glutathione peroxidase. Moreover, based on the predicted metabolic enzymes related to polysaccharide (PS) biosynthesis in Ganoderma lucidum [46], we identified 21 putative essential enzymes involved in PS biosynthesis in F. filiformis using a protein sequence homology search (Table 2), including glucose-6-phosphate isomerase, hexokinase, mannose-6-phosphate isomerase, UDP-glucose dehydrogenase, galactokinase and transketolase. Among them, genes encoding UDP-glucose pyrophosphorylase, UDP-glucose dehydrogenase and fructose-bisphosphate aldolase have relatively high transcript levels in all samples analyzed (FPKM>100).
These candidate genes will be functionally verified by experiment in the future, combined with quantification of PSs in different tissues.

Carbohydrate active enzymes (CAZymes)
Secreted carbohydrate-degrading enzymes are crucial enzymes for fungal biology, both for . The expression profiles of the genes related to CAZymes are diverse (Fig. 3). Seven of the 23 GH16 family genes are up-regulated in the DK mycelium stage compared with MK (three genes annotated as candidate glycosidases and four as beta-glucan synthesis-associated proteins), and four GH16 family genes are up-regulated in fruiting bodies compared with the primordium.
Besides GH16, seven of the 14 GH5 members (four genes annotated as glucan 1,3-beta-glucosidase, two as endoglucanase and one as mannan endo-1,4-beta-mannosidase) are up-regulated in the DK mycelium of the wild strain compared with the MK mycelium, and five GH5 family genes are up-regulated in the primordium stage compared with the mycelium. Eight of the 11 GH43 members are up-regulated in the primordium compared with the mycelium.

Predicted bioactive secondary metabolism gene clusters of F. filiformis
Besides the macronutrients and micronutrients present in F. filiformis, a large number of structurally diverse bioactive secondary metabolites, especially various novel sesquiterpenes and norsesquiterpenes, have been identified from its mycelium and fruiting bodies [9]. The genes involved in sesquiterpene biosynthesis were described previously [9]. In this study, the gene clusters encoding terpenoid and PKS biosynthesis were re-examined using the genome and transcriptome resources. In total, 12 gene clusters related to terpenoid biosynthesis and two gene clusters for polyketide biosynthesis were predicted (Fig. 4, Additional file 2: Table S3). Compared with the three other, genome-sequenced cultivar strains (KACC42780, TR19 and L11), the numbers of gene clusters involved in terpene, PKS, NRPS and siderophore biosynthesis differ, and the number of genes related to terpene synthesis is higher in the wild strain Liu355 (119 genes) than in the cultivar strain L11 (81 genes) in our study (Table 3).  Table S4). Eight of the 12 genes related to sesquiterpene biosynthesis have a close phylogenetic relationship with genes identified from cultivated Flammulina in the previous study [9]. The 12 STS genes of F. filiformis include five genes encoding delta(6)-protoilludene synthase (gene1663-D2, gene9115, gene2784 and gene9115-D2, gene 6325-D2), two genes encoding trichodiene synthase (gene1140, gene 2254), two genes encoding alpha-muurolene synthase (gene1358-D2, gene1358), and one gene encoding glucose-6-phosphate isomerase. Among them, genes encoding delta(6)-protoilludene synthase are up-regulated in the fruiting body stage compared with the mycelium and are also up-regulated in the DK mycelium compared with the MK of the wild strain Liu355 (Table 4).  Table S3). Both gene clusters include core genes encoding polyketide synthase (gene 8217 and gene 1373).

Putative genes for polyketide biosynthesis in F. filiformis
Genes located on scaffold 78 are mostly up-regulated in the wild strain mycelium compared with the cultivar strain, including erythronolide synthase (gene 1374), polyketide synthase (gene1373) and benzoate 4-monooxygenase (gene 1372), indicating that polyketide compounds are probably abundant in the mycelium of this mushroom, especially in the wild strain.

Cytochrome P450s in F. filiformis genome
It is reported that a more divergent cytochrome P450 oxidase could be involved in secondary biosynthesis [55]. So, we searched the genome of the

Heat shock proteins in response to temperature change in F. filiformis
Besides producing unique compounds, the wild strain F. filiformis Liu355 can produce fruiting bodies at a relatively higher temperature than current commercial strains, implying that it is a potentially excellent breeding resource. A temperature downshift (cold stimulation) is considered one of the most important and essential environmental factors for fruiting initiation and fruiting body formation in F. velutipes [34]. In general, the mycelia of F. velutipes grow vegetatively at 20-24 °C and fruit at an optimum temperature of 12-15 °C [32]. In our study, the wild strain Liu355 could form fruiting bodies at 18 °C in the laboratory. It is therefore a potentially excellent genetic material for F. velutipes breeding. Proteomic analysis revealed that the expression of proteins related to energy metabolism (e.g. catalase, glucose-6-phosphate isomerase, trehalase and beta-glucosidase), amino acid biosynthesis (e.g. argininosuccinate synthase) and signaling pathways (e.g. BAR adaptor protein) increases dramatically after long-term cold stress [32,34]. In addition, histidine kinases, response regulators and sometimes histidine-containing phosphotransfer proteins have been reported to play crucial roles in the response to cold stress in the cyanobacterium Synechocystis sp. and in Arabidopsis [56,57]. Transcriptomic sequencing revealed that histidine kinase and proteins involved in the MAPK pathway (e.g. STE protein kinase, MAPK kinase kinase) and the Ca2+ signal transduction pathway (calcium-dependent protein kinase) were differentially expressed in F. velutipes [31].
The heat shock protein (HSP) family is known to be positively correlated with organismal thermotolerance [58]. In this study, 28 genes annotated as HSPs were identified in the F. filiformis genome (Fig. 5). Among them, six genes, encoding HSP12, HSPC4, HSP104, LHS1 and GRP78, are significantly up-regulated in the wild strain Liu355 compared with the cultivar strain. HSP12 belongs to a group of small heat shock proteins that function as chaperones and are ubiquitously involved in nascent protein folding by protecting proteins from misfolding; it has been partially characterized as a stress-response protein, and its expression has been observed in response to cold stress [59]. In S. cerevisiae and C. albicans, HSP104, in association with HSP40 and HSP70, helps reactivate aggregated, denatured proteins by providing disaggregated protein to HSP40 and HSP70 as a substrate [60]. Expression of HSP104 and HSP70 is regulated by interaction with Hsf (heat-shock factor), which can be stimulated by heat stress in yeast [58]. However, the exact molecular function of HSPs in the high-temperature tolerance of wild F. filiformis and its adaptive mechanisms for relatively high temperature need further study.

Conclusion
In our study, the genome and transcriptome of a high-temperature-tolerant wild strain of F. filiformis were sequenced, assembled and annotated, and the gene clusters associated with polysaccharide, terpenoid and polyketide biosynthesis were predicted.
Comparative genomic analysis with three other Asian cultivar strains of F. velutipes (= F. filiformis) revealed that the wild strain has a similar genome size and a relatively larger number of putative genes related to secondary metabolite biosynthesis. Most of the genes related to terpenoid biosynthesis are up-regulated in the primordium and fruiting body of the wild strain, whereas PKS genes are generally up-regulated in the mycelium of the wild strain; the exact regulatory pathway remains unresolved in this study.
Six genes belonging to the heat shock protein (HSP) family, including HSP12, HSPC4, HSP104, LHS1 and GRP78, were significantly up-regulated in the wild strain Liu355 compared with the cultivar strain. They are possibly responsible for the ability of wild F. filiformis to develop fruiting bodies at relatively high temperature, but the expression of these genes in other strains of F. filiformis, especially strains that develop at low temperature, will be verified in the next step.
Our study provides an important genetic dataset for potential breeding materials of F. filiformis and lays a foundation for a better understanding of the biology of F. filiformis.

Culture of strains
The wild strain Liu355 was isolated from a fruiting body of F. filiformis collected from Longling, Yunnan province, southwestern China, and its ITS sequence is listed under GenBank accession number KP867925 [9]. The haploid monokaryotic strain F. filiformis Liu355 (deposited in the mycological laboratory of the Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences) was prepared by the protoplast mononuclear method and was grown on potato dextrose agar (PDA) at room temperature for 2-3 weeks in the dark (Additional file 1: Figure S6). Fruiting bodies were obtained in sterile plastic bottles containing growth substrate (cotton seed hulls 78%, wheat bran 20%, KH2PO4 0.1%, MgSO4 0.1%, sucrose 1% and ground limestone 1%, with a moisture content of 70%) at 25 °C for 30 d, followed by cold stimulation at 18 °C and 90% humidity until primordia appeared. Cultures were maintained at low temperature (18 °C and 75% humidity) to allow full fruiting body development [61]. In addition, the genomic data of two cultivar strains from Korea (KACC42780, Bioproject PRJNA191921) and Japan (TR19, Bioproject PRJDB4587) were available from the NCBI public database, and the genomic sequences of strain L11 (Bioproject PRJNA191865) were kindly provided by the Mycological Research Center, College of Life Sciences, Fujian Agriculture and Forestry University [62].

Genome and transcriptome sequencing and analysis
Total genomic DNA of F. filiformis was extracted from the mononuclear mycelium grown on PDA medium using the Omega E.Z.N.A. fungal DNA midi kit (Omega, USA) according to the manufacturer's instructions. The DNA obtained was checked by agarose gel electrophoresis and quantified by Qubit. The strain was sequenced using a 350 bp paired-end library on an Illumina HiSeq 4000 with the PE150 strategy. Library construction and sequencing were performed at Beijing Novogene Bioinformatics Technology Co. Ltd (Beijing, China). After removal of adapters and low-quality sequences using FastQC, the high-quality reads were mapped to the reference genome sequence of F. filiformis L11 (Bioproject PRJNA191865) using the BWA software. Functional annotation of the predicted genes was performed using BLAST against the Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), SwissProt and NCBI non-redundant (NR) protein databases [39]. Pan-genome analysis was carried out using the standalone CD-HIT tool to cluster orthologous proteins [63].
For transcriptomic sequencing, total RNA was extracted using the RNeasy Plant Mini Kit Diego, CA, USA). Bioinformatics analysis was based on high-quality clean data, and the RNA-seq reads were mapped to the F. filiformis genome (Liu355) using TopHat v2.0.1253 [64]. HTSeq v0.6.1 was used to count the number of reads mapped to each gene [65]. Gene expression was quantified as FPKM values and corrected with the upper-quartile algorithm. Differential gene expression analysis was performed using the DESeq R package (1.18.0) with a corrected p-value [66]. Genes with an adjusted P-value < 0.05 were considered differentially expressed.
Hierarchical clustering of gene expression was conducted using Genesis 1.7.7 [67].
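The two quantities used in this pipeline, FPKM and an adjusted P-value, can be illustrated with a short sketch. It is not a reimplementation of HTSeq or DESeq: it shows the standard FPKM formula and, as an assumption, the Benjamini-Hochberg procedure as the multiple-testing correction, since the text only states that a corrected p-value was used. All input numbers are hypothetical.

```python
def fpkm(fragment_count, gene_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragment_count * 1e9 / (gene_length_bp * total_mapped_fragments)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonic adjusted values.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical example: 100 fragments on a 1000 bp gene, 1 million mapped fragments.
example_fpkm = fpkm(100, 1000, 1_000_000)

# Hypothetical raw p-values for four genes.
raw_p = [0.01, 0.04, 0.03, 0.20]
adj_p = benjamini_hochberg(raw_p)
significant = [i for i, p in enumerate(adj_p) if p < 0.05]
print(example_fpkm, adj_p, significant)
```

In this toy case only the first gene passes the adjusted P-value < 0.05 cutoff, even though three raw p-values are below 0.05, which is the reason the adjusted value rather than the raw one is used to call differentially expressed genes.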

Prediction of gene clusters
The biosynthetic gene clusters were predicted using antiSMASH 3.0 [68]. antiSMASH offers a broad collection of tools and databases for automated genome mining and comparative genomics for a wide variety of classes of secondary metabolites [69]. In addition, a sequence homology search (BLAST) was used to identify genes related to terpenoid biosynthesis. The

Availability of data and materials
The datasets supporting the results of this article are included within the article and its additional files. The genomic and transcriptomic data have been deposited in the GenBank database under the dataset identifiers PRJNA531555 for the genome and PRJNA530834 for the transcriptome.

Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.
