The genome of the basal agaricomycete Xanthophyllomyces dendrorhous provides insights into the organization of its acetyl-CoA derived pathways and the evolution of Agaricomycotina
BMC Genomics volume 16, Article number: 233 (2015)
Xanthophyllomyces dendrorhous is a basal agaricomycete with uncertain taxonomic placement, known for its unique ability to produce astaxanthin, a carotenoid with antioxidant properties. It was the aim of this study to elucidate the organization of its CoA-derived pathways and to use the genomic information of X. dendrorhous for a phylogenomic investigation of the Basidiomycota.
The genome assembly of a haploid strain of Xanthophyllomyces dendrorhous revealed a genome of 19.50 Megabases with 6385 protein coding genes. Phylogenetic analyses were conducted including 48 fungal genomes. These revealed Ustilaginomycotina and Agaricomycotina as sister groups. In the latter a well-supported sister-group relationship of two major orders, Polyporales and Russulales, was inferred. Wallemia occupies a basal position within the Agaricomycotina and X. dendrorhous represents the basal lineage of the Tremellomycetes, highlighting that the typical tremelloid parenthesomes have either convergently evolved in Wallemia and the Tremellomycetes, or were lost in the Cystofilobasidiales lineage. A detailed characterization of the CoA-related pathways was done and all genes for fatty acid, sterol and carotenoid synthesis have been assigned.
The current study ascertains that Wallemia with tremelloid parenthesomes is the most basal agaricomycotinous lineage and that Cystofilobasidiales without tremelloid parenthesomes are deeply rooted within Tremellomycetes, suggesting that parenthesomes at septal pores might be the core synapomorphy for the Agaricomycotina. Apart from evolutionary insights the genome sequence of X. dendrorhous will facilitate genetic pathway engineering for optimized astaxanthin or oxidative alcohol production.
Xanthophyllomyces dendrorhous (formerly often referred to as Phaffia rhodozyma) is a red-pigmented moderately psychrophilic growing yeast . It is a basidiomycete classified among the Tremellomycetes in the order Cystofilobasidiales together with Cystofilobasidium, Xanthophyllomyces, and a clade containing Mrakia/Mrakiella and several anamorphic species of Tausonia/Guehomyces, Itersonilia, and Udeniomyces [2,3]. However, it is currently unclear whether the Cystifilobasidiales are the most basal group in the Tremellomycetes, or whether Cystofilobasidiales should be excluded from Tremellomycetes in order to assure its monophyly . Because the Cytofilobasidiales are deeply rooted within the Agaricomycotina, they may be of key importance for understanding the evolution of this group. Xanthophyllomyces dendrorhous was originally isolated from exudates of Betula species and other broad-leave trees . Later it was also isolated from leaves of Nothofagus trees and stromata of the tree’s biotrophic fungal parasite Cyttaria spp. . Xanthophyllomyces dendrorhous possesses a homothallic life cycle . A sexual reproductive cycle can be initiated by application of sugar alcohols  leading to sexual conjugation between cells of the same strain from which a long holobasidium with terminal spores is formed. Most publications show that X. dendrorhous is diploid [5,7]. However, electrophoretic chromosome separation in another strain  indicate that at least some strains may be haploid.
Xanthophyllomyces dendrorhous has two evolutionary special metabolic features. One is the synthesis of astaxanthin which is considered unique among fungi. The other is the fermentation of sugar to alcohol under oxidative conditions . Astaxanthin serves as an antioxidant, quenching reactive oxygen species to protect X. dendrorhous from damage by oxidative stress [10,11]. Its biosynthesis is via the mevalonate pathway to the formation of β-carotene with enzymes similar to other carotenogenic fungi . However, all steps of 3-hydroxylation and 4-ketolation at both terminal β-ionone rings leading to the formation of astaxanthin are carried out by a very unique P450 monooxygenase. This protein, Asy, belongs to the 3A subfamily ; electrons are provided by a specific cytochrome P450 reductase . Although the astaxanthin concentration in wild-type strains of X. dendrorhous is too low for commercialization, attempts have been made to increase the astaxanthin yield developing X. dendrorhous as a production system for this carotenoid. The most promising yields were obtained by a combination of classical random mutagenesis followed by systematic engineering of the whole biosynthesis pathway .
Given the interesting position within the largest group of Basidiomycetes, the Agaricomycotina, its special metabolic features mentioned above related to acetyl-CoA derived pathways and its biotechnological potential, it was the aim of this study to elucidate the genome sequence for a phylogenomic investigation for the Agaricomycotina and to elucidate its acetyl-CoA metabolism. The latter is important to obtain tools for the analysis and the modeling of these biotechnological import pathways.
Genome assemblies, completeness assessment and repeat elements
Three Illumina libraries of insert size 250 bp [EMBL: ERR575093], 3 kb [EMBL: ERR575094] and 8 kb [EMBL: ERR575095], were sequenced on an Illumina HiSeq machine with 100 bp paired-end chemistry to generate the whole genome sequence of X. dendrorhous. After filtering raw reads using 26 phred score as average read quality cutoff and a 100 bp length cutoff, 93.23%, 20.13% and 24.46% of reads were left in the 250 bp, 3 kb and 8 kb insert libraries, respectively. Genome assemblies were done with the Velvet genome assembler  using the three different libraries with various k-mer lengths. The genome assembly resulted in a total of 267 scaffolds [EMBL: LN483084-LN483350], the nuclear genome was of 19.50 Mb in size in 257 scaffolds and the mitochondrial genome of 23.50 kb in 10 scaffolds. More than 70% of the genome was assembled into just 7 scaffolds, all of which were longer than 1.7 Mb, and 98% of the genome was represented by 15 scaffolds (Figure 1A).
After generating the genome assembly, all three libraries were mapped to the assembled genome, and around 96.03%, 92.61%, and 93.89% of the used reads could be mapped back to the assembled scaffolds. The genome completeness and continuity of the assembled genome were assessed by using the Cegma pipeline. Around 98.8% of the 458 core eukaryotic were recovered, indicating the high quality of the genome assembly. Repeat element predictions revealed 3.12% of repeat elements in the genome of X. dendrorhous. Classification of repeat elements was done by using TransposonPSI, and as the most abundant ones, 18 gypsy and 49 TY1_Copia retrotransposons were predicted in the assembled genome.
Protein encoding genes and annotations
Both ab-initio and transcript alignment-based methods were used to generate the gene models for X. dendrorhous. These predictions generated 6385 protein coding genes. Functional annotations of the predicted proteome using Panther and InterPro revealed 4627 (72%) and 4951 (77.34%) protein sequences, respectively, which could be assigned with a function.
Protein subcellular localization
The subcellular localization of X. dendrorhous proteins was predicted using ProtComp9 (http://linux1.softberry.com/) and 1378 proteins with mitochondrial, 68 with peroxisomal, 1789 with nuclear, 1420 with cytoplasmic and 705 with plasma membrane localisation were predicted. Secretome predictions using SignalP4.1, TargetP, and TmHmm resulted in a set of 296 proteins predicted to be secreted.
The two major acetyl-Co A derived biosynthesis pathways in X. dendrorhous are terpenoid and fatty acid biosynthesis. Typically, terpenoids in fungi are synthesized via the mevalonate pathway . As outlined in Figure 2, this pathway starts with the condensation of three molecules of acetyl-CoA and proceeds via mevalonate and its diphosphorylation to isopentenyl pyrophosphate. The reactions of this pathway are catalyzed by six enzymes. All of these could be identified in the X. dendrorhous genome by comparison to the corresponding genes from related fungi (Table 1). The gene numbers are given next to the corresponding reactions in Figure 2. An alternative non-fungal route to isopentenyl pyrophosphate is via deoxyxylulose 5-phosphate . All genes of this pathway are absent from the X. dendrorhous genome.
After isomerization of isopentenyl pyrophosphate to dimethyl allyl pyrophosphate, the prenyl pyrophosphate chain is extended by condensation of isopentenyl pyrophosphate molecules with an allylic partner (Figure 2). In addition to the gene encoding isopentenyl pyrophosphate isomerase, two different prenyl transferases genes were detected encoding the enzymes for the formation of either farnesyl pyrophosphate the direct precursor of sterols or geranylgeranyl pyrophosphate the direct precursor of carotenoids (Table 1). The genes encoding isopentenyl pyrophosphate isomerase , geranylgeranyl pyrophosphate synthase  and farnesyl pyrophosphate synthase  have been cloned before from X. dendrorhous. Recently, it has been shown that these prenyl transferases act sequentially  as indicated in Figure 2. Ergosterol is the dominating sterol especially in higher fungi . Its biosynthesis pathways is established in Figure 3 corresponding to the identified ERG genes. The genes assigned and listed in Table 2 encode the enzymes of the early steps including squalene synthesis, epoxidation and cyclisation to lanosterol and the genes involved in modification of this sterol. The reaction sequence involves a C-14 demethylase, a C-14 reductase, a C-3 dehydrogenase and a C-3 keto reductase yielding zymosterol. Next conversions are by C-24 methyl transferase to fecosterol by a C-8 isomerase to episterol and by a C-22 desaturase and a C-24 reductase to the final pathway product ergosterol. Among all genes of this sterol pathway, ERG 5 is the only gene cloned before from X. dendrorhous .
All genes of carotenoid biosynthesis were known before our genome sequencing. Their numbers from the X. dendrorhous genome sequencing are XDEN_03692 for crtYB encoding the phytoene synthase/lycopene cyclase gene , XDEN_03755 for crtI the gene of a phytoene desaturase  and XDEN_04454 for the astaxanthin synthase gene . XDEN_00679 is the gene coding for a reductase which provides the electrons for the P450-type astaxanthin synthase .
In fungi, multiple fatty acid synthesis options are present which is structurally differently organized . In the cytoplasm of eukaryotes, fatty acid synthesis operates with a multi-enzyme fatty acid synthase (FAS) comples (type I) with discrete functional domains for the individual reactions organized on two polypeptides. In addition, an independent mitochondrial (prokaryotic) synthesis pathway exists in fungi which uses independent enzymes (type II) encoded by separate genes . In X. dendrorhous, the dominating fatty acids are palmitic, oleic and linoleic acid (unpublished results). Figure 4A shows the biosynthesis pathway to these fatty acids catalyzed by type I FAS. The synthesis starts with the acetyl CoA carboxylase and the acyl carrier protein (Table 3). The following reactions, the formation of acetyl-ACP and malonyl-ACP, condensations, ketoacyl reduction, the dehydratase reaction, and enoyl reduction all the way to palmityl-CoA are catalyzed two multi-enzyme complexes FAS1 and FAS2. The sequences of the individual domains could be identified and located on both FAS genes (Figure 4B). The additional genes involved in the elongation of C16 to C18 fatty acid and the insertion of a delta-9 and a delta-12 double bond were also identified in the X. dendrorhous genome. For the latter desaturase, we found two candidate genes.
Secondary metabolism analyses
Genes involving in the secondary metabolism apart from terpenoids were predicted by using the SMURF  online web server and secondary metabolite clusters were defined. Only one polyketide synthase (PKS) like and two non-ribosomal peptide synthetases (NRPS) like genes were predicted from the genome of X. dendrorhous. Two candidate secondary metabolite clusters were predicted (Additional file 1: Table S1). Backbone genes of these clusters have been listed in the Additional file 1: Table S2. Further InterPro domain analyses of the PKS-like gene predicted a beta-ketoacyl synthase (KS) domain, an acyl transferase (AT) domain and an acyl carrier protein (ACP) domain within XDEN_04041.
Orthology analyses among Tremellales
Xanthophyllomyces dendrorhous protein sequences were tested for orthology with the two other available Tremellomycetes genomes, i.e. Cryptococcus neoformans (teleomorph Filobasidiella neoformans) and Tremella mesenterica. A total of 3721 orthologs are shared by all of the three genomes, 239 orthologs are shared by only X. dendrorhous and C. neoformans, while C. neoformans and T. mesenterica (Tremellales) share 1061 orthologs not present in X. dendrorhous (Cystofilobasidials) (Figure 1B), highlighting that X. dendrorhous is not closely related to C. neoformans, supporting the splitting of the Tremellomycetes into the Cystofilobasidiales on the one hand and the core Tremellomycetes (Filobasidiales, Holtermanniales, Trichosporonales, Tremellales) on the other.
Phylogenetic analyses were done using 48 fungal genomes (Additional file 1: Table S3), including the genome of X. dendrorhous. Orthologs among these genomes were identified, and a total of 636 orthologs were predicted in all these 48 genomes. Of these, 137 were 1:1 orthologs, which were used to perform the phylogenetic analyses. The maximum likelihood tree generated using RAxML was supported by high to maximum support for all nodes and revealed a sister-group relationship of Agaricomycotina and Ustilaginomycotina (Figure 5). Wallemia was placed basal within the Agaricomycotina and X. dendrorhous appeared as sister taxon to C. neoformans and T. mesenterica. In general, the Tremellomycetes with Xanthophyllomyces, but without Wallemia, were revealed as monophyletic and to be the sister clade to the remaining Agaricomycotina. For the Agaricomycotina, the Dacryomycetes were confirmed as the sister-group to the Agaricomycetes. Within the Agaricomycetes, the Auriculales occupied the basal position, followed by the Hymenochetales, which were revealed as the sister-group of the other Agaricomycetes included with maximum support. Boletales and Agaricales were found to group together with maximum support and formed the sister group to a clade comprising the orders Corticiales, Gloeophyllales, Russulales, and Polyporales. Within this group, which was supported by a 94% bootstrap support, the former as well as the latter two orders were grouped together with maximum and high support, respectively. This is also in line with the results of the Bayesian phylogenetic inference (Additional file 2: Figure S2).
Genome assembly and completeness
Over the past few years next generation sequencing technologies have extensively been used to elucidate the whole genome and transcriptome of fungal species [28-32]. In this work Illumina sequencing have been used for sequencing the genome and transcriptome of a red-pigmented yeast, Xanthophyllomyces dendrorhous. The of the fact that more than 70% of the genome were represented by only 7 scaffolds of more than 1.7 Mb in size and that 98% of the genome was covered by just 15 scaffolds highlights that the genome has been assembled almost to chromosome-length scaffolds. The high quality of the genome assembly is also suggested by a recovery of about 98.8% of the core eukaryotic genes in a CEGMA analysis. This is most likely the result of using an appropriate combination of short paired-read as well as long distance mate-pair libraries and the fact that the strain we sequenced appears to be haploid. The low content of repeat elements of only about 3.1% further facilitates the genome assembly.
Secondary metabolite encoding genes have been described in several fungal genomes [33-37]. The X. dendrorhous genome encodes for only one polyketide synthase like gene and two non-ribosomal peptide synthetase like genes. This suggests a limited ability to produce biologically active substances as expected for species not depending on keeping other organisms at bay.
In line with earlier phylogenomic studies using fewer taxa [38,39] a sister-group relationship of Agaricomycotina and Ustilaginomycotina was inferred. With respect to septal pore ultrastructure it was interesting to note that with the inclusion of the cystofilobasidiomycete X. dendrorhous, Wallemia spp. remained in the most basal position within the Agaricomycotina. The septal pore apparatus of Wallemia with the typical tremelloid sacculate parenthesomes closely resembles that of the core Tremellomycetes (Filobasidiales, Holtermanniales, Trichosporonales, Tremellales), whereas the dolipores of Cystofilobasidiales lack parenthesomes [3,39]. Thus either the typical tremelloid sacculate parenthesomes at the septal pores were lost in the Cystofilobasidiales lineage, or have convergently evolved in Wallemia and the core Tremellomycetes, Assuming that it is unlikely that the complex tremelloid parenthosomes have evolved twice, a loss of the tremelloid parenthosomes in the Cytofilobasidiales seems to be the more parsimonious explanation. In this sense the evolution of the Agaricomycotina was apparently accompanied by the evolutionary development of parenthesomes at the septal pores that may improve the cell to cell communication .
In addition, our study shows with optimal support the monophyly of the group consisting of Tremellomycetes and Cystofilobasidiales. Based on morphological, ultra-structural, chemical, ecological and molecular data, the monophyly of the Tremellomycetes (incl. the Cystofilobasidiales) has been suggested by various authors . However, this had not been previously supported by molecular phylogenetic studies [41-43].
Within the Agaricomycetes, the phragmobasidial Auriculales were in a basal position and the Hymenochaetales with holobasidia were placed basal to the crown-group of the class. In contrast to an earlier phylogenomic study , phylogenetic relationships within the crown of Agaricomycetes showed a high resolution. The recently-described orders Gloeophyllales and Corticiales  were clustered together with maximum support, in line with the overview provided by Hibbett , and placed basal to a clade containing the Agaricomycetidae (represented by Agaricales and Boletales) and a second clade that included Polyporales and Russulales. The sister-group relationship of Polyporales and Russulales was supported by a bootstrap support of 98% in the phylogenomic analysis and is in contrast to the result from a multi-locus dataset , which discussed the order Russulales as a sister group to the Agaricomycetidae. The same topology was obtained using Bayesian Inference, with high support (Additional file 2: Figure S2).
CoA-related metabolic pathways and beyond
The two prominent terpenoid pathways in X. dendrorhous lead to the synthesis of sterols and carotenoids. They are of biotechnological importance due to the ongoing development of this fungus as a biological astaxanthin production system . In addition, it offers the potential for the hetrologous synthesis of novel sesquiterpenes like α–cuprene  or diterpenes instead of carotenoids. Annotation of all genes of terpenoid synthesis in X. dendrorhous starting with the mevalonate pathway and ending with ergosterol and astaxanthin was successful (Tables 1 and 2). Among these were the genes of two different prenyl transferases which sequentially provide C15 farnesyl pyrophosphate for sterol and geranylgeranyl pyrophosphate for carotenoid biosynthesis . Xanthophyllomyces dendrorhous possesses a unique astaxanthin synthase related the cytochrome P450 3A subfamily with an unknown phylogenetic origin . The highest similaritiy to fungal cytochrome P450 oxidases was found in Cryptococcus neoformans, but with only 36% identity.
The genes for specific biosynthesis pathways are often clustered in fungal genomes. This is not the case for sterol and carotenoid biosynthesis in the genome of X. dendrorhous and is in contrast to carotenogenic fungi from other groups in which these genes are organized in clusters. All carotenogenic fungi possess the crtYB and crtI genes. For example in Phycomyces blakesleeanus and Mucor circinelloides, both genes are found next to each other with a spacing of 1.4 or 0.5 kbp, respectively, but convergently transcribed  and in Fusarium fujikuroi, they are 0.6 kbp apart and transcribed together in the same direction .
Another important group of compounds originating from acetyl-CoA are the fatty acids. In the X. dendrorhous genome, the genes for the cytoplasmic pathway and the mitochondrial pathway could be discriminated (Table 3). The latter operates on individual enzymes  for which all genes could be annotated. The gene organization of the cytoplasmic pathway is more complex. In ascomycetes, the genes for two fatty acid synthase proteins 1 and 2 exist with all necessary eight enzymatic activities (Figure 4A). In contrast, most basidiomycetes like Laccaria, Coprinopsis and Ustilago possess a single very large protein with all necessary fatty acid synthesis activities . However, this is not the case in X. dendrorhous. Here, we found the genes for two distinct fatty acid synthase proteins 1 and 2 (Figure 4B) which resembles the situation in the related species Cryptococcus neoformans. However, we were unable to identify the subunit of the acyl-carrier protein, neither on FAS2 as in yeast nor on FAS1 as in C. neoformans .
Even under aerobic conditions, X. dendrorhous grows fermentative on glucose accumulating ethanol, which, at the beginning of the stationary phase, is re-used as growth substrate . Since carotenoid biosynthesis is highest in the oxidative phase, it is important to understand the unknown regulatory mechanisms responsible for optimum astaxanthin synthesis on different substrates. The whole genome sequence of X. dendrorhous now provides a source to address the genes of the primary metabolism, providing a basis for transcriptomic and metabolomic analysis. This should be helpful to look for regulatory circuits and metabolic networks which supply acetyl-CoA as substrate. The current study also provides genomic data from a species of the Agaricomycotina for setting up a basis for the comparison with other fungi to investigate how the C30 and C40 terpenoid pathways have developed.
The current study provides the first insights into a genome of a cystofilobasidiomycete and reveals that Wallemia is the most basal agaricomycotinous lineage, followed by the Tremellomycetes with a sister-group relationship between the Cystofilobasidiales and the core Tremellomycetes. Thus, this study provides further insights into the evolution of Agaricomycotina and suggests that the typical cisternal caps (parenthesomes) at the septal pores represent an apomorphic characteristic for the Argaricomycotina in general. Accordingly, the lack of parenthesomes at the septal pores may be apomorphic only for the Cystofilobasidiales. Phylogenomic investigations also support a sister-group relationship of Agaricomycotina and Ustilaginomycotina. Within the Agaricomycotina, the phylogenetic relationships of the species included were resolved with high to maximum support and provided evidence for a sister-group relationship of Polyporales and Russulales. With respect to the biotechnological potential of X. dendrorhous, the genome sequence will extremely facilitate genetic pathway engineering of secondary products. All genes of acetyl-Co A derived pathways could be annotated. They can be used to overproduce existing fatty acids and sterols in addition to carotenoids or extend these pathways yielding new products. Furthermore, the accessibility of genes of the primary metabolisms is extremely helpful to model and engineer an optimum precursor supply.
Growth and isolation of genomic DNA
The X. dendrorhous strain CBS6938 (= ATCC96594) was grown as shaking culture in YPD medium at 21°C for 5 days. The pellet from 15 ml of culture was suspended in 0.5 ml YPD and mixed with 300 μl of glass beads (0.25 mm-0.5 mm diameter). The cells were broken in a swing mill (Retsch MM200) at a frequency of 30/s. After centrifugation, the supernatant was collected and purified by extraction with phenol/chloroform/isoamylalcohol. Finally, the DNA was precipitated by adding 2.5 volumes of 100% ice-cold ethanol and 1/10 volume of a 3 M sodium acetate solution overnight at −20°C. The DNA was pelleted, washed with 70% ice-cold ethanol and dried at room temperature. The DNA pellet was suspended in 30 μl H2O and stored at 4°C. The amount of isolated DNA was determined from an agarose gel after staining with ethidium bromide by densitometry of the fluorescence and comparison to standard DNA of known amounts.
Isolation of RNA
RNA was extracted by using NucleoSpin® RNA Plant kit (MACHEREY-NAGEL GmbH & Co. KG) according to the instructions of the manufacturer. The sample cultivation conditions for the RNA isolation are the same as above. The RNA quality was controlled using a NanoPhotometer (IMPLEN) as well as being evaluated on a 1.5% agarose gel stained with ethidium bromide.
Preprocessing of genomic and transcriptomic reads
Data filtering parameter estimations and data filtering steps were performed on adapter/primer trimmed data using FastQFS (Sharma and Thines, unpublished). A length cutoff of 100 bps and an average quality cutoff of 26 phred score was used to filter reads from all three libraries. RNA-Seq data was filtered using Trimmomatic , with a length cutoff of 32 and a quality cutoff of 15 in a window of 5 bp.
Genome assembly, genome assembly completeness assessment and repeat element masking
The genome of X. dendrorhous was assembled using the Velvet  genome assembler. All three libraries of insert sizes of 250 bps, 3 kb, and 8 kb were used to generate scaffolds. Velvet was optimized by testing several k-mer sizes and k-mer coverage cutoffs. An optimal assembly was generated by a k-mer of a length of 93 and using a k-mer coverage cutoff of 15. The completeness of the genome assembly was assessed using the CEGMA  pipeline. Repeat elements within the assembled genome were predicted using RepeatModeler (http://www.repeatmasker.org/RepeatModeler.html). Both Recon  and RepeatScout  tools were used within the RepeatModeler pipeline for de novo repeat element predictions. Reference-based repeat element search was done using the Repbase libraries v20130422 . Tandem repeat elements were identified using trf  within the RepeatModeler pipeline and the final set of predicted repeat elements were masked by using RepeatMasker (http://www.repeatmasker.org/). Repeat element characterizations were also done by using TransposonPSI (http://transposonpsi.sourceforge.net/).
Gene prediction and annotation
Both ab initio and alignment-based methods were used to predict protein coding genes within the assembled genome. Genemark-ES  was used for generating the first set of gene models. These gene models were tested for RNA-seq coverage greater than 10X using Samtools . The gene models supported by RNA-Seq were further used for training Augustus . RNA-seq reads were mapped on the assembled genome by using Tophat2  and transcripts were generated by using Cufflinks . The resulting bam file from Tophat2 was used to generate Intron/exon hints from Augustus predictions.
The transcript sequences obtained from Cufflinks were mapped on the genome by using PASA  and GMAP . These gene models and the information from GeneMark-ES, Augustus, PASA and GMAP was used for obtaining consensus gene models by using EVM . High weights were given to the transcript-mapped gene models. In another round of gene predictions, RNA-Seq data was again mapped to the gene masked and repeat masked genome. Newly obtained transcripts were added to the gene models generated by the first round of gene predictions (Additional file 1: Figure S1).
Gene annotations were performed using Blast2GO . Protein family analyses were done using the standalone versions of PANTHER . KEGG  analyses were performed using the KAAS  online server. The euKaryotic Orthologous Group cluster (KOG)  analyses was performed locally by downloading KOG protein sequences; and alignments were done using the standalone BlastP  with an e-value cutoff of e-5. Protein domain analysis was done using Interproscan . TribeMCL  was used for the clustering of protein sequences. For the annotation of the biosynthesis pathways, biochemical information of the enzymes were searched in the KEGG database (http://www.genome.jp/kegg/kegg2.html), the BRENDA Enzyme Information system (http://www.brenda-enzymes.de/index.php4?page=information/introduction.php4) and at NCBI.
Protein subcellular localization
Protein subcellular localization was predicted using ProtComp9 (http://linux1.softberry.com/). Proteins having an extracellular secretion signal were predicted using SignalP v4.1 . These outputs were further filtered using TargetP v1  and TmHmm  predictions, for excluding proteins targeted to the mitochondrion or containing transmembrane domains, respectively.
Orthology and phylogenetic analyses
In total 48 fungal genomes were used for the identification of orthologous genes for conducting a phylogenomic analysis. Ortholog predictions were done considering all protein sequences of 48 fungal genomes using OrthoMCL . OrthoMCL was run using a percentage identity cutoff of 50% and an e-value cutoff of e−5. Multiple sequence alignments of 1:1 orthologs were performed using Mafft  with the G-INS-i algorithm. Maximum Likelihood phylogenetic inference on the concatenated set was done using RAxML , using the GAMAWAG model and 1000 bootstrap replicates. In another approach MrBayes  was run on the aligned protein sequences using 2 million generations, sampling every 500th tree and discarding the first 95% of the trees sampled before inferring posterior probability values. For reference of the specific parameters, the Bayes block has been deposited at http://dx.doi.org/10.12761/SGN.2015.1.
All 3 genomic sequence libraries [EMBL: ERR575093-ERR575095] and a RNA-Seq library [EMBL: ERR575096] have been submitted to the European Nucleotide Archive (ENA) database (Study accession number: PRJEB6925). The assembled scaffolds of X. dendrorhous and annotations have also been submitted to ENA and can be accessed from accession ids LN483084-LN483350. Genome and annotation files are also available at our local server and can be accessed from http://dx.doi.org/10.12761/SGN.2015.1.
M. W. Miller MY, Masami S. Phaffia, a new yeast genus in the deuteromycotina (Blastomycetes). Int J Syst Evol Microbiol. 1976;26(2):286–91.
John Webster RW. Introduction to fungi. 3rd ed. 2007.
Weiß M BR, Sampaio JP, Oberwinkler F. Tremellomycetes and related groups, vol. 7A. 2nd ed. Berlin, Germany: Springer-Verlag; 2014.
David-Palma M, Libkind D, Sampaio JP. Global distribution, diversity hot spots and niche transitions of an astaxanthin-producing eukaryotic microbe. Mol Ecol. 2014;23(4):921–32.
Kucsera J, Pfeiffer I, Ferenczy L. Homothallic life cycle in the diploid red yeast Xanthophyllomyces dendrorhous (Phaffia rhodozyma). Antonie Van Leeuwenhoek. 1998;73(2):163–8.
Golubev WI. Perfect state of Rhodomyces dendrorhous (Phaffia rhodozyma). Yeast. 1995;11(2):101–10.
Hermosilla G, Martinez C, Retamales P, Leon R, Cifuentes V. Genetic determination of ploidy level in Xanthophyllomyces dendrorhous. Antonie Van Leeuwenhoek. 2003;84(4):279–87.
Wery J, Gutker D, Renniers AC, Verdoes JC, van Ooyen AJ. High copy number integration into the ribosomal DNA of the yeast Phaffia rhodozyma. Gene. 1997;184(1):89–97.
Reynders MB, Rawlings DE, Harrison STL. Demonstration of the Crabtree effect in Phaffia rhodozyma during continuous and fed-batch cultivation. Biotechnol Lett. 1997;19(6):549–52.
William A, Schroeder EAJ. Antioxidant role of carotenoids in Phaffia rhodozyma. Microbiology. 1993;139(5):907–12.
William A, Schroeder EAJ. Carotenoids protectPhaffia rhodozyma against singlet oxygen damage. J Ind Microbiol. 1995;14(6):502–7.
Sandmann G MN. The mycota X industrial applications. In: Karl Esser PAL, Bennett JW, Heinz D, editors. The Mycota X industrial applications. Osiewacz: Springer Verlag Berlin; 2002.
Ojima K, Breitenbach J, Visser H, Setoguchi Y, Tabata K, Hoshino T, et al. Cloning of the astaxanthin synthase gene from Xanthophyllomyces dendrorhous (Phaffia rhodozyma) and its assignment as a beta-carotene 3-hydroxylase/4-ketolase. Mol Genet Genomics. 2006;275(2):148–58.
Alcaino J, Barahona S, Carmona M, Lozano C, Marcoleta A, Niklitschek M, et al. Cloning of the cytochrome p450 reductase (crtR) gene and its involvement in the astaxanthin biosynthesis of Xanthophyllomyces dendrorhous. BMC Microbiol. 2008;8:169.
Gassel S, Breitenbach J, Sandmann G. Genetic engineering of the complete carotenoid pathway towards enhanced astaxanthin formation in Xanthophyllomyces dendrorhous starting from a high-yield mutant. Appl Microbiol Biotechnol. 2014;98(1):345–50.
Zerbino DR, Birney E. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 2008;18(5):821–9.
Eisenreich W, Rohdich F, Bacher A. Deoxyxylulose phosphate pathway to terpenoids. Trends Plant Sci. 2001;6(2):78–84.
Verdoesa JC, Ooyenab AJJ. Isolation of the isopentenyl diphosphate isomerase encoding gene of Phaffia rhodozyma; improved carotenoid production in Escherichia coli. Acta Botanica Gallica: Botany Lett. 1999;146(1):43–53.
Breitenbach J, Visser H, Verdoes JC, van Ooyen AJ, Sandmann G. Engineering of geranylgeranyl pyrophosphate synthase levels and physiological conditions for enhanced carotenoid and astaxanthin synthesis in Xanthophyllomyces dendrorhous. Biotechnol Lett. 2011;33(4):755–61.
Alcaino J, Romero I, Niklitschek M, Sepulveda D, Rojas MC, Baeza M, et al. Functional characterization of the Xanthophyllomyces dendrorhous farnesyl pyrophosphate synthase and geranylgeranyl pyrophosphate synthase encoding genes that are involved in the synthesis of isoprenoid precursors. PLoS One. 2014;9(5):e96626.
Weete JD, Abril M, Blackwell M. Phylogenetic distribution of fungal sterols. PLoS One. 2010;5(5):e10899.
Loto I, Gutierrez MS, Barahona S, Sepulveda D, Martinez-Moya P, Baeza M, et al. Enhancement of carotenoid production by disrupting the C22-sterol desaturase gene (CYP61) in Xanthophyllomyces dendrorhous. BMC Microbiol. 2012;12:235.
Verdoes JC, Krubasik KP, Sandmann G, van Ooyen AJ. Isolation and functional characterisation of a novel type of carotenoid biosynthetic gene from Xanthophyllomyces dendrorhous. Mol Gen Genet. 1999;262(3):453–61.
Verdoes JC, Misawa N, van Ooyen AJ. Cloning and characterization of the astaxanthin biosynthetic gene encoding phytoene desaturase of Xanthophyllomyces dendrorhous. Biotechnol Bioeng. 1999;63(6):750–5.
Schweizer E, Hofmann J. Microbial type I fatty acid synthases (FAS): major players in a network of cellular FAS systems. Microbiol Mol Biol Rev. 2004;68(3):501–17. table of contents.
Schneider R, Brors B, Massow M, Weiss H. Mitochondrial fatty acid synthesis: a relic of endosymbiontic origin and a specialized means for respiration. FEBS Lett. 1997;407(3):249–52.
Khaldi N, Seifuddin FT, Turner G, Haft D, Nierman WC, Wolfe KH, et al. SMURF: genomic mapping of fungal secondary metabolite clusters. Fungal Genet Biol. 2010;47(9):736–41.
Janbon G, Ormerod KL, Paulet D, Byrnes 3rd EJ, Yadav V, Chatterjee G, et al. Analysis of the genome and transcriptome of Cryptococcus neoformans var. grubii reveals complex RNA expression and microevolution leading to virulence attenuation. PLoS Genet. 2014;10(4):e1004261.
Toome M, Ohm RA, Riley RW, James TY, Lazarus KL, Henrissat B, et al. Genome sequencing provides insight into the reproductive biology, nutritional mode and ploidy of the fern pathogen Mixia osmundae. New Phytol. 2014;202(2):554–64.
Morita T, Koike H, Hagiwara H, Ito E, Machida M, Sato S, et al. Genome and transcriptome analysis of the basidiomycetous yeast Pseudozyma antarctica producing extracellular glycolipids, mannosylerythritol lipids. PLoS One. 2014;9(2):e86490.
Floudas D, Binder M, Riley R, Barry K, Blanchette RA, Henrissat B, et al. The Paleozoic origin of enzymatic lignin decomposition reconstructed from 31 fungal genomes. Science. 2012;336(6089):1715–9.
Sharma R, Mishra B, Runge F, Thines M. Gene loss rather than gene gain is associated with a host jump from monocots to dicots in the smut fungus Melanopsichium pennsylvanicum. Genome Biol Evol. 2014;6:2034–49.
Dean RA, Talbot NJ, Ebbole DJ, Farman ML, Mitchell TK, Orbach MJ, et al. The genome sequence of the rice blast fungus Magnaporthe grisea. Nature. 2005;434(7036):980–6.
Ehrlich KC, Yu J, Cotty PJ. Aflatoxin biosynthesis gene clusters and flanking regions. J Appl Microbiol. 2005;99(3):518–27.
Xu J, Saunders CW, Hu P, Grant RA, Boekhout T, Kuramae EE, et al. Dandruff-associated Malassezia genomes reveal convergent and divergent virulence traits shared with plant and human fungal pathogens. Proc Natl Acad Sci U S A. 2007;104(47):18730–5.
Yu J, Bhatnagar D, Cleveland TE. Completed sequence of aflatoxin pathway gene cluster in Aspergillus parasiticus. FEBS Lett. 2004;564(1–2):126–30.
Yu J, Chang PK, Ehrlich KC, Cary JW, Bhatnagar D, Cleveland TE, et al. Clustered pathway genes in aflatoxin biosynthesis. Appl Environ Microbiol. 2004;70(3):1253–62.
Zajc J, Liu Y, Dai W, Yang Z, Hu J, Gostincar C, et al. Genome and transcriptome sequencing of the halophilic fungus Wallemia ichthyophaga: haloadaptations present and absent. BMC Genomics. 2013;14:617.
Padamsee M, Kumar TK, Riley R, Binder M, Boyd A, Calvo AM, et al. The genome of the xerotolerant mold Wallemia sebi reveals adaptations to osmotic stress and suggests cryptic sexual reproduction. Fungal Genet Biol. 2012;49(3):217–26.
Bloemendal S, Kuck U. Cell-to-cell communication in plants, animals, and fungi: a comparative review. Naturwissenschaften. 2013;100(1):3–19.
Bauer RBD, Sampaio JP, Weiß M, Oberwinkler F. The simple-septate basidiomycetes: a synopsis. Mycol Prog. 2006;5:41–66.
Matheny PBGJ, Zalar P, Arun Kumar TK, Hibbett DS. Resolving the phylogenetic position of the Wallemiomycetes: an enigmatic major lineage of Basidiomycota. Can J Bot. 2006;84:1794–805.
Millanes AM, Diederich P, Ekman S, Wedin M. Phylogeny and character evolution in the jelly fungi (Tremellomycetes, Basidiomycota, Fungi). Mol Phylogenet Evol. 2011;61(1):12–28.
Ebersberger I, de Matos SR, Kupczok A, Gube M, Kothe E, Voigt K, et al. A consistent phylogenetic backbone for the fungi. Mol Biol Evol. 2012;29(5):1319–34.
Hibbett DS, Binder M, Bischoff JF, Blackwell M, Cannon PF, Eriksson OE, et al. A higher-level phylogenetic classification of the Fungi. Mycol Res. 2007;111(Pt 5):509–47.
Hibbett DS. A phylogenetic overview of the Agaricomycotina. Mycologia. 2006;98(6):917–25.
Melillo E, Setroikromo R, Quax WJ, Kayser O. Production of alpha-cuprenene in Xanthophyllomyces dendrorhous: a step closer to a potent terpene biofactory. Microb Cell Fact. 2013;12:13.
Sanz C, Velayos A, Alvarez MI, Benito EP, Eslava AP. Functional analysis of the Phycomyces carRA gene encoding the enzymes phytoene synthase and lycopene cyclase. PLoS One. 2011;6(8):e23102.
Linnemannstons P, Prado MM, Fernandez-Martin R, Tudzynski B, Avalos J. A carotenoid biosynthesis gene cluster in Fusarium fujikuroi: the genes carB and carRA. Mol Genet Genomics. 2002;267(5):593–602.
Reich M, Gobel C, Kohler A, Buee M, Martin F, Feussner I, et al. Fatty acid metabolism in the ectomycorrhizal fungus Laccaria bicolor. New Phytol. 2009;182(4):950–64.
Wozniak A, Lozano C, Barahona S, Niklitschek M, Marcoleta A, Alcaino J, et al. Differential carotenoid production and gene expression in Xanthophyllomyces dendrorhous grown in a nonfermentable carbon source. FEMS Yeast Res. 2011;11(3):252–62.
Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30(15)):2114–20.
Parra G, Bradnam K, Korf I. CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes. Bioinformatics. 2007;23(9):1061–7.
Bao Z, Eddy SR. Automated de novo identification of repeat sequence families in sequenced genomes. Genome Res. 2002;12(8):1269–76.
Price AL, Jones NC, Pevzner PA. De novo identification of repeat families in large genomes. Bioinformatics. 2005;21 Suppl 1:i351–8.
Jurka J, Kapitonov VV, Pavlicek A, Klonowski P, Kohany O, Walichiewicz J. Repbase update, a database of eukaryotic repetitive elements. Cytogenet Genome Res. 2005;110(1–4):462–7.
Benson G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 1999;27(2):573–80.
Borodovsky M, Lomsadze A. Eukaryotic gene prediction using GeneMark.hmm-E and GeneMark-ES. Curr Protoc Bioinformatics. 2011;Chapter 4:Unit 4 6 1–10.
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25(16):2078–9.
Stanke M, Schoffmann O, Morgenstern B, Waack S. Gene prediction in eukaryotes with a generalized hidden Markov model that uses hints from external sources. BMC Bioinformatics. 2006;7:62.
Kim D, Pertea G, Trapnell C, Pimentel H, Kelley R, Salzberg SL. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 2013;14(4):R36.
Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, van Baren MJ, et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol. 2010;28(5):511–5.
Haas BJ, Delcher AL, Mount SM, Wortman JR, Smith Jr RK, Hannick LI, et al. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res. 2003;31(19):5654–66.
Wu TD, Watanabe CK. GMAP: a genomic mapping and alignment program for mRNA and EST sequences. Bioinformatics. 2005;21(9):1859–75.
Haas BJ, Salzberg SL, Zhu W, Pertea M, Allen JE, Orvis J, et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol. 2008;9(1):R7.
Conesa A, Gotz S, Garcia-Gomez JM, Terol J, Talon M, Robles M. Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics. 2005;21(18):3674–6.
Finn RD, Tate J, Mistry J, Coggill PC, Sammut SJ, Hotz HR, et al. The Pfam protein families database. Nucleic Acids Res. 2008;36(Database issue):D281–8.
Kanehisa M, Goto S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000;28(1):27–30.
Moriya Y, Itoh M, Okuda S, Yoshizawa AC, Kanehisa M. KAAS: an automatic genome annotation and pathway reconstruction server. Nucleic Acids Res. 2007;35(Web Server issue)):W182–5.
Koonin EV, Fedorova ND, Jackson JD, Jacobs AR, Krylov DM, Makarova KS, et al. A comprehensive evolutionary classification of proteins encoded in complete eukaryotic genomes. Genome Biol. 2004;5(2):R7.
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990;215(3):403–10.
Quevillon E, Silventoinen V, Pillai S, Harte N, Mulder N, Apweiler R, et al. InterProScan: protein domains identifier. Nucleic Acids Res. 2005;33(Web Server issue):W116–20.
Enright AJ, Van Dongen S, Ouzounis CA. An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 2002;30(7):1575–84.
Petersen TN, Brunak S, von Heijne G, Nielsen H. SignalP 4.0: discriminating signal peptides from transmembrane regions. Nat Methods. 2011;8(10):785–6.
Emanuelsson O, Nielsen H, Brunak S, von Heijne G. Predicting subcellular localization of proteins based on their N-terminal amino acid sequence. J Mol Biol. 2000;300(4):1005–16.
Krogh A, Larsson B, von Heijne G, Sonnhammer EL. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J Mol Biol. 2001;305(3):567–80.
Li L, Stoeckert Jr CJ, Roos DS. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 2003;13(9):2178–89.
Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol. 2013;30(4):772–80.
Stamatakis A. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics. 2006;22(21):2688–90.
Ronquist F, Teslenko M, van der Mark P, Ayres DL, Darling A, Hohna S, et al. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst Biol. 2012;61(3):539–42.
We thank Claus Weiland for support with respect to cluster access. This work was supported by the research funding program LOEWE “Landes-Offensive zur Entwicklung Wissenschaftlich-ökonomischer Exzellenz” of Hesse’s Ministry of Higher Education, Research, and the Arts in the framework of IPF and BiK-F.
The authors declare that they have no competing interests.
MT and GS designed the study. SG, SS, and XX processed the strain sequenced in this study. RS performed genome assembly, gene predictions and annotations, orthology analyses, phylogenetic analyses. SS and GS performed the annotation of metabolite pathways. GS, MT, RB, and RS wrote the manuscript, with contributions from other authors. All authors read and approved the final manuscript.
Secondary metabolite clusters predicted within the genome of X. dendrorhous. Table S2. Backbone genes of the two secondary metabolite clusters predicted within the genome of X. dendrorhous. Table S3. List of fungal genomes used for phylogenetic analyses. Figure S1. Gene prediction pipeline used for predicting genes within the genome of X. dendrorhous.
Phylogenetic tree based on Bayesian phylogenetic inference. Numbers on branches denote posterior probabilities.
About this article
Cite this article
Sharma, R., Gassel, S., Steiger, S. et al. The genome of the basal agaricomycete Xanthophyllomyces dendrorhous provides insights into the organization of its acetyl-CoA derived pathways and the evolution of Agaricomycotina. BMC Genomics 16, 233 (2015). https://doi.org/10.1186/s12864-015-1380-0