Comparative genomics provides new insights into the diversity, physiology, and sexuality of the only industrially exploited tremellomycete: Phaffia rhodozyma
© The Author(s). 2016
Received: 5 October 2016
Accepted: 1 November 2016
Published: 9 November 2016
The class Tremellomycetes (Agaricomycotina) encompasses more than 380 fungi. Although there are a few edible Tremella spp., the only species with current biotechnological use is the astaxanthin-producing yeast Phaffia rhodozyma (Cystofilobasidiales). Besides astaxanthin, a carotenoid pigment with potent antioxidant activity and great value for the aquaculture and pharmaceutical industries, P. rhodozyma possesses multiple exceptional traits of fundamental and applied interest. The aim of this study was to obtain and analyze two new genome sequences of representative strains from the northern (CBS 7918T, the type strain) and southern (CRUB 1149) hemispheres and compare them to a previously published genome sequence (strain CBS 6938). Photoprotection- and antioxidant-related genes, as well as genes involved in sexual reproduction, were analyzed.
Both genomes were ca. 19 Mb in size and contained ca. 6000 protein-coding genes, similar to CBS 6938. Compared to other fungal genomes, P. rhodozyma strains and other Cystofilobasidiales have the highest proportion of intron-containing genes and the highest number of introns per gene. The Patagonian strain showed 4.4 % nucleotide sequence divergence compared to the European strains, which differed from each other by only 0.073 %. All known genes related to the synthesis of astaxanthin were annotated. A hitherto unknown gene cluster potentially responsible for photoprotection (mycosporines) was found in the newly sequenced P. rhodozyma strains but was absent in the non-mycosporinogenic strain CBS 6938. A broad battery of enzymes that act as scavengers of free radical oxygen species was detected, including catalases and superoxide dismutases (SODs). Additionally, genes involved in sexual reproduction were found and annotated.
A draft genome sequence of the type strain of P. rhodozyma is now available, and comparison with that of the Patagonian population suggests the latter deserves to be assigned to a distinct variety. An unexpected genetic trait, the high occurrence of introns in P. rhodozyma and other Cystofilobasidiales, was revealed. New genomic insights into fungal homothallism were also provided. The genetic basis of several additional photoprotective and antioxidant strategies was described, indicating that P. rhodozyma is one of the fungi best equipped to cope with environmental oxidative stress, a factor that has probably contributed to shaping its genome.
The basidiomycetous yeast Phaffia rhodozyma (synonym Xanthophyllomyces dendrorhous) belongs to a basal lineage of the Agaricomycotina within the Tremellomycetes and possesses a set of unique characteristics of outstanding scientific interest and technological value. It is best known as one of the few currently commercially exploited natural sources of astaxanthin, an economically important pigment widely used in the aquaculture and pharmaceutical industries, with an expected global market size for 2015 of a quarter-billion dollars. P. rhodozyma is so far the only astaxanthinogenic yeast known, and this carotenoid pigment is considered one of the most potent scavengers of reactive oxygen species (ROS). Recently, numerous reports have demonstrated that astaxanthin, when used as a nutritional supplement, can act as an anticancer agent; reduce the risk of diabetes, cardiovascular diseases, and neurodegenerative disorders; and stimulate immunization.
This exceptional property of P. rhodozyma is thought to have evolved as a result of its adaptation to life in association with plant substrates, particularly tree exudates in mountain environments where ROS are generated by high levels of UV radiation (UVR), and/or the phylloplane of mountain trees where cells are directly affected by UVR [6, 7]. In line with this hypothesis, additional photoprotective strategies were found in P. rhodozyma, such as the synthesis of an antioxidant compound named Phaffiol and the accumulation of mycosporine-glutaminol-glucoside (MGG), a UVB-screening compound that also has antioxidant properties [9, 10].
The microbial phylogeography and ecology of P. rhodozyma are also interesting due to the strong association, and possible co-evolution, of the yeast with specific tree species of birch in the Northern Hemisphere [11–13] and southern beech (Nothofagus spp.) in the Southern Hemisphere [6, 7, 14]. Many genetically distinct, natural populations of P. rhodozyma are known worldwide, but most of the diversity is found in the Southern Hemisphere, mainly in Australasia, whereas Holarctic populations are mostly genetically uniform. The population structure of this yeast seems to be driven by adaptation to the different niches as a result of long-distance dispersal, and the observed genetic diversity correlates with host tree genera, rather than with geography.
The sexual stage of P. rhodozyma is unusual because it does not involve a transition from a unicellular to a filamentous stage, an exception among basidiomycetous yeasts that might be related to the adaptive loss of filamentous structures that are normally related to the exploitation of solid substrates. In most basidiomycetous yeasts, the sexual cycle is initiated by mating of two compatible strains of distinct mating types (heterothallism) followed by the formation of a dikaryotic mycelium, but in the case of Phaffia, no such compatibility system appears to be necessary. P. rhodozyma has a homothallic mating behavior, usually involving conjugation between the mother cell and its bud (pedogamy) on polyol-rich media, followed by the formation of a slender, non-septate basidium (holobasidium), with basidiospores arising terminally on its apex. Occasionally, basidial formation may result from the conjugation of identical but independent cells or occur without apparent conjugation (a single cell, usually larger than the vegetative cells, originates the basidium). In heterothallic basidiomycetous yeasts, sexual identity is determined by mating type-specific genes encoding pheromones/receptors (P/R) and homeodomain (HD) transcription factors. However, the presence/absence and function of these genes in homothallic basidiomycetes, including P. rhodozyma, have not yet been fully elucidated.
Currently, two type strains have been designated for this yeast (CBS 5905 and CBS 7918T) since, for some time, it was believed these strains were not conspecific. It was later determined that both strains belong to the same species, a confusion resulting from the fact that the anamorphic strain CBS 5905 is a hybrid or admixed strain derived from two genetically distinct lineages of P. rhodozyma. Thus, we consider the Holarctic strain CBS 7918T to be the valid type strain for the species. Here we selected it for genome sequencing to compare it to the previously published genome sequence of another Holarctic isolate, CBS 6938. We also sought to obtain and analyze the draft genome sequence of a representative P. rhodozyma strain from the Southern Hemisphere (CRUB 1149) to compare it to its counterparts from the Northern Hemisphere. In particular, we focused on genes related to photoprotection and antioxidant activities, as well as genes involved in sexual reproduction.
Results and discussion
Genome sequencing, assembly, and gene prediction
Table 1. Detailed information on the three strains of P. rhodozyma included in this study
Strain: origin and isolation reference; other details
CBS 6938: Sap of Betula sp. stumps, Finland (Golubev et al., 1995); genome sequence from Sharma et al., 2015
CBS 7918T (= VKM Y-2786, JCM 9681, KCTC 17160, NCYC 2774): Exudate of Betula verrucosa, Moscow, Russia (Golubev et al., 1995)
CRUB 1149: Water from Lake Ilon, surrounded by Nothofagus pumilio, Patagonia, Argentina (Libkind et al., 2007)
Table 2. Comparative analysis of genome, assembly, and gene statistics for P. rhodozyma and other fungi
We also recovered, for the first time, contigs for the mtDNA, the rDNA cluster, and the pDK1 plasmid, assembled with coverages of ~1500X, ~6000X and ~2000X, respectively. The excess of sequencing coverage of these contigs suggests a copy-number ratio of 45:1 mtDNA:nDNA and the existence of ~182 clusters of rDNA in the genome of P. rhodozyma CBS 7918T. Sequences encoding all specific rRNA subunits were successfully found in both mitochondrial and nuclear contigs. The resulting CBS 7918T rDNA assembly yielded 9042 nucleotides and contains the entire gene cluster: 18S, 5.8S, 28S, and 5S, including the IGS and ITS regions. The partial rDNA regions of IGS, ITS, and 26S from the same strain (AF139633, AF075496, and NR077107) showed 100 % similarity to the assembled rDNA operon. These alignments cover 2017 nt, which represent 22 % of the assembled operon. Moreover, we obtained the complete CRUB 1149 rDNA cluster, which has 0.35 % mismatches relative to the CBS. This is the first report of the complete rDNA cluster (which includes phylogenetic barcodes routinely used by taxonomists) of the type strain of P. rhodozyma.
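The copy-number estimates in this paragraph follow from dividing the coverage of a high-copy contig by the coverage of single-copy nuclear sequence. The sketch below illustrates that arithmetic. The nuclear baseline of about 33X is an assumption: it is not stated in the text and is back-calculated here from the reported 45:1 ratio and the ~1500X mtDNA coverage.

```python
def copy_number(target_coverage: float, nuclear_coverage: float) -> float:
    """Estimate copies per nuclear genome from a read-depth ratio."""
    if nuclear_coverage <= 0:
        raise ValueError("nuclear coverage must be positive")
    return target_coverage / nuclear_coverage


# Assumed single-copy nuclear coverage (hypothetical, inferred from the
# reported 45:1 mtDNA:nDNA ratio; not a value given in the text).
NUCLEAR_COVERAGE = 33.0

# Approximate contig coverages reported in the text.
reported = {"mtDNA": 1500.0, "rDNA cluster": 6000.0}

estimates = {
    name: copy_number(cov, NUCLEAR_COVERAGE) for name, cov in reported.items()
}
for name, est in estimates.items():
    print(f"{name}: ~{est:.0f} copies per nuclear genome")
```

Under this assumed baseline the two ratios round to the 45 and ~182 copies quoted above; a different baseline would shift both estimates proportionally.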
We predicted 5980 coding genes for CBS 7918T and 6016 for CRUB 1149, both fewer than for CBS 6938 (n = 6385). Reciprocal best BLAST hits of only canonical CDS obtained using de novo gene predictions (6098 / 6260 for CBS 6938; 5877 / 5980 for CBS 7918T and 5916 / 6016 for CRUB 1149) showed that the three strains share 5463 genes. The number of genes shared between CBS 7918T and CBS 6938 but not present in CRUB 1149 is 213. On the other hand, CBS 7918T and CRUB 1149 share 68 genes not found in CBS 6938; among this group are 3 genes responsible for mycosporine synthesis (see section). The mean percentage of identical amino acids relative to CBS 6938 was lower for the Patagonian strain (95.0 % for CRUB 1149 versus 98.6 % for CBS 7918T). The lower number of common genes and the higher dissimilarity detected in the Patagonian strain are probably related to the fact that it belongs to a genetically divergent lineage. In order to retrieve putative Phaffia orphan genes, we worked with the common set of predicted genes of the three strains (n = 5463) and kept genes with expression evidence from Sharma et al. (RNA-seq of CBS 6938) that were not present in any of the other tremellomycetous yeasts and lacked hits in the NCBI nr database (see methods). This resulted in a set of 283 orphan genes that is shared among the three strains of P. rhodozyma, is transcribed, and contains putative CDSs that do not have known relatives in any other branch of the tree of life (Additional file 1: Table S1). However, we found that 66 % (n = 188) of the proteins included in this set contain at least one known protein domain (see methods), suggesting that many may have distantly related sequences that simply missed our detection threshold. Further analyses are required to determine the origin and function of these genes.
The density of genes per Mb for the three genomes was similar, ranging between 318 and 321 genes/Mb, and most of the genes (97–98.4 %) contained introns (7.5 introns/gene, average intron length of 111, and average exon length of 199). The proportion of intron-containing genes and the number of introns per gene are, together with those of species of the sister genus Mrakia, among the highest described in any fungal species, suggesting that this might not be an exclusive trait of P. rhodozyma, but rather of the Cystofilobasidiales. To further evaluate this hypothesis, we used the available annotations and ran de novo predictions of 77 Basidiomycota and Ascomycota genomes (those listed in Table 2 plus 48 additional genomes). The average percentage of intron-less genes (ILG) and the average number of introns per gene (IPG) of the Ascomycota were 62.93 % and 1.63, respectively, while for Basidiomycota we found a lower proportion of ILG (19.81 %) and a higher average number of IPG (4.35). Among the latter, tremellomycetous fungi showed the lowest %ILG (4.32 overall, or 7.52 without considering Cystofilobasidiales) and the highest IPG (6 overall, or 5.22 without considering Cystofilobasidiales). Members of the Cystofilobasidiales were an extreme case among the Tremellomycetes, with %ILG = 2.19 and IPG = 7.65. Thus, P. rhodozyma represents an unusual case of a yeast where almost all genes possess multiple introns. Moreover, the density of introns in the Cystofilobasidiales ranks relatively highly among all eukaryotic organisms, suggesting intron gain relative to the Last Eukaryotic Common Ancestor (LECA).
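The two summary statistics used in this comparison, %ILG and IPG, can be derived directly from the exon count of each predicted gene. The following sketch shows one way to compute them. It assumes that IPG is averaged over all genes (including intron-less ones), which the text does not specify, and the exon counts are toy values rather than data from the study.

```python
from typing import Sequence, Tuple


def intron_stats(exons_per_gene: Sequence[int]) -> Tuple[float, float]:
    """Return (% intron-less genes, mean introns per gene).

    A gene with n exons is taken to have n - 1 introns.
    """
    if not exons_per_gene:
        raise ValueError("need at least one gene")
    if any(n < 1 for n in exons_per_gene):
        raise ValueError("every gene needs at least one exon")
    introns = [n - 1 for n in exons_per_gene]
    pct_ilg = 100.0 * sum(1 for i in introns if i == 0) / len(introns)
    ipg = sum(introns) / len(introns)  # averaged over ALL genes (assumption)
    return pct_ilg, ipg


# Toy gene set: hypothetical exon counts, not values from the genomes.
toy_exon_counts = [1, 9, 8, 10, 7, 9, 12, 6]
pct_ilg, ipg = intron_stats(toy_exon_counts)
print(f"%ILG = {pct_ilg:.2f}, IPG = {ipg:.2f}")
```

In practice the exon counts would be parsed from the gene-model output of the predictor, so that all genomes are summarized from predictions made in the same way.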
To assess the quality of our genome assemblies and annotations, we retrieved 205 public sequences of P. rhodozyma CBS 7918T from NCBI and successfully mapped them against our assembly (BlastN e-values between 3e-07 and 0.0) (Additional file 2: Table S2). Reciprocal best BLAST hits identified 2938 (49.1 %) orthologous proteins in C. neoformans and 1417 (23.7 %) in S. cerevisiae; 1328 (22.2 %) of these proteins were in common among the three taxa. We found 49 tRNAs dispersed among the scaffolds, corresponding to 12 anticodon species. Another suggested measure for quantitative assessment of genome completeness is the number of core eukaryotic genes (CEGs) present. Applying the models defined by Parra et al., we found 247 and 245 out of 248 confident CEGs, respectively, for the type and CRUB strains. The average number of CEGs for other tremellomycetous species with complete genomes available is 244. Thus, several analytical parameters indicate that the two assemblies and annotations of P. rhodozyma obtained here are adequate tools for comparative genomics and for mining of genes of fundamental and/or applied interest.
Genomic diversity within P. rhodozyma
Photoprotective and antioxidant strategies in P. rhodozyma
There is evidence that P. rhodozyma has evolved strategies to cope with high levels of environmental oxidative and UV radiation (UVR) stress [9, 29, 30]. However, with the exception of the elucidation of most of the genes involved in astaxanthin synthesis due to applied interests, there is scarce knowledge on the genetic basis of complementary strategies against oxidative stress. For example, the genes encoding the enzymes involved in the synthesis of mycosporine-glutaminol-glucoside (MGG) and ROS scavenging enzymes, such as catalases and superoxide dismutases (SODs), remain uncharacterized. Our genome mining of newly sequenced strains of P. rhodozyma allowed the localization of all known genes related to the synthesis of astaxanthin (Additional file 3: Table S3). We were also able to identify a hitherto unknown gene cluster that may be responsible for MGG synthesis in yeasts, as well as a battery of enzymes (catalases and SODs) whose orthologs in other fungal species act as scavengers of free radical oxygen species (ROS) (Additional file 3: Table S3).
The complete set of genes responsible for astaxanthin biosynthesis was annotated for both Phaffia strains (Additional file 3: Table S3) and compared to that of CBS 6938. With the exception of the crtR enzyme (0.13 % amino acid sequence divergence), we did not find any nonsynonymous differences between the Holarctic P. rhodozyma strains for the seven genes studied. On the other hand, the CRUB 1149 strain showed amino acid differences when compared to CBS 6938 for all these genes (values ranged from 0.39 to 2.52 %). Nucleotide sequence variability has already been reported for some of these genes, and partial sequences of idi, crtI, and crtS proved to be valuable molecular markers for genetic differentiation of the distinct lineages of P. rhodozyma.
Although astaxanthin biosynthesis has been elucidated at the genetic level, the complex regulatory mechanism controlling this process is poorly understood, and it is a focus of ongoing research. The repressive effect of glucose on the expression of the crtYB, crtI, and crtS genes was demonstrated in P. rhodozyma, and potential Mig1-binding sites in the promoter regions of the three genes were found, suggesting that transcriptional regulation mechanisms may be involved in this inhibition [31, 32]. We located a putative homolog (G04777_PC) of the Cryptococcus gattii Mig1 gene. Moreover, we were able to identify a gene that encodes a beta-carotene 15,15′-monooxygenase (G04735_P), which is related to the β-carotene cleavage oxygenases produced in Fusarium fujikuroi by the gene CARX and in Ustilago maydis by the gene Cco1. These enzymes are related to the production of retinal via the cleavage of different carotenoids with at least one β-ionone ring. Retinal is one of the best-known apocarotenoids and serves as a chromophore for opsins, which are involved in several photoreceptor functions. In Fusarium fujikuroi, the production of retinal is involved in the regulation of the carotenoid biosynthetic pathway via a negative feedback mechanism. The presence in P. rhodozyma of this gene, which has proven regulatory functions in the carotenoid biosynthesis of other fungi, leads us to speculate that it may play a similar role in this yeast. To the best of our knowledge, this gene has been scarcely studied in basidiomycetous fungi. It also represents a potential new target for genetic engineering approaches aimed at improving astaxanthin yields in P. rhodozyma.
Mycosporine (MYC) synthesis has recently been shown to occur through two distinct pathways in cyanobacteria [35, 36]. Balskus & Walsh proposed a biosynthetic route in cyanobacteria that consists of a specific cluster of three genes encoding a DHQS homolog (2-epi-5-epi-valiolone synthase, EEVS-like), an O-methyltransferase (O-MT), and an ATP-grasp. In the P. rhodozyma CBS 7918T genome, we identified an 8-kb-long gene cluster in scaffold 175 that encodes at least three proteins (EEVS-like, O-MT and ATP-Grasp) that are homologous to those responsible for mycosporine synthesis in cyanobacteria. This finding is consistent with the fact that P. rhodozyma produces the sunscreen molecule mycosporine-glutaminol-glucoside (MGG). An identical cluster of genes was detected in the Patagonian strain CRUB 1149, which, along with 15 other P. rhodozyma isolates tested, also produces MGG. Thus, we were surprised when we failed to detect any evidence of the putative MGG cluster in the >100x coverage genome assembly of CBS 6938, using either the original assembly or our own assembly. RNA-seq data from the same strain also failed to reveal the expression of genes from an MGG cluster. When we experimentally tested CBS 6938 for mycosporine synthesis, we found that it was negative, a result that agrees with the absence of the gene cluster in its genome. The strain CBS 6938 represents the first exception to the hypothesis that all strains of P. rhodozyma can synthesize MGG [9, 38], although it might not be the only one because additional MGG-negative P. rhodozyma isolates have recently been recovered from Antarctic environments. These results reinforce the hypothesis that the synthesis of MGG in yeasts is not essential and is rather an adaptation for coping with environmental stress conditions, specifically UV light and/or oxidative damage. A positive correlation between MGG synthesis and UVR tolerance has been previously reported for yeast.
The present study provides the first insight into a set of genes that could potentially be responsible for the synthesis of MGG in P. rhodozyma, an interesting photoprotective compound with potential application in cosmetics. Similar molecules named mycosporine-like amino acids (MAAs) are already being exploited as active ingredients of commercial sunscreen products, such as Helioguard and Helionori. Thus, although our findings need experimental validation (ongoing work in our lab), they represent a step towards a better understanding of the molecular basis of the biosynthetic pathways that give rise to this evolutionarily and industrially important metabolite.
A sequence matching the conserved domain cd08152, which is related to the y4iL protein of Rhizobium sp. NGR234, was also found in P. rhodozyma (encoded on scaffold 66). This protein, of bacterial origin, shares the catalase fold and heme-binding motif, suggesting that it might also have catalase activity. This y4iL-like gene is present in 8 other species of the Tremellales studied, as well as in the 3 genomes of P. rhodozyma. Although the activity of this gene is uncharacterized, it possibly contributes to the H2O2-inactivating mechanisms of P. rhodozyma.
Superoxide dismutase enzymes
The presence of Mn-SOD and the absence of Cu/Zn-SOD in P. rhodozyma were previously reported [45, 46]. Given that Mn-SOD has been reported to be present only in the mitochondria, while Cu/Zn-SOD is cytosolic, it was proposed that P. rhodozyma would be hypersensitive to oxidative stress. Interestingly, genome sequence analysis revealed the presence of two different genes encoding Mn-SODs, confirming the possible existence of two isozymes. Signal peptide sequence analysis suggests that one Mn-SOD could be localized to the mitochondria and the other to the cytosol. In the ascomycetes, multiple duplication events of the gene encoding the mitochondrial Mn-SOD were coupled with the subsequent loss of the amino-terminal mitochondrial targeting sequence, and it has been proposed that basidiomycetes also underwent a late gene duplication. The two isozymes we observed in P. rhodozyma likely correspond to this latter duplication event, which also provides an evolutionary mechanism for their possible differential subcellular localization. It is remarkable that both Mn-SODs are present in Wallemia sebi and Stereum hirsutum, suggesting that the duplication is a plesiomorphic character in the Agaricomycotina. However, Tremellales lost the gene encoding the cytosolic Mn-SOD, while Cystofilobasidiales lost the gene encoding the cytosolic Cu-SOD. SOD activity in P. rhodozyma is known to confer resistance to KCN and H2O2, two compounds that affect Cu/Zn-SOD but not Mn-SOD. The existence of three different types of catalases and the lack of H2O2-sensitive SOD enzymes suggest that interaction with this reactive species has been important in shaping the P. rhodozyma genome.
Genes involved in sexual reproduction
Our genome survey also identified orthologs encoding components of the conserved pheromone response pathway that is activated upon pheromone/receptor interaction during mating in C. neoformans, namely the genes encoding the subunits of the heterotrimeric G protein (GPA1-3, STE4, and STE18) and those that compose the MAP kinase module itself (STE11, STE7, and CPK1). Moreover, we identified a gene encoding a p21-activated kinase (STE20) in the vicinity of the STE3-2 gene, which is consistent with observations in other basidiomycetes, viz. species of the pathogenic Cryptococcus neoformans complex and of the sensu lato Kwoniella clade in Tremellales (Agaricomycotina), as well as in yeast species in the Pucciniomycotina [52, 53]. A final set of orthologs encoding transcription factors that have key roles in mating in S. cerevisiae (Ste12), C. neoformans (Mat2 and Znf2), and U. maydis (Prf1) were also found and are listed in Additional file 4: Table S4. The analysis of the genome assembly of P. rhodozyma strain CRUB 1149 yielded identical results to those obtained with the type strain (CBS 7918T).
The present study provides new genomic insights into several biological and genetic aspects of the industrially relevant yeast P. rhodozyma, an early diverging Agaricomycete with exceptional physiological and ecological properties. By analyzing, for the first time, the valid type strain of the species together with a genetically distinct lineage from the Southern Hemisphere, comparative genomic analyses within the species and among the Tremellomycetes become possible. Evidence for the conspecificity of the northern and southern strains was obtained, though due to the relatively high level of genomic divergence detected, these strains might be considered different varieties in the future. The proportion of intron-containing genes and the number of introns per gene in P. rhodozyma are the highest hitherto known for fungi, with values similar to those found in humans. Although it remains to be confirmed using a larger set of genomes, available data suggest that this trait might not be species-specific but rather might be shared with other members of the Cystofilobasidiales. Genome mining provided the first insight into the genetic basis of the synthesis of mycosporine-glutaminol-glucoside, another biotechnologically important molecule due to its antioxidant and UV sunscreen activities. We also observed that a putative cluster of genes found in MGG-producing strains was absent in non-mycosporinogenic ones. Further studies are in progress to elucidate the biosynthetic pathway for MGG synthesis in yeasts. The study of genes encoding additional enzymes that protect against oxidative stress revealed an unexpected diversity of catalases and the loss of H2O2-sensitive superoxide dismutases. Our results indicate that the P. rhodozyma genome is enriched in antioxidant mechanisms, in particular those most effective in coping with H2O2, suggesting that the environmental interaction with this reactive species has been of great relevance to the evolution of P. rhodozyma.
Strain, culture conditions, and DNA sequencing
Yeasts were cultured for 72 h in 15 ml YM broth (g/l: yeast extract 3; malt extract 3; peptone 5; dextrose 10) at 20 °C, and genomic DNA was extracted using a modified phenol:chloroform:isoamyl alcohol method. DNA was dissolved in TE buffer (10 mM Tris–HCl, 1 mM EDTA, pH 7.6) with RNase A (100 μg/ml). Paired-end Illumina libraries with an average length of 455 bp, as measured by an Agilent 2100 Bioanalyzer, were constructed following Hittinger et al. The genomes of Phaffia rhodozyma CBS 7918T and CRUB 1149 were sequenced using Illumina GA IIX paired-end reads. A total of 6,995,372 paired-end reads with a length of 115 nucleotides were generated for CBS 7918T, for a combined depth of coverage of 41X, and 9,638,996 paired-end reads of the same length were generated for CRUB 1149, for a coverage of 57X.
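The reported depths of coverage can be approximated as the number of sequenced bases divided by the genome size. The sketch below reproduces that calculation under two assumptions that are not stated in the text: the read totals are treated as counts of individual 115-nt reads, and the genome size is taken as a rounded 19 Mb (the text gives only "ca. 19 Mb").

```python
def depth_of_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Average depth = total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome size must be positive")
    return n_reads * read_length / genome_size


GENOME_SIZE = 19_000_000  # assumed rounded genome size ("ca. 19 Mb")
READ_LENGTH = 115         # nucleotides, as reported

# Read totals as reported, treated here as individual reads (assumption).
reads = {"CBS 7918T": 6_995_372, "CRUB 1149": 9_638_996}

coverage = {
    strain: depth_of_coverage(n, READ_LENGTH, GENOME_SIZE)
    for strain, n in reads.items()
}
for strain, cov in coverage.items():
    print(f"{strain}: ~{cov:.1f}X")
```

With the rounded genome size, the estimates come out slightly above the reported 41X and 57X, as expected if the actual assemblies are somewhat larger than 19 Mb.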
Genome assembly and correction
De novo genome assembly was performed with SPAdes 3.1.1, including adapter removal, trimming, quality filtering, and error correction, resulting in 6,805,670 paired-end reads with a mean length of 115 nt and an estimated insert size of 317 nt, and yielding 769 contigs with length ≥200 bp for the CBS 7918T strain. Out of 361 scaffolds with a length >2 kb, we selected 343 with a median read coverage of 10.5, a standard deviation of 5.1, a minimum of 5.2, and a maximum of 71.2. The average GC content was 47.3 % with a standard deviation of 2.7 %. The remaining 18 scaffolds had coverages between 357 and 2074 and, together with shorter contigs, were considered non-nuclear DNA for contig extension rounds (see below). The assembly of the 9,356,088 corrected CRUB 1149 reads, with a mean length of 115 nt and an estimated insert size of 327 nt, yielded 642 contigs with length ≥200 bp. Out of 322 scaffolds with length >2 kb, we kept 305 with an average coverage of 17.5. Mitochondrial DNA, pDK1 plasmids, and regions of rDNA nuclear clusters were assembled using custom scripts. These scripts performed read alignment to NCBI-deposited sequences, selection of seed contigs, extension of contigs by multiple sequence alignment, and information content (IC) calculation. Reads were aligned to the 28S-5S ribosomal RNA ITS 1 partial sequence and to the pDK1 plasmid sequence with Blat v34 with the default parameters (NCBI accession numbers AF139633.1 and AJ278424.1, respectively). Cryptococcus neoformans mitochondrial proteins were downloaded from the Broad Institute, and alignments were performed with Blast v2.2.17 (parameter: -Q4) and Blat to mitochondrial-encoded rRNAs, RNL, and RNS. A seed read with maximum coverage was selected for each target sequence.
Rounds of contig extension were performed by aligning reads with Lastz v1.03.54 (parameters: --step=10 --seed=match12 --notransition --exact=20 --noytrim --match=1,5 --ambiguous=n --coverage=40..100 --identity=95), fixing positions with IC > 0.5 and a minimum coverage of 100.
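The position-fixing step of the contig extension can be illustrated with a per-column information content calculation. The custom scripts are not shown in the text, so the following is only a sketch under stated assumptions: IC is taken as 2 minus the Shannon entropy of the base frequencies (in bits), which is one common definition and may differ from the scale the authors used, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable


def column_information(column: str) -> float:
    """IC in bits (2 - Shannon entropy) of one DNA alignment column.

    Assumed definition; characters other than A/C/G/T are ignored.
    """
    bases = [b for b in column.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    total = len(bases)
    entropy = -sum(
        (c / total) * math.log2(c / total) for c in Counter(bases).values()
    )
    return 2.0 - entropy


def call_consensus(columns: Iterable[str], min_ic: float = 0.5,
                   min_coverage: int = 100) -> str:
    """Fix the majority base where IC and coverage pass; otherwise emit 'N'."""
    consensus = []
    for col in columns:
        bases = [b for b in col.upper() if b in "ACGT"]
        if len(bases) >= min_coverage and column_information(col) > min_ic:
            consensus.append(Counter(bases).most_common(1)[0][0])
        else:
            consensus.append("N")
    return "".join(consensus)


# Toy read pileups (hypothetical): unanimous, fully mixed, low coverage,
# and a mostly consistent column.
columns = [
    "A" * 120,
    "A" * 30 + "C" * 30 + "G" * 30 + "T" * 30,
    "G" * 50,
    "T" * 110 + "C" * 10,
]
print(call_consensus(columns))
```

Only the first and last toy columns pass both thresholds; the fully mixed column fails the IC test and the third fails the coverage test, so both are left undetermined.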
Gene prediction and functional annotation
Ab initio gene prediction with GeneMark-ES v2.3e was self-trained on the genome scaffolds (parameters: --min_contig 8000 --max_nnn 1000). Repeats and low-complexity sequences were retrieved by RepeatMasker v4.0.3 using the RepBase library. tRNAs were predicted by tRNAscan-SE v1.23. ncRNAs, including rRNAs, were predicted with HMMER v3.1b1. Automatic annotation of genes was performed by recording the best reciprocal blastp hits (e-value < 10^-5, identity > 50 %) to the Cryptococcus neoformans (Broad Institute) and Saccharomyces cerevisiae (SGD) proteomes. Blast2GO was used to retrieve functional annotations. Additionally, multiple sequence alignments were generated by MAFFT v6.935b. Enzymatic domains were predicted using PRIAM (parameters: -pt 0.5 -mo -1 -mp 70 -cc T -cg T -e T -cm T), and related pathways were retrieved from the KEGG public database. No automatic improvements or consensus gene models were made by combining evidence, due to the low abundance of available mRNAs or ESTs. However, manual curation was applied to all genes with transcript sequences deposited in NCBI. We used tblastn and blastp to map genes involved in meiosis, mating, and the synthesis of photoprotective and antioxidant compounds from available sequences of Phaffia sp., Cryptococcus sp., Ustilago maydis, Neurospora crassa and Saccharomyces cerevisiae (NCBI, JGI, and SGD databases, all permissions granted). Protein domains were predicted with PFAM.
To compare gene sets, we used 77 fungal genomes, 23 of them from Tremellomycetes (Table 2). Since gene prediction strategies differed between genome submissions, which would introduce inconsistencies in the comparative analyses, we predicted gene structures with GeneMark-ES v2.3e using the same parameters applied to the two strains sequenced in the present study. Gene features listed in Table 2 were retrieved from de novo predictions. We selected only canonical CDS (ATG-STOP) to analyze gene orthology in Phaffia strains. Best reciprocal hits (BRH) with blastp (e-value < 10^-5) were calculated between the three possible pairs of gene sets, using the e-value as the primary selection criterion and the number of nucleotides aligned as the secondary criterion. Genes shared among the 3 strains or present in only two genomes were selected by pairs of consistent BRH. The putative set of orphan genes was defined from the genes common to the 3 Phaffia strains with transcriptional evidence (at least 50 % of the gene sequence covered by reads) from CBS 6938 RNA-seq data. Predictions present in any other Tremellomycetes were filtered out using blastp (e-value < 10^-5). A second filter using blastp against the NCBI nr database discarded any gene with a hit (e-value < 10^-5) to any other species. Among the resulting 286 genes, we found 188 containing at least one PFAM domain.
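The best-reciprocal-hit criterion described above can be sketched as follows: each query keeps its hit with the lowest e-value, ties are broken by the longest alignment, and a pair is retained only if each member is the other's best hit. The `Hit` record and function names are hypothetical, and the hits are toy values rather than blastp output from the study.

```python
from typing import Dict, Iterable, NamedTuple, Set, Tuple


class Hit(NamedTuple):
    query: str
    subject: str
    evalue: float
    aligned_length: int


def best_hits(hits: Iterable[Hit], max_evalue: float = 1e-5) -> Dict[str, str]:
    """Per query: lowest e-value wins; ties go to the longest alignment."""
    best: Dict[str, Hit] = {}
    for h in hits:
        if h.evalue >= max_evalue:
            continue  # fails the e-value threshold
        cur = best.get(h.query)
        if cur is None or (h.evalue, -h.aligned_length) < (
            cur.evalue, -cur.aligned_length
        ):
            best[h.query] = h
    return {q: h.subject for q, h in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit],
                         max_evalue: float = 1e-5) -> Set[Tuple[str, str]]:
    """Pairs (a, b) where a's best hit is b and b's best hit is a."""
    ab = best_hits(a_vs_b, max_evalue)
    ba = best_hits(b_vs_a, max_evalue)
    return {(a, b) for a, b in ab.items() if ba.get(b) == a}


# Toy hits (hypothetical gene identifiers and scores).
a_vs_b = [
    Hit("a1", "b1", 1e-50, 300),
    Hit("a1", "b2", 1e-50, 250),  # same e-value, shorter alignment
    Hit("a2", "b2", 1e-20, 200),
    Hit("a3", "b3", 1e-3, 80),    # above the e-value threshold
]
b_vs_a = [
    Hit("b1", "a1", 1e-48, 300),
    Hit("b2", "a1", 1e-30, 250),
    Hit("b2", "a2", 1e-30, 260),
    Hit("b3", "a3", 1e-3, 80),
]
pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(sorted(pairs))
```

Genes shared by all three strains would then correspond to triplets whose three pairwise BRH relationships are mutually consistent.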
Pairwise genome-wide alignments were produced with Blat using the default parameters. Sequence divergence among the 3 P. rhodozyma genomes was estimated by applying a sliding-window approach using the longest scaffolds of strain CBS 6938 as references, corresponding to 17 scaffolds that cover 98.73 % of its assembly. We used a window size and step of 1 kb; windows with fewer than 500 bp unambiguously called between the 2 alignments were discarded. Finally, the divergence value for each data point was calculated as the average for that window plus the five windows on each flank. Alignment-free pairwise distances (Kr), based on shustrings, were calculated with GenomeTools. Although Kr values above 0.5 are generally regarded as unreliable, we used a more restrictive threshold of 0.3 because we found inconsistent values between genome triads at high Kr levels.
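The sliding-window divergence estimate can be sketched as two steps: a per-window mismatch fraction over unambiguously called positions, followed by smoothing with the flanking windows. This is a minimal illustration, not the pipeline used in the study. It assumes a pair of already aligned, equal-length sequences, and it assumes that discarded windows are skipped during smoothing and that windows near scaffold ends use whatever flanks exist; neither detail is stated in the text.

```python
from typing import List, Optional


def window_divergence(ref: str, qry: str, window: int = 1000,
                      step: int = 1000,
                      min_called: int = 500) -> List[Optional[float]]:
    """Mismatch fraction per window; None where too few bases are called."""
    if len(ref) != len(qry):
        raise ValueError("sequences must be aligned to the same length")
    out: List[Optional[float]] = []
    for start in range(0, len(ref) - window + 1, step):
        called = mismatches = 0
        r_win = ref[start:start + window].upper()
        q_win = qry[start:start + window].upper()
        for r, q in zip(r_win, q_win):
            if r in "ACGT" and q in "ACGT":  # gaps and N are not called
                called += 1
                mismatches += r != q
        out.append(mismatches / called if called >= min_called else None)
    return out


def smooth(values: List[Optional[float]],
           flank: int = 5) -> List[Optional[float]]:
    """Average each kept window with up to `flank` windows on each side."""
    smoothed: List[Optional[float]] = []
    for i, v in enumerate(values):
        if v is None:
            smoothed.append(None)  # discarded windows stay discarded
            continue
        lo = max(0, i - flank)
        nearby = [x for x in values[lo:i + flank + 1] if x is not None]
        smoothed.append(sum(nearby) / len(nearby))
    return smoothed


# Toy alignment with 10-nt windows (hypothetical sequences).
ref = "ACGTACGTAC" * 3
qry = "ACGTACGTAC" + "TCGTACGTAA" + "N" * 10
raw = window_divergence(ref, qry, window=10, step=10, min_called=5)
print(raw, smooth(raw, flank=1))
```

With the parameters described above, the calls would use `window=1000`, `step=1000`, `min_called=500`, and `flank=5`.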
Phylogenetic tree reconstruction
A total of 21 proteomes were obtained from JGI and UCSC databases (permissions granted). The proteomes were scanned for the set of 248 core eukaryotic genes (CEGs) with HMMER, applying the specific models and thresholds defined by Parra et al. Multiple alignments of the 210 orthologous proteins common to all proteomes were produced using MUSCLE (default parameters). These alignments were concatenated and gaps were eliminated, giving a final length of 60,298 columns. The phylogeny was inferred using MrBayes v3.2.2. The tree was reported as a cladogram. Alignment and other parameters are available as supplementary material.
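The concatenation and gap-removal step can be sketched in Python as follows. This is a minimal illustration, assuming that each per-gene alignment is a mapping from taxon name to aligned sequence, that every taxon is present in every alignment, and that "gaps were eliminated" means dropping every column that contains a gap in at least one taxon. The function names are hypothetical.

```python
from typing import Dict, List


def concatenate(alignments: List[Dict[str, str]]) -> Dict[str, str]:
    """Join per-gene alignments end to end, taxon by taxon."""
    if not alignments:
        raise ValueError("at least one alignment is required")
    taxa = sorted(alignments[0])
    parts: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    for aln in alignments:
        if set(aln) != set(taxa):
            raise ValueError("every alignment must contain the same taxa")
        if len({len(seq) for seq in aln.values()}) != 1:
            raise ValueError("sequences within an alignment must have equal length")
        for taxon in taxa:
            parts[taxon].append(aln[taxon])
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


def drop_gap_columns(alignment: Dict[str, str], gap: str = "-") -> Dict[str, str]:
    """Remove every column in which at least one taxon has a gap."""
    taxa = list(alignment)
    columns = zip(*(alignment[taxon] for taxon in taxa))
    keep = [i for i, column in enumerate(columns) if gap not in column]
    return {taxon: "".join(alignment[taxon][i] for i in keep) for taxon in taxa}
```

The resulting gap-free matrix is the kind of input that would then be passed to a phylogenetic inference program.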
Analysis of photoprotective and antioxidant genes
Genes involved in the synthesis of photoprotective and antioxidant metabolites or enzymes were identified in the genome assemblies of P. rhodozyma CBS 7918T and CRUB 1149 by tblastn, or by blastp using the predicted proteins. Mycosporine genes were detected using the protein sequences of the genes Ava_3856, Ava_3857 and Ava_3858 from Anabaena variabilis. The carotenoid genes crtE, crtI, and crtYB were used to identify candidate genes for carotenoid production in Tremellomycetes. Catalases were detected using representative proteins of the families cd08154, cd08155, cd08156, cd08157, and cd08152; these sequences were downloaded from the Conserved Domains Database (CDD) of NCBI. Superoxide dismutases were identified using representative proteins of the families PF00080, PF00081, PF02777, and PF09055 as queries; these sequences were downloaded from the PFAM database of EMBL-EBI. Catalases were categorized using phylogenetic analyses that compared P. rhodozyma protein sequences with the predicted protein sequences from other Tremellomycetes. Sequences were aligned using MUSCLE, and unrooted phylogenetic trees were constructed using RAxML v. 7.3.5 with the PROTGAMMAWAG model of amino acid substitutions, eliminating columns with gaps. Branch supports were determined using 1000 rapid bootstrap replicates. The subcellular localization of the superoxide dismutases was predicted using TargetP 1.1 (http://www.cbs.dtu.dk/services/TargetP/).
Identification of genes involved in sexual reproduction
The genomic regions containing the genes that determine sexual identity in basidiomycetes (genes encoding the homeodomain transcription factors HD1/HD2 and the mating pheromones/receptors) were identified in the genome assemblies of P. rhodozyma CBS 7918T and CRUB 1149 (or in local databases of proteins resulting from genome annotation) by reciprocal tblastn and blastp, respectively, using C. neoformans Mat proteins (Sxi1, Sxi2, Mfa1, and Ste3) as queries (Additional file 4: Table S4). Pheromone precursor genes that escaped detection by blast because of their short length and highly variable sequences were identified manually by inspecting the genomic regions in the vicinity of pheromone receptor genes and searching for ORFs whose deduced protein sequences contained a conserved CAAX motif at the C-terminus. To ascertain the contiguity of the scaffolds harboring the two sets of pheromone and receptor genes, a pair of primers (MP100 5′-TCCATCCTCAACTGATTGC-3′ and MP103 5′-TTCATCTTGTCAGACAGC-3′) was used to amplify and partially sequence the intervening region between both pheromone precursor genes. Standard PCR and cycling conditions were used with Phusion® High-Fidelity DNA Polymerase, with an annealing temperature of 51 °C and extension for 90 s. Protein sequences of genes involved in the pheromone signaling cascade in C. neoformans were used to identify the corresponding putative orthologs in P. rhodozyma by blast searches (Additional file 4: Table S4).
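The search for pheromone precursor candidates was performed manually in this study. The following Python sketch only illustrates how a comparable first-pass screen for short ORFs ending in a CAAX motif might be automated. Several choices are assumptions: the aliphatic residues accepted at the two middle positions (A, V, I, L, M), the maximum peptide length of 100 residues, the use of the standard genetic code, and the fact that introns are ignored. The names `find_caax_orfs` and `translate_orf` are hypothetical.

```python
import re
from typing import List, Tuple

# Standard genetic code, built in the usual TCAG ordering.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: _AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

# C, two aliphatic residues, any residue, end of protein (residue set is an assumption).
CAAX = re.compile(r"C[AVILM]{2}[A-Z]$")


def translate_orf(dna: str, start: int) -> str:
    """Translate from `start` to the first in-frame stop; return '' if no stop is found."""
    protein = []
    for i in range(start, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[i:i + 3], "X")  # X for ambiguous codons
        if amino_acid == "*":
            return "".join(protein)
        protein.append(amino_acid)
    return ""


def find_caax_orfs(region: str, max_len: int = 100) -> List[Tuple[str, int, str]]:
    """Return (strand, start offset on that strand, protein) for short ORFs ending in CAAX."""
    sequence = region.upper()
    reverse_complement = sequence.translate(_COMPLEMENT)[::-1]
    hits = []
    for strand, dna in (("+", sequence), ("-", reverse_complement)):
        for match in re.finditer("(?=ATG)", dna):  # every ATG, overlapping included
            protein = translate_orf(dna, match.start())
            if protein and len(protein) <= max_len and CAAX.search(protein):
                hits.append((strand, match.start(), protein))
    return hits
```

Because every ATG is tried, nested ORFs sharing the same stop codon are all reported. Candidates from such a screen would still need manual inspection, as was done here.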
This work was partially funded in Argentina by grants PICT 1814 and PICT 2542 (ANPCYT), PIP 424 (CONICET) and B171 (UNComahue); in Portugal by grant PTDC/BIA-GEN/112799/2009 and by the Unidade de Ciências Biomoleculares Aplicadas-UCIBIO, which is financed by national funds from FCT/MEC (UID/Multi/04378/2013) and co-financed by the ERDF under the PT2020 Partnership Agreement (POCI-01-0145-FEDER-007728); and in the USA by the National Science Foundation under grants DEB-1253634 and DEB-1442148 and in part by the DOE Great Lakes Bioenergy Research Center (DOE Office of Science BER DE-FC02-07ER64494). CTH is a Pew Scholar in the Biomedical Sciences and an Alfred Toepfer Faculty Fellow, supported by the Pew Charitable Trusts and the Alexander von Humboldt Foundation, respectively. MAC and MD-P hold, respectively, grants SFRH/BPD/79198/2011 and SFRH/BD/81895/2011 from Fundação para a Ciência e a Tecnologia, Portugal. We thank Jim Dover for technical support and Mark Johnston for providing access to an Illumina GAIIx instrument at the University of Colorado School of Medicine. We also thank Dr. Cifuentes (U.N. Chile) for providing a set of mRNAs for annotation and quality checks. We thank Laurie Connell, Christina Cuomo, Ratan Gachhui, and Joseph Heitman for authorizing the use of their genome sequences.
Availability of data and material
Genome assembly and annotations have been deposited at DDBJ/EMBL/GenBank under the following accession numbers: PRJNA306035 (CBS 7918T) and PRJNA307837 (CRUB 1149).
The following data sets are included as Additional files 6, 7 and 8. These files, together with the genome and annotation files, will also be available on our local server and can be accessed from http://www.comahue-conicet.gob.ar:8080/Genome_Phaffia
DL, NB, JPS, and PG designed the study. DL and CTH obtained the Illumina reads. NB performed genome assembly, gene predictions and annotations, and orthology. NB and MM carried out phylogenetic analyses. MM, MAC, and MDP performed the annotation of specific genes. DL, NB, MM, MAC, CTH and MDP wrote the manuscript, with contributions from other authors. All authors read and approved the final manuscript.
The authors declare that they have no competing interests.
Consent for publication
Authorization for use of all data from unpublished genomes has been granted by the original authors.
Ethics approval and consent to participate
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Sandmann G. Carotenoids of Biotechnological Importance. In: Biotechnology of Isoprenoids. Edited by Schrader J, Bohlmann J, vol. 148: New York: Springer International Publishing; 2015. p 449–67.
- Schmidt I, Schewe H, Gassel S, Jin C, Buckingham J, Hümbelin M, Sandmann G, Schrader J. Biotechnological production of astaxanthin with Phaffia rhodozyma/Xanthophyllomyces dendrorhous. Appl Microbiol Biotechnol. 2011;89:555–71.
- Fell JW, Johnson EA. Phaffia M.W. Miller, Yoneyama & Soneda (1976). In: The yeasts: a taxonomic study. Kurtzman C, Fell JW, Boekhout T, editors. Amsterdam: Elsevier; 2011. p 1853–4.
- Ambati RR, Phang S-M, Ravi S, Aswathanarayana RG. Astaxanthin: sources, extraction, stability, biological activities and its commercial applications—a review. Mar Drugs. 2014;12:128–52.
- Schroeder WA, Johnson EA. Singlet oxygen and peroxyl radicals regulate carotenoid biosynthesis in Phaffia rhodozyma. J Biol Chem. 1995;270:18374–9.
- David‐Palma M, Libkind D, Sampaio JP. Global distribution, diversity hot spots and niche transitions of an astaxanthin‐producing eukaryotic microbe. Mol Ecol. 2014;23:921–32.
- Libkind D, Tognetti C, Ruffini A, Sampaio JP, Van Broock M. Xanthophyllomyces dendrorhous (Phaffia rhodozyma) on stromata of Cyttaria hariotii in northwestern Patagonian Nothofagus forests. Rev Argent Microbiol. 2011;43:198–202.
- Jinno S, Hata K, Shimidzu N, Okita T. Phaffiaol, a new antioxidant isolated from a yeast Phaffia rhodozyma. J Antibiot. 1998;51:508–11.
- Libkind D, Moline M, van Broock M. Production of the UVB‐absorbing compound mycosporine–glutaminol–glucoside by Xanthophyllomyces dendrorhous (Phaffia rhodozyma). FEMS Yeast Res. 2011;11:52–9.
- Moliné M, Arbeloa EM, Flores MR, Libkind D, Farías ME, Bertolotti SG, Churio MS, van Broock MR. UVB photoprotective role of mycosporines in yeast: photostability and antioxidant activity of mycosporine-glutaminol-glucoside. Radiat Res. 2011;175:44–50.
- Phaff H, Miller M, Yoneyama M, Soneda M. A comparative study of the yeast florae associated with trees on the Japanese islands and on the west coast of North America. In: Proceedings of the 4th IFS: Fermentation Technology Today Meeting Society of Fermentation Technology, Osaka, Japan. 1972. p. 759–74.
- Golubev V, Bab’eva I, Blagodatskaya V, Reshetova I. Taxonomic study of yeasts isolated from spring sap flows of birch (Betula verrucosa Ehrh.). Microbiology. 1977;46:564–9.
- Weber RW, Davoli P, Anke H. A microbial consortium involving the astaxanthin producer Xanthophyllomyces dendrorhous on freshly cut birch stumps in Germany. Mycologist. 2006;20:57–61.
- Libkind D, Moliné M, de García V, Fontenla S, van Broock M. Characterization of a novel South American population of the astaxanthin producing yeast Xanthophyllomyces dendrorhous (Phaffia rhodozyma). J Ind Microbiol Biotechnol. 2008;35:151–8.
- Kües U, James TY, Heitman J. 6 Mating Type in Basidiomycetes: Unipolar, Bipolar, and Tetrapolar Patterns of Sexuality. In: Evolution of fungi and fungal-like organisms. New York: Springer; 2011. p 97–160.
- Kucsera J, Pfeiffer I, Ferenczy L. Homothallic life cycle in the diploid red yeast Xanthophyllomyces dendrorhous (Phaffia rhodozyma). Antonie Van Leeuwenhoek. 1998;73:163–8.
- Golubev WI. Perfect state of Rhodomyces dendrorhous (Phaffia rhodozyma). Yeast. 1995;11:101–10.
- Fell J, Blatt G. Separation of strains of the yeasts Xanthophyllomyces dendrorhous and Phaffia rhodozyma based on rDNA IGS and ITS sequence analysis. J Ind Microbiol Biotechnol. 1999;23:677–81.
- Libkind D, Ruffini A, van Broock M, Alves L, Sampaio JP. Biogeography, host specificity, and molecular phylogeny of the basidiomycetous yeast Phaffia rhodozyma and its sexual form, Xanthophyllomyces dendrorhous. Appl Environ Microbiol. 2007;73:1120–5.
- Sharma R, Gassel S, Steiger S, Xia X, Bauer R, Sandmann G, Thines M. The genome of the basal agaricomycete Xanthophyllomyces dendrorhous provides insights into the organization of its acetyl-CoA derived pathways and the evolution of Agaricomycotina. BMC Genomics. 2015;16:233.
- Tautz D, Domazet-Lošo T. The evolutionary origin of orphan genes. Nat Rev Genet. 2011;12:692–702.
- Csuros M, Rogozin IB, Koonin EV. A detailed history of intron-rich eukaryotic ancestors inferred from a global survey of 100 complete genomes. PLoS Comput Biol. 2011; doi:10.1371/journal.pcbi.1002150.
- Parra G, Bradnam K, Ning Z, Keane T, Korf I. Assessing the gene space in draft genomes. Nucleic Acids Res. 2009;37:289–97.
- Gostinčar C, Ohm RA, Kogej T, Sonjak S, Turk M, Zajc J, Zalar P, Grube M, Sun H, Han J. Genome sequencing of four Aureobasidium pullulans varieties: biotechnological potential, stress tolerance, and description of new species. BMC Genomics. 2014;15:549.
- Libkind D, Hittinger CT, Valério E, Gonçalves C, Dover J, Johnston M, Gonçalves P, Sampaio JP. Microbe domestication and the identification of the wild genetic stock of lager-brewing yeast. Proc Natl Acad Sci. 2011;108:14539–44.
- Almeida P, Gonçalves C, Teixeira S, Libkind D, Bontrager M, Masneuf-Pomarède I, Albertin W, Durrens P, Sherman DJ, Marullo P. A Gondwanan imprint on global diversity and domestication of wine and cider yeast Saccharomyces uvarum. Nat Commun. 2014; doi:10.1038/ncomms5044.
- Peris D, Sylvester K, Libkind D, Gonçalves P, Sampaio JP, Alexander WG, Hittinger CT. Population structure and reticulate evolution of Saccharomyces eubayanus and its lager‐brewing hybrids. Mol Ecol. 2014;23:2031–45.
- Peris D, Langdon QK, Moriarty RV, Sylvester K, Bontrager M, Charron G, Leducq JB, Landry CR, Libkind D, Hittinger CT. Complex ancestries of lager-brewing hybrids were shaped by standing variation in the wild yeast Saccharomyces eubayanus. PLoS Genet. 2016;12(7):e1006155.
- Schroeder WA, Johnson EA. Carotenoids protect Phaffia rhodozyma against singlet oxygen damage. J Ind Microbiol. 1995;14:502–7.
- Johnson EA. Phaffia rhodozyma: colorful odyssey. Int Microbiol. 2003;6:169–74.
- Marcoleta A, Niklitschek M, Wozniak A, Lozano C, Alcaíno J, Baeza M, Cifuentes V. Glucose and ethanol-dependent transcriptional regulation of the astaxanthin biosynthesis pathway in Xanthophyllomyces dendrorhous. BMC Microbiol. 2011;11:190.
- Wozniak A, Lozano C, Barahona S, Niklitschek M, Marcoleta A, Alcaíno J, Sepulveda D, Baeza M, Cifuentes V. Differential carotenoid production and gene expression in Xanthophyllomyces dendrorhous grown in a nonfermentable carbon source. FEMS Yeast Res. 2011;11:252–62.
- Prado-Cabrero A, Scherzinger D, Avalos J, Al-Babili S. Retinal biosynthesis in fungi: characterization of the carotenoid oxygenase CarX from Fusarium fujikuroi. Eukaryot Cell. 2007;6:650–7.
- Estrada AF, Brefort T, Mengel C, Díaz-Sánchez V, Alder A, Al-Babili S, Avalos J. Ustilago maydis accumulates β-carotene at levels determined by a retinal-forming carotenoid oxygenase. Fungal Genet Biol. 2009;46:803–13.
- Spence E, Dunlap WC, Shick JM, Long PF. Redundant pathways of sunscreen biosynthesis in a cyanobacterium. Chembiochem. 2012;13:531–3.
- Colabella F, Moliné M, Libkind D. UV sunscreens of microbial origin: mycosporines and mycosporine-like aminoacids. Recent Pat Biotechnol. 2015;8:179–93.
- Balskus EP, Walsh CT. The genetic and molecular basis for sunscreen biosynthesis in cyanobacteria. Science. 2010;329:1653–6.
- Libkind D, Sommaruga R, Zagarese H, van Broock M. Mycosporines in carotenogenic yeasts. Syst Appl Microbiol. 2005;28:749–54.
- Contreras G, Barahona S, Sepúlveda D, Baeza M, Cifuentes V, Alcaíno J. Identification and analysis of metabolite production with biotechnological potential in Xanthophyllomyces dendrorhous isolates. World J Microbiol Biotechnol. 2015;31:517–26.
- Liu YS, Wu JY. Hydrogen peroxide-induced astaxanthin biosynthesis and catalase activity in Xanthophyllomyces dendrorhous. Appl Microbiol Biotechnol. 2006;73:663–8.
- Hansberg W, Salas-Lizana R, Domínguez L. Fungal catalases: function, phylogenetic origin and structure. Arch Biochem Biophys. 2012;525:170–80.
- Giles SS, Stajich JE, Nichols C, Gerrald QD, Alspaugh JA, Dietrich F, Perfect JR. The Cryptococcus neoformans catalase gene family and its role in antioxidant defense. Eukaryot Cell. 2006;5:1447–59.
- Zamocky M, Furtmüller PG, Obinger C. Evolution of catalases from bacteria to humans. Antioxid Redox Signal. 2008;10:1527–48.
- Zámocký M, Gasselhuber B, Furtmüller PG, Obinger C. Molecular evolution of hydrogen peroxide degrading enzymes. Arch Biochem Biophys. 2012;525:131–44.
- Schroeder WA, Johnson E. Antioxidant role of carotenoids in Phaffia rhodozyma. J Gen Microbiol. 1993;139:907–12.
- Martinez-Moya P, Niehaus K, Alcaíno J, Baeza M, Cifuentes V. Proteomic and metabolomic analysis of the carotenogenic yeast Xanthophyllomyces dendrorhous using different carbon sources. BMC Genomics. 2015;16:289.
- Fridovich I. Superoxide radical and superoxide dismutases. Annu Rev Biochem. 1995;64:97–112.
- Fréalle E, Noël C, Viscogliosi E, Camus D, Dei‐Cas E, Delhaes L. Manganese superoxide dismutase in pathogenic fungi: an issue with pathophysiological and phylogenetic involvements. FEMS Immunol Med Microbiol. 2005;45:411–22.
- David-Palma M, Sampaio JP, Gonçalves P. Genetic dissection of sexual reproduction in a primary homothallic basidiomycete. PLoS Genet. 2016;12(6):e1006110.
- Metin B, Findley K, Heitman J. The mating type locus (MAT) and sexual reproduction of Cryptococcus heveanensis: insights into the evolution of sex and sex-determining chromosomal regions in fungi. PLoS Genet. 2010;6: doi:10.1371/journal.pgen.1000961.
- Guerreiro MA, Springer DJ, Rodrigues JA, Rusche LN, Findley K, Heitman J, Fonseca Á. Molecular and genetic evidence for a tetrapolar mating system in the basidiomycetous yeast Kwoniella mangrovensis and two novel sibling species. Eukaryot Cell. 2013;12:746–60.
- Coelho MA, Sampaio JP, Gonçalves P. A deviation from the bipolar-tetrapolar mating paradigm in an early diverged basidiomycete. PLoS Genet. 2010;6: doi:10.1371/journal.pgen.1001052.
- Maia TM, Lopes ST, Almeida JM, Rosa LH, Sampaio JP, Gonçalves P, Coelho MA. Evolution of mating systems in Basidiomycetes and the genetic architecture underlying mating-type determination in the yeast Leucosporidium scottii. Genetics. 2015;201:75–89.
- Errede B, Ammerer G. STE12, a protein involved in cell-type-specific transcription and signal transduction in yeast, is part of protein-DNA complexes. Genes Dev. 1989;3:1349–61.
- Lin X, Jackson JC, Feretzaki M, Xue C, Heitman J. Transcription factors Mat2 and Znf2 operate cellular circuits orchestrating opposite-and same-sex mating in Cryptococcus neoformans. PLoS Genet. 2010;6: doi:10.1371/journal.pgen.1000953.
- Kaffarnik F, Müller P, Leibundgut M, Kahmann R, Feldbrügge M. PKA and MAPK phosphorylation of Prf1 allows promoter discrimination in Ustilago maydis. EMBO J. 2003;22:5817–26.
- Gonçalves P, Valério E, Correia C, de Almeida JM, Sampaio JP. Evidence for divergent evolution of growth temperature preference in sympatric Saccharomyces species. PLoS One. 2011;6: doi:10.1371/journal.pone.0020739.
- Hittinger CT, Gonçalves P, Sampaio JP, Dover J, Johnston M, Rokas A. Remarkably ancient balanced polymorphisms in a multi-locus gene network. Nature. 2010;464:54–8.
- Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, Lesin VM, Nikolenko SI, Pham S, Prjibelski AD. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol. 2012;19:455–77.
- Kent WJ. BLAT—the BLAST-like alignment tool. Genome Res. 2002;12:656–64.
- Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–402.
- Harris RS. Improved pairwise alignment of genomic DNA: ProQuest. 2007.
- Ter-Hovhannisyan V, Lomsadze A, Chernoff YO, Borodovsky M. Gene prediction in novel fungal genomes using an ab initio algorithm with unsupervised training. Genome Res. 2008;18:1979–90.
- Smit A, Hubley R, Green P. 1996–2010. RepeatMasker Open-3.0. http://www.repeatmasker.org.
- Jurka J, Kapitonov VV, Pavlicek A, Klonowski P, Kohany O, Walichiewicz J. Repbase update, a database of eukaryotic repetitive elements. Cytogenet Genome Res. 2005;110:462–7.
- Lowe TM, Eddy SR. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 1997;25:955–64.
- Eddy SR, Rost B. A probabilistic model of local sequence alignment that simplifies statistical significance estimation. PLoS Comput Biol. 2008;4: doi:10.1371/journal.pcbi.1000069.
- Conesa A, Götz S, García-Gómez JM, Terol J, Talón M, Robles M. Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics. 2005;21:3674–6.
- Katoh K, Kuma K-i, Toh H, Miyata T. MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res. 2005;33:511–8.
- Claudel‐Renard C, Chevalet C, Faraut T, Kahn D. Enzyme‐specific profiles for genome annotation: PRIAM. Nucleic Acids Res. 2003;31:6633–9.
- Finn RD, Bateman A, Clements J, Coggill P, Eberhardt RY, Eddy SR, Heger A, Hetherington K, Holm L, Mistry J. Pfam: the protein families database. Nucleic Acids Res. 2014;42:222–30.
- Gremme G, Steinbiss S, Kurtz S. GenomeTools: a comprehensive software library for efficient processing of structured genome annotations. IEEE/ACM Trans Comput Biol Bioinform. 2013;10:645–56.
- Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004;32:1792–7.
- Ronquist F, Teslenko M, van der Mark P, Ayres DL, Darling A, Höhna S, Larget B, Liu L, Suchard MA, Huelsenbeck JP. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst Biol. 2012;61:539–42.
- Stamatakis A. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics. 2006;22:2688–90.
- Emanuelsson O, Nielsen H, Brunak S, von Heijne G. Predicting subcellular localization of proteins based on their N-terminal amino acid sequence. J Mol Biol. 2000;300:1005–16.
- Feretzaki M, Heitman J. Genetic circuits that govern bisexual and unisexual reproduction in Cryptococcus neoformans. PLoS Genet. 2013;9: doi:10.1371/journal.pgen.1003688.