Comprehensive structural annotation of Pichia pastoris transcriptome and the response to various carbon sources using deep paired-end RNA sequencing
- Shuli Liang†1,
- Bin Wang†1,
- Li Pan1,
- Yanrui Ye1,
- Minghui He3,
- Shuangyan Han1,
- Suiping Zheng1,
- Xiaoning Wang1, 2Email author and
- Ying Lin1Email author
© Liang et al.; licensee BioMed Central Ltd. 2012
Received: 16 January 2012
Accepted: 18 December 2012
Published: 31 December 2012
The methylotrophic yeast Pichia pastoris is widely used as a bioengineering platform for producing industrial and biopharmaceutical proteins, studying protein expression and secretion mechanisms, and analyzing metabolite synthesis and peroxisome biogenesis. With the development of DNA microarray and mRNA sequence technology, the P. pastoris transcriptome has become a research hotspot due to its powerful capability to identify the transcript structures and gain insights into the transcriptional regulation model of cells under protein production conditions. The study of the P. pastoris transcriptome helps to annotate the P. pastoris transcript structures and provide useful information for further improvement of the production of recombinant proteins.
We used a massively parallel mRNA sequencing platform (RNA-Seq), based on next-generation sequencing technology, to map and quantify the dynamic transcriptome of P. pastoris at the genome scale under growth conditions with glycerol and methanol as substrates. The results describe the transcription landscape at the whole-genome level and provide annotated transcript structures, including untranslated regions (UTRs), alternative splicing (AS) events, novel transcripts, new exons, alternative upstream initiation codons (uATGs), and upstream open reading frames (uORFs). Internal ribosome entry sites (IRESes) were first identified within the UTRs of genes from P. pastoris, encoding kinases and the proteins involved in the control of growth. We also provide a transcriptional regulation model for P. pastoris grown on different carbon sources.
We suggest that the IRES-dependent translation initiation mechanism also exists in P. pastoris. Retained introns (RIs) are determined as the main AS event and are produced predominantly by an intron definition (ID) mechanism. Our results describe the metabolic characteristics of P. pastoris with heterologous protein production under methanol induction and provide rich information for further in-depth studies of P. pastoris protein expression and secretion mechanisms.
KeywordsRNA-Seq Transcriptome Pichia pastoris Methanol induction Internal ribosome entry site (IRES) Translation initiation mechanism
The methylotrophic yeast P. pastoris is widely used as a heterologous expression platform for the industrial production of a series of valuable proteins due to its excellent characteristics, such as highly inducible gene expression, high-density cell growth, and high secretory capability . The N-glycosylation pathway in P. pastoris has been reengineered to produce heterologous pharmaceutical proteins with human-like N-glycan structures, which may further improve the importance of P. pastoris in the biopharmaceutical industry [2–4]. P. pastoris is also used as a model organism to study the protein expression machinery, such as protein folding and secretion [5–9]. Apart from applications in protein expression, P. pastoris could also be used to study the biogenesis and degradation of the peroxisome .
Although the P. pastoris expression system has been investigated in numerous studies and has been commercially available for many years, there is still little physiological or genetic information available. A draft genome sequence of P. pastoris is now commercially available, but the strict obligation to keep the sequence information confidential has hampered the publication of relevant data . Due to the lack of reported P. pastoris genome sequence and related DNA microarrays, alternative approaches such as heterologous hybridization of P. pastoris cDNA with Saccharomyces cerevisiae microarrays  and transcript analysis with the aid of affinity capture (TRAC)  have been exploited to study the P. pastoris transcriptome. The first DNA microarray for P. pastoris was produced using commercial sequence data (Integrated Genomics, Chicago, IL, USA), containing partial P. pastoris genes, and examined the unfolded protein response during protein production . Recently, the genome sequences of three P. pastoris strains (GS115 , DSMZ70382 , CBS7435 ) have become publicly available. Transcriptomics, proteomics, and metabolic flux analysis data for P. pastoris will benefit from this now-public sequence information.
Transcriptomics is a favored approach for analyzing mRNA regulation patterns, providing snapshots of various physiological conditions or developmental stages. Recently, a massively parallel mRNA sequencing platform (RNA-Seq), based on next-generation sequencing technology, was used to map and quantify the dynamic transcriptome. Although RNA-Seq is a recent technology, and is still in active development, it has clear advantages over existing hybridization-based or tag sequence-based approaches for transcriptome analysis . It offers key advantages, detecting known transcripts at single-nucleotide resolution, measuring gene expression levels over a larger dynamic range, discovering new transcripts within the intronic and intergenic regions, characterizing antisense transcription and RNA editing, and identifying alternative splicing events and gene fusion phenomena, as described in several reviews [16–18]. RNA-Seq data also show a high level of reproducibility, for both technical and biological replicates .
To date, RNA-Seq has been used successfully to define the transcription landscape on a genome-scale for over a dozen higher eukaryotic organisms, ranging from animals to plants . S. cerevisiae[20–22], Schizosaccharomyces pombe, Candida albicans, Aspergillus oryzae, Aspergillus flavus, and several prokaryotes [27–29] have also been subjected to transcriptome analysis using the RNA-Seq technology.
On an industrial scale, the majority of P. pastoris processes described so far use glycerol as the substrate for fast growth to obtain high cell densities and methanol as the substrate and inducer for heterologous protein production. In this research, we sequenced P. pastoris poly(A)-enriched mRNA from growth conditions using glycerol or methanol as a substrate. We investigated the complex transcriptome of P. pastoris on a genome-scale using the RNA-Seq technology. Our results determined the transcription landscape on a whole P. pastoris genome scale (99.21%) and transcriptional level for the majority of P. pastoris annotated genes (4914 of the total of 5313 protein-encoding genes), defined 27 novel transcripts, four new exons, and untranslated regions (UTRs) for more than 900 genes. Alternative upstream initiation codons (uATGs) and upstream open reading frames (uORFs) in the 5’-UTRs of many genes were also identified. Internal ribosome entry sites (IRESes) were identified within the UTRs of the genes encoding kinases and proteins involved in growth control. Regarding AS events in the P. pastoris transcriptome, retained introns (RIs) was determined to be the main AS event and were produced predominantly by an intron definition (ID) mechanism. Based on the RNA-Seq data, we have depicted the differential gene expression of P. pastoris between growth conditions using glycerol and methanol as substrates and enriched the transcriptomic data available for P. pastoris.
Results and discussion
Summary of RNA-Seq data
To provide an analysis of the P. pastoris transcriptome at single base-pair resolution, cDNA libraries were constructed from poly (A)-enriched mRNA of P. pastoris chemostat cultures and analyzed using high-throughput Illumina sequencing. A paired-end sequencing strategy was used in RNA sequencing, in which the read length was augmented to 75 base pairs (bp). RNA samples were prepared from the chemostat cultures of P. pastoris GS115 transformed with different expression vectors bearing a lipase gene from Rhizomucor miehei (RML), including a secretory expression plasmid, a surface display expression plasmid, and a plasmid without the RML gene (control) (Additional file 1: Figure S1). When reaching the steady state, cell dry weights of P. pastoris strains were determined and samples were retained to isolate the RNA. The cell dry weight of CKM, SEM, SDEM_replicate1, SDEM_replicate2, SDEG_replicate1, and SDEG_replicate2 was 7.5, 7.3, 6.7, 6.9, 11.5 and 11.2 g/L, respectively.
Extensive coverage of the complete P. pastoris genome was detected in the RNA-Seq data (Figure 1C). Of the genome, 99.21% was expressed as RNA-Seq reads (Additional file 2: Table S1). In the BioinformaticsGent Online Genome Annotation System (BOGAS), in total, 5313 protein-encoding genes were annotated in the 9.43Mbp genomic sequence of P. pastoris GS115 strain. Of all the annotated genes, > 93.5% were detected with >90% sequence coverage (Additional file 2: Table S2). However, using S. pombe, > 122 million reads with a length of 39 bp from six samples under different growth conditions, it was possible to detect transcription for virtually all of the annotated genes with >50% sequence coverage . In this study, 152.7 million reads with a length of 75 bp provided a much more elaborate transcriptional status for P. pastoris annotated genes, detecting genes with lower expression levels, providing a powerful tool to dissect the detailed structure of the P. pastoris transcriptome at the single-base pair resolution.
To quantify gene expression levels by RNA-Seq data, the number of reads per kilobase of exon region per million mapped reads (RPKM) was calculated. The intronic and intergenic expression levels were only marginally lower than the expression level of exons (Figure 1B) and the mean expression level of novel transcripts was remarkably lower than these of intronic and intergenic regions. Overlapping transcriptional region in intergenic regions and antisense transcripts in intronic regions resulted in the higher expression level of intronic and intergenic regions. When an overlapping region occurs between two transcripts with opposite transcriptional orientation, its expression level was artificially elevated because reads mapping to both strands were all accounted. This becomes an issue especially in organisms with small genomes, where transcripts are densely packed. There were totally 1290 overlapping transcripts detected in the RNA-seq data (Additional file 2: Table S3). Boxplot analysis revealed that the expression level of overlapping transcripts was indeed much higher than their flanking regions (Additional file 1: Figure S2). Of the 5313 protein-coding genes in the P. pastoris genome database, 4970 were expressed as >1 RPKM with >90% sequence coverage in the RNA-Seq data (Additional file 2: Table S2). The control sample (CKM) without heterologous gene expression under conditions similar to P. pastoris natural growth conditions was used to depict its gene expression model (Additional file 1: Figure S3). Of all the annotated genes, 1204 genes were expressed greater than the 75thpercentile (> 114.29 RPKM). The 6010 Go terms of the P. pastoris 2544 genes were downloaded from the BoinformaticsGent Online Genome Annotation Service (BOGAS) to perform Gene ontology (GO) analysis. Gene ontology (GO) analysis revealed that genes involved in the protein production system (structural constituent of the ribosome, proteasome, and translational elongation) and energy production system (ATP synthesis coupled proton transport, glycolysis, and mitochondrion) were specifically enriched in this highly expressed gene group (false discovery rate (FDR) <0.05). For expression levels ranging from the 25th to 75th percentiles (33.13-114.29 RPKM), 2407 genes were detected, where no highly enriched gene group was identified, according to GO functional enrichment analysis. The genes enriched in nucleic acid metabolic processes were expressed lower than the 25th percentile. That is, the RNA-Seq data and related GO functional enrichment analysis clearly demonstrated that the expression levels of the genes involved in protein production and energy metabolism were high while that of the genes involved in nucleotide metabolism and genetic message transfer were low under CKM growth conditions, conforming to the physiological and metabolic status of P. pastoris under methanol-induced conditions.
Novel annotation of P. pastoris transcriptome using RNA-Seq data
UTR annotation for P. pastoris genes
IRESes in the 5’-UTRs of P. pastoris genes
Translation initiation in the vast majority of cellular mRNAs is mediated by a cap-dependent mechanism. However, under many cellular conditions (mitosis, apoptosis, cellular stresses of hypoxia and heat shock, and viral infection) when cap-dependent translation initiation is compromised, protein synthesis is mediated via an alternative initiation pathway, the internal ribosome entry site (IRES) dependent translation initiation . IRES elements have been found in mRNA from viruses, mammals, vertebrates, and yeasts . However, IRES elements have not previously been demonstrated in P. pastoris, a widely used protein expression platform, because of a lack of UTR information. Within the database of published IRESes (IRESdb), cellular mRNAs-encoded IRESes are classified according to the gene product function. The IRES-containing mRNAs are always transcripted by the genes encoding transcription factors, translation factors, chaperones, kinases, and proteins involved in growth control [32, 33]. Of the 5’-UTRs from 914 genes annotated in the present study, the 5’-UTRs of the protein kinase PAS_chr3_0850 (GCN2) and PAS_chr2-1_0459 (KOG1), involved in growth control, were selected and monitored for IRES activity.
Novel transcripts and exons in the P. pastoris genome
Cataloging of Alternative Splicing in P. pastoris under different growth conditions
We performed GO functional enrichment analysis of P. pastoris alternatively spliced genes. Of all the alternatively spliced genes, 135 (53.36%) were annotated by GO enrichment analysis via Fisher’s exact test (FDR < 0.05). Regarding molecular function, genes producing proteins with ion transmembrane transporter activity and ion-transporting ATPase activity have a high frequency of AS events. Cellular component analysis revealed that genes encoding macromolecular complexes and protein complexes were specifically enriched in AS events. Biological process analysis showed that AS events occurred mainly in the processes of macromolecular complex assembly and cellular component organization.
RI is the dominant AS variant in fungi with great statistical significance (p < 1e-10 by Fisher’s exact test) . Consistent with this, RI was detected as the main AS event in our P. pastoris RNA-Seq data (Figure 5B), accounting for 97.04% of all AS isoforms. In our study, RNA samples were subjected to DNase treatment to eliminate any possible contamination with genomic DNA. Therefore, RI events in our data should not be interfered with by intronic genome sequences. RI events were further validated by qPCR. The results illustrated that the retained intron of the PAS_chr1-1_0445 gene exhibited a transcriptional level equivalent to its annotated exon while the canonical intron had a relatively low transcriptional level (Additional file 1: Figure S5). AS events can be divided into four categories: RI, cassette exon (CE, including SE, alternative first exons (AFE), alternative last exons (ALE), and mutually exclusive exons (MXE)), A5SS, and A3SS. The ratio CE/(RI + CE) is usually used to summarize the pattern of the splice variants . This ratio in P. pastor is was only 0.76% (Figure 5B), comparable to that of five formerly investigated fungi . Further, it has been suggested that RI was the main AS type in P. pastoris. It has been reported that ID, as one of the splice site recognition mechanisms in eukaryotes, was prone to produce RI variants and the retained introns were usually shorter than the constitutive introns . The average length of the P. pastoris retained introns was 89 bp, shorter than the canonical introns (99 bp) in P. pastoris. Thus, P. pastoris may perform splice site recognition predominantly by the ID mechanism.
Analysis of differentially expressed genes in P. pastoris cultivated with glycerol and methanol
Compared with the SDEG culture, in the SDEM culture, GO analysis and KEGG metabolic pathway analysis revealed that up-regulated genes were specifically located in proteasome, autophagy, phagosome, thiamine metabolism, N-glycan biosynthesis, methane metabolism, protein processing in the endoplasmic reticulum, and protein export pathways (Additional file 1: Figure S6). Genes involved in the biosynthesis of unsaturated fatty acids and the citrate cycle were down-regulated (Figure 6B and Additional file 2: Table S10). These results described the metabolic characteristics of heterologous protein production of P. pastoris with methanol induction. Carbon shift from glycerol to methanol is an important process in heterologous protein expression. The genes involved in the methanol utilization pathway were markedly up-regulated under SDEM conditions, similar to the adaptation of Hansenula polymorpha to methanol . Alcohol oxidases (AOX, including AOX1 and AOX2) were up-regulated by 39.8-fold and 11.3-fold, respectively; they catalyze the methanol to formaldehyde and hydrogen peroxide reaction as the first step of the methanol utilization pathway. Other related genes, such as catalase (CAT), formaldehyde dehydrogenase (FLD), and S-formyl glutathione hydrolase (FGH), were also highly expressed (see supplementary data). For glycerol metabolism, P. pastoris homologs of S. cerevisiae putative passive glycerol channel (YFL054C) and the active glycerol importer (STL1) decreased significantly at the transcriptional level after the carbon shift from glycerol to methanol. Cytosolic glycerol kinase (GUT1) was also down-regulated because of the lack of glycerol supply.
Another significantly up-regulated gene group was vitamin metabolism. As an important vitamin, thiamine can be synthesized from hydroxylmethylpyrimidine pyrophosphate and hydroxyethylthiazole phosphate by TMP synthase. All thiamine-regulated genes in P. pastoris were induced on the carbon shift from glycerol to methanol (Additional file 2: Table S10), which was consistent with the circumstances in the batch fermentation at pH5 or pH3 . The up-regulation of these genes may be due to the activation of TPP riboswitches .
P. pastoris needs many more peroxisomes to resist the toxicity of methanol when methanol is the sole carbon source. In the SDEM culture, almost all of the genes controlling the development and the function of the peroxisome were up-regulated compared with the SDEG culture (Additional file 2: Table S10). Peroxisome proteins are encoded by nuclear genes, synthesized in the cytoplasm, and then imported into peroxisomes. Two peroxisomal targeting signal genes (PTS1 and PTS2) are essential for sorting proteins to this organelle. PEX5 and PEX7 encode receptors for PTS1 and PTS2, respectively . The expression level of PEX5 was much higher than that of PEX7, consistent with only a limited number of peroxisomal matrix proteins containing the PTS2 receptor . Both PEX1 and PEX6, which act in the terminal step of the peroxisomal matrix protein import  and participate in the recycling of peroxisomal signal receptors, were up-regulated, indicating that the requirement for importing matrix proteins to peroxisomes was strong. The most up-regulated peroxisome gene was PEX11 (11.96-fold), which is implicated in regulating peroxisome proliferation. The up-regulation of PEX11 was consistent with previous findings in baker’s yeast  and H. polymorpha cultivated with peroxisome-inducing carbon sources. It is thought that PEX11 also participates in the β-oxidation of fatty acids . PEX11, together with the genes related to β-oxidation of fatty acids, is induced in H. polymorpha by the carbon shift from glucose to methanol , but their expression levels are down-regulated in P. pastoris under hypoxic conditions . Surprisingly, PEX11 was induced while the genes related to β-oxidation of fatty acids (POX1, FOX2, POT1, SPS19, FAA2, PXA2) were down-regulated in our RNA-Seq data under methanol conditions. The reason for this difference remains unclear and needs further investigation.
In addition to glycerol metabolism, transcriptional regulation was the main biological process enriched in the down-regulated genes (>1.5-fold) when methanol was the sole carbon source. Down-regulation of genes encoding transcription factors (RTG1, PHO2, CST6, HAA1, and GAT1) was detected. It has been reported that the lower specific growth rate resulted in the down-regulation of genes related to ribosomal synthesis when the carbon source is changed from glycerol to methanol in recombinant P. pastoris. Analogously, reduced transcriptional regulation would be due to the lower growth rate under methanol conditions in H. polymorpha. A lower growth rate could not explain the reduced transcriptional regulation in this study because of the chemostat culture, suggesting that P. pastoris may be prone to reduce the transcription of some less important genes to supply enough energy and substrates for expression of the heterologous gene (here, RML).
With high resolution and unprecedented throughput (~1200 P. pastoris genomes), the global transcriptome of P. pastoris was accurately annotated under the cells growth with glycerol as the substrate and the protein expression using methanol as an inducer and substrate. Internal ribosome entry sites were identified in the 5’-UTRs of KOG1 and GCN2, indicating that the IRES-dependent translation initiation also exists in P. pastoris. Differential expression analysis provided rich information for further in-depth studies of the translation initiation mechanism, protein expression and secretion mechanisms in P. pastoris. These results will be very useful in the further bioprocess engineering and optimization of induced conditions of P. pastoris.
Materials and methods
Strains and vectors
Strains and plasmids used for RNA-Seq
Abbreviation of samples
Non-expression of heterlogous protein
Secretory expression of RML gene
Surface display expression of RML gene
Chemostat cultivation was used throughout this study, providing constant environmental conditions to ensure that the yeast cultures for transcriptome analysis would be reproducible, reliable, and biologically homogeneous. To obtain seed cultures for chemostat cultivation, single colonies on fresh YPD plates (10 g/L yeast extract, 20 g/L peptone, 20 g/L glucose, 20 g/L agar) were inoculated directly into 250 mL shake flasks containing 100 mL YPD liquid culture medium. After cultivation at 28°C for approximately 24 h with agitation (250 rpm), seed cultures were used to inoculate the starting volume (1 L) of the bioreactor to an optical density (OD600) of 0.5. The chemostat cultivations, consisting of batch cell growth (batch) and continuous culture with glycerol or methanol feeding, were performed in a 2.0 L working volume bioreactor under computer-based process control.
Batch cultivations were performed in basal salt medium, containing (per L): 26.7 mL H3PO4 (85%), 0.93 g CaSO4, 18.2 g K2SO4, 14.9 g MgSO4·7H2O, 4.13 g KOH, 40.0 g glycerol, 0.5 mL antifoam, and 4.35 mL trace salt stock solution. Trace salt stock solution contained (per L): 6.0 g CuSO4·5H2O, 0.08 g NaI, 3.0 g MnSO4·H2O, 0.2 g Na2MoO4·2H2O, 0.02 g H3BO3, 0.5 g CoCl2, 20.0 g ZnCl2, 65.0 g FeSO4·7H2O, 0.2 g biotin, and 5.0 mL of H2SO4. The cultivation temperature was 28°C with a pH of 5.0, controlled with 25% ammonium hydroxide. The dissolved oxygen concentration was maintained above 20% saturation by altering the stirrer speed between 200 and 1200 rpm, while the airflow was kept constant at 2.0 vvm.
After approximately 24 h, the batch finished and the continuous culture started at a dilution rate of D = 0.04 h-1. The chemostat medium contained (per L): 13.4 mL H3PO4 (85%), 0.47 g CaSO4, 9.1 g K2SO4, 7.5 g MgSO4·7H2O, 2.07 g KOH, 20.0 g methanol or glycerol, 0.5 mL antifoam, and 4.35 mL trace salt stock solution. The continuous culture was performed for at least five resident times to reach steady-state conditions.
For Illumina sequencing, total RNA was extracted from each sample using the hot acidic phenol method and then treated with RNase-free DNase I (TaKaRa) for 45 min according to the manufacturer’s protocol. The integrity of the total RNA was analyzed using an Agilent Technologies 2100 Bioanalyzer, and all six samples were shown to have an RNA Integrity Number (RIN) value > 8. The poly(A)-enriched mRNA was purified using Sera-mag Magnetic Oligo(dT) Beads (Illumina) from 20 mg total RNA method described by Wang et al . The cDNA library products were sequenced using the 1 G Illumina Genome Analyzer at the Beijing Genomics Institute, Shenzhen. Two biological replicates of SDEM and SDEG were analyzed. The raw sequencing data are deposited in SRA  at NCBI with accession number SRA048068.
Read mapping in the reference genome and genes
The P. pastoris GS115 genome and gene information were downloaded from the P. pastoris genome database . After removing the reads containing sequencing adapters and reads of low quality (reads containing Ns > 4), the remaining reads were aligned to the P. pastoris genome using SOAP2 , allowing up to 5-base mismatches. The reads that failed to be mapped were progressively trimmed off one base at a time from the 3’-end and mapped to the genome again until a match was observed (unless the read had been trimmed to less than 40 bases). For paired-end reads, the insert between paired reads was set from 1 base to 10 kilobases, allowing them to span introns of varying sizes in the genome. The same strategy was used for aligning paired-end reads to the non-redundant gene, except the insert was changed from 1 base to 1 kilobases.
Normalized gene expression using RNA-Seq
The gene expression level was normalized by the number of reads per kilobase of exon region per million mapped reads (RPKM) . The cutoff value for determining gene transcriptional activity was determined based on a 95% confidence interval for all RPKM values of each gene. That is to say, the genes with an expression level in the lowest 5% were discarded.
A novel transcriptional active region (TAR) was determined in the intergenic regions by contiguous expression with each base supported by at least two uniquely mapped reads and a length > 35 bp. Supported by paired-end information, a novel transcript unit (TU) was constructed by connectivity between novel TARs that were joined by at least three paired reads. Novel TUs with length < 150 bp or average expression of < 2 reads per base were excluded from further analysis. Furthermore, the sequences of the novel transcripts were analyzed by ORF Finder in NCBI.
To define the UTRs of the annotated genes, we searched for a break in the transcribed region around the genes, denoted by positions with read counts of 0, starting from either end of genes. Genes where ends overlapped with other genes were excluded from the analysis. If no break was found, the UTR was discarded.
Upstream open reading frames (uORFs) and alternative upstream initiation codons (uATGs) were screened in the 5’-UTRs using computational methods. Any uORF of length > 150 bp or with a distance between the uORF and the gene start codon > 500 bp was excluded from the results. The length between the uATG and the gene’s coding region should be < 150 bp, uATGs must be in-frame with the gene’s coding region, and there should be no stop codon in the uATG sequence.
Analysis of IRES activity in 5’-UTR
Strains and plasmids constructed for IRES analysis
UTR was amplified from KOG1 or GCN2 gene
Positive control of the expression of LacZ
Northern blot analysis
P. pastoris strains X33/pRml-LacZ, X33/pRml-GCN2-LacZ, X33/pRml-KOG1-LacZ, and X33/pLacZ were cultivated in YPD (containing 1 g/L yeast extract, 2 g/L peptone, 2 g/L dextrose) for 48 h. Total RNA was isolated using the hot acidic phenol method and then denatured in a mixture of 50% formamide (Sigma), 17 mM morpholinepropane sulfonic acid (MOPS) and 2.2 M formaldehyde for 15 min at 65°C and chilled on ice. RNA was separated by electrophoresis on a 1% agarose gel containing 2.2 M formaldehyde and 20 mM MOPS and transferred onto a nylon membrane (Hybond-N, Amersham), on which they were cross-linked by UV. The preparation of the LacZ probe and subsequent hybridizations were performed using Express Hyb (Clontech) according to the manufacturer’s protocol.
Gene ontology analysis
We obtained the GO terms of each P. pastoris gene with the Blast2GO software (ver. 2.4.9)  using default parameters. Blast2GO was also used for GO functional enrichment analysis of certain genes, performing Fisher’s exact test with a robust false discovery rate (FDR) correction to obtain an adjusted p-value between certain test gene groups and the whole annotation.
Analysis of alternative splicing (AS) events
To identify all potential splice sites, we searched for the putative donor site (‘GT’ on the forward strand or ‘AC’ on the reverse strand) and the acceptor site (‘AG’ on the forward strand or ‘CT’ on the reverse strand) inside intronic regions and required that a novel splice site was to be supported by at least one uniquely mapped read. We then enumerated all the possible pairs of donor sites and acceptor sites to detect potential splice junctions.
To determine the number of reads supporting each splice junction, all reads that could not be matched to the genome and that could be matched by trimming off several terminal bases were retrieved and aligned against the junction sequence with a tolerance of 2 bp mismatches. A proposed junction site was required to be supported by at least two unambiguous mapped reads with different match positions within the splice junction region and also with a minimum match of five bases on each side of the junction (such reads were called “trans-reads”).
To identify AS events in P. pastoris, the junction sequences identified were used to analyze seven types of AS events including SE, RI, A5SS, A3SS, MXE, AFE and ALE according to the method of Wang .
Analysis of differential gene expression under heterologous protein production conditions
Two biological replicates of SDEG and SDEM were used to analyze differentially expressed genes under culture conditions as in the description above, using the ‘R’ package (DEseq) , based on the read count of each gene under methanol or glycerol conditions. GO functional enrichment analysis was performed using the Blast2GO software and KEGG pathway analysis was performed using the Cytoscape software (ver. 2.6.2) .
Validation of RNA-Seq analysis
Novel transcripts and new exons were validated by RT-PCR with primers designed in the exon regions (Additional file 2: Table S11). To detect different AS isoforms, real-time quantitative PCR (qPCR) was performed using primers designed inside exons or introns (Additional file 2: Table S11). The P. pastoris samples were the same as those for the Illumina sequencing. After extraction using the hot acidic phenol method, total RNA was treated with RNase-free DNase I (TaKaRa) for 45 min according to the manufacturer’s protocol. Subsequently, first-strand cDNA synthesis and qPCR were performed using the PrimeScript™ RT-PCR Kit (TaKaRa) and the SYBR Premix Ex Taq II™ kit (TaKaRa). For the negative control, the reverse transcriptase was omitted in the cDNA synthesis reaction with the DNase-treated total RNA as the template. No PCR product was obtained using the negative control as the template, suggesting that the genomic DNA was removed completed.
massively parallel mRNA sequencing
alternative upstream initiation codon
upstream open reading frame
internal ribosome entry site
transcript analysis with the aid of affinity capture
Rhizomucor miehei lipase
reads per kilobase of exon region per million mapped reads
false discovery rate
alternative first exon
alternative last exon
alternative 5’-splice site
alternative 3’-splice site
S-formyl glutathione hydrolase
putative passive glycerol channel
active glycerol importer
cytosolic glycerol kinase
RNA integrity number
transcriptional active region
real-time quantitative PCR.
This work was supported by National Natural Science Foundation of China (No. 30670053) and the Ministry of Science and Technology of the People's Republic of China (National “863” Project No. 2006AA020203 and 2011AA100905).
- Daly R, Hearn MTW: Expression of heterologous proteins in Pichia pastoris: a useful experimental tool in protein engineering and production. J Mol Recognit. 2005, 18 (2): 119-138. 10.1002/jmr.687.View ArticlePubMedGoogle Scholar
- Hamilton SR, Gerngross TU: Glycosylation engineering in yeast: the advent of fully humanized yeast. Curr Opin Biotechnol. 2007, 18 (5): 387-392. 10.1016/j.copbio.2007.09.001.View ArticlePubMedGoogle Scholar
- Hamilton SR, Davidson RC, Sethuraman N, Nett JH, Jiang Y, Rios S, Bobrowicz P, Stadheim TA, Li H, Choi BK, et al: Humanization of yeast to produce complex terminally sialylated glycoproteins. Science. 2006, 313 (5792): 1441-1443. 10.1126/science.1130256.View ArticlePubMedGoogle Scholar
- Jacobs PP, Geysens S, Vervecken W, Contreras R, Callewaert N: Engineering complex-type N-glycosylation in Pichia pastoris using GlycoSwitch technology. Nat Protoc. 2009, 4 (1): 58-70.View ArticlePubMedGoogle Scholar
- Gasser B, Maurer M, Rautio J, Sauer M, Bhattacharyya A, Saloheimo M, Penttilä M, Mattanovich D: Monitoring of transcriptional regulation in Pichia pastoris under protein production conditions. BMC Genomics. 2007, 8 (1): 179-10.1186/1471-2164-8-179.PubMed CentralView ArticlePubMedGoogle Scholar
- Graf A, Gasser B, Dragosits M, Sauer M, Leparc GG, Tüchler T, Kreil DP, Mattanovich D: Novel insights into the unfolded protein response using Pichia pastoris specific DNA microarrays. BMC Genomics. 2008, 9 (1): 390-10.1186/1471-2164-9-390.PubMed CentralView ArticlePubMedGoogle Scholar
- Dragosits M, Stadlmann J, Albiol J, Baumann K, Maurer M, Gasser B, Sauer M, Altmann F, Ferrer P, Mattanovich D: The effect of temperature on the proteome of recombinant Pichia pastoris. J Proteome Res. 2009, 8 (3): 1380-1392. 10.1021/pr8007623.View ArticlePubMedGoogle Scholar
- Baumann K, Carnicer M, Dragosits M, Graf AB, Stadlmann J, Jouhten P, Maaheimo H, Gasser B, Albiol J, Mattanovich D, et al: A multi-level study of recombinant Pichia pastoris in different oxygen conditions. BMC Syst Biol. 2010, 4: 141-10.1186/1752-0509-4-141.PubMed CentralView ArticlePubMedGoogle Scholar
- Dragosits M, Stadlmann J, Graf A, Gasser B, Maurer M, Sauer M, Kreil D, Altmann F, Mattanovich D: The response to unfolded protein is involved in osmotolerance of Pichia pastoris. BMC Genomics. 2010, 11: 207-10.1186/1471-2164-11-207.PubMed CentralView ArticlePubMedGoogle Scholar
- Yurimoto H, Oku M, Sakai Y: Yeast Methylotrophy: Metabolism, Gene Regulation and Peroxisome Homeostasis. Int J Microbiol. 2011, 2011: 101298-PubMed CentralView ArticlePubMedGoogle Scholar
- Integrated Genomics: ERGO bioinformatics suite. [http://ergo.integratedgenomics.com/ERGO/].
- Sauer M, Branduardi P, Gasser B, Valli M, Maurer M, Porro D, Mattanovich D: Differential gene expression in recombinant Pichia pastoris analysed by heterologous DNA microarray hybridisation. Microb Cell Fact. 2004, 3 (1): 17-10.1186/1475-2859-3-17.PubMed CentralView ArticlePubMedGoogle Scholar
- De Schutter K, Lin YC, Tiels P, Van Hecke A, Glinka S, Weber-Lehmann J, Rouzé P, Van de Peer Y, Callewaert N: Genome sequence of the recombinant protein production host Pichia pastoris. Nat Biotechnol. 2009, 27 (6): 561-566. 10.1038/nbt.1544.View ArticlePubMedGoogle Scholar
- Mattanovich D, Graf A, Stadlmann J, Dragosits M, Redl A, Maurer M, Kleinheinz M, Sauer M, Altmann F, Gasser B: Genome, secretome and glucose transport highlight unique features of the protein production host Pichia pastoris. Microb Cell Fact. 2009, 8 (1): 29-10.1186/1475-2859-8-29.PubMed CentralView ArticlePubMedGoogle Scholar
- Kuberl A, Schneider J, Thallinger GG, Anderl I, Wibberg D, Hajek T, Jaenicke S, Brinkrolf K, Goesmann A, Szczepanowski R, et al: High-quality genome sequence of Pichia pastoris CBS7435. J Biotechnol. 2011, 154 (4): 312-320. 10.1016/j.jbiotec.2011.04.014.View ArticlePubMedGoogle Scholar
- Wang Z, Gerstein M, Snyder M: RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet. 2009, 10 (1): 57-63. 10.1038/nrg2484.PubMed CentralView ArticlePubMedGoogle Scholar
- Costa V, Angelini C, De Feis I, Ciccodicola A: Uncovering the complexity of transcriptomes with RNA-Seq. J Biomed Biotechnol. 2010, 2010: 853916-PubMed CentralView ArticlePubMedGoogle Scholar
- Marguerat S, Bahler J: RNA-seq: from technology to biology. Cell Mol Life Sci. 2010, 67 (4): 569-579. 10.1007/s00018-009-0180-6.PubMed CentralView ArticlePubMedGoogle Scholar
- Marioni JC, Mason CE, Mane SM, Stephens M, Gilad Y: RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome Res. 2008, 18 (9): 1509-1517. 10.1101/gr.079558.108.PubMed CentralView ArticlePubMedGoogle Scholar
- Lee A, Hansen KD, Bullard J, Dudoit S, Sherlock G: Novel low abundance and transient rnas in yeast revealed by tiling microarrays and ultra high–throughput sequencing are not conserved across closely related yeast species. PLoS Genet. 2008, 4 (12): e1000299-10.1371/journal.pgen.1000299.PubMed CentralView ArticlePubMedGoogle Scholar
- Nagalakshmi U, Wang Z, Waern K, Shou C, Raha D, Gerstein M, Snyder M: The transcriptional landscape of the yeast genome defined by RNA sequencing. Science. 2008, 320 (5881): 1344-1349. 10.1126/science.1158441.PubMed CentralView ArticlePubMedGoogle Scholar
- Yassour M, Kaplan T, Fraser HB, Levin JZ, Pfiffner J, Adiconis X, Schroth G, Luo S, Khrebtukova I, Gnirke A, et al: Ab initio construction of a eukaryotic transcriptome by massively parallel mRNA sequencing. Proc Natl Acad Sci U S A. 2009, 106 (9): 3264-3269. 10.1073/pnas.0812841106.PubMed CentralView ArticlePubMedGoogle Scholar
- Wilhelm BT, Marguerat S, Watt S, Schubert F, Wood V, Goodhead I, Penkett CJ, Rogers J, Bahler J: Dynamic repertoire of a eukaryotic transcriptome surveyed at single-nucleotide resolution. Nature. 2008, 453 (7199): 1239-1243. 10.1038/nature07002.View ArticlePubMedGoogle Scholar
- Bruno VM, Wang Z, Marjani SL, Euskirchen GM, Martin J, Sherlock G, Snyder M: Comprehensive annotation of the transcriptome of the human fungal pathogen Candida albicans using RNA-seq. Genome Res. 2010, 20 (10): 1451-1458. 10.1101/gr.109553.110.PubMed CentralView ArticlePubMedGoogle Scholar
- Wang B, Guo G, Wang C, Lin Y, Wang X, Zhao M, Guo Y, He M, Zhang Y, Pan L: Survey of the transcriptome of Aspergillus oryzae via massively parallel mRNA sequencing. Nucleic Acids Res. 2010, 38 (15): 5075-5087. 10.1093/nar/gkq256.PubMed CentralView ArticlePubMedGoogle Scholar
- Yu J, Fedorova ND, Montalbano BG, Bhatnagar D, Cleveland TE, Bennett JW, Nierman WC: Tight control of mycotoxin biosynthesis gene expression in Aspergillus flavus by temperature as revealed by RNA‐Seq. FEMS Microbiol Lett. 2011, 322 (2): 145-149. 10.1111/j.1574-6968.2011.02345.x.View ArticlePubMedGoogle Scholar
- Yoder-Himes DR, Chain PSG, Zhu Y, Wurtzel O, Rubin EM, Tiedje JM, Sorek R: Mapping the Burkholderia cenocepacia niche response via high-throughput sequencing. Proc Natl Acad Sci U S A. 2009, 106 (10): 3976-3981. 10.1073/pnas.0813403106.PubMed CentralView ArticlePubMedGoogle Scholar
- Perkins TT, Kingsley RA, Fookes MC, Gardner PP, James KD, Yu L, Assefa SA, He M, Croucher NJ, Pickard DJ, et al: A strand-specific RNA-Seq analysis of the transcriptome of the typhoid bacillus Salmonella Typhi. PLoS Genet. 2009, 5 (7): e1000569-10.1371/journal.pgen.1000569.PubMed CentralView ArticlePubMedGoogle Scholar
- Passalacqua KD, Varadarajan A, Ondov BD, Okou DT, Zwick ME, Bergman NH: Structure and complexity of a bacterial transcriptome. J Bacteriol. 2009, 191 (10): 3203-3211. 10.1128/JB.00122-09.PubMed CentralView ArticlePubMedGoogle Scholar
- Li H, Ruan J, Durbin R: Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res. 2008, 18 (11): 1851-1858. 10.1101/gr.078212.108.PubMed CentralView ArticlePubMedGoogle Scholar
- Spriggs KA, Stoneley M, Bushell M, Willis AE: Re-programming of translation following cell stress allows IRES-mediated translation to predominate. Biol Cell. 2008, 100: 27-38. 10.1042/BC20070098.View ArticlePubMedGoogle Scholar
- Baird SD, Turcotte M, Korneluk RG, Holcik M: Searching for IRES. RNA. 2006, 12 (10): 1755-1785. 10.1261/rna.157806.PubMed CentralView ArticlePubMedGoogle Scholar
- Bonnal S, Boutonnet C, Prado-Lourenço L, Vagner S: IRESdb: the internal ribosome entry site database. Nucleic Acids Res. 2003, 31 (1): 427-428. 10.1093/nar/gkg003.PubMed CentralView ArticlePubMedGoogle Scholar
- Spingola M, Grate L, Haussler D, Ares M: Genome-wide bioinformatic and molecular analysis of introns in Saccharomyces cerevisiae. RNA. 1999, 5 (2): 221-234. 10.1017/S1355838299981682.PubMed CentralView ArticlePubMedGoogle Scholar
- McGuire AM, Pearson MD, Neafsey DE, Galagan JE: Cross-kingdom patterns of alternative splicing and splice recognition. Genome Biol. 2008, 9 (3): R50-10.1186/gb-2008-9-3-r50.PubMed CentralView ArticlePubMedGoogle Scholar
- Hirschman JE, Balakrishnan R, Christie KR, Costanzo MC, Dwight SS, Engel SR, Fisk DG, Hong EL, Livstone MS, Nash R, et al: Genome Snapshot: a new resource at the Saccharomyces Genome Database (SGD) presenting an overview of the Saccharomyces cerevisiae genome. Nucleic Acids Res. 2006, 34 (suppl 1): D442-D445.PubMed CentralView ArticlePubMedGoogle Scholar
- Anders S, Huber W: Differential expression analysis for sequence count data. Genome Biol. 2010, 11 (10): R106-10.1186/gb-2010-11-10-r106.PubMed CentralView ArticlePubMedGoogle Scholar
- van Zutphen T, Baerends RJ, Susanna KA, de Jong A, Kuipers OP, Veenhuis M, van der Klei IJ: Adaptation of Hansenula polymorpha to methanol: a transcriptome analysis. BMC Genomics. 2010, 11 (1): 1-10.1186/1471-2164-11-1.PubMed CentralView ArticlePubMedGoogle Scholar
- Croft MT, Moulin M, Webb ME, Smith AG: Thiamine biosynthesis in algae is regulated by riboswitches. Proc Natl Acad Sci U S A. 2007, 104 (52): 20770-20775. 10.1073/pnas.0705786105.PubMed CentralView ArticlePubMedGoogle Scholar
- Veenhuis M, Salomons FA, Van Der Klei IJ: Peroxisome biogenesis and degradation in yeast: a structure/function analysis. Microsc Res Tech. 2000, 51 (6): 584-600. 10.1002/1097-0029(20001215)51:6<584::AID-JEMT8>3.0.CO;2-W.View ArticlePubMedGoogle Scholar
- Collins CS, Kalish JE, Morrell JC, McCaffery JM, Gould SJ: The peroxisome biogenesis factors Pex4p, Pex22p, Pex1p, and Pex6p act in the terminal steps of peroxisomal matrix protein import. Mol Cell Biol. 2000, 20 (20): 7516-7526. 10.1128/MCB.20.20.7516-7526.2000.PubMed CentralView ArticlePubMedGoogle Scholar
- Kal AJ, van Zonneveld AJ, Benes V, van Den Berg M, Koerkamp MG, Albermann K, Strack N, Ruijter JM, Richter A, Dujon B, et al: Dynamics of gene expression revealed by comparison of serial analysis of gene expression transcript profiles from yeast grown on two different carbon sources. Mol Biol Cell. 1999, 10 (6): 1859-1872.PubMed CentralView ArticlePubMedGoogle Scholar
- van Roermund CWT, Tabak HF, van den Berg M, Wanders RJA, Hettema EH: Pex11p plays a primary role in medium-chain fatty acid oxidation, a process that affects peroxisome number and size in Saccharomyces cerevisiae. J Cell Biol. 2000, 150 (3): 489-498. 10.1083/jcb.150.3.489.PubMed CentralView ArticlePubMedGoogle Scholar
- Leinonen R, Sugawara H, Shumway M: The sequence read archive. Nucleic acids Res. 2011, 39 (suppl 1): D19-D21.PubMed CentralView ArticlePubMedGoogle Scholar
- Mattanovich D, Callewaert N, Rouzé P, Lin YC, Graf A, Redl A, Tiels P, Gasser B, De Schutter K: Open access to sequence: browsing the Pichia pastoris genome. Microb Cell Fact. 2009, 8: 53-10.1186/1475-2859-8-53.PubMed CentralView ArticlePubMedGoogle Scholar
- Li R, Yu C, Li Y, Lam TW, Yiu SM, Kristiansen K, Wang J: SOAP2: an improved ultrafast tool for short read alignment. Bioinformatics. 2009, 25 (15): 1966-1967. 10.1093/bioinformatics/btp336.View ArticlePubMedGoogle Scholar
- Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B: Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 2008, 5 (7): 621-628. 10.1038/nmeth.1226.View ArticlePubMedGoogle Scholar
- Conesa A, Gotz S, Garcia-Gomez JM, Terol J, Talon M, Robles M: Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics. 2005, 21 (18): 3674-3676. 10.1093/bioinformatics/bti610.View ArticlePubMedGoogle Scholar
- Wang ET, Sandberg R, Luo S, Khrebtukova I, Zhang L, Mayr C, Kingsmore SF, Schroth GP, Burge CB: Alternative isoform regulation in human tissue transcriptomes. Nature. 2008, 456 (7221): 470-476. 10.1038/nature07509.PubMed CentralView ArticlePubMedGoogle Scholar
- Bindea G, Mlecnik B, Hackl H, Charoentong P, Tosolini M, Kirilovsky A, Fridman WH, Trajanoski Z, Galon J: ClueGO: a Cytoscape plug-in to decipher functionally grouped gene ontology and pathway annotation networks. Bioinformatics. 2009, 25 (8): 1091-1093. 10.1093/bioinformatics/btp101.PubMed CentralView ArticlePubMedGoogle Scholar