Improved annotation through genome-scale metabolic modeling of Aspergillus oryzae
© Vongsangnak et al; licensee BioMed Central Ltd. 2008
Received: 05 April 2008
Accepted: 23 May 2008
Published: 23 May 2008
Since ancient times the filamentous fungus Aspergillus oryzae has been used in the fermentation industry for the production of fermented sauces and industrial enzymes. Recently, the genome sequence of A. oryzae with 12,074 annotated genes was released, but hypothetical proteins accounted for more than 50% of the annotated genes. Considering the industrial importance of this fungus, it is therefore valuable to improve the annotation and to further integrate genomic information with the biochemical and physiological information available for this microorganism and other related fungi. Here we improved gene prediction by constructing, sequencing and assembling an A. oryzae Expressed Sequence Tag (EST) library. We enhanced function assignment using an annotation strategy that we developed. The resulting improved annotation was used to reconstruct the metabolic network, leading to a genome-scale metabolic model of A. oryzae.
Using our assembled EST sequences we identified 1,046 newly predicted genes in the A. oryzae genome. Furthermore, it was possible to assign putative protein functions to 398 of the newly predicted genes. Notably, our annotation strategy resulted in the assignment of new putative functions to 1,469 hypothetical proteins already present in the A. oryzae genome database. Using the substantially improved annotated genome we reconstructed the metabolic network of A. oryzae. This network contains 729 enzymes, 1,314 enzyme-encoding genes, 1,073 metabolites and 1,846 (1,053 unique) biochemical reactions. The metabolic reactions are compartmentalized into the cytosol, the mitochondria, the peroxisome and the extracellular space. Transport steps between the compartments and the extracellular space represent 281 reactions, of which 161 are unique. The metabolic model was validated and shown to correctly describe the phenotypic behavior of A. oryzae grown on different carbon sources.
A much enhanced annotation of the A. oryzae genome was obtained and a genome-scale metabolic model of A. oryzae was reconstructed. The model accurately predicted the growth and biomass yield on different carbon sources. The model serves as an important resource for gaining further insight into A. oryzae physiology.
A. oryzae is a member of the diverse group of aspergilli that includes species that are important microbial cell factories, as well as species that are human and plant pathogens. A. oryzae has been used safely in the fermentation industry for hundreds of years in the production of soy sauce, miso and sake. Today A. oryzae is also used for the production of a wide range of fungal enzymes such as α-amylase, glucoamylase, lipase and protease, and it is regarded as an ideal host for the synthesis of proteins of eukaryotic origin. In the post genome-sequencing era, various high-throughput technologies have been developed to characterize biological systems on the genome scale. Discovering new biological knowledge from high-throughput data and assigning biological functions to all the proteins encoded by a genome remains challenging, however, even though it would allow systems-level investigations of microbial cell factories. For fungi, several genome-sequencing and annotation projects have been presented, including Saccharomyces cerevisiae, A. nidulans, A. fumigatus, and A. niger [6, 7]. Recently, the genome sequence of A. oryzae was published by Machida and coworkers. Based on their sequence annotation using gene-finding software tools such as ALN, GlimmerM and GeneDecoder, 12,074 protein-encoding genes were predicted to be present in the genome. Despite this prediction, many genes were not assigned a definite function, and more than 50% of the 12,074 genes were annotated as hypothetical proteins. Hence, there are clear opportunities for refining the gene prediction and improving the annotation. However, the present one-dimensional data do not allow complete annotation of all genes, and it would therefore be interesting and potentially fruitful to use integrative biological tools in the process of improving the annotation of fungal genomes.
In this process, reconstruction of a genome-scale metabolic model is a good starting point as it allows for integration of various types of data. Several fungal metabolic models are now openly available, such as those for S. cerevisiae, A. nidulans and A. niger, as well as a model for the central carbon metabolism of A. niger. Such models are currently among the most promising approaches for in silico prediction of cellular function in terms of physiology.
The aim of this study is to improve the annotation of the genome sequence of A. oryzae and to integrate the enhanced annotation data into a genome-scale metabolic model of A. oryzae. The first A. oryzae EST library was constructed, sequenced and assembled in order to improve gene prediction. Functional assignment was then done using our annotation strategy together with a combination of different bioinformatics tools and databases. The bioinformatics tools used were BLAST, HMMER, and PSI-BLAST. The databases used were the A. oryzae genome database, the EST database of A. flavus, the A. nidulans genome database, the A. fumigatus genome database, the S. cerevisiae genome database, the Pfam protein families database, the COG database, and the Non-Redundant (NR) protein database. Subsequently, manual inspection was carried out in order to achieve a solid annotation of the enzyme functions needed for reconstruction of the metabolic network. Based on the improved annotated genome, the genome-scale metabolic network was reconstructed. The network was built by comparison with other related metabolic models, namely models for S. cerevisiae, A. nidulans, and A. niger [15, 16], and with biochemical pathway databases, the literature, and experimental evidence for the presence of specific pathways. The biomass composition was taken from the literature, whereas maintenance and growth-associated ATP consumption rates were estimated from literature data on yields and growth rates. Finally, Flux Balance Analysis (FBA) was used to predict the flux distributions in the metabolic network, and the biomass yields as well as growth rates on different carbon sources were estimated to validate the metabolic model of A. oryzae.
Results and Discussion
Gene discovery and validation
Identification of protein functions by pairwise comparison
Comparison of genome characteristics and function assignments between A. oryzae and other related fungi
Genome size (Mb)
Number of chromosomes
Number of total predicted genes
ANI and AO
AFU and AO
SC and AO
Number of protein sequence homologs
Percentage of sequence homologs
Number of assigned putative functions
Percentage of assigned putative functions
Number of predicted genes involved in metabolism
Number of putative functions involved in metabolism
Metabolic pathway mapping
The metabolic models for S. cerevisiae, A. nidulans, and A. niger [15, 16] were combined to generate an initial reaction list for the construction of the A. oryzae metabolic network. Duplicated reactions were removed, resulting in a list of 1,924 genes and 1,070 functions involved in metabolism. Each enzyme function in this reaction list was searched for in the list of metabolic proteins present in A. oryzae generated above. If an enzyme name matched, the corresponding enzyme-encoding genes, enzyme functions and Enzyme Commission (EC) numbers of A. oryzae were selected and mapped onto the reaction list. A classification system was then established to divide the reactions in the whole metabolic network of A. oryzae into 7 main metabolic pathways: carbohydrate metabolism, energy metabolism, amino acid metabolism, nucleotide metabolism, lipid metabolism, cofactor metabolism and secondary metabolism. The highest number of enzyme-encoding genes was found in carbohydrate metabolism, which is consistent with the ability of A. oryzae to use a wide range of carbohydrate substrates. Many enzyme-encoding genes were also found in amino acid and lipid metabolism. Fewer enzyme-encoding genes were found in nucleotide, cofactor and energy metabolism, and the lowest number was found in secondary metabolism. In fact, the A. oryzae genome contains many enzyme-encoding genes involved in secondary metabolism, but most of these genes lack EC numbers and could therefore not be mapped onto the metabolic network. The resulting metabolic network contains several gaps, that is, metabolic reactions without corresponding enzymes.
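The name-matching step described above can be illustrated with a minimal sketch. It assumes that both the combined reaction list and the A. oryzae annotation are available as simple records; the field names, the `normalize` helper and the toy entries (`R1`, `R2`, `gene_A`) are hypothetical and are not taken from the study. Reactions for which no enzyme name matches are reported as gaps.

```python
from collections import defaultdict


def normalize(name):
    """Lower-case and collapse whitespace so enzyme names compare loosely."""
    return " ".join(name.lower().split())


def map_genes_to_reactions(reactions, annotations):
    """Attach annotated genes to reactions whose enzyme name matches.

    reactions:   list of dicts with keys 'id' and 'enzyme'
    annotations: list of dicts with keys 'gene', 'enzyme' and 'ec'
    Returns (mapped, gaps): mapped maps a reaction id to a list of
    (gene, EC number) pairs; gaps lists reaction ids without any match.
    """
    by_enzyme = defaultdict(list)
    for ann in annotations:
        by_enzyme[normalize(ann["enzyme"])].append((ann["gene"], ann["ec"]))

    mapped, gaps = {}, []
    for rxn in reactions:
        hits = by_enzyme.get(normalize(rxn["enzyme"]), [])
        if hits:
            mapped[rxn["id"]] = hits
        else:
            gaps.append(rxn["id"])
    return mapped, gaps


# Hypothetical toy data: one reaction has an annotated enzyme, one does not.
reactions = [
    {"id": "R1", "enzyme": "Hexokinase"},
    {"id": "R2", "enzyme": "D-xylose reductase"},
]
annotations = [
    {"gene": "gene_A", "enzyme": "hexokinase", "ec": "2.7.1.1"},
]

mapped, gaps = map_genes_to_reactions(reactions, annotations)
print(mapped)  # reactions with gene assignments
print(gaps)    # reactions left as gaps
```

Exact name matching is the simplest possible choice here; in practice synonyms and EC numbers would likely also need to be compared.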
Filling gaps in the metabolic network using an integrated bioinformatics tool
The result clearly shows that there is a high probability that the gene "AO090003000859" encodes D-xylose reductase. In the A. oryzae genome database, the gene AO090003000859 is reported only with a general function prediction and is poorly characterized. Moreover, in other databases such as GenBank, this gene is shown only to contain a region encoding aldo/keto reductase family proteins, with no evidence for its specific function. Using GFAOP, the missing enzyme D-xylose reductase was entered into the pathway. Our method thus improves the annotation of the genome by using the context of the metabolic network. The process was repeated iteratively to fill the gaps in the whole metabolic network. Ultimately, 210 gaps in the metabolic network were closed using GFAOP. These were distributed as 86 gaps in lipid metabolism, 34 gaps in amino acid metabolism, 31 gaps in secondary metabolism, 23 gaps in nucleotide metabolism, 17 gaps in carbohydrate metabolism, 10 gaps in cofactor metabolism, and 9 gaps in energy metabolism.
Characteristics of the improved annotation and reconstructed metabolic network
Statistical characteristics of improved annotation and metabolic reconstruction.
Characteristics of improved annotation
Improved annotated data
Total protein-encoding genes
New putative protein functions to newly predicted genes
Other functional groups
Hypothetical proteins to newly predicted genes
New putative protein functions to previously hypothetical proteins
Other functional groups
Same putative protein functions
Other functional groups
Characteristics of network
1,846 (1,053 Unique)
1,090 (676 Unique)
281 (161 Unique)
118 (113 Unique)
Reactions with gene assignments
173 (53 Unique)
15 (12 Unique)
Reactions without gene assignments
108 (108 Unique)
103 (101 Unique)
Biomass growth simulation
Biomass composition in the metabolic model of A. oryzae
Average molecular weight [g/mol]
Content [g/100 g DW]
Stoichiometric coefficient [mmol/g DW]
Free fatty acid
Assessment of model validation of A. oryzae
A strategy for the improved annotation of the genome sequence of A. oryzae was developed. Using our assembled EST library, 1,046 EST sequences (about 12% of 9,038 EST sequences) were identified as newly predicted genes and about 75% (6,773 of 9,038 EST sequences) were used to validate previously annotated genes. This indicates that the developed annotation strategy is a useful approach for gene prediction. By combining various bioinformatics tools and databases, the annotation strategy was also successfully applied to the function assignment of genes. New putative functions were assigned to 398 of the newly predicted genes and to 1,469 proteins previously annotated as hypothetical proteins. Our analysis therefore results in a substantially reduced number of hypothetical proteins. In particular, more enzyme-encoding genes could be assigned functions, and this led to the filling of 210 missing enzymes in the metabolic network. Using the enhanced annotated genome, biochemical pathway databases, other related metabolic models, and the literature, a metabolic network was reconstructed. The network contains 729 enzymes, 1,314 enzyme-encoding genes (10% of 13,120 total predicted genes), 1,073 metabolites and 1,846 (1,053 unique) biochemical reactions. The 1,053 unique reactions are distributed into different compartments, with 831 reactions located in the cytosol, 173 reactions located in the mitochondria, 19 reactions located in the peroxisome, and 30 reactions located in the extracellular space. Transport reactions between the different compartments and the extracellular space represent 281 (161 unique) reactions. This metabolic network was formulated as a stoichiometric model. The model was used in Flux Balance Analysis (FBA) to obtain the flux distributions corresponding to maximized growth. A physiological study of A. oryzae on different carbon sources was performed to validate the genome-scale model, and the model was found to accurately predict the maximum specific growth rate and the biomass yield on different carbon sources. This indicates that the A. oryzae metabolic model is able to simulate the phenotypic behavior, and the model will serve as an important resource for gaining further insight into the important cell factory A. oryzae.
EST library construction
The EST sequences of A. oryzae strain A1560 were obtained from a normalized library and an un-normalized library. The normalized library was constructed by inserting cDNA of A. oryzae into pCMV-Sport6 plasmids between the Mlu I and the Not I sites (Vector – Not I – poly A (3' of insert) – 5' of insert – Mlu I – Sal I – vector). The plasmids were amplified in Escherichia coli EMDH10B-TONA (a recA strain). The un-normalized library was made by inserting cDNA of A. oryzae between the EcoRI and Not I sites in the vector pYES2. The plasmids were amplified in E. coli DH10B.
EST sequencing and assembly
The EST sequences were generated by sequencing on ABI 377 and ABI 3700 instruments from Applied Biosystems using BigDye terminators version 1 and 2. In total 23,072 EST sequences were produced. Quality clipping, vector removal, E. coli contamination removal and assembly were done with the phredPhrap package. The sequences were assembled into 9,038 EST contigs.
Genome annotation process
The gene-finding strategy shown in Figure 7A was based on our assembled EST sequences of A. oryzae (see Additional file 1, also available online in the GenBank database under accession number "EY424375–433412") together with public EST data of A. flavus. Our assembled EST data of A. oryzae were compared by BLASTN to the genes previously identified in the genome of A. oryzae strain RIB 40. The purpose of this comparison was to validate genes that were already annotated and to discover new genes that had not been annotated by Machida et al. The 9,038 EST sequences were classified into four categories as outlined in Additional file 3 and described as follows. All sequences shorter than 300 bases were discarded from the analysis. If the length of an EST sequence was over 500 bps and the highest ranking hit had a score lower than 50 bits, the EST sequence was categorized as a newly predicted gene. If the length of the EST sequence was over 300 bps and the highest ranking hit had a score over 100 bits, the EST sequence was categorized as validating an earlier identified gene. If the highest ranking hit had a score lower than 100 bits, the EST sequence was classified as weakly validating a gene. In the effort to predict new genes in the A. oryzae genome, A. flavus EST data from the TIGR database were also used. The cut-offs for gene discovery and validation were the same as those used for our assembled EST data of A. oryzae. After gene finding, protein functions were assigned. The assignment was based on sequence alignment analysis, metabolic pathway mapping, gap filling with an integrated bioinformatics tool and, lastly, manual curation. Sequence alignment by BLASTX was used to assign putative functions to newly predicted genes. Each newly predicted gene was searched against the NR protein database and the Protall_e protein database [Unpublished].
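The EST classification rules above can be written as a short decision function. This is a minimal sketch, not the authors' implementation: the function name and category labels are hypothetical, and the text leaves some boundary cases open. The sketch assumes that a sequence of exactly 300 bases is kept, that a score of exactly 100 bits counts as weak validation, and that an EST of 300 to 500 bases with a score below 50 bits falls into the weakly validating category.

```python
def classify_est(length_bp, best_bit_score):
    """Classify an EST contig from its length and best BLASTN bit score.

    Rules are applied in the order given in the text. An EST without any
    hit can be passed with best_bit_score=0.0 (an assumption of this sketch).
    """
    if length_bp < 300:
        return "discarded"
    if length_bp > 500 and best_bit_score < 50:
        return "new_gene"
    if best_bit_score > 100:
        return "validated"
    # Remaining cases: best hit at or below 100 bits (assumed boundary).
    return "weakly_validated"


# Hypothetical examples covering each category.
examples = [(250, 200.0), (800, 20.0), (400, 150.0), (400, 80.0)]
for length_bp, score in examples:
    print(length_bp, score, classify_est(length_bp, score))
```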
The putative protein function was transferred if the alignment length of the highest ranking hit was over 50 amino acids and the identity was over 25%. Sequence alignment was also done through pairwise comparison of protein sequences by BLASTP between A. oryzae and other related fungi (i.e. A. nidulans strain FGSC-A4, A. fumigatus strain Af293, S. cerevisiae strain S288c) as shown in Figure 7B. The criteria for similarity searching were alignment length (bps) and identity (%), with the parameters depending on the fungus used for the comparison. A suitable cut-off for S. cerevisiae was estimated to be an alignment length above 100 bps and an identity higher than 40%. For the other Aspergillus species, the cut-off was an alignment length above 200 bps and an identity higher than 40%. All cut-off values were determined by using sequences with known protein functions. After the annotation process was finished, the metabolic network of A. oryzae was reconstructed. First, an initial metabolic reaction list for A. oryzae was constructed by combining the S. cerevisiae, A. nidulans, and A. niger [15, 16] metabolic models. In addition, data for other organisms collected from metabolic pathway databases, such as KEGG and BioCyc, were integrated into this reaction list. The improved annotated genomic data (i.e. enzyme-encoding genes, enzyme functions, and EC numbers) were then mapped onto the reaction list. In order to visualize all the metabolic reactions, an overall metabolic map was drawn (see Figure 4 and Additional file 5 for full size), and the improved annotated data were placed onto this map. Gaps remaining in the metabolic network were then filled using an integrated bioinformatics tool that allowed automatic searching for specific enzyme functions. Finally, the model was manually curated to complete the reconstruction process.
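The function-transfer cut-offs stated above can be collected in one small lookup. The sketch below is illustrative only: the dictionary keys, the hit record fields and the example hits are hypothetical, and ranking hits by bit score is an assumption, since the text only refers to the "highest ranking hit".

```python
# (minimum alignment length, minimum percent identity); a hit must exceed both.
CUTOFFS = {
    "blastx_new_gene": (50, 25.0),  # newly predicted genes vs. protein databases
    "s_cerevisiae": (100, 40.0),    # pairwise comparison with S. cerevisiae
    "aspergillus": (200, 40.0),     # pairwise comparison with other Aspergillus species
}


def passes_cutoff(alignment_length, percent_identity, comparison):
    """Return True if a hit exceeds both thresholds for the given comparison."""
    min_length, min_identity = CUTOFFS[comparison]
    return alignment_length > min_length and percent_identity > min_identity


def transfer_function(hits, comparison):
    """Return the description of the best hit if it passes the cut-off, else None.

    hits: list of dicts with keys 'description', 'alignment_length',
          'percent_identity' and 'bit_score'.
    """
    if not hits:
        return None
    best = max(hits, key=lambda hit: hit["bit_score"])
    if passes_cutoff(best["alignment_length"], best["percent_identity"], comparison):
        return best["description"]
    return None


# Hypothetical hits for one query protein.
hits = [
    {"description": "aldo/keto reductase", "alignment_length": 310,
     "percent_identity": 62.0, "bit_score": 410.0},
    {"description": "hypothetical protein", "alignment_length": 120,
     "percent_identity": 35.0, "bit_score": 90.0},
]
print(transfer_function(hits, "aspergillus"))
```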
Metabolic network reconstruction
The metabolic network reconstruction aimed at representing the whole metabolism of A. oryzae, which consists of primary catabolism of carbohydrates, biosynthesis of amino acids, nucleotides, lipids and cofactors, production of the Gibbs free energy required for biosynthesis, and secondary metabolism. Combining different types of information was essential for a solid reconstruction. Information was collected from the improved annotated data of A. oryzae, biochemical pathways, publications on specific enzymes, online protein databases (e.g. the Swiss-Prot database) and the literature. In addition, physiological evidence was used: when there was information on the presence of a specific enzyme activity, or of a pathway involved in consumption of a given substrate or formation of a given metabolic product, the underlying reaction was added to the model even if no annotated gene supported the presence of the reaction. The stoichiometry of cofactors and the reversibility or irreversibility of each reaction were verified and added to the reconstructed network. Different cellular compartments were considered, and biochemical reactions were distributed into four compartments: the extracellular space, the cytosol, the mitochondria, and the peroxisome. The localization of each biochemical reaction was assigned according to enzyme localization, which was predicted with protein localization predictors. pTARGET and CELLO were selected to predict sub-cellular protein localization because they contain databases of known eukaryotic protein localizations. If there was no information on the localization of a biochemical reaction or its corresponding enzyme, the reaction was by default considered to occur in the cytosol.
In addition, the reconstructed metabolic network included transport steps between the different intracellular compartments and between the cell and the environment.
Modeling and simulation based on Flux Balance Analysis (FBA)
After the metabolic network was reconstructed, it was transformed into a mathematical framework to perform Flux Balance Analysis (FBA). This approach is based on conservation of mass under steady-state conditions. The conversion requires the stoichiometry of the metabolic pathways, the metabolic demands and a few specific parameters. An optimal flux distribution can be obtained within the feasible region by using linear programming. One reaction is selected as the objective function to be maximized or minimized. For physiologically meaningful results, the objective function must be defined as the ability to produce the required components of cellular biomass for a specified uptake rate of a selected carbon source. By maximizing the flux towards biomass formation, a flux is obtained for each reaction in the metabolic network.
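As an illustration of the FBA formulation described above, the following minimal sketch maximizes a biomass flux subject to the steady-state mass balance S·v = 0 and flux bounds, using `scipy.optimize.linprog`. The four-reaction network, its bounds and the uptake rate of 10 are hypothetical toy values chosen for clarity; they are not part of the A. oryzae model, and the use of SciPy is an assumption of this example rather than the software used in the study.

```python
import numpy as np
from scipy.optimize import linprog

# Toy network (hypothetical): two internal metabolites A and B.
#   v1: substrate uptake -> A
#   v2: A -> B
#   v3: B -> biomass (objective)
#   v4: A -> by-product (secreted)
# Rows are metabolites (A, B); columns are reactions (v1..v4).
S = np.array([
    [1.0, -1.0,  0.0, -1.0],  # mass balance for A
    [0.0,  1.0, -1.0,  0.0],  # mass balance for B
])

# Flux bounds: all reactions irreversible, uptake limited to 10 units.
bounds = [(0.0, 10.0), (0.0, None), (0.0, None), (0.0, None)]

# linprog minimizes, so the biomass flux v3 gets a coefficient of -1.
c = np.array([0.0, 0.0, -1.0, 0.0])

result = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]),
                 bounds=bounds, method="highs")

print("optimal biomass flux:", -result.fun)
print("flux distribution:", result.x)
```

In this toy case all substrate taken up is routed to biomass, so the optimum is limited only by the uptake bound; in a genome-scale model the same formulation is applied to the full stoichiometric matrix with the measured uptake rate as the bound.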
Model validation of A. oryzae by physiological study on different carbon sources
Model validation is an important step in the reconstruction process. In this study, the model was validated by simulating the rate of biomass formation on different carbon sources in batch experiments, with the uptake rate of the carbon source given as input to the simulations. The carbon sources, namely glucose (C6), maltose (C12), glycerol (C3) and xylose (C5), were selected because they result in widely different physiological responses and parameters. The strain used for generating these data was A. oryzae wild type strain A1560, which was obtained from Novozymes A/S, Denmark. Three biological replicates were done for each carbon source. The fermentations were performed in an in-house fermenter with a working volume of 1.2 L operated at 34°C, and the pH was kept constant at 6 by adding 10% H3PO4 or 10% NH3 solution. The aeration flow rate was set at 1.2 L/min. The stirrer speed was controlled at 800 rpm for the first 4 h and later increased to 1100 rpm. The dissolved oxygen tension was initially calibrated at 100%. The concentrations of oxygen and carbon dioxide in the exhaust gas were measured by a gas analyzer (Magnos 4G for O2, Uras 3G for CO2, Hartmann & Braun, Germany). Biomass dry weight was measured as follows: a sample was filtered using nitrocellulose filters (pore size 0.45 μm, Munktell, Sweden), and the filter cake was then dried at 110°C overnight. The filter was then placed in a desiccator overnight and subsequently weighed. In addition, the extracellular concentrations of sugars, organic acids, and polyols were measured by high-performance liquid chromatography (HPLC) on an Aminex HPX-87H column (300 mm × 7.8 mm). The column was kept at 45°C and eluted at 0.6 ml/min with 5 mM H2SO4.
This research work was supported by a stipend to Wanwipa Vongsangnak from Novozymes Bioprocess Academy (NBA) and Technical University of Denmark (DTU). The authors are grateful to José Manuel Otero, Roberto Olivares Hernandez, Chia-Wen Chang and Rawisara Ruenwai for discussions and critical revisions of the manuscript. We also thank Lone Vuholm and Jeanette Thomassen for valuable technical assistance.
- Goldman GH, Osmani SA: The Aspergilli. Genomics, Medical Aspects, Biotechnology, and Research Methods (Mycology). 2008, Taylor and Francis group: CRC Press
- Kitano H: Computational systems biology. Nature. 2002, 420: 206-210. 10.1038/nature01254.
- Fisk DG, Ball CA, Dolinski K, Engel SR, Hong EL, Issel-Tarver L, Schwartz K, Sethuraman A, Botstein D, Cherry JM: Saccharomyces cerevisiae S288C genome annotation: a working hypothesis. Yeast. 2006, 23: 857-865. 10.1002/yea.1400.
- Galagan JE, Calvo SE, Cuomo C, Ma LJ, Wortman JR, Batzoglou S, Lee SI, Basturkmen M, Spevak CC, Clutterbuck J, Kapitonov V, Jurka J, Scazzocchio C, Farman M, Butler J, Purcell S, Harris S, Braus GH, Draht O, Busch S, D'Enfert C, Bouchier C, Goldman GH, Bell-Pedersen D, Griffiths-Jones S, Doonan JH, Yu J, Vienken K, Pain A, Freitag M, Selker EU, Archer DB, Penalva MA, Oakley BR, Momany M, Tanaka T, Kumagai T, Asai K, Machida M, Nierman WC, Denning DW, Caddick M, Hynes M, Paoletti M, Fischer R, Miller B, Dyer P, Sachs MS, Osmani SA, Birren BW: Sequencing of Aspergillus nidulans and comparative analysis with A. fumigatus and A. oryzae. Nature. 2005, 438: 1105-1115. 10.1038/nature04341.
- Nierman WC, Pain A, Anderson MJ, Wortman JR, Kim HS, Arroyo J, Berriman M, Abe K, Archer DB, Bermejo C, Bennett J, Bowyer P, Chen D, Collins M, Coulsen R, Davies R, Dyer PS, Farman M, Fedorova N, Fedorova N, Feldblyum TV, Fischer R, Fosker N, Fraser A, Garcia JL, Garcia MJ, Goble A, Goldman GH, Gomi K, Griffith-Jones S, Gwilliam R, Haas B, Haas H, Harris D, Horiuchi H, Huang JQ, Humphray S, Jimenez J, Keller N, Khouri H, Kitamoto K, Kobayashi T, Konzack S, Kulkarni R, Kumagai T, Lafon A, Latge JP, Li WX, Lord A, Lu C, Majoros WH, May GS, Miller BL, Mohamoud Y, Molina M, Monod M, Mouyna I, Mulligan S, Murphy L, O'Neil S, Paulsen I, Penalva MA, Pertea M, Price C, Pritchard BL, Quail MA, Rabbinowitsch E, Rawlins N, Rajandream MA, Reichard U, Renauld H, Robson GD, de Cordoba SR, Rodriguez-Pena JM, Ronning CM, Rutter S, Salzberg SL, Sanchez M, Sanchez-Ferrero JC, Saunders D, Seeger K, Squares R, Squares S, Takeuchi M, Tekaia F, Turner G, de Aldana CRV, Weidman J, White O, Woodward J, Yu JH, Fraser C, Galagan JE, Asai K, Machida M, Hall N, Barrell B, Denning DW: Genomic sequence of the pathogenic and allergenic filamentous fungus Aspergillus fumigatus. Nature. 2006, 439: 502-502. 10.1038/nature04572.
- Baker SE: Aspergillus niger genomics: Past, present and into the future. Medical Mycology. 2006, 44: S17-S21. 10.1080/13693780600921037.
- Pel HJ, de Winde JH, Archer DB, Dyer PS, Hofmann G, Schaap PJ, Turner G, de Vries RP, Albang R, Albermann K, Andersen MR, Bendtsen JD, Benen JAE, van den Berg M, Breestraat S, Caddick MX, Contreras R, Cornell M, Coutinho PM, Danchin EGJ, Debets AJM, Dekker P, van Dijck PWM, van Dijk A, Dijkhuizen L, Driessen AJM, d'Enfert C, Geysens S, Goosen C, Groot GSP, de Groot PWJ, Guillemette T, Henrissat B, Herweijer M, van den Hombergh J, van den Hondel C, van der Heijden R, van der Kaaij RM, Klis FM, Kools HJ, Kubicek CP, van Kuyk PA, Lauber J, Lu X, van der Maarel M, Meulenberg R, Menke H, Mortimer MA, Nielsen J, Oliver SG, Olsthoorn M, Pal K, van Peij N, Ram AFJ, Rinas U, Roubos JA, Sagt CMJ, Schmoll M, Sun JB, Ussery D, Varga J, Vervecken W, de Vondervoort P, Wedler H, Wosten HAB, Zeng AP, van Ooyen AJJ, Visser J, Stam H: Genome sequencing and analysis of the versatile cell factory Aspergillus niger CBS 513.88. Nature Biotechnology. 2007, 25: 221-231. 10.1038/nbt1282.
- Machida M, Asai K, Sano M, Tanaka T, Kumagai T, Terai G, Kusumoto KI, Arima T, Akita O, Kashiwagi Y, Abe K, Gomi K, Horiuchi H, Kitamoto K, Kobayashi T, Takeuchi M, Denning DW, Galagan JE, Nierman WC, Yu JJ, Archer DB, Bennett JW, Bhatnagar D, Cleveland TE, Fedorova ND, Gotoh O, Horikawa H, Hosoyama A, Ichinomiya M, Igarashi R, Iwashita K, Juvvadi PR, Kato M, Kato Y, Kin T, Kokubun A, Maeda H, Maeyama N, Maruyama J, Nagasaki H, Nakajima T, Oda K, Okada K, Paulsen I, Sakamoto K, Sawano T, Takahashi M, Takase K, Terabayashi Y, Wortman JR, Yamada O, Yamagata Y, Anazawa H, Hata Y, Koide Y, Komori T, Koyama Y, Minetoki T, Suharnan S, Tanaka A, Isono K, Kuhara S, Ogasawara N, Kikuchi H: Genome sequencing and analysis of Aspergillus oryzae. Nature. 2005, 438: 1157-1161. 10.1038/nature04300.
- Gotoh O: Homology-based gene structure prediction: simplified matching algorithm using a translated codon (tron) and improved accuracy by allowing for long gaps. Bioinformatics. 2000, 16 (3): 190-202. 10.1093/bioinformatics/16.3.190.
- Majoros WH, Pertea M, Antonescu C, Salzberg SL: GlimmerM, Exonomy and Unveil: three ab initio eukaryotic genefinders. Nucleic Acids Research. 2003, 31 (13): 3601-3604. 10.1093/nar/gkg527.
- Asai K, Itou K, Ueno Y, Yada T: Recognition of human genes by stochastic parsing. Pac Symp Biocomput. 1998, 3: 228-239.
- Liu ET: Integrative biology and systems biology. Molecular Systems Biology. 2005
- Forster J, Famili I, Fu P, Palsson BO, Nielsen J: Genome-scale reconstruction of the Saccharomyces cerevisiae metabolic network. Genome Research. 2003, 13: 244-253. 10.1101/gr.234503.
- David H, Özçelik , Hofmann G, Nielsen J: Analysis of Aspergillus nidulans metabolism at the genome-scale. BMC Genomics. 2008, 9: 163. 10.1186/1471-2164-9-163.
- Andersen MR, Nielsen ML, Nielsen J: Metabolic model integration of the bibliome, genome, metabolome and reactome of Aspergillus niger. Molecular Systems Biology. 2008, 4: 178. 10.1038/msb.2008.12.
- David H, Akesson M, Nielsen J: Reconstruction of the central carbon metabolism of Aspergillus niger. European Journal of Biochemistry. 2003, 270: 4243-4253. 10.1046/j.1432-1033.2003.03798.x.
- Borodina I, Nielsen J: From genomes to in silico cells via metabolic networks. Current Opinion in Biotechnology. 2005, 16: 350-355. 10.1016/j.copbio.2005.04.008.
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic Local Alignment Search Tool. Journal of Molecular Biology. 1990, 215: 403-410.
- Eddy SR: Profile hidden Markov models. Bioinformatics. 1998, 14: 755-763. 10.1093/bioinformatics/14.9.755.
- Altschul S, Madden T, Schaffer A, Zhang JH, Zhang Z, Miller W, Lipman D: Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Faseb Journal. 1998, 12: A1326-A1326.
- Aspergillus oryzae genome database . [http://www.bio.nite.go.jp/dogan/MicroTop?GENOME_ID=ao]
- Aspergillus flavus Gene Index . [http://compbio.dfci.harvard.edu/tgi/cgi-bin/tgi/gimain.pl?gudb=a_flavus]
- Aspergillus nidulans genome database . [http://www.broad.mit.edu/annotation/genome/aspergillus_nidulans]
- Aspergillus fumigatus genome database . [http://www.sanger.ac.uk/Projects/A_fumigatus/]
- Saccharomyces genome database . [http://www.yeastgenome.org/index.shtml]
- Pfam database . [http://www.sanger.ac.uk/Software/Pfam]
- COG database . [http://www.ncbi.nih.gov/COG]
- Non-redundant protein database . [ftp://ftp.ncbi.nih.gov/blast/db/FASTA/]
- Payne GA, Nierman WC, Wortman JR, Pritchard BL, Brown D, Dean RA, Bhatnagar D, Cleveland TE, Machida M, Yu J: Whole genome comparison of Aspergillus flavus and A. oryzae. Medical Mycology. 2006, 44: S9-S11. 10.1080/13693780600835716.
- Pain A, Böhme U, Berriman M: Hot and sexy moulds!. Nature reviews. 2006, 4: 244-245. 10.1038/nrmicro1388.
- Gene Ontology Database . [http://www.geneontology.org/GO.annotation.shtml]
- McConkey GA, Pinney JW, Westhead DR, Plueckhahn K, Fitzpatrick TB, Macheroux P, Kappes B: Annotating the Plasmodium genome and the enigma of the shikimate pathway. Trends in Parasitology. 2004, 20: 60-65. 10.1016/j.pt.2003.11.001.
- Osterman A, Overbeek R: Missing genes in metabolic pathways: a comparative genomics approach. Current Opinion in Chemical Biology. 2003, 7: 238-251. 10.1016/S1367-5931(03)00027-9.
- Perl Scalable Vector Graphics . [http://search.cpan.org/~ronan/]
- Nielsen J: Physiological engineering aspects of Penicillium chrysogenum. World Scientific Pub Co Inc; 1996.
- Pedersen H, Carlsen M, Nielsen J: Identification of enzymes and quantification of metabolic fluxes in the wild type and in a recombinant Aspergillus oryzae strain. Appl Environ Microbiol. 1999, 65 (1): 11-19.
- Prathumpai W, Gabelgaard JB, Wanchanthuek P, van de Vondervoort PJI, de Groot MJL, McIntyre M, Nielsen J: Metabolic control analysis of xylose catabolism in Aspergillus. Biotechnology Progress. 2003, 19 (4): 1136-1141. 10.1021/bp034020r.
- Carlsen M, Nielsen J: Influence of carbon source on alpha-amylase production by Aspergillus oryzae. Appl Microbiol Biotechnol. 2001, 57 (3): 346-349.
- Ewing B, Green P: Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Research. 1998, 8: 186-194.
- Rost B: Twilight zone of protein sequence alignments. Protein Engineering. 1999, 12: 85-94. 10.1093/protein/12.2.85.
- KEGG pathway database . [http://www.kegg.com]
- BioCyc database . [http://biocyc.org/server.html]
- Swiss-Prot database . [http://www.expasy.ch/sprot/]
- Carson BD: Microbodies in fungi. A review. Journal of industrial microbiology. 1990, 6: 1. 10.1007/BF01576172.
- Guda C, Subramaniam S: TARGET: a new method for predicting protein subcellular localization in eukaryotes. Bioinformatics. 2005, 21: 3963-3969. 10.1093/bioinformatics/bti650.
- Yu CS, Chen YC, Lu CH, Hwang JK: Prediction of protein subcellular localization. Proteins-Structure Function and Bioinformatics. 2006, 64: 643-651. 10.1002/prot.21018.
- Edwards JS, Covert M, Palsson B: Metabolic modelling of microbes: the flux-balance approach. Environmental Microbiology. 2002, 4: 133-140. 10.1046/j.1462-2920.2002.00282.x.
- Bonarius HPJ, Schmid G, Tramper J: Flux analysis of underdetermined metabolic networks: The quest for the missing constraints. Trends in Biotechnology. 1997, 15: 308-314. 10.1016/S0167-7799(97)01067-6.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.