Analysis of Aspergillus nidulans metabolism at the genome-scale
© David et al. 2008
Received: 03 November 2007
Accepted: 11 April 2008
Published: 11 April 2008
Skip to main content
© David et al. 2008
Received: 03 November 2007
Accepted: 11 April 2008
Published: 11 April 2008
Aspergillus nidulans is a member of a diverse group of filamentous fungi, sharing many of the properties of its close relatives with significance in the fields of medicine, agriculture and industry. Furthermore, A. nidulans has been a classical model organism for studies of development biology and gene regulation, and thus it has become one of the best-characterized filamentous fungi. It was the first Aspergillus species to have its genome sequenced, and automated gene prediction tools predicted 9,451 open reading frames (ORFs) in the genome, of which less than 10% were assigned a function.
In this work, we have manually assigned functions to 472 orphan genes in the metabolism of A. nidulans, by using a pathway-driven approach and by employing comparative genomics tools based on sequence similarity. The central metabolism of A. nidulans, as well as biosynthetic pathways of relevant secondary metabolites, was reconstructed based on detailed metabolic reconstructions available for A. niger and Saccharomyces cerevisiae, and information on the genetics, biochemistry and physiology of A. nidulans. Thereby, it was possible to identify metabolic functions without a gene associated, and to look for candidate ORFs in the genome of A. nidulans by comparing its sequence to sequences of well-characterized genes in other species encoding the function of interest. A classification system, based on defined criteria, was developed for evaluating and selecting the ORFs among the candidates, in an objective and systematic manner. The functional assignments served as a basis to develop a mathematical model, linking 666 genes (both previously and newly annotated) to metabolic roles. The model was used to simulate metabolic behavior and additionally to integrate, analyze and interpret large-scale gene expression data concerning a study on glucose repression, thereby providing a means of upgrading the information content of experimental data and getting further insight into this phenomenon in A. nidulans.
We demonstrate how pathway modeling of A. nidulans can be used as an approach to improve the functional annotation of the genome of this organism. Furthermore we show how the metabolic model establishes functional links between genes, enabling the upgrade of the information content of transcriptome data.
Aspergillus nidulans, also known as Emericella nidulans, as it can undergo sexual reproduction in its life cycle in addition to the non-perfect (asexually reproducing) form that characterizes aspergilli, is an important member of the filamentous fungal genus Aspergillus. This genus encompasses a large diversity of species of great medical and economical relevance. In the medical and agricultural fields, A. flavus and A. parasiticus represent major producers of mycotoxins (e.g. aflatoxins) that can contaminate important food and feed crops, while A. fumigatus may cause serious diseases in immuno-compromised animals and humans (e.g. invasive pulmonary aspergillosis). From a biotechnological viewpoint, Aspergillus species represent important industrial producers of diverse products, such as industrial enzymes (e.g. amylases by A. niger and A. oryzae), bulk chemicals (e.g. citric acid by A. niger), and pharmaceuticals (e.g. lovastatin, a cholesterol lowering-agent, by A. terreus).
Whereas the first efforts made in fungal genome research have focused on yeasts, there has been an increasing focus on filamentous fungi due to their medical, agricultural and biotechnological importance. There are quite large differences between yeast and most filamentous fungal genomes, with the latter exhibiting larger genomes owing to larger centromers and lower gene density per nucleotide length as well as the presence of far more genes. Furthermore, many of the filamentous fungal genes have a more complex structure due to the presence of multiple introns . A. nidulans has become one of the model organisms of choice for filamentous fungal genome research as it is a representative of the important group of aspergilli, but also because this fungus has served as a model organism for studies of cell development and gene regulation . It is one of the most extensively studied organisms in the fields of genetics and biochemistry, and this is obviously of great value in the identification of the function of orphan filamentous fungal genes and characterization of the biological roles of their products.
Genome-sequencing projects of several Aspergillus species have recently been completed (A. fumigatus, A. nidulans, A. niger, A. oryzae, A. parasiticus) or are nearing completion (A. flavus, A. terreus) [3, 4]. In particular, the genomic sequence of A. nidulans (strain FGSC A4) was released by the Broad Institute of MIT and Harvard, with a13-fold coverage, in Spring 2003 . The size of its genome is approximately 31 Mb, and it is organized in 8 chromosomes. 9,541 open reading frames (ORFs) were predicted using automated gene prediction tools (FGENESH, FGENESH+, and GENEWISE), and PFAM (protein family)  matches were identified by Hmmer analysis. However, due to the highly conservative criteria adopted in the gene naming process, and also due to the relative low number of genes characterized before whole genome sequencing, more than 90% of all ORFs identified are called hypothetical or predicted proteins.
In order to improve the predictions and improve the annotation, automatically assigned genes should be subjected to manual curation. Herein, integration of different types of data and combination of diverse genomic tools play a major role. Functional assignments of genes based on genomic data may be complemented with information providing biochemical and physiological evidence. Of special importance in functional genomics are high-throughput data generated by post-genomic techniques (e.g. transcriptome data using hybridization arrays ), which provide genome-wide screens of gene function [3, 8, 9]. In addition to similarity-based tools (e.g. BLAST , FASTA ), a number of methods are available for comparative genomics that combine various types of genomic evidence, such as protein fusion events , gene clustering on the chromosome , occurrence profiles or signatures , shared regulatory sites , which enable the so-called "gene context analysis" .
Metabolic reconstructions can give a valuable contribution to functional genomics, by uncovering missing metabolic functions (i.e., functions not assigned to genes) and hence putting forward the identification of the corresponding genes [16, 17]. The underlying idea is that larger functional systems (e.g. metabolic pathways) require the presence of their components or elementary functional units (e.g. enzymatic steps) in order to be operative. Hence, in this approach, efforts are directed towards discovering (uncharacterized) genes within the sequenced genome that have a defined role in the metabolism, in opposition to strategies that aim at predicting the functions of a given set of genes known to be present. Comparative genome analysis may then be accomplished with well-characterized genes of related organisms, by employing diverse tools of comparative genomics (based on sequence similarity and gene context), and a prioritized list of potential candidate genes for the function of interest generated. The functional assignment of genes may be verified eventually by using experimental techniques. This approach has been previously described by Osterman and Overbeek , and a computational method has been developed for automated large-scale predictions of protein function . This framework has been applied to prokaryotes (Thiobacillus ferrooxidans ) and eukaryotes (human ), and here we used it for the annotation of metabolic genes within the genome of A. nidulans.
In particular, we focused on the functional annotation of the genes involved in the central metabolism of this fungus, as well as on the biosynthetic pathways of relevant secondary metabolites. For the purpose, we reconstructed the central metabolic network of A. nidulans, based on detailed metabolic reconstructions of other eukaryotes, namely A. niger , Saccharomyces cerevisiae , and Mus musculus . In what concerns the secondary metabolism, pathways for the biosynthesis of penicillin in Penicillium chrysogenum  and aflatoxins in different species of Aspergillus  were used as templates. Pathway prediction was also supported by available information on the biochemistry and physiology of A. nidulans. Our analysis assigned functions to 472 orphan ORFs in the metabolism of A. nidulans, by employing similarity-based tools of comparative genomic analysis (BLAST) and using public (non-redundant) databases of genes and proteins of established function . These functions represented either missing enzymatic activities, for which no ORFs had been previously identified, or previously assigned functions, for which additional isogenes were discovered. The functional assignments made for the individual ORFs were integrated into a mathematical model that can be used to simulate metabolic behavior, thereby enabling a comprehensive and integrative analysis of metabolic functions in A. nidulans. Furthermore, the information contained in the metabolic reconstruction can be exploited for the analysis of large-scale transcription data. To illustrate this, the reconstructed metabolic network of A. nidulans was used in connection with an algorithm developed by Patil and Nielsen  for the large-scale analysis of gene expression profiles, in particular for studying transcriptional responses to specific genetic changes in A. nidulans (deletion of the regulatory gene creA) . In a previous study, the metabolic network reconstructed in this work was employed in the study of the effects of changes in the environmental conditions (carbon source) on the transcriptome profiles of A. nidulans .
Number of ORFs associated to the metabolic reactions in the metabolic reconstruction for A. nidulans.
Part of metabolism
No. of previously annotated ORFs 
No. of newly annotated ORFs
Total no. of ORFs1
Nitrogen and sulphur metabolism
Total number of biochemical conversions and transport processes in the reconstructed metabolic network of A. nidulans and comparison with the metabolic networks of A. niger (316 unique reactions) and S. cerevisiae (843 unique reactions).
Part of metabolism
Total no. of metabolic reactions network (no. of unique)
No. of unique reactions common to other metabolic networks
A. niger 1
1095 (681 1 )
Nitrogen and sulphur metabolism
Polymerization, assembly and maintenance
Table 2 presents the number of biochemical reactions involved in each part of the metabolism considered in the analysis, as well as the number of transport processes. In addition, Table 2 shows the number of unique reactions and transport processes predicted to participate in the different parts of the metabolism of A. nidulans that are common to the metabolic networks of A. niger and S. cerevisiae.
The reliability of the functional assignments was evaluated according to the criteria described in Materials and Methods, and the candidate ORFs were classified into categories, as shown in Additional file 1. The definition of these criteria, in particular of the cut-off values in the E-values of BLAST searches, was found to strongly determine the number of candidate ORFs to consider for each metabolic function, and hence the reactions to include in the metabolic network for A. nidulans. The lists of the ORFs considered for each of the metabolic functions, for different stringencies of the criteria (i.e. cut-off in E-values) is presented in Additional file 2, whereas the data presented in Additional file 1 refer to a cut-off E-values of 1E-50.
After employing the algorithm described above, some of the metabolic functions predicted to occur in A. nidulans still remained without a link to a specific ORF in the genome. These metabolic functions corresponded to biochemical conversions for which no ORFs (hits) were identified by homology-based comparative analysis, or to those candidate ORFs that were subsequently neglected for not complying with the criteria considered (see Results - Evaluation of functional assignments). Nevertheless, 33 of these metabolic functions (or biochemical reactions) were considered to be part of the metabolic network, since they were essential for growth (see Results - Essential genes).
The metabolic reconstruction served as a basis to develop a mathematical model that describes the stoichiometry of all the metabolic processes in A. nidulans. The model comprised 1213 metabolic reactions, of which 1095 were biochemical transformations and 118 were transport processes. In addition, the model included 732 metabolites that could be balanced, i.e. metabolites whose net rate of their formation could be balanced with their net rate of consumption. Out of the 1213 reactions, there were 794 unique reactions (681 unique biochemical conversions and 113 unique transport processes), i.e. 419 of the reactions in the metabolic network were redundant. All the reactions in the metabolic network are listed in Additional file 1, along with a list of the abbreviations considered for the metabolite names (Additional file 2). Compartmentation was considered and the allocation of the biochemical conversions and metabolites to the different intracellular compartments (cytosol, mitochondria, and glyoxysomes) was based on the metabolic models for A. niger and S. cerevisiae. Besides catabolic and biosynthetic pathways, the model also included polymerization reactions and a reaction describing the formation of biomass, which was considered as a drain of building blocks or macromolecules in appropriate ratios to produce 1 mmol of monomers in the macromolecule or 1 g dry-weight (DW), respectively. These ratios were calculated based on the composition of the macromolecules in terms of building blocks (taken from A. oryzae ) and on the composition of biomass in terms of macromolecules (taken from A. nidulans  for lipids and A. oryzae  for the remaining macromolecules). A reaction representing the consumption of ATP for non-growth associated purposes was also included in the metabolic model. The ATP costs in the polymerization of amino acids and nucleotides were predicted based on reports for P. chrysogenum . Furthermore, the ATP requirement for the assembly of macromolecules (19 mmol ATP/g DW) and for maintenance (2.85 mmol ATP/(g DW.h)) were estimated based on experimental biomass yields of A. nidulans grown in glucose-limited chemostat cultures for different dilution rates . The values calculated were comparable with those reported for A. niger and P. chrysogenum .
Essential ORFs in A. nidulans for growth on any of the four carbon sources investigated.
Part of the metabolism1
No. of essential functions assigned to ORFs (total no.)
Tricarboxylic Acid Cycle
Coenzyme A and pantothenate biosynthesis
AN1778.2, AN2526.2, AN0205.2, AN9446.2
AN5794.2, AN4234.2, AN9094.2
Amino acid metabolism
AN7722.2, AN8770.2, AN1150.2, AN4409.2, AN1883.2, AN2914.2
Glutamate and glutamine metabolism
Glycine, serine and threonine metabolism
AN8859.2, AN4793.2, AN2882.2
AN3748.2, AN2293.2, AN6536.2, AN0717.2, AN7044.2, AN7430.2
Branched chain amino acid metabolism
AN4323.2/AN7878.2/AN5957.2, AN4956.2/AN4430.2, AN6346.2, AN0840.2
AN8519.2, AN5610.2, AN5601.2, AN2873.2
AN1263.2, AN4443.2, AN8277.2, AN1222.2
Aromatic amino acids metabolism
AN0708.2/AN8886.2/AN4350.2, AN5731.2, AN6866.2, AN6338.2, AN5959.2, AN3695.2, AN3634.2, AN0648.2, AN6231.2/AN5444.2, AN4577.2, AN5200.2, AN1689.2, AN0648.2
AN1395.2, AN6637.2, AN6541.2, AN5922.2, AN8121.2, AN3626.2, AN4739.2, AN4464.2, AN0893.2, AN5716.2, AN5566.2
AN0961.2, AN5909.2, AN5884.2, AN6157.2, AN4258.2, AN8213.2, AN3581.2, AN7028.2, AN0490.2
Fatty acids metabolism
AN5599.2, AN6139.2, AN5166.2, AN5661.2, AN2154.2, AN1376.2, AN2261.2, AN6610.2, AN6580.2, AN6712.2/AN6211.2/AN7604.2
AN4923.2, AN3869.2, AN2311.2, AN4414.2, AN0579.2, AN8012.2, AN3376.2, AN7751.2, AN5585.2, AN7146.2, AN0451.2, AN4042.2
The reconstructed metabolic network was used in combination with data from large-scale transcriptional studies conducted with A. nidulans, in order to detect overall metabolic responses to specific genetic and environmental perturbations, using an algorithm developed by Patil and Nielsen . This algorithm integrates gene expression data with topological information from metabolic models and enables the identification of small and coordinated changes in expression levels due to genetic or environmental perturbations. By using this algorithm, it is possible to identify highly regulated or reporter metabolites (i.e. metabolites around which the most significant changes in transcription occur) and highly correlated subnetworks (i.e. sets of connected genes with significant and coordinated transcriptional response to a perturbation), which enable to uncover metabolic responses to perturbations from transcriptional profiles.
Top 30 reporter metabolites.
Effect of growth medium
Effect of genotype
N-Acetyl-L-glutamate 5-phosphate (mitochondrial)
N-Acetyl-L-glutamate 5-phosphate (mitochondrial)
N-Acetyl-L-glutamate 5-semialdehyde (mitochondrial)
N-Acetyl-L-glutamate 5-semialdehyde (mitochondrial)
Versiconal hemiacetal acetate
Methanol, ethanol, and acetaldehyde were also found as very highly ranked reporter metabolites in the list concerning the effect of the medium, which is an indication of the differential regulation of a whole set of dehydrogenases, and is also suggested by the emergence of NAD+ and NADH as reporter metabolites. Moreover, L-xylulose and D-sorbitol, which are involved in the metabolism of polyols, were ranked second and third, respectively, in the list of reporter metabolites concerning the effect of the genotype. Furthermore, it was observed that metabolites participating in the formation of secondary metabolites (e.g. sterigmatocystin) were among the 30 most highly regulated metabolites.
Analysis of reporter metabolites and neighboring enzymes, using the "GO term finder".
Effect of growth medium p-values for different media
Effect of genotype p-values for different strains
Carboxylic acid metabolism
Carboxylic acid metabolism
Organic acid metabolism
Organic acid metabolism
Generation of precursor metabolites and energy
Amino acid biosynthesis
Energy derivation by oxidation of organic compounds
Amino acid biosynthesis
Amino acid metabolism
Amino acid and derivative metabolism
Main pathways of carbohydrate metabolism
Nonprotein amino acid biosynthesis
Aspartate family amino acid metabolism
Tricarboxylic acid cycle intermediate metabolism
Nitrogen compound biosynthesis
Nonprotein amino acid metabolism
Amino acid metabolism
Nitrogen compound metabolism
Urea cycle intermediate metabolism
Amino acid and derivative metabolism
Glutamine family amino acid metabolism
Nonprotein amino acid biosynthesis
According to the Aspergillus nidulans Database , only 194 of the metabolic ORFs included in the metabolic model developed for A. nidulans had been assigned a function in the metabolism before this study, and hence there was a large potential for functional annotation of metabolic genes. This potential was demonstrated by the functional assignment of 472 orphan ORFs. Yet, a number of issues arose in the course of the annotation of the metabolic genes in A. nidulans.
One of these issues was related to limitations inherent to a pathway-driven gene finding approach. In particular, the use of metabolic networks from selected organisms as references does not allow the identification of metabolic functions that are not present in these templates. In this way, additional or alternative reactions or pathways to these templates that may make part of the metabolic network of A. nidulans will not be predicted to exist, and subsequently the corresponding genes will not be identified using this methodology. In order to minimize this, the metabolic networks used as reactions databases in this work concerned organisms closely related to A. nidulans, and thus the major part of the metabolic reactions present in this fungus were likely to be covered.
Another issue is that our identification of candidate ORFs encoding a given function relied on comparative analysis, based on sequence similarity between the proteins in A. nidulans and other organisms. However, there probably exist enzymes in A. nidulans that are encoded by genes with low similarity to those encoding the same function in other organisms. Hence, queries based on BLAST searches will not result in identification of such genes. This problem was partially overcome by using, as queries, proteins of phylogenetically related organisms, for which many genes are likely to be conserved.
As alternatives to analytical tools relying on sequence similarity, there are tools based on genome context. However, gene clustering on the chromosome and analysis of shared regulatory sites as well as motif profiling are mainly relevant for comparative genomics in prokaryotes .
The selection of ORFs among the candidates encoding each function involved the specification of certain criteria and the development of a classification system. Hereby the reliability of the assignment of specific metabolic functions to candidate ORFs could be evaluated objectively and systematically, and furthermore it was possible to classify each annotation into categories according to the criteria defined. These criteria were essentially based on the E-values of BLAST searches and on the consistency between the function of interest and the function of the BLAST hits. All ORFs that were classified into at least one of the categories defined (A/A*, F, Y/Y* and O/O*, see Material and Methods - Evaluation of functional assignments) were considered in the metabolic model, whereas the ORFs that did not fall into any of the categories were not incorporated in the model. The categories A*, Y*, and O* were included to take into account also those ORFs for which the results from the comparative analysis were not conclusive, but did not contradict the function in question (e.g. aldose reductase versus xylose reductase; mitochondrial isocitrate dehydrogenase versus cytosolic isocitrate dehydrogenase). In this way, we aimed at identifying all ORFs that were associated to a given function (which might also represent isoenzymes or subunits of enzyme complexes), even though, by not being so stringent, we may have run into the problem of also finding false positives. Furthermore, assignment of ORFs to these categories may be valuable in future annotation studies.
The choice of the cut-off E-value in the BLAST searches is obviously determining the quality of our annotation. A number of factors influence the E-values in BLAST searches, such as the length of the queries and the size of the databases. Hence, the candidate ORFs were ranked according to the score to length ratio, as it hereby was possible to compensate for the fact that smaller proteins lead to higher E-values (smaller scores) in BLAST searches, because the probability of finding them by chance in the database is higher. Moreover, in order to have an idea on the order of magnitude of reasonable E-values for the different BLAST searches, previously annotated ORFs were used as positive controls. However, universal criteria may not apply, since the proteins' sequences may have diverse properties in the different parts of the metabolism (size, content, etc). Therefore, a sensitivity analysis was performed, in which different cut-off values in the E-values were considered in the BLAST searches. The stringency of this criterion determined the number of metabolic functions assigned to ORFs, as well as the number of ORFs considered for each function (see Additional file 3). A cut-off of 1E-50 in the E-values was found to be reasonable and hence this value was chosen for all BLAST searches in the selection of candidate ORFs and development of the metabolic model.
On the other hand, the differences in the order of magnitude of the E-values in the different BLASTP searches carried out can explain the fact that some ORFs were not classified into the category O (i.e., have an homolog, different from the query sequence, in the NCBI protein database ). It would be expected that all candidate ORFs would fall into this category, since these were found using as queries, protein sequences retrieved from this database. However, by setting the same cut-off in the E-values for all BLASTP searches, we may have been too stringent in some cases, namely when searching homologs for the candidate ORFs in the protein database at NCBI, which is a very large database and hence the probability of finding the query sequence by chance is high, resulting in high E-values.
As mentioned above, all BLAST hits for a given function whose E-values were below the cut-off and that were classified into at least one of the categories described were considered to encode that function. However, in some cases, the discrimination between ORFs encoding isoenzymes and subunits of enzyme complexes was difficult, particularly for those cases in which there was no information available for A. nidulans or this could not be directly extrapolated from other organisms. For example, even though in S. cerevisiae there are two genes (YDR341C and YHR091C) encoding the enzyme arginyl tRNA synthetase (EC 126.96.36.199), only a single ORF (AN6368.2) was identified in A. nidulans, through BLASTP searches. Similarly, a single ORF was found in A. nidulans (AN5610.2) for L-aminoadipate-semialdehyde dehydrogenase (EC 188.8.131.52), which is a complex enzyme in yeast containing two subunits (a large and a small subunit), encoded by two different genes (YBR115C and YGL154C, respectively).
This study therefore shows clearly that the outcome of comparative genomics is highly dependent on the criteria chosen, and also highlights the importance of the application of a systematic evaluation to the obtained annotations.
Transcriptomics has become a cornerstone of functional genomics over the last decade and the analysis and interpretation of microarray data has proven to be a new challenge. Even for well-studied model organisms, the analysis and interpretation of transcriptome data is often hampered by the numerous genes that have no function assigned. This becomes even more evident when transcriptomics moves on to genomes whose annotation barely relies on automated gene prediction. In these cases, basically any additional information can be very helpful for the analysis and interpretation of data. As shown previously by our group, the metabolic network reconstructed in this work can be valuable for upgrading the information content in transcriptome data , i.e. to provide a link between gene expression profiles and integrated metabolic functions. The importance of the reconstructed metabolic network in the analysis of transcriptome data of A. nidulans was again illustrated in this work with a study concerning transcriptional responses to changes in the growth medium composition (glucose and ethanol) of a wild type strain and a mutant strain with deletion of the regulatory protein CreA . The initial transcription data set used comprised 3,278 ORFs, of which 571 ORFs had an assigned function in the metabolic network. Although the selected expression data subset (consisting of 571 ORFs) did not cover the whole metabolic network reconstructed for A. nidulans, the reporter metabolites identified after applying the method from Patil and Nielsen , still provided valuable information on the underlying metabolic changes, which reflects the robustness of the method.
For both categories (differential expression according to growth medium or genotype), the highest ranked reporter metabolites were involved in the biosynthesis of amino acids (arginine and threonine, respectively) (Table 4), which was not expected, since it was anticipated that the different environmental/genetic conditions would mainly affect the carbon metabolism. In particular for the transcriptional responses due to changes in the carbon source, the involvement of the amino acid metabolism was further reflected in many of the other reporter metabolites (Table 4).
On the other hand, this approach may play an important role in functional genomics, by giving insight into the perturbations that lead to specific transcriptional responses (for example, in the analysis of strains that are mutated in genes with unknown function). In fact, from the list of reporter metabolites presented in Table 4, it is possible to deduce that major changes occurred in the metabolism of carbon compounds in A. nidulans, in particular in the metabolism of ethanol (from reporter metabolites, such as ethanol, acetaldehyde, acetate, acetyl-CoA, etc.). However, tracing back the underlying perturbations from the topology of the network and observed transcriptional responses is still a challenging task, even for well-studied organisms.
Furthermore, the reconstructed metabolic network for A. nidulans allows the utilization of tools that have been developed by the yeast community for the analysis of -omics data. This represents an advantage, because one can rely on the linkage of the expression data to a specific reaction rather than to a specific gene. After identification of the reporter metabolites and the reactions in which they participate, the reconstructed metabolic network of S. cerevisiae can be used to identify ORFs in yeast that are involved in the same reactions and use these for further analysis. As an example, the "GO term finder" available at Saccharomyces Genome Database was used to identify specific parts of the metabolism related to the reporter metabolites. The results (reporter metabolites) obtained based on the differential expression according to the medium revealed many GO categories that could be directly linked to the change of the carbon source from glucose to ethanol, and therefore again showed that by using this approach one can obtain a birds-eye view on the major metabolic changes. The results based on the changes in the genotype of the strains, i.e. deletion of the carbon repression mediator CreA, showed that the highest ranked reporter metabolite, as well as other lower ranking ones, was involved in the amino acid biosynthesis, which was also confirmed by the GO term analysis. This linkage between glucose repression and amino acid biosynthesis was further substantiated by the finding that a defined set of genes that is regulated via CreA also has binding sites for CpcA (homologue of GNC4 from S. cerevisiae) that is a major regulator of the amino acid metabolism . This shows that apparently the CreA protein is not only involved in the mediation of carbon repression, but plays an even more global role in the metabolism of the cell. The emergence of polyols in the list of reporter metabolites confirms previously reported results [36, 37], and the expression data therefore allow tracking down the origin of these changes in the metabolism.
In this work, we illustrated the use of a pathway-driven approach to improve the functional annotation of the genome of A. nidulans. Moreover, we showed how the metabolic reconstruction establishes functional links between genes, enabling the upgrade of the information content of transcriptome data.
The central metabolism of A. nidulans, as well as selected pathways from its secondary metabolism, was reconstructed using as templates detailed metabolic reconstructions of other organisms. For the reconstruction of the metabolism of carbon compounds in A. nidulans, a detailed metabolic reconstruction for A. niger  was used as reaction database, whereas the metabolisms of amino acids, nucleotides, and lipids were predicted based on reactions identified in S. cerevisiae , for which a more comprehensive reconstructed metabolic network was available. As the energy metabolism of S. cerevisiae is simpler than for many other eukaryotic cells, the energy metabolism of A. nidulans was reconstructed using as references the corresponding pathways from both yeast  and mouse . The biosynthetic routes of secondary metabolites, namely penicillin and aflatoxins, were based on detailed studies of these pathways in P. chrysogenum  and in different species of Aspergillus , respectively. Evidence for the presence of these pathways in A. nidulans was in part supported by available genomic data, such as previously annotated ORFs , sequenced and cloned genes (with or without an ORF associated) , Expressed Sequence Tags (ESTs)  and Tentative Consensus sequences (TCs) . The metabolic pathways in A. nidulans were also predicted based on biochemical evidence given by reports on isolation and characterization of enzymes in this fungus [41, 42]. Furthermore, the inclusion of some reactions in the metabolic reconstruction of A. nidulans was supported solely by physiological evidence (e.g. the ability to grow on a certain sugar as the sole carbon and energy sources require a transporter and a complete pathway for metabolism of this sugar).
The directionality and reversibility of the reactions included in the metabolic network of A. nidulans were based on those in the metabolic networks of other organisms that served as templates for the reconstruction.
Once the metabolic network of A. nidulans was reconstructed and the missing functional roles identified, the genome of A. nidulans was surveyed for the genes encoding the corresponding enzymes, by employing comparative genomics tools based on sequence similarity. This approach was also applied for the identification of ORFs encoding metabolic functions that had been previously assigned to genes in A. nidulans, aiming at identifying isogenes. The sequences of a set of proteins with the designated enzymatic activity in other organisms (queries) were compared with the set of all predicted proteins in A. nidulans . The protein sequences used as queries were obtained from the (non-redundant) NCBI protein database  (searched by the corresponding enzyme commission numbers whenever possible), and represented well-characterized enzymes in other organisms, preferentially in organisms taxonomically related to A. nidulans (e.g. fungi or eukaryota). BLASTP (blosum62) was used for comparative analysis and assigning metabolic functions to ORFs in the genome of A. nidulans. The selection of the best candidate ORFs for a given function relied on the cut-off in the expectation values (E-values) considered, as well as on the score-to-sequence length ratio. Protein sequences or translated nucleotide sequences were used in the comparisons, rather than nucleotide sequences, due to the presence of introns, which characterizes eukaryota, and thus filamentous fungi. For the cases in which no hits were generated in this way, the translated nucleotide sequences of genes encoding the specific enzymes (obtained from the (non-redundant) NCBI nucleotides database ) were used as queries, and BLASTX was employed for generating putative assignments.
Classification of candidate ORFs into categories, according to several criteria.
Functional annotation (A)
Comparison of the previous functional annotations for the candidate ORFs  with the function of interest
function assigned to the candidate ORF consistent with function of interest
function assigned to the candidate ORF not contradicting function of interest (e.g. aldose reductase versus xylose reductase; mitochondrial isocitrate dehydrogenase versus cytosolic isocitrate dehydrogenase)
no function assigned to the candidate ORF (e.g. hypothetical protein)
function assigned to the candidate ORF not consistent with function of interest
Protein families (F)
protein family of candidate ORF consistent with protein family of function of interest
no protein family associated to the candidate ORF
protein family of candidate ORF not consistent with protein family of function of interest
Comparative genomics with yeast (Y)
function of the protein in yeast (that has a match with the candidate ORF) consistent with missing function
function of the protein in yeast (that has a match with the candidate ORF) not contradicting missing function (e.g. aldose reductase versus xylose reductase; mitochondrial isocitrate dehydrogenase versus cytosolic isocitrate dehydrogenase)
no match of protein in yeast with the candidate ORF
function of the protein in yeast (that has a match with the candidate ORF) not consistent with missing function
Comparative genomics with other organisms (O)
BLASTP searches of candidate ORF in A. nidulans against (non-redundant) protein database at NCBI 
functions of proteins in other organisms (that have a match with the candidate ORF) consistent with missing function
functions of proteins in other organisms (that have a match with the candidate ORF) not contradicting missing function (e.g. aldose reductase versus xylose reductase; mitochondrial isocitrate dehydrogenase versus cytosolic isocitrate dehydrogenase)
no matches of candidate ORF with proteins in other organisms
functions of proteins in other organisms (that have a match with the candidate ORF) not consistent with missing function
Previous annotations of the candidate ORFs (if available in the Aspergillus nidulans Database  or in The Aspergillus nidulans Linkage Map ) were compared with the function in question. The candidate ORFs were classified into the categories A, A*, A-X or A-NA, depending on the availability and consistency of the results of the comparison (see Fig. 3 and Table 6).
The protein domains originally predicted for the candidate ORFs  were compared with those of the enzymes catalyzing the functions of interest , and the candidate ORFs were classified accordingly into the categories F, F-X or F-NA (see Fig. 3 and Table 6).
The proteins in S. cerevisiae  were used as queries in BLASTP searches against all predicted proteins in A. nidulans , and the correspondences between ORFs in this yeast and A. nidulans generated in this way were used to evaluate the reliability of the candidate ORFs. This was accomplished by comparing the functions assigned to the ORFs in yeast with those assigned to the candidate ORFs. The candidate ORFs were then classified into the following classes Y, Y*, Y-X or Y-NA (see Fig. 3 and Table 6).
Potential homologs of the candidate ORFs encoding the metabolic functions in A. nidulans were surveyed in other organisms, through BLASTP searches of the former against the (non-redundant) protein database at NCBI . The corresponding functions were compared and the candidate ORFs were classified into the categories O, O*, O-X or O-NA (see Fig. 3 and Table 6).
The BLAST searches often yielded several hits (or candidate ORFs) with E-values lower than the cut-off value (or with similar score-to-length ratios) for a given function. These hits could potentially correspond to ORFs encoding isoenzymes or subunits of enzyme complexes. In these situations, all candidate ORFs were considered and classified according to the criteria described above. In some cases, the existence of these hypothetical isoenzymes or subunits in A. nidulans was supported by information concerning this fungus available in the literature. Otherwise, information was extrapolated from A. niger or S. cerevisiae.
Multi-enzyme complexes and isoenzymes were represented in a different way in the metabolic reconstruction. All subunits considered to belong to a same multi-enzyme complex were associated to a single reaction in the metabolic network, whereas isoenzymes were represented as independent reactions. For example, in the metabolic network of A. nidulans, the pyruvate dehydrogenase complex is represented by two subunits (AN5162.2/AN9403.2), which are associated to a single reaction (EC 184.108.40.206). On the other hand, two isoenzymes of glutamate dehydrogenase (AN7451.2 and AN4376.2) were associated to two different reactions in the metabolic network, a NAD+- and a NADP+-dependent reaction, respectively. Reactions catalysed by multi-enzyme complexes were included, if at least one of the subunits was identified.
Another issue that was addressed was related to the assignment of different functions to the same ORF. These ORFs could possibly encode multifunctional proteins, and hence these were surveyed for the existence of multiple protein domains to verify the hypotheses.
The reconstructed metabolic network of A. nidulans, including reactions encoded by previously annotated genes as well as metabolic functions initially missing, served as a basis to develop a stoichiometric model. The model was subsequently used for simulating microbial growth and for gene deletion analysis, by employing flux balance analysis and linear programming methods [21, 43].
The reconstructed metabolic network was used in combination with a computational method published by Patil and Nielsen  for analysis of large-scale gene expression data referring to a study on glucose repression in A. nidulans. This method enables the identification of so-called reporter metabolites and metabolic subnetworks, based on their interconnectedness within the metabolic network through common metabolites and on information about changes in the expression level of the genes.
This approach was applied to analyze expression data concerning a reference strain and a creA deleted strain of A. nidulans, grown on different carbon sources, specifically glucose and ethanol . Furthermore, the top 30 reporter metabolites identified based on the changes in gene expression between the different carbon sources (glucose versus ethanol) or genotype of the strains (reference versus creA deletion mutant) were used in combination with information on the topology of the metabolic network of S. cerevisiae  for further analysis, using tools available at the Saccharomyces Genome Database .
H.D. was funded through a research fellowship (SFRH/BD/3110/2000) of the III Community Support Framework financed by the European Social Fund and by a Portuguese National Fund from the Ministry of Science and Technology. İ.Ş.Ö. was awarded a NATO-B1 Post Doc research fellowship by The Scientific and Research Council of Turkey (TUBITAK).
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.