Upon start, the main window displays two listboxes and the currently active buttons (Figure 2, A, B, and C, respectively). Default values can be changed at this stage through the "Setup" function; this has to be done before the analysis is initialized. The "Start" button (Figure 2, C) commences the analysis, and all organisms currently present in the KEGG database are displayed in a listbox (Figure 2, A). Single and multiple selections are possible and are confirmed with the "Retrieve Organisms" button. The second listbox (Figure 2, B) is then automatically populated with KEGG pathways. After the pathways to be analyzed have been selected, the "Retrieve Pathways" button confirms the selection and starts the KEGG pathway mapping. Organisms and pathways present in the KEGG database at the time of the analysis can be selected independently of each other, with the exception of organism-specific pathways (i.e. ABC transporters or two-component regulatory systems). This allows flexible, selective comparative analyses between groups of organisms. By selecting all organisms and pathways, the given gene set can be compared against the complete KEGG database.
The user-selected organism and pathway combination is shown in a separate pop-up window (not shown). The current status of the KEGG pathway mapping is also shown in a separate log window (not shown). In general, the right panel (Figure 2, C) harbors the user-guided interface and was designed to lead the user through the analyses step by step. By default, confirming the organism and pathway selection automatically initializes the KEGG pathway mapping. If only the selected and retrieved protein sequences are required, or a manual start of the pathway mapping is desired, the setup module allows manual mode to be configured. KEGG pathway mapping can then be initiated with the "Submit to KEGG" button. Selected pathway/organism combinations are saved as an ASCII text file. The retrieved protein sequences are stored in a separate ASCII file, and a Blast-compatible database is generated. For future analyses, the pathway/organism selection and the respective database can be re-used with different query protein sequence sets.
The possibility to re-use previous selections dramatically reduces the time needed to complete KEGG analyses, as retrieval of individual protein sequences from KEGG is omitted. In addition, Blast results obtained with the given query set can also be re-used. This shortens the run time further, enabling rapid mapping and analysis of pathways with more relaxed or more stringent threshold values.
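The re-use of a stored selection can be illustrated with a minimal Python sketch. It is not taken from PathwayVoyager itself: the tab-separated file layout, the function names, and the organism codes used in the usage note are assumptions made for illustration only.

```python
from pathlib import Path


def save_selection(path, organisms, pathways):
    """Write one 'organism<TAB>pathway' pair per line (hypothetical layout)."""
    lines = [f"{org}\t{pw}" for org in organisms for pw in pathways]
    Path(path).write_text("\n".join(lines) + "\n", encoding="ascii")


def load_selection(path):
    """Read the stored pairs back as a list of (organism, pathway) tuples."""
    pairs = []
    for line in Path(path).read_text(encoding="ascii").splitlines():
        if line.strip():
            org, pw = line.split("\t")
            pairs.append((org, pw))
    return pairs


def needs_retrieval(selection_file, database_file):
    """Sequences must be fetched again only if either stored file is missing."""
    return not (Path(selection_file).exists() and Path(database_file).exists())
```

Calling `save_selection("selection.txt", ["lac", "lpl"], ["00010"])` once and checking `needs_retrieval(...)` before each later run would allow the sequence download to be skipped whenever both the selection file and the sequence database are already present.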
The provided gene set is then compared to the local database generated from the selected organism-pathway protein sequence combination using the BlastP algorithm. Blast hits with an e-value below the user-selectable threshold are used to generate the marked KEGG pathway requests. Pathway maps are saved as GIF files, and the URL of the respective KEGG pathway map and the corresponding BlastP results are stored separately in text files.
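The e-value filtering step can be sketched as follows. The sketch assumes that the BlastP results are available in the standard 12-column tabular format (e-value in the eleventh column); the format actually parsed by PathwayVoyager is not stated here, and the function name and sample rows are invented for illustration.

```python
import csv
import io


def filter_hits(tabular_text, max_evalue):
    """Return (query, subject, evalue) for hits with evalue below max_evalue.

    Assumes BLAST tabular output: query id, subject id, ..., e-value in
    column 11, bit score in column 12. Comment lines starting with '#'
    are skipped.
    """
    hits = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        query, subject, evalue = row[0], row[1], float(row[10])
        if evalue < max_evalue:
            hits.append((query, subject, evalue))
    return hits


# Made-up rows for illustration only; not real BlastP output.
SAMPLE = "\n".join([
    "orfA\tsubj1\t98.0\t250\t5\t0\t1\t250\t1\t250\t1e-127\t450",
    "orfB\tsubj1\t99.0\t250\t2\t0\t1\t250\t1\t250\t1e-131\t460",
    "orfC\tsubj2\t30.0\t100\t70\t2\t1\t100\t1\t100\t2e-05\t45",
])

strict_hits = filter_hits(SAMPLE, max_evalue=1e-120)
```

Because the threshold is a plain parameter, the same stored Blast results can be filtered again with a more relaxed or more stringent cut-off without repeating the search.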
Results are displayed in a separate window. Figure 3 illustrates pathway mapping for the Glycolysis/Gluconeogenesis pathway (KEGG pathway code: 00010) using the ORFeome of L. acidophilus NCFM [10] as query set and the complete KEGG database as template.
In general, previously selected pathways are displayed by either their KEGG pathway code or full name. Alternative analyses can be displayed by changing the default mapping directory, using the "Directory" function (Figure 3, A). The selected pathway is then displayed graphically, and BlastP hits below the specified threshold are indicated as red boxes bearing the respective EC numbers (Figure 3, C). Each marked element is shown by its EC number, numerically sorted, in a listbox (Figure 3, B). Upon selection of an entry, all BlastP hits below the threshold are sorted by ascending e-value and displayed accordingly (Figure 3, D). This workflow allows quick pathway mapping across a given gene set, and genes potentially involved in multiple pathways can be easily identified and analyzed.
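The two orderings described above (EC numbers sorted numerically, hits sorted by ascending e-value) and the detection of genes occurring in several pathways can be sketched in Python. The data structures and function names are assumptions chosen for illustration and do not reflect the internal implementation of PathwayVoyager.

```python
from collections import defaultdict


def ec_sort_key(ec):
    """Sort key comparing EC fields as integers, so that 1.2.x sorts before 1.10.x.

    Non-numeric fields such as '-' are placed after all numeric ones.
    """
    key = []
    for field in ec.split("."):
        key.append((0, int(field), "") if field.isdigit() else (1, 0, field))
    return tuple(key)


def group_hits_by_ec(hits):
    """Group (ec, orf, evalue) hits by EC number.

    EC numbers are returned in numeric order; the hits of each EC number
    are ordered by ascending e-value (best hit first).
    """
    grouped = defaultdict(list)
    for ec, orf, evalue in hits:
        grouped[ec].append((orf, evalue))
    return {
        ec: sorted(grouped[ec], key=lambda hit: hit[1])
        for ec in sorted(grouped, key=ec_sort_key)
    }


def orfs_in_multiple_pathways(pathway_hits):
    """Map each ORF found in more than one pathway to its sorted pathway codes."""
    seen = defaultdict(set)
    for pathway, orfs in pathway_hits.items():
        for orf in orfs:
            seen[orf].add(pathway)
    return {orf: sorted(codes) for orf, codes in seen.items() if len(codes) > 1}
```

A plain string sort would place "1.10.1.1" before "1.2.1.12"; comparing the fields as integers avoids this and yields the numeric order expected in the listbox.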
In the example shown, the conversion of glycerone phosphate to glyceraldehyde-3-phosphate is mediated by a triosephosphate isomerase (EC 5.3.1.1). Selecting this entry from the EC entry list (Figure 3, B) highlights all query hits found in L. acidophilus below the defined threshold (Figure 3, D). Two entries with an e-value below 1e-120 were found, namely ORFs Lba699 (e-value: 1e-127) and Lba700 (e-value: 1e-131). Both entries show significant similarities to triosephosphate isomerases. Further analyses showed that the conversion of glyceraldehyde-3-phosphate to glycerate-1,3-bisphosphate and then to glycerate-3-phosphate is mediated by Lba698 (EC 1.2.1.12, e-value: 1e-176) and Lba699 (EC 2.7.2.3, e-value: 0), respectively. The ambiguity found for EC 5.3.1.1 could thus be resolved, and the genome annotation was updated accordingly. More detailed analyses revealed the presence of the complete pathway for uptake and conversion of glucose into pyruvate and L-lactate. A more detailed analysis of the complete metabolic pathway reconstruction of L. acidophilus NCFM using PathwayVoyager is described elsewhere [10].
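The reasoning used to resolve this ambiguity can be retraced with the e-values quoted above. The following Python sketch simply keeps, for each ORF, the EC number with the lowest e-value. This is an illustrative simplification of the manual inspection described here, not a feature of PathwayVoyager, and the function name is hypothetical.

```python
def best_ec_per_orf(hits):
    """Return {orf: (ec, evalue)} keeping the lowest e-value seen per ORF."""
    best = {}
    for orf, ec, evalue in hits:
        if orf not in best or evalue < best[orf][1]:
            best[orf] = (ec, evalue)
    return best


# E-values as quoted in the text for the glycolysis example.
HITS = [
    ("Lba699", "5.3.1.1", 1e-127),   # triosephosphate isomerase
    ("Lba700", "5.3.1.1", 1e-131),   # triosephosphate isomerase
    ("Lba698", "1.2.1.12", 1e-176),
    ("Lba699", "2.7.2.3", 0.0),
]

assignment = best_ec_per_orf(HITS)
```

Under this simple rule, Lba699 is attributed to EC 2.7.2.3, which leaves Lba700 as the only candidate for EC 5.3.1.1. Any such automatic assignment would still need to be checked in its genetic context.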
PathwayVoyager does not evaluate or extrapolate the displayed hits, and the quality and significance of the results depend on the current content of the KEGG database. As with any predictive software, results should be carefully analyzed and seen in their genetic context to evaluate the activities and potential variations in substrate specificity of homologous enzymes. Results from previous analyses can be displayed by selecting the "View existing KEGG pathways" option in the PathwayVoyager main window (Figure 2, C).
Run times for PathwayVoyager may vary, depending on the number of selected pathways and organisms. Analysis of a complete genome of ~2,000 open reading frames (ORFs) using the complete KEGG database can be carried out in less than 36 h.