PRODIS has been implemented on a server running the Apache web server and MySQL server version 5.0 on a 2.5 GHz Pentium machine with the Linux SUSE distribution 11.0. The database uses a relational design with indexes to associate projects with experiments, and experiments with their information, their results and each other. The PRODIS database is accessed from a browser through a web-based interface that uses PHP scripts to communicate with the database. Experimental data produced by instrument-aided analyses are also inserted through the web interface, with the help of parsers that read the data directly from the files generated by the instruments. Perl scripts are used to submit m/z files to identification tools installed on the server (X-Tandem and OMSSA) or accessed remotely (Mascot).
PRODIS data model and scope
Proteomic analysis can be performed using a variety of experimental setups. One of these is the use of a separation platform such as 2D-PAGE and/or LC, followed by a run on an MS-based identification platform. A data model designed for a proteomics data management system has to take this into consideration in order to represent the sequence of experiments correctly. The PRODIS data model has been designed with these needs in mind and includes three main groups of tables: (i) those designed to store general information on projects, such as name, description, coordinator, members, associated publications, etc.; (ii) those designed to store protocols and experimental conditions; and (iii) those designed to store results and result-associated files. A simplification of the data model can be seen in Figure 1; the full model has 46 tables and can be downloaded from the PRODIS web site.
PRODIS maintains two types of information: the conditions under which an experiment has been performed and its results. The experimental conditions stored include identification of samples, solvents, temperature, the instrument used for the analysis, etc. The results stored differ for each type of experiment. For 2D-PAGE, PRODIS stores gel images and Image Master Platinum (GE HealthCare) files; for LC, peak lists; and for MS, m/z lists and protein identification files. The PRODIS data model has the experiment as its central entity and handles three types of data: LC, 2D-PAGE and MS. Each type of experiment has its own set of tables, whose entries are associated with the experiment. These entries store the experimental conditions, the protocol used to perform the experiment, one or more specific results (a list of chromatography peaks, images of gels, m/z lists, etc.) and all the information regarding the project to which the experiment belongs. Images and chromatogram files are stored outside the database, with links to these files stored in the database along with all results. As a result of this design, the graphics in PRODIS (chromatograms or gel images) are directly associated with the experiments and with the specific samples used in a prior stage of the experiment, making it easy to track and control all steps executed up to protein identification.
The experiment is the main component of the model because PRODIS is aimed primarily at assisting researchers in tracking experiments and cross-linking experimental data. The data flow starts when a new experiment is performed and its information is entered in the experiment table. Each experiment receives an internal ID that identifies it uniquely. The tables specific to the type of experiment are then filled and associated with this internal ID. Experiments are often related to other experiments, such as when a particular result prompts the researcher to perform a more detailed analysis of a peak or spot. PRODIS stores the associated experiments by cross-linking their internal IDs, which allows it to establish a temporal link so that the sequence of experiments performed can be followed later. This complex set of operations can be visualized by the user as an experiment tree in reports produced by the system (Figure 2B).
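The cross-linking of internal IDs described above can be illustrated with a minimal sketch. It is not the PRODIS schema: the table and column names (`experiment`, `experiment_link`, `parent_id`, `child_id`) and the helper functions are hypothetical, and SQLite is used through Python only so that the example is self-contained, whereas PRODIS itself uses MySQL, PHP and Perl.

```python
import sqlite3

# Hypothetical, heavily simplified schema. The real PRODIS model has 46 tables,
# and its table and column names are not given in the text.
SCHEMA = """
CREATE TABLE experiment (
    id      INTEGER PRIMARY KEY,   -- internal ID, unique per experiment
    project TEXT NOT NULL,
    kind    TEXT NOT NULL          -- e.g. 'LC', '2D-PAGE', 'MS'
);
CREATE TABLE experiment_link (
    parent_id INTEGER NOT NULL REFERENCES experiment(id),
    child_id  INTEGER NOT NULL REFERENCES experiment(id),
    PRIMARY KEY (parent_id, child_id)
);
"""

def open_db():
    """Create an in-memory database with foreign-key checks enabled."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn

def add_experiment(conn, project, kind, parent=None):
    """Insert an experiment and, optionally, link it to an existing one."""
    with conn:  # both inserts are committed together or rolled back together
        cur = conn.execute(
            "INSERT INTO experiment (project, kind) VALUES (?, ?)",
            (project, kind),
        )
        new_id = cur.lastrowid
        if parent is not None:
            conn.execute(
                "INSERT INTO experiment_link (parent_id, child_id) VALUES (?, ?)",
                (parent, new_id),
            )
    return new_id

def children_of(conn, exp_id):
    """IDs of experiments that were derived from exp_id."""
    rows = conn.execute(
        "SELECT child_id FROM experiment_link WHERE parent_id = ? ORDER BY child_id",
        (exp_id,),
    )
    return [r[0] for r in rows]

def parents_of(conn, exp_id):
    """IDs of experiments that exp_id was derived from."""
    rows = conn.execute(
        "SELECT parent_id FROM experiment_link WHERE child_id = ? ORDER BY parent_id",
        (exp_id,),
    )
    return [r[0] for r in rows]
```

A separate link table is used here, rather than a single parent column, so that an experiment could have more than one parent. Whether PRODIS stores the links this way is an assumption.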
The proposed data model was designed using, as model data, the proteomic analyses of the centipede Scolopendra viridicornis (LC) and the parasite Schistosoma mansoni (2D-PAGE and MS). The final analysis of these data is available in [17] and [18].
Experiment tracking
The main feature developed in PRODIS to help improve proteomic analysis is the ability to track all steps of the experimental process, from sample extraction to protein identification. This feature, known as experiment tracking, is implemented by associating related experiments in the database, using their IDs to tag the relationship. When an experiment is inserted in the system, the user has the option of associating it with an existing experiment. This generates what PRODIS calls an experiment tree, which contains the relationships between experiments. In this way it is possible to identify exactly how experiments are related to one another, assisting researchers in controlling the flow of experiments performed and the samples used, and in explaining the results and how they were obtained. To identify which sample originated a given result, one can simply look for the first experiment in the experiment tree of the experiment being analyzed. The data for this first experiment contain the information on the sample used in the analysis. Similarly, given a specific sample that has been used in an experiment, its experiment tree identifies all experiments that use this sample. Experiment tracking can provide even richer information, however. It is possible to identify, for example, which spot in a given gel originated a certain m/z list. The experiment tree for that m/z list provides this information.
The experiment tree is not tied to a particular type of experiment. It can identify related experiments starting from any given experiment. In this way the researcher can trace backward or forward from any point, establishing the causes or consequences of any step in the experimental process. This helps to correct problems in the process, or to improve it by uncovering relationships that are hidden in the data.
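Tracing backward and forward through an experiment tree can be sketched as a simple graph traversal. The code below is an illustration under stated assumptions, not the PRODIS implementation: links are represented as hypothetical (parent, child) ID pairs, and the function names are invented for this example.

```python
from collections import defaultdict

def build_index(links):
    """Turn (parent, child) ID pairs into forward and backward lookup tables."""
    children, parents = defaultdict(list), defaultdict(list)
    for parent, child in links:
        children[parent].append(child)
        parents[child].append(parent)
    return children, parents

def reachable(start, neighbours):
    """Collect every experiment reachable from start, excluding start itself.

    Pass the children table to trace forward, or the parents table to trace back.
    """
    seen, stack = set(), [start]
    while stack:
        for nxt in neighbours.get(stack.pop(), []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def first_experiments(exp, parents):
    """Ancestors of exp that have no parent, i.e. where the sample was recorded."""
    candidates = reachable(exp, parents) or {exp}
    return sorted(c for c in candidates if not parents.get(c))
```

With made-up links `[(1, 2), (2, 3), (1, 4)]`, calling `first_experiments(3, parents)` walks back from experiment 3 to the experiment that recorded the sample, while `reachable(1, children)` lists every experiment that used material from experiment 1.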
Figures 2 and 3 illustrate how the experiment tree can be used in PRODIS. In this example, samples obtained in Experiment 1 (protein extraction) were used in Experiments 100, 146 and 174. The resulting products were used in Experiments 101 and 148 (Figure 2A). After the experiment is finished, a report containing the experiment tree is generated (Figure 2B), in which the above relationships are described. The experiment tree shows which experiments are related to each other and in which order. Because the details and results of any experiment can be consulted through the View experiments option in PRODIS, experiment tracking can also be performed using this option. The View experiments option shows all data related to the experiment. At the bottom of the page, PRODIS lists the experiments that are parents and children of the current experiment. Figure 3 shows the information displayed by the View experiments option for Experiment 1. At the bottom of the page are the experiments generated from Experiment 1: Experiments 100, 146 and 174. By clicking on the link with the experiment name, the user can navigate the experiment tree and efficiently locate the experimental data needed.
Submission and data retrieval
PRODIS is designed to store general information on projects and experiments, LC data from the AKTA Explorer 100 (Amersham BioSciences) HPLC system, 2D-PAGE data generated using the Image Master 2D-Platinum system (GE HealthCare), m/z files from MS (mzXML, pkl, mzML and mzData formats) and identification files from Mascot, X-Tandem and OMSSA runs. At the moment PRODIS is designed for small proteomics facilities and does not offer functionality for on-line LC-MS setups. The screens to collect data for this kind of setup are currently being developed. Data can be submitted to PRODIS through the web interface in a friendly and intuitive way. PHP scripts upload the data entered through forms into the database, and Perl scripts parse all uploaded files, inserting their data in the database or directing the files to the proper directory (Figure 4). A series of drop-down menus has been included in the interface to minimize user typos and decrease errors in data upload. Data retrieval in PRODIS is also performed through the web interface. Data can be viewed as human-readable HTML, or retrieved from the system as reports in printable format.
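The decision between parsing an uploaded file into the database and directing it to a directory can be sketched as follows. This is only an illustration: PRODIS does this with Perl scripts whose logic is not described here, and the extension-based mapping, the function name `classify_upload` and the returned labels are all assumptions made for this example.

```python
from pathlib import Path

# Assumed mapping from file extension to m/z format. The text lists the
# supported formats but does not say how PRODIS recognises them.
MZ_FORMATS = {
    ".mzxml": "mzXML",
    ".pkl": "pkl",
    ".mzml": "mzML",
    ".mzdata": "mzData",
}

def classify_upload(filename):
    """Decide how an uploaded file would be handled in this sketch.

    Returns ("parse", format_name) for a recognised m/z file, whose contents
    would be read into the database, and ("store", None) for any other file,
    which would be kept on disk with only a link recorded in the database.
    """
    suffix = Path(filename).suffix.lower()
    if suffix in MZ_FORMATS:
        return ("parse", MZ_FORMATS[suffix])
    return ("store", None)
```

For instance, `classify_upload("run01.mzXML")` selects the parser path, while a gel image such as `gel_A.tif` falls through to file storage.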
MIAPE compliance
Several initiatives, including those associated with HUPO, have discussed the importance of a set of standards for proteomics data publication. To this end, MIAPE has been developed by the Proteomics Standards Initiative of HUPO (HUPO-PSI) as a set of guidelines representing the minimal information required to report a proteomics experiment and to sufficiently support its assessment and interpretation [19]. The PRODIS data upload functions include all the information required by MIAPE. The fields for uploading this information are spread throughout the several scripts of the interface. For submission to other databases or for publication, these data can be exported by the user directly from the database in PDF format. An XML format exporter is being finalized.
Data access in PRODIS
Data access is controlled through a password-protected system in which each screen for data upload and retrieval is validated. Users have different levels of permission, which allow the execution of tasks according to the user profile. This approach allows different projects to be registered in the system and to use it without any risk of data sharing or leakage between them. PRODIS accomplishes this through a comprehensive protection mechanism that associates users with the tasks they have performed, and also associates tasks and users with managers, who can oversee the order of experiments with ease and reliability. Each user in PRODIS is identified in the system through a login/password and is given a set of permissions. Permission levels range from guest to coordinator. Guests can only access a limited set of experiment data, while coordinators can access and update all data associated with their projects. There are also permission levels associated with project membership. Researchers can access and update data from their own experiments and can see data associated with other experiments of the same project, even if inserted by other project members, but cannot see data that belong to projects of which they are not members. In this way it is possible to track experiments not only by storing all information related to the experiment but also by ensuring that each researcher has the appropriate permissions to execute the experiment, and by maintaining the information about who performed which experiment. When experiment information is entered in the system, all of this is recorded automatically, so researchers can easily identify which experiments they performed and analyze their results. Project managers can also see which project members executed which parts of the analysis and how their work progresses as the project develops.
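The permission rules described above can be summarized in a short sketch. It is a simplified reading of the text, not the PRODIS access-control code: the role names, the `User` and `Experiment` classes and the `guest_visible` flag are hypothetical, and the flag stands in for the "limited set" of data open to guests, which the text does not define.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class User:
    login: str
    role: str            # "guest", "researcher" or "coordinator" (assumed names)
    projects: frozenset  # names of the projects the user belongs to

@dataclass(frozen=True)
class Experiment:
    id: int
    project: str
    owner: str                   # login of the user who inserted the experiment
    guest_visible: bool = False  # assumption: marks data open to guests

def can_view(user, exp):
    """Guests see only flagged data; members see everything in their projects."""
    if user.role == "guest":
        return exp.guest_visible
    return exp.project in user.projects

def can_update(user, exp):
    """Coordinators update anything in their projects; researchers only their own."""
    if exp.project not in user.projects:
        return False
    if user.role == "coordinator":
        return True
    return user.role == "researcher" and exp.owner == user.login
```

Under these assumptions, a researcher in the same project as an experiment's owner passes `can_view` but not `can_update`, and a researcher from another project passes neither.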