Figure 2 shows a key screenshot of the iCOD database. The database has two sections. The first is “Case Archive”, where the user can browse patient data by viewing the case list directly, or use the retrieval function to search for patients with specific features. The second is “Clinical Omics Data Analysis”, where the web interface presents analytical results on the interrelation between clinico-pathological findings and molecular omics data, as well as estimates of disease pathways.
Viewing and searching cases in the database
The user can browse the patient list by clicking "Display all case List" in the “Case Archive” section. To see the details of a specific case, click the “Show case information” button. The user can then examine further data on the individual patient, such as clinical manifestations, medical images (CT, X-ray, ultrasound, etc.), laboratory tests, drug histories, and pathological findings, as well as lifestyle information. The case information items and their layered structure are listed in figure 1. The time axis diagram shows in detail which kinds of data are stored for each patient and the dates on which they were collected (see figure 3).
iCOD provides a convenient search engine for querying keywords related to pathological/clinical findings and patient IDs stored in the database. To find the patient cases that satisfy the search conditions, enter the key terms of the query in the “Search” box in the “Case Archive” section.
Clinical omics data analysis
“Clinical omics data analysis” provides various maps for observing the interrelation (correlation) between clinico-pathological phenotypes and gene expression. The maps are produced by applying multivariate statistical analysis to the molecular and clinico-pathological information of the patients.
Click the “Clinical Omics Data Analysis” button on the top page. The user can then choose between two analysis methods: the 2-dimensional 3-layered (2D-3L) map and the Pathome-Genome map (CCA).
The 2D-3L map consists of two views. The left-hand view shows an overview plot of the patients, giving the relative position of each patient’s information in the molecular, pathological and clinical layers. For each layer, principal component analysis (PCA) is used to create a 2D map by summarizing the multivariate data into the first and second principal component scores. The right-hand view shows the detailed data list for the selected patients in each layer. The molecular layer displays the gene expression profiles as a heatmap. In this map, patients are grouped by a user-specified criterion selected under “Parameter Settings”. The screenshot in figure 4 displays a heatmap in which the presence or absence of “Portal vein/Hepatic vein invasion” is the criterion used to extract differentially expressed genes; the 100 most significant differentially expressed genes are extracted according to the user-specified p-value criterion of the Wilcoxon rank-sum test (see figure 5).
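The gene extraction step described above can be illustrated with a minimal sketch. It assumes a two-sided Wilcoxon rank-sum test with the normal approximation and no tie correction of the variance; the function names, the toy expression values and the gene labels are hypothetical and are not taken from iCOD.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        tied_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = tied_rank
        start = end + 1
    return ranks

def rank_sum_p(group_a, group_b):
    """Two-sided rank-sum p-value (normal approximation, no tie correction)."""
    n1, n2 = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    w = sum(ranks[:n1])                      # rank sum of the first group
    mean = n1 * (n1 + n2 + 1) / 2            # expected rank sum under H0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))

def top_genes(expression, has_feature, n_top=100):
    """Rank genes by p-value between patients with and without a feature."""
    scored = []
    for gene, values in expression.items():
        with_f = [v for v, f in zip(values, has_feature) if f]
        without_f = [v for v, f in zip(values, has_feature) if not f]
        scored.append((gene, rank_sum_p(with_f, without_f)))
    scored.sort(key=lambda item: item[1])    # smallest p-value first
    return scored[:n_top]

# Toy data (hypothetical): 8 patients, the first 4 with vein invasion.
invasion = [True, True, True, True, False, False, False, False]
expression = {
    "gene_a": [9.1, 8.7, 9.4, 8.9, 5.2, 5.0, 4.8, 5.5],
    "gene_b": [6.0, 5.1, 6.2, 5.3, 6.1, 5.2, 6.3, 5.0],
    "gene_c": [7.0] * 8,
}
ranking = top_genes(expression, invasion, n_top=2)
```

With real data, `n_top=100` would correspond to the 100 genes shown in the heatmap; for small groups an exact test would be preferable to the normal approximation used here.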
In the pathological and clinical layers, each point represents a patient's position in the coordinate system of the corresponding two principal components. When a patient is selected in one layer, the 2D-3L map draws connecting lines between the points for that patient in the different layers, so that the user can intuitively understand the relationship among the layers for an individual patient. The user can also select multiple patient points at once, and the selected patients are shown in the data list; this is done by using a simple mouse operation to specify a region in the layer that encloses all of the intended patients.
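As an illustration of how each layer's 2D coordinates can be obtained, the following sketch projects one layer's patient-by-variable matrix onto its first two principal components using a singular value decomposition. It assumes NumPy is available and that variables are centered but not rescaled; whether iCOD standardizes the variables is not stated here, and the function name and toy matrix are hypothetical.

```python
import numpy as np

def pca_2d(layer_data):
    """Return (scores, explained): 2D PCA scores and variance fractions.

    layer_data: patients in rows, variables in columns (at least 2 of each).
    """
    X = np.asarray(layer_data, dtype=float)
    Xc = X - X.mean(axis=0)                  # center each variable
    # Rows of Vt are the principal axes, ordered by decreasing variance.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:2].T                   # first and second PC scores
    explained = (s ** 2) / np.sum(s ** 2)    # fraction of variance per PC
    return scores, explained[:2]

# Toy layer (hypothetical): 4 patients, 3 variables; the second variable is
# twice the first and the third is constant, so one component suffices.
toy_layer = [[0, 0, 1], [1, 2, 1], [2, 4, 1], [3, 6, 1]]
scores, explained = pca_2d(toy_layer)
```

Applying the same function separately to the molecular, pathological and clinical matrices gives one point per patient in each layer; because the row order is shared, the points belonging to the same patient can be linked across layers by row index.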
This map has a parameter setting function for customized analysis. To use it, the user only has to change the parameter values behind the three buttons “Data selection”, “Parameter setting” and “Display setting” at the top of the 2D-3L map page. On the “Data selection” page, select the type of cancer you wish to analyze (only the “Hepatocellular carcinoma” dataset is currently available in the international version; the other cancer datasets are in preparation). On the “Parameter setting” page, the user can specify any group of clinical/pathological items to which principal component analysis is applied to determine the layer axes. On the “Display setting” page, shapes and colours can be adjusted according to various parameters, so that the characteristics of a specific group of patients can be brought out by changing these settings for comparison.
Figure 4 shows a case study using the 2D-3L map. First, from the molecular layer we can obtain the list of overexpressed or suppressed genes corresponding to the criterion "Portal vein/Hepatic vein invasion". In this case, we found the MCM6 gene, which encodes a DNA replication licensing factor. We can also confirm the relation between patient recurrence and tumor size with respect to the same criterion. See the legend of figure 4 for details.
The Pathome-Genome map shows the relation between clinical/pathological information and gene expression, calculated by regularized canonical correlation analysis (CCA) (figure 6(a)). CCA is a generalization of multiple regression analysis: the associations between two groups of variables are obtained by maximizing the correlation coefficient between linear combinations of the variables in each group. In the Pathome-Genome map, CCA is used to analyze and visualize the correlation structure between clinico-pathological factors and genes, so the user can understand the interrelation between two different kinds of data in the same two-dimensional coordinate system. As described for the 2D-3L map, the user can freely select the type of cancer and specify the clinical/pathological items to be analyzed. In this case, we examined which genes are related to certain clinical items (AFP, Maximum Diameter, Portal vein/Hepatic vein invasion, and TNM). Figure 6(a) clearly shows the relation between Portal vein/Hepatic vein invasion and cell cycle related genes (CCNA2/B1, MAPK13, BUB, and CDC2).
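To make the CCA step concrete, the sketch below computes canonical correlations and weight vectors for two groups of variables. It assumes a ridge-type regularization (a constant added to the diagonal of each within-group covariance matrix), which is one common form of regularized CCA; the exact regularization used for the Pathome-Genome map is not specified here. The function name, the `reg` parameter and the simulated data are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

def regularized_cca(X, Y, reg=0.1, n_components=2):
    """Return (correlations, x_weights, y_weights) for ridge-regularized CCA.

    X, Y: samples in rows; n_components must not exceed min(p, q).
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = Xc.shape[0]
    # Within-group covariances with the ridge term on the diagonal.
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(Xc.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Yc.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)
    # Whiten both groups with Cholesky factors: K = Lx^-1 Sxy Ly^-T.
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)
    K = np.linalg.solve(Lx, Sxy)
    K = np.linalg.solve(Ly, K.T).T
    # Singular values of K are the (regularized) canonical correlations.
    U, s, Vt = np.linalg.svd(K, full_matrices=False)
    x_weights = np.linalg.solve(Lx.T, U[:, :n_components])
    y_weights = np.linalg.solve(Ly.T, Vt.T[:, :n_components])
    return s[:n_components], x_weights, y_weights

# Simulated data (hypothetical): 200 patients, 3 clinical items, 2 genes.
# The first gene closely tracks the first clinical item; the second is noise.
rng = np.random.default_rng(0)
clinical = rng.normal(size=(200, 3))
genes = np.column_stack([
    clinical[:, 0] + 0.01 * rng.normal(size=200),
    rng.normal(size=200),
])
corrs, clin_w, gene_w = regularized_cca(clinical, genes, reg=1e-6)
```

A larger `reg` shrinks the correlations and stabilizes the weights, which matters when the number of genes is large relative to the number of patients; how variables are then placed in the shared 2D map is a separate display choice that this sketch does not cover.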
Future development
Our international version is available now and contains 140 patient cases of hepatocellular carcinoma. The number of cases is increasing, and cases of other diseases such as colon and oral cancer are being added. We also plan to prepare a retrieval page that displays, for an arbitrary gene, a table of its p-values for all criteria used in this database. We are preparing to accept clinical omics data from other public projects, so that iCOD can serve as a repository. We are also preparing to release “Microarray Analysis Workflow”, the web-based microarray analysis tool used to build our database.