iCOD: an integrated clinical omics database based on the systems-pathology view of disease
© Shimokawa et al. 2010
Published: 2 December 2010
A variety of information linking the genome to the pathological findings of disease will yield a wealth of clues for discovering new gene functions, the roles of genes and pathways, and approaches to future medicine. In addition to molecular information such as gene expression and genome copy number, detailed clinical information is essential for such systematic omics analysis.
To provide a basic platform for a future medicine based on the integration of molecular and clinico-pathological information on disease, we have developed an integrated clinical omics database (iCOD) that collects comprehensive disease information from patients. It includes not only molecular omics data, such as CGH (Comparative Genomic Hybridization) and gene expression profiles, but also comprehensive clinical information, such as clinical manifestations, medical images (CT, X-ray, ultrasound, etc.), laboratory tests, drug histories, pathological findings and even life-style/environmental information. The iCOD combines the molecular and clinico-pathological information of the patients to provide a holistic understanding of the disease. Furthermore, we developed several kinds of integrated view maps of disease in the iCOD: the three layer-linked disease map, the disease pathway map, and the pathome-genome map. These maps summarize the comprehensive patient data to show the interrelation between the molecular omics data and the clinico-pathological findings, and to support estimation of the disease pathways.
With these utilities, the iCOD aims to help establish the omics basis of disease and to promote a pathway-directed view of disease. The iCOD database is available online, containing 140 patient cases of hepatocellular carcinoma, with the raw data of each case available for download as a supplemental data set. The iCOD and supplemental data can be accessed at
Recent rapid advances in human genomics and the subsequent “post-genomic” comprehensive molecular information collectively called “omics” [1, 2], such as the transcriptome, proteome and metabolome, are opening up new possibilities for medicine. The application of molecular information to medicine has so far been called genomic medicine, in which “personalized medical care” is to be realized on the basis of inborn individual genomic differences or polymorphisms. Recently, however, post-genomic omics information, for example the gene expression profile (transcriptome) or cellular protein mass spectrometry (proteome) of diseased tissues, has been found to be much more directly related to the patient’s disease state: it is specific to the diseased site and changes as the disease progresses, so it can provide more exact predictive information about the ongoing disease process.
Furthermore, inspired by the rise of systems biology in the biological sciences, the need for a systems approach that understands a disease as an integrated whole has also been widely recognized in disease research. Except for rare monogenic diseases, most diseases can be considered an integrated system in which aberrations at the molecular, tissue/organ and individual levels are closely interrelated to produce the clinical phenotype. We call this perspective the “systems pathology” view of disease.
Against this background, it has become accepted that clarifying the interrelation between various omics information and the clinico-pathological findings of disease is crucial to developing a new kind of medicine, which we call “omics-based systems medicine”.
Cancer is now considered a systems dysfunction of cellular regulatory pathways, caused by the combined effects of environmental/life-style related factors and genetic aberrations such as somatic or germline mutations, SNPs, copy number alterations, epigenetic changes and so forth. For the diagnosis and therapy of such diseases, not only molecular information but also clinical, pathological and life-style information is indispensable. Without it, complex diseases such as cancer cannot be examined correctly [5–8]. Many cancer databases have been developed [9–11], each of which stores a variety of molecular information. However, more detailed clinical/environmental information in combination with the molecular information is needed to elucidate the whole process of complex diseases such as cancer. We first developed an integrated clinical omics database (iCOD), a basic platform in which comprehensive disease information from patients is collected. This database includes not only molecular omics data, such as CGH (Comparative Genomic Hybridization) and gene expression profiles, but also comprehensive clinical information, such as clinical manifestations, medical images (CT, X-ray, ultrasound, etc.), laboratory tests, drug histories, pathological findings and even life-style/environmental information, as well as a gene search menu related to this clinical information. Furthermore, we developed several kinds of integrated view maps of disease in the iCOD: the three layer-linked disease map, the disease pathway map, and the pathome-genome map. These maps summarize the comprehensive patient data to show the interrelation between the molecular omics data and the clinico-pathological findings, and to support estimation of the disease pathway. With these utilities, the iCOD aims to clarify the omics basis of disease and to promote a pathway-directed view of disease.
Recently, some pharmaceutical companies announced that they will release genomic data focused on lung and gastric cancers in order to rapidly increase knowledge of disease and disease processes. We can therefore expect research based on such clinical/environmental information to develop further with the iCOD.
The Cancer Genome Anatomy Project (CGAP), The Cancer Genome Atlas (TCGA), the Cancer Genome Project (CGP) (http://www.sanger.ac.uk/genetics/CGP/) and the Atlas of Genetics and Cytogenetics in Oncology and Haematology (AGCOH) are related to our work. However, in these resources clinical information is usually treated only partially, as tissue information.
We also collected gene expression data and array CGH data, together with detailed pathological information on the sample tissue obtained from each patient. DNA and RNA were extracted from the surgical specimens, after laser capture microdissection where required. All of the expression data in the database were obtained using the Affymetrix HG-U133 Plus 2.0 array as described previously. Array CGH analysis was performed as described elsewhere. We have so far collected comprehensive information on more than 500 cases of several kinds of cancer, such as hepatocellular carcinoma, colon cancer and oral cancer, for the domestic version of the database. The internationally released version is now available online; it contains 140 patient cases of hepatocellular carcinoma, which can be browsed in the “Case Archive” section of the database.
The iCOD is built on the PostgreSQL database system. This database is capable of storing and handling these clinical/omics data by using the two-dimensional, three-layered (2D-3L) map. The 2D-3L program runs on the Apache Tomcat web server.
The back-end data analysis programs were written as Java servlets and with the R statistical software, and are available upon request.
The iCOD provides users with a convenient search engine for querying keywords related to pathological/clinical findings and the patient IDs stored in the database. To search for individual patient cases that satisfy given conditions, enter the key terms of the query in the “Search” box in the “Case Archive” section.
“Clinical omics data analysis” provides various maps for observing the interrelation (correlation) between clinico-pathological phenotype and gene expression, using multivariate statistical analysis applied to the molecular and clinico-pathological information of the patients.
Click the “Clinical Omics Data Analysis” button on the top page. The user can then choose between two analysis methods: the two-dimensional, three-layered (2D-3L) map and the Pathome-Genome map, which is based on canonical correlation analysis (CCA).
In the pathological layer and the clinical layer, each point represents the position of a patient in the coordinate system defined by the corresponding two principal components. When a patient is selected in one layer, the 2D-3L map draws connecting lines between the points of that patient in the different layers, so that the user can intuitively grasp the relationship among the layers for an individual patient. The user can also select multiple patient points at once, by dragging the mouse over a region of the layer that contains all of the desired patients; the selected patients are shown in the data list.
This map has a parameter setting function for customized analysis. To use it, the user has only to change the parameter values behind the three buttons “Data selection”, “Parameter setting” and “Display setting” at the top of the 2D-3L map page. On the “Data selection” page, select the type of cancer you wish to analyze (only the “Hepatocellular carcinoma” dataset is currently available in the international version; the other cancer datasets are in preparation). On the “Parameter setting” page, the user can specify any group of clinical/pathological items to which principal component analysis is applied to determine the layer axes. On the “Display setting” page, shapes and colours can be adjusted according to various parameters, so that the characteristics of a specific group of patients can be brought out by changing these settings for comparison.
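The layer construction described above can be illustrated with a minimal sketch. It assumes that each layer is obtained by applying principal component analysis to a chosen group of items and keeping the first two components, and that a patient is "linked" across layers simply by looking up the same patient ID in each layer. The function names, item names and all values below are hypothetical toy data, not taken from the iCOD, and the power-iteration PCA is a stand-in for whatever routine the back end actually uses.

```python
import math

def center(rows):
    """Subtract each column's mean so that every item has mean zero."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    return [[r[j] - means[j] for j in range(m)] for r in rows]

def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, m = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(m)]
            for i in range(m)]

def leading_eigenpair(matrix, iterations=1000):
    """Leading eigenvector and eigenvalue of a symmetric matrix (power iteration)."""
    m = len(matrix)
    v = [float(i + 1) for i in range(m)]          # arbitrary non-zero start vector
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:                           # degenerate input; keep current vector
            break
        v = [x / norm for x in w]
    value = sum(v[i] * matrix[i][j] * v[j] for i in range(m) for j in range(m))
    return v, value

def pca_2d(rows):
    """Project each row onto the first two principal components."""
    centered = center(rows)
    cov = covariance(centered)
    m = len(cov)
    v1, l1 = leading_eigenpair(cov)
    # Deflation: remove the first component so the next iteration finds the second.
    deflated = [[cov[i][j] - l1 * v1[i] * v1[j] for j in range(m)] for i in range(m)]
    v2, _ = leading_eigenpair(deflated)
    return [(sum(a * b for a, b in zip(r, v1)), sum(a * b for a, b in zip(r, v2)))
            for r in centered]

def build_layer(records, items):
    """records: {patient_id: {item: value}} -> {patient_id: (pc1, pc2)}."""
    ids = sorted(records)
    rows = [[float(records[pid][item]) for item in items] for pid in ids]
    return dict(zip(ids, pca_2d(rows)))

def link_patient(patient_id, layers):
    """Coordinates of one patient in every layer (the end points of the connecting lines)."""
    return {name: layer[patient_id] for name, layer in layers.items()}

# Hypothetical toy records; item names and values are invented for illustration.
clinical = {
    "P01": {"age": 62, "afp": 10.0, "albumin": 4.2},
    "P02": {"age": 71, "afp": 40.0, "albumin": 3.9},
    "P03": {"age": 58, "afp": 120.0, "albumin": 3.5},
    "P04": {"age": 66, "afp": 8.0, "albumin": 4.4},
    "P05": {"age": 75, "afp": 300.0, "albumin": 3.1},
}
pathological = {
    "P01": {"tumor_size_cm": 2.0, "grade": 1, "vascular_invasion": 0},
    "P02": {"tumor_size_cm": 3.5, "grade": 2, "vascular_invasion": 0},
    "P03": {"tumor_size_cm": 5.0, "grade": 2, "vascular_invasion": 1},
    "P04": {"tumor_size_cm": 1.5, "grade": 1, "vascular_invasion": 0},
    "P05": {"tumor_size_cm": 7.0, "grade": 3, "vascular_invasion": 1},
}
layers = {
    "clinical": build_layer(clinical, ["age", "afp", "albumin"]),
    "pathological": build_layer(pathological,
                                ["tumor_size_cm", "grade", "vascular_invasion"]),
}
linked = link_patient("P03", layers)
```

Changing the list of items passed to `build_layer` plays the role of the “Parameter setting” page: a different group of items yields different layer axes. In practice the items would probably be standardized before PCA so that a single large-valued item does not dominate the first component; that step is omitted here for brevity.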
Figure 4 shows a case study of the 2D-3L map. First, from the molecular layer we can obtain the list of overexpressed or suppressed genes corresponding to the criterion "Portal vein/Hepatic vein invasion". In this case, we found the MCM6 gene, which encodes a DNA replication licensing factor. We can also confirm the relation between patient recurrence and tumor size for the same criterion. See the legend of Figure 4 for details.
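To make the idea of a gene list "corresponding to a criterion" concrete, the following sketch ranks genes by how strongly their expression differs between patients with and without a binary criterion. The text does not say which statistical test the iCOD uses, so the Welch t statistic and the permutation p-value here are assumptions chosen for illustration; the gene names and expression values are synthetic.

```python
import math
import random
import statistics

def welch_t(a, b):
    """Welch's t statistic for two independent samples (assumes non-constant groups)."""
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(
        var_a / len(a) + var_b / len(b))

def permutation_p(a, b, n_perm=2000, seed=0):
    """Two-sided permutation p-value based on the absolute Welch t statistic."""
    rng = random.Random(seed)
    observed = abs(welch_t(a, b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        t = welch_t(pooled[:len(a)], pooled[len(a):])
        if abs(t) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def rank_genes(expression, criterion, n_perm=2000):
    """expression: {gene: {patient: value}}; criterion: {patient: bool}."""
    results = []
    for gene, values in expression.items():
        pos = [values[p] for p in sorted(values) if criterion[p]]
        neg = [values[p] for p in sorted(values) if not criterion[p]]
        t = welch_t(pos, neg)
        results.append({
            "gene": gene,
            "t": t,
            "direction": "overexpressed" if t > 0 else "suppressed",
            "p": permutation_p(pos, neg, n_perm),
        })
    # Smallest p-value first; ties broken by the larger absolute t statistic.
    return sorted(results, key=lambda r: (r["p"], -abs(r["t"])))

# Synthetic example: patients P01-P05 meet the criterion, P06-P10 do not.
patients = ["P%02d" % i for i in range(1, 11)]
criterion = {p: i < 5 for i, p in enumerate(patients)}
raw = {
    "GENE_A": [8.1, 8.4, 8.9, 9.2, 8.7, 6.0, 6.3, 5.8, 6.5, 6.1],
    "GENE_B": [5.0, 5.4, 4.9, 5.2, 5.1, 5.1, 5.3, 5.0, 4.8, 5.5],
    "GENE_C": [3.1, 2.9, 3.4, 3.0, 3.3, 4.9, 5.2, 4.7, 5.0, 5.4],
}
expression = {g: dict(zip(patients, v)) for g, v in raw.items()}
ranking = rank_genes(expression, criterion)
```

With real array data, thousands of probes would be tested at once, so some correction for multiple testing would normally be applied to the resulting p-values before reading off a gene list.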
Our international version is now available, containing 140 patient cases of hepatocellular carcinoma. The number of cases is increasing and will include other diseases such as colon and oral cancer. We also plan to prepare a retrieval page that displays, for an arbitrary gene, a table of its p-values for all of the criteria used in this database. We are preparing to accept clinical omics data from other public projects, so that the iCOD can serve as a repository. We are also preparing to release “Microarray Analysis Workflow”, our web-based microarray analysis tool that was used to build the database.
As described above, many cancer-related databases that store a variety of molecular information have been developed. However, more detailed clinical/environmental information in combination with the molecular information is needed to elucidate the whole process of complex diseases such as cancer. From this point of view, the iCOD is the first database to provide comprehensive clinical, pathological and life-style information in addition to molecular biological information, together with their estimated interrelations. The iCOD is useful both for clinical researchers who want to learn about the molecular basis of disease for use in diagnosis, therapy and prognosis, and for molecular biologists who want to understand the function and phenotype of molecular pathways and their interrelations through cases in which those pathways are dysfunctional. Our subproject aims to develop a model disease database for the “omics” era, with a standardized database organization that covers multi-hierarchical information on diseases, from the molecular to the clinical level.
We have prepared a download page with the raw gene expression data for users who want to analyze them with their own tools. The supplemental data can be found at http://omics.tmd.ac.jp/icod_pub_eng/download. The raw data files consist of raw gene expression data in the Affymetrix .CEL binary format, and the detailed clinical information of each case is stored in CSV text format.
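As a starting point for users working with the downloaded clinical files, the sketch below reads a CSV table into a dictionary keyed by case. The actual column layout of the iCOD clinical CSV files is not described here, so the column names (`case_id`, `age`, `tumor_size_cm`, `vascular_invasion`) and the two sample rows are hypothetical placeholders that would need to be adapted to the real headers.

```python
import csv
import io

# Hypothetical stand-in for a downloaded clinical CSV file.
SAMPLE_CSV = """case_id,age,tumor_size_cm,vascular_invasion
HCC001,62,2.0,no
HCC002,71,3.5,yes
"""

def load_clinical(handle, id_field="case_id",
                  numeric_fields=("age", "tumor_size_cm")):
    """Read clinical rows into {case_id: {item: value}}.

    Fields listed in numeric_fields are converted to float; empty cells
    become None. All other fields are kept as text.
    """
    cases = {}
    for row in csv.DictReader(handle):
        for field in numeric_fields:
            row[field] = float(row[field]) if row[field] else None
        cases[row.pop(id_field)] = row
    return cases

cases = load_clinical(io.StringIO(SAMPLE_CSV))
```

For a file on disk, the same function can be called with `open(path, newline="")` in place of the `io.StringIO` object. The .CEL files are a binary microarray format and need a dedicated reader, which is outside the scope of this sketch.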
KS drafted the manuscript and is responsible for the implementation and organization of the web design. KM calculated all p-values and performed the canonical correlation analysis for hepatocellular carcinoma. SS and AH checked the clinical information and translated it into English. HM organized the molecular biology experiments. HT and HM provided advice and supervised the research group. All authors read and approved the final manuscript.
Development of the iCOD has been conducted as a government-commissioned national project under the direction of the Information Center for Medical Sciences at Tokyo Medical and Dental University, in collaboration with the National Cancer Center, which provided additional cases, and with the former RIKEN Genome Science Center and Advanced Industrial Science and Technology, which collaborated on the development of the multi-hierarchical omics database scheme and on the mutual sharing of databases [12]. The iCOD, which integrates omics information and other comprehensive clinico-pathological information, is also one of the subprojects of the “Integrated Database Project” funded by MEXT (the Ministry of Education, Culture, Sports, Science and Technology of Japan), whose purpose is to integrate the life science databases in Japan under the direction of DBCLS (Database Center for Life Science). We thank the Hitachi Software company for its support in developing this database and Tokyo Medical and Dental University Hospital for clinical collaboration. We thank Dr. Dana Ichinotsubo for checking this manuscript. This work was sponsored by MEXT, under the direction of the Information Center for Medical Sciences at Tokyo Medical and Dental University.
This article has been published as part of BMC Genomics Volume 11 Supplement 4, 2010: Ninth International Conference on Bioinformatics (InCoB2010): Computational Biology. The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2164/11?issue=S4
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.