- Open Access
CellMiner: a relational database and query tool for the NCI-60 cancer cell lines
© Shankavaram et al; licensee BioMed Central Ltd. 2009
- Received: 16 January 2009
- Accepted: 23 June 2009
- Published: 23 June 2009
Advances in the high-throughput omic technologies have made it possible to profile cells in a large number of ways at the DNA, RNA, protein, chromosomal, functional, and pharmacological levels. A persistent problem is that some classes of molecular data are labeled with gene identifiers, others with transcript or protein identifiers, and still others with chromosomal locations. What has lagged behind is the ability to integrate the resulting data to uncover complex relationships and patterns. Those issues are reflected in full form by molecular profile data on the panel of 60 diverse human cancer cell lines (the NCI-60) used since 1990 by the U.S. National Cancer Institute to screen compounds for anticancer activity. To our knowledge, CellMiner is the first online database resource for integration of the diverse molecular types of NCI-60 and related meta data.
CellMiner enables scientists to perform advanced querying of molecular information on NCI-60 (and additional types) through a single web interface. CellMiner is a freely available tool that organizes and stores raw and normalized data that represent multiple types of molecular characterizations at the DNA, RNA, protein, and pharmacological levels. Annotations for each project, along with associated metadata on the samples and datasets, are stored in a MySQL database and linked to the molecular profile data. Data can be queried and downloaded along with comprehensive information on experimental and analytic methods for each data set. A Data Intersection tool allows selection of a list of genes (proteins) in common between two or more data sets and outputs the data for those genes (proteins) in the respective sets. In addition to its role as an integrative resource for the NCI-60, the CellMiner package also serves as a shell for incorporation of molecular profile data on other cell or tissue sample types.
CellMiner is a relational database tool for storing, querying, integrating, and downloading molecular profile data on the NCI-60 and other cancer cell types. More broadly, it provides a template to use in providing such functionality for other molecular profile data generated by academic institutions, public projects, or the private sector. CellMiner is available online at http://discover.nci.nih.gov/cellminer/.
- Query Tool
- Browser Window
- Developmental Therapeutics Program
- Simplified Molecular Input Line Entry Specification
Microarrays and other new high-throughput technologies of the past decade have made it possible to generate large molecular profile databases on clinical cancers and cultured cancer cells. Novel molecular subtypes of cancer (differing, for example, in mechanism of transformation, propensity to metastasize, and sensitivity to particular therapies) have been identified from such profiles. The most value, however, can be realized by integrating the various types of data. A number of concrete, biomedically interesting examples have supported the 'integromic hypothesis': i.e., that multiple types of molecular profiles on the same set of biological samples can be synergistic when combined [2–6]. To aid in the assembly, organization, integration, and querying of multiple molecular profile data sets on the same samples, we have developed CellMiner, a freely available, user-friendly, web-based resource. CellMiner currently focuses on two cancer cell line sets, the NCI-60 and the DU145/RC0.1 pair.
The NCI-60 is a panel of 60 human cancer cell lines used by the Developmental Therapeutics Program (DTP) of the U.S. National Cancer Institute to screen > 100,000 compounds plus natural products since 1990 [7–10]. The NCI-60 panel includes cancers of colorectal, renal, ovarian, prostate, lung, breast, and central nervous system origin, as well as leukemias and melanomas. We and our many collaborators around the world have profiled the NCI-60 more comprehensively at the DNA, RNA, protein, mutation, functional, and pharmacological levels than any other set of cells in existence. The resulting data have been the subject of a large number of integromic analyses [5, 6, 10–12]. The limitations of cell lines as surrogates for clinical tumors are well known, but an advantage of the NCI-60 panel is the wealth of pharmacological data based on exposure of the cells to large numbers of drugs and other chemical compounds. Other advantages are that the cells can be obtained in unlimited amounts, that they are homogeneous in lineage, and that they can be manipulated easily (e.g., by gene transfer or RNA interference technologies). The information from them complements what is available from animal and clinical studies. The extensive profiling of the NCI-60 has been viewed as a forerunner of The Cancer Genome Atlas project, which is confined to a smaller set of characteristics (all of them at the nucleic acid level) but in the more difficult context of clinical cancers.
The NCI-60 data have been widely used in cancer research and bioinformatics, but the full utility of the multiple data sets is evident only when one integrates them to formulate complex 'biosignatures' or to understand the behaviour of pathways and systems within the cell. CellMiner provides bioinformatic 'glue' that binds the various data sets together and makes them fluently interoperable. It complements database developments by the NCI DTP, but with a particular emphasis on data queries and integration of different molecular data types. It incorporates both raw and processed data, as well as metadata on cells, experiments, and platforms. It therefore provides the casual user with the resources needed to analyze relationships among cell and data types without going through the often-painful task of pre-processing the data. For example, data pre-processed using the MAS5, RMA, and GCRMA algorithms are provided for the Affymetrix U95 and U133 chip-sets. The user can input a list of genes, chromosome locations, whole-genome locations, or platform-specific identifiers to query or download the relevant data or identify the intersection of multiple data sets. For those who want to dig deeper or check the quality of data for particular genes, cells, or tested compounds, CellMiner provides the raw data (e.g., Affymetrix CEL files). It also provides connections between the experimental data and key attributes of the genes, including all associated Genbank accession numbers, Refseq accession numbers, chromosome numbers, and chromosomal locations. Similarly, the drug database includes NSC (National Service Center) numbers, CIS (Chemical Information System) numbers, and chemical structure information whenever possible. CellMiner currently incorporates 15 data sets, and more are being added on a continuing basis.
Local data repositories
Table 1. Description of the datasets included in the current version of CellMiner. More will be added on a continuing basis.
DNA copy number changes from bacterial artificial chromosome array (Bussey et al., 2006)
DNA sequencing data on mutations on 24 human cancer genes (Ikediobi et al., 2006)
Methylation of E-cadherin promoter: PCR amplification and sequencing of sodium bisulfite modified DNA (Reinhold et al., 2007)
cDNA clone microarray with 9,607 features
Affymetrix 6,800-feature microarray (Shankavaram et al., 2007)
Affymetrix 64,000-feature microarray (Shankavaram et al., 2007)
Affymetrix 44,000-feature microarray (Shankavaram et al., 2007)
RT-PCR data on 47 ABC transporters (Szakacs et al., 2004)
632-feature 70-mer oligo microarray (Huang et al., 2004)
Microarray with 612 ESTs plus another set of 616 ESTs chosen on the basis of their known roles in cancer lymphoid biology (Amundson et al., 2008)
627 human microRNA probes, including 321 mature microRNAs, as well as probes for most of their precursors (Blower et al., 2007)
Reverse phase antibody lysate array with detection using 156 monoclonal antibodies
The "mechanism of action" set with 6 compound classes. The list of compounds was assembled for an earlier study as training set for neural network analysis of drug mechanism of action (Weinstein et al., 1992)
Combination of A118 and A1400 selected from > 70,000 tested, publicly available compounds by applying a series of filters (see text for description)
Selected compounds tested in the NCI DTP's sulforhodamine B assay two or more times and for which structure records are available (Blower et al., 2002)
Job execution and display of results
Table 2. Summary of search functions and criteria available in the CellMiner resource.
NCI-60 (59 cell lines)
Dataset selection (molecular type)
User select cell line criteria
All/tissue type selection
All/tissue type selection
All/tissue type selection
User select identifier type
gene or platform specific id, chromosome or genomic location
NSC, Chemical name, Molecular formula
User select identifier list
File attachment, list, single value
File attachment, list, single value
Output data fields
Information on Patient, cell line, Experimental details
Quantification of image files
HUGO, Entrez Gene id, Gene Symbol, Chromosome, Cytoband, mRNA-Refseq, Protein-Refseq, Transcription start and Transcription-end
HUGO, Entrez Gene id, Gene Symbol, Chromosome, Cytoband, mRNA-Refseq, Protein-Refseq, Transcription start and Transcription-end
Chemical name, SMILES, molecular formula, molecular weight, mechanism of action
HUGO, Zygosity, CDS mutation, AA mutation, mutation characterization
HTML table of cell line information
Text of log2 intensity values
HTML, MS-Excel, text file of log2 intensity values
Text file of log2 intensity values for each of the matching datasets
HTML, MS-Excel, Text
The setup of the query is defined according to the parameters selected by the user (Table 2). Example scenarios for each function are described below.
CellMiner provides both raw and normalized data to download. The raw data are stored in a repository as compressed files of the appropriate type. For example, Affymetrix arrays are stored as probe-level CEL files, which can be downloaded as zip compressed files onto local computers.
Normalized data sets were obtained by applying appropriate statistical methods to the raw data, using pre-processing procedures described in CellMiner in the data set metadata section. The exact form of the data depends on the type. For example, transcript expression levels were log2-transformed to provide a convenient basis for queries and for integration with other data types. The choice of log-transformation was dictated by the distributional properties and error structures of most hybridization-based expression data sets. The main sample table, which is linked to the gene annotation table, holds the unique identifier for each data set in the repository. Results are obtained as downloadable text files. The results page provides the experiment name, gene symbol for each probe identifier, and log2 expression data for all of the cell lines or cell lines selected by the user.
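The link between the sample data and the gene annotation table can be illustrated with a minimal sketch. It is not CellMiner's actual schema: the table names, column names, and helper functions are hypothetical, the intensities in the usage example are made up, and Python's built-in SQLite module stands in for the MySQL server so that the example is self-contained.

```python
import math
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions,
# and SQLite stands in for the MySQL database used by CellMiner.
SCHEMA = """
CREATE TABLE gene_annotation (
    probe_id    TEXT PRIMARY KEY,
    gene_symbol TEXT NOT NULL
);
CREATE TABLE expression (
    dataset    TEXT NOT NULL,
    probe_id   TEXT NOT NULL REFERENCES gene_annotation(probe_id),
    cell_line  TEXT NOT NULL,
    log2_value REAL NOT NULL
);
"""


def build_demo_db(raw_intensities):
    """Load raw intensities into an in-memory database as log2 values.

    raw_intensities: iterable of
        (dataset, probe_id, gene_symbol, cell_line, intensity) tuples.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    for dataset, probe_id, symbol, cell_line, intensity in raw_intensities:
        # Each probe is annotated once; repeated inserts are ignored.
        conn.execute(
            "INSERT OR IGNORE INTO gene_annotation VALUES (?, ?)",
            (probe_id, symbol),
        )
        # Store the log2-transformed intensity, as described in the text.
        conn.execute(
            "INSERT INTO expression VALUES (?, ?, ?, ?)",
            (dataset, probe_id, cell_line, math.log2(intensity)),
        )
    conn.commit()
    return conn


def query_gene(conn, gene_symbol):
    """Return (dataset, probe_id, cell_line, log2_value) rows for one gene."""
    cursor = conn.execute(
        """
        SELECT e.dataset, e.probe_id, e.cell_line, e.log2_value
        FROM expression AS e
        JOIN gene_annotation AS g ON g.probe_id = e.probe_id
        WHERE g.gene_symbol = ?
        ORDER BY e.dataset, e.cell_line
        """,
        (gene_symbol,),
    )
    return cursor.fetchall()
```

With toy input such as `build_demo_db([("U133", "p1", "ABCB1", "MCF7", 8.0)])`, a call to `query_gene(conn, "ABCB1")` returns one row per cell line carrying the log2 value for each probe annotated to that gene symbol.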
The user can access detailed information on the project that produced a data set. Included are entries on the microarray (or other technology) platform and collaborators, as well as a link to the primary publication(s). A file containing a description of the data set and the normalization procedure in publication-level detail is also included for each data set download.
Querying data sets
Retrieve entire experiments as the result of complex queries (as shown in Figure 3).
Retrieve particular subsets of data as the result of more complex queries (e.g., a collection of data for a gene of interest across multiple platforms, as illustrated in Figure 4).
Retrieve data in HTML, tab-delimited, or Microsoft Excel format for storage in a local database or for analyses on the user's computer.
CellMiner data search is performed in two steps. First, the user selects input criteria and second, output options from an extensive list of possibilities provided (Figure 3). Download requests are processed in the background, and when they are complete, a link to the requested data files is provided in a new browser window.
We and our collaborators have used the cell line data in a number of biological and pharmacological contexts. To cite recent examples, we have used the data (i) to identify drugs ("MDR1-inverse") that, paradoxically, are more potent in cells that express the multi-drug resistance gene MDR1, (ii) to identify possible molecular target relationships for the drug Aminoflavone, and (iii) to identify asparagine synthetase expression as a potential biomarker for use of the enzyme-drug L-asparaginase for treatment of ovarian or other solid tumors [12, 22]. Earlier, global analysis of the pharmacological data provided information critical to the go/no-go decision for clinical development of oxaliplatin, now a standard agent for treatment of primary and recurrent colorectal cancer. To maximize the utility and value of the data by providing a framework for data integration, it is critical to identify subsets of genes for which information is available at the DNA, RNA and protein level. The intersection resource of CellMiner finds the genes (proteins) that are common to two or more datasets and outputs the data for those genes (proteins) in the respective sets.
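The intersection step described above can be sketched in a few lines. This is only an illustration of the idea under simple assumptions: each data set is represented as a dictionary keyed by gene symbol, and the function name and data layout are hypothetical rather than taken from CellMiner.

```python
def intersect_datasets(datasets):
    """Find genes common to all data sets and return their data per set.

    datasets: dict mapping a data set name to {gene_symbol: values}.
    Returns (sorted list of common genes, {data set name: {gene: values}}).
    """
    if len(datasets) < 2:
        raise ValueError("need at least two data sets to intersect")
    # Genes present in every data set.
    common = set.intersection(*(set(genes) for genes in datasets.values()))
    ordered = sorted(common)
    # Restrict each data set to the common genes, in a stable order.
    subset = {
        name: {gene: genes[gene] for gene in ordered}
        for name, genes in datasets.items()
    }
    return ordered, subset
```

Matching on gene symbols alone is a simplification; in practice the identifiers would first need to be mapped to a shared namespace, since different platforms label their features differently.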
Querying drug data
All public drug data from the NCI-60 screen are available at the DTP website http://dtp.nci.nih.gov/. In CellMiner, we currently include three smaller, curated sets presented as the negative log10 of the 50% growth inhibitory concentration (GI50). Those datasets have been used frequently in publications by the Genomics & Bioinformatics Group, as well as by other laboratories: (i) A118: the so-called "mechanism of action" compounds. This data set was assembled for an earlier study in which mechanisms of action were predicted using neural networks; (ii) A1429: a 1429-compound combination of the A118 set and additional compounds selected from the DTP's overall database of publicly available compounds by applying a series of quality-control filters. Selection was based on the number of times a compound had been tested, the number of missing values, and the number of cell lines for which GI50 values fell within the range of concentrations tested; (iii) A4444: chemically defined, tested compounds with known 2D structures. The curated data sets were included in CellMiner to associate patterns of potency in the screen with molecular structures of the compounds and molecular characteristics of the cells.
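The three selection criteria named for the A1429 set can be written as a simple predicate. The sketch below is illustrative only: the record layout and function name are hypothetical, and the default thresholds are placeholders, because the cut-offs actually used are not stated here.

```python
def passes_quality_filters(record, min_tests=2, max_missing=5, min_in_range=10):
    """Apply the three kinds of filter described in the text to one compound.

    record: dict with
        'n_tests'  - number of times the compound was tested (int)
        'gi50'     - per-cell-line values, with None marking a missing value
        'in_range' - per-cell-line booleans, True when the GI50 fell within
                     the range of concentrations tested
    The default thresholds are placeholders, not the published cut-offs.
    """
    missing = sum(1 for value in record["gi50"] if value is None)
    in_range = sum(
        1
        for value, ok in zip(record["gi50"], record["in_range"])
        if value is not None and ok
    )
    return (
        record["n_tests"] >= min_tests
        and missing <= max_missing
        and in_range >= min_in_range
    )
```

A curated subset would then be the list of records for which the predicate returns `True`, for example `[r for r in records if passes_quality_filters(r)]`.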
The query page for drug data is similar to that for a gene query in terms of input and output. For a drug data query, the user first selects a compound data set and a tissue type (or all cells), then submits a list of compounds in terms of any of the following identifiers: NSC number, chemical name, molecular formula, or a molecular weight range (specified as low: high). The following options can be specified for inclusion in the output: chemical name, Simplified Molecular Input Line Entry Specification (SMILES) representation, molecular formula, molecular weight, and/or mechanism of action of the compound if available. The output can be in any of the available format types (i.e., HTML, text, or Excel). Download requests are processed in the background. When the download is complete, a link to the requested data files is provided in a new browser window.
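A drug query of this kind amounts to filtering compound records by one or more identifiers. The following is a minimal sketch under stated assumptions: the record fields, function names, and the example compounds in the usage note are hypothetical, and only the `low:high` form of the molecular weight range is taken from the text.

```python
def parse_mw_range(text):
    """Parse a molecular weight range given as 'low:high' (spaces allowed)."""
    low_text, high_text = text.split(":")
    low, high = float(low_text.strip()), float(high_text.strip())
    if low > high:
        raise ValueError("lower bound exceeds upper bound")
    return low, high


def query_compounds(compounds, nsc=None, name=None, formula=None, mw_range=None):
    """Return the compounds matching every criterion that was supplied.

    compounds: list of dicts with hypothetical keys
        'nsc', 'name', 'formula', 'mol_weight'.
    nsc: optional collection of NSC numbers to keep.
    name: optional chemical name (case-insensitive exact match).
    formula: optional molecular formula (exact match).
    mw_range: optional 'low:high' string, bounds inclusive.
    """
    bounds = parse_mw_range(mw_range) if mw_range is not None else None
    wanted_nsc = set(nsc) if nsc is not None else None
    hits = []
    for compound in compounds:
        if wanted_nsc is not None and compound["nsc"] not in wanted_nsc:
            continue
        if name is not None and compound["name"].lower() != name.lower():
            continue
        if formula is not None and compound["formula"] != formula:
            continue
        if bounds is not None and not bounds[0] <= compound["mol_weight"] <= bounds[1]:
            continue
        hits.append(compound)
    return hits
```

For instance, given a list of fictional records, `query_compounds(records, mw_range="300: 400")` keeps only those whose `mol_weight` lies between 300 and 400 inclusive.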
Querying mutation data
Because mutation data differ in format from expression data, they are queried in CellMiner from a different menu. The mutation data on almost all exons and exon-intron splice junctions of 24 cancer-related genes were obtained by re-sequencing, in collaboration with researchers at the Wellcome Trust Sanger Institute http://www.sanger.ac.uk/. For those studies, PCR primers were designed to amplify the exons and flanking intronic sequences of 24 cancer genes.
A variety of database tools are currently available to facilitate the integration of multiple datasets on cell lines. Oncomine and GeneX are two such user-friendly tools for storage and analysis of datasets collected from the literature or submitted by individual users. However, those tools do not support open-source architecture and are limited to gene expression data.
Cell line collections are made available in resources like the American Type Culture Collection (ATCC) http://www.atcc.org, European Collection of Cell Cultures (ECACC) http://www.hpacultures.org.uk/collections/ecacc.jsp and European Searchable Tumour Line Database (ESTDAB). The ATCC and ECACC databases are large collections of cell lines and the metadata associated with them. ESTDAB is an open-source, online collection of immunologically characterized tumour cells in a database that holds deep information on immunological markers but is limited largely to melanoma cell lines. Those resources are very different from CellMiner in that they lack the molecular profiling data on the cell lines. CellMiner provides a data integration resource that includes multiple data types, platforms and cell lines from nine diverse cancer types.
CellMiner is an evolving application that provides a one-stop resource for molecular and pharmacological profile data on the widely studied NCI-60 cancer cell panel. Also included currently (in part to provide a template for inclusion of data on cell types beyond the NCI-60) are prostate line DU145 and its topoisomerase 1-resistant derivative RC0.1. Apart from providing a wide selection of queries for integrating expression data with gene annotations, CellMiner offers metadata on the cell lines, the profiling platforms, and the profile data sets. CellMiner is thus a practical resource that provides a data repository, query capability, and assistance in data integration. It is tuned to systems-oriented, integromic analyses, as well as to querying of particular molecules or cell types. A frequent application of the latter type arises from the scenario in which the user wants to find a cell type (or cell types) with particular molecular features (e.g., p53 mutation, PTEN wild-type, MDR1-expressing) as the basis for classical hypothesis-driven experiments (e.g., siRNA knock-down, oncogene transfection, pharmacological sensitivity). To enhance the utility of CellMiner, we are continuing to add new features and databases beyond those currently included.
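The scenario of picking cell lines by molecular features can be sketched as a filter over per-cell-line profiles. The feature keys, values, and function name below are hypothetical, and the sketch assumes the features have already been extracted from the mutation and expression data sets.

```python
def select_cell_lines(profiles, criteria):
    """Return the names of cell lines whose profile meets every criterion.

    profiles: dict mapping cell line name -> {feature: value}.
    criteria: dict mapping feature -> required value.
    A cell line lacking a requested feature is treated as not matching.
    """
    matches = []
    for name, features in profiles.items():
        if all(
            feature in features and features[feature] == required
            for feature, required in criteria.items()
        ):
            matches.append(name)
    return sorted(matches)
```

A query for p53-mutant, PTEN wild-type, MDR1-expressing cells would then look like `select_cell_lines(profiles, {"TP53": "mutant", "PTEN": "wild-type", "MDR1_expressed": True})`, with the returned names serving as candidates for follow-up experiments.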
Project name: CellMiner, a repository for raw and preprocessed molecular data and a query tool for the NCI-60 cancer cell panel (and other cell types).
Project home page: http://discover.nci.nih.gov/cellminer/
Other server-side requirements: MySQL, Apache HTTP server
Restrictions to use: none
We are grateful to the many DTP staff members who make such studies possible. We particularly wish to remember the late Kenneth D. Paull for his pioneering work on analysis of NCI-60 data. We thank Susan Holbeck, Daniel Zaharevitz, Dominic Scudiero, Anne Monks, and Robert Shoemaker, as well as other DTP staff and contractors for their work on the screen and its data. We also thank the many collaborators who have worked with us to generate the repertoire of molecular profile databases currently in CellMiner. Principal collaborators are listed at http://discover.nci.nih.gov/cellminer/datasets.do. In anticipation, we thank the many other collaborators who have contributed, or will contribute, to data that will be added to CellMiner in the future.
- Chung CH, Bernard PS, Perou CM: Molecular portraits and the family tree of cancer. Nat Genet. 2002, 32 (Suppl): 533-540. 10.1038/ng1038.
- Pommier Y, Weinstein JN, Aladjem MI, Kohn KW: Chk2 molecular interaction map and rationale for Chk2 inhibitors. Clin Cancer Res. 2006, 12 (9): 2657-2661. 10.1158/1078-0432.CCR-06-0743.
- Weinstein JN: Integromic analysis of the NCI-60 cancer cell lines. Breast disease. 2004, 19: 11-22.
- Weinstein JN, Myers TG, O'Connor PM, Friend SH, Fornace AJ, Kohn KW, Fojo T, Bates SE, Rubinstein LV, Anderson NL, et al: An information-intensive approach to the molecular pharmacology of cancer. Science. 1997, 275 (5298): 343-349. 10.1126/science.275.5298.343.
- Bussey KJ, Chin K, Lababidi S, Reimers M, Reinhold WC, Kuo WL, Gwadry F, Ajay, Kouros-Mehr H, Fridlyand J, et al: Integrating data on DNA copy number with gene expression levels and drug sensitivities in the NCI-60 cell line panel. Mol Cancer Ther. 2006, 5 (4): 853-867. 10.1158/1535-7163.MCT-05-0155.
- Shankavaram UT, Reinhold WC, Nishizuka S, Major S, Morita D, Chary KK, Reimers MA, Scherf U, Kahn A, Dolginow D, et al: Transcript and protein expression profiles of the NCI-60 cancer cell panel: an integromic microarray study. Mol Cancer Ther. 2007, 6 (3): 820-832. 10.1158/1535-7163.MCT-06-0650.
- Boyd DA, Cvitkovitch DG, Hamilton IR: Sequence, expression, and function of the gene for the nonphosphorylating, NADP-dependent glyceraldehyde-3-phosphate dehydrogenase of Streptococcus mutans. Journal of bacteriology. 1995, 177 (10): 2622-2627.
- Holbeck SL: Update on NCI in vitro drug screen utilities. Eur J Cancer. 2004, 40 (6): 785-793. 10.1016/j.ejca.2003.11.022.
- Shoemaker RH: The NCI60 human tumour cell line anticancer drug screen. Nature reviews. 2006, 6 (10): 813-823. 10.1038/nrc1951.
- Weinstein JN: Spotlight on molecular profiling: "Integromic" analysis of the NCI-60 cancer cell lines. Mol Cancer Ther. 2006, 5 (11): 2601-2605. 10.1158/1535-7163.MCT-06-0640.
- Weinstein JN, Pommier Y: Transcriptomic analysis of the NCI-60 cancer cell lines. Comptes rendus biologies. 2003, 326 (10–11): 909-920. 10.1016/j.crvi.2003.08.005.
- Scherf U, Ross DT, Waltham M, Smith LH, Lee JK, Tanabe L, Kohn KW, Reinhold WC, Myers TG, Andrews DT, et al: A gene expression database for the molecular pharmacology of cancer. Nat Genet. 2000, 24 (3): 236-244. 10.1038/73439.
- Szakacs G, Annereau JP, Lababidi S, Shankavaram U, Arciello A, Bussey KJ, Reinhold W, Guo Y, Kruh GD, Reimers M, et al: Predicting drug sensitivity and resistance: profiling ABC transporter genes in cancer cells. Cancer Cell. 2004, 6 (2): 129-137. 10.1016/j.ccr.2004.06.026.
- Nishizuka S, Charboneau L, Young L, Major S, Reinhold WC, Waltham M, Kouros-Mehr H, Bussey KJ, Lee JK, Espina V, et al: Proteomic profiling of the NCI-60 cancer cell lines using new high-density reverse-phase lysate microarrays. Proc Natl Acad Sci USA. 2003, 100 (24): 14229-14234. 10.1073/pnas.2331323100.
- Ikediobi ON, Davies H, Bignell G, Edkins S, Stevens C, O'Meara S, Santarius T, Avis T, Barthorpe S, Brackenbury L, et al: Mutation analysis of 24 known cancer genes in the NCI-60 cell line set. Mol Cancer Ther. 2006, 5 (11): 2606-2612. 10.1158/1535-7163.MCT-06-0433.
- Reinhold WC, Reimers MA, Maunakea AK, Kim S, Lababidi S, Scherf U, Shankavaram UT, Ziegler MS, Stewart C, Kouros-Mehr H, et al: Detailed DNA methylation profiles of the E-cadherin promoter in the NCI-60 cancer cells. Mol Cancer Ther. 2007, 6 (2): 391-403. 10.1158/1535-7163.MCT-06-0609.
- Blower PE, Yang C, Fligner MA, Verducci JS, Yu L, Richman S, Weinstein JN: Pharmacogenomic analysis: correlating molecular substructure classes with microarray gene expression data. The pharmacogenomics journal. 2002, 2 (4): 259-271. 10.1038/sj.tpj.6500116.
- Weinstein JN, Kohn KW, Grever MR, Viswanadhan VN, Rubinstein LV, Monks AP, Scudiero DA, Welch L, Koutsoukos AD, Chiausa AJ, et al: Neural computing in cancer drug development: predicting mechanism of action. Science. 1992, 258 (5081): 447-451. 10.1126/science.1411538.
- Roschke AV, Tonon G, Gehlhaus KS, McTyre N, Bussey KJ, Lababidi S, Scudiero DA, Weinstein JN, Kirsch IR: Karyotypic complexity of the NCI-60 drug-screening panel. Cancer Res. 2003, 63 (24): 8634-8647.
- Zeeberg BR, Riss J, Kane DW, Bussey KJ, Uchio E, Linehan WM, Barrett JC, Weinstein JN: Mistaken identifiers: gene name errors can be introduced inadvertently when using Excel in bioinformatics. BMC bioinformatics. 2004, 5: 80. 10.1186/1471-2105-5-80.
- Meng LH, Shankavaram U, Chen C, Agama K, Fu HQ, Gonzalez FJ, Weinstein J, Pommier Y: Activation of aminoflavone (NSC 686288) by a sulfotransferase is required for the antiproliferative effect of the drug and for induction of histone gamma-H2AX. Cancer Res. 2006, 66 (19): 9656-9664. 10.1158/0008-5472.CAN-06-0796.
- Lorenzi PL, Reinhold WC, Rudelius M, Gunsior M, Shankavaram U, Bussey KJ, Scherf U, Eichler GS, Martin SE, Chin K, et al: Asparagine synthetase as a causal, predictive biomarker for L-asparaginase activity in ovarian cancer cells. Mol Cancer Ther. 2006, 5 (11): 2613-2623. 10.1158/1535-7163.MCT-06-0447.
- Rhodes DR, Yu J, Shanker K, Deshpande N, Varambally R, Ghosh D, Barrette T, Pandey A, Chinnaiyan AM: ONCOMINE: a cancer microarray database and integrated data-mining platform. Neoplasia. 2004, 6 (1): 1-6.
- Mangalam H, Stewart J, Zhou K, et al: GeneX: An Open Source gene expression database and integrated tool set. IBM Systems Journal. 2001, 40: 552-569.
- Pawelec G, Marsh SG: ESTDAB: a collection of immunologically characterised melanoma cell lines and searchable databank. Cancer Immunol Immunother. 2006, 55 (6): 623-627. 10.1007/s00262-005-0117-3.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.