An online database for brain disease research
© Higgs et al; licensee BioMed Central Ltd. 2006
Received: 02 February 2006
Accepted: 04 April 2006
Published: 04 April 2006
The Stanley Medical Research Institute online genomics database (SMRIDB) is a comprehensive web-based system for understanding the genetic effects of human brain disease (i.e. bipolar disorder, schizophrenia, and depression). This database contains fully annotated clinical metadata and gene expression patterns generated within 12 controlled studies across 6 different microarray platforms.
A thorough collection of gene expression summaries is provided, including patient demographics, disease subclasses, regulated biological pathways, and functional classifications.
The combination of database content, structure, and query speed offers researchers an efficient tool for data mining of brain disease, including cross-platform comparisons, biomarker elucidation for target discovery, and lifestyle/demographic associations with brain diseases.
Brain disease studies based on genome-wide microarray measurements are traditionally more challenging than studies in other disease areas. The biological results are often hindered by the statistical issues of small sample sizes, small effect sizes, and patient-to-patient variability [1–3]. In addition, clinical information for patients is typically sparse, so unknown clinical covariates, rather than the primary disease, can confound many of the gene expression patterns and trends. Corrections using such clinical information can greatly improve inference in determining markers for disease, as well as in elucidating patterns within the disease.
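One simple form such a correction could take is to regress a gene's expression values on a recorded clinical covariate and carry the residuals forward. The sketch below is illustrative only and is not presented as the adjustment used in the SMRIDB analyses; the function name, the choice of brain pH as the covariate, and all values are assumptions.

```python
def residualize(expression, covariate):
    """Remove the linear effect of one clinical covariate from expression values.

    Fits expression = intercept + slope * covariate by ordinary least squares
    and returns the residuals. Hypothetical helper, not part of the SMRIDB.
    """
    n = len(expression)
    if n != len(covariate) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(covariate) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in covariate)
    if sxx == 0:
        raise ValueError("covariate is constant; nothing to adjust for")
    slope = sum((x - mean_x) * (y - mean_y)
                for x, y in zip(covariate, expression)) / sxx
    # Residual = observed value minus the value predicted from the covariate.
    return [(y - mean_y) - slope * (x - mean_x)
            for x, y in zip(covariate, expression)]


# Made-up values: log expression for one gene and brain pH for five patients.
brain_ph = [6.1, 6.4, 6.5, 6.7, 6.9]
log_expr = [7.9, 8.3, 8.2, 8.8, 9.0]
adjusted = residualize(log_expr, brain_ph)
```

The adjusted values could then be compared between disease and control groups; with several covariates, a multiple regression would be the natural extension.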
Technical problems in microarray data can also affect the analyses. Meaningful results are often limited by the difficulty of platform-to-platform comparisons and by the overall organization and presentation of large data sets and results. Studies conducted on disparate platforms are inherently more difficult to analyze than those conducted on the same platform. Cross-platform comparisons present analysis challenges due to differences in, among other things, scaling and sensitivity, which introduce inconsistencies in reproducibility [5–8]. Large data sets and comprehensive results summaries present another challenge: analytical and bioinformatics information (e.g. expression profiles, gene summary information, pathway diagrams, and fold change comparisons) must be organized into a user-friendly format to facilitate efficient data mining. A relational web-based tool that logically combines all of these factors can enhance researchers' ability to determine the underlying genomic patterns in brain disease.
The SMRIDB is an online data warehouse and analytical system designed to aid researchers in understanding the biological associations both between and within the brain disorders of schizophrenia, bipolar disorder, and major depression. This open source database combines genomic patterns of brain disease with patient clinical metadata in a user-friendly query interface to enable efficient data mining for biomarker discovery and for elucidating the biological mechanisms of brain disease. The metadata includes a full summary of clinical history for each patient with hyperlinks to disease-level information, so that demographic- and lifestyle-associated effects can be determined as they relate to brain disorders. The genomic data has been compiled from 12 separate labs (identified as studies), with each data set generated from brain tissue isolated from two controlled populations of 165 patients, diagnosed with one of the three brain disorders (plus unaffected control brain tissue). This genomic data has been generated across 6 separate human array platforms (Affymetrix: hgu133a, hgu133plus, hgu95av2, Agilent, Codelink, and cDNA custom array), providing patterns/trends and analytical inferences that are not limited by platform dependencies.
Construction and content
NIAID's Database for Annotation, Visualization and Integrated Discovery (DAVID 2.0) was used as the standard source for gene annotation information. The primary fields extracted from DAVID include LocusLink, gene symbol, and gene summary. Additional annotations include gene product mappings to the Kyoto Encyclopedia of Genes and Genomes (KEGG) and the Gene Ontology Consortium (GO) for pathways and GO terms/classes, respectively. For Affymetrix arrays, queries were based on the Affymetrix probe ID (AFFYID). For other arrays, the GenBank accessions (GENBANK) were used.
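The choice of lookup key by platform can be sketched as follows. This is a minimal illustration, not the SMRIDB code: the function names and the dictionary layout are hypothetical, and only the AFFYID/GENBANK key names and the Affymetrix platform names come from the text.

```python
# Affymetrix platform names as listed in the text; matching is case-insensitive.
AFFYMETRIX_PLATFORMS = {"hgu133a", "hgu133plus", "hgu95av2"}


def annotation_key(platform: str) -> str:
    """Return the identifier type used to look up annotations for a platform."""
    if platform.strip().lower() in AFFYMETRIX_PLATFORMS:
        return "AFFYID"   # Affymetrix probe ID
    return "GENBANK"      # GenBank accession for all other arrays


def build_annotation_query(platform: str, identifier: str) -> dict:
    """Bundle the key type and identifier into a simple query description.

    Hypothetical structure, used here only to make the rule concrete.
    """
    return {"key": annotation_key(platform), "value": identifier}


# Example identifiers are placeholders, not records taken from the database.
affy_query = build_annotation_query("hgu133a", "EXAMPLE_PROBE_at")
other_query = build_annotation_query("Agilent", "EXAMPLE_ACCESSION")
```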
Individual study-level analysis
Patient demographic variables for all diseases
All Patient Variables
- PMI > 30
- Brain pH > 6.5
- Smoking at Time of Death
- Herpes simplex virus 1 OD
- Herpes simplex virus 2 OD
- Hervk 18 SNP
- Hervk 18 Expression
Patient demographic variables for Bipolar patients
Bipolar Patient Variables
- Bipolar Heavy Alcohol Use
- Bipolar Heavy Drug Use
- Bipolar Psychotic Feature
- Bipolar Sudden Death
- Bipolar Suicide Status
- Bipolar Lifetime Antipsychotics >0
- Bipolar Mood Stabilizer
Patient demographic variables for Schizophrenic patients
Schizophrenic Patient Variables
- Schizophrenia Heavy Alcohol Use
- Schizophrenia Heavy Drug Use
- Schizophrenia Sudden Death
- Schizophrenia Lifetime Antipsychotics >45,000
Patient demographic variables for Depressed patients
Depressed Patient Variables
- Depression Heavy Alcohol Use
- Depression Heavy Drug Use
Pathway/GO details page
Gene details page
To date, making comparisons across disparate gene expression platforms has been very difficult [5–8]. Chip manufacturing differences such as probe selection, processing protocols, and spot normalization algorithms contribute to variability that can distort mRNA transcript abundance measurements and introduce inconsistencies that hinder cross-platform comparisons. Some success has been demonstrated by reducing the problem to the most consistent sequence-verified gene annotations between two platforms (e.g. UniGene cluster membership) and examining correlations, ratio values, or gene calls, although the sensitivity and global statistical inference of such approaches remain a challenge [7, 10–12].
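The reduction to shared annotations can be sketched in a few lines: keep only the genes that carry the same identifier on both platforms, then correlate their values. This is an illustrative sketch under stated assumptions, not the method of any cited study; the function names, the use of UniGene-style cluster IDs as dictionary keys, and the log-ratio values are all made up.

```python
from math import sqrt


def common_genes(platform_a: dict, platform_b: dict) -> list:
    """Identifiers (e.g. UniGene cluster IDs) measured on both platforms."""
    return sorted(set(platform_a) & set(platform_b))


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def cross_platform_correlation(platform_a: dict, platform_b: dict) -> float:
    """Correlate log ratios over the genes shared by both platforms."""
    genes = common_genes(platform_a, platform_b)
    return pearson([platform_a[g] for g in genes],
                   [platform_b[g] for g in genes])


# Made-up disease-vs-control log ratios keyed by hypothetical cluster IDs.
oligo_array = {"Hs.1": 0.8, "Hs.2": -0.3, "Hs.3": 1.1, "Hs.9": 0.2}
cdna_array = {"Hs.1": 0.5, "Hs.2": -0.1, "Hs.3": 0.9, "Hs.7": -0.4}
shared = common_genes(oligo_array, cdna_array)
r = cross_platform_correlation(oligo_array, cdna_array)
```

Restricting to the intersection discards every gene that lacks a consistent annotation on both platforms, which is one reason the sensitivity of such approaches is limited.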
Utility and discussion
The user interface was constructed to enable intuitive navigation and efficient data mining. The main site contains the primary index for the database's four general areas: Patients, Studies, Genes, and Analysis. Each is a gateway to a unique focus area, and the areas are mutually linked, for example clinical information with genomics results and individual study content with cross-platform combined analyses. The Genes tab contains an open text search engine (with partial matches) to enable queries by gene, LocusLink, or pathway for any single or combined study results.
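A partial-match text search of this kind can be approximated with a case-insensitive substring test over the searchable fields. The sketch below is an assumption about how such a search might behave, not the SMRIDB implementation; the record layout, field names, and example records are hypothetical.

```python
def search_genes(records, query, fields=("symbol", "locuslink", "pathway")):
    """Return records in which any searchable field contains the query text.

    Matching is case-insensitive and partial (substring); an empty query
    returns no results.
    """
    needle = query.strip().lower()
    if not needle:
        return []
    return [
        record for record in records
        if any(needle in str(record.get(field, "")).lower() for field in fields)
    ]


# Illustrative records only; not taken from the database.
records = [
    {"symbol": "BDNF", "locuslink": "627", "pathway": "MAPK signaling pathway"},
    {"symbol": "GAD1", "locuslink": "2571", "pathway": "Glutamate metabolism"},
]

by_symbol = search_genes(records, "bdn")
by_pathway = search_genes(records, "glutamate")
```

In a relational database the same behaviour would more likely be expressed as a pattern-matching query executed on the server; the in-memory version here only makes the matching rule explicit.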
The intended users of the database include any genomics researchers facing the persistent challenges of sensitivity for biomarker discovery and cross-platform microarray comparisons. However, the content within the SMRIDB is primarily designed for biologists, clinical researchers, bioinformaticians, and scientists in the field of brain disease.
The size and scope of the SMRIDB make it a unique contribution to genomics-based brain disease research. With combined gene expression profile summaries across 12 studies and 6 platforms, there is greater confidence in scientific findings such as biomarkers for disease, biological functional roles, and regulated pathways than in results obtained from any one individual study.
The SMRIDB is a comprehensive data mining tool that enables researchers to elucidate the biological mechanisms of bipolar disorder, schizophrenia, and depression. A diverse patient population combines with data generated across six microarray platforms and 12 studies to provide robust results that enhance the understanding of brain disease.
Availability and requirements
The SMRIDB can be accessed at https://www.stanleygenomics.org. All users must register (name and email address) to obtain a username and password.
Postmortem brain tissue was donated by The Stanley Medical Research Institute's brain collection courtesy of Drs. Michael B. Knable, E. Fuller Torrey, Maree J. Webster, Serge Weis, and Robert H. Yolken.
- Pavlidis P, Noble WS: Analysis of strain and regional variation in gene expression in mouse brain. Genome Biology. 2001, 2 (10): RESEARCH0042. 10.1186/gb-2001-2-10-research0042.
- Cho H, Lee JK: Bayesian hierarchical error model for analysis of gene expression data. Bioinformatics. 2004, 20 (13): 2016-25. 10.1093/bioinformatics/bth192.
- Iacobas DA, Urban M, Iacobas S, Spray DC: Control and variability of gene expression in mouse brain and in a neuroblastoma cell line. Rom J Physiol. 2003, 39–40: 2002-71.
- Jurata LW, Bukhman YV, Charles V, Capriglione F, Bullard J, Lemire AL, Mohammed A, Pham Q, Laeng P, Brockman JA, Altar CA: Comparison of microarray-based mRNA profiling technologies for identification of psychiatric disease and drug signatures. J Neurosci Methods. 2004, 138 (1–2): 173-88. 10.1016/j.jneumeth.2004.04.002.
- Kuo WP, Jenssen TK, Butte AJ, Ohno-Machado L, Kohane IS: Analysis of matched mRNA measurements from two different microarray technologies. Bioinformatics. 2002, 18: 405-412. 10.1093/bioinformatics/18.3.405.
- Li J, Pankratz M, Johnson JA: Differential gene expression patterns revealed by oligonucleotide versus long cDNA arrays. Toxicological Sciences. 2002, 69: 383-390. 10.1093/toxsci/69.2.383.
- Mah N, Thelin A, Lu T, Nikolaus S, Kuhbacher T, Gurbuz Y, Eickhoff H, Kloppel G, Lehrach H, Mellgard B, Costello CM, Schreiber S: A comparison of oligonucleotide and cDNA-based microarray systems. Physiol Genomics. 2004, 16: 361-370. 10.1152/physiolgenomics.00080.2003.
- Tan P, Downey TJ, Spitznagel EL, Xu P, Fu D, Dimitrov DS, Lempicki RA, Raaka BM, Cam MC: Evaluation of gene expression measurements from commercial microarray platforms. Nucleic Acids Research. 2003, 31: 5676-5684. 10.1093/nar/gkg763.
- DAVID (Database for Annotation, Visualization and Integrated Discovery). [http://apps1.niaid.nih.gov/david]
- Lee JK, Bussey KJ, Gwadry FG, Reinhold W, Riddick G, Pelletier SL, Nishizuka S, Szakacs G, Annereau JP, Shankavaram U, Lababidi S, Smith LH, Gottesman MM, Weinstein JN: Comparing cDNA and oligonucleotide array data: concordance of gene expression across platforms for the NCI-60 cancer cells. Genome Biology. 2003, 4: R82. 10.1186/gb-2003-4-12-r82.
- Mecham B, Klus GT, Strovel J, Augustus M, Byrne D, Bozso P, Wetmore DZ, Mariani TJ, Kohane IS, Szallasi Z: Sequence-matched probes produce increased cross-platform consistency and more reproducible biological results in microarray-based gene expression measurements. Nucleic Acids Research. 2004, 32: 9. 10.1093/nar/gnh071.
- Nimgaonkar A, Sanoudou D, Butte AJ, Haslett JN, Kunkel LM, Beggs AH, Kohane IS: Reproducibility of gene expression across generations of Affymetrix Microarrays. BMC Bioinformatics. 2003, 4: 27. 10.1186/1471-2105-4-27.