Confero: an integrated contrast data and gene set platform for computational analysis and biological interpretation of omics data
- Leandro Hermida†1,
- Carine Poussin†1,
- Michael B Stadler2, 3, 4,
- Sylvain Gubian1,
- Alain Sewer1,
- Dimos Gaidatzis2, 3, 4,
- Hans-Rudolf Hotz2, 3, 4,
- Florian Martin1,
- Vincenzo Belcastro1,
- Stéphane Cano1,
- Manuel C Peitsch1 and
- Julia Hoeng1
© Hermida et al.; licensee BioMed Central Ltd. 2013
Received: 22 October 2012
Accepted: 17 July 2013
Published: 29 July 2013
High-throughput omics technologies such as microarrays and next-generation sequencing (NGS) have become indispensable tools in biological research. Computational analysis and biological interpretation of omics data can pose significant challenges due to a number of factors, in particular the systems integration required to fully exploit and compare data from different studies and/or technology platforms. In transcriptomics, the identification of differentially expressed genes when studying effect(s) or contrast(s) of interest constitutes the starting point for further downstream computational analysis (e.g. gene over-representation/enrichment analysis, reverse engineering) leading to mechanistic insights. Therefore, it is important to systematically store, in a comparable manner, the full list of genes with their associated statistical analysis results (differential expression, t-statistics, p-values) for one or more effect(s) or contrast(s) of interest (referred to as “contrast data”), and to extract gene sets from them, in order to efficiently support downstream analyses and leverage the data on a long-term basis. Filling this gap would open new research perspectives for biologists to discover disease-related biomarkers and to understand the molecular mechanisms underlying specific biological perturbations (e.g. disease, genetic, environmental, etc.).
To address these challenges, we developed Confero, a contrast data and gene set platform for downstream analysis and biological interpretation of omics data. The Confero software platform provides storage of contrast data in a simple and standard format, data transformation to enable cross-study and cross-platform data comparison, and automatic extraction and storage of gene sets to build new a priori knowledge, which is then leveraged by integrated and extensible downstream computational analysis tools. Gene Set Enrichment Analysis (GSEA) and Over-Representation Analysis (ORA) are currently integrated in an analysis module, together with additional tools to support biological interpretation. Confero is a standalone system that also integrates with Galaxy, an open-source workflow management and data integration system. To illustrate Confero platform functionality, we walk through the major aspects of the Confero workflow and results using the Bioconductor estrogen package dataset.
Confero provides a unique and flexible platform to support downstream computational analysis and facilitate biological interpretation. The system has been designed to provide the researcher with a simple, innovative, and extensible solution to store and exploit analyzed data in a sustainable and reproducible manner, thereby accelerating knowledge-driven research. Confero source code is freely available from http://sourceforge.net/projects/confero/.
Keywords: Gene expression; Contrast data; Gene set; Gene set enrichment; Omics; Microarray; Next-generation sequencing; Reproducible research system; Knowledge acquisition
The development and application of high-throughput technologies in biological research has presented researchers with unprecedented amounts of omics data. Management, analysis and interpretation of such data still pose significant challenges. A plethora of open-source software solutions (e.g. caArray, MARS, BASE, EMMA, MIMAS [5, 6], TM4, MADMAX, MiMiR and ExpressionPlot, to name a few) are readily available for storage and management of raw and preprocessed high-throughput datasets and metadata. These solutions provide a data management platform for the beginning of the experimental data analysis process. Depending on the complexity of the experimental design, statistical analysis of high-throughput data can involve a number of sophisticated techniques and tools. In transcriptomics, the identification of differentially expressed genes when studying effect(s)/contrast(s) of interest constitutes the starting point for further downstream computational analysis (e.g. gene over-representation/enrichment analysis, reverse engineering, network building, etc.) leading to biological interpretation and mechanistic insights. While many research sites use systems to manage raw and processed data, they still lack a central downstream infrastructure to store and further exploit analyzed data in an integrated way. As a result, the knowledge gained from analyzed data remains restricted to the specific study in which the data were generated, whereas it could be leveraged during the analysis of other studies. Even bioinformatics workflow management systems (e.g. Galaxy [11–13], GenePattern, Taverna), which facilitate reproducible analyses, do not by themselves provide the functionality necessary to centrally manage and further utilize analyzed data.
Currently, no free, open-source software solution of this kind exists to store, manage and leverage analyzed data and to provide an integrated platform for downstream computational analysis, knowledge acquisition and integration, leading to the generation of new experimental hypotheses. Integrating tools and developing such platforms are important steps toward assembling a systems biology computational workflow that supports the interpretation of complex biological data.
Here we present an innovative and extensible solution to store and exploit analyzed omics data for knowledge acquisition and biological interpretation. Confero enables research sites to store and manage analyzed contrast datasets and identifier (ID) lists of interest (e.g. gene lists extracted from research papers, diagnostic gene signatures), to automatically compute and store gene sets from these contrast data and ID lists, and to analyze data to support biological interpretation. Confero includes a local database for storage and management of data and metadata, as well as tools for downstream computational analysis and biological interpretation, including gene set enrichment analysis (GSEA) and over-representation analysis (ORA). The Confero Functional Enrichment Analysis module includes specialized tools to facilitate and accelerate enrichment/over-representation analysis and the extraction and interpretation of results.
A contrast is a quantitative estimate of the differential effect between treatment and reference conditions or, more generally, of a comparison defined by a contrast matrix. Linear models are generally used to estimate the coefficient(s) related to the contrast(s). Contrast data, i.e. differential expression (most often corresponding to coefficient(s) of the linear model), t-statistics and p-values, can be computed for any entity (gene, protein, probe set, transcript, microRNA, etc.) using the Bioconductor limma or samr packages or other statistical analysis methods [21, 22].
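To make the definition concrete, the following minimal sketch estimates a single contrast for one gene from a cell-means linear model (one mean per condition, pooled residual variance). It assumes log2-scale expression values, replicated conditions, and a contrast vector whose coefficients sum to zero; the function name `contrast_stats` and the example values are hypothetical. It returns the ordinary t-statistic rather than limma's moderated t-statistic, and it omits the p-value, which could be obtained from a t distribution with `df` degrees of freedom (for example, a two-sided value as `2 * scipy.stats.t.sf(abs(t), df)`).

```python
import math
from statistics import mean


def contrast_stats(groups, contrast):
    """Estimate one contrast for one gene from a cell-means linear model.

    groups   -- dict: condition name -> list of log2 expression values
    contrast -- dict: condition name -> contrast coefficient (sum to zero)
    Returns (M, t, df): contrast estimate, ordinary t-statistic and residual
    degrees of freedom. Requires replicates (df > 0).
    """
    names = list(groups)
    means = {g: mean(v) for g, v in groups.items()}
    # Pooled residual variance: within-condition sum of squares / residual df
    rss = sum((x - means[g]) ** 2 for g, v in groups.items() for x in v)
    df = sum(len(v) for v in groups.values()) - len(names)
    s2 = rss / df
    # Contrast estimate c'beta, where beta holds the condition means
    m = sum(contrast.get(g, 0.0) * means[g] for g in names)
    # Var(c'beta) = s^2 * sum_j c_j^2 / n_j
    se = math.sqrt(s2 * sum(contrast.get(g, 0.0) ** 2 / len(groups[g]) for g in names))
    return m, m / se, df
```

For example, with four conditions `ctrl10`, `est10`, `ctrl48` and `est48`, the contrast `{"est10": 1, "ctrl10": -1}` estimates the 10-hour treatment effect, and `{"est48": 1, "ctrl48": -1, "est10": -1, "ctrl10": 1}` the time-by-treatment interaction.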
Advantages and strengths of the platform
The Confero platform can serve a variety of different research areas, including biomarker and drug discovery, diagnostics, clinical research, consumer products (e.g. nutrition) or any area that performs omics experiments and analyses. Incorporation of such a platform into a research site’s analysis workflow provides a number of advantages, including that Confero:
Is open-source, freely installable, customizable, and easily integrates into the Galaxy bioinformatics workflow management system
Stores, manages, and leverages analyzed omics data
Enables traceable and reproducible data analysis
Compiles new biological knowledge (extraction of gene sets from contrast data and population of Confero gene set database) that can be exported and easily shared
Leverages compiled biological knowledge to analyze (e.g. GSEA or ORA) and support biological interpretation of new contrast data
Integrates public sources of a priori biological knowledge (e.g. MSigDB, GeneSigDB)
Enables dataset comparison across, e.g., systems (in vitro vs. in vivo), organisms (human vs. mouse) and treatments (interleukin 1 (IL-1) vs. tumor necrosis factor (TNF)) in a platform-independent manner
Enables further downstream data mining and meta-analysis of compiled contrast and gene set data, e.g. biomarker discovery, iterative gene set refinement
The Confero platform currently supports two types of data input. The first type is contrast data resulting from statistical analysis (e.g. of microarray or NGS RNA-seq gene expression, microRNA expression, etc.), and the second is a simple list of identifiers (e.g. probe set IDs, gene IDs, microRNA IDs, gene symbols, etc.), which is processed and imported into the Confero database as an external gene set.
idMAPS file format for contrast data
To support any type of statistical analysis approach, it was necessary to devise a comprehensive yet straightforward file format to represent statistical analysis results. In addition, because Confero requires and leverages various important metadata describing input datasets, the file format also had to support encoding such metadata and passing it on from the upstream workflow. For this purpose, the idMAPS file format was designed to represent the statistical analysis results of omics experiments, with experimental and analysis metadata stored in fields in the file header (e.g. dataset name and description, contrast names, source ID type, etc.). A utility Confero Galaxy tool, Convert LIMMA/SAMR Object, is provided to convert a Bioconductor limma or samr R object (imported via the Upload LIMMA/SAM R Object tool) into an idMAPS file with appropriate header information, which can then be used as input for the Confero Galaxy Submit Contrast Dataset tool. During idMAPS file import, metadata are parsed and stored in the Confero database together with the contrast data and gene sets. An example of the idMAPS file format is shown in Additional file 2.
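As an illustration of how such a file might be read and checked, the sketch below parses a simplified, idMAPS-like text file holding a single contrast. The layout is an assumption made for this example only: metadata as `#key=value` header lines, followed by a tab-delimited table with an `ID` column and `M`, `A`, `P` and `S` columns, named after the quantities used elsewhere in this paper. The real idMAPS header fields and column conventions are those shown in Additional file 2, and the function name `read_contrast_file` is hypothetical.

```python
import csv
import io


def read_contrast_file(text):
    """Parse a hypothetical idMAPS-like file into (metadata, rows, errors).

    Assumed layout (NOT the official idMAPS specification):
      #key=value                  metadata header lines
      ID<TAB>M<TAB>A<TAB>P<TAB>S  column header
      one data row per source ID
    """
    metadata, body = {}, []
    for line in text.splitlines():
        if line.startswith("#"):
            key, _, value = line[1:].partition("=")
            metadata[key.strip()] = value.strip()
        elif line.strip():
            body.append(line)
    reader = csv.DictReader(io.StringIO("\n".join(body)), delimiter="\t")
    rows, errors, seen = [], [], set()
    for lineno, rec in enumerate(reader, start=1):
        source_id = rec.get("ID")
        if source_id in seen:
            errors.append(f"row {lineno}: duplicate ID {source_id}")
            continue
        seen.add(source_id)
        try:
            # Missing columns raise KeyError; short rows yield None (TypeError)
            values = {k: float(rec[k]) for k in ("M", "A", "P", "S")}
        except (KeyError, TypeError, ValueError):
            errors.append(f"row {lineno}: missing or non-numeric value")
            continue
        if not 0.0 <= values["P"] <= 1.0:
            errors.append(f"row {lineno}: P outside [0, 1]")
            continue
        rows.append((source_id, values))
    return metadata, rows, errors
```

Collecting all problems instead of stopping at the first one matches the behavior described for Confero's parser, which reports problems with an input file to the user; the specific checks shown here are illustrative.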
Identifier list file format for external gene sets
In addition to supporting contrast data input, Confero also accepts identifier (ID) lists. An example of an ID list is shown in Additional file 3. This simple file format is a single data column of source IDs with the same Confero metadata file header as in the idMAPS data format.
Data import, processing and storage
As shown in Figure 3, idMAPS contrast datasets and ID lists are submitted for processing and loading into the Confero database using the Confero Galaxy submission tools Submit Contrast Dataset and Submit Gene Set, respectively, or via the Confero application programming interface (API). Confero utilizes a comprehensive and robust idMAPS and ID list parser and data integrity checker which, during data processing and submission, will notify users of any problems with their input file.
Input data identifier (ID) mapping and collapsing methodology
Input idMAPS and ID list data files can use a variety of source ID types, such as Affymetrix probe set IDs, HUGO gene symbols, and Entrez Gene IDs. To compute gene sets from such data, Confero uses the latest NCBI Entrez Gene annotations to map data to a single gene-centric ID space. For this purpose, a novel and robust ID mapping and collapsing algorithm was developed, with the following features:
Source ID-to-multiple Entrez Gene ID mappings are fully supported and handled robustly
Entrez Gene RefSeq status information is leveraged to determine best mapping genes
Gene symbol synonyms are supported and properly mapped
Multiple available collapsing strategies
A summary report of the procedure is generated, stored in the Confero database along with each dataset, and viewable via the Confero web application
A detailed flowchart describing the Confero ID mapping and collapsing algorithm is shown in Additional file 4: Figure S1.
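A much-simplified sketch of the two steps, mapping and collapsing, is shown below. It is not Confero's algorithm: the preference order of RefSeq status values, the rule of dropping source IDs that remain ambiguous after that ranking, and the collapsing strategy (keep, for each gene, the source ID with the smallest p-value, breaking ties by the larger absolute statistic) are all assumptions chosen for illustration. The function and dictionary names are hypothetical; in practice the mapping tables would be built from vendor annotation files and Entrez Gene data.

```python
# Assumed preference order for Entrez Gene RefSeq status (lower is better);
# statuses not listed here are ranked last.
REFSEQ_RANK = {"REVIEWED": 0, "VALIDATED": 1, "PROVISIONAL": 2,
               "PREDICTED": 3, "INFERRED": 4, "MODEL": 5}


def map_and_collapse(rows, id_to_genes, gene_status):
    """Map source IDs to Entrez Gene IDs and collapse to one row per gene.

    rows        -- list of (source_id, values); values holds "P" and "S"
    id_to_genes -- dict: source ID -> list of candidate Entrez Gene IDs
    gene_status -- dict: Entrez Gene ID -> RefSeq status string
    Returns (collapsed, report): gene -> (source_id, values), plus counts.
    """
    report = {"unmapped": 0, "ambiguous_resolved": 0, "ambiguous_dropped": 0}
    best = {}
    for source_id, values in rows:
        # De-duplicate candidate genes while preserving their order
        genes = list(dict.fromkeys(id_to_genes.get(source_id, [])))
        if not genes:
            report["unmapped"] += 1
            continue
        if len(genes) > 1:
            # Keep only the candidate genes with the best RefSeq status
            ranks = {g: REFSEQ_RANK.get(gene_status.get(g), len(REFSEQ_RANK))
                     for g in genes}
            top = min(ranks.values())
            genes = [g for g in genes if ranks[g] == top]
            if len(genes) > 1:
                report["ambiguous_dropped"] += 1  # still ambiguous: skip
                continue
            report["ambiguous_resolved"] += 1
        gene = genes[0]
        # Collapsing: prefer the smallest p-value, then the largest |statistic|
        key = (values["P"], -abs(values["S"]))
        if gene not in best or key < best[gene][0]:
            best[gene] = (key, source_id, values)
    collapsed = {g: (sid, vals) for g, (_, sid, vals) in best.items()}
    return collapsed, report
```

The returned counts play the role of a minimal summary report; a real implementation would record far more detail per source ID.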
Gene set extraction methodology
As prior biological knowledge, gene sets are commonly used to assess the enrichment (e.g. by GSEA or ORA) of co-regulated genes representative of a specific biological process, pathway, chromosomal location, etc. In the context of contrast data, a gene set corresponds to a set of genes characteristic of an effect of interest. During the Confero submission process, once input data files have completed the ID mapping and collapsing procedure, the Entrez Gene ID-based processed data undergo a novel and robust procedure to extract and store gene sets. The Confero platform builds a gene set database from all imported data, which is then leveraged by Confero tools.
Each contrast in a dataset has at least three gene sets automatically generated, named with the following suffixes: UP (up-regulated genes), DN (down-regulated genes), and AR (all-regulated genes). AR gene sets are a special type used to represent the global response of a system to the applied stimulus. The Confero platform gives the user complete and granular control over how each gene set is extracted. As shown in Additional files 2 and 3, special parameters can be provided in the idMAPS metadata header to override the default behavior and specify exactly how the algorithm should proceed. One can also instruct Confero not to create gene sets for a certain contrast (e.g. the intercept coefficient of a linear model) or even for an entire dataset. Different gene set extraction parameters can be specified for each contrast, such as minimum and maximum size thresholds; P (significance level, i.e. p-value or false discovery rate (FDR)), A (average signal) and M (estimated effect of interest, e.g. log2 fold change) value thresholds; and even specific desired gene set sizes. A detailed flowchart describing the Confero gene set extraction algorithm is shown in Additional file 5: Figure S2.
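The sketch below shows how UP, DN and AR gene sets could be derived from one contrast using P, A and M thresholds and size limits. The threshold defaults, the trimming rule (keep the most significant genes when a set exceeds the maximum size) and the `<contrast>_<suffix>` naming are assumptions; Confero's actual procedure, including its handling of desired gene set sizes, is the one described in Additional file 5: Figure S2.

```python
def extract_gene_sets(contrast, genes, p_max=0.05, m_min=0.0, a_min=None,
                      min_size=5, max_size=500):
    """Build UP, DN and AR gene sets for one contrast (illustrative defaults).

    genes -- dict: Entrez Gene ID -> {"M": ..., "A": ..., "P": ...}
    Genes pass if P <= p_max, |M| >= m_min and (optionally) A >= a_min.
    Sets smaller than min_size are not created; sets larger than max_size
    are trimmed to the max_size most significant genes.
    """
    passing = sorted(
        (vals["P"], gene, vals["M"])
        for gene, vals in genes.items()
        if vals["P"] <= p_max
        and abs(vals["M"]) >= m_min
        and (a_min is None or vals["A"] >= a_min)
    )  # most significant first
    candidates = {
        "UP": [gene for _, gene, m in passing if m > 0],
        "DN": [gene for _, gene, m in passing if m < 0],
        "AR": [gene for _, gene, _ in passing],
    }
    return {
        f"{contrast}_{suffix}": members[:max_size]
        for suffix, members in candidates.items()
        if len(members) >= min_size
    }
```

Because AR combines the passing genes regardless of direction, it can satisfy the minimum size even when UP or DN alone does not.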
Data management and export
The Confero platform provides an integrated web application to view and manage data and metadata in the Confero database. The web application operates independently of Galaxy but for convenience Confero also embeds it into the Galaxy user interface as the View and Manage Data tool. The web application also allows users to export source data, processed data, generated gene sets and data processing reports via the user interface or via the Confero API. Confero also provides an Extract Gene Set Matrix Galaxy tool to generate and export boolean gene set-to-gene membership matrices and an Extract Gene Set Overlap Matrix tool to extract gene set-to-gene set overlap matrices (i.e. number/percentage of shared genes between two gene sets).
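Both matrix types are straightforward to compute from a collection of gene sets. In the sketch below, gene sets are represented as a dictionary from set name to gene IDs (a hypothetical representation), and the function names are illustrative. Percentages are expressed relative to the size of the row gene set, which is only one possible normalization; the exact definition used by the Extract Gene Set Overlap Matrix tool is not given here.

```python
def membership_matrix(gene_sets):
    """Boolean gene-by-gene-set matrix: rows are genes, columns are gene sets."""
    set_names = sorted(gene_sets)
    genes = sorted({gene for members in gene_sets.values() for gene in members})
    member_sets = {name: set(gene_sets[name]) for name in set_names}
    rows = [[gene in member_sets[name] for name in set_names] for gene in genes]
    return genes, set_names, rows


def overlap_matrix(gene_sets, percent=False):
    """Gene set-by-gene set matrix of shared-gene counts (or percentages)."""
    names = sorted(gene_sets)
    member_sets = {name: set(gene_sets[name]) for name in names}
    matrix = []
    for a in names:
        row = []
        for b in names:
            shared = len(member_sets[a] & member_sets[b])
            if percent:
                # Assumed normalization: percentage of the row gene set's size
                size = len(member_sets[a])
                row.append(100.0 * shared / size if size else 0.0)
            else:
                row.append(shared)
        matrix.append(row)
    return names, matrix
```

With percentages normalized by the row set, the matrix is not symmetric; the count matrix always is.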
Functional enrichment analysis module for biological interpretation
Functional enrichment analysis is commonly used to support biological interpretation of gene expression data. The Confero platform currently supports 1) over-representation analysis (ORA) and 2) gene set enrichment analysis (GSEA), a commonly used and powerful approach for biological data interpretation. An important advantage of GSEA is that, unlike approaches such as ORA, it analyzes full contrast data (e.g. genome-wide expression profiles) without requiring a p-value threshold to select genes.
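ORA is typically based on the hypergeometric distribution, and the Analyze Data tool lists Confero's ORA option as a hypergeometric test. The sketch below computes the standard one-sided hypergeometric p-value of observing at least k gene set members among n differentially expressed genes drawn from a universe of N genes that contains K set members. It is a textbook formulation, not Confero's internally developed ORA code, and it applies no multiple-testing correction across gene sets. With SciPy available, the same value is given by `scipy.stats.hypergeom.sf(k - 1, N, K, n)`.

```python
from math import comb


def ora_pvalue(deg, gene_set, universe):
    """One-sided hypergeometric p-value, P(X >= k), for over-representation.

    deg      -- differentially expressed genes (e.g. passing a p-value cutoff)
    gene_set -- genes in the gene set being tested
    universe -- all genes that could have been called differentially expressed
    """
    universe = set(universe)
    deg = set(deg) & universe
    members = set(gene_set) & universe
    N, K, n = len(universe), len(members), len(deg)
    k = len(deg & members)  # observed overlap
    # Sum the hypergeometric probabilities of overlaps k, k+1, ..., min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)
```

Restricting both the DEG list and the gene set to the universe matters in practice: genes absent from the platform should not count toward either K or k.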
Both approaches require as inputs a gene list (a partial list for ORA and a full ranked list for GSEA) and a collection of gene sets used as a priori knowledge. A targeted choice of gene sets can provide insight into specific biological questions. We developed a fully integrated Functional Enrichment Analysis module that seamlessly uses Confero contrast data and gene sets (Figure 3; Additional file 1: Table S1). The code and reporting for ORA were developed internally. For GSEA, Confero uses the Broad Institute’s GSEA Java implementation and results reporting. Importantly, Confero dynamically customizes the GSEA results report and provides several tools to accelerate downstream analysis of results. The Functional Enrichment Analysis module includes the following tools:
Create Ranked or DEG Lists: gene lists can be easily generated from contrast data in the Confero database using the statistic (S, moderated t-statistic) or differential expression value (e.g. M, log2 fold change) data as the rank metric for GSEA, or using the significance level (P) for ORA (further leveraged to filter the gene list while using the Analyze Data functionality described below).
Analyze Data: ranked and DEG lists (with a user-defined p-value threshold) can be analyzed for gene set enrichment/over-representation against dynamically definable Confero gene set collections, selected using annotation filters, as well as against the latest MSigDB and GeneSigDB gene set collections. The selected analysis algorithm (GSEA Preranked or ORA (Hypergeometric Test)) determines which Galaxy menu is displayed for choosing the analysis parameters.
Extract Leading Edge Matrix: leading edge matrices of various types can be extracted from a GSEA result; the leading edge matrix consists of the GSEA leading edge genes (rows) from all gene sets in the result (columns) that pass a specified FDR threshold, with rank metric score, rank in list, or boolean membership values as the matrix entries.
Extract Results Matrix: a comprehensive results matrix with user selected output columns can be extracted from one or more functional enrichment analysis results.
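For readers unfamiliar with the leading edge concept used by Extract Leading Edge Matrix, the sketch below computes the GSEA enrichment score for one gene set from a preranked list (for example, genes ranked by the moderated t-statistic S) following the running-sum definition of Subramanian et al. with the weighting exponent set to 1, and returns the leading edge subset. It is only an illustration with a hypothetical function name: Confero itself runs the Broad Institute's GSEA Java implementation, and this sketch omits permutation testing, normalized enrichment scores and FDR estimation.

```python
def enrichment_score(ranked, gene_set, weight=1.0):
    """Running-sum enrichment score and leading edge for one gene set.

    ranked   -- list of (gene, metric) pairs sorted by metric, descending
    gene_set -- collection of gene IDs
    weight   -- exponent applied to |metric| for hits (1.0 = weighted GSEA)
    Returns (es, leading_edge). No permutation test: no p-value, NES or FDR.
    """
    members = set(gene_set)
    hits = [gene in members for gene, _ in ranked]
    n_hits = sum(hits)
    n_miss = len(ranked) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must partially overlap the ranked list")
    # Normalizer so that the hit increments sum to 1
    norm = sum(abs(metric) ** weight for (_, metric), hit in zip(ranked, hits) if hit)
    if norm == 0:
        raise ValueError("all gene set members have a zero rank metric")
    running, es, peak = 0.0, 0.0, 0
    for i, ((_, metric), hit) in enumerate(zip(ranked, hits)):
        # Step up for a gene set member, step down by a constant for a non-member
        running += abs(metric) ** weight / norm if hit else -1.0 / n_miss
        if abs(running) > abs(es):
            es, peak = running, i  # maximum deviation from zero so far
    if es >= 0:
        # Positive ES: members ranked at or before the peak
        edge = [gene for (gene, _), hit in zip(ranked[: peak + 1], hits) if hit]
    else:
        # Negative ES: members ranked at or after the peak
        edge = [gene for (gene, _), hit in zip(ranked[peak:], hits[peak:]) if hit]
    return es, edge
```

A leading edge matrix is then obtained by collecting these leading edge genes across all gene sets that pass the chosen FDR threshold.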
In summary, the Confero Functional Enrichment Analysis module allows biologists to compare datasets in a contextual manner (e.g. by organism, cell/tissue type, stimulus type, experimental system, etc.) and to more efficiently identify underlying molecular mechanisms based on biological interpretation of results.
Results and discussion
Case study: estrogen Bioconductor dataset
To provide an example of the application of the Confero platform, we used the estrogen expression dataset available from the Bioconductor web site. In this 2×2 factorial experiment, MCF7 breast cancer cells were treated with estrogen for 10 or 48 hours. The experimental factors were “estrogen treatment”, with conditions present or absent, and “time”, with conditions 10 or 48 hours. Following extraction, RNA was hybridized onto Affymetrix HG_U95Av2 microarrays. The purpose of this study was to identify early and late biological processes driven by estrogen: the early response is expected to be driven by putative direct estrogen target genes, whereas later events may be driven by targets further downstream in the molecular pathway.
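The 2×2 design can be expressed in the contrast framework described earlier. The block below shows one natural set of contrasts under a cell-means parameterization, with C and E denoting control and estrogen-treated cells at 10 and 48 hours; this section does not state which contrasts were actually analyzed, so both the choice of contrasts and the symbols are assumptions for illustration.

```latex
\hat{\boldsymbol{\theta}} = \mathbf{C}\,\hat{\boldsymbol{\beta}}, \qquad
\hat{\boldsymbol{\beta}} =
\begin{pmatrix} \hat{\mu}_{\mathrm{C10}} \\ \hat{\mu}_{\mathrm{E10}} \\ \hat{\mu}_{\mathrm{C48}} \\ \hat{\mu}_{\mathrm{E48}} \end{pmatrix}, \qquad
\mathbf{C} =
\begin{pmatrix}
-1 & 1 & 0 & 0 \\
0 & 0 & -1 & 1 \\
1 & -1 & -1 & 1
\end{pmatrix}
```

The rows of C give the estrogen effect at 10 hours, the estrogen effect at 48 hours, and the interaction (E48 − C48) − (E10 − C10); each row is one contrast and would therefore yield its own UP, DN and AR gene sets.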
Galaxy integration and workflows
The Confero platform functions as a standalone system and, as shown in Figure 3, can also be fully integrated into the Galaxy reproducible research framework which allows sites to easily incorporate Confero into their existing analysis and knowledge acquisition workflows. Galaxy offers the possibility to chain as well as parallelize, in a customized and flexible way, Confero tools via Galaxy’s workflow framework. For example, the Confero Functional Enrichment Analysis module tools can be connected in a workflow to efficiently analyze data and extract results in parallel.
Public data update and Confero database reprocessing
Vendor technology platform annotations and public Entrez Gene information are updated frequently. Because all Confero gene sets are computed from this information, the Confero gene set database would become stale and out-of-date over time. In addition, if a site chose to change Confero platform configuration parameters for data processing and/or gene set extraction, it would be important for this change to propagate not only to new data but also to all existing data stored in the Confero database. An important and powerful management feature of the Confero software platform is therefore the ability to automatically download and update all relevant and supported vendor technology platform and public gene information and then, using this information, to fully reprocess all data contained in the Confero database with the current configuration parameters. Processed datasets are tagged with the processing date and annotation build versions to ensure analysis reproducibility. This functionality is provided by a management program within the Confero distribution.
The Confero distribution currently supports Affymetrix, Illumina, and any NCBI GEO-derived microarray platform, as well as all HUGO gene symbols. The management program automatically downloads source annotation files and Entrez Gene information to generate new mapping files. Adding support for other technology platforms (e.g. Exiqon, Nimblegen, etc.) is also possible. For NGS platforms, only gene-level data and analysis are currently supported.
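Because processed datasets are tagged with their processing date and annotation build versions, deciding which datasets need reprocessing after an annotation update or configuration change reduces to comparing those tags with the current state. The sketch below illustrates this with a hypothetical `ProcessingTag` record and a hypothetical configuration version string; it does not describe how the Confero management program actually stores or compares these tags.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ProcessingTag:
    """Hypothetical provenance record stored with each processed dataset."""
    dataset: str
    processed_on: date
    annotation_build: str  # identifier of the annotation/Entrez Gene build used
    config_version: str    # identifier of the processing/extraction settings used


def datasets_to_reprocess(tags, current_build, current_config):
    """Names of datasets processed with an older annotation build or configuration."""
    return sorted(
        tag.dataset
        for tag in tags
        if tag.annotation_build != current_build
        or tag.config_version != current_config
    )
```

Confero's management program, as described above, simply reprocesses all data; a selective check like this one is shown only to make the role of the provenance tags explicit.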
Management features and application programming interface (API)
Confero provides a number of useful management functionalities neatly packaged as server-side programs for administrators and software platform managers. This includes programs to perform the following tasks:
Process and submit batches of contrast datasets or ID lists
Download, process, and load the latest vendor annotations and Entrez Gene information to generate new Confero source-to-Entrez Gene ID mapping files, and fully reprocess all data in the Confero database
Control the Confero embedded web server and application
Export the entire Confero gene set database with annotations
The API currently provides functions to export different data from the platform.
Comparison to existing software
The Confero platform provides a unique set of functionalities and differs from existing publicly available software in a number of key aspects. Other systems, such as Cistrome and the Genomic HyperBrowser, also offer Galaxy integration. In contrast to Confero, however, these systems focus on certain omics data types and are not provided as a software distribution that can be installed locally to work with private as well as public data. Other systems, such as TM4 [7, 36], provide some similar analysis functionality, such as GSEA. Yet these systems are all-in-one standalone solutions, are not integrated with popular workflow management systems like Galaxy, and do not process imported data to build a database of a priori biological knowledge that is then leveraged by system tools. Finally, ArrayExpress and GEO, the two major public repositories, provide tools [37–41] to mine data across a subset of curated and preprocessed studies in their databases. However, such tools are limited to the data available in their systems and cannot be installed locally to work with private data. In addition, many of these systems start from raw or preprocessed data and perform automated statistical analysis, whereas in Confero a deliberate design choice was made to leave statistical analysis to each site, allowing customized approaches appropriate for each experimental design and the relevant biological questions.
Storage and exploitation of analyzed omics data is a crucial component of a research site’s analysis workflow as it enables acquisition of new biological knowledge which facilitates interpretation of data. However, these needs are not met by the current open-source solutions freely available to researchers. The Confero platform has been built to provide an innovative, flexible and extensible solution to store and leverage analyzed data and build new a priori biological knowledge. In addition, Confero enables cross-platform comparison of omics data in a traceable and reproducible manner.
While GSEA or ORA provide a useful global overview of the perturbed biological processes occurring during an experiment, there are a number of potential methods that can further utilize Confero data to gain a more detailed understanding of underlying mechanisms. We are currently developing additional integrated tools in the following areas:
Clustering module to group Confero data (e.g. gene sets, genes, etc.) in order to highlight patterns of co-regulation in experimental data
Visualization module to provide integrated and easy-to-use plotting functions (e.g. volcano plots) on Confero results
HomoloGene infrastructure and integration to provide cleaner species-to-species translation during gene set enrichment analysis
Incorporation of additional analysis tools to provide complementary approaches to GSEA for biological interpretation
Availability and requirements
An overview of the Confero platform software architecture is shown in Additional file 9: Figure S3. Confero is written in the Perl programming language and requires Perl 5.12 or higher. To use the Convert LIMMA/SAMR Object tool, R and the Bioconductor limma and samr packages should be installed. The Confero platform requires a backend database; MySQL is currently supported. MySQL 5.0 or higher is supported, and versions 5.1 and 5.5 have been fully tested and are being used in production installations. Confero can be installed on any UNIX-like operating system (e.g. BSD, Linux, etc.) and has been fully tested and used in production on Linux. The documentation for the Confero command line interface (CLI) is provided as Additional file 10 and is also available on SourceForge (http://sourceforge.net/projects/confero/).
Abbreviations
API: Application programming interface
CPAN: Comprehensive Perl Archive Network
DBMS: Database management system
FDR: False discovery rate
GEO: Gene Expression Omnibus
GSEA: Gene set enrichment analysis
DEG: Differentially expressed genes
CLI: Command line interface
The authors would like to thank Dean Flanders for his assistance and support during project development at the Friedrich Miescher Institute.
- Workspace cSP: The Cancer Biomedical Informatics Grid (caBIG): infrastructure and applications for a worldwide research community. Stud Health Technol Inform. 2007, 129: 330-334.Google Scholar
- Maurer M, Molidor R, Sturn A, Hartler J, Hackl H, Stocker G, Prokesch A, Scheideler M, Trajanoski Z: MARS: microarray analysis, retrieval, and storage system. BMC Bioinforma. 2005, 6: 101-10.1186/1471-2105-6-101.View ArticleGoogle Scholar
- Vallon-Christersson J, Nordborg N, Svensson M, Hakkinen J: BASE–2nd generation software for microarray data management and analysis. BMC Bioinforma. 2009, 10: 330-10.1186/1471-2105-10-330.View ArticleGoogle Scholar
- Dondrup M, Albaum SP, Griebel T, Henckel K, Junemann S, Kahlke T, Kleindt CK, Kuster H, Linke B, Mertens D: EMMA 2–a MAGE-compliant system for the collaborative analysis and integration of microarray data. BMC Bioinforma. 2009, 10: 50-10.1186/1471-2105-10-50.View ArticleGoogle Scholar
- Gattiker A, Hermida L, Liechti R, Xenarios I, Collin O, Rougemont J, Primig M: MIMAS 3.0 is a Multiomics Information Management and Annotation System. BMC Bioinforma. 2009, 10: 151-10.1186/1471-2105-10-151.View ArticleGoogle Scholar
- Hermida L, Schaad O, Demougin P, Descombes P, Primig M: MIMAS: an innovative tool for network-based high density oligonucleotide microarray data management and annotation. BMC Bioinforma. 2006, 7: 190-10.1186/1471-2105-7-190.View ArticleGoogle Scholar
- Saeed AI, Bhagabati NK, Braisted JC, Liang W, Sharov V, Howe EA, Li J, Thiagarajan M, White JA, Quackenbush J: TM4 microarray software suite. Methods Enzymol. 2006, 411: 134-193.View ArticlePubMedGoogle Scholar
- Lin K, Kools H, de Groot PJ, Gavai AK, Basnet RK, Cheng F, Wu J, Wang X, Lommen A, Hooiveld GJ: MADMAX - Management and analysis database for multiple omics experiments. J Integr Bioinform. 2011, 8: 160-PubMedGoogle Scholar
- Tomlinson C, Thimma M, Alexandrakis S, Castillo T, Dennis JL, Brooks A, Bradley T, Turnbull C, Blaveri E, Barton G: MiMiR–an integrated platform for microarray data sharing, mining and analysis. BMC Bioinforma. 2008, 9: 379-10.1186/1471-2105-9-379.View ArticleGoogle Scholar
- Friedman BA, Maniatis T: ExpressionPlot: a web-based framework for analysis of RNA-Seq and microarray gene expression data. Genome Biol. 2011, 12: R69-10.1186/gb-2011-12-7-r69.PubMed CentralView ArticlePubMedGoogle Scholar
- Goecks J, Nekrutenko A, Taylor J, Galaxy T: Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol. 2010, 11: R86-10.1186/gb-2010-11-8-r86.PubMed CentralView ArticlePubMedGoogle Scholar
- Blankenberg D, Von Kuster G, Coraor N, Ananda G, Lazarus R, Mangan M, Nekrutenko A, Taylor J: Galaxy: a web-based genome analysis tool for experimentalists. Curr Protoc Mol Biol. 2010, Chapter 19: Unit 19 10 11-21.Google Scholar
- Giardine B, Riemer C, Hardison RC, Burhans R, Elnitski L, Shah P, Zhang Y, Blankenberg D, Albert I, Taylor J: Galaxy: a platform for interactive large-scale genome analysis. Genome Res. 2005, 15: 1451-1455. 10.1101/gr.4086505.PubMed CentralView ArticlePubMedGoogle Scholar
- Reich M, Liefeld T, Gould J, Lerner J, Tamayo P, Mesirov JP: GenePattern 2.0. Nat Genet. 2006, 38: 500-501. 10.1038/ng0506-500.View ArticlePubMedGoogle Scholar
- Hull D, Wolstencroft K, Stevens R, Goble C, Pocock MR, Li P, Oinn T: Taverna: a tool for building and running workflows of services. Nucleic Acids Res. 2006, 34: W729-W732. 10.1093/nar/gkl320.PubMed CentralView ArticlePubMedGoogle Scholar
- Ghosh S, Matsuoka Y, Asai Y, Hsin KY, Kitano H: Software for systems biology: from tools to integrated platforms. Nat Rev Genet. 2011, 12: 821-832.PubMedGoogle Scholar
- Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, Mesirov JP: Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA. 2005, 102: 15545-15550. 10.1073/pnas.0506580102.PubMed CentralView ArticlePubMedGoogle Scholar
- Smyth GK: Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol. 2004, 3: Article3.
- Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J: Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 2004, 5: R80. 10.1186/gb-2004-5-10-r80.
- Tusher VG, Tibshirani R, Chu G: Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci USA. 2001, 98: 5116-5121. 10.1073/pnas.091062498.
- Mansourian R, Mutch DM, Antille N, Aubert J, Fogel P, Le Goff JM, Moulin J, Petrov A, Rytz A, Voegel JJ, Roberts MA: The Global Error Assessment (GEA) model for the selection of differentially expressed genes in microarray data. Bioinformatics. 2004, 20: 2726-2737. 10.1093/bioinformatics/bth319.
- Kim SY, Lee JW, Sohn IS: Comparison of various statistical methods for identifying differential gene expression in replicated microarray data. Stat Methods Med Res. 2006, 15: 3-20. 10.1191/0962280206sm423oa.
- Culhane AC, Schroder MS, Sultana R, Picard SC, Martinelli EN, Kelly C, Haibe-Kains B, Kapushesky M, St Pierre AA, Flahive W: GeneSigDB: a manually curated database and resource for analysis of gene expression signatures. Nucleic Acids Res. 2012, 40: D1060-D1066. 10.1093/nar/gkr901.
- Culhane AC, Schwarzl T, Sultana R, Picard KC, Picard SC, Lu TH, Franklin KR, French SJ, Papenhausen G, Correll M, Quackenbush J: GeneSigDB–a curated database of gene expression signatures. Nucleic Acids Res. 2010, 38: D716-D725. 10.1093/nar/gkp1015.
- Dinu I, Potter JD, Mueller T, Liu Q, Adewale AJ, Jhangri GS, Einecke G, Famulski KS, Halloran P, Yasui Y: Improving gene set analysis of microarray data by SAM-GS. BMC Bioinforma. 2007, 8: 242. 10.1186/1471-2105-8-242.
- Kupershmidt I, Su QJ, Grewal A, Sundaresh S, Halperin I, Flynn J, Shekar M, Wang H, Park J, Cui W: Ontology-based meta-analysis of global collections of high-throughput public data. PLoS One. 2010, 5 (9): e13066. 10.1371/journal.pone.0013066.
- Maglott D, Ostell J, Pruitt KD, Tatusova T: Entrez Gene: gene-centered information at NCBI. Nucleic Acids Res. 2011, 39: D52-D57. 10.1093/nar/gkq1237.
- Bioconductor estrogen dataset. http://www.bioconductor.org/packages/release/data/experiment/html/estrogen.html
- Gadal F, Starzec A, Bozic C, Pillot-Brochet C, Malinge S, Ozanne V, Vicenzi J, Buffat L, Perret G, Iris F, Crepin M: Integrative analysis of gene expression patterns predicts specific modulations of defined cell functions by estrogen and tamoxifen in MCF7 breast cancer cells. J Mol Endocrinol. 2005, 34: 61-75. 10.1677/jme.1.01631.
- Frasor J, Danes JM, Komm B, Chang KC, Lyttle CR, Katzenellenbogen BS: Profiling of estrogen up- and down-regulated gene expression in human breast cancer cells: insights into gene networks and pathways underlying estrogenic control of proliferation and cell phenotype. Endocrinology. 2003, 144: 4562-4574. 10.1210/en.2003-0567.
- Martinez-Diez M, Santamaria G, Ortega AD, Cuezva JM: Biogenesis and dynamics of mitochondria during the cell cycle: significance of 3'UTRs. PLoS One. 2006, 1: e107. 10.1371/journal.pone.0000107.
- Chen JQ, Delannoy M, Cooke C, Yager JD: Mitochondrial localization of ERalpha and ERbeta in human MCF7 cells. Am J Physiol Endocrinol Metab. 2004, 286: E1011-E1022. 10.1152/ajpendo.00508.2003.
- Chen J, Delannoy M, Odwin S, He P, Trush MA, Yager JD: Enhanced mitochondrial gene transcript, ATP, bcl-2 protein levels, and altered glutathione distribution in ethinyl estradiol-treated cultured female rat hepatocytes. Toxicol Sci. 2003, 75: 271-278. 10.1093/toxsci/kfg183.
- Liu T, Ortiz JA, Taing L, Meyer CA, Lee B, Zhang Y, Shin H, Wong SS, Ma J, Lei Y: Cistrome: an integrative platform for transcriptional regulation studies. Genome Biol. 2011, 12: R83. 10.1186/gb-2011-12-8-r83.
- Sandve GK, Gundersen S, Rydbeck H, Glad IK, Holden L, Holden M, Liestol K, Clancy T, Ferkingstad E, Johansen M: The Genomic HyperBrowser: inferential genomics at the sequence level. Genome Biol. 2010, 11: R121. 10.1186/gb-2010-11-12-r121.
- Saeed AI, Sharov V, White J, Li J, Liang W, Bhagabati N, Braisted J, Klapa M, Currier T, Thiagarajan M: TM4: a free, open-source system for microarray data management and analysis. Biotechniques. 2003, 34: 374-378.
- Kapushesky M, Adamusiak T, Burdett T, Culhane A, Farne A, Filippov A, Holloway E, Klebanov A, Kryvych N, Kurbatova N: Gene Expression Atlas update–a value-added database of microarray and sequencing-based functional genomics experiments. Nucleic Acids Res. 2012, 40: D1077-D1081. 10.1093/nar/gkr913.
- Kapushesky M, Emam I, Holloway E, Kurnosov P, Zorin A, Malone J, Rustici G, Williams E, Parkinson H, Brazma A: Gene expression atlas at the European bioinformatics institute. Nucleic Acids Res. 2010, 38: D690-D698. 10.1093/nar/gkp936.
- Barrett T, Troup DB, Wilhite SE, Ledoux P, Rudnev D, Evangelista C, Kim IF, Soboleva A, Tomashevsky M, Edgar R: NCBI GEO: mining tens of millions of expression profiles–database and tools update. Nucleic Acids Res. 2007, 35: D760-D765. 10.1093/nar/gkl887.
- Barrett T, Troup DB, Wilhite SE, Ledoux P, Rudnev D, Evangelista C, Kim IF, Soboleva A, Tomashevsky M, Marshall KA: NCBI GEO: archive for high-throughput functional genomic data. Nucleic Acids Res. 2009, 37: D885-D890. 10.1093/nar/gkn764.
- Barrett T, Troup DB, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M, Marshall KA, Phillippy KH, Sherman PM: NCBI GEO: archive for functional genomics data sets–10 years on. Nucleic Acids Res. 2011, 39: D1005-D1010. 10.1093/nar/gkq1184.
- Perl Programming Language. http://www.perl.org/
- MySQL Database Management System. http://www.mysql.org/
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.