BMDExpress: a software tool for the benchmark dose analyses of genomic data
© Yang et al. 2007
Received: 28 June 2007
Accepted: 25 October 2007
Published: 25 October 2007
Dose-dependent processes are common within biological systems and include phenotypic changes following exposures to both endogenous and xenobiotic molecules. The use of microarray technology to explore the molecular signals that underlie these dose-dependent processes has become increasingly common; however, the number of software tools for quantitatively analyzing and interpreting dose-response microarray data has been limited.
We have developed BMDExpress, a Java application that combines traditional benchmark dose methods with gene ontology classification in the analysis of dose-response data from microarray experiments. The software application is designed to perform a stepwise analysis beginning with a one-way analysis of variance to identify the subset of genes that demonstrate significant dose-response behavior. The second step of the analysis involves fitting the gene expression data to a selection of standard statistical models (linear, 2° polynomial, 3° polynomial, and power models) and selecting the model that best describes the data with the least amount of complexity. The model is then used to estimate the benchmark dose at which the expression of the gene significantly deviates from that observed in control animals. Finally, the software application summarizes the statistical modeling results by matching each gene to its corresponding gene ontology categories and calculating summary values that characterize the dose-dependent behavior for each biological process and molecular function. As a result, the summary values represent the dose levels at which genes in the corresponding cellular process show transcriptional changes.
The application of microarray technology together with the BMDExpress software tool represents a useful combination in characterizing dose-dependent transcriptional changes in biological systems. The software allows users to efficiently analyze large dose-response microarray studies and identify reference doses at which particular cellular processes are altered. The software is freely available at http://sourceforge.net/projects/bmdexpress/ and is distributed under the MIT Public License.
The endogenous control and external perturbation of biological processes are inherently dose-dependent. Examples include developmental events that require gradients of growth factor concentrations, zonation in the liver due to differences in oxygen and nutrient concentration, the pharmacological inhibition of key proteins in disease, and the toxic effects of environmental chemicals. Without a proper understanding of the dose-response characteristics, the molecular mechanisms underlying the regulation or perturbation of these biological processes would remain unknown.
Microarray technology has been broadly accepted as an efficient and reproducible way to explore the gene expression changes involved in the regulation of biological processes. The ability to survey thousands of genes allows a comprehensive assessment of the transcriptional changes involved in specific cellular events. Bioinformatic methods have been developed to interpret these changes by applying standardized functional annotations to each gene and identifying whether certain biological processes or molecular functions are over- or under-represented [5–10]. This approach has been referred to as a gene ontology (GO) enrichment analysis and allows large lists of transcriptional alterations to be distilled down into changes in cellular processes such as the immune response, DNA repair, apoptosis, etc.
To quantitatively assess the dose-response behavior of endogenous molecules and environmental chemicals, benchmark dose (BMD) methods have been employed to estimate reference doses [11–13]. In the BMD method, dose-response data for the biological effect is fit with a statistical model and a BMD is identified that results in a defined level of response over that observed in control populations. The BMD method has been used extensively by regulatory agencies to set standards for human health effects [14, 15].
A method for integration of BMD calculations with GO classification analysis in the examination of microarray dose-response data has recently been developed. The combination of microarray technology with these analysis methods results in a unique bioinformatic tool that provides both a comprehensive survey of transcriptional changes together with dose estimates at which different cellular processes are altered based on a defined increase in response. In this application note, we describe the development and availability of a user-friendly software tool that integrates these standard methods in the analysis of microarray dose-response data.
BMDExpress was written in the Java programming language with a Swing graphical user interface. The application requires a Java Runtime Environment of 1.6.0 or newer. Model fitting to the dose-response data is performed by a dynamic link library (DLL) written in C and FORTRAN that is called through the Java Native Interface. The DLL was written using source code modified from the BMDS software application developed by the U.S. Environmental Protection Agency. To map the Affymetrix probe identifiers to the corresponding GO categories, the software application queries a client-accessible MySQL database that resides at The Hamner Institutes. The database is constructed using annotations provided by NetAffx and the Gene Ontology Consortium, and it is updated weekly to ensure the annotations are current. At present, BMDExpress supports only Affymetrix microarrays, specifically the following: Human (HG_Focus, HG_U133A, HG-U133A_2, and HG-U133_Plus_2); Mouse (MG_U74A, MG_U74Av2, MOE430A, MOE430B, Mouse430A_2, and Mouse430_2); Rat (RAE230A, RAE230B, Rat230_2, and RG_U34A); Drosophila (DrosGenome1 and Drosophila_2); and Zebrafish (Zebrafish Genome).
The BMDExpress software is designed to perform a stepwise analysis on dose-response microarray data that combines standard BMD methods with GO classification analysis. The program has a series of interfaces that guide the user through the analysis process.
The number of probe sets on a standard Affymetrix microarray is relatively large and in the typical experiment, most probe sets are not significantly altered by the experimental treatment. As a result, an initial probe set selection process is recommended to reduce both the computational requirements and the variability in the final analysis by selecting probe sets that show significant dose-response behavior. The probe set selection process consists of a one-way analysis of variance (ANOVA) together with a false discovery rate correction for multiple comparisons (Fig. 1B). The user is allowed to input a cutoff based on the adjusted p-value and may choose whether to filter out the various control genes (e.g., BioB, BioC, etc.) present on an Affymetrix array. The output of the one-way ANOVA lists the probe set identifier, the associated degrees of freedom, the F-value, the p-value, and the adjusted p-value. The user can export the list to a tab-delimited file by right-clicking on the corresponding node in the data tree within the left-hand window.
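The probe set selection step can be illustrated with a minimal Python sketch. It is not the BMDExpress implementation. It assumes the false discovery rate correction is the Benjamini-Hochberg procedure (the text does not name the method), and the function names and probe identifiers are hypothetical. Converting the F-value to a p-value requires the F distribution and is left out; the sketch takes p-values as given.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for one probe set.

    `groups` holds one list of log2 expression values per dose group.
    """
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Assumption: the source only says "false discovery rate correction".
    """
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def select_probe_sets(probe_ids, p_values, cutoff=0.05):
    """Keep probe sets whose adjusted p-value is at or below the cutoff."""
    adjusted = bh_adjust(p_values)
    return [pid for pid, q in zip(probe_ids, adjusted) if q <= cutoff]
```

For example, `select_probe_sets(["a", "b", "c", "d"], [0.01, 0.04, 0.03, 0.005], cutoff=0.03)` keeps the hypothetical probe sets `"a"` and `"d"`, whose adjusted p-values are both 0.02.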
The generic definition of a BMD is the dose or concentration of a substance that corresponds to a specified level of response above or below that observed in a control or background population. The specified level of response within this definition is referred to as the benchmark response (BMR) and a statistical lower confidence bound on the BMD (BMDL) has been typically used by regulatory agencies to set safe levels of exposure. In BMDExpress, the identification of the BMD involves fitting the gene expression data to a selection of statistical models (linear, 2° polynomial, 3° polynomial, and power models) and selecting the model that best describes the data with the least amount of complexity (Fig. 1C). The user is allowed to select which statistical models they would like to fit to the data. By default, the power model and the three polynomial models (linear, 2°, and 3°) are selected. However, the user can select a single model or any combination of two or more models. Depending on the number of doses in the study, the 3° polynomial may not be appropriate. In addition to model selection, the user is allowed to modify several critical parameters associated with the BMD analyses including the maximum number of iterations, the BMR, and the confidence level for calculation of the BMDL. The maximum number of iterations is a convergence criterion for the model fitting. The BMR is the number of standard deviations at which the BMD is defined. As a default, a BMR of 1.349 is provided. To derive this value, a normal distribution was assumed for control animals and it was assumed, a priori, that the changes in expression could occur in either tail, with a 1% chance of that occurring in the absence of exposure (0.5% in each tail). A BMR of 1.349 is the amount required to shift the mean response of the control distribution such that the treated distribution contains 11% in a single tail, i.e., a 10% increase over the assumed background rate of response.
The 10% value for the shift in the tail area of the distribution is standard for BMD analysis [16, 21]. The confidence level is the statistical lower confidence limit applied to the BMD estimated by the model. The resultant lower bound on the BMD is the BMDL and is a conservative estimate of the dose at which the particular gene is altered.
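The default BMR can be checked numerically. Under the stated assumptions, the control distribution is standard normal with a cutoff that leaves 0.5% in one tail. Shifting the mean by the BMR must leave 11% of the treated distribution beyond that same cutoff. The short Python sketch below reproduces the value. The function names are hypothetical, and the closed-form BMD for a linear model is an illustrative assumption based on the definition of the BMR as a number of standard deviations, not a description of the BMDExpress code.

```python
from statistics import NormalDist


def default_bmr(background_tail=0.005, treated_tail=0.11):
    """Mean shift, in control standard deviations, that raises the
    single-tail area beyond a fixed cutoff from background_tail to
    treated_tail."""
    std_normal = NormalDist()
    # Cutoff that leaves `background_tail` above it in the control group.
    cutoff = std_normal.inv_cdf(1.0 - background_tail)
    # Distance from the shifted mean to the same cutoff when
    # `treated_tail` of the treated distribution lies above it.
    treated_cutoff = std_normal.inv_cdf(1.0 - treated_tail)
    return cutoff - treated_cutoff


def bmd_linear(slope, control_sd, bmr=1.349):
    """BMD for a linear model: the dose at which the fitted mean has
    moved bmr * control_sd away from the control mean (assumed form)."""
    if slope == 0:
        raise ValueError("a flat dose-response has no finite BMD")
    return bmr * control_sd / abs(slope)
```

Here `round(default_bmr(), 3)` evaluates to 1.349, the difference between the 99.5th and 89th percentiles of the standard normal distribution.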
In processing the model output, the user is allowed to choose the method for selecting which model best describes the data with the least amount of complexity. In the first method, a nested likelihood ratio test can be used to select among the linear, 2° polynomial, and 3° polynomial models followed by an Akaike information criterion (AIC) comparison between the best nested model and the power model (i.e., the model with the lowest AIC is selected). In the second method, a completely AIC-based selection process is performed. Finally, the user is allowed to remove probe sets where the BMD is greater than the highest dose. This option was provided to avoid model extrapolation.
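The two selection methods can be sketched in Python as follows. This is a sketch under stated assumptions rather than the BMDExpress code. The significance level of 0.05, the stepwise order of the nested tests, the parameter counts, and the dictionary keys (`"linear"`, `"poly2"`, `"poly3"`, `"power"`) are all hypothetical. Each step up the polynomial family adds one parameter, so the likelihood ratio statistic is compared with a chi-square distribution with one degree of freedom.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion; lower is better."""
    return 2.0 * n_params - 2.0 * log_likelihood


def lrt_p_value(loglik_simple, loglik_complex):
    """Likelihood ratio test for one extra parameter (chi-square, 1 df)."""
    statistic = max(0.0, 2.0 * (loglik_complex - loglik_simple))
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(statistic / 2.0))


def select_nested_then_aic(fits, alpha=0.05):
    """First method: nested tests among polynomials, then AIC vs. power.

    `fits` maps a model name to (log_likelihood, n_params).
    """
    best = "linear"
    for candidate in ("poly2", "poly3"):
        if candidate not in fits:
            break
        if lrt_p_value(fits[best][0], fits[candidate][0]) < alpha:
            best = candidate  # the extra term significantly improves the fit
        else:
            break  # assumption: stop at the first non-significant step
    if "power" in fits and aic(*fits["power"]) < aic(*fits[best]):
        return "power"
    return best


def select_by_aic(fits):
    """Second method: pick the model with the lowest AIC overall."""
    return min(fits, key=lambda name: aic(*fits[name]))
```

With the illustrative fits `{"linear": (-10.0, 3), "poly2": (-7.0, 4), "poly3": (-6.9, 5), "power": (-6.0, 4)}`, the nested tests settle on the 2° polynomial, and the power model is then selected because its AIC (20) is lower than that of the 2° polynomial (22).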
In the output from the BMD analyses, the probe set identifier is provided with the best overall model together with selected values for each of the statistical models evaluated. These include the BMD, the BMDL, the fit p-value from the likelihood ratio test, the log-likelihood value for the fit of the model, the AIC, and the direction of the response (i.e., increased expression or decreased expression). On an average computer, 10 probe sets can be processed per minute.
Output from BMDExpress Characterizing the Dose-dependent Behavior of Each GO Category
Output from GO Analysis: Description of Output
Level in the hierarchy of the GO category
GO Term Name: Official name of the GO category
Total number of genes on the array assigned to the GO category
Genes from BMD Analysis: Number of genes from BMD analyses in the GO category
Percentage of the total number of genes in GO category used in BMD analysis
Entrez Gene identifiers in GO category based on Affymetrix probe set identifiers from BMD analysis
Affymetrix probe set identifiers from BMD analysis
Mean BMD for the genes in GO category
Median BMD for the genes in GO category
Minimum BMD for the genes in GO category
Standard deviation of BMD for the genes in GO category
Weighted mean BMD for the genes in GO category (weighted by fit p-value)
Standard deviation of the weighted mean BMD for the genes in GO category (weighted by fit p-value)
Mean BMDL for the genes in GO category
Median BMDL for the genes in GO category
Minimum BMDL for the genes in GO category
Standard deviation of the BMDL for the genes in GO category
Weighted mean BMDL for the genes in GO category (weighted by fit p-value)
Standard deviation of the weighted mean BMDL for the genes in GO category (weighted by fit p-value)
5th Percentile Index: Nth gene number representing the 5th percentile for all the genes in the category. The value is zero-based and a 0.5 value means that it falls between two values
BMD at 5th Percentile: BMD at the 5th percentile for all genes in the GO category (including genes with no significant dose response)
10th Percentile Index: Nth gene number representing the 10th percentile for all the genes in the category. The value is zero-based and a 0.5 value means that it falls between two values
BMD at 10th Percentile: BMD at the 10th percentile for all genes in the GO category (including genes with no significant dose response)
Probes with Adverse Direction Up: Number of probe sets in the GO category for which the final change in expression was in the up (i.e., increased) direction
Probes with Adverse Direction Down: Number of probe sets in the GO category for which the final change in expression was in the down (i.e., decreased) direction
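Several of the per-category summary values listed above can be computed as in the following Python sketch. It is illustrative only. The function name is hypothetical, the sample (n-1) standard deviation is an assumption, and the weighted mean is assumed to be the sum of each BMD multiplied by its fit p-value, divided by the sum of the fit p-values. The percentile index is omitted because the text does not give its formula.

```python
from statistics import mean, median, stdev


def summarize_category(bmds, fit_p_values):
    """Summary values for the BMDs of the genes in one GO category.

    `bmds` and `fit_p_values` are parallel lists, one entry per gene.
    """
    total_weight = sum(fit_p_values)
    # Assumed weighting: better-fitting models (higher fit p-value)
    # contribute more to the category-level estimate.
    weighted_mean = sum(w * b for w, b in zip(fit_p_values, bmds)) / total_weight
    return {
        "mean": mean(bmds),
        "median": median(bmds),
        "minimum": min(bmds),
        # A single gene has no sample standard deviation; report 0.0.
        "sd": stdev(bmds) if len(bmds) > 1 else 0.0,
        "weighted_mean": weighted_mean,
    }
```

The same function applies unchanged to BMDL values, since the BMDL summaries in the table mirror those for the BMD.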
To illustrate the features and functionality of BMDExpress, microarray data were taken from a study of hepatic gene expression changes in zebrafish following exposure to 17α-ethynylestradiol (EE2) (ArrayExpress Accession No. E-TABM-105). In this study, female zebrafish were exposed to EE2 in the water at nominal concentrations of 0, 15, 40, and 100 ng/L for 24 and 168 h. Gene expression changes in the liver were then evaluated using the Affymetrix Zebrafish array to identify potential genomic biomarkers for endocrine disrupting compounds (natural and synthetic) that are released into the environment and to obtain a better understanding of the biological effects of these compounds in fish. The data from the study were downloaded and normalized using robust multi-array averaging (RMA) with a log2 transformation. The log2 transformed data for each time point were imported and analyzed separately using BMDExpress. The data were pre-filtered using a one-way ANOVA with a false discovery rate of 5%. A total of 1061 and 864 probe sets showed significant dose-response behavior at the 24 h and 168 h time points, respectively. The data were then fit to linear, 2° polynomial, and power models and the best model was selected using a likelihood ratio test for the linear and 2° polynomial followed by an AIC comparison with the power model. The BMD values from the best model were then used as input for the GO analysis.
Biological process GO categories with the lowest mean BMD in zebrafish exposed to EE2
Biological Process GO Category | Total Genes in Category | Genes with BMD | Mean BMD (ng/L) | Std Dev BMD | Mean BMDL (ng/L) | Minimum BMD (ng/L)
Protein amino acid glycosylation
Cell migration involved in gastrulation
Monocarboxylic acid metabolic process
Nitrogen compound catabolic process
Regulation of apoptosis
Amino acid and derivative metabolic process
Amino acid metabolic process
Amino acid biosynthetic process
Amine metabolic process
Response to external stimulus
Carbohydrate catabolic process
Alcohol metabolic process
Cellular macromolecule catabolic process
Protein-DNA complex assembly
Monosaccharide metabolic process
Cellular protein complex assembly
In the original analysis of the data, the investigators used a linear model to identify genes that were significantly affected by dose together with a nonparametric stratified test for trend. Pairwise comparisons between dose levels and the control group were also performed. The investigators summarized these results using Venn diagrams showing the overlap of genes between doses and used a GO enrichment analysis at each individual dose to show which processes were affected. The reanalysis of this dataset using BMDExpress provides several improvements over the standard analysis performed by the investigators in the original article. First, the statistical modeling capabilities of BMDExpress utilized the dose-response information inherent in the data and identified concentrations at which the response to the treatment was significantly changed based on the variability observed in the control animals. Unlike the analysis using the linear model and pairwise comparisons, these reference concentrations were not limited to the specific doses used in the study. Second, the ability to model the dose-response curves and calculate associated confidence limits identified a set of potential biomarkers that were sensitive and robust. Third, instead of performing a GO enrichment analysis at each dose level, the analysis by BMDExpress summarized the data by showing which biological processes were the most sensitive to environmental estrogens and provided reference concentrations at which they were affected. In this manner, the analysis was able to highlight the changes in protein amino acid glycosylation that occur at lower concentrations than the effects on tRNA aminoacylation and apoptosis.
Despite the numerous software programs for microarray data analysis, the majority do not provide any statistical modeling tools for analyzing dose-response studies. In the past, microarray dose-response studies have typically been analyzed using ANOVA with pairwise comparisons between dose groups and the associated control. This type of analysis only identifies which genes are significantly altered at the specific doses used in the study. In BMDExpress, a statistical model is fit to the data and a dose level is identified at which the response to the treatment is significantly different from that observed in the control animals. Using this method, the analysis is not constrained to the experimental doses and makes better use of the dose-response information. The software tool then allows users to summarize the statistical modeling of the individual genes based on GO categories and characterize the dose-dependent behavior of specific cellular processes. These integrated capabilities make BMDExpress a useful tool for the dose-response analysis of microarray data.
Project Name: BMDExpress
Project Home Page: http://sourceforge.net/projects/bmdexpress/
Operating Systems: Windows 2000 and XP
Programming Languages: Java, C, FORTRAN
Other Requirements: Java 1.6.0 or higher
License: MIT Public License
Any Restrictions to Use By Non-Academics: None
DLL: dynamic link library
ANOVA: one-way analysis of variance
BMDL: BMD lower confidence limit
AIC: Akaike information criterion
RMA: robust multi-array averaging
The authors would like to thank Jeffrey S. Gift and R. Woodrow Setzer for providing helpful suggestions and the source code used in the original BMDS software. This work was supported by a grant from the American Chemistry Council's Long Range Initiative and a Superfund Program Project Grant (2 P42 ES004911-17).
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.