Cell culture and sample preparation
Murine hepatoma cells, Hepa1c1c7 as well as the mutant tao BpRc1 cells (both LG Standards GmbH, Wesel, Germany), deficient in endogenous Ahr, were used for all experiments. Cells were cultured in phenol red-free DMEM supplemented with 7% FCS, 1% glutamine and 1% penicillin/streptomycin. Ahr translocation was investigated in a stable cell line based on tao BpRc1 cells expressing GFP-Ahr under tetracycline control. Cells were stimulated with different concentrations of benzo-[a]-pyrene (B[a]P; Sigma Aldrich, Steinheim, Germany), BPDE (Midwest Research Institute, NCI Chemical Repository, Kansas City, MO, USA) and TCDD (Sigma-Aldrich, Steinheim, Germany) dissolved in DMSO respectively.
Microarrays
To investigate the differential kinetic behavior of the transcriptome after B[a]P exposure, and to identify the primary Ahr response, we used two different setups: (1) short term exposure, Hepa1c1c7 cells were treated with 50 nM B[a]P for 0, 1, 2, and 4 hours and (2) long term exposure, Hepa1c1c7 cells were treated with 50 nM or 5 μ M B[a]P for 2, 4, 12 and 24h. Corresponding time-matched vehicle controls were generated. All experiments were performed in triplicate. Cells were lysed in Trizol reagent (Invitrogen, Darmstadt, Germany) and RNA extracted using RNAeasy kits (Qiagen, Valencia, CA, USA). RNA was quantified and integrity verified on a Bioanalyzer (Agilent Technologies, Palo Alto, CA). Sample preparation for Affymetrix GeneChip Mouse Exon 1.0 ST arrays (Affymetrix, Santa Clara, CA, USA) was performed following the manufacturer's recommendations. Microarray data was deposited in the Gene Expression Omnibus (GEO) under the identifier GSE29188.
Detection of differential expression
Microarrays were normalized using RMA and the University of Michigan custom CDF file (version 12.1.0) with mappings to Ensembl exon IDs. After normalization, but before proceeding with the analysis, we subtracted the (log2) DMSO expression values from the corresponding time point and batch of each of the B[a]P treatments. Exon expression values were then summarized to their corresponding Ensembl gene IDs, with the summarized gene expression value being the mean of its constituent exons. A 2-way ANOVA analysis was performed on each gene, with time and concentration as the factors. We then corrected for multiple testing by using the FDR. We considered only genes with an FDR < 0.05 for any of the main effects or time*concentration interaction. In addition, we admitted genes with an FDR < 0.05 from a simple t-test each B[a]P concentration (all time points pooled) vs. DMSO. Of these, we only considered genes that achieved 2-fold (or greater) differential expression at at least one time point. This left us with a total of 2,338 genes regulated in the long-term exposure (24 hour) data set. We interpolated the expression between the measured time points by averaging the simple linear interpolation with the spline interpolation. Since we have no measurement at time 0 hours, we assume equivalent expression with the DMSO samples, i.e. the expression ratio at time 0 hours is 0 on the log2 scale. The interpolation gave us a total of 25 values per gene, 1 value every hour from 0 to 24 hours. Whether or not the expression values were interpolated did not significantly affect the results of the classification and clustering, but we opted to use interpolated values to aid in visualization and interpretation.
Classification with Random Forests
We used the R implementation of Random Forests [46] to perform the two-class classification (Ahr primary response vs. side effects), using the time course expression measurements of significantly regulated genes as predictors. To derive training labels (Additional File 1, Figure S2), we used data available from two BPDE studies in human cell lines [47, 48], combining the P values from the studies using Fisher's method. We labeled mouse orthologs of genes with BPDE-perturbed expression (FDR < 0.05) as "secondary" since BPDE does not bind Ahr, but indicates affected genes further downstream of Ahr. We labeled genes as "primary Ahr" that showed differential expression (FDR < 0.05) in an independent gene expression time course of cells exposed to 50 nM B[a]P from 0 to 4 hours, with the additional condition that they were not significantly regulated in the BPDE data (i.e. orthologs had FDR > 0.05). These criteria led to 28 "Ahr-primary" labeled genes and 559 "side effect" labeled genes.
With this training set we ran RF with mtry set to 5, and ntree set to 5,000. We used the built-in outlier measure and removed genes in the 95 th percentile of outlier scores (resulting in 27 primary response and 530 side effect training cases), then re-ran RF, this time with 1,000 trees. In both cases, to avoid biased predictions (since there are far more "secondary" samples) we randomly sampled 20 genes from each class for the construction of each tree in the forest. The overall misclassification rate for the final forest was 7% (out of bag error estimate).
Predictions were made for all 2,338 differentially expressed genes, and genes with a proportion of class votes greater than 80% were retained for further analysis. This cutoff was chosen because when the training labels were permuted randomly and a RF trained, no prediction had a proportion of votes greater than 80%. Using these criteria, a total of 82 genes were predicted to be responding to Ahr directly, and 1,365 genes were predicted to be side effects (e.g. regulated through the presence of B[a]P metabolites). In addition to predictions, the RF proximity measure was calculated for all significant and confidently classified genes, yielding a 1,447 by 1,447 matrix.
Clustering
The RF proximity matrix was used as a distance measure by the transformation , where P is the original proximity matrix and D is the distance matrix. This distance matrix was then used as the input for PAM clustering, available in the R cluster package. We tested a range of k values and found that specifying 3 clusters gave the best average silhouette.
To assess the degree of confidence in cluster assignment for each gene, an RF was fit to predict cluster label using the gene expression measurements. The proportion of votes for the correct cluster is an indication of how well a gene fits in the cluster. Genes that were given a lower proportion of votes for the correct class than expected under the null hypothesis (labels permuted randomly) were excluded. When including this additional filtering criterion, the final number of genes classified as primary responders was 81, with 1,308 genes as side effects. In addition, the importance measurements obtained in the construction of this RF give an indication of which time points and which concentrations are important parts of the cluster's identity.
GO enrichment was performed for each cluster (Additional File 1, Table S3) using the topGO package [49]. Enrichment of the clusters for genes perturbed by an Ahr mutation was performed using the Kolmogorov-Smirnov test, using P values derived from differential expression of genes from [7, 19]. P values were calculated for each study separately, then combined using Fisher's method. Genes used to train the RF classifier were removed prior to calculation of enrichment, to ensure that the results reflected the actual predictive ability of the classifier.
Cell proliferation
Long-term exposure studies in Hepa1c1c7 cells treated with B[a]P versus BPDE were performed using the xCELLigence System (Roche Diagnostics, Mannheim, Germany). This system measures electrical impedance across micro-electrodes integrated on the bottom of 96-well tissue culture E-plates (Roche Applied Science, Germany). Shifts in impedance are measured in real time, indicating changes in cell proliferation. Cells were monitored every 15 min for up to 24 h after treatment with 50 nM, 500 nM, 1 μ M, 2.5 μ M or 5 μ M of B[a]P or BPDE respectively. Each experiment was performed in triplicate.
qPCR
In a separate experiment Hepa1c1c7 and tao BpRc1 cells were exposed to B[a]P (50 nM, 5 μ M), BPDE (50 nM, 5 μ M) and 1 nM TCDD for 0.5, 1, 2, and 4 h. mRNA was extracted and isolated using the MagNA Pure LC System (Roche Diagnostics GmbH, Mannheim, Germany). 50 ng of mRNA was reverse transcribed according to the protocol provided with the AMV reverse transcriptase (Promega, Madison, WI, USA). Resulting cDNA was diluted 1:5 and 4 μ l of template used in a 12 μ l PCR reaction. qPCRs were performed for the following example genes: Tnfaip2, Tiparp, Cdkn1a, Cdkn1b Mpp2, Cyp2s1, Nfe2l2, Klf9, Lig3, Myst2, Axin2, Agfg1, Anapc1, Nfkb1, Parp1, and the housekeeping genes 18S rRNA and Gapdh (primer sequences, Additional File 1, Table S1). All qPCR experiments were carried out on a LightCycler®480 system (Roche Diagnostics GmbH, Mannheim, Germany) with the following settings: touchdown amplification with an initial step of 96°C for 10 min; followed by the first cycle at 95°C for 10 sec. The annealing step started at 68°C for 20 sec (decrease of 0.5°C/cycle with a step delay of 1 cycle) and reaching the annealing temperature of 58°C for the last 25 cycles, followed by 72°C for 20 sec for extension. A total of 45 cycles were performed in each experiment.
ChIP
Hepa1c1c7 cells were exposed to 50 nM B[a]P or DMSO as the vehicle control for 1 h. Subsequently, cells were exposed to 50 nM B[a]P or DMSO as the vehicle control for 1 h respectively. Subsequently cells were cross-linked for 10 min at 37°C in 1% formaldehyde followed by a quenching step for 10 min with 150 mM glycine. After cross-linking, chromatin DNA was sheared into 200-500 bp fragments by sonication using a Bioruptor®Next Gen (UCD-300, Diagenode SA, Liege, Belgium). Sonicated, soluble chromatin was immune-precipitated with 2.5 μ g of an anti-Ahr antibody (Enzolifesciences/Biomol, Lörrach, Germany) or anti-Pol II (Millipore, Billerica, MA, USA). Control IPs were performed using rabbit IgG (Millipore, Billerica, MA, USA) corresponding to our specific antibodies. DNA isolates from immunoprecipitates were used as templates for real-time quantitative PCR amplification using the primer pairs listed in Additional File 1, Table S2. All ChIP experiments were performed at least two times.
Ahr translocation
Stably transfected tao BpRc1c cells, expressing a GFP-tagged Ahr under tetracycline control, were used to investigate the differences in translocation behavior for different concentrations of B[a]P. Cells were seeded in 96-well imaging plates (BD, Franklin Lakes, NJ, USA) and taken off tetracycline 24 h before exposure to allow for sufficient GFP-Ahr expression. Final B[a]P concentrations were 50 nM and 5 μ M respectively, including a corresponding DMSO control (0.05%). After treatment, cells were fixed using 3.7% formaldehyde, and the nuclei stained with Hoechst 33342 (Invitrogen, Darmstadt, Germany). Imaging was performed on a BD Pathway™Imager 855 in a non-confocal mode using a 20X U-Apo 340 objective (Olympus, NA 0.75). Images were binned 2 × 2 and montaged 2 × 2. Further analysis of fluorescence intensity was performed using the Attovision software (BD, Franklin Lakes, NJ, USA). After segmentation of the nucleus and the cytoplasm, the ratio of the nuclear and cytoplasmic fluorescence was calculated for each cell. Ratios were conflated in 0.01 intervals and relative frequencies determined. To allow for comparability the measurements were standardized so that the mean of the negative control equals 1. For the statistical analysis, more than 250 cells/treatment were considered.