Animal handling and sample collection
All animal procedures conformed to the Home Office (UK) guidelines for experimentation with animals and were approved by local ethics committee at Syngenta. Eighty nine week old male Fischer (F344) rats were randomly divided into four groups of twenty animals (Additional file 4). Each group was exposed to a different concentration (0, 50, 500 or 1000 ppm) of the sodium salt of phenobarbital which was added to the standard laboratory diet and milled to homogeneity. Animals were fed ad lib throughout the study. Five animals from each group were killed by exsanguination under halothane anesthesia after 1, 3, 7 and 14 days of PB exposure. Blood was collected into lithium heparin tubes, centrifuged at 2000 rpm for 10 minutes and the plasma separated for clinical biochemistry. Plasma samples for metabolomic analysis were stored at -80°C until analysis. Livers were removed immediately and weighed. Samples of liver (~150 mg) were taken from the left lateral lobe and snap frozen in liquid nitrogen for genomic and metabolomic analysis. Representative samples of liver for histopathology were taken from the three main lobes, fixed for 48 hours in 10% neutral buffered formol saline and processed to paraffin wax blocks. Sections of liver (4 μm) were stained for ki67 by immunohistochemistry and the labeled nuclei counted to produce a labeling index. A sample of liver was taken from the right median lobe, fixed in 3% glutaraldehyde in cacodylate buffer, processed, embedded and examined by electron microscopy.
All chemicals were purchased from Sigma-Aldrich (Poole, UK) with the exception of D2O, sodium-3-(trimethylsilyl)-2,2,3,3-tetradeuteriopropionate (TSP) and N-Methyl-N-(trimethylsilyl)trifluoroacetamide (MSTFA) which were obtained from Goss Scientific (Great Baddow, UK), Cambridge Isotope Laboratories (Andover, MA) and Macherey-Nagel (Düren, Germany), respectively.
Plasma was analysed for the following parameters: total protein, total bilirubin, creatine kinase activity, aspartate aminotransferase activity, cholesterol, creatinine, albumin, alkaline phosphatase activity, gamma-glutamyl transferase activity, triglycerides and glutamate dehydrogenase using a Konelab clinical chemistry analyzer (Thermo Scientific, Waltham, MA).
Tissue samples were extracted using methanol-chloroform as described by Le Belle et al . Approximately 75 mg of frozen tissue was ground up with dry ice. Methanol-chloroform solution (2:1, 600 μl) was added and the samples were sonicated for 15 min. Chloroform-water (1:1, 400 μl) was added and the samples centrifuged (20 min, 13 500 rpm). Aliquots of the resulting aqueous and organic layers were taken for GC-MS, LC-MS and 1H-NMR analysis. The aqueous fractions were dried in an evacuated centrifuge. The organic fractions and pellets were dried in a fume hood overnight.
High Resolution 1H-NMR spectroscopy
For analysis of liver tissue extracts, dried aqueous extracts from 500 μl of the aqueous layer were dissolved in phosphate buffered D2O (pH 7.0, 600 μl, 33 mM Na2HPO4, 6.7 mM NaH2PO4, 0.1% sodium azide, 0.17 mM TSP). 1H-NMR spectra were acquired on a Bruker DPX400 spectrometer operating at 400.13 Hz using a 5 mm probe. The standard "noesypr1d" pulse sequence (Bruker Analytik GmbH, Germany) was used for solvent suppression (relaxation delay = 2 s, t1 = 3 μs, mixing time = 150 ms, solvent presaturation applied during the relaxation time and the mixing time). 128 transients were collected into 16 K data points over a spectral width of 12 ppm at 37°C. NMR free induction decays (FIDs) were Fourier transformed with a line broadening of 0.3 Hz. The spectra were phased, base line corrected, referenced to TSP at 0.00 pm and integrated using ACD labs NMR Processor (version 8, ACD, Toronto, Canada). The spectra were integrated in 0.01 ppm buckets between 0.20 and 9.95 ppm excluding the water region (4.72-5.05 ppm). To account for concentration differences, the total integral of each spectrum was normalized to 10000 and each peak represented as a ratio to the normalized total integral. However, it became apparent that there were very large differences in the amount of glucose and glycogen in the liver extract samples and so the spectra were also renormalized with the glucose and glycogen resonances (3.35-4.05 ppm, 4.60-4.72 ppm, 5.20-5.55 ppm) excluded (Additional files 5).
For plasma analysis, plasma (250 μl) was added to 500 μl of deuterated acetonitrile/D2O (50:50). Samples were mixed and centrifuged (12 000 rpm, 5 minutes) and 600 μl of supernatant taken for analysis. 1H-NMR spectra were acquired on a Bruker DPX400 spectrometer operating at 400.13 Hz using a 5 mm probe. Samples were analysed using a Carr-Purcell-Meiboom-Gill (CPMG) pulse sequence to suppress the contribution from large molecules which also included a water pre-saturation regime. 128 transients were collected into 16 k data points across a spectral width of 20 ppm at 37°C. The relaxation delay was 2 s and the spin echo delay (d20) was 1 ms. 48 loops (spin echoes) were used to give a total spin echo time of 96 ms. NMR FIDs were Fourier transformed with a line broadening of 0.3 Hz. The spectra were phased, base line corrected, referenced to the glucose doublet at 5.23 ppm and integrated using ACD labs NMR Processor (version 8, ACD, Toronto, Canada). The spectra were integrated in 0.02 ppm buckets between 0.60 and 8.00 ppm excluding the water region (4.68-5.15 ppm). To account for concentration differences, the total integral of each spectrum was normalized to 10000 and each peak represented as a ratio to the normalized total integral.
For HRMAS 1H NMR experiments approximately 15 mg of tissue was soaked in D2O with 10 mM TSP then placed in a 4 mm diameter zirconium oxide rotor for analysis. The sample was packed into a homogeneous ball of 30 μl volume using a Teflon spacer. HRMAS experiments were conducted on a Bruker AVANCE II+ spectrometer operating at 500.13 MHz for the 1H frequency and using a high resolution 4 mm MAS probe. Samples were spun at a frequency of 4500 Hz at a temperature of 27°C. A "noesypr1d" pulse sequence was used (relaxation delay = 2.5 s, t1 = 4 μs, mixing time = 50 ms, solvent presaturation applied during the relaxation time and the mixing time). 64 transients were collected into 32 k data points across a spectral width of 16 ppm. FIDs were Fourier transformed with a line broadening of 1 Hz The spectra were phased, base line corrected, referenced to TSP at 0.00 pm and integrated using ACD labs NMR Processor (version 8, ACD, Toronto, Canada). The spectra were integrated in 0.04 ppm buckets between 0.50 and 5.60 ppm, excluding the water region (4.75-5.00 ppm).
Gas Chromatography Mass Spectrometry (GC-MS)
The dried aqueous extracts from 100 μl of the aqueous phase of the extraction mixture were derivatised with methoxyamine hydrochloride in pyridine (20 mg ml-1, 30 μl, 17 hr, 22°C) and silylated using MSTFA (30 μl, 1 hr, 22°C) . 50 μl of the derivatised sample was added to 200 μl hexane prior to analysis (Additional files 5).
The dried organic fraction was treated with a methylating reagent which formed the fatty acid methyl esters of carboxylic acids. Chloroform-methanol (1:1, 750 μl) and boron trifluoride in methanol (~10%, 125 μl) were added to the extract and the samples heated to 80°C for 90 minutes. Deionized water (300 μl) and hexane (600 μl) were added, the samples vortex mixed and the lower layer was removed and dried. The dried samples were reconstituted in 150 μl hexane prior to analysis  (Additional files 5).
GC-MS analysis was carried out using a Thermo Electron Trace GC Ultra coupled to a Thermo Electron DSQ single quadrupole mass spectrometer. For all analyses the injector temperature was 230°C. Helium was used as the carrier gas and maintained at a constant flow of 1.2 ml min-1. The MS was operated in full scan mode with 3 scans s-1 across a range of 50-650 m/z.
Derivatised aqueous extracts were analysed using a 30 m × 0.25 mm ID × 0.25 μm df 5%-phenyl-95%-dimethylpolysiloxane column. 1 μl derivatised sample was injected splitless. The initial column temperature of 70°C was held for 2 minutes then the temperature was ramped at 5°C min-1 to 310°C and this temperature was held for 20 minutes.
Derivatised organic extracts were analysed on a 30 m × 0.25 mm × 0.25 μm 100% polyethylene glycol column. 2 μl derivatised sample was injected with a split ratio of 20:1. The initial column temperature of 60°C was held for 2 min and then the temperature was ramped at 6°C min-1 to 230°C and this temperature was held for 7 min. GC-MS data was analyzed using Qual Browser version 1.4 (Thermo Electron Corporation). Peaks were integrated individually and the total integrated peak area of each spectrum normalized to 10000.
Univariate statistical analysis
Liver and body weight, clinical chemistry, food consumption and Ki67 staining were analysed by 1-way analysis of variance to identify differences between the dose groups. Dunnett's multiple comparison test was then used to identify which of the treatment group means were significantly different to the control group mean .
Multivariate statistical analysis
Normalized data was imported into SIMCA-P+ (v. 11, Umetrics, Umeå, Sweden) for multivariate statistical analysis. The data was mean centered and scaled. Scaling is necessary so that high concentration metabolites do not dominate the models. NMR data was Pareto scaled to suppress the contribution from regions of the spectra containing no peaks. Since in the GC-MS analysis only identified peaks were used, GC-MS data was scaled to unit variance. Initially, PCA models were built to look for clustering and identify outliers, and then PLS was applied to improve the separation and look for changes correlated with increasing dose of PB. These are essentially data reduction methods whereby the major variation in the dataset is summarized in new variables known as components. Thus, a dataset which has thousands of variable can be summarized in typically fewer than ten new components . PCA is an unsupervised technique which identifies the major sources of variation in the dataset. PLS is a supervised technique which identifies variation in the dataset (the X-block) which correlates with a specific variable (the Y-variable), in this study the dose of PB. R2 and Q2 values were used to assess the amount of variation represented by the principal components and robustness of the model, respectively. To prevent over fitting, the predictive capability of the models was checked by samples from the dataset, building a model and then predicting the dose for the omitted samples. The validate function in SIMCA was also used to check the models. This function randomly permutes the Y variables and calculates the R2 and Q2. As the similarity between the actual doses and the permuted doses falls, so should the R2 and Q2 values.
Gene expression analysis
Total RNA was isolated from 150 mg of tissue using a midi column according to the manufactures instructions (Qiagen; West Sussex, UK). RNA quality and concentration were determined using an Agilent Bioanalyzer and samples were repurified if the calculated RNA Integrity Number was below 7. Purified RNA was converted to complementary DNA, amplified and then converted to biotin-labeled complementary RNA using Affymetrix GeneChip 3' Amplification reagents (Affymetrix, High Wycombe). 15 ug was hybridized to Affymetrix Rat Expression RG-230 v2 GeneChips as described in the Affymetrix GeneChip Expression Analysis Technical Manual http://www.affymetrix.com/support/technical/manual/expression.manual.affx. The probe arrays were scanned and the probeset intensities were combined and averaged using Microarray Suite 5.0 (Affymetrix). The mean of each array was globally normalized to 500. Data were imported into GeneSpring GX version 7 and the following normalizations performed: data transformation to set expression values below 0.01 to 0.01; per gene normalization, where the signal strength of each gene is normalized to the median of its signal strength in all samples. The microarray data is available at Gene Expression Omnibus (GSE18753).
The probesets on the array were then analyzed by a number of filters and statistical tests. Firstly to remove probesets for genes without detectable expression: genes that did not have at least 3 present flags were excluded and genes that had an expression value of less than 90 in at least 3 samples were also excluded. To ensure robust detection of any differentially expressed genes any probeset that showed less than a 1.5-fold change in expression relative to its time-matched control were also excluded. To detect significant changes in expression levels a 2-way analysis of variance was performed on the filtered gene list (using a Benjamini and Hochberg false discovery rate of < 0.05) on time and treatment. Ratios of changes in gene expression were calculated by normalizing each treated sample to the median value of the time-matched control. These ratios were used for hierarchical clustering of the differentially expressed genes using the Pearson correlation coefficient. Differentially expressed genes were functionally categorized using their Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways annotations using GeneSpring GX. GO over-representation analysis was performed using the GeneSpring GX GO browser. The significance of a Gene Ontology Biological Process being over-represented in a gene list relative to the annotations on the chip was assessed using a hypergeometric p-value < 0.05 without multiple testing corrections. KEGG pathway over-representation was performed using the GeneSpring GX script "SG3b-1 Find genes in a GL present in PATHWAYS" contained within the Bioscript Library 2.2 (Agilent). A pathway needed to have at least 3 differentially expressed genes and a p < 0.05 to be included in the analysis .
Integration of metabolomic and genomic data
Two approaches were taken to integrate the data generated by metabolomic and genomic analyses. Firstly, significant changes in gene expression were mapped on to Kyoto Encyclopedia of Genes and Genomes (KEGG) metabolic pathways using GeneSpring (Agilent Technologies, Santa Clara, CA). This allowed the identification of pathways where both the concentrations of transcripts and metabolites were altered.
In the second approach, PLS was used to build regression models using the genomic data as the X-block and the normalized integrated areas of peaks corresponding to individual metabolites identified which were altered by PB exposure as the Y-variable. PLS is an extension of PCA where a projection model is developed predicting Y from X via scores of X, and is a generalized multiple regression method suitable for multiple collinear X and Y variables. This method identifies changes in the gene expression data which correlate with changes in the concentration of a particular metabolite and is particularly advantageous when investigating changes in metabolites which lie on many pathways. Correlations were identified by the examination of the X and Y components of the first component of the PLS model. In addition, the goodness of fit (Q2) of the model was evaluated by excluding every 7th sample as part of a leave one out cross validation process. Q2 is defined as the fraction of the total variation of the X matrix that can be predicted by a component as estimated by cross-validation and Q2 = (1.0-PRESS/SS) where PRESS is the predicted error sum of squares which is the squared difference between observed Y and predicted Y and SS is the residual sum of squares of the previous component.
The loading plots for the PLS model show the correlation between the X variables, or their residuals in subsequent dimensions, in the first dimension, and the Y variables (or Y residuals scores in subsequent dimensions). The weights are selected so as to maximize the covariance between the X matrix and Y block. The loadings were ranked according to magnitude and in this way, the transcripts which were most highly correlated with changes in the concentration of a particular metabolite could be identified. Correlates were considered as significant if Q2>0.40 and the procedure passed the cross-validation routine in Simca-P+ whereby the real Q2 was calculated to be greater than 30 models generated using a random permutation of the Y scores.