The NGS and microarray data have been deposited in the NCBI Sequence Read Archive (SRA) and Gene Expression Omnibus (GEO) [SRA: SRP008849, GEO: GSE32892].
VCaP human prostate cell lines were obtained from ATCC and grown in DMEM (GIBCO Cat# 11995) containing 10% FBS (SIGMA Cat# 12103 C-500 ml). Medium was supplemented with standard antibiotics (Penicillin-Streptomycin, GIBCO #15070-063).
AR trans-activation assay was performed as previously described. Briefly, LNCaP cells were engineered to over-express wild-type human AR and to express an ARE2-PB-Luc reporter (LNAR cells). Cells were starved for 3 days prior to the trans-activation assays in phenol red-free (PRF) RPMI medium supplemented with 5% charcoal-stripped FBS. On the day of the assay, cells were seeded at a density of 5,000 cells/well in 96-well plates in starvation medium and, 4 hr later, treated with the compounds in the absence (agonistic mode) or presence (antagonistic mode) of 100 pM R1881 for 24 hr. Luciferase readings were acquired with a Perkin Elmer EnVision Excite Multilabel Reader (Ultra Sensitive Luminescence method).
AR nuclear translocation
LNAR cells were starved for two days in phenol red-free RPMI medium containing 5% charcoal-dextran-stripped FBS (Omega Scientific) prior to the assay. For the nuclear translocation (NT) assay, 3,000 cells per well (in 384-well plates) were treated with compounds and 100 pM R1881. Following overnight incubation, the cells were fixed with 10% formalin, permeabilized with PBS containing 0.5% Triton X-100, and blocked with 1% BSA. Cells were then stained with anti-AR monoclonal antibody (Abcam) followed by Alexa 488-conjugated anti-mouse IgG secondary reagent (Invitrogen). Finally, the cells were counterstained with Hoechst 33342 (Invitrogen) and CellMask Deep Red (Invitrogen). Plates were sealed and imaged using an Evotec Opera high-content imager. Images were analyzed using an Acapella (Evotec) algorithm customized by Pfizer to quantify the fluorescence associated with anti-AR in the cytoplasmic and nuclear regions. The ratio of nuclear to cytoplasmic fluorescence was calculated and used to track inhibition of AR translocation.
Cells were seeded at a density of 15,000 cells/well (VCaP) or 1,000 cells/well (PC3 and DU145) in 96-well plates and, after attachment to the plate, treated with test compounds. Medium and compounds were refreshed every 2–3 days. The number of live cells was measured on day 7 using the Resazurin assay (SIGMA Cat# R7017).
Cells were starved for 3 days in phenol red-free DMEM containing 5% charcoal-stripped FBS and then seeded in 6-well plates at a density of 1 million cells per well in starvation medium. For compound treatment (ChIP-Seq study), cells were allowed to attach overnight and were then treated with various doses of the AR antagonists or vehicle alone (0.1% DMSO) in the absence or presence of 1 nM R1881. Cells were incubated for 30 min at 37°C, 5% CO2 and then processed.
For siRNA treatment, cells were seeded in DMEM containing 10% FBS and transfected the day after seeding with a 25 nM AR-siRNA pool (Thermo Scientific, L-003400-00-0020) using Lipofectamine 2000 reagent (Invitrogen, 11668–019) according to the manufacturer’s instructions. Cells were incubated at 37°C, 5% CO2 for 48 h and then processed.
In both procedures, after the indicated treatment time, cells were rinsed once with ice-cold PBS and then lysed with Qiagen RNeasy Plus Kit (Cat# 74134, Qiagen, Valencia, CA). RNA quality was assessed using the Bioanalyzer (Agilent, Sunnyvale, CA) and spectrophotometer.
Chromatin immunoprecipitation (ChIP)
ChIP was carried out by Active Motif (formerly Genpathway) as follows. Cells were fixed with 1% formaldehyde at room temperature for 15 minutes. Fixation was stopped by the addition of glycine to a final concentration of 0.125 M. Chromatin was isolated from the sample by adding 10 ml lysis buffer containing PIPES, Igepal, PMSF and Protease Inhibitor Cocktail, followed by disruption with a Dounce homogenizer. Samples were pelleted by centrifugation and resuspended in buffer containing sodium deoxycholate, SDS, and Triton X-100. Lysates were sonicated using a Misonix Sonicator 3000 equipped with a microtip in order to shear the DNA to an average length of 300–500 bp. Lysates were cleared by centrifugation and the chromatin suspensions were transferred to new tubes and stored at −80°C. To prepare Input DNA (genomic DNA), two aliquots of 10 μl each (approximately 1/50 of each chromatin preparation) were removed and treated with RNase for 1 hr at 37°C, proteinase K for 3 hr at 37°C, and 65°C heat for at least 6 hr to overnight for de-cross-linking. DNAs were purified by phenol-chloroform extraction and ethanol precipitation. Pellets were resuspended in 1/5 TE buffer. Resulting DNAs were quantified on a Nanodrop spectrophotometer. Extrapolation to the original chromatin volume allowed determination of the yield for each chromatin preparation (as measured by the DNA content).
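The extrapolation from an input aliquot to the whole chromatin preparation is simple arithmetic and can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name, the resuspension volume and the example numbers are hypothetical, and only the 10 μl aliquot (approximately 1/50 of the preparation) is taken from the protocol above.

```python
def chromatin_yield_ug(dna_conc_ng_per_ul, resuspension_vol_ul,
                       aliquot_vol_ul, total_chromatin_vol_ul):
    """Extrapolate DNA recovered from one input aliquot to the whole preparation.

    All argument names are illustrative; volumes are in microlitres.
    """
    if aliquot_vol_ul <= 0 or total_chromatin_vol_ul < aliquot_vol_ul:
        raise ValueError("aliquot must be positive and no larger than the preparation")
    # DNA mass recovered from the aliquot after purification (ng)
    dna_in_aliquot_ng = dna_conc_ng_per_ul * resuspension_vol_ul
    # Scale by the fraction of the preparation that the aliquot represents
    scale = total_chromatin_vol_ul / aliquot_vol_ul
    return dna_in_aliquot_ng * scale / 1000.0  # ng -> ug


# Hypothetical example: 60 ng/ul in 20 ul, from a 10 ul aliquot of a 500 ul preparation
example_yield = chromatin_yield_ug(60, 20, 10, 500)
```

With these hypothetical numbers the aliquot holds 1.2 μg of DNA and represents 1/50 of the preparation, so the extrapolated yield is about 60 μg.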
Prior to use in ChIP, protein A agarose beads (Invitrogen) were pre-blocked using blocking proteins and nucleic acids for 3 hr. For each ChIP reaction, an aliquot of chromatin (30 μg) was pre-cleared with 30 μl pre-blocked protein A agarose beads for 2 hr. ChIP reactions were set up using pre-cleared chromatin and an anti-AR antibody (Santa Cruz Biotechnology, Cat# sc-13062, Lot# D0610) in a buffer containing sodium deoxycholate and incubated overnight at 4°C. Pre-blocked protein A agarose beads were added and incubation at 4°C was continued for another 3 hr. Agarose beads containing the immune complexes were washed two times each with a series of buffers consisting of the deoxycholate sonication buffer, high salt buffer, LiCl buffer, and TE buffer. An SDS-containing buffer was added to elute the immune complexes from the beads, and the eluates were subjected to RNase treatment at 37°C for 20 min and proteinase K treatment at 37°C for 3 hr. Cross-links were reversed by overnight incubation at 65°C, and ChIP DNAs were purified by phenol-chloroform extraction and ethanol precipitation. Quality of ChIP enrichment was assayed by qPCR using primers against known positive control site(s). Input DNA was queried at the same sites in parallel.
ChIP DNA was amplified following the Illumina ChIP-Seq DNA Sample Prep Kit protocol. In brief, DNA ends were polished and 5’-phosphorylated using T4 DNA polymerase, Klenow polymerase and T4 polynucleotide kinase. After addition of 3’-A to the ends using Klenow fragment (3’-5’ exo minus), Illumina genomic adapters were ligated and the sample was size-fractionated (200–250 bp) on a 2% agarose gel. After a final PCR amplification step (18 cycles, Phusion polymerase), the resulting DNA libraries were quantified and tested by qPCR at the same specific genomic regions as the original ChIP DNA to assess the quality of the amplification reactions. DNA libraries were sequenced on a Genome Analyzer II.
Identification of AR binding sites
Alignment of the 36-bp single-read sequences (“tags”) from ChIP-Seq to the human genome (hg19) was conducted by Active Motif with ELAND (Illumina CASAVA 1.5 pipeline) software. Tag density was calculated by dividing the genome into 32-nt bins and counting the number of 3’-end extended tags in each bin (Active Motif). Only sequence reads that passed quality filtering, with an alignment score of at least 10 and a perfect genomic match, were included for peak detection. AR-enriched genomic regions (binding sites) were identified by comparing the ChIPed samples with the input sample using the MACS algorithm (1.4.0rc 2) with the option “-p 1e-10”. For subsequent analyses, we used the highest-confidence regions (FDR < 0.01), selected by applying cutoffs of 500 and 20 to the p-value score and fold enrichment, respectively. These values were chosen in consideration of “negative” peaks generated by swapping the ChIP-Seq and control channels (these “negative” peaks have no biological meaning and thus serve as a control for estimating and filtering out technical noise). To enable quantitative comparison (e.g. fold change) of the same binding site across samples, a “signal” measurement was computed for it in each sample by combining the tag density values of the bins that fall within the binding site with the one-step Tukey's biweight algorithm.
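The binning and per-site “signal” steps can be illustrated with the following Python sketch. It is not the Active Motif pipeline: the fragment length of 150 nt used for 3’-end extension is a placeholder (the extension length is not stated here), an extended tag is assumed to count once in every bin it overlaps, and the biweight constants c = 5 and eps = 0.0001 follow a commonly used formulation of the one-step Tukey's biweight rather than a value given in the text. All function names are hypothetical.

```python
from collections import Counter
from statistics import median


def bin_tag_density(tags, bin_size=32, fragment_len=150):
    """Count extended tags per bin on one chromosome.

    tags: iterable of (five_prime_pos, strand), 0-based, strand '+' or '-'.
    fragment_len is an assumed extension length, not a value from the text.
    """
    density = Counter()
    for pos, strand in tags:
        if strand == "+":
            start, end = pos, pos + fragment_len          # half-open [start, end)
        else:
            start, end = pos - fragment_len + 1, pos + 1
        start = max(start, 0)
        # Assumption: the tag contributes to every bin that it overlaps
        for b in range(start // bin_size, (end - 1) // bin_size + 1):
            density[b] += 1
    return density


def one_step_tukey_biweight(values, c=5.0, eps=1e-4):
    """Robust average: points far from the median (in MAD units) get low or zero weight."""
    values = list(values)
    if not values:
        raise ValueError("no values")
    m = median(values)
    mad = median([abs(v - m) for v in values])
    weighted_sum = 0.0
    weight_total = 0.0
    for v in values:
        u = (v - m) / (c * mad + eps)
        w = (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0
        weighted_sum += w * v
        weight_total += w
    return weighted_sum / weight_total


def site_signal(density, site_start, site_end, bin_size=32):
    """Combine the densities of all bins overlapping [site_start, site_end)."""
    bins = range(site_start // bin_size, (site_end - 1) // bin_size + 1)
    return one_step_tukey_biweight([density.get(b, 0) for b in bins])
```

The biweight is used here instead of a plain mean so that a single extreme bin inside a binding site does not dominate the per-sample signal.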
Quantitative PCR (qPCR) validation
Twelve AR binding sites identified from ChIP-Seq were tested for enrichment by real-time quantitative PCR. Reactions were carried out in triplicate. Fold enrichment was determined relative to a non-enriched region (a region in a gene desert on chromosome 12). Primer sequences are listed in Table 1.
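One common way to express fold enrichment relative to a non-enriched region is sketched below in Python. The exact formula used for the validation is not stated here, so this is an assumption: triplicate Ct values are averaged and an amplification efficiency of 2 (perfect doubling per cycle) is assumed. The function name and example Ct values are hypothetical.

```python
from statistics import mean


def fold_enrichment(ct_site, ct_negative, efficiency=2.0):
    """Fold enrichment of a binding site over a negative control region.

    ct_site, ct_negative: replicate Ct values measured in the same ChIP DNA.
    efficiency: assumed per-cycle amplification factor (2.0 = 100% efficient).
    """
    delta_ct = mean(ct_negative) - mean(ct_site)  # lower Ct means more template
    return efficiency ** delta_ct


# Hypothetical triplicates: the site amplifies 5 cycles earlier than the gene desert
example = fold_enrichment([25.0, 25.0, 25.0], [30.0, 30.0, 30.0])  # 2**5 = 32-fold
```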
Mapping to genomic annotations
AR binding sites were mapped to transcriptional start sites (TSS) of genes based on the refFlat (hg19) table from the UCSC Genome Browser. The classification of AR binding sites relative to genomic annotations (promoter/exonic/intronic/intergenic) and the calculation of associated enrichment statistics were performed with the RegionMiner tool and the ElDorado database (Genomatix).
Sequence conservation analysis
Sequence conservation was assessed using phastCons conserved elements [17, 18] derived from multiple alignments of 45 vertebrate genomes to the human genome. The conservation score was sampled every 100 bp from the summit of each AR binding site (as reported by MACS) to 10 kb in both directions. It was defined as the phastCons score of the overlapping conserved element, or zero for positions outside of conserved elements. To explore the relationship between sequence conservation and mode of AR regulation, binding sites were classified in a binary fashion as conserved or non-conserved based on summit position. Statistical significance of the association was determined using a two-tailed Fisher’s exact test.
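The sampling scheme and the association test can be illustrated with the following Python sketch. It assumes conserved elements for one chromosome are supplied as a sorted, non-overlapping list of half-open (start, end, score) tuples, and it implements the two-tailed Fisher's exact test directly from the hypergeometric distribution, summing the probabilities of all tables that are no more likely than the observed one. Function names and the example data are hypothetical.

```python
from bisect import bisect_right
from math import comb


def conservation_profile(summit, elements, step=100, flank=10_000):
    """Sample conservation every `step` bp from summit - flank to summit + flank.

    elements: sorted, non-overlapping (start, end, score) tuples, half-open.
    Returns a list of (offset, score); score is 0.0 outside conserved elements.
    """
    starts = [e[0] for e in elements]
    profile = []
    for offset in range(-flank, flank + 1, step):
        pos = summit + offset
        i = bisect_right(starts, pos) - 1  # last element starting at or before pos
        score = 0.0
        if i >= 0 and elements[i][0] <= pos < elements[i][1]:
            score = float(elements[i][2])
        profile.append((offset, score))
    return profile


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell with fixed margins
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-7))  # tolerance for floating-point ties
    return min(1.0, total)


# Hypothetical table: rows = conserved / non-conserved summit,
# columns = directly activated / directly repressed targets
p_value = fisher_exact_two_tailed(3, 0, 0, 3)
```

For the hypothetical 3/0/0/3 table, only the two most extreme tables are as unlikely as the observed one, each with probability 1/20, so the two-tailed p-value is 0.1.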
AR-bound sequences were searched for predefined motif matrices of transcription factors from the MatBase library v8.3 vertebrate collection using RegionMiner (Genomatix). Over-representation statistics were reported as Z-scores (the distance from the population mean in units of the population standard deviation) computed against the genomic background (NCBI37/hg19). A V$GREF-V$FKHD pair (module) was defined as two elements located 10 to 50 bp (middle to middle) from each other. Their occurrences were examined for distance distribution within this range.
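The module definition can be made concrete with a short Python sketch. It is not the RegionMiner implementation: it simply takes the midpoints of V$GREF and V$FKHD matches within one sequence, keeps pairs whose middle-to-middle distance is between 10 and 50 bp, and tallies the distances. The function names and example coordinates are hypothetical.

```python
from collections import Counter


def find_modules(gref_mids, fkhd_mids, min_dist=10, max_dist=50):
    """Return (gref_mid, fkhd_mid, signed_distance) for every qualifying pair."""
    pairs = []
    for g in gref_mids:
        for f in fkhd_mids:
            if min_dist <= abs(f - g) <= max_dist:
                pairs.append((g, f, f - g))  # sign records the relative order
    return pairs


def distance_distribution(pairs):
    """Histogram of absolute middle-to-middle distances across module pairs."""
    return Counter(abs(dist) for _, _, dist in pairs)


# Hypothetical match midpoints within one AR-bound sequence
modules = find_modules([100], [105, 130, 170])
# -> [(100, 130, 30)]; 105 is too close (5 bp) and 170 too far (70 bp)
```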
The MEME algorithm was used to discover enriched sequence motifs ab initio from repeat-masked AR-bound sequences. In cases where a binding site was longer than 500 bp, only the 500 bp centered on its summit were used. In consideration of computational time, we performed the search with the top 2,500 sequences ranked by binding score. MEME was run using the “-dna -mod zoops -revcomp -evt 0.01” command line options. Specificity was assessed as a Z-score from 100 randomly sampled groups of the same number of sequences of the same length from the same chromosomes as the AR binding sites. To investigate the enrichment and score distribution of the above MEME-derived ARE consensus motif, we scanned the AR-bound sequences as well as randomly sampled genomic sequences with its position weight matrix using PATSER (v. 3e) and command line options “-c -li -s u2” or “-c -ls 0 -s -u2”. We determined the presence/absence of motifs defined by predefined vertebrate matrices from MatBase in a similar manner (PATSER and the “-c -li -s u2” option), whereas their statistical association with mode of AR regulation (direct activation/repression) was computed using a two-tailed Fisher’s exact test.
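The background sampling and Z-score calculation can be sketched in Python as follows. This is a minimal illustration under assumptions: regions are (chromosome, start, end) tuples, random regions are drawn uniformly along the same chromosome with the same length, and the Z-score uses the population standard deviation of the random groups. The function names, chromosome sizes and counts are hypothetical, and the sketch does not handle repeat masking or assembly gaps.

```python
import random
from statistics import mean, pstdev


def random_region_like(region, chrom_sizes, rng):
    """Draw a random region with the same chromosome and length as `region`."""
    chrom, start, end = region
    length = end - start
    new_start = rng.randrange(0, chrom_sizes[chrom] - length + 1)
    return (chrom, new_start, new_start + length)


def random_group(regions, chrom_sizes, rng):
    """One background group: same number, lengths and chromosomes as the input."""
    return [random_region_like(r, chrom_sizes, rng) for r in regions]


def z_score(observed, background_values):
    """Distance of the observed statistic from the background mean, in SD units."""
    sd = pstdev(background_values)
    if sd == 0:
        raise ValueError("background has zero variance")
    return (observed - mean(background_values)) / sd


# Hypothetical usage: motif counts in the AR-bound set versus random groups
rng = random.Random(0)
group = random_group([("chr1", 1000, 1500)], {"chr1": 10_000}, rng)
z = z_score(10, [2, 4, 4, 4, 5, 5, 7, 9])  # background mean 5, SD 2
```

In the setting described above, `background_values` would hold the motif statistic computed on each of the 100 random groups.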
Quantitative Reverse Transcriptase-PCR (qRT-PCR) – in vivo samples
Approximately 20 mg of tumor samples in RNALater were homogenized using a Qiagen TissueLyser 2 for 2 min at 20 Hz. Homogenates were then processed using the Qiagen RNeasy Plus kit (Cat# 74134). Samples were resuspended in 60 μL water, and 2 μg RNA from each sample was subsequently subjected to qRT-PCR using TaqMan RT-PCR on an ABI7900HT in a two-step procedure, as follows. For the reverse transcription step, we used the ABI High Capacity cDNA Reverse Transcription Kit (4368814, Applied Biosystems); the cycle was 10 min at 25°C, 2 h at 37°C, 5 min at 85°C, and cool-down to 4°C. For the qPCR step, we used the ABI 2X Universal Master Mix (4324018, Applied Biosystems) with the ddCt (RQ) method for quantification; the cycle was 10 min at 95°C, followed by 40 cycles of 15 s at 95°C and 60 s at 60°C. Primers used were ID# Hs00907244-ml for AR and # 4352934E for GAPDH (Applied Biosystems).
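The ddCt (RQ) calculation can be illustrated with the following Python sketch. It assumes 100% amplification efficiency, uses GAPDH as the reference gene as described above, and takes a calibrator sample (for example a vehicle-treated tumor, which is an assumption here) as the baseline. The function name and Ct values are hypothetical.

```python
def relative_quantity(ct_target_sample, ct_ref_sample,
                      ct_target_calibrator, ct_ref_calibrator):
    """Relative quantity (RQ) of a target transcript by the ddCt method."""
    d_ct_sample = ct_target_sample - ct_ref_sample              # normalize to reference gene
    d_ct_calibrator = ct_target_calibrator - ct_ref_calibrator
    dd_ct = d_ct_sample - d_ct_calibrator                        # compare to calibrator
    return 2.0 ** (-dd_ct)                                       # assumes doubling per cycle


# Hypothetical Ct values: AR = 24 and GAPDH = 18 in a treated tumor,
# AR = 22 and GAPDH = 18 in the calibrator
rq = relative_quantity(24, 18, 22, 18)  # ddCt = 2, so RQ = 0.25
```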
Tumor growth inhibition (TGI) – VCaP CRPC in vivo model
VCaP cells (3.5 million; in 50% matrigel, product# 354234, lot A7141) were implanted subcutaneously in CB17/lcr-Prkdc SCID mice, and when tumors reached about 200 mm³ in size the animals were castrated. Since PSA levels remain low in this in vivo model until the tumors are significantly large (400 mm³), the re-growth of tumors post-castration was interpreted as a sign that the animals had entered the castration-refractory phase. Animals were then randomized based on tumor volume and treatment commenced. In general, and unless otherwise indicated, compounds were given by oral gavage once daily. Vehicle formulation consisted of 0.9% benzyl alcohol, 1% Tween-80 and 98.1% methylcellulose (0.5%). Tumor volume was calculated by the formula: length × width × depth × 0.5236. To measure PSA, 15 μl serum was diluted 1:3 v/v in water and then 25 μl of the diluted sample was transferred to the ELISA plate for the assay (PSA ELISA Kit from American Qualex Antibodies, Cat# KD4310). At the end of the study, animals were sacrificed and the tumors were extracted and treated with RNALater (Qiagen, Cat# 76154). Statistical analysis was performed using two-way ANOVA (GraphPad Prism, Version 5.01 - http://www.graphpad.com).
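The tumor volume formula is an ellipsoid approximation, since 0.5236 is π/6 rounded to four decimals. A minimal Python sketch follows; the function name and the example caliper measurements are hypothetical.

```python
import math

ELLIPSOID_FACTOR = 0.5236  # approximately pi / 6


def tumor_volume_mm3(length_mm, width_mm, depth_mm):
    """Tumor volume from three orthogonal caliper measurements (mm)."""
    if min(length_mm, width_mm, depth_mm) < 0:
        raise ValueError("dimensions must be non-negative")
    return length_mm * width_mm * depth_mm * ELLIPSOID_FACTOR


# Hypothetical measurement: a 10 x 8 x 5 mm tumor is roughly 209 mm^3,
# close to the size at which animals were castrated
volume = tumor_volume_mm3(10, 8, 5)
factor_error = abs(ELLIPSOID_FACTOR - math.pi / 6)  # rounding error of the constant
```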
Expression profiling and data analysis
RNA from VCaP cells treated with AR-siRNA, Compound 30 (10 μM) or corresponding controls underwent first- and second-strand cDNA synthesis, in vitro transcription, and target preparation following the GeneChip Expression Analysis Technical Manual (Affymetrix). Overnight hybridization of the fragmented cRNA on the GeneChip® Human Genome U133A 2.0 array and subsequent washing, staining and scanning steps were performed as suggested by the manufacturer (Affymetrix). Image analysis was done with the Expression Console (Affymetrix).
Expression profiling data were RMA normalized with the “affy” package of Bioconductor, followed by exclusion of spike-in control (AFFX) and mixed cross-hybridization (_x) probe sets. The Significance Analysis of Microarrays (SAM) algorithm implemented in the “samr” package was used for differential expression analysis between compound/siRNA-treated and control samples. The fold change (FC) and d-score outputs from all probe sets were used for computation of genome-wide correlations. Significantly differentially expressed genes refer to those with FDR < 0.05 and |FC| > 1.5. Genes with probe sets changing in opposite directions were not included in subsequent analyses.
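The filtering rules can be illustrated with the following Python sketch, which operates on a table of per-probe-set results rather than on the “affy” or “samr” objects themselves. It rests on several assumptions: control probe sets are recognized by an “AFFX” prefix and cross-hybridizing ones by an “_x_at” suffix, FC is a signed fold change (for example -2 for a two-fold decrease), and a gene is discarded when its significant probe sets disagree in direction. All names and example rows are hypothetical.

```python
def keep_probe_set(probe_id):
    """Drop spike-in controls (AFFX prefix) and _x cross-hybridizing probe sets."""
    return not probe_id.startswith("AFFX") and not probe_id.endswith("_x_at")


def significant_genes(rows, fdr_cutoff=0.05, fc_cutoff=1.5):
    """rows: iterable of (probe_id, gene, signed_fc, fdr).

    Returns {gene: 'up' or 'down'} for genes passing both cutoffs whose
    significant probe sets all change in the same direction.
    """
    directions = {}
    for probe_id, gene, fc, fdr in rows:
        if not keep_probe_set(probe_id):
            continue
        if fdr < fdr_cutoff and abs(fc) > fc_cutoff:
            directions.setdefault(gene, set()).add("up" if fc > 0 else "down")
    # Exclude genes whose probe sets go in opposite directions
    return {g: next(iter(d)) for g, d in directions.items() if len(d) == 1}


# Hypothetical results table
rows = [
    ("AFFX-BioB-5_at", "control", 5.0, 0.001),  # spike-in control, removed
    ("1001_at", "GENE1", 2.0, 0.01),
    ("1002_x_at", "GENE1", -2.0, 0.01),         # _x probe set, removed
    ("1003_at", "GENE2", 2.0, 0.01),
    ("1004_s_at", "GENE2", -1.8, 0.02),         # opposite direction, GENE2 dropped
    ("1005_at", "GENE3", -1.2, 0.01),           # fails the fold-change cutoff
]
hits = significant_genes(rows)  # -> {"GENE1": "up"}
```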
Gene signature enrichment analysis
Gene signature enrichment analysis was performed by comparing direct AR activation/repression targets from small molecule antagonism with signatures collected from a variety of public databases and studies (e.g. MSigDB, GeneSigDB, NetPath, Gene Ontology, KEGG). Statistical significance of signature enrichment was determined using the cumulative hypergeometric probability distribution as previously described, and correction for multiple hypothesis testing was conducted with the Q-value package. Some significantly enriched signatures and their connections were plotted with the network visualization tool Pajek. Only enriched signatures with FDR < 0.05 are reported.
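The cumulative hypergeometric test can be sketched in Python as below. It gives the probability of observing at least k signature genes among n target genes when the signature contains K genes out of a universe of N. The sketch covers only the per-signature p-value, not the Q-value correction, and the function name and example numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """P(overlap >= k) for n target genes and a K-gene signature in a universe of N."""
    if not (0 <= k <= min(n, K) and n <= N and K <= N):
        raise ValueError("inconsistent counts")
    upper = min(n, K)
    # Sum the hypergeometric probabilities of all overlaps at least as large as k
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Hypothetical example: all 5 target genes fall inside a 5-gene signature
# drawn from a 10-gene universe; only 1 of the 252 possible draws does this
p = hypergeom_enrichment_p(5, 5, 5, 10)
```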