Reagents & Antibodies
EGF from murine submaxillary gland and anti Tubulin (1:10000) were purchased from Sigma. Anti p-ERK1/2 (1:2000), Anti p-p90rsk (1:2000), anti p-EGFR (1:1000), anti p27 Kip1 (1:1000) and anti p-CREB (1:2000) were from Cell Signaling. Anti Cyclin D1 and anti cyclin E were from Santa Cruz. U0126 and AG1478 were from Calbiochem.
Cell Culture and Sample preparation
HeLa cells were cultured at 37°C in a 95/5 Air/CO2 water saturated atmosphere in Dulbecco's modified Eagle's medium (DMEM) containing 10% heat inactivated fetal bovine serum (FBS), 2 mM L-glutamine and 100 U/ml Penicillin/streptomycin. For treatments, the cells were transferred to 60 mm dishes and, after 48 h, starved for 24 h in DMEM containing 2% FBS. The cells were incubated (if indicated) with the protein kinase inhibitors U0126 (10 μM) or AG1478 (10 μM) for 30 min, and then stimulated with EGF (150 ng/ml) for the indicated times. Cells were harvested, washed twice with cold phosphate-buffered saline and lysed with either 2 × Laemmli sample buffer (Sigma), for protein extraction, or RNeasy RLT lysis buffer (Qiagen), for total RNA extraction.
Total RNA was quantified with a NanoDrop ND-1000 spectrophotometer followed by quality assessment with the 2100 Bioanalyzer (Agilent Technologies) according to the manufacturer's instructions. Acceptable quality values were in the 1.8-2.2 range for A260/A280 ratios, >0.9 for rRNA ratio (28S/18S) and >8.0 for RIN (RNA Integrity Number).
For Western blotting 50 μg of cell extracts from HeLa cells were subjected to 8-10% SDS-PAGE. Gels were transferred onto PVDF membranes and processed for specific immunodetection by ECL using the antibodies at the dilutions indicated above.
Quantitative real time PCR was performed on two sets of genes. The first set was validated on the original three biological replicate experiments analyzed by microarrays and DGE (set 1: DUSP1, DUSP6, IL8, CCND1, CCNE2, MYC, FOS, CDKN1A, CDKN1B, CDKN1C, MAP3K6, IL11, EGFR, AURKC, E2F1, TGFA, CEBPD) and the second set on three independent biological replicates (set 2: MT1E, MT1F, MT1G, MT1H, MT1X, MT2A). Total RNA was extracted from HeLa cells, for set 1, with mirVana isolation kit (Ambion) and, for set 2, with miRNeasy Mini kit (Qiagen) following the respective manufacturer's instructions. Purified RNAs were treated with RNase-free DNAse (DNA-free, Ambion) and reverse-transcribed, for set 1, with Superscript II (Invitrogen) and, for set 2, Omniscript (Qiagen) to generate the corresponding cDNAs that served as PCR templates for mRNA quantification. Primers used in this study for RT-qPCR validation can be found on Additional file 14, Table S8.
PCR amplification and detection were performed with the ROCHE LightCycler 480 detector, using 2 × SYBR GREEN Master Mix (Roche) as reagent and oligonucleotide primers (0.25 uM or 0.3 μM of each primer, for set 1 and set 2 respectively) following the manufacturer's instructions. The reaction profile had a denaturation-activation cycle (95°C for 10 min) followed by 40 cycles of denaturation-annealing-extension (for set 1: 95°C for 15 sec, 60°C for 40 sec, 72°C for 5 sec and, for set 2: 95°C for 10 sec, 60°C for 10 sec, 72°C for 12 sec). Each sample was run in duplicate. mRNA levels were calculated using the LightCycler 480 software. The mRNA levels of each target gene and the housekeeping gene SF3A, were determined for each sample. PCR amplification efficiencies for all target genes and the housekeeping gene were determined using cDNA dilutions. The relative expression ratio was calculated for set 1 using the delta-delta-Ct method and for set 2 applying a mathematical model incorporating the PCR efficiencies and the crossing point deviation of EGF-treated HeLa cells- versus control non treated cells at each time point.
RNA (500 ng) was labeled using Agilent's Low Input RNA Labeling Kit, which involves reverse transcribing the mRNA in the presence of T7-oligo-dT primer to produce cDNA and then in vitro transcribing with T7 RNA polymerase in the presence of Cy3-CTP or Cy5-CTP to produce labeled cRNA. The labeled cRNA of the EGF-treated and the control samples from each biological replicate were labeled with alternate dyes and co-hybridized in duplicate with dye reversal to the Agilent Human 4 × 44K 60-mer oligo microarray according to the manufacturer's protocol. The arrays were washed, dried by centrifugation and scanned on an Agilent G2565BA microarray scanner at 100% PMT and 5 μm resolution. Dual channel Cy5 and Cy3 fluorescence data were extracted using Genepix 6.0 (Molecular Devices) software using the irregular spot finding feature.
Human Operon V4 37K arrays were used featuring 70-mer probes. First and second strand cDNA were synthesized from total RNA (500 ng) with the Aminoallyl Message Amp II Kit (Ambion). cDNA was purified and in vitro transcribed for aRNA synthesis. aRNA was purified and coupled to the Cy ester, and further purified, to remove unincorporated dye. Arrays were hybridized with dye swapping as in Agilent arrays, washed and dried following Operon's instructions on a Maui hybridization station and scanned on an Agilent G2565BA microarray scanner under at 100% PMT and 10 μm resolution. Dual channel Cy5 and Cy3 fluorescence data were extracted using Genepix 6.0 (Molecular Devices) software using the irregular spot finding feature.
Biotinylated cRNA was prepared using the Illumina RNA Amplification Kit (Ambion) according to the manufacturer's instructions starting with from 200 ng total RNA from each sample. cRNA was purified and each sample was hybridized once on 55-mer probe 48 K Illumina Human WG-6 V 2.0 Expression BeadChips following the manufacturer's instructions. After 16 h of hybridization arrays were washed, dried, stained with Cy3-Streptavidin and scanned using Illumina BeadScan software on the Illumina BeadArray scanning system. Single channel Cy3 fluorescence data were extracted using BeadStudio data analysis software with default settings.
Digital gene expression (DGE) profiling by high throughput tag sequencing
For each sample, 2 μg of total RNA were used following Illumina's protocol for sequencing of DGE tags. Briefly, libraries of cDNA fragments were generated by capturing transcripts on oligo-dT beads, followed by synthesis of first and second strand cDNA in situ. Cleavage with DpnII resulted in recovery of the most 3' portion of the cDNA molecules, still attached to beads. A 5' adaptor containing a cut site for the type II restriction endonuclease MmeI was ligated to the cDNA. Cleavage with MmeI released fragments of 17-18 bp from the beads. Following 3' adapter ligation, the resulting library was enriched by PCR amplification (15 cycles), and purified by PAGE. Sequencing by synthesis was carried out on the Genome Analyzer I (Illumina), as recommended by the manufacturer, for 36 cycles.
Raw data were processed using the Illumina pipeline version 1.3.0. 3' adapters were recognized and trimmed using a script that penalizes mismatches to a lesser extent at read ends, following the distribution of sequencing errors along Illumina DGE reads . Several datasets of reference sequences (RefSeq, GeneID predictions, GenScan predictions, RNAgenes) were reduced in complexity by in silico identification of DpnII cut sites and retrieval of these sequences plus 36 nt flanks on either side. The final mapping step was performed by applying Eland iteratively in order to include all possible product sizes, allowing up to 2 mismatches. The compiled collection of expression tags with removed adapters was initially aligned against the reduced-complexity set of RefSeq entries and the targets reference sequences were filtered as in the microarray probe mapping to exclude any targets corresponding to different gene symbols or with no associated gene symbol. Reads mapping unambiguously were counted for each unique transcript within the reduced-complexity RefSeq reference set. Raw transcript counts were first filtered by removal of RefSeq probes with values smaller than 'mean minus standard error' in at least 90% of the samples, where 'mean = average counts of RefSeq probes corresponding to the same gene within one sample' and 'standard error = standard error of counts of RefSeq probes corresponding to the same gene within one sample'. Subsequently, counts were normalized by making sample-wise total numbers of reads equal to the median total number of reads for all samples. Finally, normalized counts of RefSeq probes corresponding to the same gene (defined by gene symbol) were summed up.
Cross-mapping between platforms
For the purpose of the comparison and to have consistent up to date annotation we remapped all probes in the different microarray platforms to assign them to gene symbols. For each of the platforms (Agilent, Illumina and Operon) sequences for each probe were mapped to the human reference genome and RefSeq reference transcriptome (hg18 accessed through UCSC). Mapping was done using BLAST, BWA and BOWTIE independently. Only unambiguously mapping probes were selected. All ambiguous probes were discarded. Up to 2 mismatches were allowed to consider differences in probe sequence relative to the reference. These can originate from the disparity of sources of sequence information and genomic annotation used by the different microarray manufacturers and can include natural sequence variation as well as sequencing errors in databases, or artifacts generated during probe design. When mapping to the reference genome, annotation information (GTF from UCSC) was used from the same genome version to create a probe-transcript link ID. We selected probes that could be unambiguously mapped at least once to either the genome (where there was an annotated transcript) or to the reference transcriptome, with the main requirement being that there is an association to an official gene symbol. Transcripts corresponding to genes without official gene symbols were ignored.
In the case where a gene was represented by multiple array-specific probes we took the median log2ratio value of the corresponding probes. For the Illumina GA-I sequencing data, counts of probes representing the same gene were summed up before calculating log2ratio values. We took the intersection of genes in all platforms and merged the corresponding log2ratio data.
Next, we took intersections for all combinations of three platforms, then for all combinations of two platforms and, finally, the probes with no overlap between platforms were also scored. Each time, the corresponding data was appended to the existing data matrix. Hence we end up with a matrix containing data for 20,322 RefSeq genes with known HUGO symbols, the union of genes in all platforms under consideration.
Log2ratio values were computed for all pairs of control and EGF stimulated samples. This was also done for the one-channel microarray platforms since samples are to be considered as paired due to the study design. Further, this procedure makes one- and two-channel data directly comparable.
Analysis for differential expression on a gene-by-gene basis was done by SAM  and limma , including correction for multiple testing using the False Discovery Rate (FDR) method.
For cross-platform comparisons Gene Set Enrichment Analysis (GSEA)  was applied where the gene set of interest was defined as the list of differentially expressed genes as derived from one platform, and its enrichment among differentially expressed genes within the remaining platforms was tested. In order to further assess comparability between platforms we computed CAT ('concordance at the top') plots as described .
We also aimed at defining a consensus list of regulated genes using information from all platforms simultaneously. Since expression measures are not directly comparable between different platforms we used the RankProd approach  that is based on differential gene expression ranks. Only genes present in all the platforms under consideration can be included in this analysis. Therefore we applied the RankProd analysis for all combinations of platforms as given by the complete merge data matrix described above. P-value adjustment according to  (FDR) was then applied to the union of all genes.
In order to explore the changes in gene expression due to EGF stimulation from a more global point of view, we analyzed 218 KEGG pathways  with the GlobalAncova approach . Only genes present in all platforms were used for this analysis. The 196 pathways are all available human pathways that contain at least one of those genes. Since GlobalAncova is quite sensitive, we applied a rather conservative method for multiple testing correction . We further explored the pathways with adjusted p-values < 0.01 with respect to interconnections between them. We propose a network of pathways where an edge corresponds to an overlap of regulated genes between the two respective pathways.
Network and pathway analysis
Ingenuity pathway analysis 3.1 software (IPA; Ingenuity Systems) was used for evaluating the functional significance of EGF-induced gene profiles. Specified lists of genes identified by RankProd as being affected by EGF were used for network generation and pathway analyses implemented in IPA tools. HUGO official gene symbols for the selected gene lists were uploaded into the IPA suite, which were then mapped to the Ingenuity Pathway Knowledge Base. The so-called focus genes were then used for generating biological networks. A score was generated for each network according to the fit of the original set of significant genes. This score reflects the negative logarithm of the p-value, which indicates the likelihood for the focus genes in a network of being found together due to random chance. Using a 99% confidence level, scores of ≥2 were considered significant. Significances for biological functions were then assigned to each network by determining a p-value for the enrichment of the genes in the network for such functions compared with the whole Ingenuity Pathway Knowledge Base as a reference set.