The l1-l2 regularization framework unmasks the hypoxia signature hidden in the transcriptome of a set of heterogeneous neuroblastoma cell lines
- Paolo Fardin†1Email author,
- Annalisa Barla†2,
- Sofia Mosci2, 3,
- Lorenzo Rosasco2, 4,
- Alessandro Verri2 and
- Luigi Varesio1
© Fardin et al; licensee BioMed Central Ltd. 2009
Received: 05 May 2009
Accepted: 15 October 2009
Published: 15 October 2009
Gene expression signatures are clusters of genes discriminating different statuses of the cells and their definition is critical for understanding the molecular bases of diseases. The identification of a gene signature is complicated by the high dimensional nature of the data and by the genetic heterogeneity of the responding cells. The l1-l2 regularization is an embedded feature selection technique that fulfills all the desirable properties of a variable selection algorithm and has the potential to generate a specific signature even in biologically complex settings. We studied the application of this algorithm to detect the signature characterizing the transcriptional response of neuroblastoma tumor cell lines to hypoxia, a condition of low oxygen tension that occurs in the tumor microenvironment.
We determined the gene expression profile of 9 neuroblastoma cell lines cultured under normoxic and hypoxic conditions. We studied a heterogeneous set of neuroblastoma cell lines to mimic the in vivo situation and to test the robustness and validity of the l1-l2 regularization with double optimization. Analysis by hierarchical, spectral, and k-means clustering or supervised approach based on t-test analysis divided the cell lines on the bases of genetic differences. However, the disturbance of this strong transcriptional response completely masked the detection of the more subtle response to hypoxia. Different results were obtained when we applied the l1-l2 regularization framework. The algorithm distinguished the normoxic and hypoxic statuses defining signatures comprising 3 to 38 probesets, with a leave-one-out error of 17%. A consensus hypoxia signature was established setting the frequency score at 50% and the correlation parameter ε equal to 100. This signature is composed by 11 probesets representing 8 well characterized genes known to be modulated by hypoxia.
We demonstrate that l 1 -l2 regularization outperforms more conventional approaches allowing the identification and definition of a gene expression signature under complex experimental conditions. The l 1 -l2 regularization and the cross validation generates an unbiased and objective output with a low classification error. We feel that the application of this algorithm to tumor biology will be instrumental to analyze gene expression signatures hidden in the transcriptome that, like hypoxia, may be major determinant of the course of the disease.
Clues to the prognosis of cancer are reflected at the time of surgical removal in the pattern of gene expression in the primary tumor. The ultimate goal is to identify specific "gene expression signatures" that define subsets of tumors and that will ultimately allow to predict the clinical course. Unsupervised analysis of the gene expression pattern has led to the definition of "gene expression signatures" that add independent prognostic information to that provided by a risk assessment based solely on clinical-pathologic factors. One limitation of the unsupervised cluster analysis is the lack of appreciation of the tumor pathology, which makes these signatures difficult to interpret with respect to the underlying cancer biology which comprises the intrinsic properties of the cancer cell, such as activation of transforming genes, and the response to signals generated within the tissue microenvironment, such as the hypoxic situation occurring in poorly vascularized or necrotic areas of the tumor. Ultimately, finding gene signatures that can be linked to the molecular mechanisms of cancer development is critical for translating these markers into the clinic. Alternative strategies to combine the prognostic value and biologic knowledge are being developed. Specifically, gene expression signatures are derived from in vitro studies on the pathophysiology of the disease. This is a novel approach standing on the concept that the tumor biology will give us the clues to characterize the outcome of the disease.
In this manuscript, we address the above-mentioned issues by developing a novel approach to identify the signature of low oxygen tension (hypoxia) in a set of neuroblastoma cell lines. Oxygen is essential for aerobic metabolism in all mammalian cells. To maintain function and homeostasis, cells have to be able to sense and respond to inadequate oxygen levels. The O2 levels within the neoplastic lesion are an important factor in determining the tumor phenotype  and hypoxia is associated with metastatic spread, resistance to radio- and chemotherapy and poor prognosis [1–3]. The cellular response to hypoxia is caused by changes in gene expression [4–6] through the activation of several transcription factors among which the hypoxia-inducible transcription factor-1α (HIF-1α) [1, 7], and -2α (HIF-2α)  are those taken as indicators of a hypoxic status of the cell. HIFs transactivate the hypoxia-responsive element (HRE) present in the promoter or enhancer elements of many genes encoding angiogenic, metabolic and metastatic factors [3, 9, 10]. Although hypoxia responses are thought to be evolutionarily conserved in all mammalian cells [11, 12] not every cell responds to hypoxia in an identical fashion. Although certain biochemical pathways are common hypoxia targets, the specific genes modulated by hypoxia within each pathways will depend heavily on the nature, type and genetic makeup of the responding cell [5, 6, 13]. In other words, hypoxia-induced common biochemical pathways may utilize different genes depending on the cell type.
Neuroblastoma is the most common pediatric solid tumor, deriving from immature or precursor cells of the ganglionic lineage of the sympathetic nervous system (SNS) [14, 15]. Neuroblastoma shows notable heterogeneity, with regard to both histology and clinical behavior . The outcome of the disease ranges from rapid progression and poor clinical outcome, to spontaneous regression into benign ganglioneuroma . The heterogeneity of neuroblastoma (NB) cells is found also in the cell lines derived from the fresh tumors which manifest various degree of differentiation and chromosomal alteration. For example, the amplification and/or expression of MYCN oncogene is a relatively frequent event, that is indicative of poor prognosis in fresh tumors and is present in several cell lines which share an aggressive behavior . Recent data of microarray analysis confirm the existence of different patterns of gene expression profile among different NB cell lines . The heterogeneity of the NB cell transcriptome complicates the identification of specific gene expression signatures associated to defined biological responses such as environmental stimulation that, albeit biologically very important, may be overshadowed by major genetic alterations as those caused by oncogenes which impact on several aspects of cell physiology. This problem is of major concern when several different NB cell lines have to be compared in in vitro studies.
The problem of identifying a gene signature, namely a significant group of variables, is aggravated by the typical high dimensional nature of the data. Complexity grows even more when the heterogeneity of the cells must be factored in. Several feature selection techniques have been proposed to deal with these problems (for review see ). The number of data available for a single study is usually small with respect to the number of variables, and it is crucial to adopt sound methodologies and strict experimental protocols to ensure statistical robustness . Cross validation loops are valid approaches to avoid selection bias  and to separate training and test phases. The standard categorization proposed by Blum et al.  groups variable selection techniques in three main classes: filters, wrappers and embedded. Filters [23–25] are mostly based on ranking criteria where the features are ordered and then selected or discarded according to a fixed threshold. These methods are broadly employed due to their simplicity and fast computation, despite the lack of guarantee that the selection is optimal with respect to the class discrimination. In wrapper methods [26–28] the relevance of a feature subset is determined according to prediction performance of the learning algorithm itself, though variable selection and training are two separate processes. In contrast, embedded methods [29–34] have the advantage of incorporating feature selection within the construction of the classifier or regression model, i.e. as part of the training phase. We applied the embedded feature selection technique l1-l2 regularization with double optimization to the analysis of gene expression profile. This technique is based on the optimization presented by Zou et al. . Theoretical studies  and empirical experiments [37, 38] showed that such technique fulfills all the desirable properties of a variable selection algorithm. Indeed, the use of regularization allows performing embedded feature selection in the supervised learning framework, since the particular type of penalty used in l1-l2 regularization forces the classifier or the regression model to depend on a small number of selected features. Another asset of l1-l2 regularization is that it is multivariate by design since its solution is a classification or regression model that takes into account the combined effect of multiple features, and the set of relevant features is selected while looking at all the features at the same time. A strong advantage of l1-l2 regularization over other embedded methods is also its ability to take into account correlation among variables. In other words, when one variable is considered relevant to the problem, its correlated variables are considered relevant as well. While most feature selection techniques are based on heuristics, l1-l2 regularization is asymptotically consistent from the statistical viewpoint, i.e. theoretical results  guarantee that the best possible estimator is found as the number of training samples increases. Finally, the use of the double optimization allows to identify the relevant genes and to provide accurate discrimination. This approach was successfully applied in different contexts ranging from computer vision  to computational biology .
In this study we demonstrate that the application of l1-l2 regularization allows to model the effect of low oxygen tension, which was not detectable by supervised approaches, and to find a cluster of genes discriminating the normoxic and the hypoxic statuses of neuroblastoma cell lines.
NB cell lines used and the relative characteristics
BM, bone, LN
Unsupervised clustering analysis
In conclusion, the unsupervised approaches detected major transcriptome differences among the cell lines driven in part by the cascade of events triggered by MYCN expression. However, the disturbance generated by this transcriptional pattern was such that the detection of more subtle changes induced by hypoxia was completely masked.
Supervised univariate analysis hypothesis test
Supervised multivariate l1-l2 regularization analysis
BCL2/adenovirus E1B 19 kDa interacting protein 3
DNA-damage-inducible transcript 4
pyruvate dehydrogenase kinase; isoenzyme 1
Transcribed locus, hypothetical protein FLJ11267
growth and transformation-dependent protein
adenylate kinase 3-like 1
growth and transformation-dependent protein
aldolase C; fructose-bisphosphate
vascular endothelial growth factor
BCL2/adenovirus E1B 19 kDa interacting protein 3
WD repeat domain 5B
We conclude that l1-l2 regularization algorithm was able to identify 11 stable probesets that clearly separated the hypoxic from normoxic cell lines even in the case of the disturbance generated by the genetic alterations of the cell lines. Therefore, this cluster represents the consensus hypoxia signature hidden in the neuroblastoma cells transcriptome that we wanted to sort out.
Discussion and Conclusion
We have analyzed the gene expression profile of 9 cell lines cultured in a normoxic or hypoxic environment in order to identify the hypoxia signature. We demonstrated that, differently from unsupervised approaches, l1-l2 regularization with double optimization identified a cluster of 11 stable probesets separating hypoxic from normoxic cell lines even when hidden in the neuroblastoma cells transcriptome characterized by the high disturbance of genetic alterations. Biological signatures can be derived from cell lines based datasets using many different informatics approaches (for review see ). This is the first report describing the use of the l1-l2 regularization with double optimization protocol described in  to distinguish datasets based on the biological status of the cells.
The first attempts to identify the hypoxia signature relied on three different unsupervised clustering analyses. These approaches detected major differences among the transcriptome of the cell lines driven by the characteristics of the cell lines themselves of which the cascade of events triggered by MYCN expression was a major component. However, the disturbance generated by these transcriptional patterns was such that the detection of more subtle changes induced by hypoxia was completely masked. This conclusion is supported by the results obtained with the supervised univariate analysis t-test which was able to identify a strong response associated to MYCN expression and, to a lesser extent to MYCN amplification, but not the hypoxia dependent response. The heterogeneity of neuroblastoma and neuroblastoma cell lines has been previously observed .
The impossibility to obtain a hypoxia signature by unsupervised approaches prompted us to consider different algorithms based on a robust supervised variable selection technique, capable of detecting the hypoxia-induced transcriptome even in the presence of the disturbance of a strong competing signal. The l1-l2 regularization allowed us to build a powerful discriminative rule and to define a signature of probesets also taking into account the presence of variables (probesets) correlated (collinear) with each other. The use of cross validation allows the selection protocol to generate an unbiased and objective output  beyond the theoretical results that guarantee the robustness of the core algorithm . The strong discriminative power is proven by the 17% classification error that is a very low value when dealing with 18 samples and nearly 50 thousands variables.
We adopted a validation framework based on a double loop of leave-one-out cross-validation in order to extract unbiased estimates of the classification error. The outer loop produces 18 lists of relevant variables, from which we extract a common list by setting a frequency threshold. It can be appreciated from visual inspection of the frequency distribution that, for values lower than 30%, a large number of probesets is included, which are extremely unstable. For frequency above 70%, the number of selected probesets slowly decreases, and a plateau is present between 30 and 70%. Therefore, we set our frequency threshold to 50% that is the intermediate value of such a frequency plateau. The correlation parameter ε can be potentially tuned between 0 and +∞ in order to extract lists of probesets with different correlation degree. However, values of ε equal to or smaller than 1 provide the same minimal list which comprises 9 probesets. This list is minimal in that it does not include correlated probesets, and it can be viewed as the smallest set of variables needed to predict the hypoxic status without any prior information. Conversely, by increasing the correlation parameter ε we are able to expand the list to 11 probesets, which is obtained for ε ≥ 100. Since we are interested in all genes involved in the hypoxic condition, we define such a correlation-aware list as our hypoxia signature in neuroblastoma cell lines. This signature identifies also hypoxic status in astrocyte cell lines. We estimated a similar low cross validation error demonstrating that its application goes beyond the neuroblastoma lineage and providing an additional proof of its discriminatory power on out-of-set data. However, the neuroblastoma hypoxia signature is less efficient in discriminating hypoxic dendritic cells indicating the existence of a limited spectrum of hypoxic cell types that can be identified by this signature. Dendritic cells are terminally differentiated mononuclear phagocytes, biologically very far from the other cell types, deriving from hematopoietic precursors, short lived and programmed to serve as immunomodulatory cells . This conclusion supports the concept of the heterogeneity of the response to hypoxia in different cell types [13, 42, 43]. The enrichment of the 11 probeset signature in the hypoxic phenotype of neuroblastoma cells and astrocytes provides biological validation of our approach by establishing a clear link between our signature and the hypoxic microenvironment.
The 11 probesets represents 8 genes all of which are known from the literature to be modulated by hypoxia in different cell types and to be part of key biological processes associated with the response to hypoxia, indicating, once more, the biological roots of our signature. ALDOC and PDK1 belong to the glycolytic pathway and are known to be up regulated by hypoxia in neuroblastoma . Potentiating of the oxygen-independent glycolytic pathway to comply with the energy demand, is one of the major cellular response to hypoxia . The energetic balance in cell metabolism has to be controlled by different mechanisms. For example, AK3L1 is known to be modulated by hypoxia and catalyzes the interconversion of adenine nucleotides playing an important role in cellular energy homeostasis in mitochondria . DDIT4, induced by hypoxic stimulus, is an inhibitor of the mTOR signalling pathway, that results in inhibition of protein synthesis which, in turn, may affect the cellular tolerance to hypoxia by promoting energy homeostasis . VEGF, a direct target of HIF-1, is secreted by a large variety of different hypoxic cells and promotes angiogenesis thereby favouring tumor growth and metastasis . BNIP3 and E2IG5 are two genes promoting hypoxia-induced apoptosis observed mainly at very low oxygen concentrations . E2IG5 is localized to mitochondria and facilitates apoptotic cell death via permeability transition, cytochrome c release, and caspase 9 activation . Hypoxia is also known to increase histone H3 methylation through histone methyltransferase G9a . WDR5B encodes for a protein that is the core component of histone methylation complexes, which are essential for histone H3 methylation. Thus, hypoxia might regulate chromatin organization and gene transcription by modulating WDR5B. Finally, the GenBank entry W57613 is part of the signature it is associated with a transcribed hypothetical protein FLJ11267 and was not previously known to be induced by hypoxia.
Hypoxia gene signatures overlapping
The classification performance of our signature is evident in representations that indicate as first approximation the up-regulation of the signature in hypoxic condition. However, the multidimensional visualization is needed to fully appreciate the strong discriminative power of our signature because it takes into account its multivariate nature. In fact, when projecting over the individual probesets of the signature, the two classes are only approximately separated, since they appear either partially overlapping or very close. Indeed, since the l1-l2 regularization is a multivariate method, there is no need to expect a single probeset to have perfect discriminatory power on the classes, but one has to take into account the 11-dimensional model. While the normoxic cell lines are highly grouped and close to low expression values, the hypoxic lines are well spread over the multidimensional space, though well separated by the normoxic ones. Again, this behavior can be detected only by means of a multivariate analysis, since the analysis of individually regulated genes allows detecting only those probesets which multidimensional representation would see the hypoxic cell lines very well lumped together.
The advances in genome biology provide a growing and impressive amount of data. The challenge is to unmask specific, biologically relevant gene clusters that may be hidden by the disturbance of changes in an overwhelming number of unrelated genes. Our study demonstrated that the l1-l2 regularization framework is able to discriminate between two statuses of a cell that, albeit biologically very different, does not elicit a modulation of gene expression comparable in magnitude to that induced, for example, by genetic alterations. This scenario mimics the situation occurring in the tumor mass in which the signal will be perceived by cell differing in their genetic makeup, differentiation and progression in the cell cycle. The strategy described here can be readily applied to the detection of the response to other environmental signals such as small metabolites or pH changes to allow the creation of a database of tissue environment related variables that will ultimately be a great asset in unraveling the biology of the tumor and the possibly the description of better prognostic signatures.
Cells and culture conditions
The human neuroblastoma cell lines GI-LI-N, ACN, GI-ME-N, IMR-32, LAN-1, SK-N-BE(2)C, SK-N-F1, and SK-N-SH were purchased from the Interlab Cell Line Collection and SHEP-2 was kindly provided by Dr. Schwab (Division of Tumour Genetics, German Cancer Research Centre, Heidelberg, Germany). The cell lines were cultured in RPMI 16140 (Euroclone Ltd., Celbio, Milan, Italy), supplemented with 10% heat-inactivated fetal bovine serum (Sigma, Milan Italy), 2 mmol/L L-glutamine, 10 mM Hepes, 100 units/mL penicillin, and 100 μg/mL streptomycin (Euroclone Ltd), at 37°C in a humidified incubator containing 20% O2, 5% CO2, and 75% N2. Hypoxic conditions (1% O2) were achieved by culturing the cells in an anaerobic workstation incubator (BUG BOX, Jouan, ALC International S.r.l., Cologno Monzese, Milano, Italy) flushed with a gas mixture containing 1% O2, 5% CO2, and balanced N2 at 37°C in a humidified atmosphere. Oxygen tension in the medium was measured with a portable, trace oxygen analyzer (Oxi 315i/set, WTW; VWR International, Milano, Italy).
Western blot analysis was done as detailed in . Briefly, total cell lysates (100 μg) were electrophoresed on a 8% SDS-PAGE and electroblotted to Immobilon-P nitrocellulose membranes (Millipore, Billerica, MA). Immunoblotting was done with anti-HIF-1α mouse monoclonal antibody (BD Biosciences, San Jose, CA). An anti-β-actin mAb (Sigma) was used as an internal control for loading. Detection was carried out by enhanced chemiluminescence (Pierce, Rockford, IL) with peroxidase-conjugated goat anti-mouse or anti-rabbit antibodies (Sigma). Quantitative assessment of band intensities was carried out with the VersaDoc Image Analyzer (Bio-Rad, Hercules, CA).
RNA extraction and northern blotting
Total RNA was extracted from cell lines using Trizol (Invitrogen Life technologies, Irvine, CA) according to the manufacturer's instructions. RNA was resuspended in diethyl pyrocarbonate-treated H2O (DEPC water), the physical quality control of RNA integrity was carried out by electrophoresis using Agilent Bioanalyzer 2100 (Agilent Technologies Waldbronn, Germany) and quantified by NanoDrop (NanoDrop Technologies Wilmington, Delawere USA). 2 μg of total RNA from each sample were electrophoresed under denaturing conditions on a 1.2% agarose gel containing 2.2 mol/L formaldehyde and transferred to Nytran membranes. A RNA marker was run in parallel as a molecular-sized standard. Filter hybridization was done with 2 × 106 cpm/mL of 5'-[α32P]dCTP-labeled human MYCN cDNA, in Hybrisol I hybridization solution (Oncor, Gaithersburg, MD) as described previously . Blots were autoradiographed with Kodak XAR-5 film (Eastman Kodak, Rochester, NY), and quantitative assessment of the band intensities was carried out with the VersaDoc Image Analyzer (Bio-Rad Laboratories, Hercules, CA).
Total RNA from neuroblastoma cell lines in normoxic and hypoxic conditions was reverse transcribed into cDNA and biotin labeled according to the Affymetrix instructions (Affymetrix, SantaClara, CA). Biotin-labeled cRNA was cleaned up with the Qiagen RNeasy Mini kit and ethanol precipitation, checked for quality with Agilent Bioanalyzer 2100, and fragmented by incubation at 94°C for 35 min in 40 mmol/L Tris-acetate (pH 8.1), 100 mmol/L potassium acetate, and 30 mmol/L magnesium acetate. Fragmented cRNA was used for hybridization to Affymetrix HG-U133 Plus 2.0 arrays. GeneChips were scanned using an Affymetrix GeneChip Scanner 3000. All microarrays were examined for surface defects, grid placement, background intensity, housekeeping gene expression, and a 3':5' ratio of probe sets from genes of various lengths. Gene expressions were then extracted from CEL files and normalized using the Robust Multichip Average (RMA) method  by running a R script using the Bioconductor  package affy .
The complete data set for each microarray experiments (accession number GSE15583) was uploaded in the Gene Expression Omnibus public repository at National Center for Biotechnology Information.
We adopted three clustering methods, k-means, hierarchical clustering and spectral clustering . K-means and hierarchical clustering require defining a similarity measure (or a distance) between points and a corresponding distance between a point and a cluster. K-means procedure follows a simple and easy way to classify a given data set through a certain number of clusters (assuming k clusters) fixed a priori. The main idea is to define k centroids, one for each cluster. These centroids should be placed in a cunning way because of different location causes different results. Hierarchical clustering proceeds by agglomerating into clusters points that are similar to each other or similar to a previously found group of points. The algorithm stops when all the points and the clusters collapse in a single cluster. The user is asked to decide when to stop the procedure hence defining the number of clusters to be found. The results of the clustering procedure can be visualized using the so called dendrogram. We used correlation distance as a similarity measure among points and complete linkage as clusters distance. Spectral clustering is based on the idea of recursively dividing the data into homogenous clusters. As the number of data increases the obtained clusters converge to an ideal/optimal clustering. The algorithm requires defining a similarity among the examples in the data set. Such a similarity function is used to build the so called Graph Laplacian. If we denote with W the n × n similarity matrix among the examples in the data set and D the diagonal matrix whose entries are the sum of the rows in W, then the Graph Laplacian is defined as L = I-W/D. This latter is a n × n matrix whose eigenvectors have special meaning. The second eigenvector in fact allows partitioning the data in two disjoint sets. Each entry of the vector is associated to an example. Examples corresponding to positive entries are assigned to a cluster and examples corresponding to negative entries are assigned to another cluster. More clusters are defined looking at the following eigenvectors (multiway spectral clustering) or recursively applying the procedure on each cluster separately. The eigenvectors of the Graph Laplacian has a further property. Similarly to principal components analysis (PCA) they can be used to perform dimensionality reduction. The corresponding procedure is known as Laplacian-eigenmaps. Differently to PCA where the data are projected on the directions of maximal variance, in Laplacian-eigenmaps the first eigenvectors entail the direction preserving the distance among the examples. This last property makes Laplacian-eigenmap an ideal tool for data visualization. In spectral clustering, for each sample we evaluate the average of the distances with its 5 nearest neighbors and select σ as the average over all the considered samples
Supervised methods for gene selection - l1-l2 regularization
where the least square error is penalized with the l1 and l2 norm of the coefficient vector. The least square term ensures fitting of the data whereas adding the two penalties allows to avoid over-fitting. The relative weight of the two terms is controlled by the parameter ε. The role of the two penalties is different, the l1 term (sum of absolute values) enforces the solution to be sparse, the l2 term (sum of the squares) preserves correlation among the genes. This approach guarantees consistency of the estimator  and enforces the sparsity of the solution by the l1term, while preserving correlation among input variables with the l2 term. Differently to  we follow the approach proposed in , where the solution βl 1l 2, computed through the simple iterative soft-thresholding, is followed by a second optimization, namely regularized least squares (RLS), to estimate the classifier on the selected features. The parameter ε in the l1-l2 regularization is fixed a priori and governs the amount of correlation. By tuning ε we obtain a one-parameter family of solutions which are all equivalent in terms of prediction accuracy, but differ on the degree of correlation among the selected features. The training for selection and classification requires the choice of the regularization parameters for both l1-l2 regularization and RLS denoted with λ* and τ*, respectively. Hence, statistical significance and model selection is performed within double selection bias free cross validation loops (see  for details). In order to assess a common list of probesets, it is necessary to choose an appropriate criterion . We based ours on the frequency, i.e. we decided to promote as relevant variables the most stable probesets across the lists. The complete validation framework comprising the l1-l2 regularization is implemented in MATLAB code (available at http://slipguru.disi.unige.it)
Univariate analysis via hypotheses test
We test the hypothesis of equal distribution of the probesets in the two different statuses by means of t-statistic. We correct for multiple hypothesis testing with Benjamini and Hochberg method for controlling the False Discovery Rate .
To assess the generalization properties of the signature, we used the publicly available gene expression profile datasets of immature dendritic cells (GEO accession number: GSE6863) and astrocytes (GSE3045) cultured under normoxic and hypoxic conditions. Both datasets consist of 6 samples (3 hypoxic and 3 normoxic cell lines) and are measured on the Affymetrix HG-U133 Plus 2.0 GeneChip. The dendritic cells dataset was normalized with the RMA method, similarly to the neuroblastoma cell lines. We could not repeat the same procedure for the astrocytes data, since the .CEL files were not available. We therefore had to use the previously normalized intensities published on the Gene Expression Omnibus. As a measure of relevance of our signature we used its prediction accuracy on out-of-sample data. For each dataset, we restricted the expression matrix to a submatrix of the 11 variables of the inferred signature and we estimated the generalization error of a Regularized Least Squares classifier in a leave-one-out cross-validation loop.
Gene Set Enrichment Analysis (GSEA)  was used to determine if the members of our hypoxia gene signature were generally associated with hypoxic status, and was therefore performed on all probesets on the HG-U133 Plus 2.0 GeneChip. A normalized enrichment score (NES) was calculated for the gene set and the statistical significance of the NES was estimated by an empirical permutation test using 1.000 gene permutations to obtain the nominal p-value and a false discovery rate.
The study was supported by the Fondazione Italiana per la Lotta al Neuroblastoma and partially by the FIRB project RBIN04PARL and by the EU Integrated Project Health-e-Child IST-2004-027749. PF is recipient of a Fondazione Italiana per la Lotta al Neuroblastoma fellowship. The authors would like to thank Nicola Rebagliati for useful discussions and Ms Sara Barzaghi for the editorial assistance.
- Semenza G: HIF-1 and tumor progression: pathophysiology and therapeutics. Trends in molecular medicine. 2002, 8: S62-S67. 10.1016/S1471-4914(02)02317-1.View ArticlePubMedGoogle Scholar
- Semenza G: Targeting HIF-1 for cancer therapy. Nat Rev Cancer. 2003, 3: 721-732. 10.1038/nrc1187.View ArticlePubMedGoogle Scholar
- Carmeliet P, Dor Y, Herbert JM, Fukumura D, Brusselmans K, Dewerchin M, Neeman M, Bono F, Abramovitch R, Maxwell P, et al: Role of HIF-1alpha in hypoxia-mediated apoptosis, cell proliferation and tumour angiogenesis. Nature. 1998, 394: 485-490. 10.1038/28867.View ArticlePubMedGoogle Scholar
- Harris AL: Hypoxia--a key regulatory factor in tumour growth. Nat Rev Cancer. 2002, 2: 38-47. 10.1038/nrc704.View ArticlePubMedGoogle Scholar
- Bosco MC, Puppo M, Santangelo C, Anfosso L, Pfeffer U, Fardin P, Battaglia F, Varesio L: Hypoxia modifies the transcriptome of primary human monocytes: modulation of novel immune-related genes and identification of CC-chemokine ligand 20 as a new hypoxia-inducible gene. J Immunol. 2006, 177: 1941-1955.View ArticlePubMedGoogle Scholar
- Ricciardi A, Elia AR, Cappello P, Puppo M, Vanni C, Fardin P, Eva A, Munroe D, Wu X, Giovarelli M, et al: Transcriptome of hypoxic immature dendritic cells: modulation of chemokine/receptor expression. Mol Cancer Res. 2008, 6: 175-185. 10.1158/1541-7786.MCR-07-0391.View ArticlePubMedGoogle Scholar
- Carta L, Pastorino S, Melillo G, Bosco M, Massazza S, Varesio L: Engineering of macrophages to produce IFN-gamma in response to hypoxia. J Immunol. 2001, 166: 5374-5380.View ArticlePubMedGoogle Scholar
- Talks KL, Turley H, Gatter KC, Maxwell PH, Pugh CW, Ratcliffe PJ, Harris AL: The expression and distribution of the hypoxia-inducible factors HIF-1alpha and HIF-2alpha in normal human tissues, cancers, and tumor-associated macrophages. Am J Pathol. 2000, 157: 411-421.PubMed CentralView ArticlePubMedGoogle Scholar
- Melillo G, Musso T, Sica A, Taylor L, Cox G, Varesio L: A hypoxia-responsive element mediates a novel pathway of activation of the inducible nitric oxide synthase promoter. J Exp Med. 1995, 182: 1683-1693. 10.1084/jem.182.6.1683.View ArticlePubMedGoogle Scholar
- Melillo G, Sausville E, Cloud K, Lahusen T, Varesio L, Senderowicz A: Flavopiridol, a protein kinase inhibitor, down-regulates hypoxic induction of vascular endothelial growth factor expression in human monocytes. Cancer Res. 1999, 59: 5433-5437.PubMedGoogle Scholar
- Vengellur A, Woods BG, Ryan HE, Johnson RS, LaPres JJ: Gene expression profiling of the hypoxia signaling pathway in hypoxia-inducible factor 1alpha null mouse embryonic fibroblasts. Gene Expr. 2003, 11: 181-197. 10.3727/000000003108749062.View ArticlePubMedGoogle Scholar
- Denko NC, Fontana LA, Hudson KM, Sutphin PD, Raychaudhuri S, Altman R, Giaccia AJ: Investigating hypoxic tumor physiology through gene expression patterns. Oncogene. 2003, 22: 5907-5914. 10.1038/sj.onc.1206703.View ArticlePubMedGoogle Scholar
- Chi JT, Wang Z, Nuyten DS, Rodriguez EH, Schaner ME, Salim A, Wang Y, Kristensen GB, Helland A, Borresen-Dale AL, et al: Gene expression programs in response to hypoxia: cell type specificity and prognostic significance in human cancers. PLoS Med. 2006, 3: e47-10.1371/journal.pmed.0030047.PubMed CentralView ArticlePubMedGoogle Scholar
- De Preter K, Vandesompele J, Heimann P, Yigit N, Beckman S, Schramm A, Eggert A, Stallings R, Benoit Y, Renard M, et al: Human fetal neuroblast and neuroblastoma transcriptome analysis confirms neuroblast origin and highlights neuroblastoma candidate genes. Genome Biology. 2006, 7: R84-10.1186/gb-2006-7-9-r84.PubMed CentralView ArticlePubMedGoogle Scholar
- Thiele CJ: Neuroblastoma. Human Cell Culture. Edited by: Master JRW, Palsson B. 1999, London: Kluwer Academic, 21-22.Google Scholar
- Maris J, Hogarty M, Bagatell R, Cohn S: Neuroblastoma. Lancet. 2007, 369: 2106-2120. 10.1016/S0140-6736(07)60983-0.View ArticlePubMedGoogle Scholar
- Weinstein J, Katzenstein H, Cohn S: Advances in the diagnosis and treatment of neuroblastoma. Oncologist. 2003, 8: 278-292. 10.1634/theoncologist.8-3-278.View ArticlePubMedGoogle Scholar
- Jogi A, Ora I, Nilsson H, Lindeheim A, Makino Y, Poellinger L, Axelson H, Pahlman S: Hypoxia alters gene expression in human neuroblastoma cells toward an immature and neural crest-like phenotype. PNAS. 2002, 99: 7021-7026. 10.1073/pnas.102660199.PubMed CentralView ArticlePubMedGoogle Scholar
- Saeys Y, Inza I, Larranaga P: A review of feature selection techniques in bioinformatics. Bioinformatics. 2007, 23: 2507-2517. 10.1093/bioinformatics/btm344.View ArticlePubMedGoogle Scholar
- Ioannidis JPA, Allison DB, Ball CA, Coulibaly I, Cui XQ, Culhane AC, Falchi M, Furlanello C, Game L, Jurman G, et al: Repeatability of published microarray gene expression analyses. Nature Genetics. 2009, 41: 149-155. 10.1038/ng.295.View ArticlePubMedGoogle Scholar
- Ambroise C, McLachlan G: Selection bias in gene extraction on the basis of microarray gene-expression data. Proc Natl Acad Sci USA. 2002, 99: 6562-6566. 10.1073/pnas.102102699.PubMed CentralView ArticlePubMedGoogle Scholar
- Blum AL, Langley P: Selection of relevant features and examples in machine learning. Artif Intell. 1997, 97: 245-271. 10.1016/S0004-3702(97)00063-5.View ArticleGoogle Scholar
- Weston J, Mukherjee S, Chapelle O, Pontil M, Poggio T, Vapnik V: Feature selection for svms. Advances in Neural Information Processing Systems (NIPS). 2000, MIT PressGoogle Scholar
- Forman G: An extensive empirical study of feature selection metrics for text classification. J Mach Learn Res. 2003, 3: 1289-1306. 10.1162/153244303322753670.Google Scholar
- Weston J, Elisseeff A, Schoelkopf B, Tipping M: Use of the zero norm with linear models and kernel methods. J Mach Learn Res. 2003, 3: 1439-1461. 10.1162/153244303322753751.Google Scholar
- John GH, Kohavi R, Pfleger P: Irrelevant features and the subset selection problem. Proc.11th International Conference on Machine Learning. 1994, 121-129.Google Scholar
- Guyon I, Weston J, Barnhill S, Vapnik V: Gene selection for cancer classification using support vector machines. Machine Learning. 2002, 46: 389-422. 10.1023/A:1012487302797.View ArticleGoogle Scholar
- Furlanello C, Serafini M, Merler S, Jurman G: Entropy-based gene ranking without selection bias for the predictive classification of microarray data. BMC Bioinformatics. 2003, 4: 54-10.1186/1471-2105-4-54.PubMed CentralView ArticlePubMedGoogle Scholar
- Freund Y, Schapire RE: A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences. 1997, 55: 119-139. 10.1006/jcss.1997.1504.View ArticleGoogle Scholar
- Hastie T, Tibshirani R, Friedman J: The elements of statistical learning: data mining, inference, and prediction. 2001, New York: Springer-VerlagView ArticleGoogle Scholar
- Breiman L, Stone CJ, Olshen RA, Friedman JH: Classification and Regression Trees. 1984, Belmont, CA: Wadsworth International GroupGoogle Scholar
- Schapire RE, Singer Y: Improved Boosting Using Confidence-rated Predictions. Machine Learning. 1999, 37: 297-336. 10.1023/A:1007614523901.View ArticleGoogle Scholar
- Friedman J, Hastie T, Tibshirani R: Additive logistic regression: A statistical view of boosting. Annals of Statistics. 2000, 28: 337-374. 10.1214/aos/1016218223.View ArticleGoogle Scholar
- Li SZ, Zhang ZQ: FloatBoost learning and statistical face detection. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2004, 26: 1112-1123. 10.1109/TPAMI.2004.68.View ArticlePubMedGoogle Scholar
- Zou H, Hastie T: Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society Series B-Statistical Methodology. 2005, 67: 301-320. 10.1111/j.1467-9868.2005.00503.x.View ArticleGoogle Scholar
- De Mol C, De Vito E, Rosasco L: Elastic Net Regularization in Learning Theory. Journal of Complexity. 25 (2): 201-230. 10.1016/j.jco.2009.01.002.
- Destrero A, Mosci S, De Mol C, Verri A, Odone F: Feature selection for high dimensional data. Computational Management Science. 2008, 6: 25-40. 10.1007/s10287-008-0070-7.View ArticleGoogle Scholar
- De Mol C, Mosci S, Traskine M, Verri A: A Regularized Method for Selecting Nested Groups of Relevant Genes from Microarray Data. Journal of Computational Biology. 2009, 16: 677-690. 10.1089/cmb.2008.0171.View ArticlePubMedGoogle Scholar
- Boon K, Caron H, van Asperen R, Valentijn L, Hermus M, van Sluis P, Roobeek I, Weis I, Voute P, Schwab M, et al: N-myc enhances the expression of a large set of genes functioning in ribosome biogenesis and protein synthesis. EMBO J. 2001, 20: 1383-1393. 10.1093/emboj/20.6.1383.PubMed CentralView ArticlePubMedGoogle Scholar
- Benjamini Y, Hochberg Y: Controlling the False Discovery Rate - A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society Series B-Methodological. 1995, 57: 289-300.Google Scholar
- Barla A, Mosci S, Rosasco L, Verri A: A method for robust variable selection with significance assessment. Proceedings of ESANN; Bruges, Belgium. 2008Google Scholar
- Mense SM, Sengupta A, Zhou M, Lan C, Bentsman G, Volsky DJ, Zhang L: Gene expression profiling reveals the profound upregulation of hypoxia-responsive genes in primary human astrocytes. Physiol Genomics. 2006, 25: 435-449. 10.1152/physiolgenomics.00315.2005.View ArticlePubMedGoogle Scholar
- Fredlund E, Ovenberger M, Borg K, Pahlman S: Transcriptional adaptation of neuroblastoma cells to hypoxia. Biochem Biophys Res Commun. 2008, 366: 1054-1060. 10.1016/j.bbrc.2007.12.074.View ArticlePubMedGoogle Scholar
- Banchereau J, Briere F, Caux C, Davoust J, Lebecque S, Liu YT, Pulendran B, Palucka K: Immunobiology of dendritic cells. Annual Review of Immunology. 2000, 18: 767-811. 10.1146/annurev.immunol.18.1.767.View ArticlePubMedGoogle Scholar
- Jogi A, Vallon-Christersson J, Holmquist L, Axelson H, Borg A, Pahlman S: Human neuroblastoma cells exposed to hypoxia: induction of genes associated with growth, survival, and aggressive behavior. Exp Cell Res. 2004, 295: 469-487. 10.1016/j.yexcr.2004.01.013.View ArticlePubMedGoogle Scholar
- Seagroves TN, Ryan HE, Lu H, Wouters BG, Knapp M, Thibault P, Laderoute K, Johnson RS: Transcription factor HIF-1 is a necessary mediator of the pasteur effect in mammalian cells. Mol Cell Biol. 2001, 21: 3436-3444. 10.1128/MCB.21.10.3436-3444.2001.PubMed CentralView ArticlePubMedGoogle Scholar
- O'Rourke JF, Pugh CW, Bartlett SM, Ratcliffe PJ: Identification of hypoxically inducible mRNAs in HeLa cells using differential-display PCR. Role of hypoxia-inducible factor-1. Eur J Biochem. 1996, 241: 403-410. 10.1111/j.1432-1033.1996.00403.x.View ArticlePubMedGoogle Scholar
- Wouters BG, van den Beucken T, Magagnin MG, Koritzinsky M, Fels D, Koumenis C: Control of the hypoxic response through regulation of mRNA translation. Semin Cell Dev Biol. 2005, 16: 487-501. 10.1016/j.semcdb.2005.03.009.View ArticlePubMedGoogle Scholar
- Wenger R, Stiehl D, Camenisch G: Integration of oxygen signaling at the consensus HRE. Sci STKE. 2005, 2005: re12-10.1126/stke.3062005re12.PubMedGoogle Scholar
- Lee M, Kim J, Suk K, Park J: Identification of the hypoxia-inducible factor 1 alpha-responsive HGTD-P gene as a mediator in the mitochondrial apoptotic pathway. Mol Cell Biol. 2004, 24: 3918-3927. 10.1128/MCB.24.9.3918-3927.2004.PubMed CentralView ArticlePubMedGoogle Scholar
- Chen H, Yan Y, Davidson TL, Shinkai Y, Costa M: Hypoxic stress induces dimethylated histone H3 lysine 9 through histone methyltransferase G9a in mammalian cells. Cancer Res. 2006, 66: 9009-9016. 10.1158/0008-5472.CAN-06-0101.View ArticlePubMedGoogle Scholar
- Kim H, Lee DK, Choi JW, Kim JS, Park SC, Youn HD: Analysis of the effect of aging on the response to hypoxia by cDNA microarray. Mech Ageing Dev. 2003, 124: 941-949. 10.1016/S0047-6374(03)00166-0.View ArticlePubMedGoogle Scholar
- Jiang Y, Zhang W, Kondo K, Klco JM, St Martin TB, Dufault MR, Madden SL, Kaelin WG, Nacht M: Gene expression profiling in a renal cell carcinoma cell line: dissecting VHL and hypoxia-dependent pathways. Mol Cancer Res. 2003, 1: 453-462.PubMedGoogle Scholar
- Manalo DJ, Rowan A, Lavoie T, Natarajan L, Kelly BD, Ye SQ, Garcia JG, Semenza GL: Transcriptional regulation of vascular endothelial cell responses to hypoxia by HIF-1. Blood. 2005, 105: 659-669. 10.1182/blood-2004-07-2958.View ArticlePubMedGoogle Scholar
- Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, et al: Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA. 2005, 102: 15545-15550. 10.1073/pnas.0506580102.PubMed CentralView ArticlePubMedGoogle Scholar
- Battaglia F, Delfino S, Merello E, Puppo M, Piva R, Varesio L, Bosco MC: Hypoxia transcriptionally induces macrophage-inflammatory protein-3 alpha/CCL-20 in primary human mononuclear phagocytes through nuclear factor (NF)-kappa B. Journal of Leukocyte Biology. 2008, 83: 648-662. 10.1189/jlb.0607349.View ArticlePubMedGoogle Scholar
- Rapella A, Negrioli A, Melillo G, Pastorino S, Varesio L, Bosco MC: Flavopiridol inhibits vascular endothelial growth factor production induced by hypoxia or picolinic acid in human neuroblastoma. Int J Cancer. 2002, 99: 658-664. 10.1002/ijc.10392.View ArticlePubMedGoogle Scholar
- Irizarry R, Bolstad B, Collin F, Cope L, Hobbs B, Speed T: Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Res. 2003, 31: e15-10.1093/nar/gng015.PubMed CentralView ArticlePubMedGoogle Scholar
- R Development Core Team: R: A language and environment for statistical. R Foundation for Statistical Computing. 2004Google Scholar
- Gautier L, Cope L, Bolstad BM, Irizarry RA: affy--analysis of Affymetrix GeneChip data at the probe level. Bioinformatics. 2004, 20: 307-315. 10.1093/bioinformatics/btg405.View ArticlePubMedGoogle Scholar
- von Luxburg U: A tutorial on spectral clustering. Statistics and Computing. 2007, 17: 395-416. 10.1007/s11222-007-9033-z.View ArticleGoogle Scholar
- Jurman G, Merler S, Barla A, Paoli S, Galea A, Furlanello C: Algebraic stability indicators for ranked lists in molecular profiling. Bioinformatics. 2007, 24: 258-264. 10.1093/bioinformatics/btm550.View ArticlePubMedGoogle Scholar
- Semenza G: Hypoxia-inducible factor 1: oxygen homeostasis and disease pathophysiology. Trends Mol Med. 2001, 7: 345-350. 10.1016/S1471-4914(01)02090-1.View ArticlePubMedGoogle Scholar
- Kim JW, Tchernyshyov I, Semenza GL, Dang CV: HIF-1-mediated expression of pyruvate dehydrogenase kinase: a metabolic switch required for cellular adaptation to hypoxia. Cell Metab. 2006, 3: 177-185. 10.1016/j.cmet.2006.02.002.View ArticlePubMedGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.