The SOX2 response program in glioblastoma multiforme: an integrated ChIP-seq, expression microarray, and microRNA analysis
© Fang et al; licensee BioMed Central Ltd. 2011
Received: 31 August 2010
Accepted: 6 January 2011
Published: 6 January 2011
SOX2 is a key gene implicated in maintaining the stemness of embryonic and adult stem cells. SOX2 appears to be re-activated in several human cancers, including glioblastoma multiforme (GBM); however, the detailed response program of SOX2 in GBM has not yet been defined.
We show that knockdown of the SOX2 gene in LN229 GBM cells reduces cell proliferation and colony formation. We then comprehensively characterize the SOX2 response program by an integrated analysis using several advanced genomic technologies, including ChIP-seq, microarray profiling, and microRNA sequencing. Using ChIP-seq technology, we identified 4883 SOX2 binding regions in the GBM cancer genome. SOX2 binding regions contain the consensus sequence wwTGnwTw, which occurs 3931 times in 2312 SOX2 binding regions. Microarray analysis identified 489 genes whose expression was altered in response to SOX2 knockdown. Interesting findings include that SOX2 regulates the expression of the SOX family proteins SOX1 and SOX18, and that SOX2 down regulates BEX1 (brain expressed X-linked 1) and BEX2 (brain expressed X-linked 2), two genes with tumor suppressor activity in GBM. Using next generation sequencing, we identified 105 precursor microRNAs (corresponding to 95 mature miRNAs) regulated by SOX2, including down regulation of miR-143, -145, -253-5p and miR-452. We also show that miR-145 and SOX2 form a double negative feedback loop in GBM cells, potentially creating a bistable system.
We present an integrated dataset of ChIP-seq, expression microarrays and microRNA sequencing representing the SOX2 response program in LN229 GBM cells. The insights gained from our integrated analysis further our understanding of the potential actions of SOX2 in carcinogenesis and serve as a useful resource for the research community.
The SOX (SRY-like HMG box) gene family represents a family of transcription factors characterized by the presence of a homologous sequence called the HMG (high mobility group) box. The HMG box is a DNA binding domain that is highly conserved throughout eukaryotic species. So far, twenty SOX genes have been identified in humans and mice, and they can be divided into 10 subgroups on the basis of sequence similarity and genomic organization [1, 2]. SOX proteins bind to the minor groove of DNA to control diverse developmental processes.
SOX2, one of the key members of the SOX gene family, is highly expressed in embryonic stem cells. Recently, Takahashi et al. showed that SOX2 is a key transcription factor, in conjunction with KLF4, OCT4 and c-Myc, whose over expression can induce pluripotency in both mouse and human somatic cells [5, 6]. SOX2 is one of the four factors (OCT4, SOX2, NANOG, and LIN28) that Yu et al. used to reprogram human somatic cells to pluripotent stem cells that exhibit the essential characteristics of embryonic stem (ES) cells. SOX2 is also one of the two factors (SOX2 and OCT4) that were sufficient to generate induced pluripotent stem cells from human cord blood cells. Because of its importance in conferring stemness, the target genes of SOX2 in mouse embryonic stem cells were defined using ChIP-seq technology.
SOX2 has also been implicated in several cancers, including gastric cancer [10, 11], breast cancer [12, 13], pancreatic cancer, and pulmonary non-small cell and neuroendocrine carcinomas. In addition, SOX2 was identified as a prognostic marker for human esophageal squamous cell carcinoma and rectal cancer. Schmitz et al. found that SOX2 is over expressed in malignant glioma while displaying minimal expression in normal tissues. More recently, Gangemi et al. showed that silencing of SOX2 in freshly derived glioblastoma tumor-initiating cells (TICs) stopped proliferation and that the resulting cells lost tumorigenicity in immunodeficient mice. Ikushima et al. showed that inhibition of TGF-beta signaling drastically reduced the tumorigenicity of glioma-initiating cells (GICs) by promoting their differentiation, and that these effects were attenuated in GICs transduced with SOX2 or SOX4. Taken together, these data suggest that SOX2 is also a key gene in maintaining the stemness of glioma stem cells.
Given that SOX2 is predominantly expressed in embryonic and adult stem cells, including neural progenitor cells, and is re-activated in cancers, including malignant gliomas, we hypothesized that the re-activation program of SOX2 may play an important role in the carcinogenesis and maintenance of GBM. Although the SOX2 response program in mouse stem cells was previously defined, the re-activation program in cancers such as GBM has not yet been defined. Using ChIP-seq technology, we conducted a genome-wide target identification for SOX2 binding in GBM cells. We generated mRNA expression profiles using the Applied Biosystems microarray platform and microRNA expression profiles using next-generation sequencing after knockdown of SOX2 expression in GBM cells. An integrated analysis of these data reveals key response programs that potentially play important roles in GBM.
SOX2 affects colony formation and cell proliferation in GBM
We previously completed massively parallel signature sequencing (MPSS) and identified SOX2 as significantly over expressed in GBM tissues compared to normal brain tissues. We identified two MPSS tags that correspond to different polyadenylated isoforms, and both are up-regulated in GBM tissues compared to normal brain tissues. Our data are consistent with the observation that SOX2 is widely expressed in gliomas, including glioblastomas, but not in normal brain except in the ependymal layers.
Knockdown of the SOX2 gene in LN229 cells significantly reduced the number of colonies formed, as shown in Figure 1B. In three replicate experiments, the mean colony number was 53.3 (STDEV = 2.5) for the MOCK-knockdown cells and 24.7 (STDEV = 2.5) for the SOX2 knockdown cells (T-test P = 0.00015, 2 tails, type 2). Furthermore, knockdown of SOX2 in LN229 cells reduced the number of cells as measured by MTT assays, reaching statistical significance at day four (T-test P < 0.001) and more strongly at day six (T-test P = 1.45E-06) (Figure 1C).
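The colony-count comparison above can be re-derived from the reported summary statistics alone. The following is a minimal sketch, assuming that "2 tails, type 2" denotes a two-tailed, equal-variance two-sample t test (the usual spreadsheet convention) and that n = 3 per group; the function names are illustrative and not part of the original analysis.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=20000):
    """Two-tailed P value via Simpson integration of the density on [0, |t|]."""
    t = abs(t)
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # probability mass between 0 and |t|
    return max(0.0, 1.0 - 2.0 * central_area)

def pooled_t_test(mean1, sd1, n1, mean2, sd2, n2):
    """Equal-variance two-sample t test computed from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean1 - mean2) / se
    return t, df, two_tailed_p(t, df)

# Reported values: MOCK 53.3 (SD 2.5) vs SOX2 knockdown 24.7 (SD 2.5), n = 3 each
t_stat, dof, p_value = pooled_t_test(53.3, 2.5, 3, 24.7, 2.5, 3)
print(f"t = {t_stat:.2f}, df = {dof}, P = {p_value:.2g}")
```

Under these assumptions the statistic is about 14 on 4 degrees of freedom, and the resulting P value is of the same order as the one reported in the text.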
Global identification of SOX2 binding sites in GBM cells by ChIP-seq analysis
In order to understand the genome-wide binding patterns of SOX2, we applied ChIP-seq technology, which is a novel approach for identifying transcription factor binding sites genome-wide [23, 24]. We performed replicate SOX2 ChIP and IgG ChIP. After sequencing analysis, we obtained a total of 1,139,535 and 638,279 sequence tags respectively for SOX2 and IgG that can be mapped uniquely to the human genome allowing two mismatches.
Of the SOX2 binding regions we identified, 4714 can be mapped to the TSS (transcription start site) of 3420 known genes. We calculated the distance from each SOX2 binding region to the TSS and then tabulated the frequency across distance intervals upstream and downstream of the TSS. Figure 2B shows that the density of SOX2 binding regions peaks around the TSS. About 13% of these binding regions (605 of 4714) map within 8 kb of a TSS (Figure 2B), and about 25% (1161 binding regions) map more than 8 kb 5' distal to the TSS. The rest map more than 8 kb downstream of the TSS.
[Table: GO terms that are enriched in SOX2 binding genes identified by ChIP-seq. Columns include P value (Log10).]
Marson et al. and Chen et al. recently used ChIP-seq to map binding sites of SOX2 and other key TFs in mouse ES cells [9, 26]. Marson et al. identified 4,087 SOX2 binding sites corresponding to 2,884 genes, based on the criterion that a binding site is within 50 kb of the TSS or TES (transcript end site). Chen et al. identified 4,526 SOX2 binding regions (from their Supplementary Table 3) that could be assigned to 2,601 genes using the same criterion. The union of the two lists contains 4,380 genes. Interestingly, the two lists share 1105 genes (25.2%) (Additional File 2). The difference could be due to the use of different antibodies (Chen et al. used the SOX2 antibody sc-17320 from Santa Cruz Inc, while Marson et al. used an affinity purified goat polyclonal antibody, AF2018, from R&D Systems) or to differences in the analysis pipeline and downstream analysis procedures [9, 26].
Using the homologene table for human and mouse from NCBI (http://www.ncbi.nlm.nih.gov/homologene), we compared the SOX2 targets that we identified in LN229 cells with the SOX2 targets that were identified in mouse ES cells. We were able to identify 929 human homologues of the 1105 mouse SOX2 binding genes from Chen et al.'s paper, and of these, 233 unique genes (25%) (Additional File 1) are also SOX2 binding genes in the human GBM cells (Figure 2C). This suggests that there are common sets of genes regulated by SOX2 in humans and mice, and in ES cells and cancer cells. However, we identified many SOX2 binding sites that are only present in the glioblastoma cell line, suggesting that SOX2 targets different pathways in the context of cancer cells.
Boyer et al. applied ChIP-chip technology to identify OCT4, SOX2, and NANOG target genes in human embryonic stem cells using a human promoter array. They identified 1,271 SOX2 binding promoter regions for known protein-coding genes in human ES cells. In LN229 cells, we found 258 unique genes that overlapped with their data (Additional File 1 and Figure 2C). Analysis with Fisher's Exact test (one sided) revealed that the overlap is highly significant (P < 0.001). This suggests that while there is some conservation of the genes regulated by SOX2 in ES cells and GBM cells, there are also differences in SOX2 binding regions between the cells. The difference could be due to several factors. First, there are differences in the technologies used. The array designed by Boyer et al. covered the -8 kb to +2 kb region relative to each of 18,002 transcription start sites representing 17,917 unique genes. If a SOX2 binding region is outside of the region covered by the printed oligos, or is not on the array, it would be missed by ChIP-chip analysis. ChIP-seq technology, in contrast, is not limited by the probes selected to be printed on a chip, and therefore could identify SOX2 binding regions further upstream or downstream of genes. Second, the SOX2 response program could be different in different cells (i.e. GBM vs. ES cells). It is possible that different SOX proteins interact selectively with and regulate a unique repertoire of target genes, and that the selectivity depends on the type of cell in which the protein is expressed.
To see whether there were unique functional classifications or over-represented terms for the SOX2 targets in GBM cells versus those in human ES cells, we compared the 3162 unique SOX2 targets in GBM cells with the 817 unique SOX2 targets in human ES cells using GSEA to identify GO terms that are over represented in each set of targets. We found that the unique SOX2 targets in GBM cells were enriched for ion transport, receptor activities, neuron differentiation, neurogenesis, etc. (Additional File 3), while the unique SOX2 targets in human ES cells were enriched for macromolecular complex, ion homeostasis, apoptotic program, ATPase activity, etc. (Additional File 4).
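The one-sided Fisher's Exact test for a gene-list overlap is equivalent to the upper tail of a hypergeometric distribution. Below is a minimal sketch using the gene counts given in the text (3420 GBM target genes, 258 shared genes, and 258 + 817 = 1075 annotated human ES target genes). The size of the background gene universe is not stated in the text; the value of 17,917 (the number of unique genes on the promoter array) is an assumption made here purely for illustration.

```python
from math import comb

def overlap_p_value(universe, set_a, set_b, overlap):
    """One-sided Fisher exact P: chance of observing >= `overlap` shared genes
    when set_b genes are drawn at random from a universe containing set_a hits."""
    upper = min(set_a, set_b)
    favourable = sum(
        comb(set_a, k) * comb(universe - set_a, set_b - k)
        for k in range(overlap, upper + 1)
    )
    return favourable / comb(universe, set_b)

GBM_GENES = 3420   # genes with SOX2 binding regions in LN229 cells
ES_GENES = 1075    # human ES SOX2 target genes (258 shared + 817 unique)
SHARED = 258       # genes common to both lists
UNIVERSE = 17917   # ASSUMPTION: background size, not given in the text

p = overlap_p_value(UNIVERSE, GBM_GENES, ES_GENES, SHARED)
print(f"one-sided P = {p:.3g}")
```

The result depends strongly on the chosen universe, so a different background (for example, all annotated hg18 genes) would give a different P value.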
We were interested to see whether the genes related to stemness and/or differentiation are SOX2 targets in GBM cells. Using the molecular signature database (http://www.broadinstitute.org/gsea/msigdb/index.jsp) at the Broad Institute, we found that GO term GO:0045595 (REGULATION OF CELL DIFFERENTIATION) consists of a compiled set of 59 genes related to differentiation. In addition, Ben-Porath et al. curated a gene set with 378 genes over expressed in human embryonic stem cells according to 5 or more out of 20 profiling studies in the Table S1 of their published paper . We found that 17 of 59 genes related to regulation of cell differentiation were SOX2 targets in GBM cells including ACIN1 (apoptotic chromatin condensation inducer 1), BMPR1B (bone morphogenetic protein receptor, type IB), ETS1 (V-ets erythroblastosis virus E26 oncogene homolog 1), SHH (sonic hedgehog homolog, Drosophila), IGFBP3 (insulin-like growth factor binding protein 3) and RUNX1 (Runt-related transcription factor 1) (Additional File 5). In addition, 71 of 378 ES enriched genes were SOX2 targets in GBM cells including CDC20 (cell division cycle 20 homolog of S. cerevisiae), CHEK2 (CHK2 checkpoint homolog of S. pombe), FGF13 (fibroblast growth factor 13), RFC3 [replication factor C (activator 1) 3, 38 kDa] and UTF1 (undifferentiated embryonic cell transcription factor 1) (Additional File 6). However, we did not find OCT4 and NANOG to be SOX2 targets in GBM cells.
Identification of the DNA binding consensus and other known TF binding sites in the SOX2 bound regions
To see whether the human SOX2 binding regions in GBM cells have their own unique and enriched binding motif, we used the MotifSampler program (http://bioinformatics.psb.ugent.be/webtools/plantcare/html/Motif_Sampler.html) to identify binding consensus sequences enriched in the SOX2 binding regions that we identified. We found a consensus sequence wwTGnwTw with a very high log-likelihood score of 13920.71. The output matrix for this consensus sequence is shown in Additional File 7, and there are 3931 instances of this motif in 2312 SOX2 binding regions (Additional File 8). The consensus logo is shown in Figure 2D.
[Table: Top known TF binding sites in the SOX2 binding regions. Columns include number of occurrences and percentage of total sites.]
POU1F1 is the POU class 1 homeobox 1. OCT family TFs also contain POU domains. This suggests that SOX2 and many POU domain proteins may act together to control gene expression. SOX2 and OCT family TFs such as OCT1 (POU2F1, POU class 2 homeobox 1) and OCT3/4 (POU5F1, POU class 5 homeobox 1) are well known to work synergistically in embryonic stem cells [29, 30]. Additionally, we identified binding sites for other transcription factors in the SOX2 bound regions, including FOX (fork head transcription factor) and HNF (hepatocyte nuclear factor) family proteins; however, their significance remains to be determined. Interestingly, HNF1 (hepatocyte nuclear factor 1) also contains a POU-homeodomain, while HNF3 alpha, which is also named FOXA1, contains a fork head domain.
Microarray analysis reveals that SOX2 knockdown reduces the expression of other SOX family members but up-regulates BEX1 and BEX2
[Table: Enriched GO terms in biological processes for SOX2 DEGs. Columns include false discovery rate.]
[Table 4: Examples of SOX2 regulated genes and families. Columns include gene description and ratio SOX2_KO/MOCK. Gene descriptions by group:]
SOX family: SRY (sex determining region Y)-box 1; SRY (sex determining region Y)-box 18; SRY (sex determining region Y)-box 2
Brain expressed genes: brain abundant, membrane attached signal protein 1; brain expressed X-linked 2; brain expressed, X-linked 1
G protein-coupled receptor: G protein-coupled receptor 1; G protein-coupled receptor 172B; G protein-coupled receptor 37
Interleukins and their receptors: interleukin 1 receptor-like 1; interleukin 23, alpha subunit p19; interleukin 6 (interferon, beta 2); interleukin 7 receptor
Solute carrier family: solute carrier family 14 (urea transporter), member 1 (Kidd blood group); solute carrier family 2 (facilitated glucose transporter), member 3; solute carrier family 22 (organic cation transporter), member 1; solute carrier family 3 (activators of dibasic and neutral amino acid transport), member 2; solute carrier family 30 (zinc transporter), member 1; solute carrier family 7 (cationic amino acid transporter, y+ system), member 1
Protocadherins: protocadherin alpha 4; protocadherin beta 11; protocadherin gamma subfamily A, 3; protocadherin gamma subfamily C, 3
The expression of many interesting gene families was up regulated after SOX2 knockdown. For example, we found that the brain expressed genes BASP1 (brain abundant, membrane attached signal protein 1), BEX1 (brain expressed X-linked 1) and BEX2 (brain expressed X-linked 2) were up regulated after SOX2 knockdown (Table 4). We also found that knocking down SOX2 increases the expression of many solute carrier family proteins, including SLC2A3, SLC3A2, SLC7A1, SLC14A1, SLC22A1 and SLC30A1. These solute carrier proteins transport many important solutes such as urea, glucose, organic cations, dibasic and neutral amino acids, zinc, and cationic amino acids (Table 4).
[Table: Common SOX2 targets of human GBM and ES cells that changed expression after SOX2 knockdown. Columns include gene description and Mock/SOX2_KO ratios. Gene descriptions:]
activated leukocyte cell adhesion molecule; brain abundant, membrane attached signal protein 1; branched chain aminotransferase 1, cytosolic; collagen, type XII, alpha 1; cystathionase (cystathionine gamma-lyase); early B-cell factor 3; Kruppel-like factor 5 (intestinal); one cut domain, family member 1; parathyroid hormone-like hormone; solute carrier family 30 (zinc transporter), member 1; solute carrier family 3 (activators of dibasic and neutral amino acid transport), member 2
SOX2 and miR145 form a double-negative feedback loop in GBM cells
[Table 6: SOX2 regulated miRNAs identified by next generation sequencing. Columns include precursor miRNA ID and mature miRNA ID.]
Xu et al. showed that miR-145 targets SOX2 and down regulates its expression in human embryonic stem cells. To see whether the same is true in GBM cells, we transfected LN229 GBM cells with miR-145 mimics and found that miR-145 also decreased SOX2 expression in GBM cells (Figure 3C). As knocking down SOX2 up regulates miR-145 in the RT-PCR and next-generation sequencing data (Figure 3B and Table 6), SOX2 itself appears to down regulate miR-145. Taken together, SOX2 down regulates miR-145 and miR-145 down regulates SOX2, suggesting that SOX2 and miR-145 form a double-negative feedback loop in GBM cells. We also checked whether there are SOX2 binding regions in the proximity of the miR-145 genomic locus. We found no SOX2 binding regions with significant P values (P < 0.01) in close proximity to the miR-145 locus; the closest one is about 23 kb away. This may suggest that the SOX2 feedback regulation of miR-145 is indirect, not resulting from direct binding of SOX2 to the miR-145 genomic region.
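To illustrate why a double-negative feedback loop can give rise to two stable states, the following toy model simulates two mutually repressing species. It is a generic textbook-style sketch, not a model fitted to the data: the Hill-type repression terms, the parameter values (alpha, hill) and the dimensionless concentrations are all assumptions chosen only for demonstration.

```python
def simulate(sox2, mir145, alpha=4.0, hill=2, dt=0.01, steps=5000):
    """Euler integration of a symmetric mutual-repression model.

    d(sox2)/dt   = alpha / (1 + mir145**hill) - sox2
    d(mir145)/dt = alpha / (1 + sox2**hill)   - mir145
    All quantities are dimensionless and purely illustrative.
    """
    for _ in range(steps):
        d_sox2 = alpha / (1 + mir145 ** hill) - sox2
        d_mir145 = alpha / (1 + sox2 ** hill) - mir145
        sox2 += dt * d_sox2
        mir145 += dt * d_mir145
    return sox2, mir145

# Same equations, two different starting points
sox2_high = simulate(sox2=2.0, mir145=0.1)
mir145_high = simulate(sox2=0.1, mir145=2.0)
print("start SOX2-biased:    SOX2 = %.2f, miR-145 = %.2f" % sox2_high)
print("start miR-145-biased: SOX2 = %.2f, miR-145 = %.2f" % mir145_high)
```

With these assumed parameters the two starting points settle into opposite states (one with high SOX2 and low miR-145, the other the reverse), which is the behaviour meant by a bistable system.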
We applied ChIP-seq technology to identify global SOX2 binding regions in GBM cells. To the best of our knowledge, this is the first global analysis of SOX2 binding regions in cancer cells. SOX2 encodes a member of the SRY-related HMG-box (SOX) family of transcription factors. We found that SOX2 binding regions in GBM cells are enriched for AT nucleotides, with a consensus sequence wwTGnwTw [w = A or T; its reverse complement is wAwnCAww]. The mouse Sox2 consensus motif found in mouse ES cells by Chen et al. has the sequence 5'-CATTGTT-3'. The two consensus sequences are similar in that both have a core TG dinucleotide flanked by AT rich sequences. The difference may be due to the fact that they were derived from different types of cells (ES vs. glioma) and species (human vs. mouse).
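The degenerate consensus can be made concrete with a small pattern-matching sketch. It expands the IUPAC codes used in the text (w = A or T, n = any base), derives the reverse complement, and counts instances on both strands of a sequence. This is only an illustration of how the motif reads; it is not the MotifSampler procedure, and the function names and test sequence are hypothetical.

```python
import re

# IUPAC codes needed for this motif, and their complements
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "[AT]", "N": "[ACGT]"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "W": "W", "N": "N"}

def reverse_complement(motif):
    """Reverse complement of a motif written with A, C, G, T, W and N."""
    return "".join(COMPLEMENT[base] for base in reversed(motif.upper()))

def motif_regex(motif):
    """Compile a motif to a regex; the lookahead lets overlapping hits count."""
    body = "".join(IUPAC[base] for base in motif.upper())
    return re.compile(f"(?=({body}))")

def count_instances(sequence, motif="wwTGnwTw"):
    """Count motif instances on both strands of `sequence`."""
    sequence = sequence.upper()
    forward = motif_regex(motif)
    reverse = motif_regex(reverse_complement(motif))
    return len(forward.findall(sequence)) + len(reverse.findall(sequence))

print(reverse_complement("wwTGnwTw"))   # reverse complement of the consensus
print(count_instances("GGAATGCTTAGG"))  # hypothetical test sequence
```

Note that a site matching both the motif and its reverse complement at the same position is counted twice by this simple scheme.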
The AT rich sequence we identified for the SOX2 consensus is consistent with previous in vitro studies showing that the HMG domain of SOX proteins binds to the minor groove of DNA through AT rich sequences with a heptamer motif WWCAAAG (W = A or T) [35, 36]. Therefore we have identified AT rich SOX2 specific binding sequences. Before the development of ChIP-chip or ChIP-Seq technologies, Mertin et al. determined the DNA-binding properties of SOX9 using a random oligonucleotide selection assay and identified an AT rich core sequence, AACAAT or wwCAAw (w = A or T), for SOX9 binding. The HMG domain in SOX family proteins forms an L-shaped module composed of three helices that binds to DNA in the minor groove. SOX proteins are categorized into Groups A-G based on their sequence homology. SOX2 belongs to Group B1 (Group A consists of SRY) and SOX9 belongs to Group E. The amino acid sequence identity of the HMG domain within the same group is high, at >90%, whereas the identity between distant groups decreases to ~60%. A sequence alignment revealed that SOX2 and SOX9 have only about 61% amino acid sequence identity in the HMG domain. These sequence variations may explain why the binding regions of SOX2 and SOX9 share AT rich properties yet differ in consensus. Additional functional binding assays, including mutagenesis and footprinting analysis, will be needed to confirm the binding activities and specificities.
It was a surprise to find that about one quarter of the genes in several important GO categories are SOX2 targets: 196 of 792 genes with signal transducer activity (GO:0004871), 101 of 410 transmembrane receptor genes (GO:0004888), and 92 of 365 kinase genes (GO:0016301), about 25% in each case. Signal transducer, receptor and kinase genes play essential roles in cellular functions, which is consistent with SOX2 being an essential gene that plays important roles in development and in carcinogenesis.
We found that BEX1 (brain expressed X-linked 1) and BEX2 (brain expressed X-linked 2) were up regulated after SOX2 knockdown (Table 4). We have previously shown that BEX1 and BEX2 are silenced in GBM tumor specimens and exhibit extensive promoter hypermethylation. We demonstrated in vitro and in a xenograft mouse model that BEX1 and BEX2 possess tumor suppressor activity. Our data suggest that SOX2 might down regulate BEX1 and BEX2 expression, reducing their tumor suppressor activities and thus promoting carcinogenesis. However, we did not find SOX2 binding regions in the BEX1 and BEX2 gene loci, suggesting that their regulation by SOX2 is probably indirect.
We found that SOX2 also regulates the expression of the SOX family proteins SOX1 and SOX18 (Table 4). SOX1 plays roles in neural determination and differentiation and is a neural stem cell marker. Bylund et al. showed that sox1, sox2 and sox3 are the transcription factors that keep neural cells undifferentiated by counteracting the activity of proneural proteins. However, the role of SOX1 in GBM has not yet been studied. SOX18 plays important roles in blood vasculature formation. Young et al. assessed the effects of disrupted SOX18 function on MCF-7 human breast cancer and human umbilical vein endothelial cell (HUVEC) proliferation by measuring BrdU incorporation and by MTT assay, on cell migration using a Boyden chamber assay, and on capillary tube formation in vitro. They showed that over expression of wild-type SOX18 promoted capillary tube formation of HUVECs in vitro, whereas expression of dominant-negative SOX18 impaired tube formation of HUVECs. Therefore, SOX18 is a potential target for antiangiogenic therapy of human cancers. The role of SOX18 in GBM has not been studied. Taken together, SOX2 could act through SOX1 and SOX18, and thus play roles in both maintaining the stem cell properties of glioma cells and forming tumor vasculature in gliomas, which are two major obstacles to treating these tumors effectively.
By microRNA sequencing we determined that the levels of 105 precursor microRNAs (corresponding to 95 mature miRNAs) are altered in response to SOX2 knockdown (Table 6 and Additional File 12). We showed that SOX2 could down regulate the expression of miR-143 and miR-145. miR-145 was shown to be down regulated in several cancers, such as colon cancers and prostate cancers, and miR-143 was shown to be down regulated in colon cancers and bladder cancers. The relationship between miR-143, miR-145 and GBM has not been studied and is worth future investigation.
We have comprehensively characterized the SOX2 response program by integrated analysis using several advanced technologies, including ChIP-seq, microarrays and microRNA sequencing. The ChIP-seq, microarray and microRNA sequencing datasets of the SOX2 response program, which, to the best of our knowledge, are the first such datasets for SOX2 in cancers, will be useful resources for the research community. Furthermore, the insights we gained from our integrated analysis further our understanding of the roles of SOX2 in carcinogenesis.
Cell culture and functional assays
LN229 cells were obtained from the American Type Culture Collection (Manassas, VA) and maintained in DMEM with 10% fetal bovine serum. Cell proliferation was analyzed using MTT assay kits (Millipore, Billerica, MA) according to the manufacturer's protocol. For the soft agar colony formation assay, cells were trypsinized and counted, and 10,000 cells were seeded in six-well plates. After 2 weeks of growth, colonies with a diameter greater than 4 mm were counted. Experiments were performed in quadruplicate.
Chromatin immunoprecipitation (ChIP) - Sequencing
About 3 × 10^6 LN229 cells were used for the chromatin immunoprecipitation (ChIP) assay, carried out according to the manufacturer's instructions (Millipore, EZ-Magna ChIP™ A). Antibodies used for ChIP included SOX2 (ab59776, Abcam Inc.) and IgG (sc-2027, Santa Cruz Biotechnology Inc.). The SOX2 antibody was tested for its specificity, and a specific band was found (Figure 1A). The ChIP assay was performed using the ChIP kit (Upstate Biotechnology, Lake Placid, NY) according to the manufacturer's protocol. Briefly, cells were cross-linked by adding fresh formaldehyde to the cell culture medium to a final concentration of 1%. Fixation was monitored at 37°C for 10 min. The fixed cells were re-suspended in the lysis buffer. Nuclei were collected by centrifugation at 2000 × g and resuspended in the nuclei lysis buffer. Samples were sonicated on ice to a fragment length of 200-500 base pairs. 5 μg of antibody and 50 μl of Dynal protein G beads were incubated for 2 hours at 4°C. Sonicated chromatin was incubated with the protein G-antibody complex overnight at 4°C. The precipitated immunocomplex was treated with proteinase K for 2 hours at 65°C, and DNA was purified using the Qiagen Qiaquick PCR purification kit. ChIP DNA end repair, adaptor ligation, and amplification were performed as described earlier. Fragments of about 200 bp (without linkers) were isolated from an agarose gel and used for sequencing using the Illumina 2 G genetic analyzer. The Illumina data analysis pipeline was run as described. For this manuscript, the same genome build hg18 and its associated annotations were used for all analyses. Sequence reads that map to multiple sites in the human genome were removed. To identify SOX2 binding peaks, we used SISSRs (Site Identification from Short Sequence Reads) (http://www.rajajothi.com/sissrs/) with default parameters (E-value set to 10, P value set to 0.001), using the SOX2 ChIP-seq data as positive input and the IgG control ChIP-seq data as negative input.
To calculate the distance to the TSS, annotations from UCSC (hg18) were used. We also took the direction of the strand into consideration when we calculated the distance to the TSS. As the SOX2 binding regions are always recorded on the positive strand, for genes mapped to the positive strand, the distance is the end position of the SOX2 binding region minus the TSS position; for genes mapped to the negative strand, the distance is the TSS position minus the start position of the SOX2 binding region.
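The strand-aware distance rule described above can be sketched as follows, together with the 8 kb binning used in the Results. Negative distances denote regions upstream (5') of the TSS under this rule. The function names and example coordinates are hypothetical, and treating exactly 8 kb as "within" is an assumption.

```python
from collections import Counter

def distance_to_tss(region_start, region_end, tss, strand):
    """Signed distance from a binding region to a TSS, following the rule in
    the text: region end minus TSS on '+', TSS minus region start on '-'."""
    if strand == "+":
        return region_end - tss
    if strand == "-":
        return tss - region_start
    raise ValueError(f"unknown strand: {strand!r}")

def classify(distance, window=8000):
    """Assign a distance to one of the three intervals used in the Results."""
    if abs(distance) <= window:  # ASSUMPTION: boundary counted as 'within'
        return "within 8 kb"
    return "upstream > 8 kb" if distance < 0 else "downstream > 8 kb"

# Hypothetical regions: (region_start, region_end, tss, strand)
regions = [
    (1000, 1200, 5000, "+"),     # 3.8 kb upstream of a plus-strand gene
    (9000, 9200, 5000, "-"),     # 4 kb upstream of a minus-strand gene
    (40000, 40200, 5000, "+"),   # far downstream of a plus-strand gene
    (40000, 40200, 5000, "-"),   # far upstream of a minus-strand gene
]
tally = Counter(classify(distance_to_tss(*r)) for r in regions)
for interval, n in tally.items():
    print(f"{interval}: {n} ({100 * n / len(regions):.0f}%)")
```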
Validation of ChIP-seq datasets by ChIP-qPCR
We selected a list of binding peaks for validation using quantitative real-time PCR. The primers are listed in Additional File 13. Three replicates were run. Real-time PCR was performed using the SYBR® Green (Invitrogen) dye detection method on an ABI PRISM 7900 HT Sequence Detection System under default conditions: 95°C for 10 min, and 35 cycles of 95°C for 15 s and 55°C for 1 min. The comparative Ct method was used for quantification.
Gene Ontology Analysis
High-Throughput GoMiner was used to find statistically over represented Gene Ontology (GO) terms. GO terms of all evidence levels and categories were used for the analysis. The algorithm used by High-Throughput GoMiner, which is the one-sided Fisher exact p value corrected for multiple comparisons, was used to calculate the FDR (false discovery rate). To identify over represented GO terms in the 3162 unique SOX2 targets in GBM cells versus the 817 unique SOX2 targets in human ES cells, we used GSEA (http://www.broadinstitute.org/gsea/index.jsp). The parameters used were: 1000 permutations using the C5 gene sets (GO gene sets), the diff_of_classes algorithm as the metric for ranking genes, the weighted enrichment statistic, and a minimum gene set size of 3. Other parameters were set to their defaults.
Motif scanning and identification
To identify novel motifs, SOX2 binding regions identified by ChIP-seq were extended by 100 bp in both the 5' and 3' directions, and the sequences were retrieved in FASTA format. The sequences were first subjected to the RepeatMasker program (http://www.repeatmasker.org/) to mask all human repeats. We used MotifSampler with the default parameters to find over-represented motifs in the SOX2 binding regions. The over represented sequences were used as input for the WebLogo program (http://weblogo.berkeley.edu/) to display the consensus sequence graphically. For a systematic search for all potential transcription factor binding sites, we used the Motifscanner software (http://bioinformatics.psb.ugent.be/webtools/plantcare/html/Motif_Sampler.html). Human upstream sequences from EPD (The Eukaryotic Promoter Database) (epd_homo_sapiens_499_chromgenes_non_split_3.bg) were downloaded from the motif scanner web site and used as the background model. The human subset of the Transfac professional 7.0 PWM matrices was used. Matched TF matrices with likelihood ratios (LR) of 500 or higher were tabulated and their frequencies calculated.
Small interfering RNA transfection
SOX2 siRNAs (Ambion Inc) were used for transient knockdown of SOX2. The siRNA sequences are: s13295, sense sequence AGUGGAAACUUUUGUCGGATT and antisense sequence UCCGACAAAAGUUUCCACUCG; S13296, sense sequence ACCAGCGCAUGGACAGUUATT and antisense sequence UAACUGUCCAUGCGCUGGUTC. LN229 cells were seeded into six-well plates, cultured overnight and transfected with SOX2 siRNAs at a final concentration of 100 nM using TransIT-OT1 (Mirus Bio LLC, Madison WI) according to the manufacturer's instructions. At 72 hours after transfection, cells were harvested for western blot analysis and for microarray analysis.
The Applied Biosystems microarray platform was used with the standard array hybridization protocol as we described previously. The ABI arrays contain 31,700 60-mer oligonucleotide probes representing 29,098 individual human genes. Two biological replicates, each including cell transfection and microarray analysis, were performed. The GeneSpring program was used to analyze the array data. The raw signal intensities of individual probes were combined (averaged) based on Celera's Gene ID (ABI's annotation table) and then imported into GeneSpring; data from each chip were normalized per chip to the 75th percentile and per gene to the median. To filter out genes with low expression across all chips, only array data with a signal to noise (S/N) ratio of > 3 in one of the arms (SOX2 knockdown or mock knockdown) were used for analysis. The normalized data for the replicates were averaged for each gene. To identify differentially expressed genes, a two-fold cutoff value was used.
The human genome build hg18 and its associated annotations were used for all analyses. The SOX2 binding regions identified by ChIP-seq (Table S2) were annotated with their associated genes using the hg18 genome annotation, taking the nearest gene within 50 kb of the TSS (transcription start site) or TES (transcription end site). The published human ChIP-chip data from human embryonic stem cells were annotated with human genome build hg17 when the paper was published in 2005, so the annotation was outdated. We first converted the coordinates of the SOX2 binding regions (Table S3 of Boyer et al.'s paper) into hg18 coordinates using the liftOver tool (http://genome.ucsc.edu/cgi-bin/hgLiftOver). We then re-annotated the human ES SOX2 binding regions with hg18 annotations using the chromosomal coordinates and the same criteria as we used for the GBM SOX2 ChIP-seq data. There were 1,075 human ES SOX2 binding regions that could be annotated with nearby genes. To identify overlapping genes between the SOX2 ChIP-seq and microarray data, we used the gene symbols from the HUGO gene nomenclature committee (http://www.genenames.org/) to compare the two lists. To compare the human ChIP-seq data with the Sox2 targets that were identified in mouse ES cells, we used the homologene table for human and mouse from NCBI (http://www.ncbi.nlm.nih.gov/homologene) to identify the human homologues of the mouse Sox2 targets and then compared these human homologues with the human SOX2 ChIP-seq data.
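The core of the expression analysis (per-chip scaling to the 75th percentile, averaging of replicates, and a two-fold cutoff) can be sketched in a few lines. This is a simplified illustration rather than the GeneSpring workflow: the per-gene median normalization and the S/N filter are omitted, the quartile is computed with Python's default (exclusive) method, and all gene names and intensities are hypothetical.

```python
from statistics import mean, quantiles

def normalize_chip(signals):
    """Scale one chip so that its 75th percentile intensity equals 1."""
    p75 = quantiles(list(signals.values()), n=4)[2]
    return {gene: value / p75 for gene, value in signals.items()}

def fold_changes(knockdown_chips, mock_chips):
    """Ratio of replicate-averaged knockdown signal to mock signal per gene."""
    kd = [normalize_chip(chip) for chip in knockdown_chips]
    mock = [normalize_chip(chip) for chip in mock_chips]
    genes = set.intersection(*(set(chip) for chip in kd + mock))
    return {g: mean(c[g] for c in kd) / mean(c[g] for c in mock) for g in genes}

def differentially_expressed(ratios, cutoff=2.0):
    """Keep genes changed at least `cutoff`-fold in either direction."""
    return {g: r for g, r in ratios.items() if r >= cutoff or r <= 1 / cutoff}

# Hypothetical intensities; two biological replicates per arm
mock_chip = {"GENE_A": 100, "GENE_B": 200, "GENE_C": 300, "GENE_D": 400, "GENE_E": 500}
kd_chip = {"GENE_A": 300, "GENE_B": 200, "GENE_C": 300, "GENE_D": 400, "GENE_E": 500}

ratios = fold_changes([kd_chip, kd_chip], [mock_chip, mock_chip])
degs = differentially_expressed(ratios)
print(sorted(degs.items()))
```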
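The nearest-gene annotation rule can be sketched as below. The text does not say which coordinate of a binding region is used for the distance, so measuring from the region midpoint is an assumption; the gene records are hypothetical and are assumed to lie on the same chromosome as the region.

```python
def nearest_gene(region_start, region_end, genes, max_distance=50_000):
    """Return (symbol, distance) for the gene whose TSS or TES lies closest
    to the region, or None if no gene end is within `max_distance` bp.

    `genes` is an iterable of (symbol, tss, tes) on the same chromosome.
    ASSUMPTION: distance is measured from the region midpoint.
    """
    midpoint = (region_start + region_end) // 2
    best = None
    for symbol, tss, tes in genes:
        distance = min(abs(midpoint - tss), abs(midpoint - tes))
        if distance <= max_distance and (best is None or distance < best[1]):
            best = (symbol, distance)
    return best

# Hypothetical genes: GENE_A on the plus strand, GENE_B on the minus strand
genes = [("GENE_A", 100_000, 120_000), ("GENE_B", 300_000, 280_000)]

print(nearest_gene(125_000, 125_200, genes))  # close to the TES of GENE_A
print(nearest_gene(200_000, 200_200, genes))  # more than 50 kb from both genes
```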
RNA isolation and small RNA sequencing
RNA was isolated from cells using the mirVana™ miRNA Isolation Kit (Ambion, Austin, TX) according to the manufacturer's instructions. Sequencing libraries were prepared according to the Illumina miRNA sample preparation protocol. Small RNA samples were sequenced on a GA2 sequencer (Illumina).
For data analysis, Solexa adapters were first trimmed from the raw sequences using custom Perl scripts, and the trimmed sequences were then aligned to known human miRNA precursors (miRBase release 14) using miRExpress. The -t parameter of miRExpress (alignment identity between query and reference sequences) was set to 0.9. The expression abundance of each miRNA was counted by miRExpress, normalized by the number of trimmed sequences in the library, and used for further analysis.
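The trimming and normalization steps can be illustrated with a short Python sketch. The original analysis used custom Perl scripts that are not shown, so everything here is an assumption: the function names, the minimum partial-adapter overlap of 6 nt, exact (mismatch-free) adapter matching, and the per-million scaling are chosen only for illustration, and the adapter sequence is supplied by the caller. The miRExpress alignment itself is not reproduced.

```python
def trim_adapter(read, adapter, min_overlap=6):
    """Remove a 3' adapter from a read.

    A full adapter anywhere in the read is cut together with everything
    after it; otherwise a partial adapter (a prefix of at least
    min_overlap bases) at the very end of the read is removed.
    """
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    longest = min(len(adapter) - 1, len(read))
    for k in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


def normalize_counts(counts, n_trimmed, scale=1_000_000):
    """Scale raw miRNA counts by library size (reads per `scale` trimmed reads)."""
    return {name: count * scale / n_trimmed for name, count in counts.items()}
```

Normalizing by the number of trimmed reads makes libraries of different sequencing depth comparable; the choice of one million as the scale only changes the units, not the ratios between samples.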
Validation of miRNA expression by RT-PCR
We selected a list of miRNAs for validation by quantitative real-time PCR. Primers are available upon request. Three replicates were run. Real-time PCR was performed using the SYBR® Green (Invitrogen) dye detection method on an ABI PRISM 7900 HT Sequence Detection System under default conditions: 95°C for 30 s, followed by 40 cycles of 95°C for 15 s and 60°C for 20 s. The comparative Ct method was used to quantify the transcripts.
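As a reminder of how the comparative Ct calculation works, a minimal Python sketch follows. It uses the common 2^(-ΔΔCt) form, which assumes roughly 100% amplification efficiency; the reference (normalizer) transcript is not named in the text, so the `ref` arguments and the function name are hypothetical.

```python
from statistics import mean


def relative_expression(target_treated, ref_treated, target_control, ref_control):
    """Fold change of a target transcript by the comparative Ct method.

    Each argument is a list of replicate Ct values. The result is
    2 ** -(dCt_treated - dCt_control), where dCt = Ct(target) - Ct(reference).
    """
    d_treated = mean(target_treated) - mean(ref_treated)
    d_control = mean(target_control) - mean(ref_control)
    return 2 ** -(d_treated - d_control)
```

For example, if the target Ct drops by two cycles relative to the reference after treatment, the function returns a four-fold increase.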
Transfection of microRNA
MicroRNA-145 precursor mimics were obtained from RiboBio (Guangzhou, China). A scrambled precursor with no homology to the human genome was used as a control. LN229 cells were transfected with the precursor mimics using Lipofectamine 2000 (Invitrogen).
The microarray data for the SOX2 knockdown and control samples from this study have been submitted to Gene Expression Omnibus (GEO) under accession no. GSE23839. The SOX2 ChIP-seq data from this study have been submitted to GEO under accession no. GSE23795.
SRY: sex determining region Y
shRNA: small hairpin RNA.
This work was supported by grants 2006AA02A303, 2006AA02Z4A2, 2006DFA32950, 2007DFC30360 and 2004CB518707 from the MOST, China to BL. The work was also supported by the Swedish Medical Foundation (GF). The authors declare no conflicts of interest.
- Bowles J, Schepers G, Koopman P: Phylogeny of the SOX family of developmental transcription factors based on sequence and structural indicators. Dev Biol. 2000, 227 (2): 239-255. 10.1006/dbio.2000.9883.
- Schepers GE, Teasdale RD, Koopman P: Twenty pairs of sox: extent, homology, and nomenclature of the mouse and human sox transcription factor gene families. Dev Cell. 2002, 3 (2): 167-170. 10.1016/S1534-5807(02)00223-X.
- Wegner M: From head to toes: the multiple facets of Sox proteins. Nucleic Acids Res. 1999, 27 (6): 1409-1420. 10.1093/nar/27.6.1409.
- Avilion AA, Nicolis SK, Pevny LH, Perez L, Vivian N, Lovell-Badge R: Multipotent cell lineages in early mouse development depend on SOX2 function. Genes Dev. 2003, 17 (1): 126-140. 10.1101/gad.224503.
- Takahashi K, Yamanaka S: Induction of pluripotent stem cells from mouse embryonic and adult fibroblast cultures by defined factors. Cell. 2006, 126 (4): 663-676. 10.1016/j.cell.2006.07.024.
- Takahashi K, Tanabe K, Ohnuki M, Narita M, Ichisaka T, Tomoda K, Yamanaka S: Induction of pluripotent stem cells from adult human fibroblasts by defined factors. Cell. 2007, 131 (5): 861-872. 10.1016/j.cell.2007.11.019.
- Yu J, Vodyanik MA, Smuga-Otto K, Antosiewicz-Bourget J, Frane JL, Tian S, Nie J, Jonsdottir GA, Ruotti V, Stewart R, et al: Induced pluripotent stem cell lines derived from human somatic cells. Science. 2007, 318 (5858): 1917-1920. 10.1126/science.1151526.
- Giorgetti A, Montserrat N, Rodriguez-Piza I, Azqueta C, Veiga A, Izpisua Belmonte JC: Generation of induced pluripotent stem cells from human cord blood cells with only two factors: Oct4 and Sox2. Nat Protoc. 5 (4): 811-820. 10.1038/nprot.2010.16.
- Marson A, Levine SS, Cole MF, Frampton GM, Brambrink T, Johnstone S, Guenther MG, Johnston WK, Wernig M, Newman J, et al: Connecting microRNA genes to the core transcriptional regulatory circuitry of embryonic stem cells. Cell. 2008, 134 (3): 521-533. 10.1016/j.cell.2008.07.020.
- Li XL, Eishi Y, Bai YQ, Sakai H, Akiyama Y, Tani M, Takizawa T, Koike M, Yuasa Y: Expression of the SRY-related HMG box protein SOX2 in human gastric carcinoma. Int J Oncol. 2004, 24 (2): 257-263.
- Park ET, Gum JR, Kakar S, Kwon SW, Deng G, Kim YS: Aberrant expression of SOX2 upregulates MUC5AC gastric foveolar mucin in mucinous cancers of the colorectum and related lesions. Int J Cancer. 2008, 122 (6): 1253-1260. 10.1002/ijc.23225.
- Rodriguez-Pinilla SM, Sarrio D, Moreno-Bueno G, Rodriguez-Gil Y, Martinez MA, Hernandez L, Hardisson D, Reis-Filho JS, Palacios J: Sox2: a possible driver of the basal-like phenotype in sporadic breast cancer. Mod Pathol. 2007, 20 (4): 474-481. 10.1038/modpathol.3800760.
- Chen Y, Shi L, Zhang L, Li R, Liang J, Yu W, Sun L, Yang X, Wang Y, Zhang Y, et al: The molecular mechanism governing the oncogenic potential of SOX2 in breast cancer. J Biol Chem. 2008, 283 (26): 17969-17978. 10.1074/jbc.M802917200.
- Sanada Y, Yoshida K, Ohara M, Oeda M, Konishi K, Tsutani Y: Histopathologic evaluation of stepwise progression of pancreatic carcinoma with immunohistochemical analysis of gastric epithelial transcription factor SOX2: comparison of expression patterns between invasive components and cancerous or nonneoplastic intraductal components. Pancreas. 2006, 32 (2): 164-170. 10.1097/01.mpa.0000202947.80117.a0.
- Sholl LM, Long KB, Hornick JL: Sox2 Expression in Pulmonary Non-small Cell and Neuroendocrine Carcinomas. Appl Immunohistochem Mol Morphol. 2009
- Wang Q, He W, Lu C, Wang Z, Wang J, Giercksky KE, Nesland JM, Suo Z: Oct3/4 and Sox2 are significantly associated with an unfavorable clinical outcome in human esophageal squamous cell carcinoma. Anticancer Res. 2009, 29 (4): 1233-1241.
- Saigusa S, Tanaka K, Toiyama Y, Yokoe T, Okugawa Y, Ioue Y, Miki C, Kusunoki M: Correlation of CD133, OCT4, and SOX2 in Rectal Cancer and Their Association with Distant Recurrence After Chemoradiotherapy. Ann Surg Oncol. 2009
- Schmitz M, Temme A, Senner V, Ebner R, Schwind S, Stevanovic S, Wehner R, Schackert G, Schackert HK, Fussel M, et al: Identification of SOX2 as a novel glioma-associated antigen and potential target for T cell-based immunotherapy. Br J Cancer. 2007, 96 (8): 1293-1301. 10.1038/sj.bjc.6603696.
- Gangemi RM, Griffero F, Marubbi D, Perera M, Capra MC, Malatesta P, Ravetti GL, Zona GL, Daga A, Corte G: SOX2 silencing in glioblastoma tumor-initiating cells causes stop of proliferation and loss of tumorigenicity. Stem Cells. 2009, 27 (1): 40-48. 10.1634/stemcells.2008-0493.
- Ikushima H, Todo T, Ino Y, Takahashi M, Miyazawa K, Miyazono K: Autocrine TGF-beta signaling maintains tumorigenicity of glioma-initiating cells through Sry-related HMG-box factors. Cell Stem Cell. 2009, 5 (5): 504-514. 10.1016/j.stem.2009.08.018.
- Lin B, Madan A, Yoon JG, Fang X, Yan X, Kim TK, Hwang D, Hood L, Foltz G: Massively parallel signature sequencing and bioinformatics analysis identifies up-regulation of TGFBI and SOX4 in human glioblastoma. PLoS One. 2010, 5 (4): e10210. 10.1371/journal.pone.0010210.
- Phi JH, Park SH, Kim SK, Paek SH, Kim JH, Lee YJ, Cho BK, Park CK, Lee DH, Wang KC: Sox2 expression in brain tumors: a reflection of the neuroglial differentiation pathway. Am J Surg Pathol. 2008, 32 (1): 103-112. 10.1097/PAS.0b013e31812f6ba6.
- Barski A, Cuddapah S, Cui K, Roh TY, Schones DE, Wang Z, Wei G, Chepelev I, Zhao K: High-resolution profiling of histone methylations in the human genome. Cell. 2007, 129 (4): 823-837. 10.1016/j.cell.2007.05.009.
- Johnson DS, Mortazavi A, Myers RM, Wold B: Genome-wide mapping of in vivo protein-DNA interactions. Science. 2007, 316 (5830): 1497-1502. 10.1126/science.1141319.
- Jothi R, Cuddapah S, Barski A, Cui K, Zhao K: Genome-wide identification of in vivo protein-DNA binding sites from ChIP-Seq data. Nucleic Acids Res. 2008, 36 (16): 5221-5231. 10.1093/nar/gkn488.
- Chen X, Xu H, Yuan P, Fang F, Huss M, Vega VB, Wong E, Orlov YL, Zhang W, Jiang J, et al: Integration of external signaling pathways with the core transcriptional network in embryonic stem cells. Cell. 2008, 133 (6): 1106-1117. 10.1016/j.cell.2008.04.043.
- Boyer LA, Lee TI, Cole MF, Johnstone SE, Levine SS, Zucker JP, Guenther MG, Kumar RM, Murray HL, Jenner RG, et al: Core transcriptional regulatory circuitry in human embryonic stem cells. Cell. 2005, 122 (6): 947-956. 10.1016/j.cell.2005.08.020.
- Ben-Porath I, Thomson MW, Carey VJ, Ge R, Bell GW, Regev A, Weinberg RA: An embryonic stem cell-like gene expression signature in poorly differentiated aggressive human tumors. Nat Genet. 2008, 40 (5): 499-507. 10.1038/ng.127.
- Okumura-Nakanishi S, Saito M, Niwa H, Ishikawa F: Oct-3/4 and Sox2 regulate Oct-3/4 gene in embryonic stem cells. J Biol Chem. 2005, 280 (7): 5307-5317. 10.1074/jbc.M410015200.
- Williams DC, Cai M, Clore GM: Molecular basis for synergistic transcriptional activation by Oct1 and Sox2 revealed from the solution structure of the 42-kDa Oct1.Sox2.Hoxb1-DNA ternary transcription factor complex. J Biol Chem. 2004, 279 (2): 1449-1457. 10.1074/jbc.M309790200.
- Costa RH, Kalinichenko VV, Holterman AX, Wang X: Transcription factors in liver development, differentiation, and regeneration. Hepatology. 2003, 38 (6): 1331-1347.
- Sano K, Tanihara H, Heimark RL, Obata S, Davidson M, St John T, Taketani S, Suzuki S: Protocadherins: a large family of cadherin-related molecules in central nervous system. EMBO J. 1993, 12 (6): 2249-2256.
- Morin RD, Zhao Y, Prabhu AL, Dhalla N, McDonald H, Pandoh P, Tam A, Zeng T, Hirst M, Marra M: Preparation and analysis of microRNA libraries using the Illumina massively parallel sequencing technology. Methods Mol Biol. 650: 173-199.
- Xu N, Papagiannakopoulos T, Pan G, Thomson JA, Kosik KS: MicroRNA-145 regulates OCT4, SOX2, and KLF4 and represses pluripotency in human embryonic stem cells. Cell. 2009, 137 (4): 647-658. 10.1016/j.cell.2009.02.038.
- Harley VR, Jackson DI, Hextall PJ, Hawkins JR, Berkovitz GD, Sockanathan S, Lovell-Badge R, Goodfellow PN: DNA binding activity of recombinant SRY from normal males and XY females. Science. 1992, 255 (5043): 453-456. 10.1126/science.1734522.
- van de Wetering M, Clevers H: Sequence-specific interaction of the HMG box proteins TCF-1 and SRY occurs within the minor groove of a Watson-Crick double helix. EMBO J. 1992, 11 (8): 3039-3044.
- Mertin S, McDowall SG, Harley VR: The DNA-binding specificity of SOX9 and other SOX proteins. Nucleic Acids Res. 1999, 27 (5): 1359-1364. 10.1093/nar/27.5.1359.
- Kamachi Y, Uchikawa M, Kondoh H: Pairing SOX off: with partners in the regulation of embryonic development. Trends Genet. 2000, 16 (4): 182-187. 10.1016/S0168-9525(99)01955-1.
- Foltz G, Ryu GY, Yoon JG, Nelson T, Fahey J, Frakes A, Lee H, Field L, Zander K, Sibenaller Z, et al: Genome-wide analysis of epigenetic silencing identifies BEX1 and BEX2 as candidate tumor suppressor genes in malignant glioma. Cancer Res. 2006, 66 (13): 6665-6674. 10.1158/0008-5472.CAN-05-4453.
- Pevny LH, Sockanathan S, Placzek M, Lovell-Badge R: A role for SOX1 in neural determination. Development. 1998, 125 (10): 1967-1978.
- Alcock J, Sottile V: Dynamic distribution and stem cell characteristics of Sox1-expressing cells in the cerebellar cortex. Cell Res. 2009, 19 (12): 1324-1333. 10.1038/cr.2009.119.
- Bylund M, Andersson E, Novitch BG, Muhr J: Vertebrate neurogenesis is counteracted by Sox1-3 activity. Nat Neurosci. 2003, 6 (11): 1162-1168. 10.1038/nn1131.
- Francois M, Caprini A, Hosking B, Orsenigo F, Wilhelm D, Browne C, Paavonen K, Karnezis T, Shayan R, Downes M, et al: Sox18 induces development of the lymphatic vasculature in mice. Nature. 2008, 456 (7222): 643-647. 10.1038/nature07391.
- Young N, Hahn CN, Poh A, Dong C, Wilhelm D, Olsson J, Muscat GE, Parsons P, Gamble JR, Koopman P: Effect of disrupted SOX18 transcription factor function on tumor growth, vascularization, and endothelial development. J Natl Cancer Inst. 2006, 98 (15): 1060-1067. 10.1093/jnci/djj299.
- Arndt GM, Dossey L, Cullen LM, Lai A, Druker R, Eisbacher M, Zhang C, Tran N, Fan H, Retzlaff K, et al: Characterization of global microRNA expression reveals oncogenic potential of miR-145 in metastatic colorectal cancer. BMC Cancer. 2009, 9: 374. 10.1186/1471-2407-9-374.
- Zaman MS, Chen Y, Deng G, Shahryari V, Suh SO, Saini S, Majid S, Liu J, Khatri G, Tanaka Y: The functional significance of microRNA-145 in prostate cancer. Br J Cancer. 103 (2): 256-264. 10.1038/sj.bjc.6605742.
- Akao Y, Nakagawa Y, Naoe T: MicroRNA-143 and -145 in colon cancer. DNA Cell Biol. 2007, 26 (5): 311-320. 10.1089/dna.2006.0550.
- Lin T, Dong W, Huang J, Pan Q, Fan X, Zhang C, Huang L: MicroRNA-143 as a tumor suppressor for bladder cancer. J Urol. 2009, 181 (3): 1372-1380. 10.1016/j.juro.2008.10.149.
- Bracken CP, Gregory PA, Kolesnikoff N, Bert AG, Wang J, Shannon MF, Goodall GJ: A double-negative feedback loop between ZEB1-SIP1 and the microRNA-200 family regulates epithelial-mesenchymal transition. Cancer Res. 2008, 68 (19): 7846-7854. 10.1158/0008-5472.CAN-08-1942.
- Johnston RJ, Chang S, Etchberger JF, Ortiz CO, Hobert O: MicroRNAs acting in a double-negative feedback loop to control a neuronal cell fate decision. Proc Natl Acad Sci USA. 2005, 102 (35): 12449-12454. 10.1073/pnas.0505530102.
- Ferrell JE: Self-perpetuating states in signal transduction: positive feedback, double-negative feedback and bistability. Curr Opin Cell Biol. 2002, 14 (2): 140-148. 10.1016/S0955-0674(02)00314-9.
- Zeeberg BR, Qin H, Narasimhan S, Sunshine M, Cao H, Kane DW, Reimers M, Stephens RM, Bryant D, Burt SK, et al: High-Throughput GoMiner, an 'industrial-strength' integrative gene ontology tool for interpretation of multiple-microarray experiments, with application to studies of Common Variable Immune Deficiency (CVID). BMC Bioinformatics. 2005, 6: 168. 10.1186/1471-2105-6-168.
- Crooks GE, Hon G, Chandonia JM, Brenner SE: WebLogo: a sequence logo generator. Genome Res. 2004, 14 (6): 1188-1190. 10.1101/gr.849004.
- Wang WC, Lin FM, Chang WC, Lin KY, Huang HD, Lin NS: miRExpress: analyzing high-throughput sequencing data for profiling microRNA expression. BMC Bioinformatics. 2009, 10: 328. 10.1186/1471-2105-10-328.
- Robinson MD, McCarthy DJ, Smyth GK: edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010, 26 (1): 139-140. 10.1093/bioinformatics/btp616.
- Saeed AI, Sharov V, White J, Li J, Liang W, Bhagabati N, Braisted J, Klapa M, Currier T, Thiagarajan M, et al: TM4: a free, open-source system for microarray data management and analysis. Biotechniques. 2003, 34 (2): 374-378.
- Saeed AI, Bhagabati NK, Braisted JC, Liang W, Sharov V, Howe EA, Li J, Thiagarajan M, White JA, Quackenbush J: TM4 microarray software suite. Methods Enzymol. 2006, 411: 134-193. 10.1016/S0076-6879(06)11009-5.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.