The SOX2 response program in glioblastoma multiforme: an integrated ChIP-seq, expression microarray, and microRNA analysis

Background SOX2 is a key gene implicated in maintaining the stemness of embryonic and adult stem cells. SOX2 appears to re-activate in several human cancers including glioblastoma multiforme (GBM), however, the detailed response program of SOX2 in GBM has not yet been defined. Results We show that knockdown of the SOX2 gene in LN229 GBM cells reduces cell proliferation and colony formation. We then comprehensively characterize the SOX2 response program by an integrated analysis using several advanced genomic technologies including ChIP-seq, microarray profiling, and microRNA sequencing. Using ChIP-seq technology, we identified 4883 SOX2 binding regions in the GBM cancer genome. SOX2 binding regions contain the consensus sequence wwTGnwTw that occurred 3931 instances in 2312 SOX2 binding regions. Microarray analysis identified 489 genes whose expression altered in response to SOX2 knockdown. Interesting findings include that SOX2 regulates the expression of SOX family proteins SOX1 and SOX18, and that SOX2 down regulates BEX1 (brain expressed X-linked 1) and BEX2 (brain expressed X-linked 2), two genes with tumor suppressor activity in GBM. Using next generation sequencing, we identified 105 precursor microRNAs (corresponding to 95 mature miRNAs) regulated by SOX2, including down regulation of miR-143, -145, -253-5p and miR-452. We also show that miR-145 and SOX2 form a double negative feedback loop in GBM cells, potentially creating a bistable system in GBM cells. Conclusions We present an integrated dataset of ChIP-seq, expression microarrays and microRNA sequencing representing the SOX2 response program in LN229 GBM cells. The insights gained from our integrated analysis further our understanding of the potential actions of SOX2 in carcinogenesis and serves as a useful resource for the research community.


Background
The SOX (SRY-like HMG box) gene family represents a family of transcriptional factors characterized by the presence of a homologous sequence called the HMG (high mobility group) box. The HMG box is a DNA binding domain that is highly conserved throughout eukaryotic species. So far, twenty SOX genes have been identified in humans and mice and they can be divided into 10 subgroups on the basis of sequence similarity and genomic organization [1,2]. SOX genes bind to the minor groove in DNA to control diverse developmental processes [3].
SOX2, one of the key members of the SOX family gene, is highly expressed in embryonic stem cells [4]. Recently, Takahashi et al. showed that SOX2 is a key transcription factor, in conjunction with KLF4, OCT4 and c-Myc, whose over expression can induce pluripotency in both mice and human somatic cells [5,6]. SOX2 is one of the four factors (OCT4, SOX2, NANOG, and LIN28) that Yu et al. used to reprogram human somatic cells to pluripotent stem cells that exhibit the essential characteristics of embryonic stem (ES) cells [7]. SOX2 is one of the two factors (SOX2 and OCT4) that were sufficient to generate induced pluripotent stem cells from human cord blood cells [8]. Due to its importance in conferring stemness of cells, the target genes for SOX2 in mouse embryonic stem cells were defined using ChIP-seq technology [9].
SOX2 has also been implicated in several cancers including gastric cancer [10,11], breast cancer [12,13], pancreatic cancer [14], pulmonary non-small cell and neuroendocrine carcinomas [15]. In addition, SOX2 was identified to be a prognostic marker for human esophageal squamous cell carcinoma [16] and rectal cancer [17]. Schmitz et al. found that SOX2 is over expressed in malignant glioma while displaying minimal expression in normal tissues [18]. More recently, Gangemi et al. showed that silencing of the SOX2 in freshly derived glioblastoma tumor-initiating cells (TICs) stopped proliferation and the resulting cells lost tumorigenicity in immunodeficient mice [19]. Ikushima et al. showed that inhibition of TGF-beta signaling drastically deprived tumorigenicity of glioma-initiating cells (GICs) by promoting their differentiation, and that these effects were attenuated in GICs transduced with SOX2 or SOX4 [20]. Taking together, these data suggested that SOX2 is also a key gene in maintaining the stemness of glioma stem cells.
Given that SOX2 is predominantly expressed in embryonic and adult stems cells, including neural progenitor cells, and re-activates in cancers, including malignant gliomas, we hypothesized that the re-activation program of SOX2 may play an important role in the carcinogenesis and maintenance of GBM. Although the SOX2 response program in mouse stem cells was previously defined [9], the re-activation program in cancers such as GBM has not yet been defined. Using ChIP-seq technology, we conducted a genome-wide target identification for SOX2 binding in GBM cells. We generated mRNA expression profiles using the Applied Biosystems' microarray platform and microRNA expression profiles using next-generation sequencing after knockdown of SOX2 expression in GBM cells. An integrated analysis of these data reveals key response programs that potentially play important roles in GBM.

SOX2 affects colony formation and cell proliferation in GBM
We previously completed massively parallel signature sequencing (MPSS) and identified SOX2 as significantly over expressed in GBM tissues compared to normal brain tissues [21]. We identified two MPSS tags that correspond to different polyadenylated isoform, and both are up-regulated in GBM tissues compared to normal brain tissues [21]. Our data is consistent with the observation that SOX2 is widely expressed in gliomas including glioblastomas but not in normal brains except for in ependymal layers [22].
To assess the functional consequences of SOX2, we knocked down the SOX2 gene by siRNAs in the GBM cell line LN229 using SOX2 SiRNAs (Ambion Inc.). As shown in Figure 1A, we were able to knockdown SOX2 almost completely using either of the pre-designed SOX2 siRNAs (s13295 and s13296) from Ambion Inc. (Now Applied Biosystems Inc.).
Knockdown of the SOX2 gene in LN229 cells significantly reduced the numbers of colonies formed as shown in Figure 1B. In three replicate experiments, the colony numbers for the MOCK-knockdown cells were 53.3 (STDEV = 2.5) while that for the SOX2 knockdown were 24.7 (STDEV = 2.5) (T-test P = 0.00015, 2 tails, type 2). Furthermore, knockdown of SOX2 in LN229 cells reduced the numbers of cells, reaching statistical significance at day four (T-test P < 0.001) and further at day six (T-test P = 1.45E-06) by MTT assays ( Figure 1C).

Global identification of SOX2 binding sites in GBM cells by ChIP-seq analysis
In order to understand the genome-wide binding patterns of SOX2, we applied ChIP-seq technology, which is a novel approach for identifying transcription factor binding sites genome-wide [23,24]. We performed replicate SOX2 ChIP and IgG ChIP. After sequencing analysis, we obtained a total of 1,139,535 and 638,279 sequence tags respectively for SOX2 and IgG that can be mapped uniquely to the human genome allowing two mismatches.
Using the SISSRs (Site Identification from Short Sequence Reads) ChIP-seq analysis program [25], we identified a total of 4,883 SOX2 binding regions with a P value < 0.01 using IgG control ChIP-seq data as the negative control (Additional File 1). We randomly picked 15 genes for which the promoter regions are enriched for the SOX2 IP, and we were able to confirm all 15 genes to be enriched in the SOX2 IP DNAs compared to the IgG-IP DNAs using real time quantitative PCR (Figure 2A), suggesting that the false positive rate is negligible in our dataset. There are 4714 SOX2 binding regions that can be mapped to TSS (transcription start site) of 3420 known genes. We calculated the distance of the SOX2 binding regions to TSS (transcription start sites) and then tabulated the frequency across the distance intervals before TSS and after TSS. Figure 2B shows that the peak of the SOX2 binding regions is around the TSS sites. We found that about 13% SOX2 (605 of 4714) binding regions are mapped within 8 kb of TSS ( Figure 2B), and about 25% (1161 binding regions) are mapped > 8 kb 5' distal to the TSS. The rest mapped to > 8 kb downstream of TSS start sites of genes.
To understand the function of the SOX2 binding genes, we performed a GO analysis using the GO miner program using evidence level 3 of molecular functions. The enriched GO terms are shown in Table 1. The top enriched GO terms include GO:0005096 GTPase activator activity, GO:0022843 voltage-gated cation channel activity, GO:0008066 glutamate receptor activity, GO:0005070 SH3 SH2 adaptor activity, GO:0005001 transmembrane receptor protein tyrosine phosphatase activity (Table 1). It should be pointed that expression  ChIP-seq analysis of SOX2 in GBM cells. Quantitative real time PCR for the confirmation of ChIP-seq peaks. Relative amount of PCR products from SOX2-ChIP and IgG-ChIP were shown as bar graph with the amount of IgG-ChIP normalized to 1. Standard deviations were also shown for SOX2-IP. (B) Histograms of SOX2 binding sites around annotated TSS (Transcription start sites) Frequencies of SOX2 island binding were calculated every 10 kilobases (Y-axis). Relative distance to TSS is shown in X-axis, Negative and positive values indicate localization 5' or 3' to TSS respectively. (C). A VENN diagram showing the overlaps of SOX2 targets among human GBM cells, human ES cells and mouse ES cells. (D). The consensus sequence wwTGnwTw with log-likelihood score of 13920.71 identified by the MotifSampler program. The over represented sequences were used as input for the Weblog program (http://weblogo.berkeley.edu/) [53] to display the consensus sequence graphically.  Table 3) that could be assigned to 2,601 genes of the gene using the same criteria. The union of the two lists generated 4,380 genes. Interestingly, the overlapped genes between the two lists is 1105 genes (25.2%) (Additional File 2). The difference could be due to the use of different antibodies, Chen et al. used the SOX2 antibody (sc-17320, Santa Cruz Inc) while Morson et al. used an affinity purified goat polyclonal antibody (AF2018, R&D Systems), or differences in the analysis pipeline and down stream analysis procedures [9,26].
Using the homologene table for human and mouse from NCBI (http://www.ncbi.nlm.nih.gov/homologene), we compared the SOX2 targets that we identified in LN229 cells with the SOX2 targets that were identified in mouse ES cells [26]. We were able to identify 929 human homologues of 1105 mouse SOX2 binding genes from Chen et al's paper, and then were able to identify 233 unique genes (25%) (Additional File 1) that are common to the SOX2 binding gene in the human GBM cells ( Figure. 2C). These suggest that there are common sets of genes regulated by SOX2 in humans and mice, and in ES cells and in cancer cells. However, we identified many SOX2 binding sites that are only present in the glioblastoma cell line, suggesting that SOX2 targets different pathways in the context of cancer cells.
Boyer et al. applied ChIP-chip technology to identify OCT4, SOX2, and NANOG target genes in human embryonic stem cells using a human promoter array [27]. They identified 1,271 of the SOX2 binding promoter regions for known protein-coding genes in human ES cells. In LN229 cells, we found 258 unique genes that overlapped with their data (Additional File 1 and Figure. 2C). Analysis with the Fisher's Exact test (one sided) revealed that the overlap is highly significant (P < 0.001). This suggests that while there is some conservation of the genes regulated by SOX2 in ES cells and GBM cells, there are also differences in SOX2 binding regions between the cells. The difference could be due to several factors. First, there are differences in technologies used. The array designed by Boyer et al. covered the -8 kb to +2 kb region relative to each transcription start site of 18,002 transcription start sites representing 17,917 unique genes. If a SOX2 binding region is outside of the region covered by the printed oligos, or is not on the array, it would be missed by ChIP-chip analysis. However, the ChIP-seq technology is not limited by the probes selected to be printed on a chip, and therefore could identify SOX binding regions further upstream or down stream of genes. Second, the SOX2 response program could be different in different cells (i.e. GBM vs. ES cells). It is possible that different SOX proteins interact selectively with and regulate a unique repertoire of target genes, and the selectivity is dependent on the type of cell in which the protein is expressed. To see whether they were unique functional classification or over representation for the SOX2 targets in GBM cells versus those in human ES cells, we compared 3162 unique SOX2 targets in GBM cells with 817 unique SOX2 targets in human ES cells using GSEA to identify unique over represented GO terms in each set of targets. We found that the unique SOX2 targets in GBM cells were enriched for ion transport, receptor activities, neuron differentiation neurogenesis, etc. (Additional File 3), while the unique SOX2 targets in human ES cells are enriched for macromolecular complex, ion homeostasis, apoptotic program, ATPase activity, etc. (Additional File 4).
We were interested to see whether the genes related to stemness and/or differentiation are SOX2 targets in GBM cells. Using the molecular signature database (http:// www.broadinstitute.org/gsea/msigdb/index.jsp) at the Broad Institute, we found that GO term GO:0045595 (REGULATION OF CELL DIFFERENTIATION) consists of a compiled set of 59 genes related to differentiation. In addition, Ben-Porath et al. curated a gene set with 378 genes over expressed in human embryonic stem cells according to 5 or more out of 20 profiling studies in the Table S1 of their published paper [28]. We found that 17 of 59 genes related to regulation of cell differentiation were SOX2 targets in GBM cells including ACIN1 (apoptotic chromatin condensation inducer 1), BMPR1B (bone morphogenetic protein receptor, type IB), ETS1 (V-ets erythroblastosis virus E26 oncogene homolog 1), SHH (sonic hedgehog homolog, Drosophila), IGFBP3 (insulin-like growth factor binding protein 3) and RUNX1 (Runt-related transcription factor 1) (Additional File 5). In addition, 71 of 378 ES enriched genes were SOX2 targets in GBM cells including CDC20 (cell division cycle 20 homolog of S. cerevisiae), CHEK2 (CHK2 checkpoint homolog of S. pombe), FGF13 (fibroblast growth factor 13), RFC3 [replication factor C (activator 1) 3, 38 kDa] and UTF1 (undifferentiated embryonic cell transcription factor 1) (Additional File 6). However, we did not find OCT4 and NANOG to be SOX2 targets in GBM cells.

Identification of the DNA binding consensus and other known TF binding sites in the SOX2 bound regions
To see whether the human SOX2 binding regions in GBM cells have their own unique and enriched binding motif, we used the MotifSampler program (http://bioinformatics.psb.ugent.be/webtools/plantcare/html/Motif_-Sampler.html) to identify binding consensus sequences enriched in the SOX2 binding regions that we identified. We found a consensus sequence wwTGnwTw with a very high log-likelihood score of 13920.71. The output matrix for this consensus sequence is shown in Additional File 7, and there are 3931 instances of this motif in 2312 SOX2 binding regions (Additional File 8). The consensus logo is shown in Figure 2D.
We were curious whether known TFs could bind to the SOX2 binding regions that we identified and act as SOX2 cooperators for the regulation of gene expression. In order to systematically search for potential bindings of other transcription factors, we used the MotifScanner program (http://bioinformatics.psb.ugent.be/webtools/ plantcare/html/Motif_Sampler.html) and scanned all TF motif matrices (PWM databases) using the human transcription factor subset of the Transfac professional 7.0. Matched matrices with likelihood (LR) ratios of 500 or higher were tabulated and frequencies calculated (Additional File 9). Among the top known TF matrices ( Table  2) that were identified as co-occurred more than 10% of the time with SOX2 in the SOX2-binding region are: the OCT family, the FOX family, the HNF family, the GATA family and several other TF IRF1 (interferon regulatory factor 1), POU1F1 (POU class 1 homeobox 1), TEF1 (TEAD1, TEA domain family member 1), AREB6 (ZEB1, zinc finger E-box binding homeobox 1) and GR (glucocorticoid receptor, also named NR3C1, nuclear receptor subfamily 3, group C, member 1).
POU1F1 is the POU class 1 homeobox 1. OCT family TFs also contain POU domains. These suggest that SOX2 and many POU domain proteins may act together to control gene expression. SOX2 and OCT family TFs such as OCT1 (POU2F1, POU class 2 homeobox 1) and OCT3/4 (POU5F1, POU class 5 homeobox 1) are well known to work synergistically in embryonic stem cells [29,30]. Additionally, we identified novel transcriptional factors in the SOX2 bound regions, including FOX (fork head transcription factor) and HNF (hepatocyte nuclear factor) family proteins. However, its significance remains to be determined. Interestingly, HNF1 (hepatocyte nuclear factor 1) also contains a POU-homeodomain, while HNF3 alpha, which is also named FOXA1, contains a fork head domain [31].
Microarray analysis reveals that SOX2 knockdown reduces the expression of other SOX family members but up-regulates BEX1 and BEX2 We performed microarray analysis comparing SOX2 knockdown and MOCK transfected LN229 cells, and we identified a total of 565 probes (489 known genes with annotations) that were changed > 2 fold between SOX2 knockdown and SOX2-MOCK transfected LN229 cells (Additional File 10). Array analysis confirmed that SOX2 expression was indeed decreased after SOX2 knockdown. Additionally, we confirmed the array data by RT-PCR analysis of 13 randomly selected up-regulated genes after SOX2 knockdown (BIRC3, NGFR, IL8, CRISPD2, CEBPA, NFKB2, MAP3K14, NFKBIE, NR1H3, APIP, SOCS2, PDGFRA, KIT, BEX1, BEX2, IL-6) ( Figure. 3A). Gene Ontology analysis of all GO categories revealed that proteins belonging to these cellular locations GO:0005576 extracellular region and GO:0005796 Golgi lumen are enriched, and that proteins involved in the GO:0042035 regulation of cytokine biosynthetic process (Table 3). Indeed, we found that knockdown of SOX2 increased the expression of several cytokines including IL6 and IL8, IL23, IL24 and IL32 (Table 4), and the expression of two interleukin receptors-interleukin 7 receptor and interleukin 1 receptor-like 1. We also found many interesting families of proteins that were regulated by SOX2 by visual inspection of the gene list.
For example, we noticed that two other SOX family protein, SOX1 and SOX18, also exhibited reduced expression by more than 3 fold after SOX2 knockdown (Table 4). We also found that many members of the protocadherins including protocadherin 9, 10, beta 11, and gamma A3 and C3 were reduced after SOX2 knockdown. However, the expression of protocadherin alpha 4 was increased after SOX2 knockdown (Table 4). Protocadherin are a large family of cadherin-related molecules that are highly expressed in the brain and their expression appears to be developmentally regulated [32].
The expression of many interesting gene families were up regulated after SOX2 knockdown. For example, we found that brain expressed genes BASP1 (brain abundant, membrane attached signal protein 1), BEX1 (brain expressed X-linked 1) and BEX2 (brain expressed X-linked 2) were up regulated after SOX2 knockdown (Table 4). We also found that knocking down SOX2 also increase the expression of many solute carrier family proteins including SLC2A3, SLC3A2, SLC7A1, SLC14A1, SLC22A1 and SLC30A1. These solute carrier proteins transport many important solutes such as urea, glucose, organic cations, dibasic and neutral amino acids, zinc, and cationic amino acids ( Table 4).
The gene regulated by SOX2 could be directly regulated or indirectly regulated. By integrating the array data with the ChIP-seq data, the directly targeted genes of SOX2 can be inferred. We found 88 SOX2regulated genes whose promoters were bound by 127 SOX2 binding regions ( Figure. 3D and Additional File 1). Interesting genes include BCL2 (B-cell CLL/lymphoma 2) and BCL2 interacting protein 3 (BNIP3), four brain and neuron expressed genes BASP1 (brain abundant, membrane attached signal protein 1), NEDD4 (Neural precursor cell expressed, developmentally down-regulated 4), NRG1 (Neuregulin 1) and NEGR1 (Neuronal growth regulator 1), two interleukins IL6 and IL8, two protocadherins PCDH9 and PCDH10, RUNX1 (Runtrelated transcription factor 1, acute myeloid leukemia 1 oncogene), and three solute carrier proteins SLC3A2, SLC7A1 and SLC30A1 (Additional File 11). The VENN diagram also showed that 11 genes were common SOX2 targets for GBM and ES cells and changed in expression after SOX2 knockdown ( Figure. 3D and Table 5). They include EBF3 (early B-cell factor 3), BASP1 (brain abundant, membrane attached signal protein 1), SLC30A1 (solute carrier family 30 (zinc transporter), member 1), and SLC3A2 (solute carrier family 3, member 2). We speculate that these genes may be involved in GBM stem cells. However, further experimentations are necessary to understand the role of these genes in glioma stem cells. We also analyzed the effect of SOX2 on miRNAs using next generation sequencing (Illumina). MicroRNA sequencing is an efficient way to identify known and novel microRNAs that are differentially expressed [33].
After miRNA sequencing and data analysis, we found 105 precursor microRNAs (corresponding to 95 mature miRNAs) that were changed > 2 fold between SOX2 knockdown and SOX2-MOCK transfected LN229 cells (Table 6 and Additional File 12). Six microRNAs including miR-145, -143, -145*, -143*, -253-5p and miR-452, were up regulated after SOX2 knockdown and the rest were down regulated when SOX2 was knocked down. We picked 6 miRNAs to confirm the next generation data using RT-PCR. We confirmed that miR-143 and miR-145 were up regulated after SOX2 knockdown and that miR-146a, -25, -20b and miR-9-1 were down regulated after SOX2 knockdown ( Figure. 3B). Xu et al. showed that miR-145 targets SOX2 and down regulates its expression in human embryonic stem cells [34]. To see whether the same is true in GBM cells, we transfected LN229 GBM cell with miR-145 mimics and we found that miR-145 also decreased SOX2 expression in GBM cells (Figure. 3C). As knocking down SOX2 up regulates miR145 in the RT-PCR and next-generation sequencing data ( Figure. 3B and Table 6), this suggests that SOX2 itself down regulates miR-145. Taken together, SOX2 down regulates miR-145 and miR145 also down regulates SOX2, suggesting that SOX2 and miR145 form a double-negative feedback loop in GBM cells. We also checked to see whether there are SOX2 binding regions in the proximity of miR145 genomic locus. We found that there are no SOX2 binding regions with significant P values (P < 0.01) in the close proximity of the miR145 locus. The closest one is about 23 kb from the miR145 genomic locus. This may suggest that the SOX2 feedback regulation of miR145 is indirect, not resulting from direct binding of the SOX2 to the miR145 genomic region.

Discussion
We applied ChIP-seq technology to identify global SOX2 binding regions in GBM cells. To our best knowledge, this is the first global analysis of SOX2's binding regions in cancer cells. SOX2 encodes a member of the SRYrelated HMG-box (SOX) family of transcription factors. We investigated SOX2's global binding targets by ChIP-Seq analysis, and found that SOX2 binding regions in GBM cells are enriched for AT nucleotides with a consensus sequence wwTGnwTw [w = A or T; its reverse and complement strand wAwnCAww]. The mouse sox2 consensus motif in the mouse ES cells found by Chen et al. has the sequence 5'-CATTGTT-3' [26]. The similarity lies in that both consensus sequences are AT rich sequences with a core TG di-nucleotide flanked by AT rich sequences. The difference may due to the fact that they derived from different types of cells (ES vs. glioma) and species (human vs. mouse).
The AT rich sequence we identified for SOX2 consensus is consistent with previous in vitro studies showing that the HMG domain of SOX proteins binds to the  [37] and they identified a core sequence of an AT rich sequence AACAAT or wwCAAw (w, A or T) for SOX9 binding. The HMG domain in SOX family proteins forms an L-shaped module composed of three helices that binds to DNA in the minor groove. SOX proteins are categorized into Groups A-G based on their sequence homology [38]. SOX2 belongs to Group A (also named SRY) and SOX9 belongs to group E [38]. The amino acid sequence identity of the HMG domain within the same group is high at >90%, however, the amino acid sequence identity between distant groups decreases to~60% [38]. A sequence alignment revealed that SOX2 and SOX9 only have about 61% amino acid sequence identity in the HMG domain. The sequence variations may explain the similar AT rich properties yet different consensus in their binding regions for SOX2 and SOX9. Additional functional binding assays including mutagenesis and footprinting analysis will be needed to confirm the binding activities and specificities. Further experimentation is therefore warranted. It was a surprise to find that about one quarter of genes regulated by SOX2 encompass important GO categories: 196 out of 792 genes (about 25%) were found to have signal transducer activity (GO:0004871), 101 of 410 belong to transmembrane receptor genes (about 25%) (GO: 0004888), and 92 of 365 are kinase genes (about 25%) (GO:0016301). Signal transducer, receptor and kinase genes are important genes that play an essential role in cellular functions and therefore it is not surprising that SOX2 is an essential gene that plays important roles in development and in carcinogenesis.
We found that BEX1 (brain expressed X-linked 1) and BEX2 (brain expressed X-linked 2) were up regulated after SOX2 knockdown (Table 4). We have previously shown that BEX1 and BEX2 are silenced in GBM tumor specimens and exhibited extensive promoter hypermethylation [39]. We demonstrated by in vitro and in a xenograft mouse model that BEX1 or BEX2 possess tumor suppressor activity [39]. Our data suggested that SOX2 might down regulate BEX1 and BEX2 expression, reducing their tumor suppressor activities and thus promoting carcinogenesis. However, we did not find SOX2 binding regions in the BEX1 and BEX2 gene loci, suggesting the down regulation was properly an indirect effect of SOX2 knockdown.
We found that SOX2 also regulates the expression of SOX family protein SOX1 and SOX18 (Table 4). SOX1 plays roles in neural determination and differentiation [40] and is a neural stem cell marker [41]. Bylund et al. showed that sox1, sox2 and sox3 are the transcription factors that keep neural cells undifferentiated by counteracting the activity of proneural proteins [42]. However, the role of SOX1 in GBM has not yet been studied. SOX18 plays important roles in blood vasculature formation [43].
Young et al. assessed the effects of disrupted SOX18 function on MCF-7 human breast cancer and human umbilical vein endothelial cell (HUVEC) proliferation by measuring BrdU incorporation and by MTT assay, cell migration using Boyden chamber assay, and capillary tube formation in vitro [44]. They showed that over expression of wild-type SOX18 promoted capillary tube formation of HUVECs in vitro, whereas expression of dominant-negative SOX18 impaired tube formation of HUVECs [44]. Therefore, SOX18 is a potential target for antiangiogenic therapy of human cancers. The role of SOX18 in GBM has not been studied. Taking together, SOX2 could act through SOX1 and SOX18, and thus play roles in both maintaining stem cell properties of glioma cells and forming tumor vasculature in gliomas, which are two major obstacles preventing us from treating these tumors effectively.
By microRNA sequencing we determined that levels of 105 precursor microRNAs (corresponding to 95 mature miRNAs) are altered in response to SOX2 knockdown (Table 6 and Additional File 12). We showed that SOX2 could down regulate the expression of miR-143 and miR-145. miR-145 was shown to be down regulated in several cancers such as colon cancers [45] and prostate cancers [46], and miR-143 was shown to be down regulated in colon cancers [47] and bladder cancers [48]. The relationship of miR-143 and miR-145 and GBM has not been studied and is worth future investigation.
We further demonstrated that SOX2 and miR-145 form a double-negative regulation loop in GBM cells ( Figure. 4A). Double-negative feedback loop involving microRNAs and their targets have been observed previously [49,50]. A double-negative feedback mechanism has been proposed as a mechanism to form bistability in cellular states. Johnston et al. demonstrated that the stability and irreversibility of the terminal differentiated state of neuronal cells is ensured by a double-negative feedback loop between two microRNAs lsy-6 and mir-273 and their transcription factor targets in the nematode Caenorhabditis elegans [50].   A positive-feedback loop or double-negative feedback loops can convert graded inputs into switch-like, irreversible responses [51]. Such a system will be "bistable" as the system exists almost exclusively in one of two possible states. Bistability has been shown in several signal transduction and transcriptional regulatory events such as the p42 mitogen-activated protein kinase and c-Jun aminoterminal kinase pathways in Xenopus oocytes [51]. For a bistable system with two components A and B, the system will toggle between two stable states: one with A on and B off and one with B on and A off. For example, for the SOX2 and miR-145 bistable system, the system can be on SOX2 on, miR-145 off state or SOX2 off, miR-145 on state ( Figure. 4B). Further experimentation will be necessary to analyze in detail the two cellular states for the SOX2-miR145 double-negative feedback loop in GBM cells.

Conclusion
We have comprehensively characterized the SOX2 response program by integrated analysis using several advanced technologies including ChIP-seq, microarrays and microRNA sequencing. The datasets of ChIP-seq, microarrays and microRNA sequencing of SOX2 response program, which, to our best knowledge, are the first datasets of SOX2 in cancers, will be useful resources for the research community. Furthermore, the insights we gained from our integrated analysis further our understanding of the roles of SOX2 in carcinogenesis.

Methods
Cell culture and functional assays LN229 cells were obtained from the American Type Culture Collection (Manassas, VA) and maintained in DMEM with 10% fetal bovine serum. Cell proliferation was analyzed using the MTT assay kits (Millipore, Billerica, MA) according to the manufacturer's protocol. For Soft agar colony formation assay, cells were trypsinized and counted. 10,000 cells were seeded in six-well plates. After 2 weeks of growth, colonies with a diameter greater than 4 mm were counted. Experiments were performed in quadruplicates [39].

Chromatin immunoprecipition (ChIP) -Sequencing
About 3 × 10 6 LN229 cells were used for chromatin immunoprecipition (ChIP) assay, carried out according to the manufacturer's instructions (Millipore, EZ-Magna ChIP™ A). Antibodies used for ChIP included SOX2 (ab59776, Abcam Inc.) and IgG (sc-2027, Santa Cruz Biotehnology Inc.). For ChIP, SOX2 antibody was tested for its specificity and specific band was found ( Figure  1A). ChIP assay was performed using the ChIP kit (Upstate Biotechnology, Lake Placid, NY) according to the manufacturer's protocol. Briefly, cells were crosslinked by adding fresh formaldehyde to cell culture medium to a final concentration of 1%. Fixation was monitored at 37°C for 10 min. The fixed cells were resuspended in the lysis buffer. Nuclei were collected by centrifugation at 2000 × g, and resuspended in the nuclei lysis buffer. Samples were sonicated on ice to the length of 200-500 base pairs. 5 μg antibody and 50 μl Dynal protein G beads were incubated for 2 hours at 4°C. Sonicated chromatin were incubated with the protein G-antibody complex overnight at 4°C. Precipitated immunocomplex was treated with proteinase K for 2 hours at 65°C, and DNA was purified Qiagen Qiaquick PCR purification kit. ChIP DNA end repairing, adaptor ligation, and amplification were performed as described earlier [23]. Fragments of about 200 bp (without linkers) were isolated from agarose gel and used for sequencing using the Illumina 2 G genetic analyzer. Illumina data analysis pipeline was performed as described [23].  For this manuscript, the same genome build hg18 and its associated annotations were used for all analysis. Sequence reads that map to multiple sites in the human genome were removed. To identify SOX2 binding peaks, we used SISSRs (Site Identification from Short Sequence Reads) (http://www.rajajothi.com/sissrs/) [25] with default parameters with E-value is set to 10, P value set to 0.001. SOX2 ChIP-seq as positive and IgG control ChIP-seq data as negative input.
To calculate the distance to the TSS start site, annotations from the UCSC (hg18) were used. We also took into consideration the direction of the strand when we calculated the distance to TSS. As the SOX2 binding regions is always recorded on the positive strands, for genes mapped to the positive strand, the distance is the end position of the SOX2 binding region minus the TSS start position; for genes mapped to the negative strand, the distance is the TSS start position minus the SOX2 binding region start position.

Validation of ChIP-seq datasets by ChIP-qPCR
We selected a list of binding peaks for validation using quantitative real-time PCR. The primers were listed in the Additional File 13. Three replicates were run. Realtime PCR was performed using the SYBR® Green (Invitrogen) dye detection method on ABI PRISM 7900 HT Sequence Detection System under default conditions: 95°C for 10 min, and 35 cycles of 95°C for 15 s and 55°C for 1 min. Comparative C t method was used for quantification of the transcripts.

Gene Ontology Analysis
High-Throughput GoMiner [52] was used to find statistically over represented Gene Ontology (GO) terms. GO terms of all evidence levels and categories were used for the analysis. The algorithm used by the High-throughput GoMiner [52], which is the one-sided Fisher exact p value corrected for multiple comparisons, was used to calculate the FDR (false discovery rate). To identify over represented GO terms in the 3162 unique SOX2 targets in GBM cells versus the 817 unique SOX2 targets in human ES cells, we GSEA (http://www.broadinstitute. org/gsea/index.jsp). The parameters used were: 1000 permutations using the C5 gene sets (GO gene sets), the diff_of_classes algorithm as metric for ranking genes, weighted enrichment statistic, minimum gene set size of 3. Other parameters were set as default.

Motif scanning and identification
To identify novel motifs, SOX2 binding regions identified by ChIP-seq were extended to 100 bp 5' and 3' and the sequences were retrieved in FASTA format. The sequences were first subjected to the RepeatMask program (http://www.repeatmasker.org/) to mask all human repeats. We used the MotifSampler to find overexpressed motifs in the SOX2 binding regions with the default parameters. The over represented sequences were used as input for the Weblog program (http:// weblogo.berkeley.edu/) [53] to display the consensus sequence graphically. For a systematic search for all potential transcription binding sites, we used the Motifscanner software (http://bioinformatics.psb.ugent.be/ webtools/plantcare/html/Motif_Sampler.html). Human upstream sequences from EPD (The Eukaryotic Promoter Database) (epd_homo_sapiens_499_chromgenes_-non_split_3.bg) were downloaded from the motif scanner web site and used as the background model. The human subset of the Transfac professional 7.0 PWM matrices was used. Matched TF matrices with likelihood ratios (LR) of 500 or higher were tabulated and their frequencies calculated.

Small interfering RNA transfection
SOX2 SiRNAs (Ambion Inc) were used for transient knockdown of SOX2. The SiRNA sequences are: s13295, sense sequence AGUGGAAACUUUUGUCGGATT and antisense sequence, UCCGACAAAAGUUUCCACUCG; S13296, sense sequence ACCAGCGCAUGGACAGU-UATT and anti-sense sequence UAACUGUCCAUGCG-CUGGUTC. LN229 cells were seeded into six well plates, cultured overnight and transfected with SOX2 SiRNAs at a final concentration of 100 nM using Tran-sIT-OT1 (Mirus Bio LLC, Madison WI) according to the manufacturer's instructions. At 72 hours after tansfection, cells were harvested for western blot analysis and for microarray analysis.

Microarray analysis
The Applied Biosystems' microarray platform was used using the standard array hybridization protocol as we described previously [39]. The ABI arrays contain 31,700 60-mer oligonucleotide probes representing 29,098 individual human genes. Two biological replicates were performed including cell transfection and microarray analysis. GeneSpring program were used to analyze the array data. The raw signal intensities individual probes were combined (averaged) based on Celera's Gene ID (ABI's annotation table), and then imported into the GeneSpring program, and data from individual chip were normalized per chip with 75% percentile and normalized per gene by median. To filter out those lowly expressed genes across all chips, only arrays data with a signal to noise (S/N) ratio of > 3 in one of the arms (SOX2 Knockdown or mock knockdown) were used for analysis. Two biological replicates were performed. The normalized data for the replicates were averaged for each gene. To identify differentially expressed genes, a two fold cutoff value was used.

Integrative analysis
The human genome build hg18 and its associated annotations were used for all analysis. The SOX2 binding regions identified by ChIP-seq (Table S2) was annotated with the genome annotation of hg18 for their associated genes using the nearest gene within 50 kb of the TSS (transcription start site) or TES (transcription end site). As the published human ChIP-Chip data in human embryonic stem cells [27] was annotated in 2005 with human genome hg17 when the paper was published, the annotation was outdated. We first converted the SOX2 binding regions coordinates (Table S3 of the Boyer's paper) into the hg18 coordinates using the tool liftover (http://genome.ucsc. edu/cgi-bin/hgLiftOver). We re-annotated the human ES SOX2 binding regions with hg18 annotations using the chromosomal coordinates with the same criteria as we used for the GBM SOX2 ChIP-seq data. There were 1,075 human ES SOX2 binding regions that could be annotated with nearby genes. To identify overlapping genes between SOX2 ChIP-seq and the microarray data, we used the gene symbols from the HUGO gene nomenclature committee (http://www.genenames.org/) to compare the two lists. To compare the human ChIP-seq data with the sox2 targets that were identified in mouse ES cells [26], we used the homologene table for human and mouse from NCBI (http://www.ncbi.nlm.nih.gov/homologene) to identify the human homologues for the mouse sox2 targets and then used the human homologues to compare with the human SOX2 ChIP-seq data.
RNA isolation and Small RNA sequencing RNA was isolated from cells using mirVana™ miRNA Isolation Kit (Ambion, Austin, TX) according to the manufacturer's instructions. Sequencing library preparation was carried out according to Illumina mirna sample preparation protocol. Small RNA samples were sequenced using GA2 sequencer (Illumina) For data analyses, Solexa adapters were first trimmed from raw sequences using custom Perl scripts, and the trimmed sequences were then aligned to known human miRNAs precursors (miRBase release 14) using miRExpress [54]. The -t parameter (alignment identity between query and reference sequences) for miRExpress was set to be 0.9. The expression abundance of corresponding miRNAs were counted by miRExpress and normalized by the counts of trimmed sequences in the library and used for further analysis.
Differentially expressed miRNAs were identified by statistics analysis using edgeR package from bioconductor [55]. The top differentially expressed ones were clustered by MeV [56,57].

Validation of miRNAs expression by RT-PCR
We selected a list of miRNAs for validation using quantitative real-time PCR. The primers were available upon request. Three replicates were run. Real-time PCR was performed using the SYBR® Green (Invitrogen) dye detection method on ABI PRISM 7900 HT Sequence Detection System under default conditions: 95°C for 30 s, and 40 cycles of 95°C for 15 s and 60°C for 20 s. Comparative Ct method was used for quantification of the transcripts.
Transfection of microRNA microRNA-145 precursor mimics was obtained from Ribobio company (Guanzhou China). A scrambled precursor with no homology to the human genome was used as controls. LN229 cells were transfected with the precursor mimics by lipofectamine 2000 (Invitrogen).