Identification of novel targets for breast cancer by exploring gene switches on a genome scale
© Wu et al; licensee BioMed Central Ltd. 2011
Received: 22 August 2011
Accepted: 3 November 2011
Published: 3 November 2011
Skip to main content
© Wu et al; licensee BioMed Central Ltd. 2011
Received: 22 August 2011
Accepted: 3 November 2011
Published: 3 November 2011
An important feature that emerges from analyzing gene regulatory networks is the "switch-like behavior" or "bistability", a dynamic feature of a particular gene to preferentially toggle between two steady-states. The state of gene switches plays pivotal roles in cell fate decision, but identifying switches has been difficult. Therefore a challenge confronting the field is to be able to systematically identify gene switches.
We propose a top-down mining approach to exploring gene switches on a genome-scale level. Theoretical analysis, proof-of-concept examples, and experimental studies demonstrate the ability of our mining approach to identify bistable genes by sampling across a variety of different conditions. Applying the approach to human breast cancer data identified genes that show bimodality within the cancer samples, such as estrogen receptor (ER) and ERBB2, as well as genes that show bimodality between cancer and non-cancer samples, where tumor-associated calcium signal transducer 2 (TACSTD2) is uncovered. We further suggest a likely transcription factor that regulates TACSTD2.
Our mining approach demonstrates that one can capitalize on genome-wide expression profiling to capture dynamic properties of a complex network. To the best of our knowledge, this is the first attempt in applying mining approaches to explore gene switches on a genome-scale, and the identification of TACSTD2 demonstrates that single cell-level bistability can be predicted from microarray data. Experimental confirmation of the computational results suggest TACSTD2 could be a potential biomarker and attractive candidate for drug therapy against both ER+ and ER- subtypes of breast cancer, including the triple negative subtype.
The expression level of a gene switch does not change gradually but rather has two distinct steady-states: HIGH or LOW, ON or OFF, ALL or NONE. The ability of switches to convert a graded signal into a binary response ensures that a cell responds in a decisive manner or unambiguously commit to a specific program . Furthermore switches have been noted for their noise-filtering capacity. Endogenous noise are typically lower for fully repressed or induced expression states than in a gene where the state changes continuously [4, 5].
Bistable behavior of gene switches have been reported to play pivotal roles in many important aspects of cell physiology, including cell fate decisions, cell cycle control, and cellular responses to environmental stimulation [6, 2]. E. coli lac operon is a famous gene switch that uses a hysteretic feedback to decide between glucose and lactose utilization . Many bistable systems have been discovered in bacteria, including the genetic transformation in Bacilius subtillis and sporulation in many bacterial species . In mammalian systems, gene switches and bistability have been postulated as the underlying mechanism for cellular differentiation, but rarely has this been confirmed experimentally, until recently with the work on neutrophil differentiation . Another interesting observation is that cells have "memory", and hysteresis has been shown to govern short-term memory in lymphoid cells, preserving information of past encounters with antigen . Thus, the discovery of gene switches in cellular responses has become a milestone in molecular biology and prompt strong interest in understanding the function and design of gene networks .
Despite the importance of gene switches, identifying multiple steady-states, and in particular switches, has been difficult. Our understanding of gene switches has been mostly based on simulations of generic feedback circuits and well-characterized biological modules [11–14]. Theoretical studies of feedback circuits have elucidated general principles of network dynamics, but they usually lack solid evidence to associate these principles with real physiological processes in cells. Few studies have succeeded in demonstrating functional roles of actual switches in biological systems by coupling detailed kinetic modeling with rigorous experimentation [15, 10]. This is because well-characterized models with equations and kinetic parameters are difficult to obtain for real, complex biological systems, in part because current techniques are not able to quantitatively measure reaction constants at the single-cell resolution for all the network components. Alternatively, researchers in synthetic biology have designed artificial gene networks with specific functions and implemented the interactions by manipulating or bringing together exogenous genetic components [16–18]. Thus current methods of experimentally studying switches have been limited to well-characterized or synthetic small modules.
Switches play a central role in cell decision, and the ability to predict whether switches can occur without a priori detail information of the network would be significant. For instance, the ability to identify which genes are turned on or off in cancer versus normal cells would have a tremendous impact on identifying the most pertinent molecular signatures or targets for drug therapy. Therefore a major challenge confronting the field, which we address in this study, is how to effectively identify gene switches or bistable states by mining high-throughput data.
Previous approaches addressed this question by analyzing the network topology. These studies assume that bistability requires particular feedback structures [19, 3], and discovered dynamic features by searching for these structures (e.g. positive feedbacks) in protein-protein interaction and protein-DNA interaction networks . However, these feedback structures do not ensure switch-like behavior. From modeling and simulations of genetic circuits, positive feedback (or even feedback itself) has been shown to be neither necessary [21, 6] nor sufficient  to ensure switch-like behavior. Furthermore, it is less likely that one can uncover a dynamic property from static networks.
Alternatively, we theorize that the dynamic "behavior" of a switch could be identified by analyzing the gene expression profiles from a wide range of conditions. We propose a top-down mining approach to identify gene switches from microarray gene expression data. Taking advantage of the tremendous amount of expression data, our approach aims to identify bimodality, which we hypothesize is an essential characteristic of a gene switch. We perform theoretical analysis and provide proof-of-concept applications on both synthetic and yeast microarray datasets. We further apply our methodology on an integrated human expression dataset to probe the characteristic signatures of human cancer and confirm that our approach is able to identify a gene with switch-like behavior. To the best of our knowledge, this is the first attempt at applying mining approaches to explore gene switches on a genome-scale.
Since the state of gene switches in the genetic network governs the phenotype , we postulate that recognizing specific gene switches will enable one to identify biomarkers or molecular signatures that would be better drug targets for treating a disease. We demonstrate the utility of our mining approach in human breast cancer by analyzing a paired breast cancer/normal tissue expression dataset against the integrated human gene expression dataset. We uncover two types of potential gene switches in breast cancer, with one type (denoted as Type 1) showing bimodality within the breast cancer and a second type (denoted as Type 2) showing predominantly one modality in breast cancer.
Known therapeutic targets for breast cancer are uncovered under the Type 1 genes, such as estrogen receptor (ER, or ESR) and human epidermal growth factor receptor 2 (ERBB2, or HER2/neu), which are identified as gene switches for this cancer, and their bimodality in the cancer samples represent well-known subtypes in breast cancer. The other type of gene switch shows predominantly one modality in the breast cancer samples, and is where we discover the TACSTD2 (Trop2) gene. The expression of TACSTD2 is turned OFF in most normal samples but ON in almost all of the breast cancer samples independent of the subtypes. We predict through sequence matching of transcription factor (TF) sites that CREB could regulate TACSTD2, thereby implicating a novel transcriptional mechanism by which TACSTD2 is regulated. Our experimental studies on multiple breast cancer cell lines confirm the switch-like behavior of TACSTD2 and provide evidences for the transcriptional regulation of the gene. These results demonstrate the ability of our mining approach to identify gene switches that could be candidate biomarkers and novel therapeutic targets in breast cancer.
A gene switch has two steady states, which will produce a bimodal distribution in its expression profile when sampled across different conditions. Figures 1-A and 1D show the gene network topology of two typical regulatory circuits that exhibit bistable behavior. Figure 1-A is a positive self-feedback transcriptional system under the control of a transcriptional repressor. Figure 1-D is a double-negative feedback system, also known as a toggle switch, which produces mutually exclusive activation of two genes. Both circuits have been synthesized and implemented in cell systems [24, 16] to confirm their switching behavior. Simulations based on the kinetic models of these systems [24, 16] (see details of the equations in the METHOD) confirm the on/off and toggle-like switching behavior in their response curve (Figure 1-B) and state space (Figure 1-E). By simulating random samples from a wide range of conditions with different initial states, this unique feature of two distinct steady-states of gene switches results in a gene expression histogram profile containing two modes (Figures 1-C and 1-F). This bimodality is observed despite the noise (20% Gaussian noise) imposed on the parameters.
A challenge in experimentally identifying gene switches is their population effect. In single cell experiments, if obtainable, the response curves would represent individual cell measurements, and a gene that switches will exhibit a steep jump between the steady states (Figure 1-G). However many biological measurements (RT-PCR, Western Blotting), including microarray analysis, provide the population-average. In fact, even with single cell measurements, individual clones can contribute cell-cell variances, with differences in the protein expression levels across different cells. In Figure 1-H the cell-cell variance is modeled by a Gaussian distribution in the protein expression and different cells in a clone would then respond differently to stimulation, leading to a continuous change in the averaged response curve (see small graph in Figure 1-H). This explains, in part, the difficulty in identifying switches through standard experiments.
We proposed that an unbiased sampling across a range of different conditions could address this issue and help reveal the dynamic feature of gene switches. In Figure 1-I we show analytically, potential response-curves (the blue trajectories) in the whole state space of a gene switch. Each sample within the system would asymptotically approach one of the two possible steady states (dark blue region). Since the on/off states are the steady states which most cells will concentrated in upon stimulation, the samples will have higher probability of staying in these states, leading to a bimodal distribution in the observed expression profiles. We use a ΔAIC value (see METHOD) to capture whether a gene is likely to show multistable behavior. Compare with simpler criteria, such as separation and kurtosis , ΔAIC is more reliable and more resistance to noisy data (Additional file 1, Figure S1). The ΔAIC value is computed by comparing the goodness of fit of the data to either a Gaussian mixture model or a single Gaussian model, and assesses whether a bimodal distribution can explain the data better.
The E2F-Rb network is a well-characterized system in mammalian cell fate determination, whereby the Retinoblastoma (Rb) protein regulates the transcriptional factor, E2F, to control the restriction point for the G1-S transition in cell cycle . A simplified kinetic model was constructed for the E2F-Rb system , in which two genes Myc and CycD (Cyclin D: Cdk4,6) are activated by sufficient growth signal (serum) to induce E2F activation, which then directs the synthesis of downstream factors, such as CycE for DNA replication. The E2F self-activation and CycE-mediated E2F activation constitute two positive feedbacks in the system. It then was experimentally  confirmed that the level of E2F switches ON or OFF for cell-proliferation and cell-cycle arrest, respectively, suggesting E2F acts as a gene switch, while CycD and Myc do not show such switch-like behaviors.
We then simulate 100 cell clones, each clone with a random initial condition, and measure the steady state expression level of the network components for each clone. In this way we synthetically generate 100 "microarrays" for 100 different conditions. It is clear genes that have two steady states, i.e. E2F and CycE (effector of the gene switch), exhibit two distinct modes in their expression profiles (Figure 2-C). Each gene's ΔAIC value is calculated from the synthetic expression data and the switches exhibit higher ΔAIC values than the non-switches. Thus ΔAIC can be used to rank and help uncover genes that are bistable.
The genes with high ΔAIC values have distinct states under different conditions. By collating and comparing those conditions under the two distinct expression states, one can potentially identify the phenotypes in which the genes are functioning. Given that a phenotypic ontology is not available, it is difficult to compare conditions. Nevertheless, one approach is to categorize conditions by the type of perturbations, e.g. heat shock (with different temperature and length of time), hypo-osmotic shock (different time points), and extra carbon sources (different carbon source), etc., and check if one of the two states of a putative switch is enriched within a category of conditions. Using this approach, we correctly uncovered genes that have switch-like behavior, namely GAL1, GAL2, GAL7 and GAL10 (Figure 3-C). These genes all have ΔAIC values that rank among the top 5% and show bimodality, with one of their two modes containing conditions from the same category, i.e. "adding extra carbon sources". The bimodal profiles show that by adding 2% (weight to volume) extra carbon sources into the media, with the exception of galactose as the extra carbon source, the expression of these four genes shut down. It has been reported that these four genes function in the same pathway for galactose utilization, i.e., the well-known "GAL genetic switch" (review: ). The addition of alternative carbon sources results in "glucose-repression" of the GAL pathway. During this process, the high level of glucose or other carbon sources (other than galactose) induces the formation of the repressor complex (protein Mig1p and Cyc8-Tup1) and upon its binding to specific upstream repressing sequences (URSG) on the GAL promoters, it prevents the activation of these four GAL genes by the transcription factor GAL4, thereby turning off the galatose utilization pathway.
Current knowledge on the existence and functional machinery of other gene switches is limited. However we show next that by integrating information of the regulatory network and proteomic data, the genes with high ΔAIC obtained from our analysis could be possible switches or at least important genes with respect to the phenotypes. We calculate the ΔAIC values of transcription factors in the yeast transcriptional regulatory network (based on binding motif data, see METHOD), and observe that the leaf-nodes --- genes that are only regulated by one factor and are not regulating any other transcription factors ---- tend to have significantly lower ΔAIC value (average ΔAIC = 135 ± 9 compared with overall average ΔAIC = 223 ± 58 for transcription factors, p < 0.01, also see Additional file 1, Figure S2). These genes which have few regulators and do not transcriptionally control transcription factors are less likely to have feedbacks at the transcriptional level, and therefore switching dynamics. Thus the dynamic property we infer of the molecular components within a network is contingent on the network organization.
Next, we analyze single-cell proteomic data that includes noise in the protein expression measurements. We find a weak but significant negative correlation between the ΔAIC value of a gene and its coefficient of variation, which captures the noise of its protein expression (Figure 3-B and Additional file 1, Figure S3). This suggests that genes with higher ΔAIC value, showing bimodality, tend to express relatively lower levels of noise. This observation that genes with lower expression noise under normal conditions are more tightly controlled highlights their importance in the network, and is consistent with previous suggestions that gene switches have noise-filtering capacity [4, 5].
We further apply the mining approach to an integrated human gene expression dataset (ArrayExpress E-TABM-185 ) which collated 5897 microarray experiments performed on the same platform, across a wide range of conditions and cell types, including normal human tissues, carcinoma cells and tissues, hematopoietic cells, and other diseases.
This large integrated dataset  provides a sampling of the state-space of the gene network, and interestingly, the p53 gene (Figure 4-C), reported to be up-regulated in response to DNA damage  shows bimodality. This information cannot readily be obtained from comparing the expression data from just two conditions, normal and γ-irradiation  for instance (Figure 4-C). Recent single cell measurements with high temporal resolution observed p53 pulses with fixed amplitude and duration, suggesting an on/off rapid switching in the p53 dynamics [35–37]. Although p53 is regulated coordinately on multiple levels (transcription, translation, post-translational modification), our analysis of bimodality provide evidence to support a possible switching dynamics of p53 at the transcriptional level.
We analyze a paired breast cancer/normal tissue expression dataset (GSE15852)  against the integrated human gene expression dataset  to identify characteristic signatures of human breast cancer. First, we calculate the separation value D  for the top 10% ranked genes by ΔAIC to examine whether the expressions of these genes are bimodal when comparing the breast cancer (1119) samples against all other phenotypes (4,777 samples for ~300 conditions). Biologically this indicates whether a gene potentially shows bistability and could be involved in the "switching" or transition to a breast cancer phenotype. D > 2 has been suggested to indicate whether the separation into two Gaussian distributions or modes is distinctive . Considering the large amount of noise in the microarray data, we accept separation values of greater than or close to 2 (> 1.8) to indicate bimodality, which results in 17 genes showing distinct bimodality in breast cancer.
The molecular characterization of the Type 1 genes (e.g. ESR, HER2) suggests the development of therapies for ER+/PR+ and HER2+ would be effective for these breast cancer subtypes, however ~15-20% of the breast cancer tissues expressing low levels of these biomarkers (i.e. triple negative subtype) have poor prognosis and few treatment options. Moreover, patients that are responsive to commonly used drugs, such as tamoxifen (estrogen antagonist) and trastuzumab (anti-HER2 agent), eventually acquire resistance to the drugs. ~30% of tamoxifen-responsive tumors become resistant [45, 46], and the resistance invariably ensues at some point with trastuzumab. Given the increase in resistance to drugs that target the ESR receptor alternative therapeutic targets are needed.
To assess the possible switching behavior and regulation by CREB of TACSTD2, we performed flow cytometry to probe the TACSTD2 protein level at single-cell resolution. For both MCF10A and MCF7 breast cell lines the TACSTD2 protein level shows a bimodal distribution in their cell population (Figure 7-D E F), which is a property of a bistable system. We stimulated the cells with FI (Forskolin and IBMX) to induce cAMP, which is an activator of CREB , and measured the TACSTD2 levels. Both TACSTD2 mRNA and protein levels increased significantly upon stimulation (Figure 7-C, and Figures 7-D, E, respectively; for quantification of flow cytometry results see Additional file 1, Table S2), thereby supporting a possible transcriptional regulation by CREB. Upon activation of TACSTD2 by FI, a decrease in one of the modes with a concomitant increase in the other mode, instead of a gradual increase in the protein level, (Figure 7-DE) is indicative of a switching behavior. The activation essentially increases the number of cells with TACSTD2 levels at the ON state and decreases the cells with TACSTD2 at the OFF level. In contrast, the expression of the TACSTD2 protein in primary rat astrocytes shows a unimodal expression under the same test conditions. Furthermore stimulation of astrocytes by FI leads to a non-significant change in the protein level and with the cells predominantly remaining in the OFF steady-states (Figure 7-F).
The activation of TACSTD2 has been suggested to transduce calcium signal, likely by mediating calcium release from intracellular stores . It has been shown that the cross-linking (stimulation) of the TACSTD2 gene leads to a significant rise in the cytoplasmic calcium (Ca2+) level . The release of calcium can activate CREB  and the MAPK/ERK pathway  through calmodulin-dependent protein kinases (e.g. CaMKII). Indeed it is reported  in murine system that a high level of TACSTD2 can activate MAPK signaling to induce c-Fos and AP-1. This results in elevated levels of CycD1 and CycE as well as reduced levels of the CDK inhibitor, p27, which together can de-regulate and promote cell proliferation .
In light of these studies, our analysis uncovered TACSTD2 gene to have a switch-like behavior, with CREB as a possible regulator, which is activated in breast cancer and modulates the transcription of TACSTD2. CREB can provide a positive feedback in the transcriptional regulation of TACSTD2 (Figure 6-C), to thereby support a bimodal distribution in TACSTD2 expression and possible bistable behavior.
Researchers recognized that "genetic switches" behave in a discrete manner, but this feature is usually lost in biochemical analysis of large cell populations due to the difficulty in distinguishing between changes in the proportion of cells and their expression level in the two states . For example, it is hard to determine from population measurements whether the expression level of a gene increases gradually by 70%, or whether 70% of the cells are "switched" ON. In this study, we provide an alternative approach to identifying possible gene switches by capitalizing upon the large amount of available microarray data. The large sample set enables the characterization of the state space by uncovering the presence of the two attractor-states where the majority of the samples should fall. Thus, if an ON/OFF switch behavior exists in a system the state space will show bimodality or bistability, which are relatively stable with respect to perturbations . It has been suggested that bistability or multiple steady states  exists in large gene networks [53, 54], and these attractor-states represent different phenotypes . Thus, by sampling across different conditions, which are less affected by population averaging, one can reveal this dynamic feature of regulatory networks.
Our mining approach demonstrates that in the absence of a priori knowledge of the specific network architecture, one can capitalize on genome-wide expression profiling to capture dynamic properties of a complex network.
The increase in publically available microarray repositories provides a tremendous potential for data mining to unravel knowledge of cellular processes. Current approaches that integrate and analyze the wealth of expression data continues to emerge. The concept of "meta-analysis" comes from statistics and has been extended to integrate analysis of expression data. However most of the current studies have focused on database comparison, integration, and clustering . Furthermore, the statistical analysis of combining datasets of differentially expressed genes [56–58] have been used primarily to enhance the statistical power, i.e. reducing false discovery rate, as oppose to providing insight into the biological mechanisms.
Our study provides a different perspective that takes advantage of the large integrated set of expression data, and suggests a mechanism-based framework to perform the meta-analysis. This approach of integrating microarray data from a diverse set of conditions provides a common "context" of gene behaviors, whereby one can obtain a better understanding of the specific function of a gene for a particular condition under investigation. The example of p53 expression, shown in Figure 4-C is a case in point. p53 is known to be a major regulator in response to DNA damage , however it is difficult to identify from a small set of microarray experiment (E-MEXP-549, 21 samples, collected under the condition of DNA damage response) since it does not appear in the top ranked differentially expressed genes (, http://www.ebi.ac.uk/gxa/experiment/E-MEXP-549). This is likely due to its multi-level regulation (transcription, translation, post-translational modification) and also the lack of appropriate control conditions in the experiment. Nevertheless, by comparing the small set of samples against the global expression of the integrated dataset provides a "context", whereby one observes a significant reduction in the variance in expression within the small set of microarray experiment (E-MEXP-549), and suggests tightly-controlled regulation of this gene in DNA damage response.
Although p53 has both switching and oscillation dynamic features [35–37], we only discovered and discussed its switching property with our novel approach. Our approach identifies switch-like behavior based on the bimodal distribution induced by the feature of bistability. Oscillatory dynamics could have multiple (more than two) steady states and furthermore, the cells in the microarray experiments are not necessarily synchronized according to the periodic feature of the oscillation dynamics being investigated, thereby making it difficult to uncover this type of dynamics. Our approach is designed to discover gene switches and currently cannot be directly apply to identify oscillatory dynamics.
Our mining approach uncovered a unique expression pattern of TACSTD2 in breast cancer, and experiments confirmed TACSTD2 show bimodal behavior in breast cancer cell lines. TACSTD2 (Trop2) is a cell surface glycoprotein, first discovered to be highly expressed in trophoblast cells that become invasive and metastasized to form the outer layer of blastocyst in embryo development . Recent studies, along with our analysis of breast cancer samples, found TACSTD2 to be highly expressed in a variety of epithelial cancers and show low to no expression in normal somatic cells. High expression of TACSTD2 in squamous-cell carcinoma , pancreatic , colorectal  and gastric  cancers have been associated with poor prognosis and higher incidence of metastasis and death. TACSTD2 was identified as an oncogene in colorectal cancer cells . Although not essential for cell proliferation under normal condition, ectopic expression of TACSTD2 enhances anchorage-independent cell growth, promotes tumorigenesis and metastasis in colon cancer cells. Knock-down or inhibition of the protein reduces the invasiveness of aggressive colon cancer cells . In our analysis we also found TACSTD2 to be highly expressed in many colon cancer samples and shows bimodality (Additional file 1, Figure S6), however the percentage of colon cancer samples with TACSTD2 at the ON state (~60%) are less than in breast cancer (~99%), suggesting TACSTD2 could be a better target for breast cancer.
In previous microarray analysis of breast tumors,  Huang et al. studied "aggregate patterns of gene expression" with respect to lymph node status and recurrence, and identified "metagenes" that could predict the outcomes of the patients. TACSTD2 is found among the metagenes in their list; however the list consists of more than a hundred genes with potential predictive value. In contrast, we find the TACSTD2 gene to be the top gene in the list (Additional file 1, Table S3) that shows the Type 2 behavior. Interestingly, the distinctive HIGH/LOW expression level of the TACSTD2 gene has been implicated as a marker for stem cell characteristics in prostate basal cells  and hepatic oval cells . The prostate basal cells and hepatic oval cells, considered progenitor cells, show HIGH expression of the TACSTD2 gene and maintain self-renewal capability [66, 67], and thereby implicating a potential role of TACSTD2 in cancer initiating stem cells.
Although TACSTD2 has been reported to be associated with cancer, the regulatory mechanism of TACSTD2 remains unclear. Combining computational prediction and experimental analysis, we found that CREB could regulate TACSTD2 in breast cancer cells, and suggest a potential feedback structure of TACSTD2 regulation (Figure 6-C). To the best of our knowledge, this is the first regulatory circuit discovered to control TACSTD2 expression.
Studies of tamoxifen-resistant breast cancer cells found these cells develop altered activation of CREB and AP-1 , which we speculate could be related to TACSTD2 signaling. Trastuzumab resistance in HER2+ breast cancer cells is reported to involve elevated CycE expression level which is associated with the desensitization of p27 regulation . Given that TACSTD2 could increase CycE level by inhibiting p27, it provides a possible mechanistic connection with the TACSTD2 gene as a potential target for ERBB2/HER2 regulation and trastuzumab resistance (Figure 6-C).
Overall, our computational analysis demonstrate a distinctively high expression of TACSTD2 in almost all ER+/-, HER2+/- subtypes of breast cancer. Experiment shows that TACSTD2 expression is high in breast cancer cell lines, MCF-7 and MDA-MB-231 (Figure 7-B), and FI treatment enhances the expression of TACSTD2 (Figure 7-C and Additional file 1, Table S2). Comparing MCF-7 (an ER+/ERBB- cell line) and MDA-MB-231 (a triple negative cell line) cells, many more cells in the MDA-MB-231 than in the MCF-7 cell line have TACSTD2 in the ON state (Additional file 1, Table S2 and Figure S7). In fact most cells in the MDA-MB-231 cell line have TACSTD2 turned ON. This observation, together with the information extracted from the microarray data (Figure 7-A), highlights TACSTD2 as an important biomarker for both ER+ and ER- breast cancer subtypes, as well as an attractive candidate for drug therapy against the triple negative (ER-, PR- (progesterone receptor) and HER2-) subtype of breast cancer, with potential implications for treating drug-resistant cases that are non-responsive to ER/HER2-targeted therapies. In addition, the presence of TACSTD2 on the cell surface makes it more accessible to antibody-based therapeutics.
We propose a top-down mining approach to exploring gene switches on a genome-scale level. Our mining approach demonstrates that in the absence of a priori knowledge of the specific network architecture, one can capitalize on genome-wide expression profiling to capture dynamic properties of a complex network. To the best of our knowledge, this is the first attempt in applying mining approaches to explore gene switches on a genome-scale. By applying the computational analysis on human microarray data, we uncovered a unique expression pattern of TACSTD2 in breast cancer, and experiments confirmed TACSTD2 show bimodal behavior in breast cancer cell lines, further, our perturbation study suggest a potential bistable mechanism is involved. To the best of our knowledge, this is a first case a single cell level bimodality and bistability can be predicted from microarray data. Combining our computational and experimental analysis, together with previous studies in the literature, we suggest TACSTD2 could be an attractive candidate for drug therapy against both ER+ and ER- subtypes, including possibly the triple negative (ER-, PR- (progesterone receptor) and HER2-) subtype of breast cancer, and finally with potential implications for treating drug-resistant cases that are non-responsive to ER/HER2-targeted therapies.
A represents the expression level of Gene A, pA 2 /[1+A 2 ] describes the self-binding and activation of the transcription, and 1/1+R 2 is the effect of the repressor, in which R depends on the stimulation i--the concentration of the inhibitor i. deg(A) is a linear function for the degradation of A. The model is constructed by  for a mammalian cell system.
A, B represents the expression level of Gene A and B, respectively. α is a parameter about the strength of the cross-repression of the two genes. deg() is a linear function for degradation. The model is constructed by  in E.Coli.
The kinetic equations and parameters for the E2F-Rb system is obtained from the previous study by Yao et al. .
ODEs are implemented in MATLAB and numerically simulated with Runge-Kutta Method. Multiple simulations with different initial conditions are performed with customized MATLAB codes.
in which AIC 1 is the AIC of the gene expression profile assuming a unimodal distribution, while AIC 2 assumes the profile is a bimodal distribution. ΔAIC compares the fit with a unimodal vs. a bimodal distribution. The higher the ΔAIC value, the more likely the expression profile shows bimodality. Comparing with simpler criteria such as separation and kurtosis, ΔAIC is more reliable, and is more resistance to noise in data (see Additional file 1, Figure S1).
Where (μ 1 , σ 1 ) are the mean and deviation of samples in the specified condition, while (μ 2 , σ 2 ) are the mean and deviation for all other samples.
The "Mega Yeast Gene Expression Data" is downloaded from http://gasch.genetics.wisc.edu/datasets.html. The dataset contains 500 yeast microarray experiments. The conditions include environmental stress , cell cycle , sporulation and various other perturbations. A yeast putative transcriptional regulatory network based on the known motifs on the gene promoters is obtained from the YEASTRACT database . Information on the noise of the yeast protein expression is extracted from the Integrate Single-cell Proteomic Analysis data .
The human gene expression dataset (ArrayExpress E-TABM-185) is a per platform integration provided by ArrayExpress database http://www.ebi.ac.uk/arrayexpress/. The dataset integrates (and normalized) 5897 microarray experiments performed on the same Affymetrix GeneChip Human Genome HG-U133A platform . A variety of conditions and cell types are involved, including normal human tissues, carcinoma cells and tissues, hematopoietic cells, Alzheimer's disease, asthma, Down syndrome, Huntington's Disease, etc. Although it is an integration of experiments from different sources (different labs), the providers have cleared and normalized the dataset such that "the biological signal in these data is significantly stronger than the laboratory effects" .
Here Y represents the expression level of a gene. Since the expression level is continuous, we discretize this attribute into several categories with equal frequency. Choosing different number of categories in the discretization changes the entropy value in the calculation but does not affect the comparison made in estimating the contribution of genes to the prediction of the phenotypes.
which represents the contribution of gene Y to the prediction of the phenotype (defined by the reduction in the uncertainties given the information of gene Y).
Human mammary epithelial and breast cancer cell lines were obtained from Dr. Kathleen Gallo in Michigan State University. MCF7 and MDA-MB-231 were cultured in Dulbecco's modified Eagles's media (DMEM, Gibco BRL, Paisley, PA, USA) with 10% fetal bovine serum (FBS), 2 mM glutamine and penicillin/streptomycin. MCF10A cells were cultured in DMEM/F12(1:1) with 5% horse serum, 10 ug/ml insulin, 20 ng/ml EGF, 100 ng/ml choleratoxin, 0.5 ug/ml hydrocortisone and pen/strep. Primary astrocytes were maintained in DMEM/F12 (1:1) plus 10% FBS and pen/strep. Cells were maintained at 37°C and 10% CO2. Forskolin (Sigma, St. Louis, MO, USA) and isobutylmethylxanthine (IBMX) (Sigma) were used at the concentrations of 10 M and 100 M, respectively.
mRNA was extracted using the RNA extraction kit (Qiagen, Valencia, CA, USA), then mRNA was reverse transcribed into cDNA using the cDNA synthesis kit (Bio-Rad, Hercules, CA, USA). The following primer sets (Operon, Huntsville, AL, USA) were used for PCR: human-TROP2 (5'- GAGATTCCCCCGAAGTTCTC -3',5'- AACTCCCCCAGTTCCTTGAT -3'), rat-TROP2 (5'-TACTGCTACTGCTGGCGATC-3',5'-GCAGGCACTTGGAAGTTAGC-3'), rat-actin (5'-CTCTTCCAGCCTTCCTTCCT-3', 5'-AATGCCTGGGTACATGGTG-3'), human-actin (5'- TGGACTTCGAGCAAGAGATG -3', 5'- AGGAAGGAAGGCTGGAAGAG -3'). Amplifications of the cDNA templates were detected by SYBR Green Supermix (Bio-Rad) using Real-Time PCR Detection System (Bio-Rad). The cycle threshold values were determined by the MyIQ software.
Cells were washed with PBS and collected by trypsinization. Cells were then incubated with TACSTD2 primary antibody (BD bioscience, CA, USA; Santa Cruze, CA, USA) at 4°C for 30 min. After washing twice with wash buffer, the cells were incubated with Alexa Fluor-488 conjugated goat anti-mouse secondary antibody for 30 min at 4°C in the dark. Cells were washed twice with wash buffer and resuspended in staining buffer, and then subjected to flow cytometry analysis by BD FACSVantage.
All experiments were performed at least three times, the results were shown as mean ± standard deviation, and representative results are shown. Statistical analysis were performed by an unpaired, two tail student t-test. * indicates p < 0.05, ** indicates p < 0.01 and *** indicates p < 0.001.
This research was supported in part by the NIH (R01GM079688, R21RR024439 and 1R01GM089866), the NSF (CBET 0941055, CBET 1049127 and DBI 0701709), and the MSU Foundation.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.