Volume 11 Supplement 1
Bimodal gene expression patterns in breast cancer
- Marina Bessarabova†1,
- Eugene Kirillov†1,
- Weiwei Shi2,
- Andrej Bugrim2,
- Yuri Nikolsky2Email author and
- Tatiana Nikolskaya1
© Bessarabova et al; licensee BioMed Central Ltd. 2010
Published: 10 February 2010
We identified a set of genes with an unexpected bimodal distribution among breast cancer patients in multiple studies. The property of bimodality seems to be common, as these genes were found on multiple microarray platforms and in studies with different end-points and patient cohorts. Bimodal genes tend to cluster into small groups of four to six genes with synchronised expression within the group (but not between the groups), which makes them good candidates for robust conditional descriptors. The groups tend to form concise network modules underlying their function in cancerogenesis of breast neoplasms.
Whole-genome gene expression studies primarily aim to identify conditional descriptors, i.e. subsets of genes or functional groups whose expression profiles distinguish between different biological states. Different biological conditions might include: disease state vs. normal state, good prognosis vs. bad, drug treated vs. untreated tissues, etc. Differential expression descriptors can be calculated in two ways. The traditional method consists of selecting a set of descriptor genes (gene signatures) using a variety of statistical methods [1–5]. Using this approach, a number of gene signatures were deduced for breast cancer phenotypes, including an "intrinsic" set for clustering of breast cancers , an "Amsterdam" signature consisting of 70 genes , a 76-gene "Rotterdam" signature  for metastasis, and a set of 21 genes associated with disease outcomes for ER+ tumors . Some of these sets are commercialized as multivariant diagnostics by Genomic Health http://www.genomichealth.com and Agendia http://www.agendia.com. Although important, gene signatures have many issues as descriptors - for instance, loss of specificity in validation studies with an increased number of samples , generally poor cross-platform compatibility (Amsterdam and Rotterdam signatures virtually do not overlap in gene content), lack of mechanistic (functional) correlation with phenotype, etc.
The second, more recent, approach deals with so-called "functional descriptors," such as pathways, signaling networks, enrichment distribution in ontologies, etc., which are differentially perturbed in the conditions being compared [11–14]. In good accordance with the original concept of "modularity" of biological functions systems , functional entities seem to be more robust descriptors than gene lists [16, 17]. In addition, functional descriptors provide strong mechanistic linkages with clinical phenotypes and, in the case of cancer, may explain important aspects of cancerogenesis.
However, in both cases, the genes composing gene signatures or functional categories are selected regardless of their individual patterns of expression among the samples in the study. In general, gene expression distribution in a population is assumed to be normal, as for any quantitative trait . However, it is not. As we have recently shown, distributions of expression signals of certain genes feature two distinct peaks among the samples in breast cancer . The phenomenon of expression "bimodality" was reported for other cancers as well [20–22], where «bimodality» was calculated by selection of hypervariable (HV) genes using F-statistics  and a combination of mixture modelling and kurtosis [21, 22].
Here we report a meta-analysis of bimodally expressed genes from five previously published independent breast cancer studies. We show that "bimodality" is a general phenomenon (at least for breast cancer), independent of a microarray platform and clinical phenotype (patient cohort). Bimodality is intrinsically associated with physiological states of the system, such as cancer vs. normal. Moreover, bimodally-expressed genes tend to cluster into groups with synchronised expression within a group. Consequently, bimodal group expression can be effectively used as an efficient and robust conditional descriptor, applicable for a variety of studies.
We also demonstrate the platform-independence of bimodality in three different microarrays used in the studies. Although compatibility between arrays can be high for certain end-points in limited size studies, as shown in the MAQCII project , in general, gene signatures are not robust and cannot be directly compared across platforms. There are several statistical methods of meta-analysis which enable direct comparison between gene expression levels in multiple experiments and allow for identification of genes with consistent signal values across the studies [20–27]. Here, we offer an approach to normalization of expression signal values into a binary mode corresponding to different conditions, which makes expression profiles on different arrays directly compatible.
The phenomenon of bimodality of gene expression
Gene expression datasets used for identification of genes with bimodal expression patterns. In all five datasets, bimodality was defined by τ = 2.64 and standard deviation over 25th percentile of the distribution. a Recognized genes for each platform.
Pair-wise intersections of the sets of bimodal genes in five studies. Fisher exact tests were used to estimate p-values.
All genes intersection
Bimodal genes intersection
Bimodal genes for set Aa
Bimodal genes for set Ba
Therefore, we conclude that bimodality of gene expression is a phenomenon not limited to a specific microarray platform, a study/endpoint or a dataset/patient cohort. Bimodality of individual genes is confirmed for at least three different studies, and in some cases in four or five studies.
"Bimodality" is conditional (disease-related)
We believe that "bimodality" is a conditional expression property of a gene and each «mode» corresponds to a certain physiological condition, for example, a normal and a disease state. It is also possible that the two modes could correspond to different disease subtypes.
Normalization of expression for bimodal genes
In order to clearly separate the patient samples by bimodal gene expression, we normalized the signals, so the signals could be presented in a binary manner, with one peak designated as -1 and another as 1. The original expression signals varied significantly between the genes in the same sample, and individual bimodal genes could be both over- and under-expressed in different samples. Therefore, the step of normalization was neccessary for minimizing the difference in amplitude of the expression of the genes in order to profile separate experiments in a uniform way. There can be two cases: 1. one gene from different experiments in which the intensities of its expression are different, and 2. different genes have similar intensity within one experiment. In the former case, normalization makes comparable the profiles of genes with different original intensities of expression. In the latter case, it allows one to identify truly similar genes within one set, with synchronised expression profiles for a physiological condition. The process of normalization is described in detail in "Methods," and an example of normalization of GRB7 expression from the Sorlie295 dataset is shown in Figure 2B.
The "close neighbors" groups of synchronously expressed bimodal genes for Sorlie295 data set.a Italics - breast cancer-associated genes.
"Close neighbors" - groups of synchronously-expressed bimodal genes
"Close neighbors" expression groups as potential descriptors for breast cancer end-points
As every gene in the group is bimodal, and the expression profiles of genes in each group are synchronised, each group can be used as an effective descriptor dividing patients into two clusters corresponding to the two expression modes. An average expression value for all genes in the group was used as the measure of the group's expression. For instance, ERBB2 group expression is downregulated in some patients and up-regulated in another part of the cohort (Figure 5C). It was shown that the expression group profiles are more robust descriptors than individual genes .
The expression of the "ñlose neighbors" groups is a remarkably robust descriptor between microarray platforms. "Robustness" can be defined as retained performance on larger validation datasets and «across platforms», i.e. the descriptor genes have to be synchronously expressed on different types of arrays. It is particularly important in the cases when the descriptors are deduced using a training set on one array platform and validation sets on a different platform, and especially when descriptor genes are present on the training array but are missing on the validation array [40–44, 23]. Using groups of genes (instead of individual genes in «gene signatures») and their summarized «group» expression instead of individual gene expression allows one to reduce or eliminate this problem. Thus, the gene TMIM158 from the PLAUR group is missing on Agilent arrays, but the group itself can still be used effectively as the descriptor with one gene missing. The average or summarized expression of the remaining genes in the group can be used as the group expression metric in this case.
Importantly, the pattern of group expression (i.e. an average of gene expression within a group) is remarkably stable between different studies and unique for the group, group expression profiles are essentially different and among the samples in all studies, i.e. the groups are expressed independently from each other. Therefore, the groups can be applied as robust descriptors for dividing samples (patients in the cohort) into sub-clusters (see additional file 6). The group descriptors can be applied consequently: Group 1 divided patients into two clusters, then Group 2 sub-categorizes each part into two and so on. Eventulally, every sample will be "barcoded" with 5 numbers reflecting the Group's expression mode as "1" or "2", for instance 1-1-2-1-2 (see additional file 6), and samples can be grouped together based on the matching "barcodes".
Here we described a fundamental property of certain genes to be expressed in two «modes» or expression levels depending on physiological condition/disease state. We studied this phenomenon in invasive breast cancer in five different studies using different array platforms, including cDNA arrays, Affymetrix and Agilent [28–35]. We have shown that bimodal genes are present on all arrays, and that the sets of bimodal genes statistically significantly overlap among the platforms. Therefore, we assume that bimodality is a common property of gene expression, dependent on physiological or disease states and independent of the end-points of the study or the microarray platform. In total, we identfied 866 bimodal genes shared among all platforms.
We developed and applied a computationally efficient algorithm to estimate bimodality of expression intensity distributions of genes based on maximization of the t-statistic-like measure τ (see Methods). Gene expression distribution is often modeled by a mixture of Gaussians with model parameters fit through expectation maximization (see e.g., [21, 22]). Bimodality of expression can then be deduced from testing log-likelihood ratios of two component mixture distribution versus a single component normal distribution as in , or through calculation of Bayesian information criterion as in . These approaches are computationally demanding and do not offer clear advantages over t-statistics. Also, characterization of a gene's bimodality via excess kurtosis as in  disregards bimodal distributions with unbalanced sizes of peaks, while a t-statistic still captures such unbalanced bimodality. A different approach for characterizing "hypervariable" genes was applied in , where authors searched for genes with higher variability than in a majority of genes. The F-test was used to select the genes with variances significantly higher than the variance of genes in a 'reference group'.
Conditional bimodality is an unexpected and non-trivial property of gene expression. The expected distribution of any quantitative character in biological systems is expected to be normal . Moreover, most genes in the studied datasets are not «bimodal» (Table 1). Distribution with two distinct peaks means that transcriptional regulation of some genes is conditional - breast cancer-dependent in our case. Alternatively, one could expect expression of two different conditionally prevailing splice variants for certain transcripts - a phenomenon shown for some cancers . However, observation of this case is not likely, as we see the same genes on three different platforms, and the array set does not allow us to separate different splice variants, at least for the original cDNA array.
Bimodal expression is conditional and, in our case, is linked to the complex condition known as breast cancer. The set of 866 common bimodal genes is heavily enriched in breast cancer-associated genes, participating in many pathways and processes of cancerogenesis. Thus, the «group query» genes ERBB2, ESR1, CEACAM5 and AR are well known markers of breast cancer [46–49].
According to the theory of modularity of biological processes , bimodal genes tend to cluster into synchronously expressed functional groups of «close neighbors». We described an approach for identification of such groups based on normalized gene expression, which makes it platform-independent and comparable between different arrays. We have selected 23 genes divided into 5 groups which were co-expressed within groups in all five studies on three different platforms (but the group expression was independent from one to another). The genes within the groups are functionally close. Thus, genes in each group form statistically significant protein interaction networks. Some groups, such as TCAP, PSMD3, GRB7, and ERBB2, belong to a well known amplicon . Transcription of MX1, CXCL10, PLSCR1, and ISG15 from the STAT1 group is regulated by STAT1 [37, 38]. 15 out of 23 bimodal genes in groups are known in the literature as breast cancer-associated genes, which suggests breast cancer specificity of these functional modules.
As gene expression within a group is synchronised through many studies, «group expression» can be applied as a «binary» conditional descriptor separating a patient cohort into sub-groups with «-1» and «1» expression. Consecutive application of different groups can be appled for further sub-division of the patient cohort into patient clusters, with "1s" and "2s" for each group used as a bar-code for the patent cluster. The advantage of using group expression instead of individual gene expression is in high robustness: an average per group expression fluctuates at a lower scale than dispersed expression of individual genes. The «close neighbors» gene groups can be used as prognostic descriptors for clinical end-points such as patient survival, metastases development, response to therapy, etc. Sub-categorization of cancer patients is a non-trivial problem due to high heterogeneity of expression profiles. Thus, in a well-known sub-categorisation scheme which divided invasive breast cancers into five clusters based on expression of certain "centroid" genes , over 1/3 of samples could not be categorized into any cluster and expression heterogeneity within clusters was still high, especially in validation studies with more samples, despite running the studies on essentially the same cDNA array [50, 51]. Importantly, when we applied normalized group expression as the clustering metrics, we saw not a single outlier among over 1000 samples in five studies on three different microarray platform. The heterogeneity was also much lower within the clusters (data not shown). Such high robustness makes the "close neighbors" groups potentially very promising biomarkers for clinical end-points in breast cancer and, likely, other types of cancers.
Functional grouping of genes as descriptors also deals with an important issue of reduction of dimensionality in meta-analysis. Meta-analysis can be defined as a cross-study analysis of different patient cohorts united by a clinical end-point or any other parameter [24, 52]. This type of analysis is broadly applied, for example, during comparison of a study of interest with the expression data accumulated in GEO http://www.ncbi.nlm.nih.gov/geo/ or other expression databases (ArrayExpress http://www.ebi.ac.uk/microarray-as/ae/, Stanford Microarray Database http://smd.stanford.edu/), Yale Microarray Database http://www.med.yale.edu/microarray/, etc). Platform compatibility and minimization of «dimensionality» are two major problems in meta-analysis, where «gene signatures» consisting of individual genes are notorious for poor reproducibility [40–44, 23]. Here, we offer a general solution for the problem, consisting of identification of bimodal genes, normalization of their expression and grouping of the normalized expression into synchronised clusters of «close neighbors». Normalization consists of transformation of expression signals into a binary system of «-1» and «1», and it enables comparison of otherwise incomparable expression data between platforms and studies . Lack of individual genes on a certain array platform does not prevent using the group as the descriptor.
We described the phenomenon of bimodality of gene expression in breast cancer and grouping of the bimodal genes into «close neighbor» groups. The sets of bimodal genes are non-random; they are enriched in disease markers and targets and tend to form functionally related groups with synchronised expression. These groups of «close neighbors» can be used as robust descriptors for certain sub-groups of patients and associated with clinically important phenotypes (end-points). Application of functional descriptors consisting of bimodal genes is important in the area of meta-analysis of gene expression experiments across platforms and across studies.
Identfication of bimodal genes
In a set of expression experiments (for instance, a patient cohort), each gene has a distribution of expression signals across the set. Bimodal genes feature a distribution with two distinct peaks (maximal signals) (Fig 1B). For each gene, we can set up a distinguishing expression value such that the signals lower than this value correspond to the lower peak in the bimodal distribution, and the signals higher than this value correspond to the higher peak. The characteristic value was chosen as follows: all expression values for a gene were randomly divided onto two groups, and average and sum of squared deviations were calculated for each group. The lower the sum of deviations, the better the partition.
In calculation of bimodality, we assume that distribution of expression within the cohort for a bimodal gene is a sum of two normal distributions. Let us consider - an expression value of i-s gene in the j-s experiment; L i , U i - partition of the set of all experiments onto subsets depending on i-s gene); #L i , #U i - the number of experiments in each subset; - average signal for i-s gene in each subset L i , U i . We need to find a partition with the minimum γ(L i , U i ): . We need to look at only the subsets with - the values in subset L i are lower than in subset U i . The number of possible partitions with such a property is larger by 1 than the number of experiments (including two cases with empty subsets). For the «optimal» partition, l i = ⟨L i ⟩, u i = ⟨U i ⟩ and γ i = γ(L i , U i ). In this case, the characteristic signal T i will be calculated as follows:
. In other words, the characteristic signal T is the border with two optimal divisions, i.e. (Figure 2A).
We can consider as the level of bimodality a relative dicrepancy in the values in sub-sets (measure of signal isolation) τ i , which is calculated as , where M - is the total number of experiments.
The list of values for i- s gene is entered into an algorithm
All signals are sorted by value: the number of signals is n, where j is the number of the signal
For all n - 1 possible partitions of the sorted list of values onto two groups (partition is defined by the number of the highest signal in the smaller by value group), Partition is defined as k ∈ [1, n - 1]):
1. The average for each sub-set is for the sub-set with lower signals - for the sub-set with higher signals
2. The sum of squares of deviations for each sub-set:
We choose K i , for which is ( )
The algorithm results in K i , , .
where divides the signals for the given partition according to . This property allows us to clearly divide signals for each peak (mode).
One of the drawbacks of the method described above is its sensitivity to outliers. For instance, if in three experiments out of 100 the expression values are significantly higher than the others, the three signals will be assigned to one peak, and the remaining 97 to another peak. This situation can be avoided if all values for a group will be considered as outliers if its relative size is small, for instance, less than 5%.
This transformation allows us to reduce all signals l i and u i to -1 and 1, correspondingly (Figure 2B). If the set contains a certain number of control experiments (for instance, normal samples among the disease samples), we can consider the expression values for the group with normal samples as 1, and the other group as -1. This allows us to compare expression profiles which are synchronised among the patients but in different directions. Also, the genes with control values belonging to different modes can be excluded.
• The mean for normal patients was calculated , where N - samples from normal patients, and #N is their number
• In the case of ν i <T i , the gene expression values were transformed by ; otherwise . Therefore, always.
Selection of the groups of synchronously expressed genes
The genes with similar signal profiling constitute a group of «close neighbors»
This study was supported by NCI grant 5R44CA112828-03 Elucidation of protein networks implicated in breast cancer
This article has been published as part of BMC Genomics Volume 11 Supplement 1, 2010: International Workshop on Computational Systems Biology Approaches to Analysis of Genome Complexity and Regulatory Gene Networks. The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2164/11?issue=S1.
- Nadon R, Shoemaker J: Statistical issues with microarrays:processing and analysis. Trends Genet. 2002, 18: 265-271. 10.1016/S0168-9525(02)02665-3.View ArticlePubMedGoogle Scholar
- Tusher VG, Tibshirani R, Chu G: Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci USA. 2001, 98: 5116-5121. 10.1073/pnas.091062498.PubMed CentralView ArticlePubMedGoogle Scholar
- Baldi P, Long AD: A Bayesian framework for the analysis of microarray expression data: regularized t-test and statistical inferences of gene changes. Bioinformatics. 2001, 17: 509-519. 10.1093/bioinformatics/17.6.509.View ArticlePubMedGoogle Scholar
- Kerr MK, Martin M, Churchill GA: Analysis of variance for gene expression microarray data. J Comput Biol. 2000, 7: 819-837. 10.1089/10665270050514954.View ArticlePubMedGoogle Scholar
- Jin W, Riley RM, Wolfinger RD, White KP, Passador-Gurgel G, Gibson G: The contributions of sex, genotype and age to transcriptional variance in Drosophila melanogaster. Nat Genet. 2001, 29: 389-395. 10.1038/ng766.View ArticlePubMedGoogle Scholar
- Perou CM, Sørlie T, Eisen MB, Rijn van de M, Jeffrey SS, Rees CA, Pollack JR, Ross DT, Johnsen H, Akslen LA, Fluge O, Pergamenschikov A, Williams C, Zhu SX, Lønning PE, Børresen-Dale AL, Brown PO, Botstein D: Molecular portraits of human breast tumours. Nature. 2000, 406: 747-752. 10.1038/35021093.View ArticlePubMedGoogle Scholar
- van't Veer LJ, Dai H, Vijver van de MJ, He YD, Hart AA, Mao M, Peterse HL, Kooy van der K, Marton MJ, Witteveen AT, Schreiber GJ, Kerkhoven RM, Roberts C, Linsley PS, Bernards R, Friend SH: Gene expression profiling predicts clinical outcome of breast cancer. Nature. 2002, 415: 530-536. 10.1038/415530a.View ArticleGoogle Scholar
- Wang Y, Klijn JG, Zhang Y, Sieuwerts AM, Look MP, Yang F, Talantov D, Timmermans M, Meijer-van Gelder ME, Yu J, Jatkoe T, Berns EM, Atkins D, Foekens JA: Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer. Lancet. 2005, 365: 671-679.View ArticlePubMedGoogle Scholar
- Paik S, Shak S, Tang G, Kim C, Baker J, Cronin M, Baehner FL, Walker MG, Watson D, Park T, Hiller W, Fisher ER, Wickerham DL, Bryant J, Wolmark N: A multigene assay to predict recurrence of tamoxifen-treated, node-negative breast cancer. N Engl J Med. 2004, 351: 2817-2826. 10.1056/NEJMoa041588.View ArticlePubMedGoogle Scholar
- Ein-Dor L, Zuk O, Domany E: Thousands of samples are needed to generate a robust gene list for predicting outcome in cancer. Proc Natl Acad Sci USA. 2006, 103 (15): 5923-5928. 10.1073/pnas.0601231103.PubMed CentralView ArticlePubMedGoogle Scholar
- Nikolsky Y, Ekins S, Nikolskaya T, Bugrim A: A novel method for generation of signature networks as biomarkers from complex high-throughput data. Tox Letters. 2005, 158: 20-29. 10.1016/j.toxlet.2005.02.004.View ArticleGoogle Scholar
- Ideker T, Sharan R: Protein networks in disease. Genome Res. 2008, 18 (4): 644-652. 10.1101/gr.071852.107.PubMed CentralView ArticlePubMedGoogle Scholar
- Huang Q, Jin X, Gaillard ET, Knight BL, Pack FD, Stoltz JH, Jayadev S, Blanchard KT: Gene expression profiling reveals multiple toxicity endpoints induced by hepatotoxicants. Mutat Res. 2004, 549: 147-168.View ArticlePubMedGoogle Scholar
- Milo R, Shen-Orr S, Itzkovitz S, Kashtan N, Chklovskii D, Alon U: Network Motifs: Simple Building Blocks of Complex Networks. Science. 2002, 298: 824-827. 10.1126/science.298.5594.824.View ArticlePubMedGoogle Scholar
- Hartwell LH, Hopfield JJ, Leibler S, Murray AW: From molecular to modular cell biology. Nature. 1999, 402 (6761 Suppl): C47-C52. 10.1038/35011540.View ArticlePubMedGoogle Scholar
- Chuang HY, Lee E, Liu YT, Lee D, Ideker T: Network-based classification of breast cancer metastasis. Mol Syst Biol. 2007, 3: 140-10.1038/msb4100180.PubMed CentralView ArticlePubMedGoogle Scholar
- Shi W, Tsyganova M, Dosymbekov D, Dezso Z, Nikolskaya T, Dudoladova M, Serebryiskaya T, Guryanov A, Brennan R, Shah R, Dopazo J, Chen M, Deng Y, Shi Y, Jurman G, Furlanello G, Thomas RS, Corton JC, Tong W, Shi L, Nikolsky Y: The Tale of "Underlying biology": Functional Analysis of MAQC II data. Nat Biotech. 2009,Google Scholar
- Fisher RA: The correlation between relatives under the supposition of Mendelian inheritance. Trans R Soc Edinburgh. 1918, 52: 399-433.View ArticleGoogle Scholar
- Nikolsky Y, Kirillov E, Serebryiskaya T, Rakhmatulin R, Perlina A, Bugrim A, Lingle W, Nikolskaya T: Sequential clustering of breast cancers using bimodal gene expression. Proceed AACR Ann Meeting. 2007, 141-Google Scholar
- Dozmorov I, Knowlton N, Tang Y, Shields A, Pathipvanich P, Jarvis JN, Centola M: Hypervariable genes--experimental error or hidden dynamics. Nucleic Acids Res. 2004, 32 (19): e147-10.1093/nar/gnh146.PubMed CentralView ArticlePubMedGoogle Scholar
- Zhao HY, Yue PY, Fang KT: Identification of differentially expressed genes with multivariate outlier analysis. J Biopharm Stat. 2004, 14 (3): 629-646. 10.1081/BIP-200025654.View ArticlePubMedGoogle Scholar
- Teschendorff AE, Naderi A, Barbosa-Morais NL, Caldas C: PACK: Profile Analysis using Clustering and Kurtosis to find molecular classifiers in cancer. Bioinformatics. 2006, 22 (18): 2269-2275. 10.1093/bioinformatics/btl174.View ArticlePubMedGoogle Scholar
- MAQC Consortium, Shi L, Reid LH, Jones WD, Shippy R, Warrington JA, Baker SC, Collins PJ, de Longueville F, Kawasaki ES, Lee KY, Luo Y, Sun YA, Willey JC, Setterquist RA, Fischer GM, Tong W, Dragan YP, Dix DJ, Frueh FW, Goodsaid FM, Herman D, Jensen RV: The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nat Biotechnol. 2006, 24: 1151-1161. 10.1038/nbt1239.PubMed CentralView ArticleGoogle Scholar
- Rhodes DR, Barrette TR, Rubin MA, Ghosh D, Chinnaiyan AM: Meta-analysis of microarrays: Interstudy validation of gene expression profiles reveals pathway dysregulation in prostate cancer. Cancer Res. 2002, 62: 4427-4433.PubMedGoogle Scholar
- Choi JK, Yu U, Kim S, Yoo OJ: Combining multiple microarray studies and modeling interstudy variation. Bioinformatics. 2003, 19: 184-190. 10.1093/bioinformatics/btg1010.View ArticleGoogle Scholar
- Ghosh D, Barette TR, Rhodes D, Chinnaiyan AM: Statistical issues and methods for meta-analysis of microarray data: A case study in prostate cancer. Funct Integr Genomics. 2003, 3: 180-188. 10.1007/s10142-003-0087-5.View ArticlePubMedGoogle Scholar
- Choi JK, Choi JY, Kim DG, Choi DW, Kim BY, Lee KH, Yeom YI, Yoo HS, Yoo OJ, Kim S: Integrative analysis of multiple gene expression profiles applied to liver cancer study. FEBS Lett. 2004, 565: 93-100. 10.1016/j.febslet.2004.03.081.View ArticlePubMedGoogle Scholar
- Chang HY, Nuyten DS, Sneddon JB, Hastie T, Tibshirani R, Sørlie T, Dai H, He YD, van't Veer LJ, Bartelink H, Rijn van de M, Brown PO, Vijver van de MJ: Robustness, scalability and integration of a wound-response gene expression signature in predicting breast cancer survival. Proc Natl Acad Sci USA. 2005, 102: 3738-3743. 10.1073/pnas.0409462102.PubMed CentralView ArticlePubMedGoogle Scholar
- Pawitan Y, Bjöhle J, Amler L, Borg AL, Egyhazi S, Hall P, Han X, Holmberg L, Huang F, Klaar S, Liu ET, Miller L, Nordgren H, Ploner A, Sandelin K, Shaw PM, Smeds J, Skoog L, Wedrén S, Bergh J: Gene expression profiling spares early breast cancer patients from adjuvant therapy: derived and validated in two population-based cohorts. Breast Cancer Res. 2005, 7: R953-R964. 10.1186/bcr1325.PubMed CentralView ArticlePubMedGoogle Scholar
- Desmedt C, Piette F, Loi S, Wang Y, Lallemand F, Haibe-Kains B, Viale G, Delorenzi M, Zhang Y, d'Assignies MS, Bergh J, Lidereau R, Ellis P, Harris AL, Klijn JG, Foekens JA, Cardoso F, Piccart MJ, Buyse M, Sotiriou C, TRANSBIG Consortium: Strong time dependence of the 76-gene prognostic signature for node-negative breast cancer patients in the TRANSBIG multicenter independent validation series. Clin Cancer Res. 2007, 13: 3207-3214. 10.1158/1078-0432.CCR-06-2765.View ArticlePubMedGoogle Scholar
- Ivshina AV, George J, Senko O, Mow B, Putti TC, Smeds J, Lindahl T, Pawitan Y, Hall P, Nordgren H, Wong JE, Liu ET, Bergh J, Kuznetsov VA, Miller LD: Genetic reclassification of histologic grade delineates new clinical subtypes of breast cancer. Cancer Res. 2006, 66: 10292-10301. 10.1158/0008-5472.CAN-05-4414.View ArticlePubMedGoogle Scholar
- Hu Z, Fan C, Oh DS, Marron JS, He X, Qaqish BF, Livasy C, Carey LA, Reynolds E, Dressler L, Nobel A, Parker J, Ewend MG, Sawyer LR, Wu J, Liu Y, Nanda R, Tretiakova M, Ruiz Orrico A, Dreher D, Palazzo JP, Perreard L, Nelson E, Mone M, Hansen H, Mullins M, Quackenbush JF, Ellis MJ, Olopade OI, Bernard PS, Perou CM: The molecular portraits of breast tumors are conserved across microarray platforms. BMC Genomics. 2006, 7: 96-10.1186/1471-2164-7-96.PubMed CentralView ArticlePubMedGoogle Scholar
- Oh DS, Troester MA, Usary J, Hu Z, He X, Fan C, Wu J, Carey LA, Perou CM: Estrogen-regulated genes predict survival in hormone receptor-positive breast cancers. J Clin Oncol. 2006, 24: 1656-1664. 10.1200/JCO.2005.03.2755.View ArticlePubMedGoogle Scholar
- Weigelt B, Hu Z, He X, Livasy C, Carey LA, Ewend MG, Glas AM, Perou CM, Van't Veer LJ: Molecular portraits and 70-gene prognosis signature are preserved throughout the metastatic process of breast cancer. Cancer Res. 2005, 65: 9155-9158. 10.1158/0008-5472.CAN-05-2553.View ArticlePubMedGoogle Scholar
- Mullins M, Perreard L, Quackenbush JF, Gauthier N, Bayer S, Ellis M, Parker J, Perou CM, Szabo A, Bernard PS: Agreement in breast cancer classification between microarray and quantitative reverse transcription PCR from fresh-frozen and formalin-fixed, paraffin-embedded tissues. Clin Chem. 2007, 53: 1273-1279. 10.1373/clinchem.2006.083725.View ArticlePubMedGoogle Scholar
- Kauraniemi P, Kuukasjärvi T, Sauter G, Kallioniemi A: Amplification of a 280-kilobase core region at the ERBB2 locus leads to activation of two hypothetical proteins in breast cancer. Am J Pathol. 2003, 163: 1979-1984.PubMed CentralView ArticlePubMedGoogle Scholar
- Robertson G, Hirst M, Bainbridge M, Bilenky M, Zhao Y, Zeng T, Euskirchen G, Bernier B, Varhol R, Delaney A, Thiessen N, Griffith OL, He A, Marra M, Snyder M, Jones S: Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing. Nature methods. 2007, 4: 651-657. 10.1038/nmeth1068.View ArticlePubMedGoogle Scholar
- Zhao KW, Li D, Zhao Q, Huang Y, Silverman RH, Sims PJ, Chen GQ: Interferon-alpha-induced expression of phospholipid scramblase 1 through STAT1 requires the sequential activation of protein kinase Cdelta and JNK. The Journal of biological chemistry. 2005, 280: 42707-42714. 10.1074/jbc.M506178200.View ArticlePubMedGoogle Scholar
- Kim SY, Kim YS: A gene sets approach for identifying prognostic gene signatures for outcome prediction. BMC Genomics. 2008, 9: 177-10.1186/1471-2164-9-177.PubMed CentralView ArticlePubMedGoogle Scholar
- Liu CC, Hu J, Kalakrishnan M, Huang H, Zhou XJ: Integrative disease classification based on cross-platform microarray data. BMC Bioinformatics. 2009, 10 (Suppl 1): S25-10.1186/1471-2105-10-S1-S25.PubMed CentralView ArticlePubMedGoogle Scholar
- Mao S, Wang C, Dong G: Evaluation of inter-laboratory and cross-platform concordance of DNA microarrays through discriminating genes and classifier transferability. J Bioinform Comput Biol. 2009, 7 (1): 157-173. 10.1142/S0219720009004011.View ArticlePubMedGoogle Scholar
- Shi L, Jones WD, Jensen RV, Harris SC, Perkins RG, Goodsaid FM, Guo L, Croner LJ, Boysen C, Fang H, Qian F, Amur S, Bao W, Barbacioru CC, Bertholet V, Cao XM, Chu TM, Collins PJ, Fan XH, Frueh FW, Fuscoe JC, Guo X, Han J, Herman D, Hong H, Kawasaki ES, Li QZ, Luo Y, Ma Y, Mei N, Peterson RL, Puri RK, Shippy R, Su Z, Sun YA, Sun H, Thorn B, Turpaz Y, Wang C, Wang SJ, Warrington JA, Willey JC, Wu J, Xie Q, Zhang L, Zhang L, Zhong S, Wolfinger RD, Tong W: The balance of reproducibility, sensitivity, and specificity of lists of differentially expressed genes in microarray studies. BMC Bioinformatics. 2008, 9 (Suppl 9): S10-10.1186/1471-2105-9-S9-S10.PubMed CentralView ArticlePubMedGoogle Scholar
- McCall MN, Irizarry RA: Consolidated strategy for the analysis of microarray spike-in data. Nucleic Acids Res. 2008, 36 (17): e108-10.1093/nar/gkn430.PubMed CentralView ArticlePubMedGoogle Scholar
- Irizarry RA, Warren D, Spencer F, Kim IF, Biswal S, Frank BC, Gabrielson E, Garcia JG, Geoghegan J, Germino G, Griffin C, Hilmer SC, Hoffman E, Jedlicka AE, Kawasaki E, Martínez-Murillo F, Morsberger L, Lee H, Petersen D, Quackenbush J, Scott A, Wilson M, Yang Y, Ye SQ, Yu W: Multiple-laboratory comparison of microarray platforms. Nat Methods. 2005, 2: 345-350. 10.1038/nmeth756.View ArticlePubMedGoogle Scholar
- Zhang C, Li HR, Fan JB, Wang-Rodriguez J, Downs T, Fu XD, Zhang MQ: Profiling alternatively spliced mRNA isoforms for prostate cancer classification. BMC Bioinformatics. 2006, 7: 202-10.1186/1471-2105-7-202.PubMed CentralView ArticlePubMedGoogle Scholar
- Klijn JG, Berns EM, Foekens JA: Prognostic factors and response to therapy in breast cancer. Cancer Surv. 1993, 18: 165-198.PubMedGoogle Scholar
- Perren TJ: c-erbB-2 oncogene as a prognostic marker in breast cancer. Br J Cancer. 1991, 63: 328-332.PubMed CentralView ArticlePubMedGoogle Scholar
- Lacroix M: Significance, detection and markers of disseminated breast cancer cells. Endocr Relat Cancer. 2006, 13: 1033-1067. 10.1677/ERC-06-0001.View ArticlePubMedGoogle Scholar
- Turashvili G, Bouchal J, Baumforth K, Wei W, Dziechciarkova M, Ehrmann J, Klein J, Fridman E, Skarda J, Srovnal J, Hajduch M, Murray P, Kolar Z: Novel markers for differentiation of lobular and ductal invasive breast carcinomas by laser microdissection and microarray analysis. BMC Cancer. 2007, 7: 55-10.1186/1471-2407-7-55.PubMed CentralView ArticlePubMedGoogle Scholar
- Sørlie T, Perou CM, Tibshirani R, Aas T, Geisler S, Johnsen H, Hastie T, Eisen MB, Rijn van de M, Jeffrey SS, Thorsen T, Quist H, Matese JC, Brown PO, Botstein D, Eystein Lønning P, Børresen-Dale AL: Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proc Natl Acad Sci USA. 2001, 98: 10869-10874. 10.1073/pnas.191367098.PubMed CentralView ArticlePubMedGoogle Scholar
- Sorlie T, Tibshirani R, Parker J, Hastie T, Marron JS, Nobel A, Deng S, Johnsen H, Pesich R, Geisler S, Demeter J, Perou CM, Lønning PE, Brown PO, Børresen-Dale AL, Botstein D: Repeated observation of breast tumor subtypes in independent gene expression data sets. Proc Natl Acad Sci USA. 2003, 100: 8418-8423. 10.1073/pnas.0932692100.PubMed CentralView ArticlePubMedGoogle Scholar
- Stanton JL, Green DP: Meta-analysis of gene expression in mouse preimplantation embryo development. Mol Hum Reprod. 2001, 7: 545-552. 10.1093/molehr/7.6.545.View ArticlePubMedGoogle Scholar
- Severgnini M, Bicciato S, Mangano E, Scarlatti F, Mezzelani A, Mattioli M, Ghidoni R, Peano C, Bonnal R, Viti F, Milanesi L, De Bellis G, Battaglia C: Strategies for comparing gene expression profiles from different microarray platforms: application to a case-control experiment. Anal Biochem. 2006, 353: 43-10.1016/j.ab.2006.03.023.View ArticlePubMedGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.