Microarray-based technology has drawn increasing attention in biomedical research. Numerous experiments have used gene expression profiles generated with microarray technology to better understand the biological mechanisms of disease pathogenesis. Meanwhile, gene signatures selected through microarray data analysis have been used to predict clinical response, disease stage, or disease subtype. Many investigators have already discussed different aspects of gene signature selection, including classification algorithms, the signature production process, cross-platform comparison, validation, and selection of the best signature.
Feature selection and classifier selection are the two core steps in classifying clinical outcomes of disease from gene expression microarray data. Under the hypothesis that a causal relation exists between a disease-specific phenotype and the corresponding gene expression profile, the feature selection step can be regarded as an exploration of the potential molecular mechanism underlying an endpoint, and it is often time-consuming. Because disease states defined under certain criteria can be ambiguous and changeable, the consistency between phenotype and transcriptome state is unstable, which may weaken microarray-based disease prediction. Furthermore, some pathological standards are defined empirically and are restricted by contemporary diagnostic techniques. Since a given endpoint phenotype may have no consistent underlying gene expression profile, the relationship between the phenotype and the gene expression profile should be evaluated before a microarray-based predictive model for pathological classification is developed.
Therefore, instead of assuming a priori that an endpoint phenotype is associated with the gene expression profile, we propose to first compute the consistency degree in order to test and evaluate this association. Our study supports the validity of the consistency degree on the basis of the performance of thousands of classifiers from the MAQC-II project. For example, for endpoints G, J and K, our results show that the cutoff criterion initially assigned to each endpoint was close to our redefined one, indicating relative consistency between the clinical phenotypes and the gene expression profiles. However, under the given criteria, the predictive power of the models for those endpoints is still insufficient, and this is reflected in the consistency degree (additional file 3). Since the consistency degree cannot be increased any further by iterating over all possible cutoffs, the relationships between the pathological traits and the gene expression profiles appear to be weak for those three endpoints.
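The idea of iterating over all possible cutoffs can be illustrated with a minimal Python sketch. It is not the procedure used in our study: the function names (`candidate_cutoffs`, `scan_cutoffs`, `agreement`), the toy data, and the scoring function are all hypothetical, and the simple agreement score only stands in for the consistency degree, whose definition is not reproduced here.

```python
def candidate_cutoffs(values):
    """Midpoints between consecutive distinct values of a clinical measurement."""
    uniq = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(uniq, uniq[1:])]


def scan_cutoffs(values, score_fn):
    """Binarize `values` at each candidate cutoff and keep the best-scoring one.

    `score_fn` takes a list of 0/1 labels and returns a number; it is a
    placeholder (assumption) for whatever consistency measure is used.
    """
    best_cutoff, best_score = None, float("-inf")
    for cutoff in candidate_cutoffs(values):
        labels = [1 if v > cutoff else 0 for v in values]
        score = score_fn(labels)
        if score > best_score:
            best_cutoff, best_score = cutoff, score
    return best_cutoff, best_score


# Toy data (hypothetical): a continuous clinical measurement per sample and
# a two-group partition of the same samples derived from expression data.
clinical_values = [1, 2, 3, 10, 11, 12]
expression_groups = [0, 0, 0, 1, 1, 1]


def agreement(labels):
    """Fraction of samples whose phenotype label matches the expression group."""
    matches = sum(1 for a, b in zip(labels, expression_groups) if a == b)
    return matches / len(labels)


best_cutoff, best_score = scan_cutoffs(clinical_values, agreement)
print(best_cutoff, best_score)
```

If no cutoff in such a scan yields a clearly higher score than the original one, the phenotype definition is unlikely to be the limiting factor, which is the situation described for endpoints G, J and K.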
Among the six endpoints actually tested, endpoint E, which refers to estrogen-receptor (ER) status, is the only one that can be predicted relatively accurately (average MCC of the top ten models, 0.76; additional file 3). ER status is a phenotype defined by the activity of the estrogen receptor in the tumor and has a clear underlying molecular mechanism, so the consistency degree between this endpoint phenotype and the gene expression profile is relatively high. The high consistency degree of endpoint E is a good example confirming that a pathological endpoint can be predicted by microarray-based models, and it provides positive evidence for a causal relation between gene expression profiles and a pathological endpoint. However, for poorly predictable endpoints such as pCR, OS and EFS (additional file 3), which reflect more complicated clinical outcomes resulting from complex clinical treatments, no explicit molecular mechanism has been elucidated to date. The lower predictive ability of microarray-based models for those complex endpoint phenotypes indicates that their characteristics cannot easily be captured by a snapshot of the gene expression profile. Accordingly, the consistency degree between the phenotype and the gene expression profile is much lower for those endpoints. These results imply that wide gaps remain between complex endpoints and gene expression profiles, and that the currently defined cutoffs for those endpoints need to be evaluated comprehensively and defined accurately before microarray-based models are used in clinical applications.
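For readers less familiar with the Matthews correlation coefficient (MCC) reported above, the following short Python sketch computes it from binary labels using the standard confusion-matrix formula. The function name `mcc` and the toy labels are illustrative only, and returning 0.0 when the denominator is zero is a common convention assumed here rather than something specified in this study.

```python
from math import sqrt


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Assumed convention: the score is undefined here, so report 0.0.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Toy example (hypothetical labels): two of three positives and two of
# three negatives are predicted correctly.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
print(round(mcc(y_true, y_pred), 3))
```

MCC ranges from -1 to 1, with 0 corresponding to chance-level prediction, so an average of 0.76 for endpoint E indicates substantially better agreement than is seen for the poorly predictable endpoints.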
Above all, our results demonstrate that caution should be taken when developing microarray-based predictive models and, most importantly, that the pathological status needs to be carefully examined and defined. Otherwise, the enormous effort invested in statistical approaches may ultimately fail to reach its goal, since the maximum predictive power of the models is limited by the correlation between clinical phenotype status and gene expression profile. Based on our findings, we conclude that the consistency degree is an important index that should be determined before building predictive models from microarray measurements. Ultimately, calculating the consistency degree will help to build more reliable classification models.