Identification of genes in different EECs. (A) A multidimensional scaling (MDS) plot generated from all probe sets (~54,600) on the chip. Normal endometrium (Normal) and EECs of all four stages are included. Each spot represents one array. (B) A Venn diagram summarizing genes differentially expressed between normal and tumor tissues or between early (Stages 1 & 2) and late (Stages 3 & 4) EEC samples in the training cohort. (C) Narrowing down the existing gene signature using a machine learning strategy. When probe sets were ranked by signal-to-noise ratios (weights), the top 217 features formed the largest panel that gave the lowest error rate (i.e., the best classification performance; upper panel). (D) Discrimination ability of the 217-probe-set signature. A prediction strength plot shows the prediction strengths of the identified 217 probe sets in discriminating early from late EECs in the training cohort. Samples 1B and 2B denote two early EECs (Stages 1B and 2B, respectively) that express late EEC gene signatures. (E) An MDS plot using the above 217 probe sets. The two misgrouped early EECs are indicated. (F) Signature evaluation using an independent testing data set. One Stage 1B case, which expresses late EEC gene signatures, is grouped into the late EEC area (separated by a red line).