Figure 7 | BMC Genomics

Figure 7

From: Discovery and validation of breast cancer subtypes

Steps of the procedure. Pictorial representation of steps 1–5 described in the Procedure subsection of the Methods section. (Upper) Filter all 23,946 genes by removing genes with at least 10% missing data or a standard deviation less than 1.5. Keep every seed gene that defines two groups of training dataset samples between which at least one of the 23,946 genes is significantly differentially expressed. Repeat the following steps. Select two of the 133 candidate genes and hierarchically cluster the training dataset samples on these two genes. Cut the dendrogram from the top down to produce three groups of samples. Cut the same dendrogram from the top down again to produce four groups of samples. Use PAM to determine which of the 23,946 genes best define centroids for the training dataset sample groups obtained from the dendrogram. Form the centroids by taking only the data for those genes and averaging over the samples classified to the same group. Use the centroids to classify the training dataset samples. (Lower) If all the groups are validated in the training dataset, then use the centroids to classify the testing datasets' samples. If all the groups are validated in all of the validation datasets, then the significance of the groups' clinical difference is determined (not pictured).
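
The filtering, centroid-forming, and classification steps in the caption can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy data, the use of the sample standard deviation, and nearest-centroid assignment by Euclidean distance are all assumptions made for illustration. The PAM gene selection and the hierarchical clustering are not reproduced; the group labels are simply supplied by hand.

```python
import math
import statistics


def filter_genes(expr, max_missing=0.10, min_sd=1.5):
    """Keep genes with under 10% missing data and a standard deviation of at least 1.5.

    `expr` maps gene name -> list of expression values, with None for missing.
    Using the sample standard deviation of the observed values is an assumption.
    """
    kept = []
    for gene, values in expr.items():
        observed = [v for v in values if v is not None]
        missing_fraction = 1 - len(observed) / len(values)
        if missing_fraction >= max_missing:
            continue  # removed: at least 10% missing data
        if len(observed) < 2 or statistics.stdev(observed) < min_sd:
            continue  # removed: standard deviation below 1.5
        kept.append(gene)
    return kept


def form_centroids(expr, genes, labels):
    """Average each selected gene over the samples assigned to the same group."""
    centroids = {}
    for group in sorted(set(labels)):
        members = [i for i, g in enumerate(labels) if g == group]
        centroids[group] = [
            statistics.fmean(
                expr[gene][i] for i in members if expr[gene][i] is not None
            )
            for gene in genes
        ]
    return centroids


def classify(expr, genes, centroids, sample_index):
    """Assign a sample to the group with the nearest centroid (Euclidean distance).

    Assumes the sample has no missing values for the selected genes.
    """
    profile = [expr[gene][sample_index] for gene in genes]
    return min(centroids, key=lambda group: math.dist(profile, centroids[group]))


# Hypothetical toy data: 4 genes measured on 6 samples.
expr = {
    "geneA": [0.0, 0.5, 4.0, 4.5, 8.0, 8.5],
    "geneB": [5.0, 5.5, 0.0, 0.5, 9.0, 9.5],
    "flat": [1.0, 1.1, 0.9, 1.0, 1.05, 0.95],    # low variance
    "sparse": [None, 3.0, 9.0, None, 0.0, 6.0],  # too much missing data
}
# Group labels as they might come from cutting a dendrogram into three groups.
labels = [1, 1, 2, 2, 3, 3]

kept = filter_genes(expr)
centroids = form_centroids(expr, kept, labels)
predicted = [classify(expr, kept, centroids, i) for i in range(len(labels))]
```

In this toy example the low-variance gene and the gene with missing values are filtered out, and every sample is reassigned to the group whose centroid it helped form. This mirrors the caption's step of using the centroids to classify the training dataset samples.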
