
Table 1 The confusion matrix of Random Forest classification using the proposed methylation signature

From: DNA methylation profiles capturing breast cancer heterogeneity

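| Signature | Data type | Dataset |  | TN | non-TN | F1_TN | F1_nonTN | MCC |
|---|---|---|---|---|---|---|---|---|
| pCpGs | Methylation | GSE72245 | TN | 7 | 2 | 0.67 | 0.90 | 0.58 |
|  |  |  | non-TN | 5 | 32 |  |  |  |
| pDMGs | Methylation | TCGA | TN | 11 | 43 | 0.29 | 0.93 | 0.26 |
|  |  |  | non-TN | 12 | 384 |  |  |  |
| pDMGs | mRNA | TCGA | TN | 44 | 25 | 0.70 | 0.97 | 0.67 |
|  |  |  | non-TN | 12 | 519 |  |  |  |
| PAM50 | mRNA | TCGA | TN | 60 | 9 | 0.84 | 0.98 | 0.82 |
|  |  |  | non-TN | 14 | 517 |  |  |  |
| PAM50–6 | mRNA | TCGA | TN | 55 | 15 | 0.79 | 0.97 | 0.76 |
|  |  |  | non-TN | 15 | 515 |  |  |  |
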
  1. The values show the number of samples whose classification by each signature panel is consistent or inconsistent with their status as identified by immunohistochemistry staining. The F1 score, which accounts for both false positives and false negatives, is used to assess classification accuracy. ‘TN’ and ‘non-TN’ are triple-negative and non-triple-negative breast cancers, respectively. PAM50 and PAM50–6 (PAM50 with ER, PR, HER2, FOXA1, MYC and MYBL2 excluded) are used as benchmarks. ‘F1_TN’ and ‘F1_nonTN’ refer to the F1 scores for identifying TN and non-TN tumors, respectively. MCC is the Matthews correlation coefficient.
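
The F1 and MCC columns can be reproduced from the four counts in each confusion matrix. The following is a minimal sketch using the standard textbook definitions of the F1 score and the Matthews correlation coefficient, not the authors' own code; the function name `f1_and_mcc` and its argument names are hypothetical. It assumes the diagonal cells (TN/TN and non-TN/non-TN) hold the consistent counts. Both metrics are symmetric in the two off-diagonal cells, so the result does not depend on whether rows or columns hold the immunohistochemistry labels.

```python
from math import sqrt


def f1_and_mcc(tn_tn, tn_non, non_tn, non_non):
    """Return (F1_TN, F1_nonTN, MCC) for a 2x2 confusion matrix.

    Cell names are illustrative (an assumption, not from the source):
      tn_tn   - TN row, TN column (consistent TN samples)
      tn_non  - TN row, non-TN column (inconsistent)
      non_tn  - non-TN row, TN column (inconsistent)
      non_non - non-TN row, non-TN column (consistent non-TN samples)
    """
    errors = tn_non + non_tn
    # F1 = 2*TP / (2*TP + FP + FN), taking each class in turn as "positive".
    f1_tn = 2 * tn_tn / (2 * tn_tn + errors)
    f1_non_tn = 2 * non_non / (2 * non_non + errors)
    # MCC = (TP*TN - FP*FN) / sqrt(product of the four marginal totals).
    denom = sqrt(
        (tn_tn + tn_non)
        * (non_tn + non_non)
        * (tn_tn + non_tn)
        * (tn_non + non_non)
    )
    mcc = (tn_tn * non_non - tn_non * non_tn) / denom
    return f1_tn, f1_non_tn, mcc


# pCpGs signature on GSE72245 (first block of Table 1)
f1_tn, f1_non_tn, mcc = f1_and_mcc(7, 2, 5, 32)
print(f"{f1_tn:.2f} {f1_non_tn:.2f} {mcc:.2f}")  # prints "0.67 0.90 0.58"
```

Applied to the other blocks of the table, the same function gives values that round to the reported scores, for example `f1_and_mcc(55, 15, 15, 515)` for PAM50–6.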