
Table 1 Seven pairs (fourteen in total) of independent microarray datasets used for benchmarking

From: Comparison and evaluation of pathway-level aggregation methods of gene expression data

| Phenotype | Dataset name and reference | Class 1 (control group) samples | Class 2 (case group) samples | Data source | Platform |
|---|---|---|---|---|---|
| Effect of smoking on bronchial epithelium | | Never smokers | Current smokers | | |
| | Beane [18] | 21 | 52 | GSE7895 | U133A |
| | Vanni [19] | 22 | 37 | GSE10135 | U133 Plus 2 |
| Subtypes of non-small cell lung cancer (NSCLC) | | AD (adenocarcinoma) | SCC (squamous cell carcinoma) | | |
| | Bild [20] | 58 | 53 | GSE3141 | U133 Plus 2 |
| | Lee [21] | 63 | 75 | GSE8894 | U133 Plus 2 |
| Subtypes of primary high grade glioma | | AA (anaplastic astrocytoma) | GBM (glioblastoma multiforme) | | |
| | Phillips [22] | 21 | 56 | GSE4271 | U133 Set |
| | Sun [23] | 19 | 77 | GSE4290 | U133 Plus 2 |
| Estrogen receptor (ER) status in breast cancer | | ER-negative | ER-positive | | |
| | Chin [24] | 46 | 84 | E-TABM-158 | U133A |
| | Minn [25] | 42 | 57 | GSE2603 | U133A |
| Breast cancer grade | | Grade 1 | Grade 3 | | |
| | Desmedt [26] | 30 | 83 | GSE7390 | U133A |
| | Sotiriou [27] | 28 | 32 | GSE2990 | U133A |
| Lung cancer grade | | Grade 1 | Grade 3 | | |
| | Dana-Farber [28] | 13 | 37 | Author's website | U133A |
| | Michigan [28] | 26 | 66 | | |
| Clear cell renal cell carcinoma (CCRCC) vs Normal kidney | | Normal kidney | Tumorous kidney | | |
| | Jones [29] | 23 | 32 | GSE15641 | U133A |
| | Kort [30] | 12 | 10 | GSE11024 | U133 Plus 2 |

Each dataset is referred to in the main text by its dataset name, which is the first author's name or the cohort name. Numbers in the table indicate the number of samples.