
Table 1 Seven pairs (fourteen in total) of independent microarray datasets used for benchmarking

From: Comparison and evaluation of pathway-level aggregation methods of gene expression data

| Phenotype | Dataset name and reference | Class 1 (control group) samples | Class 2 (case group) samples | Data source | Platform |
| --- | --- | --- | --- | --- | --- |
| Effect of smoking on bronchial epithelium | | Never smokers | Current smokers | | |
| | Beane [18] | 21 | 52 | GSE7895 | U133A |
| | Vanni [19] | 22 | 37 | GSE10135 | U133 Plus 2 |
| Subtypes of non-small cell lung cancer (NSCLC) | | AD (adenocarcinoma) | SCC (squamous cell carcinoma) | | |
| | Bild [20] | 58 | 53 | GSE3141 | U133 Plus 2 |
| | Lee [21] | 63 | 75 | GSE8894 | U133 Plus 2 |
| Subtypes of primary high grade glioma | | AA (anaplastic astrocytoma) | GBM (glioblastoma multiforme) | | |
| | Phillips [22] | 21 | 56 | GSE4271 | U133 Set |
| | Sun [23] | 19 | 77 | GSE4290 | U133 Plus 2 |
| Estrogen receptor (ER) status in breast cancer | | ER-negative | ER-positive | | |
| | Chin [24] | 46 | 84 | E-TABM-158 | U133A |
| | Minn [25] | 42 | 57 | GSE2603 | U133A |
| Breast cancer grade | | Grade 1 | Grade 3 | | |
| | Desmedt [26] | 30 | 83 | GSE7390 | U133A |
| | Sotiriou [27] | 28 | 32 | GSE2990 | U133A |
| Lung cancer grade | | Grade 1 | Grade 3 | | |
| | Dana-Farber [28] | 13 | 37 | Author's website | U133A |
| | Michigan [28] | 26 | 66 | | |
| Clear cell renal cell carcinoma (CCRCC) vs. normal kidney | | Normal kidney | Tumorous kidney | | |
| | Jones [29] | 23 | 32 | GSE15641 | U133A |
| | Kort [30] | 12 | 10 | GSE11024 | U133 Plus 2 |
  1. Each dataset is referred to in the main text by its dataset name, which is either the first author's name or the cohort name. Numbers in the table are sample counts.