Table 2 Five groups of omics datasets used for testing classification models

From: Architectures and accuracy of artificial neural network for disease classification from omics data

| Dataset group | Classification problem | # Datasets | # Classes | # Raw features | # Reduced features | # Subjects | Maximum class ratio |
| --- | --- | --- | --- | --- | --- | --- | --- |
| TCGA* | TCGA (cancer vs normal) | 14 × 5ᵃ | 2 | 20,501 | 40 | 48–258 | 1:1 |
| TCGA | TCGA (stage classification) | 12 | 2, 3, or 4 | 20,501 | 40 | 190–974 | 3:1 |
| NSCLC.h | NSCLC (adenocarcinoma vs squamous) | 4 | 2 | 21,619–54,675 | 40 | 58–254 | 4.3:1 |
| NSCLC.s | NSCLC (stage classification) | 5 | 3 | 21,619–54,675 | 40 | 58–265 | 4.4:1 |
| CKD | CKD (stage classification) | 2 | 6 | 14,742ᵇ and 7,852ᶜ | 54ᵇ and 49ᶜ | 703 | 1.3:1 |

  1. ᵃ Five repeated sets of positive subjects were randomly sampled from the full TCGA samples to match the negative dataset. Training and testing were performed on each combined dataset, and performance values were averaged across the five repeated datasets to return one value per cancer type
  2. ᵇ CKD positive ion metabolomics
  3. ᶜ CKD negative ion metabolomics
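
The resampling procedure in footnote a can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function name `averaged_balanced_score` and the `evaluate` callback are hypothetical, and it assumes that "matching" the negative dataset means drawing as many positive subjects as there are negatives, which is consistent with the 1:1 class ratio in the table but is not stated explicitly.

```python
import random
from statistics import mean


def averaged_balanced_score(positives, negatives, evaluate, n_repeats=5, seed=0):
    """Average a performance score over repeated balanced resamples.

    positives, negatives: lists of subjects (any objects).
    evaluate: hypothetical callback that trains and tests a model on a
        list of (subject, label) pairs and returns one performance value.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(n_repeats):
        # Assumption: draw as many positives as there are negatives,
        # without replacement (requires len(positives) >= len(negatives)).
        sampled = rng.sample(positives, k=len(negatives))
        combined = [(s, 1) for s in sampled] + [(s, 0) for s in negatives]
        scores.append(evaluate(combined))
    # One averaged value per cancer type, as in footnote a.
    return mean(scores)
```

In practice `evaluate` would wrap whatever train/test split and classifier are being compared; the sketch only shows the sampling and averaging around it.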