Table 2 Five groups of omics datasets used for testing classification models

From: Architectures and accuracy of artificial neural network for disease classification from omics data

| Dataset group | Classification problem | # Datasets | # Classes | # Raw features | # Reduced features | # Subjects | Maximum class ratio |
|---|---|---|---|---|---|---|---|
| TCGA* | TCGA (cancer vs normal) | 14 × 5 [a] | 2 | 20,501 | 40 | 48–258 | 1:1 |
| TCGA | TCGA (stage classification) | 12 | 2, 3, or 4 | 20,501 | 40 | 190–974 | 3:1 |
| NSCLC.h | NSCLC (adenocarcinoma vs squamous) | 4 | 2 | 21,619–54,675 | 40 | 58–254 | 4.3:1 |
| NSCLC.s | NSCLC (stage classification) | 5 | 3 | 21,619–54,675 | 40 | 58–265 | 4.4:1 |
| CKD | CKD (stage classification) | 2 | 6 | 14,742 [b] and 7,852 [c] | 54 [b] and 49 [c] | 703 | 1.3:1 |
[a] Five repeated sets of positive subjects were randomly sampled from the full TCGA samples to match the negative dataset. Training and testing were performed on each combined dataset, and performance values were averaged across the five repeated datasets to return one value per cancer type.
[b] CKD positive ion metabolomics.
[c] CKD negative ion metabolomics.
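
The repeated sampling described in footnote a can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it assumes "match" means drawing as many positive subjects as there are negative subjects (consistent with the 1:1 class ratio in the table), and the function name `repeated_balanced_evaluation`, the `evaluate` callback, and the `seed` parameter are hypothetical names introduced here.

```python
import random
from statistics import mean


def repeated_balanced_evaluation(positives, negatives, evaluate, n_repeats=5, seed=0):
    """Average a score over repeated balanced subsamples of the positive class.

    Assumption: there are at least as many positives as negatives, and
    `evaluate` (hypothetical) trains and tests a model on a list of
    (subject, label) pairs and returns a single performance value.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(n_repeats):
        # Draw as many positives as there are negatives, without replacement.
        sampled = rng.sample(positives, k=len(negatives))
        # Combine into one labelled dataset: 1 = positive, 0 = negative.
        combined = [(x, 1) for x in sampled] + [(x, 0) for x in negatives]
        scores.append(evaluate(combined))
    # One averaged value per cancer type, as in the footnote.
    return mean(scores)
```

In practice `evaluate` would wrap the train/test procedure for one cancer type; here it is left as a placeholder so the sampling and averaging logic stays visible.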