Architectures and accuracy of artificial neural network for disease classification from omics data

BMC Genomics

Table 2 Five groups of omics datasets used for testing classification models

Dataset group	Classification problem	# Datasets	# Classes	# Raw features	# Reduced features	# Subjects	Maximum class ratio
TCGA*	TCGA (cancer vs normal)	14×5^a	2	20,501	40	48–258	1:1
TCGA	TCGA (stage classification)	12	2, 3, or 4	20,501	40	190–974	3:1
NSCLC.h	NSCLC (adenocarcinoma vs squamous)	4	2	21,619–54,675	40	58–254	4.3:1
NSCLC.s	NSCLC (stage classification)	5	3	21,619–54,675	40	58–265	4.4:1
CKD	CKD (stage classification)	2	6	14,742^b and 7,852^c	54^b and 49^c	703	1.3:1

^aFive replicate sets of positive subjects were randomly sampled from the full TCGA samples to match the negative dataset. Training and testing were performed on each combined dataset, and performance values were averaged across the five replicates to give one value per cancer type.
^bCKD positive ion metabolomics
^cCKD negative ion metabolomics
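
The resampling scheme in footnote a can be illustrated with a minimal sketch. It is not the authors' code. It assumes that "match" means drawing as many positives as there are negatives (consistent with the 1:1 class ratio in the table), that each replicate is sampled without replacement, and that replicates are drawn independently of one another. The names `balanced_replicates`, `averaged_performance`, and `evaluate` are hypothetical; `evaluate` stands in for whatever train-and-test routine returns a single performance value.

```python
import random
from statistics import mean


def balanced_replicates(positives, negatives, n_repeats=5, seed=0):
    """Build n_repeats combined datasets.

    Each replicate pairs all negatives with an equally sized random sample
    of positives (assumption: sampled without replacement within a replicate).
    """
    positives = list(positives)
    negatives = list(negatives)
    if len(negatives) > len(positives):
        raise ValueError("need at least as many positives as negatives")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    replicates = []
    for _ in range(n_repeats):
        sampled = rng.sample(positives, len(negatives))
        replicates.append((sampled, negatives))
    return replicates


def averaged_performance(positives, negatives, evaluate, n_repeats=5, seed=0):
    """Score every replicate with `evaluate` and return the mean score.

    `evaluate(pos, neg)` is a hypothetical callable that trains and tests
    a classifier on one combined dataset and returns one number.
    """
    scores = [
        evaluate(pos, neg)
        for pos, neg in balanced_replicates(positives, negatives, n_repeats, seed)
    ]
    return mean(scores)
```

For one cancer type, `averaged_performance(tumor_ids, normal_ids, evaluate)` would return the single value reported for that type, where `tumor_ids` and `normal_ids` are placeholder lists of subject identifiers.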

ISSN: 1471-2164