
Table 1 Negative training data sets in individual models, and corresponding accuracy, sensitivity, specificity and AUC values

From: Prediction of plant lncRNA by ensemble machine learning classifiers

| Training dataset | Negative data | AUC (GB) | AUC (RF) | Accuracy (GB) | Accuracy (RF) | Specificity (GB) | Specificity (RF) | Sensitivity (GB) | Sensitivity (RF) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 3000 H. sapiens (set A); 1000 M. musculus (set A); 3000 O. sativa (set A) | 0.940 | 0.943 | 0.962 | 0.956 | 0.988 | 0.990 | 0.548 | 0.404 |
| 2 | 3000 H. sapiens (set A); 3000 O. sativa (set A) | 0.943 | 0.944 | 0.960 | 0.953 | 0.988 | 0.989 | 0.576 | 0.461 |
| 3 | 3000 H. sapiens (set A); 1000 M. musculus (set A); 3000 A. thaliana (set A) | 0.961 | 0.962 | 0.973 | 0.970 | 0.990 | 0.992 | 0.693 | 0.592 |
| 4 | 3000 H. sapiens (set A); 3000 A. thaliana (set A) | 0.962 | 0.966 | 0.972 | 0.967 | 0.990 | 0.990 | 0.725 | 0.640 |
| 5 | 3000 H. sapiens (set B); 3000 A. thaliana (set B) | 0.955 | 0.959 | 0.965 | 0.958 | 0.991 | 0.980 | 0.608 | 0.530 |
| 6 | 4500 H. sapiens (set A + 1500 seq); 4500 A. thaliana (set A + 1500 seq) | 0.961 | 0.967 | 0.979 | 0.979 | 0.995 | 0.995 | 0.633 | 0.571 |
| 7 | 3000 H. sapiens (set A); 4500 A. thaliana (set A + 1500 seq) | 0.963 | 0.967 | 0.976 | 0.971 | 0.993 | 0.992 | 0.700 | 0.603 |
| 8 | 2000 H. sapiens (2000 from set A); 1000 M. musculus (set A); 3000 A. thaliana (set A) | 0.964 | 0.965 | 0.968 | 0.965 | 0.988 | 0.990 | 0.695 | 0.619 |
The training datasets of the individual random forest (RF) and gradient boosting (GB) models are described. The positive training dataset, 436 validated lncRNAs, remained constant across all training datasets. Specificity, sensitivity, accuracy and AUC values were obtained by 10-fold cross-validation on all training data.
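The relationship between the reported metrics can be checked with a short calculation. Accuracy is the class-size-weighted average of sensitivity and specificity, so with 436 positives set against several thousand negatives it is dominated by specificity. The sketch below is illustrative only: the function name `accuracy_from_rates` is hypothetical, and it assumes that every listed negative sequence and all 436 positives were scored exactly once across the cross-validation folds, which the table does not state explicitly.

```python
def accuracy_from_rates(sensitivity, specificity, n_pos, n_neg):
    """Recover overall accuracy from per-class rates and class sizes.

    accuracy = (TP + TN) / (P + N)
             = (sensitivity * P + specificity * N) / (P + N)
    """
    true_pos = sensitivity * n_pos   # expected correctly classified positives
    true_neg = specificity * n_neg   # expected correctly classified negatives
    return (true_pos + true_neg) / (n_pos + n_neg)


# Training dataset 4, GB model: 436 positives (from the footnote) and
# 3000 + 3000 = 6000 negatives (assumed to be used in full).
acc_gb_4 = accuracy_from_rates(0.725, 0.990, n_pos=436, n_neg=6000)
print(round(acc_gb_4, 3))  # 0.972, matching the tabulated GB accuracy

# Training dataset 1, GB model: 3000 + 1000 + 3000 = 7000 negatives.
acc_gb_1 = accuracy_from_rates(0.548, 0.988, n_pos=436, n_neg=7000)
print(round(acc_gb_1, 3))  # 0.962, matching the tabulated GB accuracy
```

Under these assumptions the tabulated accuracies are reproduced to three decimals, which also shows why accuracy stays above 0.95 even where sensitivity is close to 0.5: the negatives outnumber the positives by more than ten to one.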