
Table 2. KNN accuracy on test data with 5% simulated sequencing error, for different sample sizes, test sizes, training sizes and numbers of neighbors (K)

From: Alignment-free genome comparison enables accurate geographic sourcing of white oak DNA

Test size  Training size  K = 1  K = 2  K = 3  K = 4  K = 5  K = 6  K = 7  K = 8  K = 9  K = 10
Samples of 50 Mbases
        1             91   0.93   0.93   0.90   0.92   0.90   0.91   0.84   0.85   0.71    0.81
       17             75   0.88   0.88   0.84   0.87   0.82   0.83   0.76   0.79   0.71    0.78
       32             60   0.86   0.86   0.80   0.83   0.78   0.79   0.74   0.79   0.75    0.82
       47             45   0.80   0.80   0.73   0.76   0.69   0.74   0.71   0.76   0.73    0.80
       62             30   0.77   0.77   0.68   0.75   0.71   0.77   0.74   0.79   0.78    0.81
       77             15   0.66   0.66   0.64   0.68   0.70   0.74   0.74   0.73   0.69    0.67
Samples of 100 Mbases
        1             91   1.00   1.00   0.98   1.00   1.00   1.00   0.99   0.99   0.82    0.92
       17             75   0.99   0.99   0.96   0.98   0.93   0.93   0.86   0.90   0.82    0.88
       32             60   0.96   0.96   0.92   0.94   0.87   0.89   0.84   0.87   0.83    0.88
       47             45   0.93   0.93   0.87   0.90   0.83   0.87   0.83   0.88   0.83    0.88
       62             30   0.86   0.86   0.79   0.84   0.78   0.83   0.80   0.84   0.82    0.85
       77             15   0.77   0.77   0.72   0.76   0.73   0.75   0.73   0.72   0.68    0.65
Samples of 300 Mbases
        1             91   1.00   1.00   1.00   1.00   1.00   1.00   0.98   0.99   0.95    0.98
       17             75   1.00   1.00   1.00   1.00   0.98   0.99   0.95   0.97   0.93    0.95
       32             60   0.99   0.99   0.97   0.98   0.94   0.95   0.92   0.95   0.93    0.95
       47             45   0.98   0.98   0.94   0.95   0.92   0.93   0.91   0.94   0.92    0.95
       62             30   0.95   0.95   0.90   0.93   0.90   0.93   0.89   0.91   0.87    0.89
       77             15   0.88   0.88   0.84   0.86   0.81   0.82   0.78   0.76   0.72    0.70
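To make the evaluation behind the table concrete, here is a minimal, hypothetical sketch of a KNN accuracy measurement on reads with simulated errors. It rests on several assumptions, none taken from the source: a substitution-only error model at a 5% rate, a k-mer Jaccard distance standing in for the alignment-free comparison, and majority voting with ties broken in favor of the nearest tied neighbor. The function names (`simulate_substitution_errors`, `kmer_set`, `jaccard_distance`, `knn_predict`, `knn_accuracy`) are illustrative only.

```python
import random
from collections import Counter
from typing import Optional, Sequence, Set

BASES = "ACGT"


def simulate_substitution_errors(seq: str, rate: float = 0.05,
                                 rng: Optional[random.Random] = None) -> str:
    """Replace each base by a different random base with probability `rate`.

    Assumption: substitution-only errors; the source states only a 5% rate.
    """
    rng = rng or random.Random()
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def kmer_set(seq: str, k: int = 21) -> Set[str]:
    """Distinct k-mers of `seq` (a hypothetical alignment-free profile)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """1 - |A & B| / |A | B|; defined as 0.0 when both sets are empty."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def knn_predict(dist_row: Sequence[float], train_labels: Sequence[str],
                k: int) -> str:
    """Majority vote among the k nearest training samples.

    Ties go to the label of the nearest tied neighbor (an assumed rule).
    """
    if k < 1 or not dist_row:
        raise ValueError("need k >= 1 and at least one training sample")
    # Training indices ordered nearest first; sorted() keeps equal distances stable.
    nearest = sorted(range(len(dist_row)), key=lambda j: dist_row[j])[:k]
    votes = Counter(train_labels[j] for j in nearest)
    top = max(votes.values())
    tied = {label for label, count in votes.items() if count == top}
    return next(train_labels[j] for j in nearest if train_labels[j] in tied)


def knn_accuracy(dist: Sequence[Sequence[float]], train_labels: Sequence[str],
                 test_labels: Sequence[str], k: int) -> float:
    """Fraction of test samples predicted correctly.

    dist[i][j] is the distance from test sample i to training sample j.
    """
    correct = sum(knn_predict(row, train_labels, k) == y
                  for row, y in zip(dist, test_labels))
    return correct / len(test_labels)


if __name__ == "__main__":
    # Toy distances: 2 test samples x 3 training samples (made-up numbers).
    dist = [[0.1, 0.2, 0.3], [0.5, 0.1, 0.2]]
    train_labels = ["A", "B", "B"]
    test_labels = ["A", "B"]
    for k in (1, 2, 3):
        print(k, knn_accuracy(dist, train_labels, test_labels, k))
    # prints "1 1.0", "2 1.0", "3 0.5" on separate lines
```

Under this nearest-neighbor tie-break, K = 2 always returns the same label as K = 1: the two neighbors either agree, or they tie and the vote defers to the nearest one. That is one possible explanation for the identical K = 1 and K = 2 columns above, but the source does not state which tie-breaking rule was used.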