Table 2 KNN accuracy on test data with 5% simulated sequencing error, for different sample sizes, test sizes, training sizes and numbers of neighbors (K)

From: Alignment-free genome comparison enables accurate geographic sourcing of white oak DNA

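| Sample size | Test size | Training size | K = 1 | K = 2 | K = 3 | K = 4 | K = 5 | K = 6 | K = 7 | K = 8 | K = 9 | K = 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 50 MBases | 1 | 91 | 0.93 | 0.93 | 0.90 | 0.92 | 0.90 | 0.91 | 0.84 | 0.85 | 0.71 | 0.81 |
| 50 MBases | 17 | 75 | 0.88 | 0.88 | 0.84 | 0.87 | 0.82 | 0.83 | 0.76 | 0.79 | 0.71 | 0.78 |
| 50 MBases | 32 | 60 | 0.86 | 0.86 | 0.80 | 0.83 | 0.78 | 0.79 | 0.74 | 0.79 | 0.75 | 0.82 |
| 50 MBases | 47 | 45 | 0.80 | 0.80 | 0.73 | 0.76 | 0.69 | 0.74 | 0.71 | 0.76 | 0.73 | 0.80 |
| 50 MBases | 62 | 30 | 0.77 | 0.77 | 0.68 | 0.75 | 0.71 | 0.77 | 0.74 | 0.79 | 0.78 | 0.81 |
| 50 MBases | 77 | 15 | 0.66 | 0.66 | 0.64 | 0.68 | 0.70 | 0.74 | 0.74 | 0.73 | 0.69 | 0.67 |
| 100 MBases | 1 | 91 | 1.00 | 1.00 | 0.98 | 1.00 | 1.00 | 1.00 | 0.99 | 0.99 | 0.82 | 0.92 |
| 100 MBases | 17 | 75 | 0.99 | 0.99 | 0.96 | 0.98 | 0.93 | 0.93 | 0.86 | 0.90 | 0.82 | 0.88 |
| 100 MBases | 32 | 60 | 0.96 | 0.96 | 0.92 | 0.94 | 0.87 | 0.89 | 0.84 | 0.87 | 0.83 | 0.88 |
| 100 MBases | 47 | 45 | 0.93 | 0.93 | 0.87 | 0.90 | 0.83 | 0.87 | 0.83 | 0.88 | 0.83 | 0.88 |
| 100 MBases | 62 | 30 | 0.86 | 0.86 | 0.79 | 0.84 | 0.78 | 0.83 | 0.80 | 0.84 | 0.82 | 0.85 |
| 100 MBases | 77 | 15 | 0.77 | 0.77 | 0.72 | 0.76 | 0.73 | 0.75 | 0.73 | 0.72 | 0.68 | 0.65 |
| 300 MBases | 1 | 91 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.98 | 0.99 | 0.95 | 0.98 |
| 300 MBases | 17 | 75 | 1.00 | 1.00 | 1.00 | 1.00 | 0.98 | 0.99 | 0.95 | 0.97 | 0.93 | 0.95 |
| 300 MBases | 32 | 60 | 0.99 | 0.99 | 0.97 | 0.98 | 0.94 | 0.95 | 0.92 | 0.95 | 0.93 | 0.95 |
| 300 MBases | 47 | 45 | 0.98 | 0.98 | 0.94 | 0.95 | 0.92 | 0.93 | 0.91 | 0.94 | 0.92 | 0.95 |
| 300 MBases | 62 | 30 | 0.95 | 0.95 | 0.90 | 0.93 | 0.90 | 0.93 | 0.89 | 0.91 | 0.87 | 0.89 |
| 300 MBases | 77 | 15 | 0.88 | 0.88 | 0.84 | 0.86 | 0.81 | 0.82 | 0.78 | 0.76 | 0.72 | 0.70 |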
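
The kind of evaluation summarized in this table can be illustrated with a minimal sketch: hold out a test set of a given size, train KNN on the remainder, and record accuracy for K = 1 to 10. Everything below is an assumption for illustration only, not the authors' pipeline: the data are synthetic 2-D points (`make_synthetic`), the distance is Euclidean, one random split is used per test size, and ties in the vote are broken toward the nearest neighbor. The only details taken from the table are the test sizes and the total of 92 samples (each test size plus its training size). Under this tie-break, K = 1 and K = 2 always give the same prediction, which would be consistent with the identical K = 1 and K = 2 columns, although the source does not state which tie-breaking rule was used.

```python
import math
import random
from collections import Counter


def knn_predict(train, query, k):
    """Majority vote among the k nearest training points.

    Assumption: ties are broken toward the label of the closest neighbor
    (the source does not state its tie-breaking rule).
    """
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    best = max(votes.values())
    for _, label in neighbours:  # ordered nearest first
        if votes[label] == best:
            return label


def knn_accuracy(train, test, k):
    """Fraction of test points whose predicted label matches the true label."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)


def make_synthetic(n_per_class, rng):
    """Hypothetical data: four Gaussian clusters standing in for source regions."""
    centres = {"A": (0.0, 0.0), "B": (3.0, 0.0), "C": (0.0, 3.0), "D": (3.0, 3.0)}
    data = []
    for label, (cx, cy) in centres.items():
        for _ in range(n_per_class):
            data.append(((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)), label))
    return data


def accuracy_grid(data, test_sizes, ks, rng):
    """Return {(test_size, k): accuracy} using one random split per test size."""
    grid = {}
    for test_size in test_sizes:
        shuffled = data[:]
        rng.shuffle(shuffled)
        test, train = shuffled[:test_size], shuffled[test_size:]
        for k in ks:
            grid[(test_size, k)] = knn_accuracy(train, test, k)
    return grid


rng = random.Random(0)
data = make_synthetic(23, rng)          # 4 x 23 = 92 synthetic samples
test_sizes = [1, 17, 32, 47, 62, 77]    # test sizes listed in the table
ks = range(1, 11)                       # K = 1 .. 10
grid = accuracy_grid(data, test_sizes, ks, rng)

for test_size in test_sizes:
    row = " ".join(f"{grid[(test_size, k)]:.2f}" for k in ks)
    print(f"test={test_size:2d} train={len(data) - test_size:2d}  {row}")
```

The printed numbers depend on the synthetic data and the random seed, so they are not comparable to the accuracies reported in the table; only the layout (one row per test size, one column per K) mirrors it.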