
Table 2 The performance of various models for discriminating clustered strains from non-clustered strains in the lineage2 cohort

From: Association between two-component systems gene mutation and Mycobacterium tuberculosis transmission revealed by whole genome sequencing

Training set: n = 3595 (2081 clustered strains, 1514 non-clustered strains)

Test set: n = 1541 (918 clustered strains, 623 non-clustered strains)

| Parameters | Training set: Random Forest | Training set: Gradient Boosted Classification Tree | Test set: Random Forest | Test set: Gradient Boosted Classification Tree |
| --- | --- | --- | --- | --- |
| Kappa | 0.641 | 0.613 | 0.454 | 0.442 |
| AUC (95% CI) | 0.908 (0.899, 0.917) | 0.877 (0.866, 0.888) | 0.791 (0.771, 0.811) | 0.778 (0.757, 0.799) |
| Sensitivity (95% CI) | 0.873 (0.862, 0.884) | 0.836 (0.824, 0.848) | 0.786 (0.766, 0.806) | 0.807 (0.787, 0.827) |
| Specificity (95% CI) | 0.762 (0.748, 0.776) | 0.779 (0.765, 0.793) | 0.666 (0.642, 0.690) | 0.628 (0.604, 0.652) |
| PPV (95% CI) | 0.837 (0.825, 0.849) | 0.845 (0.833, 0.857) | 0.771 (0.750, 0.792) | 0.741 (0.719, 0.763) |
| NPV (95% CI) | 0.811 (0.798, 0.824) | 0.767 (0.753, 0.781) | 0.686 (0.663, 0.709) | 0.712 (0.689, 0.735) |
| PLR (95% CI) | 4.437 (4.415, 4.459) | 3.625 (3.597, 3.653) | 2.451 (2.402, 2.50) | 2.571 (2.528, 2.614) |
| NLR (95% CI) | 0.225 (0.15, 0.30) | 0.276 (0.198, 0.354) | 0.408 (0.313, 0.503) | 0.389 (0.301, 0.477) |
| Accuracy (95% CI) | 0.827 (0.815, 0.839) | 0.813 (0.8, 0.826) | 0.737 (0.715, 0.759) | 0.730 (0.708, 0.752) |

  1. AUC, area under the curve; PPV, positive predictive value; NPV, negative predictive value; PLR, positive likelihood ratio; NLR, negative likelihood ratio; CI, confidence