Table 3 SNV calling accuracy

From: Accurate single nucleotide variant detection in viral populations by combining probabilistic clustering with a statistical test of strand bias

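| Data set | Method | ShoRAH TP | ShoRAH FP | ShoRAH FN | ShoRAH Rc. | ShoRAH Pr. | VarScan TP | VarScan FP | VarScan FN | VarScan Rc. | VarScan Pr. | LoFreq TP | LoFreq FP | LoFreq FN | LoFreq Rc. | LoFreq Pr. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| HCV1 | Raw | 38 | 29 | 0 | 1.000 | 0.567 | 38 | 206 | 0 | 1.000 | 0.156 | 37 | 1 | 1 | 0.974 | 0.974 |
| | Fisher’s exact | 32 | 6 | 6 | 0.842 | 0.842 | 36 | 179 | 2 | 0.947 | 0.167 | 29 | 0 | 9 | 0.763 | 1.000 |
| | Bin. (σ = 0) | 37 | 7 | 1 | 0.974 | 0.841 | 36 | 108 | 2 | 0.947 | 0.250 | | | | | |
| | B-bin., σ = 0.0004 | 37 | 7 | 1 | 0.974 | 0.841 | 37 | 108 | 1 | 0.947 | 0.255 | | | | | |
| | B-bin., σ = 0.0014 | 37 | 8 | 1 | 0.974 | 0.822 | 37 | 108 | 1 | 0.947 | 0.255 | | | | | |
| | B-bin., σ = 0.0111 | 38 | 8 | 0 | 1.000 | 0.826 | 38 | 108 | 0 | 1.000 | 0.261 | | | | | |
| HCV2 | Raw | 38 | 50 | 0 | 1.000 | 0.432 | 38 | 826 | 0 | 1.000 | 0.044 | 37 | 1 | 1 | 0.974 | 0.974 |
| | Fisher’s exact | 33 | 9 | 5 | 0.868 | 0.785 | 36 | 766 | 2 | 0.947 | 0.045 | 29 | 0 | 9 | 0.763 | 1.000 |
| | Bin. (σ = 0) | 34 | 8 | 4 | 0.895 | 0.810 | 35 | 577 | 3 | 0.921 | 0.057 | | | | | |
| | B-bin., σ = 0.0004 | 36 | 8 | 2 | 0.947 | 0.818 | 36 | 577 | 2 | 0.947 | 0.059 | | | | | |
| | B-bin., σ = 0.0014 | 36 | 8 | 2 | 0.947 | 0.818 | 36 | 577 | 2 | 0.947 | 0.059 | | | | | |
| | B-bin., σ = 0.0111 | 38 | 13 | 0 | 1.000 | 0.745 | 38 | 589 | 0 | 1.000 | 0.061 | | | | | |
| HCV2 | Raw | 153 | 2 | 35 | 0.814 | 0.987 | 175 | 732 | 13 | 0.931 | 0.193 | 121 | 0 | 67 | 0.644 | 1.000 |
| | Fisher’s exact | 87 | 0 | 101 | 0.462 | 1.000 | 125 | 671 | 63 | 0.665 | 0.157 | 41 | 0 | 147 | 0.218 | 1.000 |
| | Bin. (σ = 0) | 88 | 0 | 100 | 0.468 | 1.000 | 113 | 473 | 75 | 0.601 | 0.193 | | | | | |
| | B-bin., σ = 0.0004 | 101 | 1 | 87 | 0.537 | 0.990 | 121 | 474 | 67 | 0.644 | 0.203 | | | | | |
| | B-bin., σ = 0.0014 | 126 | 1 | 62 | 0.670 | 0.992 | 135 | 475 | 53 | 0.718 | 0.221 | | | | | |
| | B-bin., σ = 0.0111 | 151 | 2 | 37 | 0.803 | 0.987 | 162 | 490 | 26 | 0.862 | 0.248 | | | | | |
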
  1. SNV calling statistics for ShoRAH, VarScan, and LoFreq. For ShoRAH and VarScan, SNV calls without the strand bias test (Raw) and using different values of the beta-binomial dispersion parameter σ in the strand bias test are given, with σ = 0 corresponding to a binomial forward read distribution. For LoFreq, the results of applying our strand bias test are absent as this software does not report forward and reverse strand counts. For all SNV calling methods, the results of applying Fisher’s exact test to the raw output are also given (this was possible for LoFreq as it reports a strand bias Fisher’s exact test p-value for each variant). Reported statistics include true positives (TP), i.e., genomic sites with a variant matching a known true variant, false positives (FP), i.e., genomic sites with a variant that is not a known true variant, and false negatives (FN), i.e., known variants which are not identified by the relevant SNV calling method. Individual genomic sites may contribute to both true positives and false positives. Recall (TP/(TP + FN)) and precision (TP/(TP + FP)) are also reported.
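
As a worked illustration of the recall and precision definitions in the note above, the following minimal Python sketch recomputes the two statistics from the TP, FP, and FN counts of a single table row. The function names `recall` and `precision` are hypothetical helpers introduced only for this illustration and are not part of ShoRAH, VarScan, or LoFreq; returning NaN for an empty denominator is likewise an assumption of this sketch.

```python
import math


def recall(tp: int, fn: int) -> float:
    """Recall = TP / (TP + FN); NaN if there are no known true variants."""
    denom = tp + fn
    return tp / denom if denom else math.nan


def precision(tp: int, fp: int) -> float:
    """Precision = TP / (TP + FP); NaN if no variants were called."""
    denom = tp + fp
    return tp / denom if denom else math.nan


# Counts taken from the HCV1 / Raw / ShoRAH row: TP = 38, FP = 29, FN = 0.
tp, fp, fn = 38, 29, 0
print(f"recall    = {recall(tp, fn):.3f}")     # recall    = 1.000
print(f"precision = {precision(tp, fp):.3f}")  # precision = 0.567
```

Rounded to three decimals, these values agree with the Rc. and Pr. entries reported for that row.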