Table 2 Impact of independent variables on variant call accuracy and variant call completeness

From: Impacts of low coverage depths and post-mortem DNA damage on variant calling: a simulation study

| Divergence | Dependent variable | %GC | Read length | Damage | Coverage depth |
|---|---|---|---|---|---|
| Low | % indels called correctly | -0.0194 | 0.0153 | 0.2150 | 0.4723log(covdepth)*** / -0.1687*** |
| Low | % SNPs called correctly | -0.0757 | 0.0828* | -5.1790*** | -0.2730log(covdepth) / 0.0405 |
| Low | % of total indels called | -0.0459 | -0.1057*** | -2.7754** | 3.9708*** |
| Low | % of total SNPs called | -0.0314 | -0.0521 | -1.6778* | 5.5177*** |
| High | % indels called correctly | 0.0092 | 0.0396*** | 0.2563* | 1.660log(covdepth)*** / -0.3292*** |
| High | % SNPs called correctly | -0.1495*** | 0.0249*** | -0.9437*** | -0.2730*** |
| High | % of total indels called | -0.0282 | -0.1159*** | -1.8524*** | 2.8565*** |
| High | % of total SNPs called | -0.0867** | -0.0616** | -1.0032* | 3.1498*** |

Slope coefficients from multiple-variable regression analysis. The dependent variables were the percent of indels/SNPs correctly called (variant call accuracy) and the percent of total indels/SNPs called (variant call completeness); GC content, read length, damage level and coverage depth were treated as independent variables. Values of 1, 2 and 3 were assigned to the no-, low- and high-damage levels, respectively. Values are not significant (Pr > 0.05) unless otherwise indicated; * signifies 0.01 < Pr ≤ 0.05; ** signifies 0.001 < Pr ≤ 0.01; *** signifies Pr ≤ 0.001. For each row, the independent variable with the strongest significant effect is highlighted in bold. Results are shown only for the best-fit linear models, except for "% indels called correctly" at low and high divergence and "% SNPs called correctly" at low divergence. For these cases, a logarithmic model in coverage depth gave a slight improvement in correlation, and slope coefficients for both models are shown as: log coefficient / linear coefficient.
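
To make the table note concrete, the following is a minimal sketch of how slope coefficients of this kind could be estimated by ordinary least squares, with damage coded 1, 2 and 3 and coverage depth entered either linearly or as a logarithm. It is not the authors' code: the function names (`fit_slopes`, `solve_linear_system`), the natural-log base, the inclusion of an intercept and the noise-free synthetic grid are all assumptions made for illustration, and none of the numbers come from the study.

```python
import math
from itertools import product

# Numeric coding of the damage factor, as described in the table note.
DAMAGE_CODES = {"none": 1, "low": 2, "high": 3}
PREDICTORS = ("intercept", "gc", "read_length", "damage", "coverage")


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build an augmented matrix so the inputs are left untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_slopes(rows, response, log_coverage=False):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    rows: sequence of (gc_percent, read_length, damage_label, coverage_depth).
    log_coverage: if True, coverage enters as log(depth); the natural log
    is an assumption, since the source does not state the base.
    """
    design = []
    for gc, read_length, damage, depth in rows:
        cov = math.log(depth) if log_coverage else float(depth)
        design.append([1.0, float(gc), float(read_length),
                       float(DAMAGE_CODES[damage]), cov])
    k = len(PREDICTORS)
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, response)) for i in range(k)]
    return dict(zip(PREDICTORS, solve_linear_system(xtx, xty)))


# Hypothetical, noise-free synthetic grid (not data from the study).
rows = list(product([35, 50, 65], [30, 50, 70],
                    ["none", "low", "high"], [1, 2, 5, 10]))
response = [60 + 0.04 * gc - 0.1 * rl - 2.0 * DAMAGE_CODES[d]
            + 3.0 * math.log(depth)
            for gc, rl, d, depth in rows]

slopes = fit_slopes(rows, response, log_coverage=True)
for name in PREDICTORS:
    print(f"{name}: {slopes[name]:.4f}")
```

Under this coding, the damage slope is the change in the dependent variable per one-step increase in damage level (none to low, or low to high), so it assumes the two steps have equal effect. Calling `fit_slopes` twice on the same data, once with `log_coverage=True` and once with `log_coverage=False`, gives the two coverage coefficients that the table reports as log coefficient / linear coefficient.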