Table 2. Model selection among candidate models predicting the frequencies of SNPs, transitions, and transversions in contigs

From: The effects of contig length and depth on the estimation of SNP frequencies, and the relative abundance of SNPs in protein-coding and non-coding transcripts of tiger salamanders (Ambystoma tigrinum)

| Model | Parametersᵃ | AIC | ΔAICᵇ | wᵢᶜ |
|---|---|---|---|---|
| *For estimating the number of SNPs* | | | | |
| M1 (best, full model) | Intercept**, LENGTH*, DEPTH**, C/NC† | 7283.4 | 0.0 | 0.904 |
| M2 | Intercept**, DEPTH**, C/NC | 7287.9 | 4.5 | 0.096 |
| M3 | Intercept**, LENGTH**, C/NC | 7540.7 | 257.3 | 1.2 × 10⁻⁵⁶ |
| M4 | Intercept**, C/NC** | 7748.3 | 464.9 | 1.0 × 10⁻¹⁰¹ |
| *For estimating the number of transitions (Ti)* | | | | |
| M1 (best, full model) | Intercept**, LENGTH†, DEPTH**, C/NC | 5682.4 | 0.0 | 0.681 |
| M2 | Intercept**, DEPTH**, C/NC | 5683.9 | 1.5 | 0.319 |
| M3 | Intercept**, LENGTH**, C/NC† | 5896.1 | 213.7 | 2.7 × 10⁻⁴⁷ |
| M4 | Intercept**, C/NC** | 6061.0 | 378.5 | 6.3 × 10⁻⁸³ |
| *For estimating the number of transversions (Tv)* | | | | |
| M1 (best, full model) | Intercept**, LENGTH**, DEPTH**, C/NC* | 4077.1 | 0.0 | 0.991 |
| M2 | Intercept**, DEPTH**, C/NC† | 4086.6 | 9.5 | 0.009 |
| M3 | Intercept**, LENGTH**, C/NC | 4183.8 | 106.7 | 6.7 × 10⁻²⁴ |
| M4 | Intercept**, C/NC* | 4310.4 | 233.3 | 2.2 × 10⁻⁵¹ |

  1. Model selection among candidate negative binomial regression models predicting the frequencies of SNPs, transitions, and transversions in contigs (depth of 10 reads or more; length of 501 bp or longer), using Akaike's Information Criterion (AIC).
  2. ᵃ LENGTH and DEPTH are the length and depth of contigs, respectively. C/NC is a dummy variable for the type of transcript (protein-coding = 1; non-coding = 0). Significance of each variable in each model: **, P < 0.01; *, P < 0.05; †, P < 0.1.
  3. ᵇ ΔAIC is the difference between the AIC of each model and that of the best-fitting model.
  4. ᶜ wᵢ is the Akaike weight of each model.
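Footnotes b and c follow the standard definitions ΔAICᵢ = AICᵢ − AIC_min and wᵢ = exp(−ΔAICᵢ/2) / Σⱼ exp(−ΔAICⱼ/2), where the sum runs over the candidate models for one response. The sketch below recomputes the ΔAIC and wᵢ columns from the AIC values in Table 2. It is not the authors' code, and the names `TABLE_2_AIC` and `akaike_weights` are hypothetical. The table reports AIC rounded to one decimal, so the recomputed values agree with the published ones only approximately. For example, w₁ for the SNP model comes out near 0.905 rather than 0.904, w₁ for Ti near 0.679 rather than 0.681, and ΔAIC for Ti model M4 as 378.6 rather than 378.5.

```python
import math

# Hypothetical container: AIC values transcribed from Table 2, models M1..M4
# in order, for each response variable.
TABLE_2_AIC = {
    "SNPs": [7283.4, 7287.9, 7540.7, 7748.3],
    "Ti": [5682.4, 5683.9, 5896.1, 6061.0],
    "Tv": [4077.1, 4086.6, 4183.8, 4310.4],
}


def akaike_weights(aics):
    """Return (deltas, weights) for a list of AIC values.

    delta_i = AIC_i - min(AIC)
    w_i     = exp(-delta_i / 2) / sum_j exp(-delta_j / 2)
    """
    best = min(aics)
    deltas = [a - best for a in aics]
    # Relative likelihood of each model. Using deltas rather than raw AICs
    # keeps exp() from underflowing to zero (exp(-7283.4 / 2) would).
    rel_lik = [math.exp(-d / 2.0) for d in deltas]
    total = sum(rel_lik)
    return deltas, [r / total for r in rel_lik]


if __name__ == "__main__":
    for response, aics in TABLE_2_AIC.items():
        deltas, weights = akaike_weights(aics)
        for i, (d, w) in enumerate(zip(deltas, weights), start=1):
            print(f"{response} M{i}: dAIC = {d:.1f}, w = {w:.3g}")
```

Because the weights are normalized within each response, they sum to 1 across M1 to M4. This is why M3 and M4, with ΔAIC values above 100, receive weights that are effectively zero.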