Table 1 The selected dissimilarity measures used to calculate genomic distance among tomato accessions from SNPsᵃ

From: Dissimilarity based Partial Least Squares (DPLS) for genomic prediction from SNPs

Distance

Equation

R package

References

Euclidean

\( d_{i_1 i_2} = \sqrt{\sum_{k=1}^{K} \left( x_{i_1 k} - x_{i_2 k} \right)^{2}} \)

gstudio

[65]

Gower

\( d_{i_1 i_2} = \frac{\sum_{k=1}^{K} \delta_{i_1 i_2 k}\, d_{i_1 i_2 k}}{\sum_{k=1}^{K} \delta_{i_1 i_2 k}} \), where, for nominal or factor variables, \( d_{i_1 i_2 k} = 0 \) if \( x_{i_1 k} = x_{i_2 k} \) and \( d_{i_1 i_2 k} = 1 \) if \( x_{i_1 k} \neq x_{i_2 k} \)

cluster (daisy function)

[66]

Allele share

\( D_{i_1 i_2} = \frac{1}{K} \sum_{k=1}^{K} d_{i_1 i_2}(k) \), where \( d_{i_1 i_2}(k) = \begin{cases} 0, & \text{if individuals } i_1 \text{ and } i_2 \text{ have two alleles in common at the } k\text{th locus} \\ 1, & \text{if individuals } i_1 \text{ and } i_2 \text{ have only a single allele in common at the } k\text{th locus} \\ 2, & \text{if individuals } i_1 \text{ and } i_2 \text{ have no alleles in common at the } k\text{th locus} \end{cases} \)

Custom R script

[67]

Nei

\( d_{\mathrm{nei}} = -\ln\left[ \frac{(2N-1) \sum_{i=1}^{L} \sum_{j=1}^{l} p_{ij,x}\, p_{ij,y}}{\sqrt{\sum_{i=1}^{L} \left( 2N \sum_{j=1}^{l} p_{ij,x}^{2} - 1 \right) \sum_{i=1}^{L} \left( 2N \sum_{j=1}^{l} p_{ij,y}^{2} - 1 \right)}} \right] \), where the sum to L is across loci and the sum to l is across alleles at each locus in populations x and y (here, individuals)

gstudio

[68]

Bray

\( d_{i_1 i_2} = \frac{\sum_{k=1}^{K} \left| x_{i_1 k} - x_{i_2 k} \right|}{\sum_{k=1}^{K} \left( x_{i_1 k} + x_{i_2 k} \right)} \)

vegan

[69]

Jaccard

\( d_{i_1 i_2} = \frac{2B}{1 + B} \)

vegan

[70]

Kulczynski

\( d_{i_1 i_2} = 1 - 0.5 \left[ \frac{\sum_{k=1}^{K} \min\left( x_{i_1 k}, x_{i_2 k} \right)}{\sum_{k=1}^{K} x_{i_1 k}} + \frac{\sum_{k=1}^{K} \min\left( x_{i_1 k}, x_{i_2 k} \right)}{\sum_{k=1}^{K} x_{i_2 k}} \right] \)

vegan

[70]

GRM

\( \mathbf{G} = \frac{\mathbf{Z}\mathbf{Z}'}{2 \sum_{k} p_k \left( 1 - p_k \right)} \)

Custom R script

[22]

  1. \( x_{i_1 k} \) and \( x_{i_2 k} \) = SNPs at locus k for accessions \( i_1 \) and \( i_2 \), respectively
  2. \( d_{i_1 i_2 k} \) = distance between samples \( i_1 \) and \( i_2 \) for SNPs at locus k
  3. B = Bray-Curtis dissimilarity
  4. G = genomic relationship matrix
  5. Z = genotype information for all tomato accessions
  6. \( p_k \) = frequency of the allele at locus k
  7. ᵃ \( d_{i_1 i_2} \) = distance between tomato accessions \( i_1 \) and \( i_2 \)
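
To make the formulas in the table concrete, the following is a minimal Python sketch of several of the measures. It is not the authors' implementation, which relied on the R packages and custom R scripts listed above. It rests on assumptions the table does not state: SNPs are coded as allele counts (0, 1 or 2) at biallelic loci, so the allele-share score at a locus reduces to the absolute difference of the two counts; the Gower weight δ is taken as 1 when both genotypes are observed and 0 otherwise; and Z for the GRM is the count matrix centred by twice the allele frequency. All function names are hypothetical.

```python
import math

def euclidean(x, y):
    """Square root of the summed squared differences across loci."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def bray_curtis(x, y):
    """Sum of absolute differences over the sum of all counts (undefined if all counts are 0)."""
    return sum(abs(a - b) for a, b in zip(x, y)) / sum(a + b for a, b in zip(x, y))

def jaccard(x, y):
    """Jaccard dissimilarity obtained from Bray-Curtis B as 2B / (1 + B)."""
    b = bray_curtis(x, y)
    return 2 * b / (1 + b)

def kulczynski(x, y):
    """One minus the mean of the shared counts relative to each accession's total."""
    shared = sum(min(a, b) for a, b in zip(x, y))
    return 1 - 0.5 * (shared / sum(x) + shared / sum(y))

def gower_nominal(x, y):
    """Gower distance with genotypes treated as nominal; None marks a missing call (assumption)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    return sum(a != b for a, b in pairs) / len(pairs)

def allele_share(x, y):
    """Mean per-locus score of 0, 1 or 2; equals |a - b| under 0/1/2 count coding (assumption)."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)

def grm(genotypes):
    """G = ZZ' / (2 * sum_k p_k (1 - p_k)), with Z = counts - 2 * p_k (centring is an assumption)."""
    n, n_loci = len(genotypes), len(genotypes[0])
    p = [sum(row[k] for row in genotypes) / (2 * n) for k in range(n_loci)]
    z = [[row[k] - 2 * p[k] for k in range(n_loci)] for row in genotypes]
    denom = 2 * sum(pk * (1 - pk) for pk in p)
    return [[sum(a * b for a, b in zip(zi, zj)) / denom for zj in z] for zi in z]

# Three made-up accessions genotyped at four loci (illustrative data only).
acc_a = [0, 1, 2, 1]
acc_b = [2, 1, 0, 1]
acc_c = [0, 1, 2, 2]

if __name__ == "__main__":
    for name, fn in [("Euclidean", euclidean), ("Bray", bray_curtis),
                     ("Jaccard", jaccard), ("Kulczynski", kulczynski),
                     ("Gower", gower_nominal), ("Allele share", allele_share)]:
        print(name, round(fn(acc_a, acc_b), 4))
    for row in grm([acc_a, acc_b, acc_c]):
        print([round(v, 4) for v in row])
```

Nei's distance is left out because it needs per-locus allele frequencies rather than a single count per locus. Note that G is a relationship (similarity) matrix, so larger values indicate closer accessions, the opposite of the distance measures.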