Table 1 Possible encodings of SNP sites

From: Haplotype inference from unphased SNP data in heterozygous polyploids based on SAT

| Number of alleles at given site | Possible encodings: tetraallelic SNP site | Possible encodings: triallelic SNP site | Possible encodings: biallelic SNP site |
| --- | --- | --- | --- |
| Homozygous individual | (0, 0, 0, 0), (1, 1, 1, 1), (2, 2, 2, 2), (3, 3, 3, 3) | (0, 0, 0, 0), (1, 1, 1, 1), (2, 2, 2, 2) | (0, 0, 0, 0), (1, 1, 1, 1) |
| Biallelic individual | (0, 1, 1, 1), (0, 0, 1, 1), (0, 0, 0, 1), (0, 2, 2, 2), (0, 0, 2, 2), (0, 0, 0, 2), (0, 3, 3, 3), (0, 0, 3, 3), (0, 0, 0, 3), (1, 2, 2, 2), (1, 1, 2, 2), (1, 1, 1, 2), (1, 3, 3, 3), (1, 1, 3, 3), (1, 1, 1, 3), (2, 3, 3, 3), (2, 2, 3, 3), (2, 2, 2, 3) | (0, 1, 1, 1), (0, 0, 1, 1), (0, 0, 0, 1), (0, 2, 2, 2), (0, 0, 2, 2), (0, 0, 0, 2), (1, 2, 2, 2), (1, 1, 2, 2), (1, 1, 1, 2) | (0, 1, 1, 1), (0, 0, 1, 1), (0, 0, 0, 1) |
| Triallelic individual | (0, 1, 2, 2), (0, 1, 1, 2), (0, 0, 1, 2), (0, 1, 3, 3), (0, 1, 1, 3), (0, 0, 1, 3), (0, 2, 3, 3), (0, 2, 2, 3), (0, 0, 2, 3), (1, 2, 3, 3), (1, 2, 2, 3), (1, 1, 2, 3) | (0, 1, 2, 2), (0, 1, 1, 2), (0, 0, 1, 2) | |
| Tetraallelic individual | (0, 1, 2, 3) | | |

  1. This table lists all possible allele compositions for SNP sites in a tetraploid species. At tetraallelic SNP sites, homozygous, biallelic, triallelic and tetraallelic individuals are possible. At triallelic SNP sites, in contrast, only homozygous, biallelic and triallelic individuals are possible, and encodings that contain a fourth allele no longer occur. The number of possible allele compositions at biallelic SNP sites decreases analogously.
  2. Note that the presented encodings hold only for tetraploid species. In general, the number of possible encodings increases exponentially with increasing ploidy. The increase is driven by all possible partitions of the ploidy using four summands for a tetraallelic individual, three summands for a triallelic individual and two summands for a biallelic individual. For instance, there are 2 different partitions for a biallelic site of a tetraploid individual: 1 + 3 = 4 and 2 + 2 = 4. There are 2! = 2 different encodings for the first partition and $\frac{2!}{2!} = 1$ encoding for the second partition. This makes 3 different encodings in total if the SNP site is biallelic (second row, third column). If the SNP site is triallelic, this number is multiplied by the number of ways to choose two alleles from three possible alleles: $3 \cdot \binom{3}{2} = 9$ (second row, second column). If the SNP site is tetraallelic, it is multiplied by the number of ways to choose two alleles from four possible alleles: $3 \cdot \binom{4}{2} = 18$ (second row, first column).
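
The counts in the footnotes can be checked by brute-force enumeration. The following is a minimal sketch, not the method used in the paper: it assumes each encoding is an unordered allele composition written as a sorted tuple, as in the table, and the function name `enumerate_encodings` and its parameters are hypothetical names chosen for illustration.

```python
from collections import defaultdict
from itertools import combinations_with_replacement


def enumerate_encodings(ploidy, site_alleles):
    """Group all sorted allele compositions by their number of distinct alleles.

    ploidy       -- number of allele copies per individual (4 for a tetraploid)
    site_alleles -- number of alleles present at the SNP site (2, 3 or 4 in Table 1)
    Both parameter names are illustrative assumptions, not taken from the paper.
    """
    groups = defaultdict(list)
    # combinations_with_replacement yields each multiset exactly once, as a sorted tuple
    for composition in combinations_with_replacement(range(site_alleles), ploidy):
        groups[len(set(composition))].append(composition)
    return dict(groups)


if __name__ == "__main__":
    # One line per table column: tetraallelic, triallelic, biallelic SNP site
    for site_alleles in (4, 3, 2):
        groups = enumerate_encodings(4, site_alleles)
        counts = {k: len(v) for k, v in sorted(groups.items())}
        print(site_alleles, counts)
    # 4 {1: 4, 2: 18, 3: 12, 4: 1}
    # 3 {1: 3, 2: 9, 3: 3}
    # 2 {1: 2, 2: 3}
```

The keys 1 to 4 correspond to the homozygous, biallelic, triallelic and tetraallelic rows of the table. Passing a different `ploidy` gives the corresponding counts for other ploidy levels under the same assumption.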