Fig. 1 | BMC Genomics

Fig. 1

From: A method for positive forensic identification of samples from extremely low-coverage sequence data

Overview of the method. a Sequence data from two extremely low-coverage shotgun libraries are mapped to a reference sequence. Reads that overlap known single nucleotide polymorphism (SNP) sites are used to observe a single base at each such position. The majority of SNP positions have no observations. Pairs of closely linked SNPs with observed bases are identified. b For each SNP pair, we calculate the probability of the observations under two models. The first model represents the case where the two reads originated from a single diploid individual (top). In this model there is an equal chance that the two reads were drawn from the same chromosome or from different chromosomes, so the model takes the haplotype frequency as well as the allele frequencies into account. The second model represents the case where the two reads originated from different individuals, and therefore from different chromosomes (bottom). Under this model the probability of observing the two bases is the product of the allele frequencies. A reference panel of phased haplotypes is used to model the allele and haplotype frequencies for the population from which the samples were drawn. These probabilities are compared as a log-likelihood ratio. c Comparisons are made for any pair of SNPs occurring within a specified distance along a chromosome. d Log-likelihood ratios are aggregated by sampling pairs from windows across the genome to avoid confounding effects from linkage. This sampling step is repeated in a bootstrapping approach to build an empirical distribution of the genome-wide log-likelihood ratio. Positive values indicate that the single, diploid individual model is favored, while negative values indicate that the two samples are from independent individuals
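The two models in panel b and the aggregation in panel d can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes the reference panel is a list of phased haplotype strings indexed by SNP, that frequencies are plain panel counts with no smoothing or sequencing-error term, and that each bootstrap replicate draws exactly one SNP pair per window. The names `pair_llr` and `bootstrap_llr` are hypothetical.

```python
import math
import random


def pair_llr(panel, i, j, base_i, base_j):
    """Log-likelihood ratio for one SNP pair (positive favors one individual).

    panel: list of phased haplotypes, each a sequence of bases indexed by SNP
           (assumed layout, for illustration only).
    """
    n = len(panel)
    p_i = sum(h[i] == base_i for h in panel) / n          # allele frequency at SNP i
    p_j = sum(h[j] == base_j for h in panel) / n          # allele frequency at SNP j
    p_hap = sum(h[i] == base_i and h[j] == base_j for h in panel) / n  # haplotype frequency

    p_diff = p_i * p_j                # reads from different chromosomes: product of allele frequencies
    if p_diff == 0:
        return None                   # allele absent from the panel; the pair is uninformative here
    # Single diploid individual: same chromosome or different chromosomes, equally likely.
    p_same = 0.5 * p_hap + 0.5 * p_diff
    return math.log(p_same / p_diff)


def bootstrap_llr(window_llrs, n_reps=1000, seed=0):
    """Empirical distribution of the genome-wide log-likelihood ratio.

    window_llrs: dict mapping a window id to the list of pair LLRs in that window.
    Each replicate samples one pair per window (an assumption) and sums the values.
    """
    rng = random.Random(seed)
    windows = [llrs for llrs in window_llrs.values() if llrs]
    return [sum(rng.choice(llrs) for llrs in windows) for _ in range(n_reps)]


# Toy panel of four haplotypes over two linked SNPs.
toy_panel = ["AC", "AC", "GT", "GT"]
concordant = pair_llr(toy_panel, 0, 1, "A", "C")   # bases seen together in the panel: positive
discordant = pair_llr(toy_panel, 0, 1, "A", "T")   # bases never seen together: negative
```

In the toy panel the haplotype A-C has frequency 0.5 while the product of allele frequencies is 0.25, so observing A and C gives a ratio of 1.5 and a positive log-likelihood ratio. Observing A and T, a combination absent from the panel, gives a ratio of 0.5 and a negative value.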
