Figure 1 | BMC Genomics

Figure 1

From: iASeq: integrative analysis of allele-specificity of protein-DNA interactions in multiple ChIP-seq datasets

The iASeq model. (a) An example of the data structure. Each row represents a SNP and each column corresponds to either the reference allele (R) or the non-reference allele (N) read counts from a ChIP-seq sample in a dataset. A dataset could be a TF ChIP-seq experiment or an HM ChIP-seq experiment, and can have multiple replicate samples (Rep). iASeq assumes the following data generating process. (b) First, SNPs belong to K + 1 classes with different ASB patterns. For each SNP, a class label a_i is randomly assigned according to a class abundance probability vector Π. Given the class label, a configuration [b_id, c_id] is generated for each SNP in each dataset according to the probabilistic allele-specificity patterns specified by two vectors V_k and W_k. In the figure, the darkness of each cell in V and W represents the probability for b_id or c_id to be 1. (c) Next, a skewing probability p_idj is generated for each SNP i, dataset d and replicate sample j based on [b_id, c_id]. The distribution of p_idj for NS SNPs in each sample follows a Beta distribution (blue lines). The p_idj values for SR SNPs are uniformly distributed in the interval [p_dj0, 1], where p_dj0 is the mean of the background Beta distribution (dark blue lines). The p_idj values for SN SNPs are uniformly distributed in the interval [0, p_dj0] (light blue lines). (d) Finally, given the configuration [b_id, c_id], skewing probability p_idj and a total read count n_idj for SNP i, dataset d and sample j, the read count for each allele is generated according to a binomial distribution. The length of the orange bar represents the non-reference allele read count, and the length of the red bar represents the reference allele read count.
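
The data generating process in panels (b) to (d) can be sketched as a small simulation. This is an illustrative sketch, not the iASeq implementation. The function name `simulate_snp` and all parameter values are hypothetical. It also rests on three assumptions the caption does not state outright: b = 0 marks a non-skewed (NS) SNP, b = 1 with c = 1 marks a SNP skewed toward the reference allele (SR), b = 1 with c = 0 marks a SNP skewed toward the non-reference allele (SN), and the skewing probability is the probability that a read carries the reference allele.

```python
import random


def simulate_snp(pi, V, W, beta_params, totals, rng):
    """Simulate allele read counts for one SNP (hypothetical sketch).

    pi          : class abundance probabilities, length K + 1
    V[k][d]     : probability that b = 1 in dataset d for class k
    W[k][d]     : probability that c = 1 in dataset d for class k
    beta_params : beta_params[d][j] = (alpha, beta) of the background Beta
    totals      : totals[d][j] = total read count n for dataset d, replicate j
    """
    # (b) Draw the class label, then the configuration [b, c] per dataset.
    k = rng.choices(range(len(pi)), weights=pi)[0]
    counts = []
    for d in range(len(totals)):
        b = 1 if rng.random() < V[k][d] else 0
        c = 1 if rng.random() < W[k][d] else 0
        row = []
        for j, n in enumerate(totals[d]):
            alpha, beta = beta_params[d][j]
            p0 = alpha / (alpha + beta)  # mean of the background Beta
            # (c) Draw the skewing probability for this replicate.
            if b == 0:                   # assumed NS: background Beta
                p = rng.betavariate(alpha, beta)
            elif c == 1:                 # assumed SR: uniform on [p0, 1]
                p = rng.uniform(p0, 1.0)
            else:                        # assumed SN: uniform on [0, p0]
                p = rng.uniform(0.0, p0)
            # (d) Binomial draw of reference reads as n Bernoulli trials.
            ref = sum(rng.random() < p for _ in range(n))
            row.append((ref, n - ref))   # (reference, non-reference)
        counts.append(row)
    return k, counts


# Two classes, two datasets; dataset 1 has two replicates. Values are made up.
rng = random.Random(1)
label, counts = simulate_snp(
    pi=[0.8, 0.2],
    V=[[0.0, 0.0], [0.9, 0.9]],
    W=[[0.5, 0.5], [0.9, 0.1]],
    beta_params=[[(20, 20)], [(20, 20), (15, 15)]],
    totals=[[30], [25, 40]],
    rng=rng,
)
print(label, counts)
```

Each entry of `counts[d][j]` is a (reference, non-reference) pair that sums to the total read count supplied for that dataset and replicate, which mirrors the R and N columns in panel (a).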
