Figure 2 | BMC Genomics

Figure 2

From: Statistics of protein-DNA binding and the total number of binding sites for a transcription factor in the mammalian genome


Observed and predicted statistics of TF-DNA BEs. A: Fitting and back-extrapolation analysis for the complete dataset. The decomposition of mixture model (1) for Nanog TF-DNA BEs is based on curve-fitting analysis of the model. Closed circles: number of loci with ChIP-seq extended DNA cluster overlaps of 1 to 8 BEs. Open circles: number of loci with ChIP-seq extended DNA cluster overlaps of 9 to 73 (inclusive) BEs. The noise-like data (closed circles) are fitted well by an exponential function with exponent parameter s = 1.05 ± 0.055 (p < 0.0001, t-test). The reliable set of TFBSs (at >8 BEs) is fitted equally well by the left-truncated GDP function (k = 1.81 ± 0.15 (p < 0.001, t-test) and b = 8.00 ± 1.335 (p < 0.001, t-test)) and by the K-W function (θ = 0.999, a = 6.618, b = 8.29; Table 3). The extrapolation curve predicts the number of Nanog TFBSs in the noise-enriched binding-site fraction of the empirical distribution. B: Nanog TF-DNA BEs, C: Esrrb TF-DNA BEs and D: c-Myc TF-DNA BEs. B, C and D: the K-W model fitted to the observed data and to the data extrapolated from the double-truncated GDP function, used to calculate p0. Vertical dotted lines mark the qPCR-defined threshold and the threshold defined by the best-fit double-truncated GDP function. Triangles show observed numbers of TFBSs that are over-represented compared with the best-fit GDP function. N0, N1 and N2 are the numbers of non-detected, potentially detected and highly specific (reliable) TFBSs, respectively. More detailed information about the parameter values of the GDP and K-W models is provided in Additional Files 3, 4 and 5.
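
As a rough illustration of the exponential fit described for the noise-like fraction in panel A, the sketch below estimates an exponent parameter by ordinary least squares on log-transformed counts. It makes several assumptions that are not stated in the caption: the functional form y = A·exp(−s·x), the use of a log-linear fit rather than whatever curve-fitting procedure the authors applied, the helper name `fit_exponential`, and the counts themselves, which are synthetic values generated only to show the mechanics and are not the Nanog data.

```python
import math


def fit_exponential(xs, ys):
    """Fit y = A * exp(-s * x) by least squares on ln(y); return (A, s).

    Hypothetical helper for illustration; assumes all ys are positive.
    """
    logs = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_log = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxl = sum((x - mean_x) * (l - mean_log) for x, l in zip(xs, logs))
    slope = sxl / sxx                      # slope of ln(y) against x
    intercept = mean_log - slope * mean_x  # ln(A)
    return math.exp(intercept), -slope


# Synthetic counts for overlaps of 1 to 8 BEs (assumed values, not real data).
overlaps = list(range(1, 9))
counts = [1.0e5 * math.exp(-1.05 * k) for k in overlaps]

amplitude, s = fit_exponential(overlaps, counts)
print(f"A = {amplitude:.1f}, s = {s:.3f}")
```

Because the synthetic counts are noise-free, the fit recovers the generating parameters; with real count data a weighted or nonlinear fit may be more appropriate, since the log transform changes how errors are weighted.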
