Fig. 2 | BMC Genomics

Fig. 2

From: Transcription factor-binding k-mer analysis clarifies the cell type dependency of binding specificities and cis-regulatory SNPs in humans

MOCCS profiles reflected TF- or TF-family-dependent DNA-binding specificities. A Overview of the ChIP-seq data processing. MOCCS2 was applied to human ChIP-seq samples from ChIP-Atlas, resulting in MOCCS profiles (k-mer-based TF-binding specificity profiles). Quality control metrics for ChIP-seq samples were calculated to filter samples (hard filter). B Number of ChIP-seq samples that passed the hard filter. The colors indicate the cell type class (left) or TF (right). C Example of a MOCCS profile (GATA3, MDA-MB231). The k-mer with the highest MOCCS2score (AGATAA) was similar to the GATA3 PWM (HOCOMOCO database). D Detection performance (AUROC) of canonical motifs (top 10% PWM-supported k-mers) using the MOCCS2score for the original (red) and shuffled (gray) data of CTCF, SPI1, and FOXA1. *q < 0.05 (Wilcoxon signed-rank test). E Top: Detection performance (AUROC) of significant k-mers of MOCCS2 using the top 10% PWM-supported k-mers: original (red) and shuffled (gray) data from CTCF, SPI1, and FOXA1. *q < 0.05 (Wilcoxon signed-rank test). Bottom: Bar plot displaying -log10(q-value) from the Wilcoxon signed-rank test for 20 TFs. F Heatmap of TF-dependent binding k-mer similarity (k-sim Jaccard) between the ChIP-seq samples. The color labels of rows and columns represent the TFs. G Violin plots of k-mer similarity indices, k-sim Pearson (green) and Jaccard (red), and the peak overlap index (blue) for different groups of ChIP-seq pairs. H UMAP visualization of MOCCS profiles. Point colors represent ChIP-seq samples of the 15 TFs (left) or TF families (right) with the largest sample sizes; all other samples are shown in gray. I Ratios of neighboring pairs of the same TF (left) or TF family (right) for original and permuted data. *p < 6.26e-249 (permutation test; see Methods). J Star graphs displaying the TF similarity patterns between the query TF (center) and the top 10 TFs with the highest k-sim Pearson (edge colors). Circles indicate TFs belonging to the same TF family as the query TF. Available PWMs (HOCOMOCO database) are shown.
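
Panels F and G compare ChIP-seq samples through two k-mer similarity indices. The caption does not define them, so the following is only a minimal sketch under stated assumptions: that k-sim Jaccard is the Jaccard index between the sets of significant k-mers of two samples, and that k-sim Pearson is the Pearson correlation of MOCCS2scores over the k-mers the two samples share. The function names and the toy inputs are hypothetical and do not come from MOCCS2.

```python
from math import sqrt


def ksim_jaccard(sig_a, sig_b):
    """Jaccard index between two collections of significant k-mers.

    Assumption: this mirrors "k-sim Jaccard"; the caption gives no formula.
    """
    a, b = set(sig_a), set(sig_b)
    union = a | b
    # Two empty sets are treated as having no similarity.
    return len(a & b) / len(union) if union else 0.0


def ksim_pearson(scores_a, scores_b):
    """Pearson correlation of per-k-mer scores over k-mers present in both.

    Assumption: this mirrors "k-sim Pearson"; inputs map k-mer -> score.
    """
    shared = sorted(set(scores_a) & set(scores_b))
    n = len(shared)
    if n < 2:
        raise ValueError("need at least two shared k-mers")
    xs = [scores_a[k] for k in shared]
    ys = [scores_b[k] for k in shared]
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores are constant; correlation is undefined")
    return cov / sqrt(var_x * var_y)


# Toy data (made up for illustration, not taken from the figure).
sample_1 = {"AGATAA", "GATAAG", "TGATAA"}
sample_2 = {"AGATAA", "GATAAG", "CCCTCC"}
print(ksim_jaccard(sample_1, sample_2))
```

In this toy case the two samples share two of four distinct k-mers. A set-based index like this ignores score magnitudes, whereas a correlation-based index uses them, which may be why the figure reports both.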
