Table 2 Distribution of transcription factor binding sites across mosaic classes

From: Most transcription factor binding sites are in a few mosaic classes of the human genome

| Data Source | Factor | P | Sites | Pair 2 | Pair 7 | Pair 9 | Pair 14 | Total |
|---|---|---|---|---|---|---|---|---|
| Note 1 | Note 2 | Note 3 | Note 4 | Note 5 | Note 5 | Note 5 | Note 5 | Note 6 |
| HAIB-K562 | GABP | 0.71 | 2557 | 0.054 | 0.035 | 0.113 | 0.774 | 0.976 |
| HAIB-K562 | NRSF | 0.74 | 2006 | 0.236 | 0.231 | 0.254 | 0.142 | 0.862 |
| HAIB-K562 | SRF | 0.64 | 367 | 0.370 | 0.083 | 0.111 | 0.229 | 0.794 |
| YALE-GM128 | NFKB | 0.50 | 2653 | 0.322 | 0.156 | 0.139 | 0.069 | 0.686 |
| YALE-HCT116 | TCF7L2 | 0.50 | 3386 | 0.281 | 0.111 | 0.060 | 0.030 | 0.483 |
| YALE-HepG2 | SREBP1 | 0.50 | 4958 | 0.237 | 0.092 | 0.137 | 0.276 | 0.742 |
| YALE-K562b | GATA1 | 0.52 | 3367 | 0.322 | 0.221 | 0.146 | 0.048 | 0.736 |
| YALE-K562b | TR4 | 0.51 | 541 | 0.144 | 0.083 | 0.216 | 0.426 | 0.870 |
| YALE-K562b | ZNF263 | 0.64 | 5098 | 0.049 | 0.194 | 0.466 | 0.147 | 0.856 |
| YALE-K562 | cFos | 0.53 | 3746 | 0.287 | 0.186 | 0.111 | 0.018 | 0.603 |
| YALE-K562 | Max | 0.60 | 3176 | 0.210 | 0.100 | 0.180 | 0.185 | 0.675 |
| YALE-K562 | NF-E2 | 0.81 | 4700 | 0.273 | 0.149 | 0.088 | 0.026 | 0.536 |
| YALE-NT2D1 | YY1 | 0.50 | 2967 | 0.252 | 0.135 | 0.157 | 0.333 | 0.876 |
| YALE-K562-Ia30 | STAT1 | 0.50 | 1039 | 0.398 | 0.104 | 0.077 | 0.059 | 0.638 |
| ORegAnno | CTCF | 1.00 | 4858 | 0.202 | 0.181 | 0.353 | 0.169 | 0.905 |
| TRANSFAC | sp1 | 0.62 | 693 | 0.045 | 0.075 | 0.332 | 0.512 | 0.966 |
| TRANSFAC | p53 | 0.91 | 608 | 0.266 | 0.203 | 0.081 | 0.021 | 0.571 |
| Average of above | | | | 0.232 | 0.138 | 0.178 | 0.204 | 0.751 |
| Genome proportions | | | | 0.134 | 0.071 | 0.042 | 0.009 | 0.256 |
| Model proportions | | | | 0.134 | 0.069 | 0.042 | 0.009 | 0.255 |
| Promoter region | | | | 0.170 | 0.069 | 0.128 | 0.233 | 0.600 |
1) For ENCODE data, this column gives the track, the cell line and, where applicable, a note on the experimental protocol. For the HAIB data, replicate 1 was used.
2) Name of the factor.
3) P, the proportion of sequences in which MAST found a binding site.
4) The number of sites found by MAST; the later columns show the proportion of these sites in the specified class pair.
5) The values quoted are the average probabilities of the site being in the class(es); they are not maximum likelihood estimates.
6) Total of the four preceding class-pair columns.
7) Three lines of comparative figures are given. "Genome proportions" gives the result of applying the analysis to 20,000 bases chosen at random from the genome; "model proportions" gives the long-term average of the HMM; "promoter region" gives the proportions found by applying the model to the bases within 1000 bases upstream of the transcription start site of all coding genes. The close agreement between "genome proportions" and "model proportions" is a cross-check on the consistency of the calculations.
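
The arithmetic behind the "Total" column and the "Average of above" row can be checked with a short script. The sketch below is illustrative only: it assumes that the average row is the unweighted mean over the 17 factor rows (the notes do not say how it was computed), and the names `ROWS`, `PRINTED_AVERAGE`, `column_means` and `max_total_discrepancy` are hypothetical helpers, not part of the original analysis. The numbers are transcribed from the table.

```python
# Rows transcribed from Table 2:
# (factor, pair 2, pair 7, pair 9, pair 14, printed total).
ROWS = [
    ("GABP",   0.054, 0.035, 0.113, 0.774, 0.976),
    ("NRSF",   0.236, 0.231, 0.254, 0.142, 0.862),
    ("SRF",    0.370, 0.083, 0.111, 0.229, 0.794),
    ("NFKB",   0.322, 0.156, 0.139, 0.069, 0.686),
    ("TCF7L2", 0.281, 0.111, 0.060, 0.030, 0.483),
    ("SREBP1", 0.237, 0.092, 0.137, 0.276, 0.742),
    ("GATA1",  0.322, 0.221, 0.146, 0.048, 0.736),
    ("TR4",    0.144, 0.083, 0.216, 0.426, 0.870),
    ("ZNF263", 0.049, 0.194, 0.466, 0.147, 0.856),
    ("cFos",   0.287, 0.186, 0.111, 0.018, 0.603),
    ("Max",    0.210, 0.100, 0.180, 0.185, 0.675),
    ("NF-E2",  0.273, 0.149, 0.088, 0.026, 0.536),
    ("YY1",    0.252, 0.135, 0.157, 0.333, 0.876),
    ("STAT1",  0.398, 0.104, 0.077, 0.059, 0.638),
    ("CTCF",   0.202, 0.181, 0.353, 0.169, 0.905),
    ("sp1",    0.045, 0.075, 0.332, 0.512, 0.966),
    ("p53",    0.266, 0.203, 0.081, 0.021, 0.571),
]

# The "Average of above" row as printed in the table.
PRINTED_AVERAGE = (0.232, 0.138, 0.178, 0.204, 0.751)


def column_means(rows):
    """Unweighted mean of each numeric column, rounded to three decimals.

    Assumption: the table's average row is a plain mean over factors,
    not weighted by the number of sites.
    """
    n = len(rows)
    columns = zip(*(row[1:] for row in rows))
    return tuple(round(sum(col) / n, 3) for col in columns)


def max_total_discrepancy(rows):
    """Largest |sum of the four pair columns - printed total| over all rows."""
    return max(abs(sum(row[1:5]) - row[5]) for row in rows)


print("recomputed means:", column_means(ROWS))
print("printed means:   ", PRINTED_AVERAGE)
print("largest total discrepancy:", round(max_total_discrepancy(ROWS), 3))
```

Small differences between a printed total and the sum of the four rounded pair values are expected, because each entry is rounded to three decimals independently.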