Table 1 Summary statistics for the mosaic classes of the human genome

From: Most transcription factor binding sites are in a few mosaic classes of the human genome

| Class pair (1) | A (2) | C (3) | G (4) | T (5) | Length (6) | CpG proportion (7) | Proportion of genome (8) |
|---|---|---|---|---|---|---|---|
| 1 | 0.169 | 0.234 | 0.104 | 0.492 | 5.4 | 0.006 | 0.143 |
| 2 | 0.230 | 0.220 | 0.243 | 0.307 | 84.7 | 0.011 | 0.134 |
| 3 | 0.314 | 0.139 | 0.118 | 0.428 | 71.5 | 0.003 | 0.132 |
| 4 | 0.235 | 0.281 | 0.133 | 0.351 | 64.9 | 0.005 | 0.120 |
| 5 | 0.279 | 0.175 | 0.185 | 0.361 | 55.3 | 0.004 | 0.116 |
| 6 | 0.215 | 0.165 | 0.188 | 0.432 | 504.7 | 0.005 | 0.092 |
| 7 | 0.183 | 0.336 | 0.205 | 0.276 | 55.6 | 0.009 | 0.069 |
| 8 | 0.177 | 0.392 | 0.194 | 0.237 | 33.6 | 0.029 | 0.044 |
| 9 | 0.167 | 0.379 | 0.236 | 0.218 | 48.2 | 0.027 | 0.042 |
| 10 | 0.145 | 0.242 | 0.309 | 0.303 | 18.2 | 0.017 | 0.022 |
| 11 | 0.199 | 0.453 | 0.240 | 0.108 | 14.4 | 0.056 | 0.017 |
| 12 | 0.175 | 0.021 | 0.079 | 0.725 | 14.4 | 0.000 | 0.014 |
| 13 | 0.252 | 0.359 | 0.126 | 0.263 | 10.5 | 0.003 | 0.012 |
| 14 | 0.125 | 0.454 | 0.252 | 0.169 | 25.8 | 0.096 | 0.009 |
| 15 | 0.217 | 0.357 | 0.124 | 0.301 | 71.4 | 0.010 | 0.008 |
| 16 | 0.223 | 0.045 | 0.261 | 0.471 | 14.0 | 0.007 | 0.007 |
| 17 | 0.003 | 0.066 | 0.006 | 0.925 | 11.9 | 0.000 | 0.006 |
| 18 | 0.099 | 0.402 | 0.071 | 0.429 | 5.1 | 0.024 | 0.006 |
| 19 | 0.021 | 0.434 | 0.023 | 0.522 | 33.5 | 0.003 | 0.005 |
  1. Each row refers to a pair of classes, the a and b classes, each of which has the reverse-complement properties of the other. The a class is the one with the higher percentage of T+C, and its base frequencies are shown in columns 2 to 5. The frequencies for the b class are the same but with the A and T frequencies interchanged and the C and G frequencies interchanged. Column 6 gives the mean length of the class in bases, and column 7 gives the proportion of doublets within the class that are CpG; these two quantities are the same for the a and b classes. Column 8 gives the proportion of bases in the genome that fall within either class of the pair, so column 8 sums to 1.0 (up to rounding). The class pairs are numbered in decreasing order of the proportion of bases they contain. All values were calculated from the fitted HMM; the parameters of this model are given in Additional File 1.
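
As an illustration of the footnote, the short Python sketch below derives the b-class base frequencies from the tabulated a-class values by interchanging A with T and C with G, and checks that the column 8 values sum to approximately 1. The numbers are copied from Table 1; the names `A_CLASS_FREQS`, `b_class_freqs` and `GENOME_PROPORTION` are hypothetical and chosen only for this example, which is not the authors' code.

```python
# Base frequencies (A, C, G, T) of the a class for a few class pairs,
# copied from columns 2 to 5 of Table 1. The dictionary name is illustrative.
A_CLASS_FREQS = {
    1: (0.169, 0.234, 0.104, 0.492),
    2: (0.230, 0.220, 0.243, 0.307),
    19: (0.021, 0.434, 0.023, 0.522),
}

# Column 8 of Table 1: proportion of genome bases in each class pair (1 to 19).
GENOME_PROPORTION = [
    0.143, 0.134, 0.132, 0.120, 0.116, 0.092, 0.069, 0.044, 0.042, 0.022,
    0.017, 0.014, 0.012, 0.009, 0.008, 0.007, 0.006, 0.006, 0.005,
]


def b_class_freqs(a_freqs):
    """Return (A, C, G, T) for the b class given (A, C, G, T) for the a class.

    The b class is the reverse complement of the a class, so the A and T
    frequencies swap and the C and G frequencies swap.
    """
    a, c, g, t = a_freqs
    return (t, g, c, a)


if __name__ == "__main__":
    for pair, freqs in A_CLASS_FREQS.items():
        print(pair, "a:", freqs, "b:", b_class_freqs(freqs))
    # The rounded table entries sum to slightly less than 1.
    print("column 8 total:", round(sum(GENOME_PROPORTION), 3))
```

For class pair 1, the a class has T+C = 0.726 and the derived b class has T+C = 0.273, consistent with the a class being defined as the one with the higher T+C percentage.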