Table 1 Summary of statistical tests for 200-bp putative promoter region

From: Rigorous and thorough bioinformatic analyses of olfactory receptor promoters confirm enrichment of O/E and homeodomain binding sites but reveal no new common motifs

| MatBase matrix family | MatBase family description | Consensus of arbitrarily chosen PWM | Number of predicted sites in 200-bp promoter region | Number of 200-bp promoters containing ≥ 1 site | Previous 200-bp region: number of sites expected | Previous 200-bp region: enrichment compared to expected number | Previous 200-bp region: enrichment p-value | Shuffled sequences: number of sites expected | Shuffled sequences: enrichment compared to expected number | Shuffled sequences: enrichment p-value | Conservation p-value: whether whole sites are more conserved than local background | Conservation p-value: whether core nucleotides are more conserved than local background | Significance levels of three tests |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Before masking O/E and VTBP sites, default MatInspector parameters, selected matrices | | | | | | | | | | | | | |
| V$NOLF | Neuron-specific-olfactory factor | nncdabTCCCyngrgarbnkgn | 167 | 137 | 24 | 6.96 | 7.4E-28 | 27.4 | 6.1 | <0.0001 | 5.2E-22 | 2.6E-54 | X X X |
| O$VTBP | Vertebrate TATA binding protein factor | staTAAAwrnn | 391 | 211 | 494 | 0.79 | 1 | 311 | 1.26 | <0.0001 | 1 | 0.65 | . X . |
| V$IKRS | Ikaros zinc finger family | yyTGGGagr | 123 | 104 | 69 | 1.78 | 5.9E-5 | 65.6 | 1.88 | <0.0001 | 2.2E-4 | 4.9E-14 | X X X |
| Before masking O/E and VTBP sites, less stringent MatInspector parameters (opt-0.10), selected matrices | | | | | | | | | | | | | |
| V$NOLF | Neuron-specific-olfactory factor | nncdabTCCCyngrgarbnkgn | 653 | 265 | 198 | 3.3 | 9.6E-58 | 239.6 | 2.73 | <0.0001 | 2.7E-64 | 3.4E-130 | X X X |
| O$VTBP | Vertebrate TATA binding protein factor | staTAAAwrnn | 2388 | 309 | 2972 | 0.8 | 1 | 2119.1 | 1.13 | <0.0001 | 1 | 0.99 | . X . |
| V$IKRS | Ikaros zinc finger family | yyTGGGagr | 963 | 297 | 622 | 1.55 | 5E-18 | 710.3 | 1.36 | <0.0001 | 1.6E-29 | 1.7E-73 | X X X |
| After masking O/E and VTBP sites, default MatInspector parameters, matrices significant in all three tests | | | | | | | | | | | | | |
| V$ARID | AT rich interactive domain factor | AATAccvm | 140 | 94 | 89 | 1.57 | 4.6E-4 | 71.9 | 1.95 | <0.0001 | 0.53 | 0.0021 | + X + |
| V$ATBF | AT-binding TF | hhwkrttantAATTahh | 101 | 69 | 68 | 1.49 | 0.0068 | 56.3 | 1.8 | <0.0001 | 0.069 | 7.4E-08 | + X X |
| V$BCDF | Bicoid-like homeodomain TFs | abnyTAATcmnv | 152 | 119 | 102 | 1.49 | 0.001 | 131.5 | 1.16 | 0.0401 | 4.7E-17 | 4.1E-20 | + + X |
| V$BRN5 | Brn-5 POU domain factors | gCATAawttat | 327 | 165 | 282 | 1.16 | 0.037 | 217.5 | 1.5 | <0.0001 | 0.015 | 5.3E-09 | + X X |
| V$CART | Cart-1 cartilage homeoprotein 1 | cTAATtrnsynattan | 452 | 183 | 331 | 1.37 | 8.7E-6 | 318.7 | 1.42 | <0.0001 | 2.7E-17 | 2.1E-30 | X X X |
| V$DLXF | Distal-less homeodomain TFs | nntAATTan | 274 | 129 | 173 | 1.58 | 1.0E-6 | 141.8 | 1.93 | <0.0001 | 1.9E-20 | 2.7E-35 | X X X |
| V$HBOX | Homeobox TFs | raaTTTAattgaa | 510 | 192 | 327 | 1.56 | 1.3E-10 | 317.9 | 1.6 | <0.0001 | 4.3E-17 | 2.7E-30 | X X X |
| V$HOMF | Homeodomain TFs | mCTAAttnn | 646 | 214 | 449 | 1.44 | 1.4E-09 | 463.2 | 1.39 | <0.0001 | 8.6E-4 | 1.4E-13 | X X X |
| V$HOXF | Paralog hox genes 1-8, clusters A, B, C, D | nnamTAATgrggrwnn | 583 | 204 | 404 | 1.44 | 6.7E-09 | 385.2 | 1.51 | <0.0001 | 2.3E-09 | 9.3E-26 | X X X |
| V$LHXF | Lim homeodomain factors | nntwwttAATTaatnn | 557 | 187 | 396 | 1.41 | 1.0E-7 | 350.4 | 1.59 | <0.0001 | 1.8E-08 | 2.8E-28 | X X X |
| V$MYOD | Myoblast determining factors | mrgCARCwgswg | 30 | 20 | 13 | 2.31 | 0.0069 | 16.3 | 1.84 | 0.0042 | 0.017 | 0.51 | + + + |
| V$NKX1 | NK1 homeobox TFs | wgnrcyAATTrgygsnn | 140 | 75 | 89 | 1.57 | 4.6E-4 | 70.2 | 1.99 | <0.0001 | 1.2E-13 | 9.9E-21 | + X X |
| V$NKX6 | NK6 homeobox TFs | TTAAttac | 263 | 151 | 178 | 1.48 | 3.0E-5 | 155.1 | 1.7 | <0.0001 | 1.3E-07 | 6.1E-13 | X X X |
| V$PAXH | PAX homeodomain binding sites | aawaATTAnn | 152 | 68 | 95 | 1.6 | 1.7E-4 | 77.5 | 1.96 | <0.0001 | 0.0015 | 2.5E-10 | X X X |
| V$PDX1 | Pancreatic and intestinal homeodomain TF | rnTAATtagync | 193 | 96 | 131 | 1.47 | 3.4E-4 | 108.4 | 1.78 | <0.0001 | 1.4E-6 | 8.2E-16 | + X X |
| After masking O/E and VTBP sites, less stringent MatInspector parameters (opt-0.10), matrices significant in all three tests | | | | | | | | | | | | | |
| V$AP4R | AP4 and related proteins | wgaryCAGCtgyggnc | 121 | 74 | 61 | 1.98 | 5.1E-6 | 99 | 1.22 | 0.0321 | 7.7E-08 | 0.061 | X + X |
| V$DICE | Downstream Immunoglobulin Control Element | kgtySTCTccacag | 186 | 134 | 135 | 1.38 | 0.0026 | 138.1 | 1.35 | <0.0001 | 0.0026 | 0.2 | + X + |
| V$DLXF | Distal-less homeodomain TFs | nntAATTan | 1252 | 274 | 1149 | 1.09 | 0.019 | 1041.7 | 1.2 | <0.0001 | 0.0004 | 1.3E-18 | + X X |
| V$HAND | Twist subfamily of class B bHLH TFs | ccagaTGGCcccccn | 696 | 252 | 537 | 1.3 | 3.3E-6 | 619.8 | 1.12 | 0.0048 | 0.0067 | 0.0019 | X + + |
| V$NKX1 | NK1 homeobox TFs | wgnrcyAATTrgygsnn | 783 | 236 | 619 | 1.26 | 6.6E-6 | 634.4 | 1.23 | <0.0001 | 9.9E-12 | 1E-21 | X X X |
| V$PAX5 | PAX-5 B-cell-specific activator protein | bcnnnrNKCAnbgnwgnrkrgc | 227 | 139 | 180 | 1.26 | 0.011 | 192.2 | 1.18 | 0.0085 | 0.09 | 0.025 | + + + |
| V$PAX6 | PAX-4/PAX-6 paired domain binding sites | GCASbswtgmgtgmn | 664 | 249 | 555 | 1.2 | 9.8E-4 | 617 | 1.08 | 0.0354 | 0.011 | 0.0022 | + + + |
| V$PAXH | PAX homeodomain binding sites | aawaATTAnn | 999 | 247 | 889 | 1.12 | 0.0061 | 767.1 | 1.3 | <0.0001 | 4.5E-09 | 2E-20 | + X X |
| V$PDX1 | Pancreatic and intestinal homeodomain TF | rnTAATtagync | 1013 | 257 | 828 | 1.22 | 8.9E-6 | 743.9 | 1.36 | <0.0001 | 2.6E-5 | 3.7E-25 | X X X |
| V$PTF1 | Pancreas TF 1, heterotrimeric TF | bmcaCCTGyvktkttycccrw | 125 | 95 | 93 | 1.34 | 0.018 | 100.9 | 1.24 | 0.0102 | 0.015 | 0.17 | + + + |
| V$SIX3 | Sine oculis homeobox homolog 3 | nnrhnknTAATswcwncnstv | 647 | 254 | 574 | 1.13 | 0.02 | 515.5 | 1.26 | <0.0001 | 1.8E-07 | 6.6E-28 | + X X |
  1. Results of our statistical tests for enrichment and conservation are provided for selected matrix families before and after masking O/E sites and TATA boxes. Before masking, we provide results only for selected matrix families that we discuss in the text. After masking, we provide results for any matrix family that appeared statistically significant in all three tests before applying the Bonferroni correction for multiple testing. Additional Files 5, 6, 10 and 11 give results for all matrix families, as well as for all individual matrices. Additional Files 7, 8, 9, 12 and 13 give results of analogous tests for 500-bp putative promoter regions. P-values provided here are not corrected for multiple testing, but in selecting matrices for further discussion we used the conservative Bonferroni correction. For each matrix family, we provide the description from MatBase, using the abbreviation TF for transcription factor. We also provide the consensus sequence (using IUPAC degeneracy codes) of an arbitrarily chosen matrix from each family.
  2. The "Significance levels of three tests" column summarizes the results of the three statistical tests in the following order: (a) enrichment versus the previous 200-bp region; (b) enrichment versus shuffled sequences; (c) conservation scores versus surrounding nucleotides, counting whichever of the sites test or the cores test proved more significant (see Methods). The "." symbol indicates not significant; the "+" symbol indicates a significance level of p ≤ 0.05 before applying the Bonferroni correction; and the "X" symbol indicates that the p-value remains significant after applying the Bonferroni correction.
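
As a reading aid, the following minimal Python sketch shows how the enrichment ratio and the three-symbol significance summary described in the footnotes could be derived from one table row. It is not the authors' code. The function names and the value `n_tests = 170` are hypothetical: the number of tests used for the Bonferroni correction is not stated in this table, so 170 is only an illustrative placeholder. The sketch also assumes that enrichment is the observed site count divided by the expected count (consistent with the tabulated values, e.g. 167 / 24 ≈ 6.96), that the Bonferroni threshold is 0.05 divided by the number of tests, and that a p-value reported as "<0.0001" can be represented by its upper bound, 1e-4.

```python
ALPHA = 0.05  # nominal significance level named in the table footnote


def enrichment(observed: int, expected: float) -> float:
    """Fold enrichment: observed site count divided by expected count."""
    return observed / expected


def symbol(p: float, n_tests: int, alpha: float = ALPHA) -> str:
    """Map one p-value to the footnote's symbols.

    "X": still significant after Bonferroni correction (p <= alpha / n_tests)
    "+": significant only before correction (p <= alpha)
    ".": not significant
    """
    if p <= alpha / n_tests:
        return "X"
    if p <= alpha:
        return "+"
    return "."


def significance_levels(p_prev: float, p_shuffled: float,
                        p_sites: float, p_cores: float,
                        n_tests: int) -> str:
    """Summarize the three tests in the footnote's order (a), (b), (c).

    For the conservation test, the more significant (smaller) of the
    whole-site and core-nucleotide p-values is used.
    """
    p_conservation = min(p_sites, p_cores)
    return " ".join(
        symbol(p, n_tests) for p in (p_prev, p_shuffled, p_conservation)
    )


if __name__ == "__main__":
    n_tests = 170  # hypothetical placeholder, not taken from the source

    # V$NOLF, before masking, default parameters: 167 sites vs. 24 expected
    print(round(enrichment(167, 24), 2))

    # V$ARID, after masking, default parameters; "<0.0001" entered as 1e-4
    print(significance_levels(4.6e-4, 1e-4, 0.53, 0.0021, n_tests))
```

With this placeholder, the V$ARID row yields "+ X +", the same pattern shown in the table. A different number of tests would shift the boundary between "+" and "X", so the output for borderline p-values depends on the actual count given in the Methods.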