
Table 1 Summary of statistical tests for 200-bp putative promoter region

From: Rigorous and thorough bioinformatic analyses of olfactory receptor promoters confirm enrichment of O/E and homeodomain binding sites but reveal no new common motifs

     

| MatBase matrix family | MatBase family description | Consensus of arbitrarily chosen PWM | Number of predicted sites in 200-bp promoter region | Number of 200-bp promoters containing ≥ 1 site | Comparison to previous 200-bp region: number of sites expected | Comparison to previous 200-bp region: enrichment compared to expected number | Comparison to previous 200-bp region: enrichment p-value | Comparison to shuffled sequences: number of sites expected | Comparison to shuffled sequences: enrichment compared to expected number | Comparison to shuffled sequences: enrichment p-value | Conservation tests: p-value, whether whole sites are more conserved than local background | Conservation tests: p-value, whether core nucleotides are more conserved than local background | Significance levels of three tests |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Before masking O/E and VTBP sites, default MatInspector parameters, selected matrices | | | | | | | | | | | | | |
| V$NOLF | Neuron-specific-olfactory factor | nncdabTCCCyngrgarbnkgn | 167 | 137 | 24 | 6.96 | 7.4E-28 | 27.4 | 6.1 | <0.0001 | 5.2E-22 | 2.6E-54 | X X X |
| O$VTBP | Vertebrate TATA binding protein factor | staTAAAwrnn | 391 | 211 | 494 | 0.79 | 1 | 311 | 1.26 | <0.0001 | 1 | 0.65 | . X . |
| V$IKRS | Ikaros zinc finger family | yyTGGGagr | 123 | 104 | 69 | 1.78 | 5.9E-5 | 65.6 | 1.88 | <0.0001 | 2.2E-4 | 4.9E-14 | X X X |
| Before masking O/E and VTBP sites, less stringent MatInspector parameters (opt-0.10), selected matrices | | | | | | | | | | | | | |
| V$NOLF | Neuron-specific-olfactory factor | nncdabTCCCyngrgarbnkgn | 653 | 265 | 198 | 3.3 | 9.6E-58 | 239.6 | 2.73 | <0.0001 | 2.7E-64 | 3.4E-130 | X X X |
| O$VTBP | Vertebrate TATA binding protein factor | staTAAAwrnn | 2388 | 309 | 2972 | 0.8 | 1 | 2119.1 | 1.13 | <0.0001 | 1 | 0.99 | . X . |
| V$IKRS | Ikaros zinc finger family | yyTGGGagr | 963 | 297 | 622 | 1.55 | 5E-18 | 710.3 | 1.36 | <0.0001 | 1.6E-29 | 1.7E-73 | X X X |
| After masking O/E and VTBP sites, default MatInspector parameters, matrices significant in all three tests | | | | | | | | | | | | | |
| V$ARID | AT rich interactive domain factor | AATAccvm | 140 | 94 | 89 | 1.57 | 4.6E-4 | 71.9 | 1.95 | <0.0001 | 0.53 | 0.0021 | + X + |
| V$ATBF | AT-binding TF | hhwkrttantAATTahh | 101 | 69 | 68 | 1.49 | 0.0068 | 56.3 | 1.8 | <0.0001 | 0.069 | 7.4E-08 | + X X |
| V$BCDF | Bicoid-like homeodomain TFs | abnyTAATcmnv | 152 | 119 | 102 | 1.49 | 0.001 | 131.5 | 1.16 | 0.0401 | 4.7E-17 | 4.1E-20 | + + X |
| V$BRN5 | Brn-5 POU domain factors | gCATAawttat | 327 | 165 | 282 | 1.16 | 0.037 | 217.5 | 1.5 | <0.0001 | 0.015 | 5.3E-09 | + X X |
| V$CART | Cart-1 cartilage homeoprotein 1 | cTAATtrnsynattan | 452 | 183 | 331 | 1.37 | 8.7E-6 | 318.7 | 1.42 | <0.0001 | 2.7E-17 | 2.1E-30 | X X X |
| V$DLXF | Distal-less homeodomain TFs | nntAATTan | 274 | 129 | 173 | 1.58 | 1.0E-6 | 141.8 | 1.93 | <0.0001 | 1.9E-20 | 2.7E-35 | X X X |
| V$HBOX | Homeobox TFs | raaTTTAattgaa | 510 | 192 | 327 | 1.56 | 1.3E-10 | 317.9 | 1.6 | <0.0001 | 4.3E-17 | 2.7E-30 | X X X |
| V$HOMF | Homeodomain TFs | mCTAAttnn | 646 | 214 | 449 | 1.44 | 1.4E-09 | 463.2 | 1.39 | <0.0001 | 8.6E-4 | 1.4E-13 | X X X |
| V$HOXF | Paralog hox genes 1-8, clusters A, B, C, D | nnamTAATgrggrwnn | 583 | 204 | 404 | 1.44 | 6.7E-09 | 385.2 | 1.51 | <0.0001 | 2.3E-09 | 9.3E-26 | X X X |
| V$LHXF | Lim homeodomain factors | nntwwttAATTaatnn | 557 | 187 | 396 | 1.41 | 1.0E-7 | 350.4 | 1.59 | <0.0001 | 1.8E-08 | 2.8E-28 | X X X |
| V$MYOD | Myoblast determining factors | mrgCARCwgswg | 30 | 20 | 13 | 2.31 | 0.0069 | 16.3 | 1.84 | 0.0042 | 0.017 | 0.51 | + + + |
| V$NKX1 | NK1 homeobox TFs | wgnrcyAATTrgygsnn | 140 | 75 | 89 | 1.57 | 4.6E-4 | 70.2 | 1.99 | <0.0001 | 1.2E-13 | 9.9E-21 | + X X |
| V$NKX6 | NK6 homeobox TFs | TTAAttac | 263 | 151 | 178 | 1.48 | 3.0E-5 | 155.1 | 1.7 | <0.0001 | 1.3E-07 | 6.1E-13 | X X X |
| V$PAXH | PAX homeodomain binding sites | aawaATTAnn | 152 | 68 | 95 | 1.6 | 1.7E-4 | 77.5 | 1.96 | <0.0001 | 0.0015 | 2.5E-10 | X X X |
| V$PDX1 | Pancreatic and intestinal homeodomain TF | rnTAATtagync | 193 | 96 | 131 | 1.47 | 3.4E-4 | 108.4 | 1.78 | <0.0001 | 1.4E-6 | 8.2E-16 | + X X |
| After masking O/E and VTBP sites, less stringent MatInspector parameters (opt-0.10), matrices significant in all three tests | | | | | | | | | | | | | |
| V$AP4R | AP4 and related proteins | wgaryCAGCtgyggnc | 121 | 74 | 61 | 1.98 | 5.1E-6 | 99 | 1.22 | 0.0321 | 7.7E-08 | 0.061 | X + X |
| V$DICE | Downstream Immunoglobulin Control Element | kgtySTCTccacag | 186 | 134 | 135 | 1.38 | 0.0026 | 138.1 | 1.35 | <0.0001 | 0.0026 | 0.2 | + X + |
| V$DLXF | Distal-less homeodomain TFs | nntAATTan | 1252 | 274 | 1149 | 1.09 | 0.019 | 1041.7 | 1.2 | <0.0001 | 0.0004 | 1.3E-18 | + X X |
| V$HAND | Twist subfamily of class B bHLH TFs | ccagaTGGCcccccn | 696 | 252 | 537 | 1.3 | 3.3E-6 | 619.8 | 1.12 | 0.0048 | 0.0067 | 0.0019 | X + + |
| V$NKX1 | NK1 homeobox TFs | wgnrcyAATTrgygsnn | 783 | 236 | 619 | 1.26 | 6.6E-6 | 634.4 | 1.23 | <0.0001 | 9.9E-12 | 1E-21 | X X X |
| V$PAX5 | PAX-5 B-cell-specific activator protein | bcnnnrNKCAnbgnwgnrkrgc | 227 | 139 | 180 | 1.26 | 0.011 | 192.2 | 1.18 | 0.0085 | 0.09 | 0.025 | + + + |
| V$PAX6 | PAX-4/PAX-6 paired domain binding sites | GCASbswtgmgtgmn | 664 | 249 | 555 | 1.2 | 9.8E-4 | 617 | 1.08 | 0.0354 | 0.011 | 0.0022 | + + + |
| V$PAXH | PAX homeodomain binding sites | aawaATTAnn | 999 | 247 | 889 | 1.12 | 0.0061 | 767.1 | 1.3 | <0.0001 | 4.5E-09 | 2E-20 | + X X |
| V$PDX1 | Pancreatic and intestinal homeodomain TF | rnTAATtagync | 1013 | 257 | 828 | 1.22 | 8.9E-6 | 743.9 | 1.36 | <0.0001 | 2.6E-5 | 3.7E-25 | X X X |
| V$PTF1 | Pancreas TF 1, heterotrimeric TF | bmcaCCTGyvktkttycccrw | 125 | 95 | 93 | 1.34 | 0.018 | 100.9 | 1.24 | 0.0102 | 0.015 | 0.17 | + + + |
| V$SIX3 | Sine oculis homeobox homolog 3 | nnrhnknTAATswcwncnstv | 647 | 254 | 574 | 1.13 | 0.02 | 515.5 | 1.26 | <0.0001 | 1.8E-07 | 6.6E-28 | + X X |

  1. Results of our statistical tests for enrichment and conservation are provided for selected matrix families before and after masking O/E sites and TATA boxes. Before masking, we provide results only for selected matrix families that we discuss in the text. After masking, we provide results for any matrix family that appeared statistically significant in all three tests before applying the Bonferroni correction for multiple testing. Additional Files 5, 6, 10 and 11 give results for all matrix families, as well as for all individual matrices. Additional Files 7, 8, 9, 12 and 13 give results of analogous tests for 500-bp putative promoter regions. P-values provided here are not corrected for multiple testing, but in selecting matrices for further discussion we used the conservative Bonferroni correction. For each matrix family, we provide the description from MatBase, using the abbreviation TF for transcription factor. We also provide the consensus sequence (using IUPAC degeneracy codes) of an arbitrarily chosen matrix from each family.
  2. The "significance level" column summarizes the results of the three statistical tests in the following order: (a) enrichment versus the previous 200-bp region; (b) enrichment versus shuffled sequences; (c) conservation scores versus surrounding nucleotides, counting whichever of the sites test or the cores test proved more significant (see Methods). The "." symbol indicates not significant; the "+" symbol indicates a significance level of p ≤ 0.05 before applying the Bonferroni correction; and the "X" symbol indicates that the p-value remains significant after applying the Bonferroni correction.
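
The symbol coding described in note 2 can be illustrated with a minimal Python sketch. The function names, the `alpha` default and, in particular, the `n_tests` value are assumptions made for illustration: the notes do not state how many tests entered the Bonferroni correction, so 200 is only a placeholder. A reported value of "<0.0001" is passed as its upper bound, 0.0001.

```python
def symbol(p, n_tests, alpha=0.05):
    """Map one uncorrected p-value to '.', '+' or 'X' as described in note 2."""
    if p <= alpha / n_tests:   # still significant after Bonferroni correction
        return "X"
    if p <= alpha:             # significant only before correction
        return "+"
    return "."                 # not significant


def significance_symbols(p_previous, p_shuffled, p_sites, p_cores,
                         n_tests=200, alpha=0.05):
    """Build the three-symbol summary in the order (a), (b), (c).

    n_tests=200 is a hypothetical placeholder, not a value from the source.
    For test (c), the more significant (smaller) of the two conservation
    p-values is used, as note 2 describes.
    """
    p_conservation = min(p_sites, p_cores)
    tests = (p_previous, p_shuffled, p_conservation)
    return " ".join(symbol(p, n_tests, alpha) for p in tests)


if __name__ == "__main__":
    # p-values taken from the V$ARID row (default parameters, after masking);
    # "<0.0001" is entered as 0.0001.
    print(significance_symbols(4.6e-4, 1e-4, 0.53, 0.0021))
```

Taking the smaller of the two conservation p-values mirrors the "whichever proved more significant" rule in note 2. The exact boundary between "+" and "X" depends entirely on the true number of tests, which should be taken from the Methods rather than from this sketch.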