Table 5 The top 25 words in Core Promoters

From: The word landscape of the non-coding segments of the Arabidopsis thaliana genome

Word | Unmasked: S ES O EO SlnSES | Masked: S ES O EO SlnSES | Unmasked: RevComp RC_Pos Pal PValues
TATAAATA 1355 1071.69 1369 1175.57 317.831 1300 1029.92 1311 1128.85 302.753 TATTTATA 69 No 2.02E-08
CTATAAAT 712 474.27 716 514.446 289.286 704 464.711 708 503.987 292.416 ATTTATAG 2504 No 7.77E-16
CTATATAA 636 410.261 638 444.486 278.826 626 450.579 628 488.533 205.839 TTATATAG 18530 No 1.11E-16
ATATAAAC 560 350.797 560 379.643 261.928 554 347.685 554 376.253 258.091 GTTTATAT 26957 No 4.44E-16
TAAAAAAT 473 295.342 480 319.301 222.765 453 298.58 460 322.82 188.835 ATTTTTTA 12 No -2.22E-16
ATATATAC 544 394.869 559 427.688 174.295 507 330.093 515 357.099 217.573 GTATATAT 5651 No 7.41E-10
AATATATT 300 181.346 300 195.646 151.012 287 195.452 287 210.918 110.256 AATATATT 6 Yes 2.74E-12
TTATATAA 524 397.031 529 430.047 145.398 514 430.79 518 466.905 90.7739 TTATATAA 7 Yes 2.22E-06
AAGAAAAA 1261 1129.24 1318 1240.05 139.165 1189 1063 1238 1165.84 133.189 TTTTTCTT 25 No 0.014544
ATATAAAG 378 262.861 380 284.014 137.316 375 261.181 377 282.19 135.643 CTTTATAT 377 No 3.41E-08
TATATAAA 1260 1131.11 1276 1242.15 135.966 1234 1102.41 1250 1209.97 139.143 TTTATATA 1458 No 0.171817
AGAAAAAA 1127 1000.04 1170 1095.49 134.693 1063 936.863 1099 1025.06 134.271 TTTTTTCT 31 No 0.01331
ATTTTTTA 312 204.097 315 220.282 132.415 299 207.163 302 223.604 109.715 TAAAAAAT 4 No 1.17E-09
TTTTAAAA 688 568.245 696 617.46 131.571 658 543.865 665 590.7 125.351 TTTTAAAA 13 Yes 0.001019
CTCTTCTC 402 294.202 429 318.061 125.499 371 277.661 390 300.087 107.516 GAGAAGAG 444 No 1.97E-09
ACAAAAAA 958 840.585 988 918.052 125.259 917 799.552 939 872.564 125.681 TTTTTTGT 45 No 0.011607
ATAAATAC 578 466.039 582 505.44 124.446 574 459.992 578 498.825 127.095 GTATTTAT 14072 No 0.000465
TTATAAAA 507 397.553 508 430.617 123.294 490 386.47 491 418.525 116.302 TTTTATAA 945 No 0.000153
AAATTAAA 718 609.913 745 663.251 117.144 682 578.03 705 628.206 112.806 TTTAATTT 96 No 0.000967
GCCCATTA 374 273.89 396 295.991 116.512 372 272.658 394 294.653 115.571 TAATGGGC 190 No 1.82E-08
AAAAAACA 893 787.368 924 859.073 112.42 849 736.927 874 803.277 120.193 TGTTTTTT 33 No 0.014723
TTAAAAAA 805 701.565 828 764.227 110.71 768 667.112 788 726.227 108.159 TTTTTTAA 27 No 0.01177
ATTAAAAA 708 609.58 719 662.885 105.969 671 581.412 681 631.921 96.1611 TTTTTAAT 316 No 0.016276
GCCCAATA 322 231.782 340 250.291 105.859 321 228.286 337 246.5 109.41 TATTGGGC 130 No 4.26E-08
  1. The 25 most overrepresented words in the core promoter regions of Arabidopsis thaliana. Word is the short nucleotide sequence of the putative word. S is the number of sequences in which the word occurs and ES the number of sequences in which it was expected to occur; O is the total number of occurrences and EO the expected total number of occurrences. The SlnSES score measures how strongly the word's sequence coverage exceeds expectation under a Markov chain background model; the printed values equal S·ln(S/ES). Each set of attributes was computed for both the masked and the unmasked version of the segment, with emphasis on the unmasked version: the table is sorted by the unmasked SlnSES score.
  2. The remaining columns give the word's reverse complement (RevComp), the position of that reverse complement in the full ranked result list (RC_Pos, counted from zero, so the palindrome AATATATT in the seventh row has RC_Pos 6), and whether the word is a genomic palindrome, i.e. identical to its own reverse complement (Pal).
  3. PValues gives the p-value assigned to each word, indicating whether its overrepresentation is likely to be meaningful or could have arisen by chance.
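The derived columns can be recomputed from the Word, S and ES columns. The minimal Python sketch below rests on two assumptions inferred from the printed data rather than stated by the authors: that SlnSES = S·ln(S/ES) (it reproduces the printed scores to within rounding), and that RC_Pos is a zero-based rank (it matches every reverse-complement pair that appears inside the table). All names (`WORDS`, `SAMPLE_ROWS`, `s_ln_s_es`, `rc_positions`, ...) are hypothetical. The Markov chain background model that produces ES and the p-value computation are not reproduced.

```python
from __future__ import annotations

import math

# Word column of Table 5, in the printed (ranked) order.
WORDS = [
    "TATAAATA", "CTATAAAT", "CTATATAA", "ATATAAAC", "TAAAAAAT", "ATATATAC",
    "AATATATT", "TTATATAA", "AAGAAAAA", "ATATAAAG", "TATATAAA", "AGAAAAAA",
    "ATTTTTTA", "TTTTAAAA", "CTCTTCTC", "ACAAAAAA", "ATAAATAC", "TTATAAAA",
    "AAATTAAA", "GCCCATTA", "AAAAAACA", "TTAAAAAA", "ATTAAAAA", "GCCCAATA",
]

# (word, unmasked S, unmasked ES, printed unmasked SlnSES) for three rows.
SAMPLE_ROWS = [
    ("TATAAATA", 1355, 1071.69, 317.831),
    ("CTATAAAT", 712, 474.27, 289.286),
    ("AATATATT", 300, 181.346, 151.012),
]

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word: str) -> str:
    """Complement each base (A<->T, C<->G), then reverse the string."""
    return word.translate(_COMPLEMENT)[::-1]


def is_genomic_palindrome(word: str) -> bool:
    """True if the word equals its own reverse complement (the Pal column)."""
    return word == reverse_complement(word)


def s_ln_s_es(s: int, es: float) -> float:
    """ASSUMED formula for SlnSES: S * ln(S / ES), inferred from the printed values."""
    return s * math.log(s / es)


def rc_positions(words: list[str]) -> list[int | None]:
    """ASSUMED zero-based rank of each word's reverse complement within `words`.

    Returns None when the reverse complement is not in the list; the table's
    larger RC_Pos values point into the full result list, not this top list.
    """
    rank = {w: i for i, w in enumerate(words)}
    return [rank.get(reverse_complement(w)) for w in words]


if __name__ == "__main__":
    for word, s, es, printed in SAMPLE_ROWS:
        print(f"{word}  RevComp={reverse_complement(word)}  "
              f"Pal={is_genomic_palindrome(word)}  "
              f"SlnSES={s_ln_s_es(s, es):.3f} (printed {printed})")
    for word, pos in zip(WORDS, rc_positions(WORDS)):
        if pos is not None:
            print(f"{word}: reverse complement at zero-based rank {pos}")
```

Run as a script, it prints each sample row's reverse complement, palindrome flag and recomputed score next to the printed one, then the in-table reverse-complement ranks; for example, TAAAAAAT reports rank 12, matching its RC_Pos entry.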