Table 6 The top 25 words in Proximal Promoters

From: The word landscape of the non-coding segments of the Arabidopsis thaliana genome

Column groups: Unmasked (S, ES, O, EO, SlnSES); Masked (S, ES, O, EO, SlnSES); Unmasked (RevComp, RC_Pos, Pal, PValues)
Word S ES O EO SlnSES S ES O EO SlnSES RevComp RC_Pos Pal PValues
TAAAAAAT 4249 3411.11 4837 3674.74 933.272 3681 3028.65 4071 3237.18 718.039 ATTTTTTA 1 No 0
ATTTTTTA 3876 3135.31 4372 3358.5 822.011 3313 2758.58 3636 2932.38 606.738 TAAAAAAT 0 No 2.22E-16
TTATATAA 3094 2505.92 3390 2650.31 652.239 2712 2508.38 2934 2653.02 211.674 TTATATAA 2 Yes 7.77E-16
AATATATT 3636 3104.08 4093 3322.92 575.097 3178 3009.54 3503 3215.49 173.09 AATATATT 3 Yes 1.67E-15
GAAAAAAG 2066 1652.5 2182 1718.49 461.395 1956 1621.19 2053 1684.9 367.226 CTTTTTTC 5 No 1.11E-16
CTTTTTTC 1960 1578.31 2072 1638.97 424.512 1869 1559.58 1969 1618.92 338.269 GAAAAAAG 4 No 1.11E-16
AAAAATTG 2975 2595.17 3208 2749.61 406.363 2737 2368.41 2938 2497.98 395.888 CAATTTTT 9 No -6.66E-16
TAAAATTT 4339 3951.48 5058 4305.15 405.93 3764 3348.9 4214 3603.07 439.821 AAATTTTA 10 No -6.66E-16
TAATTTTT 4656 4272.02 5336 4686.12 400.739 4125 3726.41 4609 4040.78 419.188 AAAAATTA 19 No 0
CAATTTTT 2872 2499.79 3110 2643.5 398.638 2633 2269.83 2829 2389.32 390.785 AAAAATTG 6 No 6.66E-16
AAATTTTA 4239 3880.57 4921 4221.59 374.5 3651 3305.77 4102 3553.5 362.665 TAAAATTT 7 No 8.88E-16
TACAAAAT 2589 2241.1 2821 2357.73 373.61 2344 2040.96 2514 2138.69 324.496 ATTTTGTA 26 No 6.66E-16
ATTTTCTA 2206 1886.09 2346 1970.39 345.622 2022 1748.93 2142 1822.19 293.357 TAGAAAAT 17 No 8.88E-16
TGAAAAAT 2374 2075.6 2517 2176.47 318.891 2230 1927.32 2354 2015.09 325.288 ATTTTTCA 21 No 5.64E-13
AAAAAATC 3874 3607.85 4265 3902.57 275.738 3494 3280.06 3823 3524 220.77 GATTTTTT 68 No 5.63E-09
CATTTTTC 1675 1426.93 1760 1477.44 268.478 1558 1356.8 1624 1402.92 215.428 GAAAAATG 29 No 5.16E-13
TAAGAAAT 1895 1645.36 1990 1710.83 267.683 1773 1553.49 1856 1612.42 234.336 ATTTCTTA 23 No 2.52E-11
TAGAAAAT 2154 1904.65 2281 1990.5 265.005 1971 1754.61 2083 1828.31 229.215 ATTTTCTA 12 No 1.04E-10
GGAAAAAA 2679 2426.86 2853 2562.63 264.801 2506 2238.07 2643 2354.4 283.363 TTTTTTCC 98 No 9.20E-09
AAAAATTA 4735 4477.84 5547 4933.58 264.404 4109 3862.67 4667 4200.51 254.025 TAATTTTT 8 No 1.33E-15
CAAAATTT 3347 3092.9 3655 3310.2 264.267 3054 2796.42 3304 2974.88 269.093 AAATTTTG 60 No 1.95E-09
ATTTTTCA 2338 2088.5 2489 2190.56 263.846 2169 1928.62 2295 2016.5 254.769 TGAAAAAT 13 No 2.29E-10
TTTTTTGG 3369 3120.79 3724 3341.96 257.829 3050 2802.67 3330 2981.91 257.935 CCAAAAAA 28 No 4.49E-11
ATTTCTTA 1947 1705.79 2052 1775.75 257.518 1800 1598.57 1900 1660.66 213.623 TAAGAAAT 16 No 8.37E-11
  1. Top 25 overrepresented words in the proximal promoters of Arabidopsis thaliana. Word is the short nucleotide sequence of a putative word. S is the number of sequences in which the word occurs and ES is the number of sequences in which it was expected to occur; O is the total number of occurrences and EO is the expected total number of occurrences. SlnSES is a sequence-coverage score, S × ln(S/ES), based on a Markov chain background model. Each set of attributes was computed for both the masked and the unmasked version of the segment; the emphasis is on the unmasked version, so the table is sorted by the unmasked SlnSES score.
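The tabulated SlnSES values are numerically consistent with S multiplied by the natural logarithm of S/ES. A minimal sketch of that calculation follows; the function name `s_ln_s_es` is hypothetical, and the sketch takes ES as a given input rather than deriving it from the Markov chain background model, which is not reproduced here.

```python
import math


def s_ln_s_es(s: float, es: float) -> float:
    """Return S * ln(S / ES) for an observed and an expected sequence count.

    ES is taken as given; estimating it from a background model is
    outside the scope of this sketch.
    """
    if s <= 0 or es <= 0:
        raise ValueError("S and ES must both be positive")
    return s * math.log(s / es)


# First table row (TAAAAAAT, unmasked): S = 4249, ES = 3411.11.
# The result should be close to the tabulated 933.272; small differences
# are expected because ES is printed to only two decimal places.
print(round(s_ln_s_es(4249, 3411.11), 3))
```

The score is positive only when a word occurs in more sequences than expected (S > ES), and it grows with both the ratio S/ES and the absolute coverage S.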
  2. RevComp is the reverse complement of the word, RC_Pos is the position of that reverse complement in the ranked set of results (zero-based, so the first row of this table is position 0), and Pal indicates whether the word is a genomic palindrome, i.e. identical to its own reverse complement.
  3. Finally, PValues gives the p-value assigned to each word, which is used to judge whether the word is relevant or was flagged as interesting by random chance.
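For readers who want to work with the rows programmatically, the sketch below parses one whitespace-delimited row into a typed record using the column order described in the notes above. The names `WordRow` and `parse_row` are hypothetical. Clamping negative p-values to zero is an assumption of this sketch: entries such as -6.66E-16 look like floating-point round-off around zero, but the source does not say so.

```python
from dataclasses import dataclass


@dataclass
class WordRow:
    word: str
    unmasked: tuple[float, ...]  # (S, ES, O, EO, SlnSES)
    masked: tuple[float, ...]    # (S, ES, O, EO, SlnSES)
    revcomp: str
    rc_pos: int
    palindrome: bool
    p_value: float


def parse_row(line: str) -> WordRow:
    """Parse one 15-field table row; raise ValueError on a malformed row."""
    fields = line.split()
    if len(fields) != 15:
        raise ValueError(f"expected 15 fields, got {len(fields)}")
    numbers = [float(x) for x in fields[1:11]]
    return WordRow(
        word=fields[0],
        unmasked=tuple(numbers[:5]),
        masked=tuple(numbers[5:]),
        revcomp=fields[11],
        rc_pos=int(fields[12]),
        palindrome=(fields[13] == "Yes"),
        # Assumption: tiny negative values are round-off, so clamp at zero.
        p_value=max(0.0, float(fields[14])),
    )


row = parse_row(
    "AAAAATTG 2975 2595.17 3208 2749.61 406.363 "
    "2737 2368.41 2938 2497.98 395.888 CAATTTTT 9 No -6.66E-16"
)
print(row.word, row.unmasked[4], row.masked[4], row.p_value)
```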