Table 7 The top 25 words in Distal Promoters

From: The word landscape of the non-coding segments of the Arabidopsis thaliana genome

Column groups: Unmasked (S, ES, O, EO, SlnSES); Masked (S, ES, O, EO, SlnSES); Unmasked (RevComp, RC_Pos, Pal, PValues)
Word S ES O EO SlnSES S ES O EO SlnSES RevComp RC_Pos Pal PValues
ATTTTTTA 5789 4874.02 7202 5393.37 995.937 4920 4189.9 5773 4568.53 790.309 TAAAAAAT 1 No 6.66E-16
TAAAAAAT 5865 4983.57 7314 5527.8 955.154 5003 4269.17 5877 4662.83 793.568 ATTTTTTA 0 No 6.66E-16
GAAAAAAG 3578 2825.77 3921 2995.09 844.484 3394 2744.34 3697 2903.99 721.112 CTTTTTTC 3 No 8.88E-16
CTTTTTTC 3546 2878.92 3904 3054.71 739.005 3345 2798.31 3662 2964.33 596.918 GAAAAAAG 2 No 0
TTATATAA 4781 4107.17 5656 4470.46 726.305 4138 3955.09 4717 4291.1 187.08 TTATATAA 4 Yes 0
AATATATT 5432 4895.21 6702 5419.31 565.205 4688 4574.65 5538 5029.33 114.742 AATATATT 5 Yes 0
CAAGAAAC 2910 2459.44 3187 2587.64 489.513 2818 2410.32 3089 2533.47 440.364 GTTTCTTG 7 No -4.44E-16
GTTTCTTG 2912 2482.93 3182 2613.58 464.176 2842 2430.36 3108 2555.55 444.685 CAAGAAAC 6 No 0
GAAAAATG 3158 2736.51 3416 2895.24 452.402 2871 2566.09 3080 2705.63 322.343 CATTTTTC 29 No 0
GTTTTTGA 3516 3093.27 3830 3296.52 450.382 3207 2816.69 3462 2984.91 416.186 TCAAAAAC 13 No 8.88E-16
GAAAAAAC 3013 2605.34 3240 2749.19 438.004 2744 2495.22 2935 2627.17 260.786 GTTTTTTC 26 No 5.55E-16
CAATTTTT 4457 4041.77 4991 4393.18 435.864 4009 3601.54 4440 3878.67 429.685 AAAAATTG 25 No 1.67E-15
ATTTTGTA 4098 3689.96 4626 3981.23 429.814 3735 3342.23 4123 3580.11 414.995 TACAAAAT 69 No 1.55E-15
TCAAAAAC 3414 3011.29 3688 3203.78 428.513 3129 2749.95 3358 2910.25 404.054 GTTTTTGA 9 No 7.77E-16
GAAGAAAG 3851 3448.5 4291 3702.07 425.126 3664 3290.44 4048 3520.87 394.006 CTTTCTTC 59 No 1.11E-16
GTTTTATG 2173 1793.07 2293 1861.81 417.607 2048 1720.91 2156 1784.36 356.372 CATAAAAC 57 No 1.11E-16
CTTTATTC 1618 1250.45 1676 1284.79 416.937 1500 1215.7 1548 1248.25 315.217 GAATAAAG 43 No 4.44E-16
GTTTTAAG 1957 1584.64 2054 1638.71 413.031 1791 1482.73 1871 1530.29 338.304 CTTAAAAC 28 No 1.33E-15
ATTTTTCA 4081 3695.36 4496 3987.5 405.1 3743 3364 4095 3605.05 399.585 TGAAAAAT 40 No 6.66E-16
TAAGAAGT 1465 1112.41 1517 1139.93 403.359 1388 1100.56 1435 1127.54 322.073 ACTTCTTA 62 No -8.88E-16
CTTGTTTC 2351 1980.52 2504 2064.03 403.153 2269 1929.76 2415 2009.12 367.453 GAAACAAG 35 No 0
CAAAAAAG 3391 3011.99 3696 3204.57 401.915 3126 2864.52 3392 3038.54 273.068 CTTTTTTG 88 No 0
TAGAAAAT 3556 3178.38 3887 3393.13 399.217 3219 2901.76 3488 3080.38 333.981 ATTTTCTA 41 No 0
ATTCTTCA 2716 2348.17 2896 2465.08 395.248 2529 2255.7 2691 2363.65 289.221 TGAAGAAT 31 No 1.11E-16
  1. Top 25 overrepresented words in the distal promoters of Arabidopsis thaliana. Word is the short nucleotide sequence of a putative word. S and ES are the number of sequences in which a word occurs and the number of sequences in which it was expected to occur, respectively; O and EO are the total number of occurrences and the expected total number of occurrences. The SlnSES score measures a word's statistical coverage of the sequences in the set and is based on a Markov chain background model. Each set of attributes was computed for both the masked and the unmasked version of the segment, with emphasis on the unmasked version: the table is sorted by the unmasked SlnSES score.
  2. Further information about each word is given by its reverse complement (RevComp), the zero-based position of that reverse complement in the result set (RC_Pos), and a flag indicating whether the word is a genomic palindrome, that is, identical to its own reverse complement (Pal).
  3. Finally, PValues gives the p-value assigned to each word, which helps determine whether the word is relevant or was flagged as interesting merely by chance.
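
The derived columns described in the notes above can be reproduced with a short Python sketch. The function names are illustrative and not taken from the paper. The formula S × ln(S / ES) for the SlnSES score is an assumption inferred from the column name and from the tabulated values, which it matches up to rounding of ES; the paper's own definition should be treated as authoritative.

```python
import math

# Translation table mapping each nucleotide to its complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word: str) -> str:
    """Return the reverse complement of a DNA word (the RevComp column)."""
    return word.translate(_COMPLEMENT)[::-1]


def is_palindrome(word: str) -> bool:
    """A genomic palindrome equals its own reverse complement (the Pal column)."""
    return word == reverse_complement(word)


def s_ln_s_es(s: float, es: float) -> float:
    """Assumed form of the SlnSES score: S * ln(S / ES).

    Inferred from the column name and the table values, not confirmed
    by the source text.
    """
    return s * math.log(s / es)


if __name__ == "__main__":
    print(reverse_complement("ATTTTTTA"))       # TAAAAAAT
    print(is_palindrome("TTATATAA"))            # True
    print(round(s_ln_s_es(5789, 4874.02), 1))   # close to the tabulated 995.937
```

Applied to the first row (S = 5789, ES = 4874.02), the assumed formula gives a value within rounding error of the tabulated unmasked score, which is why it is offered here as a plausible reading of the column rather than a confirmed definition.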