Table 2 The top 25 words in 3'UTRs

From: The word landscape of the non-coding segments of the Arabidopsis thaliana genome

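Column groups: Word; Unmasked (S, ES, O, EO, SlnSES); Masked (S, ES, O, EO, SlnSES); Unmasked (RevComp, RC_Pos, Pal, PValues)
Word S ES O EO SlnSES S ES O EO SlnSES RevComp RC_Pos Pal PValues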
TTTTTGTT 2264 2066.82 2488 2306.04 206.297 2279 2066.89 2501 2331.04 222.643 AACAAAAA 40 No 9.38E-05
TTTTTCTT 2171 1981.63 2404 2203.7 198.149 2183 1978.5 2427 2222.83 214.723 AAGAAAAA 49 No 1.34E-05
TTTTTTGG 998 824.458 1046 877.255 190.646 1003 831.208 1053 888.417 188.434 CCAAAAAA 651 No 1.71E-08
ATTTTGTA 732 583.938 752 615.741 165.421 738 599.956 759 634.768 152.831 TACAAAAT 37 No 6.00E-08
TAATTTTT 787 642.133 810 678.585 160.101 797 646.36 821 685.263 166.97 AAAAATTA 164 No 5.24E-07
ATGTTTTA 589 469.818 601 493.292 133.161 610 486.404 624 512.055 138.116 TAAAACAT 284 No 1.48E-06
TTTGTTTT 2517 2402.46 2847 2715.8 117.227 2555 2406.15 2897 2753.88 153.362 AAAACAAA 1963 No 0.006347
GTTTTTGA 491 390.189 504 408.466 112.838 512 407.532 527 427.529 116.841 TCAAAAAC 5031 No 2.76E-06
AAATTTTG 588 491.471 603 516.445 105.443 604 504.212 621 531.22 109.069 CAAAATTT 376 No 0.00011
ATTTTTTA 482 387.674 498 405.795 104.97 492 406.16 510 426.064 94.3317 TAAAAAAT 100 No 5.33E-06
ATTTTTCA 446 354.812 450 370.941 102.014 453 365.873 457 383.118 96.7633 TGAAAAAT 170 No 3.83E-05
TGTTTTGT 1227 1133.19 1326 1219.91 97.5897 1255 1162.02 1359 1260.07 96.6082 ACAAAACA 659 No 0.001413
ATAAAAAT 564 474.529 580 498.326 97.4203 566 480.088 581 505.265 93.1776 ATTTTTAT 27 No 0.000192
TTTTTTCT 1721 1628.11 1839 1786.09 95.4882 1722 1625.78 1847 1798.84 99.0176 AGAAAAAA 106 No 0.107802
AAAAATTG 397 312.488 400 326.178 95.0296 414 323.794 419 338.423 101.744 CAATTTTT 66 No 4.26E-05
TATAATAT 505 419.081 519 439.185 94.1802 514 429.108 530 450.594 92.7844 ATATTATA 275 No 0.000114
CTCTGTTT 763 674.497 814 713.654 94.0706 796 706.86 852 751.4 94.5386 AAACAGAG 227 No 0.000125
TTTTTAAT 897 808.297 929 859.536 93.4009 905 811.646 942 866.766 98.5274 ATTAAAAA 95 No 0.009964
TTCTTTTT 1884 1795.18 2075 1982.05 90.9811 1879 1764.9 2059 1964.59 117.709 AAAAAGAA 130 No 0.019465
TTTTTGGT 989 902.56 1029 963.191 90.453 1006 920.175 1052 987.344 89.7087 ACCAAAAA 9144 No 0.018455
ATTTTCTG 324 245.197 330 255.296 90.2932 340 264.756 346 275.991 85.047 CAGAAAAT 241 No 4.24E-06
AATATATT 462 382.795 474 400.615 86.8857 477 412.829 490 433.187 68.9186 AATATATT 21 Yes 0.000195
TTTGTGTG 688 607.303 705 640.94 85.8355 705 625.577 726 662.623 84.2635 CACACAAA 8153 No 0.006617
TGTTTTTT 1716 1632.37 1839 1791.05 85.7404 1730 1636.78 1864 1811.88 95.8269 AAAAAACA 1065 No 0.131261
  1. Top 25 overrepresented words for the 3' untranslated regions (3'UTRs) in Arabidopsis thaliana. The Word attribute gives the short nucleotide sequence of a putative word. S and ES give the number of sequences in which a word occurs and the number of sequences in which it was expected to occur, respectively, while O and EO give the total number of occurrences and the expected total number of occurrences. The SlnSES score describes the statistical coverage of the sequences analyzed in the set and is based on a Markov chain background model. Each set of attributes was computed for both the masked and the unmasked version of the corresponding segment. The emphasis is on the unmasked version, so the table is sorted by the unmasked SlnSES score.
  2. Further information on each word is provided by its reverse complement (RevComp), the position of the reverse complement in the set of results (RC_Pos), and a flag indicating whether the word is a genomic palindrome (Pal), i.e. identical to its own reverse complement.
  3. Finally, PValues gives the p-value assigned to each word, which helps to determine whether the word is relevant or was flagged as interesting by random chance.
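
The derived columns can be checked against the raw counts with a short sketch. It assumes that SlnSES stands for S · ln(S / ES); the notes above do not state the formula, but the tabulated values are consistent with this reading (for example, S = 2264 and ES = 2066.82 give about 206.3). The function names are illustrative and are not taken from the article.

```python
import math

# Watson-Crick base pairing for DNA words over the alphabet A, C, G, T.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def sln_ses(s: float, es: float) -> float:
    """Assumed score: S * ln(S / ES), inferred from the tabulated values."""
    return s * math.log(s / es)


def reverse_complement(word: str) -> str:
    """Complement every base, then reverse the word (the RevComp column)."""
    return word.translate(_COMPLEMENT)[::-1]


def is_palindrome(word: str) -> bool:
    """A genomic palindrome equals its own reverse complement (the Pal column)."""
    return word == reverse_complement(word)


if __name__ == "__main__":
    # Values copied from the first table row (unmasked S and ES).
    print("TTTTTGTT", round(sln_ses(2264, 2066.82), 3), reverse_complement("TTTTTGTT"))
    # The only row marked "Yes" in the Pal column.
    print("AATATATT", is_palindrome("AATATATT"))
```

Because ES is printed to a limited number of decimals, recomputed scores can differ from the table in the last digits.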