Table 3 The top 25 words in 5'UTRs

From: The word landscape of the non-coding segments of the Arabidopsis thaliana genome

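Column groups, left to right: Word; Unmasked (S, ES, O, EO, SlnSES); Masked (S, ES, O, EO, SlnSES); Unmasked (RevComp, RC_Pos, Pal, PValues)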
Word S ES O EO SlnSES S ES O EO SlnSES RevComp RC_Pos Pal PValues
CTCTTCTC 871 614.433 992 668.648 303.928 883 669.295 972 729.203 244.68 GAGAAGAG 4 No -2.22E-16
CTTTCTCT 1154 1003.84 1293 1115.45 160.868 1204 1040.02 1327 1164.52 176.278 AGAGAAAG 15 No 1.14E-07
AACAAAAA 1051 920.535 1134 1018.31 139.302 1082 933.212 1157 1036.72 160.064 TTTTTGTT 16 No 0.000192
TTTCTTCA 611 492.734 631 532.75 131.443 808 714.439 849 780.981 99.4364 TGAAGAAA 227 No 1.88E-05
GAGAAGAG 316 211.511 360 225.309 126.863 305 219.262 327 231.047 100.664 CTCTTCTC 0 No 0
TTCTCTCC 455 346.314 464 371.543 124.193 504 412.082 517 440.518 101.482 GGAGAGAA 130 No 2.11E-06
CTTTCTTC 883 771.778 929 846.965 118.876 960 807.394 1006 888.66 166.197 GAAGAAAG 87 No 0.00285
CTCTCTTT 1229 1116.97 1351 1248.77 117.468 1284 1161.65 1410 1312.47 128.577 AAAGAGAG 9 No 0.002211
TTTCTCTC 1421 1308.64 1554 1478.35 117.051 1494 1385.35 1636 1591.45 112.808 GAGAGAAA 74 No 0.025997
AAAGAGAG 666 561.408 709 609.221 113.781 625 511.53 649 550.867 125.216 CTCTCTTT 7 No 4.30E-05
AGAAAAAA 1078 972.588 1154 1078.91 110.928 1097 983.999 1179 1097.24 119.255 TTTTTTCT 93 No 0.012195
AAAGAAAA 978 875.456 1093 966.097 108.328 1000 886.23 1111 981.116 120.779 TTTTCTTT 35 No 3.32E-05
ATCTCTCA 332 243.705 342 260.045 102.647 380 308.328 392 327.073 79.4223 TGAGAGAT 448 No 6.93E-07
AAAAAACA 759 663.266 803 723.672 102.333 774 675.404 814 736.19 105.466 TGTTTTTT 298 No 0.001952
TTTTTCTT 1020 923.944 1116 1022.27 100.884 1501 1398.57 1742 1608.22 106.097 AAGAAAAA 20 No 0.001995
AGAGAAAG 589 496.468 634 536.894 100.664 548 457.974 578 491.244 98.3457 CTTTCTCT 1 No 2.45E-05
TTTTTGTT 811 719.391 885 787.265 97.2085 1506 1441.03 1818 1662.31 66.4099 AACAAAAA 2 No 0.000332
ACAAAAAA 845 754.352 901 827.069 95.888 865 767.534 916 842.311 103.408 TTTTTTGT 37 No 0.005817
TAAAAAAG 231 152.899 238 162.371 95.3195 272 196.748 284 206.973 88.0952 CTTTTTTA 149 No 1.66E-08
CAAAAACC 357 273.395 362 292.183 95.2547 386 290.194 393 307.419 110.121 GGTTTTTG 59 No 4.45E-05
AAGAAAAA 1104 1013.1 1209 1126.3 94.8599 1134 1021.85 1230 1142.64 118.087 TTTTTCTT 14 No 0.007636
CCTCTCTT 351 268.225 358 286.579 94.4052 372 313.865 375 333.083 63.2147 AAGAGAGG 550 No 2.65E-05
TCTTCTCC 907 817.38 946 899.203 94.3624 899 804.147 934 884.875 100.239 GGAGAAGA 676 No 0.062179
TTCTCTCA 473 387.786 484 416.951 93.9572 538 481.457 555 517.331 59.7404 TGAGAGAA 126 No 0.000721
  1. Top 25 overrepresented words in the 5' untranslated regions (5'UTRs) of Arabidopsis thaliana. Word is the short nucleotide sequence of a putative word. S and ES are the number of sequences in which a word occurs and the number of sequences in which it was expected to occur, respectively; O and EO are the total number of occurrences and the expected total number of occurrences. The SlnSES score is a statistical measure of a word's coverage of the sequences in the set and is based on a Markov chain background model. Each set of attributes was computed for both the masked and the unmasked version of the segment, with emphasis on the unmasked version (the table is sorted by the unmasked SlnSES score).
  2. Further information on each word is given by its reverse complement (RevComp), the position of that reverse complement in the result set (RC_Pos), and a flag indicating whether the word is a genomic palindrome (Pal).
  3. Finally, PValues gives a p-value for each word, which helps to judge whether the word is relevant or was flagged as interesting by random chance.
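
The derived columns in the table can be checked with a short sketch. It assumes that SlnSES is computed as S * ln(S / ES); this reading is inferred from the column name and agrees with the tabulated values, but the formula is not stated in this excerpt. The function names are hypothetical, and the expected counts ES (which come from the Markov chain background model) are taken from the table rather than recomputed.

```python
import math

# Translation table mapping each DNA base to its complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def s_ln_s_es(s, es):
    """Assumed form of the SlnSES score: S * ln(S / ES)."""
    return s * math.log(s / es)


def reverse_complement(word):
    """Complement every base, then reverse the word (RevComp column)."""
    return word.translate(_COMPLEMENT)[::-1]


def is_genomic_palindrome(word):
    """A word is a genomic palindrome if it equals its reverse complement (Pal column)."""
    return word == reverse_complement(word)


# First table row: CTCTTCTC, unmasked S = 871 and ES = 614.433.
score = s_ln_s_es(871, 614.433)
print(f"SlnSES ~ {score:.3f}")             # table lists 303.928
print(reverse_complement("CTCTTCTC"))      # table lists GAGAAGAG
print(is_genomic_palindrome("CTCTTCTC"))   # table lists No
```

Applied to the first row, the sketch reproduces the listed score to within rounding, together with the listed reverse complement and palindrome flag. The RC_Pos values in the table also appear to be counted from zero: the reverse complement of the first word, GAGAAGAG, sits in the fifth row and is listed at position 4.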