Table 4 The top 25 words in Introns

From: The word landscape of the non-coding segments of the Arabidopsis thaliana genome

Column groups: Unmasked (S, ES, O, EO, SlnSES) | Masked (S, ES, O, EO, SlnSES) | Unmasked (RevComp, RC_Pos, Pal, PValues)
Word S ES O EO SlnSES S ES O EO SlnSES RevComp RC_Pos Pal PValues
TTTTTGTT 10048 9365.74 11094 10679.8 706.524 9819 9103.26 10783 10355.3 743.17 TTTTTGTT 10048 9365.74 3.44E-05
TTTTTCTT 9144 8495.68 10021 9609.91 672.454 8939 8293.57 9751 9363.74 669.915 TTTTTCTT 9144 8495.68 1.58E-05
CTTTTTTC 2764 2170.42 2821 2314.32 668.224 2713 2187.97 2767 2333.43 583.515 CTTTTTTC 2764 2170.42 8.88E-16
GTTTTTGA 2673 2105.13 2742 2243.33 638.372 2631 2056.65 2696 2190.66 647.973 GTTTTTGA 2673 2105.13 -2.22E-16
TTTTGCAG 3505 2959.4 3523 3179.19 593.06 3452 2920.63 3470 3136.4 577.016 TTTTGCAG 3505 2959.4 1.07E-09
TTTTTTGT 7618 7067.97 8198 7889.79 570.901 7400 6823.86 7922 7600.06 599.8 TTTTTTGT 7618 7067.97 0.000286
TTTTTTGG 3765 3238.3 3942 3487.94 567.378 3635 3124.76 3795 3362.05 549.804 TTTTTTGG 3765 3238.3 2.62E-14
TTTTCTTT 9256 8733.23 10299 9900.39 538.109 9041 8500.1 9994 9615.3 557.761 TTTTCTTT 9256 8733.23 3.48E-05
TGTTTTTT 7487 6984.58 8028 7790.67 520.072 7254 6759.65 7750 7524.05 512 TGTTTTTT 7487 6984.58 0.003768
CTCTCTTT 3193 2716.79 3289 2911.9 515.697 3086 2625.01 3165 2811.09 499.291 CTCTCTTT 3193 2716.79 3.97E-12
ATTTTTTA 2508 2044.78 2645 2177.76 512.128 2383 2003.78 2486 2133.28 413.027 ATTTTTTA 2508 2044.78 3.33E-16
TTTTTTCC 3166 2702.47 3253 2896.16 501.186 3086 2616.31 3161 2801.55 509.528 TTTTTTCC 3166 2702.47 4.13E-11
TGTTTCAG 2215 1790.21 2239 1902.05 471.614 2153 1745.3 2177 1853.55 451.987 TGTTTCAG 2215 1790.21 3.01E-14
GGTTTTTG 2029 1611.17 2092 1708.92 467.851 1997 1584.97 2058 1680.71 461.47 GGTTTTTG 2029 1611.17 1.11E-16
TTTTGTTT 12142 11689.3 13879 13619.2 461.327 11843 11368.1 13438 13205.7 484.659 TTTTGTTT 12142 11689.3 0.013306
TTTGTTTT 11017 10569.9 12527 12188.1 456.39 10729 10259.7 12106 11796.5 479.827 TTTGTTTT 11017 10569.9 0.00113
CTTTTTTA 2234 1828.76 2282 1943.72 447.149 2178 1816.31 2220 1930.26 395.524 CTTTTTTA 2234 1828.76 4.17E-14
AATATATT 2022 1642.55 2143 1742.72 420.253 1925 1679.14 2019 1782.16 263.038 AATATATT 2022 1642.55 4.44E-16
ATTTTTCA 2411 2030.35 2467 2162.1 414.291 2349 1971.89 2398 2098.68 411.073 ATTTTTCA 2411 2030.35 7.51E-11
ATTTTTTC 2810 2425.9 2881 2592.99 413.021 2736 2412.96 2800 2578.85 343.758 ATTTTTTC 2810 2425.9 1.43E-08
CAATTTTT 2402 2023.84 2481 2155.04 411.472 2320 1952.98 2388 2078.19 399.534 CAATTTTT 2402 2023.84 3.73E-12
TTTTTTCT 7674 7280.17 8254 8142.69 404.295 7476 7074.7 8001 7897.8 412.475 TTTTTTCT 7674 7280.17 0.109849
TGTTGCAG 1922 1563.72 1933 1657.84 396.507 1891 1543.21 1902 1635.78 384.332 TGTTGCAG 1922 1563.72 2.42E-11
TTTCATTT 4636 4258.39 4840 4630.74 393.879 4538 4169.05 4731 4529.8 384.813 TTTCATTT 4636 4258.39 0.001152
TTTTTATT 5647 5276.08 6142 5792.21 383.658 5417 5037.47 5842 5517.96 393.481 TTTTTATT 5647 5276.08 2.72E-06
  1. The top 25 overrepresented words in the introns of Arabidopsis thaliana. Word is the short nucleotide sequence of a putative word. S and ES are the number of sequences in which a word occurs and the number of sequences in which it was expected to occur, respectively; O and EO are the total number of occurrences and the expected total number of occurrences. The SlnSES score is a statistical measure of a word's coverage of the sequences in the set and is based on a Markov chain background model. Each set of attributes was computed for both the masked and the unmasked version of the segment. Emphasis is placed on the unmasked version: the table is sorted by the unmasked SlnSES score.
  2. Further information about each word is given by its reverse complement (RevComp), the position of the reverse complement in the result set (RC_Pos), and a flag indicating whether the word is a genomic palindrome (Pal).
  3. Finally, PValues gives the p-value assigned to each word, which helps determine whether the word is statistically relevant or was flagged as interesting by random chance.
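
The attributes described in the notes above can be illustrated with a short Python sketch. It rests on two assumptions that the notes do not state explicitly. First, the score is taken to be S · ln(S / ES), which is what the column name SlnSES suggests and which reproduces the tabulated unmasked scores to within the rounding of ES. Second, a genomic palindrome is taken to be a word equal to its own reverse complement, the usual meaning of the term. The function names `sln_ses`, `reverse_complement`, and `is_palindrome` are hypothetical and chosen only for this example.

```python
import math


def sln_ses(s: int, es: float) -> float:
    """Assumed score: S * ln(S / ES), observed vs. expected sequence counts."""
    return s * math.log(s / es)


# Translation table mapping each base to its Watson-Crick complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word: str) -> str:
    """Complement every base, then reverse the string."""
    return word.translate(_COMPLEMENT)[::-1]


def is_palindrome(word: str) -> bool:
    """Assumed definition: a word that equals its own reverse complement."""
    return word == reverse_complement(word)


# Unmasked S and ES values copied from three rows of the table.
rows = {
    "TTTTTGTT": (10048, 9365.74),
    "CTTTTTTC": (2764, 2170.42),
    "AATATATT": (2022, 1642.55),
}

for word, (s, es) in rows.items():
    print(word, round(sln_ses(s, es), 2), reverse_complement(word), is_palindrome(word))
```

Under these assumptions, the computed scores can be compared directly with the unmasked SlnSES column. Of the three words, only AATATATT is its own reverse complement.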