Table 10 Words not detected in the 5'UTRs

From: The word landscape of the non-coding segments of the Arabidopsis thaliana genome

#WORD E_S E
GGAACTGC 5.1333 5.40909
GAGGACCC 5.02658 5.29661
GCCCTATA 5.015 5.2844
CCGTACCT 4.98236 5.25
GCGAGTAT 4.94491 5.21053
TATCGCAC 4.83088 5.09034
GGTTGCGG 4.69443 4.94652
GCGGAGTG 4.66421 4.91468
AGTACAGC 4.51745 4.76
GTGCCGAT 4.4368 4.675
GTCCTGGG 4.41572 4.65278
CGGCCGTG 4.3768 4.61176
GGTCGGGG 4.16843 4.39216
GTGCTGGG 4.13122 4.35294
TAGTGCAC 4.12843 4.35
TACCGGCC 4.08277 4.30189
GCCTACGC 4.03144 4.24779
CACCGCGG 3.94494 4.15663
GCGGCGTG 3.90217 4.11155
CGCCTTAG 3.77819 3.98089
CAGCCCAG 3.74709 3.94811
TGAACGGG 3.74703 3.94805
CGTACTGC 3.74638 3.94737
GTGCGCCG 3.68013 3.87755
AGTCCTGG 3.67692 3.87417
Top 25 words that were expected to occur in the 5'UTRs but are absent from the sequences. Each word is identified by its nucleotide sequence and is listed with the expected number of sequences in which it was computed to occur (E_S) and the expected total number of occurrences in the set of sequences (E). The words are sorted by their expected sequence occurrence, in descending order.
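
As a minimal sketch of how the table can be read programmatically, the Python snippet below parses the whitespace-separated rows and checks two properties that the listed values satisfy: the rows are ordered by E_S in descending order, and E is never smaller than E_S. The function names `parse_rows` and `is_sorted_by_e_s` are hypothetical helpers introduced only for illustration, and only the first three rows of the table are embedded to keep the example short.

```python
# First three rows copied verbatim from the table (WORD, E_S, E).
TABLE = """\
#WORD E_S E
GGAACTGC 5.1333 5.40909
GAGGACCC 5.02658 5.29661
GCCCTATA 5.015 5.2844
"""


def parse_rows(text):
    """Return a list of (word, E_S, E) tuples, skipping the '#' header line."""
    rows = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        word, e_s, e = line.split()
        rows.append((word, float(e_s), float(e)))
    return rows


def is_sorted_by_e_s(rows):
    """True if E_S never increases from one row to the next."""
    return all(a[1] >= b[1] for a, b in zip(rows, rows[1:]))


rows = parse_rows(TABLE)

# Sanity checks: ordering by E_S, and E >= E_S in every row, since a word
# cannot occur in more sequences than it has occurrences overall.
sorted_ok = is_sorted_by_e_s(rows)
bounds_ok = all(e >= e_s for _, e_s, e in rows)
print(sorted_ok, bounds_ok)
```

The same parser applies to the full table if all 25 rows are pasted into `TABLE`.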