Table 10 Words not detected in the 5'UTRs

From: The word landscape of the non-coding segments of the Arabidopsis thaliana genome

| Word | E_S | E |
| --- | --- | --- |
| GGAACTGC | 5.1333 | 5.40909 |
| GAGGACCC | 5.02658 | 5.29661 |
| GCCCTATA | 5.015 | 5.2844 |
| CCGTACCT | 4.98236 | 5.25 |
| GCGAGTAT | 4.94491 | 5.21053 |
| TATCGCAC | 4.83088 | 5.09034 |
| GGTTGCGG | 4.69443 | 4.94652 |
| GCGGAGTG | 4.66421 | 4.91468 |
| AGTACAGC | 4.51745 | 4.76 |
| GTGCCGAT | 4.4368 | 4.675 |
| GTCCTGGG | 4.41572 | 4.65278 |
| CGGCCGTG | 4.3768 | 4.61176 |
| GGTCGGGG | 4.16843 | 4.39216 |
| GTGCTGGG | 4.13122 | 4.35294 |
| TAGTGCAC | 4.12843 | 4.35 |
| TACCGGCC | 4.08277 | 4.30189 |
| GCCTACGC | 4.03144 | 4.24779 |
| CACCGCGG | 3.94494 | 4.15663 |
| GCGGCGTG | 3.90217 | 4.11155 |
| CGCCTTAG | 3.77819 | 3.98089 |
| CAGCCCAG | 3.74709 | 3.94811 |
| TGAACGGG | 3.74703 | 3.94805 |
| CGTACTGC | 3.74638 | 3.94737 |
| GTGCGCCG | 3.68013 | 3.87755 |
| AGTCCTGG | 3.67692 | 3.87417 |

  1. Top 25 words that were expected to occur in the 5'UTRs but do not appear in any of the sequences. Each word is identified by its nucleotide sequence and is listed with the expected number of sequences in which it occurs (E_S) and the expected total number of occurrences across the set of sequences (E). The words are sorted by E_S in descending order.
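The selection described in the caption (keep words with no observed occurrence, then sort by E_S) can be sketched as follows. This is a minimal illustration, not the authors' pipeline. The function name `absent_words` and the toy sequences are hypothetical. The E_S and E values are copied from the table rather than computed, because the caption does not say how they were derived. The sketch also assumes that only the forward strand is searched, which the caption does not confirm.

```python
from typing import Iterable, List, Tuple

# One table row: (word, E_S, E). The expected values are inputs here, not computed.
Row = Tuple[str, float, float]


def absent_words(rows: Iterable[Row], sequences: Iterable[str]) -> List[Row]:
    """Return rows whose word occurs in none of the sequences,
    sorted by E_S in descending order (hypothetical helper)."""
    seqs = [s.upper() for s in sequences]
    # Assumption: forward-strand substring match only.
    missing = [row for row in rows if not any(row[0] in s for s in seqs)]
    return sorted(missing, key=lambda row: row[1], reverse=True)


# First three rows of the table, deliberately out of order.
rows = [
    ("GAGGACCC", 5.02658, 5.29661),
    ("GGAACTGC", 5.1333, 5.40909),
    ("GCCCTATA", 5.015, 5.2844),
]

# Hypothetical toy sequences, not real 5'UTRs; the first one contains GCCCTATA.
toy_utrs = ["ATGCCCTATAGG", "ttgacgtacgt"]

result = absent_words(rows, toy_utrs)
for word, e_s, e in result:
    print(word, e_s, e)
```

With these toy inputs, GCCCTATA is dropped because it occurs in the first sequence, and the remaining two words come back ordered by E_S.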