Table 9 Words not detected in the 3'UTRs

From: The word landscape of the non-coding segments of the Arabidopsis thaliana genome

Word       E_S       E
CTAGCAGG   5.98269   6.17391
ACTGCCAG   4.99319   5.1526
CGCCTGAT   4.97776   5.13667
GCGTCCGA   4.52742   4.67187
GGGGTGGC   4.5248    4.66917
ACTCCGCC   4.38831   4.5283
CCCGTTCC   4.25101   4.3866
ACACGCCG   4.21714   4.35165
CCCGCTCA   4.193     4.32673
CTGGGCGT   4.06873   4.19847
GACCTGCG   3.71851   3.83704
GCGCAGTA   3.68699   3.80451
GCACCCGA   3.6084    3.7234
GCACCCTC   3.59671   3.71134
CGCACCCA   3.54333   3.65625
CCGCCGTC   3.53385   3.64646
GGGTCGGC   3.52406   3.63636
GCACGCCT   3.35465   3.46154
GCGCAGCC   3.31181   3.41732
CGTCCGCT   3.28252   3.3871
CTGGCGCC   3.2624    3.36634
GGCGACCT   3.25626   3.36
ATACGCCC   3.18816   3.28972
AGCGCTCC   2.98494   3.08
TAGCGCGG   2.98494   3.08

  1. The top 25 words that were expected to occur in the 3'UTRs but were not found in any of the sequences. Each word is identified by its nucleotide sequence and is listed with the expected number of sequences in which it occurs (E_S) and the expected total number of occurrences in the set of sequences (E). The words are sorted by E_S in descending order.
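
The distinction between E_S (sequences expected to contain a word) and E (expected total occurrences) can be illustrated with a small Python sketch. This excerpt does not state how the expected values were computed, so the background model below is an assumption made only for illustration: bases are treated as independent draws from the pooled base composition, and word positions within a sequence are treated as independent. The function names `observed_counts` and `expected_counts` are hypothetical.

```python
from collections import Counter


def observed_counts(word, sequences):
    """Return (sequences containing word, total overlapping occurrences)."""
    seq_hits = 0
    total = 0
    for seq in sequences:
        # Count overlapping matches by testing every start position.
        n = sum(1 for i in range(len(seq) - len(word) + 1)
                if seq.startswith(word, i))
        total += n
        if n > 0:
            seq_hits += 1
    return seq_hits, total


def expected_counts(word, sequences):
    """Return (E_S, E) under an assumed i.i.d. base-composition model.

    Assumption: this is NOT necessarily the model used for the table;
    it only shows how the two quantities differ.
    """
    base_counts = Counter("".join(sequences))
    n_bases = sum(base_counts.values())
    if n_bases == 0:
        return 0.0, 0.0
    # Probability that the word starts at any given position.
    p_word = 1.0
    for base in word:
        p_word *= base_counts[base] / n_bases
    e_seqs = 0.0
    e_total = 0.0
    for seq in sequences:
        positions = max(len(seq) - len(word) + 1, 0)
        # E: sum of per-position probabilities.
        e_total += positions * p_word
        # E_S: probability of at least one hit, treating positions
        # as independent (an approximation).
        e_seqs += 1.0 - (1.0 - p_word) ** positions
    return e_seqs, e_total


if __name__ == "__main__":
    toy_utrs = ["ACGTACGT", "AAAA", "ACG"]  # toy data, not real 3'UTRs
    print(observed_counts("ACG", toy_utrs))
    print(expected_counts("ACG", toy_utrs))
```

Under this simplified model E_S can never exceed E, which matches the pattern in the table, where every E_S value is slightly smaller than the corresponding E. A word belongs in a table like this one when its observed counts are both zero while its expected values are well above zero.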