Table 12 Words not detected in the Core Promoters

From: The word landscape of the non-coding segments of the Arabidopsis thaliana genome

| Word | E_S | E |
|---|---|---|
| CGCACACC | 5.86109 | 6.3029 |
| GTCCGAAC | 5.46787 | 5.88 |
| GCCCTATG | 5.23895 | 5.6338 |
| GGACGTCG | 4.98873 | 5.36471 |
| GGCCCTAG | 4.47129 | 4.80822 |
| CGCGAGCG | 4.35999 | 4.68852 |
| GATCCCCC | 3.92081 | 4.21622 |
| GGCCGCAT | 3.82028 | 4.10811 |
| TACCCAGG | 3.80429 | 4.09091 |
| GGCCCCTG | 3.67267 | 3.94937 |
| CGCATCCG | 3.66922 | 3.94565 |
| CACGCCGA | 3.56933 | 3.83824 |
| CCGGCCGC | 3.51312 | 3.77778 |
| CGCGGTCA | 3.51079 | 3.77528 |
| AGGGCCCT | 3.50922 | 3.77358 |
| GGCGCTGT | 3.49296 | 3.7561 |
| ACGCCCTG | 3.45587 | 3.71622 |
| GCGGACAC | 3.30648 | 3.55556 |
| AGTGGCGC | 3.29952 | 3.54808 |
| GGGCGTTC | 3.26995 | 3.51628 |
| CGCGCAAG | 3.25481 | 3.5 |
| ACCCGCGT | 3.22635 | 3.46939 |
| TTACCCCG | 3.22482 | 3.46774 |
| CCGGTGCG | 3.18249 | 3.42222 |
| TAGGGCCG | 3.18249 | 3.42222 |

Top 25 words that were expected to occur in the core promoters but are absent from the sequences. Each word is identified by its nucleotide sequence and is listed with the expected number of sequences in which it occurs (E_S) and the expected total number of occurrences across the set of sequences (E). The words are sorted by E_S in descending order.
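The difference between a per-sequence count and a total count can be illustrated with a minimal Python sketch. It computes the observed counterparts of the two quantities for a single word; it does not reproduce the model used to derive E_S and E, which this table does not describe. The function name `count_word`, the toy sequences, and the choices to count overlapping matches and to scan only the given strand are assumptions made for illustration.

```python
def count_word(word, sequences):
    """Return (sequence_count, total_count) for `word` in `sequences`.

    sequence_count: number of sequences containing the word at least once
                    (the observed counterpart of E_S).
    total_count:    number of occurrences summed over all sequences
                    (the observed counterpart of E).
    Assumption: overlapping matches are counted, forward strand only.
    """
    sequence_count = 0
    total_count = 0
    for seq in sequences:
        hits = 0
        start = seq.find(word)
        while start != -1:
            hits += 1
            # Advance by one position so overlapping matches are found.
            start = seq.find(word, start + 1)
        if hits > 0:
            sequence_count += 1
        total_count += hits
    return sequence_count, total_count


# Hypothetical toy sequences, not taken from the Arabidopsis data set.
toy_promoters = [
    "AACGCACACCTTCGCACACC",  # contains the word twice
    "TTTTCGCACACCAAAA",      # contains the word once
    "ATATATATATAT",          # does not contain the word
]

print(count_word("CGCACACC", toy_promoters))  # (2, 3)
```

For every word in the table the observed value of both counts in the core promoters is zero, while the expected values are well above zero.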