
Table 11. Words not detected in the introns

From: The word landscape of the non-coding segments of the Arabidopsis thaliana genome

| Word | E_S | E |
|---|---|---|
| CGCGGACA | 6.1805 | 6.4557 |
| CCCGGGAG | 4.57278 | 4.77632 |
| CCGGCCCC | 4.46781 | 4.66667 |
| CGCCCCCC | 4.45254 | 4.65072 |
| GCCCACCG | 4.16782 | 4.35331 |
| GCCGCGGG | 3.47686 | 3.63158 |
| CCGAGGGG | 3.34433 | 3.49315 |
| AAGCGCCC | 3.17737 | 3.31875 |
| CGCCAGCG | 2.99188 | 3.125 |
| CGCTCGCG | 2.91507 | 3.04478 |
| GCGTCGCG | 2.8245 | 2.95017 |
| CCGGCACG | 2.48216 | 2.59259 |
| CCGGGGCG | 2.25483 | 2.35514 |
| CCCGCGCC | 2.16189 | 2.25806 |
| TCGGGCGC | 2.11021 | 2.20408 |
| GCGCACGG | 2.02051 | 2.11039 |
| CGCTCCGC | 2.00514 | 2.09434 |
| CGCGACGC | 1.99945 | 2.0884 |
| TGCGCCCG | 1.9539 | 2.04082 |
| GGTGCGCG | 1.92911 | 2.01493 |
| GCGGGCCC | 1.90464 | 1.98936 |
| CGCGGCGA | 1.86163 | 1.94444 |
| GCGCGACG | 1.83299 | 1.91453 |
| GGGCGGGC | 1.79662 | 1.87654 |
| CCGCCGGG | 1.73887 | 1.81622 |

  1. The top 25 words that were expected to occur in the introns but do not appear in any of the intron sequences. Each word is identified by its nucleotide sequence and is listed with the expected number of sequences in which it was computed to occur (E_S) and the expected total number of occurrences across the set of sequences (E). The words are sorted by their expected sequence occurrence (E_S), in descending order.
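
To make the two quantities in the caption concrete, the following is a minimal Python sketch of how absent words might be ranked. It assumes a simple independent-nucleotide background model estimated from pooled base frequencies, which is not necessarily the model used to produce the values in the table. The function name `absent_words` and its parameters are hypothetical and introduced only for illustration.

```python
from collections import Counter
from itertools import product
from math import prod


def absent_words(seqs, k):
    """Return (word, E_S, E) for every k-mer that never occurs in seqs,
    sorted by E_S in descending order.

    ASSUMPTION: bases are independent and identically distributed, with
    frequencies pooled over all sequences. Overlaps between occurrences
    are ignored when approximating E_S.
    """
    # Pooled base frequencies over A, C, G, T.
    counts = Counter(base for s in seqs for base in s)
    total = sum(counts[b] for b in "ACGT")
    freq = {b: counts[b] / total for b in "ACGT"}

    # Every k-mer that actually occurs somewhere in the input.
    observed = {s[i:i + k] for s in seqs for i in range(len(s) - k + 1)}

    # Number of start positions for a k-mer in each sequence.
    positions = [max(len(s) - k + 1, 0) for s in seqs]

    rows = []
    for letters in product("ACGT", repeat=k):
        word = "".join(letters)
        if word in observed:
            continue
        p = prod(freq[b] for b in word)  # probability of the word at one position
        e_total = sum(n * p for n in positions)            # E: expected occurrences
        e_seqs = sum(1 - (1 - p) ** n for n in positions)  # E_S: expected sequences hit
        rows.append((word, e_seqs, e_total))

    rows.sort(key=lambda r: r[1], reverse=True)
    return rows


if __name__ == "__main__":
    toy = ["ACGTACGT", "AAAA"]  # toy input, not data from the paper
    for word, e_s, e in absent_words(toy, 2)[:5]:
        print(f"{word}\t{e_s:.5f}\t{e:.5f}")
```

Under this model E_S can never exceed E, because a word occurring several times in one sequence adds to E each time but to E_S only once; the table shows the same ordering between its two columns.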