Table 12 Words not detected in the Core Promoters

From: The word landscape of the non-coding segments of the Arabidopsis thaliana genome

#WORD E_S E
CGCACACC 5.86109 6.3029
GTCCGAAC 5.46787 5.88
GCCCTATG 5.23895 5.6338
GGACGTCG 4.98873 5.36471
GGCCCTAG 4.47129 4.80822
CGCGAGCG 4.35999 4.68852
GATCCCCC 3.92081 4.21622
GGCCGCAT 3.82028 4.10811
TACCCAGG 3.80429 4.09091
GGCCCCTG 3.67267 3.94937
CGCATCCG 3.66922 3.94565
CACGCCGA 3.56933 3.83824
CCGGCCGC 3.51312 3.77778
CGCGGTCA 3.51079 3.77528
AGGGCCCT 3.50922 3.77358
GGCGCTGT 3.49296 3.7561
ACGCCCTG 3.45587 3.71622
GCGGACAC 3.30648 3.55556
AGTGGCGC 3.29952 3.54808
GGGCGTTC 3.26995 3.51628
CGCGCAAG 3.25481 3.5
ACCCGCGT 3.22635 3.46939
TTACCCCG 3.22482 3.46774
CCGGTGCG 3.18249 3.42222
TAGGGCCG 3.18249 3.42222
  1. Top 25 words that were expected to occur in the core promoters but were not detected in any of the sequences. Each word is identified by its nucleotide sequence and is listed with the expected number of sequences in which it occurs (E_S) and the expected total number of occurrences in the set of sequences (E). The words are sorted in descending order of their expected sequence occurrence (E_S).
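
As a minimal sketch of how a table in this layout could be read and sanity-checked, the Python snippet below parses the whitespace-separated rows and confirms the ordering by E_S described in the caption. The names `WordRow`, `parse_table`, and `is_sorted_by_e_s` are hypothetical helpers introduced for illustration; they are not part of the original study's tooling, and only the first three rows of the table are used as sample input.

```python
from typing import NamedTuple


class WordRow(NamedTuple):
    word: str   # 8-nucleotide word
    e_s: float  # expected number of sequences containing the word (E_S)
    e: float    # expected total number of occurrences (E)


def parse_table(text: str) -> list[WordRow]:
    """Parse rows of the form 'WORD E_S E'; skip blank and '#' header lines."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        word, e_s, e = line.split()
        rows.append(WordRow(word, float(e_s), float(e)))
    return rows


def is_sorted_by_e_s(rows: list[WordRow]) -> bool:
    """Return True if rows are in non-increasing order of E_S."""
    return all(a.e_s >= b.e_s for a, b in zip(rows, rows[1:]))


# First three rows of the table, copied verbatim as sample input.
sample = """#WORD E_S E
CGCACACC 5.86109 6.3029
GTCCGAAC 5.46787 5.88
GCCCTATG 5.23895 5.6338
"""

rows = parse_table(sample)
print(is_sorted_by_e_s(rows))
# A word cannot occur in more sequences than it has occurrences,
# so E should be at least E_S in every row.
print(all(r.e >= r.e_s for r in rows))
```

The second check reflects the definitions in the caption: a word's total number of occurrences can never be smaller than the number of sequences containing it, so the same should hold for the expected values.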