
Table 16 Co-occurrence in Core Promoters

From: The word landscape of the non-coding segments of the Arabidopsis thaliana genome

| Word1 | Word2 | S | ES | S*ln(S/ES) |
|---|---|---|---|---|
| GCCCAATA | GCCCATTA | 32 | 2.3492 | 83.5729 |
| TTTTTTCT | TTTTTCTT | 68 | 22.9531 | 73.8516 |
| AATAAAAA | AAGAAAAA | 84 | 41.5798 | 59.069 |
| CTCTCTTT | CTTTCTCT | 40 | 9.1626 | 58.95 |
| AATAAAAA | ATTAAAAA | 57 | 22.4453 | 53.1222 |
| ACAAAAAA | AAGAAAAA | 71 | 35.1265 | 49.9645 |
| ACAAAAAA | AGAAAAAA | 66 | 31.1075 | 49.6455 |
| ATTTCTCA | TATAAATA | 30 | 6.1031 | 47.772 |
| AATAAAAA | TAAAAAAT | 38 | 10.8748 | 47.5432 |
| AAAAAACA | ACAAAAAA | 56 | 24.4921 | 46.3121 |
| AAAAATAT | AAAAAACA | 44 | 15.5191 | 45.8533 |
| AACAAAAA | AAGAAAAA | 77 | 42.5433 | 45.6828 |
| AACAAAAA | AGAAAAAA | 69 | 37.6758 | 41.7512 |
| TTTCTTTT | TTTTTTGT | 40 | 14.2927 | 41.1653 |
| AAAAAACA | ATATAAAG | 30 | 7.659 | 40.9596 |
| AAAAAACA | CTATATAA | 36 | 11.9538 | 39.689 |
| AAAAATAT | CTATATAA | 30 | 8.0863 | 39.3309 |
| TATATAAA | TAAAAAAT | 36 | 12.3623 | 38.4793 |
| AATAAAAA | TTAAAAAA | 53 | 25.8324 | 38.0892 |
| TTTTATTT | TTTTTTAA | 38 | 14.0039 | 37.9336 |
| TTTTATTT | TTTTTCTT | 50 | 23.5743 | 37.5932 |
| TTCTTTTT | TTTTTCTT | 46 | 20.3942 | 37.416 |
| AAATTAAA | ACAAAAAA | 44 | 18.9721 | 37.0137 |
| AATAAAAA | AGAAAAAA | 65 | 36.8225 | 36.938 |
| TTTCTTTT | TTTTTGTT | 41 | 16.8429 | 36.4755 |

  1. Overrepresented non-overlapping word-pairs detected in the core promoters of Arabidopsis thaliana. Each word-pair is described by its two nucleotide sequences (Word1 and Word2), the number of sequences in which the pair occurs (S), the expected number of such sequences (ES), and a statistical score quantifying the overrepresentation of the word-pair in this sequence set (S*ln(S/ES)).
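The score in the last column depends only on S and ES, so it can be recomputed directly from the table. The minimal Python sketch below assumes S and ES are used exactly as printed (how ES itself was estimated is not described in this table). It recomputes S*ln(S/ES) for a few rows and sorts them by descending score. The score is positive when a pair occurs in more sequences than expected (S > ES) and zero when S equals ES. The helper names `overrep_score` and `rank_pairs` are hypothetical, chosen for illustration only.

```python
import math

# A few (Word1, Word2, S, ES) rows copied from Table 16.
ROWS = [
    ("TTTCTTTT", "TTTTTGTT", 41, 16.8429),
    ("GCCCAATA", "GCCCATTA", 32, 2.3492),
    ("AATAAAAA", "AAGAAAAA", 84, 41.5798),
    ("TTTTTTCT", "TTTTTCTT", 68, 22.9531),
]


def overrep_score(s: int, es: float) -> float:
    """Hypothetical helper: return S*ln(S/ES) for observed S and expected ES."""
    if es <= 0 or s < 0:
        raise ValueError("ES must be positive and S non-negative")
    if s == 0:
        return 0.0  # limit of S*ln(S/ES) as S -> 0
    return s * math.log(s / es)


def rank_pairs(rows):
    """Hypothetical helper: sort (word1, word2, S, ES) rows by descending score."""
    scored = [(w1, w2, s, es, overrep_score(s, es)) for w1, w2, s, es in rows]
    return sorted(scored, key=lambda r: r[4], reverse=True)


if __name__ == "__main__":
    for w1, w2, s, es, score in rank_pairs(ROWS):
        print(f"{w1}  {w2}  S={s:3d}  ES={es:8.4f}  score={score:8.4f}")
```

Because ES is printed to only four decimals, recomputed scores can differ from the table in the last digit or so, but the resulting order matches the table's order.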