The word landscape of the non-coding segments of the Arabidopsis thaliana genome

BMC Genomics

Table 16 Co-occurrence in Core Promoters

Word1	Word2	S	E_S	S*ln(S/E_S)
GCCCAATA	GCCCATTA	32	2.3492	83.5729
TTTTTTCT	TTTTTCTT	68	22.9531	73.8516
AATAAAAA	AAGAAAAA	84	41.5798	59.069
CTCTCTTT	CTTTCTCT	40	9.1626	58.95
AATAAAAA	ATTAAAAA	57	22.4453	53.1222
ACAAAAAA	AAGAAAAA	71	35.1265	49.9645
ACAAAAAA	AGAAAAAA	66	31.1075	49.6455
ATTTCTCA	TATAAATA	30	6.1031	47.772
AATAAAAA	TAAAAAAT	38	10.8748	47.5432
AAAAAACA	ACAAAAAA	56	24.4921	46.3121
AAAAATAT	AAAAAACA	44	15.5191	45.8533
AACAAAAA	AAGAAAAA	77	42.5433	45.6828
AACAAAAA	AGAAAAAA	69	37.6758	41.7512
TTTCTTTT	TTTTTTGT	40	14.2927	41.1653
AAAAAACA	ATATAAAG	30	7.659	40.9596
AAAAAACA	CTATATAA	36	11.9538	39.689
AAAAATAT	CTATATAA	30	8.0863	39.3309
TATATAAA	TAAAAAAT	36	12.3623	38.4793
AATAAAAA	TTAAAAAA	53	25.8324	38.0892
TTTTATTT	TTTTTTAA	38	14.0039	37.9336
TTTTATTT	TTTTTCTT	50	23.5743	37.5932
TTCTTTTT	TTTTTCTT	46	20.3942	37.416
AAATTAAA	ACAAAAAA	44	18.9721	37.0137
AATAAAAA	AGAAAAAA	65	36.8225	36.938
TTTCTTTT	TTTTTGTT	41	16.8429	36.4755

Overrepresented non-overlapping word-pairs detected in the core promoters of Arabidopsis thaliana. Each word-pair is described by its two nucleotide sequences (Word1 and Word2), the number of sequences in which the pair occurs (S), the expected number of such sequences (E_S), and a statistical score, S*ln(S/E_S), that measures how strongly the word-pair is overrepresented in the sequence set.
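
The score column can be recomputed from S and E_S. The following is a minimal sketch that assumes the score is exactly S*ln(S/E_S) with the natural logarithm, as the column header states. The function name `cooccurrence_score` and the `rows` list are illustrative and do not come from the original study.

```python
import math


def cooccurrence_score(s, e_s):
    """Return S * ln(S / E_S) for an observed count S and an expected count E_S."""
    if s <= 0 or e_s <= 0:
        raise ValueError("S and E_S must both be positive")
    return s * math.log(s / e_s)


# First three rows of the table: (Word1, Word2, S, E_S)
rows = [
    ("GCCCAATA", "GCCCATTA", 32, 2.3492),
    ("TTTTTTCT", "TTTTTCTT", 68, 22.9531),
    ("AATAAAAA", "AAGAAAAA", 84, 41.5798),
]

for word1, word2, s, e_s in rows:
    # Print each pair together with its recomputed score
    print(word1, word2, s, e_s, round(cooccurrence_score(s, e_s), 4))
```

Because E_S is tabulated to four decimal places, the recomputed scores may differ from the printed ones in the last digits.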

ISSN: 1471-2164