The word landscape of the non-coding segments of the Arabidopsis thaliana genome

BMC Genomics

Table 17 Co-occurrence in Proximal Promoters

Word1	Word2	S	E_S	S*ln(S/E_S)
AAATTTTA	TAAAAAAT	996	489.8445	706.8206
ATTTTTTA	TAAAAAAT	869	395.77	683.4771
TAAATTTT	TAAAAAAT	970	501.8706	639.1852
AAAAATTA	TAAAAAAT	1040	565.2386	634.1171
TAAAATTT	TAAAAAAT	963	498.7952	633.5171
TAAAATTT	ATTTTTTA	892	458.4645	593.7003
AAATTTTA	ATTTTTTA	868	450.2375	569.7695
AAAAATTA	ATTTTTTA	947	519.5356	568.5445
AAAATTTA	TAAAAAAT	919	496.1801	566.4231
TAATTTTT	TAAAAAAT	965	539.2575	561.5671
AAAATTTA	ATTTTTTA	865	456.0608	553.6894
TAATTTTT	ATTTTTTA	907	495.6552	548.0656
AATATATT	TAAAAAAT	776	391.8276	530.2646
AAAATTTA	AAATTTTA	973	564.4665	529.8015
AAATTTTA	TAAAATTT	976	567.4415	529.3092
AAAAATTA	TAATTTTT	1125	707.8947	521.1483
AATATATT	ATTTTTTA	730	360.1459	515.7708
TAAATTTT	ATTTTTTA	845	461.2912	511.4845
AAAAATTA	TAAAATTT	1052	654.7789	498.8066
AAAATTTA	AAAAATTA	1044	651.346	492.5318
AAAATTTA	TAAAATTT	958	574.7807	489.4031
AAATTTTA	TAATTTTT	993	613.4724	478.2242
TAATTTTT	TAAAATTT	995	624.6821	463.1724
AAAATTTA	TAATTTTT	990	621.407	461.0615
TTATATAA	TAAAAAAT	645	316.3233	459.5531

Overrepresented non-overlapping word pairs detected in the proximal promoters of Arabidopsis thaliana. Each word pair is characterized by its two nucleotide sequences (Word1 and Word2), the number of sequences in which the pair occurs (S), the expected number of such sequences (E_S), and a statistical score, S*ln(S/E_S), that quantifies the overrepresentation of the word pair in this sequence set.
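
The score column can be recomputed from S and E_S. The following is a minimal sketch, assuming the score is exactly the expression in the column header with ln as the natural logarithm. The function names pair_score and parse_row are hypothetical and are not taken from the article; how E_S itself is estimated is not covered here.

```python
import math


def pair_score(s, e_s):
    """Return S * ln(S / E_S), the score named in the table header.

    Assumes ln is the natural logarithm and that both counts are positive.
    """
    if s <= 0 or e_s <= 0:
        raise ValueError("S and E_S must be positive")
    return s * math.log(s / e_s)


def parse_row(line):
    """Split one tab-separated table row into typed fields (hypothetical helper)."""
    word1, word2, s, e_s, score = line.rstrip("\n").split("\t")
    return word1, word2, int(s), float(e_s), float(score)


# First data row of the table, copied verbatim.
row = "AAATTTTA\tTAAAAAAT\t996\t489.8445\t706.8206"
word1, word2, s, e_s, reported = parse_row(row)
recomputed = pair_score(s, e_s)

# Compare the recomputed score with the reported one.
print(word1, word2, round(recomputed, 4), reported)
```

Because S exceeds E_S in every row, the ratio S/E_S is above 1 and the score is positive; a pair observed less often than expected would receive a negative score under the same formula.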

ISSN: 1471-2164