The word landscape of the non-coding segments of the Arabidopsis thaliana genome

Lichtenberg, Jens; Yilmaz, Alper; Welch, Joshua D; Kurz, Kyle; Liang, Xiaoyu; Drews, Frank; Ecker, Klaus; Lee, Stephen S; Geisler, Matt; Grotewold, Erich; Welch, Lonnie R

doi:10.1186/1471-2164-10-463

BMC Genomics

Table 6 The top 25 words in Proximal Promoters

	Unmasked					Masked					Unmasked
Word	S	ES	O	EO	SlnSES	S	ES	O	EO	SlnSES	RevComp	RC_Pos	Pal	PValues
TAAAAAAT	4249	3411.11	4837	3674.74	933.272	3681	3028.65	4071	3237.18	718.039	ATTTTTTA	1	No	0
ATTTTTTA	3876	3135.31	4372	3358.5	822.011	3313	2758.58	3636	2932.38	606.738	TAAAAAAT	0	No	2.22E-16
TTATATAA	3094	2505.92	3390	2650.31	652.239	2712	2508.38	2934	2653.02	211.674	TTATATAA	2	Yes	7.77E-16
AATATATT	3636	3104.08	4093	3322.92	575.097	3178	3009.54	3503	3215.49	173.09	AATATATT	3	Yes	1.67E-15
GAAAAAAG	2066	1652.5	2182	1718.49	461.395	1956	1621.19	2053	1684.9	367.226	CTTTTTTC	5	No	1.11E-16
CTTTTTTC	1960	1578.31	2072	1638.97	424.512	1869	1559.58	1969	1618.92	338.269	GAAAAAAG	4	No	1.11E-16
AAAAATTG	2975	2595.17	3208	2749.61	406.363	2737	2368.41	2938	2497.98	395.888	CAATTTTT	9	No	-6.66E-16
TAAAATTT	4339	3951.48	5058	4305.15	405.93	3764	3348.9	4214	3603.07	439.821	AAATTTTA	10	No	-6.66E-16
TAATTTTT	4656	4272.02	5336	4686.12	400.739	4125	3726.41	4609	4040.78	419.188	AAAAATTA	19	No	0
CAATTTTT	2872	2499.79	3110	2643.5	398.638	2633	2269.83	2829	2389.32	390.785	AAAAATTG	6	No	6.66E-16
AAATTTTA	4239	3880.57	4921	4221.59	374.5	3651	3305.77	4102	3553.5	362.665	TAAAATTT	7	No	8.88E-16
TACAAAAT	2589	2241.1	2821	2357.73	373.61	2344	2040.96	2514	2138.69	324.496	ATTTTGTA	26	No	6.66E-16
ATTTTCTA	2206	1886.09	2346	1970.39	345.622	2022	1748.93	2142	1822.19	293.357	TAGAAAAT	17	No	8.88E-16
TGAAAAAT	2374	2075.6	2517	2176.47	318.891	2230	1927.32	2354	2015.09	325.288	ATTTTTCA	21	No	5.64E-13
AAAAAATC	3874	3607.85	4265	3902.57	275.738	3494	3280.06	3823	3524	220.77	GATTTTTT	68	No	5.63E-09
CATTTTTC	1675	1426.93	1760	1477.44	268.478	1558	1356.8	1624	1402.92	215.428	GAAAAATG	29	No	5.16E-13
TAAGAAAT	1895	1645.36	1990	1710.83	267.683	1773	1553.49	1856	1612.42	234.336	ATTTCTTA	23	No	2.52E-11
TAGAAAAT	2154	1904.65	2281	1990.5	265.005	1971	1754.61	2083	1828.31	229.215	ATTTTCTA	12	No	1.04E-10
GGAAAAAA	2679	2426.86	2853	2562.63	264.801	2506	2238.07	2643	2354.4	283.363	TTTTTTCC	98	No	9.20E-09
AAAAATTA	4735	4477.84	5547	4933.58	264.404	4109	3862.67	4667	4200.51	254.025	TAATTTTT	8	No	1.33E-15
CAAAATTT	3347	3092.9	3655	3310.2	264.267	3054	2796.42	3304	2974.88	269.093	AAATTTTG	60	No	1.95E-09
ATTTTTCA	2338	2088.5	2489	2190.56	263.846	2169	1928.62	2295	2016.5	254.769	TGAAAAAT	13	No	2.29E-10
TTTTTTGG	3369	3120.79	3724	3341.96	257.829	3050	2802.67	3330	2981.91	257.935	CCAAAAAA	28	No	4.49E-11
ATTTCTTA	1947	1705.79	2052	1775.75	257.518	1800	1598.57	1900	1660.66	213.623	TAAGAAAT	16	No	8.37E-11

The 25 most overrepresented words in the proximal promoters of Arabidopsis thaliana. Word is the short nucleotide sequence of a putative word. S and ES are the number of sequences in which the word occurs and the number of sequences in which it was expected to occur, respectively; O and EO are the observed and expected total numbers of occurrences. The SlnSES score measures the statistical coverage of the sequences in the analyzed set and is based on a Markov chain background model. Each set of attributes was computed for both the masked and the unmasked version of the segment. Emphasis is placed on the unmasked version, so the table is sorted by the unmasked SlnSES score.
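The caption does not spell out how SlnSES is computed. The column name and the tabulated values are consistent with S · ln(S / ES), so the following minimal sketch assumes that formula. The function name `s_ln_s_es` is hypothetical, and the S, ES, and reported scores are copied from the first three unmasked rows of the table.

```python
import math


def s_ln_s_es(s: int, es: float) -> float:
    """Return S * ln(S / ES).

    ASSUMPTION: this formula is inferred from the column name and the
    tabulated values; the caption itself does not state it.
    """
    if s <= 0 or es <= 0:
        raise ValueError("S and ES must be positive")
    return s * math.log(s / es)


# (word, S, ES, reported unmasked SlnSES) taken from the table above.
rows = [
    ("TAAAAAAT", 4249, 3411.11, 933.272),
    ("ATTTTTTA", 3876, 3135.31, 822.011),
    ("TTATATAA", 3094, 2505.92, 652.239),
]

for word, s, es, reported in rows:
    computed = s_ln_s_es(s, es)
    print(f"{word}: computed={computed:.3f} reported={reported:.3f}")
```

Small differences between the computed and reported values are expected, because ES is printed with only two decimal places.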
Further information about each word is given by its reverse complement (RevComp), the position of the reverse complement in the result set (RC_Pos), and a flag indicating whether the word is a genomic palindrome (Pal), that is, identical to its own reverse complement.
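The three annotation columns can be reproduced with a few lines of code. The sketch below assumes that RC_Pos is the zero-based rank of the reverse complement in the sorted word list, which matches the values shown in the table. The helper names are hypothetical, and only the first six words of the table are used.

```python
# Translation table mapping each nucleotide to its complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word: str) -> str:
    """Complement every base, then reverse the string."""
    return word.translate(_COMPLEMENT)[::-1]


def is_palindrome(word: str) -> bool:
    """A genomic palindrome equals its own reverse complement."""
    return word == reverse_complement(word)


# First six words of the table, in table order (sorted by unmasked SlnSES).
words = ["TAAAAAAT", "ATTTTTTA", "TTATATAA", "AATATATT", "GAAAAAAG", "CTTTTTTC"]

# ASSUMPTION: RC_Pos is the zero-based rank of the reverse complement.
rank = {w: i for i, w in enumerate(words)}

for w in words:
    rc = reverse_complement(w)
    pal = "Yes" if is_palindrome(w) else "No"
    print(w, rc, rank.get(rc), pal)
```

A word whose reverse complement lies outside the listed words would get `None` here; the full table instead reports its rank in the complete result set.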
Finally, PValues gives the p-value assigned to each word, which indicates whether the word is statistically relevant or may have been flagged as interesting by chance.


ISSN: 1471-2164
