The word landscape of the non-coding segments of the Arabidopsis thaliana genome

Lichtenberg, Jens; Yilmaz, Alper; Welch, Joshua D; Kurz, Kyle; Liang, Xiaoyu; Drews, Frank; Ecker, Klaus; Lee, Stephen S; Geisler, Matt; Grotewold, Erich; Welch, Lonnie R

doi:10.1186/1471-2164-10-463

BMC Genomics

Table 5 The top 25 words in Core Promoters

	Unmasked					Masked					Unmasked
Word	S	ES	O	EO	SlnSES	S	ES	O	EO	SlnSES	RevComp	RC_Pos	Pal	PValues
TATAAATA	1355	1071.69	1369	1175.57	317.831	1300	1029.92	1311	1128.85	302.753	TATTTATA	69	No	2.02E-08
CTATAAAT	712	474.27	716	514.446	289.286	704	464.711	708	503.987	292.416	ATTTATAG	2504	No	7.77E-16
CTATATAA	636	410.261	638	444.486	278.826	626	450.579	628	488.533	205.839	TTATATAG	18530	No	1.11E-16
ATATAAAC	560	350.797	560	379.643	261.928	554	347.685	554	376.253	258.091	GTTTATAT	26957	No	4.44E-16
TAAAAAAT	473	295.342	480	319.301	222.765	453	298.58	460	322.82	188.835	ATTTTTTA	12	No	-2.22E-16
ATATATAC	544	394.869	559	427.688	174.295	507	330.093	515	357.099	217.573	GTATATAT	5651	No	7.41E-10
AATATATT	300	181.346	300	195.646	151.012	287	195.452	287	210.918	110.256	AATATATT	6	Yes	2.74E-12
TTATATAA	524	397.031	529	430.047	145.398	514	430.79	518	466.905	90.7739	TTATATAA	7	Yes	2.22E-06
AAGAAAAA	1261	1129.24	1318	1240.05	139.165	1189	1063	1238	1165.84	133.189	TTTTTCTT	25	No	0.014544
ATATAAAG	378	262.861	380	284.014	137.316	375	261.181	377	282.19	135.643	CTTTATAT	377	No	3.41E-08
TATATAAA	1260	1131.11	1276	1242.15	135.966	1234	1102.41	1250	1209.97	139.143	TTTATATA	1458	No	0.171817
AGAAAAAA	1127	1000.04	1170	1095.49	134.693	1063	936.863	1099	1025.06	134.271	TTTTTTCT	31	No	0.01331
ATTTTTTA	312	204.097	315	220.282	132.415	299	207.163	302	223.604	109.715	TAAAAAAT	4	No	1.17E-09
TTTTAAAA	688	568.245	696	617.46	131.571	658	543.865	665	590.7	125.351	TTTTAAAA	13	Yes	0.001019
CTCTTCTC	402	294.202	429	318.061	125.499	371	277.661	390	300.087	107.516	GAGAAGAG	444	No	1.97E-09
ACAAAAAA	958	840.585	988	918.052	125.259	917	799.552	939	872.564	125.681	TTTTTTGT	45	No	0.011607
ATAAATAC	578	466.039	582	505.44	124.446	574	459.992	578	498.825	127.095	GTATTTAT	14072	No	0.000465
TTATAAAA	507	397.553	508	430.617	123.294	490	386.47	491	418.525	116.302	TTTTATAA	945	No	0.000153
AAATTAAA	718	609.913	745	663.251	117.144	682	578.03	705	628.206	112.806	TTTAATTT	96	No	0.000967
GCCCATTA	374	273.89	396	295.991	116.512	372	272.658	394	294.653	115.571	TAATGGGC	190	No	1.82E-08
AAAAAACA	893	787.368	924	859.073	112.42	849	736.927	874	803.277	120.193	TGTTTTTT	33	No	0.014723
TTAAAAAA	805	701.565	828	764.227	110.71	768	667.112	788	726.227	108.159	TTTTTTAA	27	No	0.01177
ATTAAAAA	708	609.58	719	662.885	105.969	671	581.412	681	631.921	96.1611	TTTTTAAT	316	No	0.016276
GCCCAATA	322	231.782	340	250.291	105.859	321	228.286	337	246.5	109.41	TATTGGGC	130	No	4.26E-08

Top 25 overrepresented words in the core promoter regions of Arabidopsis thaliana. Word is the short nucleotide sequence of a putative word. S and ES give the number of sequences in which the word occurs and the number in which it was expected to occur, respectively; O and EO give the observed and expected total number of occurrences. The SlnSES score measures the statistical coverage of the word across the sequences in the set and is based on a Markov chain background model. Each set of attributes was computed for both the unmasked and the masked version of the corresponding segment, with emphasis on the unmasked version (the table is sorted by the unmasked SlnSES score).
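As a rough illustration of the score, the column name and the tabulated values are consistent with reading SlnSES as S · ln(S / ES). The caption does not state this formula, so treat it as an assumption; the function name `sln_ses` is hypothetical. A minimal sketch that checks this reading against the first two unmasked rows:

```python
import math


def sln_ses(s: float, es: float) -> float:
    """Return S * ln(S / ES).

    Assumed reading of the SlnSES column, inferred from the column
    name and the table values; not stated in the caption itself.
    """
    return s * math.log(s / es)


# Unmasked S, ES and SlnSES for the first two rows of the table.
rows = [
    ("TATAAATA", 1355, 1071.69, 317.831),
    ("CTATAAAT", 712, 474.27, 289.286),
]

for word, s, es, reported in rows:
    # Small differences are expected because ES is rounded in the table.
    print(word, round(sln_ses(s, es), 3), reported)
```

Under this reading, the score grows both with how many sequences contain the word (S) and with how far that count exceeds the expectation from the background model (S / ES).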
Further information is given by the word's reverse complement (RevComp), the position of the reverse complement in the set of results (RC_Pos), and a flag indicating whether the word is a genomic palindrome, i.e. identical to its own reverse complement (Pal).
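The RevComp and Pal columns can be reproduced with a few lines of code. The sketch below assumes words contain only the letters A, C, G and T; the function names are hypothetical and not taken from the authors' software.

```python
# Translation table mapping each base to its complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word: str) -> str:
    """Complement every base, then reverse the string."""
    return word.translate(_COMPLEMENT)[::-1]


def is_palindrome(word: str) -> bool:
    """A genomic palindrome equals its own reverse complement."""
    return word == reverse_complement(word)


# Words taken from the table above.
for word in ("TATAAATA", "AATATATT", "TTTTAAAA"):
    print(word, reverse_complement(word), "Yes" if is_palindrome(word) else "No")
```

For a palindromic word the word and its reverse complement are the same string, so occurrences on either strand are indistinguishable.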
Finally, PValues gives a p-value for each word, which helps to determine whether the word is relevant or was flagged as interesting by random chance.

ISSN: 1471-2164
