The word landscape of the non-coding segments of the Arabidopsis thaliana genome

Lichtenberg, Jens; Yilmaz, Alper; Welch, Joshua D; Kurz, Kyle; Liang, Xiaoyu; Drews, Frank; Ecker, Klaus; Lee, Stephen S; Geisler, Matt; Grotewold, Erich; Welch, Lonnie R

doi:10.1186/1471-2164-10-463

BMC Genomics

Table 3 The top 25 words in 5'UTRs


	Unmasked					Masked					Unmasked
Word	S	ES	O	EO	SlnSES	S	ES	O	EO	SlnSES	RevComp	RC_Pos	Pal	PValues
CTCTTCTC	871	614.433	992	668.648	303.928	883	669.295	972	729.203	244.68	GAGAAGAG	4	No	-2.22E-16
CTTTCTCT	1154	1003.84	1293	1115.45	160.868	1204	1040.02	1327	1164.52	176.278	AGAGAAAG	15	No	1.14E-07
AACAAAAA	1051	920.535	1134	1018.31	139.302	1082	933.212	1157	1036.72	160.064	TTTTTGTT	16	No	0.000192
TTTCTTCA	611	492.734	631	532.75	131.443	808	714.439	849	780.981	99.4364	TGAAGAAA	227	No	1.88E-05
GAGAAGAG	316	211.511	360	225.309	126.863	305	219.262	327	231.047	100.664	CTCTTCTC	0	No	0
TTCTCTCC	455	346.314	464	371.543	124.193	504	412.082	517	440.518	101.482	GGAGAGAA	130	No	2.11E-06
CTTTCTTC	883	771.778	929	846.965	118.876	960	807.394	1006	888.66	166.197	GAAGAAAG	87	No	0.00285
CTCTCTTT	1229	1116.97	1351	1248.77	117.468	1284	1161.65	1410	1312.47	128.577	AAAGAGAG	9	No	0.002211
TTTCTCTC	1421	1308.64	1554	1478.35	117.051	1494	1385.35	1636	1591.45	112.808	GAGAGAAA	74	No	0.025997
AAAGAGAG	666	561.408	709	609.221	113.781	625	511.53	649	550.867	125.216	CTCTCTTT	7	No	4.30E-05
AGAAAAAA	1078	972.588	1154	1078.91	110.928	1097	983.999	1179	1097.24	119.255	TTTTTTCT	93	No	0.012195
AAAGAAAA	978	875.456	1093	966.097	108.328	1000	886.23	1111	981.116	120.779	TTTTCTTT	35	No	3.32E-05
ATCTCTCA	332	243.705	342	260.045	102.647	380	308.328	392	327.073	79.4223	TGAGAGAT	448	No	6.93E-07
AAAAAACA	759	663.266	803	723.672	102.333	774	675.404	814	736.19	105.466	TGTTTTTT	298	No	0.001952
TTTTTCTT	1020	923.944	1116	1022.27	100.884	1501	1398.57	1742	1608.22	106.097	AAGAAAAA	20	No	0.001995
AGAGAAAG	589	496.468	634	536.894	100.664	548	457.974	578	491.244	98.3457	CTTTCTCT	1	No	2.45E-05
TTTTTGTT	811	719.391	885	787.265	97.2085	1506	1441.03	1818	1662.31	66.4099	AACAAAAA	2	No	0.000332
ACAAAAAA	845	754.352	901	827.069	95.888	865	767.534	916	842.311	103.408	TTTTTTGT	37	No	0.005817
TAAAAAAG	231	152.899	238	162.371	95.3195	272	196.748	284	206.973	88.0952	CTTTTTTA	149	No	1.66E-08
CAAAAACC	357	273.395	362	292.183	95.2547	386	290.194	393	307.419	110.121	GGTTTTTG	59	No	4.45E-05
AAGAAAAA	1104	1013.1	1209	1126.3	94.8599	1134	1021.85	1230	1142.64	118.087	TTTTTCTT	14	No	0.007636
CCTCTCTT	351	268.225	358	286.579	94.4052	372	313.865	375	333.083	63.2147	AAGAGAGG	550	No	2.65E-05
TCTTCTCC	907	817.38	946	899.203	94.3624	899	804.147	934	884.875	100.239	GGAGAAGA	676	No	0.062179
TTCTCTCA	473	387.786	484	416.951	93.9572	538	481.457	555	517.331	59.7404	TGAGAGAA	126	No	0.000721

The top 25 overrepresented words in the 5' untranslated regions (5'UTRs) of Arabidopsis thaliana. The Word attribute is the short nucleotide sequence of a putative word. S is the number of sequences in which the word occurs and ES is the number of sequences in which it was expected to occur; O is the total number of occurrences and EO is the expected total number of occurrences. The SlnSES score is a statistical measure of how well the word covers the sequences in the set and is based on a Markov chain background model. Each set of attributes was computed for both the masked and the unmasked version of the segment. The emphasis is on the unmasked version, so the table is sorted by the unmasked SlnSES score.
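The caption does not spell out how SlnSES is computed. The column name and the unmasked values in the table are consistent with S · ln(S / ES), so the following minimal Python sketch assumes that form. The function name `s_ln_s_es` is hypothetical, and the expected counts ES are taken directly from the table rather than derived from a Markov chain model.

```python
import math


def s_ln_s_es(s: int, es: float) -> float:
    """Assumed form of the score: S * ln(S / ES).

    s  -- number of sequences the word occurs in (column S)
    es -- number of sequences it was expected to occur in (column ES)
    """
    if s <= 0 or es <= 0:
        raise ValueError("S and ES must be positive")
    return s * math.log(s / es)


# Unmasked (word, S, ES, reported SlnSES) values copied from the table.
rows = [
    ("CTCTTCTC", 871, 614.433, 303.928),
    ("CTTTCTCT", 1154, 1003.84, 160.868),
    ("GAGAAGAG", 316, 211.511, 126.863),
]

for word, s, es, reported in rows:
    computed = s_ln_s_es(s, es)
    # Small differences are expected because ES is rounded in the table.
    print(f"{word}: computed={computed:.3f} reported={reported}")
```

Under this assumption, a word scores highly when it occurs in many sequences and in clearly more sequences than the background model predicts.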
Further information about each word is given by its reverse complement (RevComp), the position of the reverse complement in the set of results (RC_Pos), and a flag indicating whether the word is a genomic palindrome (Pal).
Finally, PValues gives the p-value assigned to each word, which helps determine whether the word is statistically relevant or was flagged as interesting by random chance.


ISSN: 1471-2164
