BMC Genomics

Table 9 Words not detected in the 3'UTRs

From: The word landscape of the non-coding segments of the Arabidopsis thaliana genome

#WORD	E_S	E
CTAGCAGG	5.98269	6.17391
ACTGCCAG	4.99319	5.1526
CGCCTGAT	4.97776	5.13667
GCGTCCGA	4.52742	4.67187
GGGGTGGC	4.5248	4.66917
ACTCCGCC	4.38831	4.5283
CCCGTTCC	4.25101	4.3866
ACACGCCG	4.21714	4.35165
CCCGCTCA	4.193	4.32673
CTGGGCGT	4.06873	4.19847
GACCTGCG	3.71851	3.83704
GCGCAGTA	3.68699	3.80451
GCACCCGA	3.6084	3.7234
GCACCCTC	3.59671	3.71134
CGCACCCA	3.54333	3.65625
CCGCCGTC	3.53385	3.64646
GGGTCGGC	3.52406	3.63636
GCACGCCT	3.35465	3.46154
GCGCAGCC	3.31181	3.41732
CGTCCGCT	3.28252	3.3871
CTGGCGCC	3.2624	3.36634
GGCGACCT	3.25626	3.36
ATACGCCC	3.18816	3.28972
AGCGCTCC	2.98494	3.08
TAGCGCGG	2.98494	3.08

Top 25 words that were expected to occur in the 3'UTRs but are absent from the sequences. Each word is identified by its nucleotide sequence and is listed with the expected number of sequences in which it was computed to occur (E_S) and the expected total number of occurrences across the set of sequences (E). The words are sorted in descending order of expected sequence occurrence (E_S).
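As a rough illustration of how the table above could be read and checked, the following Python sketch parses the tab-separated rows, confirms that they are ordered by E_S, and counts how often a word occurs in a set of sequences. The function names, the three-row excerpt, and the toy sequences are hypothetical; counting overlapping matches on a single strand is an assumption, since the caption does not state the counting convention, and the sketch does not reproduce how E_S and E were computed.

```python
import csv
import io

# Excerpt of the table above (first three rows), tab-separated.
TABLE = "#WORD\tE_S\tE\nCTAGCAGG\t5.98269\t6.17391\nACTGCCAG\t4.99319\t5.1526\nCGCCTGAT\t4.97776\t5.13667\n"


def parse_table(text):
    """Parse tab-separated rows into (word, e_s, e) tuples, skipping the '#' header."""
    rows = []
    for fields in csv.reader(io.StringIO(text), delimiter="\t"):
        if not fields or fields[0].startswith("#"):
            continue
        word, e_s, e = fields
        rows.append((word, float(e_s), float(e)))
    return rows


def is_sorted_by_e_s(rows):
    """True if rows are in non-increasing order of E_S."""
    return all(a[1] >= b[1] for a, b in zip(rows, rows[1:]))


def count_occurrences(word, sequences):
    """Return (sequences containing word, total occurrences).

    Assumption: overlapping matches are counted and only the given strand is scanned.
    """
    n_sequences = 0
    n_total = 0
    for seq in sequences:
        hits = sum(
            1 for i in range(len(seq) - len(word) + 1) if seq.startswith(word, i)
        )
        if hits:
            n_sequences += 1
        n_total += hits
    return n_sequences, n_total


rows = parse_table(TABLE)
# A word cannot be expected in more sequences than it has expected occurrences.
consistent = all(e >= e_s for _, e_s, e in rows)
# Hypothetical toy sequences; a word "not detected" would yield (0, 0).
absent = count_occurrences("CTAGCAGG", ["ACGTACGTACGT", "TTTTGGGGCCCC"])
```

For a word listed in the table, a count of `(0, 0)` over the real 3'UTR set would correspond to the "not detected" status the caption describes.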


ISSN: 1471-2164
