BMC Genomics

Table 11 Words not detected in the Introns

From: The word landscape of the non-coding segments of the Arabidopsis thaliana genome

#WORD	E_S	E
CGCGGACA	6.1805	6.4557
CCCGGGAG	4.57278	4.77632
CCGGCCCC	4.46781	4.66667
CGCCCCCC	4.45254	4.65072
GCCCACCG	4.16782	4.35331
GCCGCGGG	3.47686	3.63158
CCGAGGGG	3.34433	3.49315
AAGCGCCC	3.17737	3.31875
CGCCAGCG	2.99188	3.125
CGCTCGCG	2.91507	3.04478
GCGTCGCG	2.8245	2.95017
CCGGCACG	2.48216	2.59259
CCGGGGCG	2.25483	2.35514
CCCGCGCC	2.16189	2.25806
TCGGGCGC	2.11021	2.20408
GCGCACGG	2.02051	2.11039
CGCTCCGC	2.00514	2.09434
CGCGACGC	1.99945	2.0884
TGCGCCCG	1.9539	2.04082
GGTGCGCG	1.92911	2.01493
GCGGGCCC	1.90464	1.98936
CGCGGCGA	1.86163	1.94444
GCGCGACG	1.83299	1.91453
GGGCGGGC	1.79662	1.87654
CCGCCGGG	1.73887	1.81622

Top 25 words that were expected to occur in the introns but are absent from the sequences. Each word is identified by its nucleotide sequence and is listed with the expected number of sequences in which it was computed to occur (E_S) and its expected total number of occurrences across the set of sequences (E). The words are sorted by expected sequence occurrence (E_S), in descending order.
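
As a reading aid, the following minimal sketch loads a few rows of the table and checks the two properties the caption implies: that rows are ordered by E_S from largest to smallest, and that the expected number of sequences containing a word does not exceed its expected total number of occurrences. The function name `parse_rows` and the variable names are hypothetical, chosen for illustration; the sketch only reads the published values and does not reproduce how the authors computed E_S or E.

```python
import csv
import io

# First five rows copied verbatim from the table (tab-separated).
TABLE = """#WORD\tE_S\tE
CGCGGACA\t6.1805\t6.4557
CCCGGGAG\t4.57278\t4.77632
CCGGCCCC\t4.46781\t4.66667
CGCCCCCC\t4.45254\t4.65072
GCCCACCG\t4.16782\t4.35331
"""


def parse_rows(text):
    """Return a list of (word, e_s, e) tuples from tab-separated text."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    next(reader)  # skip the "#WORD  E_S  E" header line
    rows = []
    for record in reader:
        if not record:  # tolerate blank lines
            continue
        word, e_s, e = record
        rows.append((word, float(e_s), float(e)))
    return rows


rows = parse_rows(TABLE)

# Rows should be ordered by E_S, largest first.
sorted_by_es = all(a[1] >= b[1] for a, b in zip(rows, rows[1:]))

# A word cannot be expected in more sequences than it has expected occurrences.
es_not_above_e = all(e_s <= e for _, e_s, e in rows)

print(sorted_by_es, es_not_above_e)
```

The same checks can be run on the full 25-row table by pasting the remaining rows into `TABLE`.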


ISSN: 1471-2164
