
Table 2 The top 25 words in 3'UTRs

From: The word landscape of the non-coding segments of the Arabidopsis thaliana genome

 

| Word | Unmasked S | Unmasked ES | Unmasked O | Unmasked EO | Unmasked SlnSES | Masked S | Masked ES | Masked O | Masked EO | Masked SlnSES | RevComp (unmasked) | RC_Pos (unmasked) | Pal (unmasked) | PValues (unmasked) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TTTTTGTT | 2264 | 2066.82 | 2488 | 2306.04 | 206.297 | 2279 | 2066.89 | 2501 | 2331.04 | 222.643 | AACAAAAA | 40 | No | 9.38E-05 |
| TTTTTCTT | 2171 | 1981.63 | 2404 | 2203.7 | 198.149 | 2183 | 1978.5 | 2427 | 2222.83 | 214.723 | AAGAAAAA | 49 | No | 1.34E-05 |
| TTTTTTGG | 998 | 824.458 | 1046 | 877.255 | 190.646 | 1003 | 831.208 | 1053 | 888.417 | 188.434 | CCAAAAAA | 651 | No | 1.71E-08 |
| ATTTTGTA | 732 | 583.938 | 752 | 615.741 | 165.421 | 738 | 599.956 | 759 | 634.768 | 152.831 | TACAAAAT | 37 | No | 6.00E-08 |
| TAATTTTT | 787 | 642.133 | 810 | 678.585 | 160.101 | 797 | 646.36 | 821 | 685.263 | 166.97 | AAAAATTA | 164 | No | 5.24E-07 |
| ATGTTTTA | 589 | 469.818 | 601 | 493.292 | 133.161 | 610 | 486.404 | 624 | 512.055 | 138.116 | TAAAACAT | 284 | No | 1.48E-06 |
| TTTGTTTT | 2517 | 2402.46 | 2847 | 2715.8 | 117.227 | 2555 | 2406.15 | 2897 | 2753.88 | 153.362 | AAAACAAA | 1963 | No | 0.006347 |
| GTTTTTGA | 491 | 390.189 | 504 | 408.466 | 112.838 | 512 | 407.532 | 527 | 427.529 | 116.841 | TCAAAAAC | 5031 | No | 2.76E-06 |
| AAATTTTG | 588 | 491.471 | 603 | 516.445 | 105.443 | 604 | 504.212 | 621 | 531.22 | 109.069 | CAAAATTT | 376 | No | 0.00011 |
| ATTTTTTA | 482 | 387.674 | 498 | 405.795 | 104.97 | 492 | 406.16 | 510 | 426.064 | 94.3317 | TAAAAAAT | 100 | No | 5.33E-06 |
| ATTTTTCA | 446 | 354.812 | 450 | 370.941 | 102.014 | 453 | 365.873 | 457 | 383.118 | 96.7633 | TGAAAAAT | 170 | No | 3.83E-05 |
| TGTTTTGT | 1227 | 1133.19 | 1326 | 1219.91 | 97.5897 | 1255 | 1162.02 | 1359 | 1260.07 | 96.6082 | ACAAAACA | 659 | No | 0.001413 |
| ATAAAAAT | 564 | 474.529 | 580 | 498.326 | 97.4203 | 566 | 480.088 | 581 | 505.265 | 93.1776 | ATTTTTAT | 27 | No | 0.000192 |
| TTTTTTCT | 1721 | 1628.11 | 1839 | 1786.09 | 95.4882 | 1722 | 1625.78 | 1847 | 1798.84 | 99.0176 | AGAAAAAA | 106 | No | 0.107802 |
| AAAAATTG | 397 | 312.488 | 400 | 326.178 | 95.0296 | 414 | 323.794 | 419 | 338.423 | 101.744 | CAATTTTT | 66 | No | 4.26E-05 |
| TATAATAT | 505 | 419.081 | 519 | 439.185 | 94.1802 | 514 | 429.108 | 530 | 450.594 | 92.7844 | ATATTATA | 275 | No | 0.000114 |
| CTCTGTTT | 763 | 674.497 | 814 | 713.654 | 94.0706 | 796 | 706.86 | 852 | 751.4 | 94.5386 | AAACAGAG | 227 | No | 0.000125 |
| TTTTTAAT | 897 | 808.297 | 929 | 859.536 | 93.4009 | 905 | 811.646 | 942 | 866.766 | 98.5274 | ATTAAAAA | 95 | No | 0.009964 |
| TTCTTTTT | 1884 | 1795.18 | 2075 | 1982.05 | 90.9811 | 1879 | 1764.9 | 2059 | 1964.59 | 117.709 | AAAAAGAA | 130 | No | 0.019465 |
| TTTTTGGT | 989 | 902.56 | 1029 | 963.191 | 90.453 | 1006 | 920.175 | 1052 | 987.344 | 89.7087 | ACCAAAAA | 9144 | No | 0.018455 |
| ATTTTCTG | 324 | 245.197 | 330 | 255.296 | 90.2932 | 340 | 264.756 | 346 | 275.991 | 85.047 | CAGAAAAT | 241 | No | 4.24E-06 |
| AATATATT | 462 | 382.795 | 474 | 400.615 | 86.8857 | 477 | 412.829 | 490 | 433.187 | 68.9186 | AATATATT | 21 | Yes | 0.000195 |
| TTTGTGTG | 688 | 607.303 | 705 | 640.94 | 85.8355 | 705 | 625.577 | 726 | 662.623 | 84.2635 | CACACAAA | 8153 | No | 0.006617 |
| TGTTTTTT | 1716 | 1632.37 | 1839 | 1791.05 | 85.7404 | 1730 | 1636.78 | 1864 | 1811.88 | 95.8269 | AAAAAACA | 1065 | No | 0.131261 |

  1. Top 25 overrepresented words in the 3' untranslated regions (3'UTRs) of Arabidopsis thaliana. Word is the short nucleotide sequence of a putative word. S and ES are the number of sequences in which the word occurs and the number in which it was expected to occur, respectively; O and EO are the total and the expected total number of occurrences. The SlnSES score describes the statistical coverage of the sequences analyzed in the set and is based on a Markov chain background model. Each set of attributes was computed for both the masked and the unmasked version of the segment, with emphasis on the unmasked version (i.e., the table is sorted by the unmasked SlnSES score).
  2. RevComp gives the reverse complement of the word, RC_Pos the position of that reverse complement in the set of results, and Pal whether the word is a genomic palindrome.
  3. PValues gives the p-value assigned to each word, which helps to determine whether the word is relevant or was flagged as interesting by random chance.
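
The derived columns described in the notes can be checked with a short sketch. Two points here are assumptions rather than statements from the notes: the formula SlnSES = S · ln(S / ES) is inferred from the column name and agrees with the tabulated values to within rounding, and the function names `slnses`, `reverse_complement`, and `is_palindrome` are hypothetical helpers, not part of any tool used by the authors. The p-value computation is not described in the notes and is not reproduced.

```python
import math

# Translation table for Watson-Crick complements of the four DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word: str) -> str:
    """Return the reverse complement of a DNA word (the RevComp column)."""
    return word.upper().translate(_COMPLEMENT)[::-1]


def is_palindrome(word: str) -> bool:
    """A genomic palindrome equals its own reverse complement (the Pal column)."""
    return word.upper() == reverse_complement(word)


def slnses(s: float, es: float) -> float:
    """Assumed score: S * ln(S / ES), inferred from the table, not from the notes."""
    return s * math.log(s / es)


if __name__ == "__main__":
    # First table row, unmasked columns: S = 2264, ES = 2066.82.
    print(reverse_complement("TTTTTGTT"))
    print(is_palindrome("AATATATT"))
    print(round(slnses(2264, 2066.82), 2))
```

Because ES is reported to only six significant figures, a recomputed score can differ from the tabulated SlnSES in the last digits, so comparisons should use a small tolerance rather than exact equality.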