Table 3 The top 25 words in 5'UTRs

From: The word landscape of the non-coding segments of the Arabidopsis thaliana genome

 

| Word | S (unmasked) | ES (unmasked) | O (unmasked) | EO (unmasked) | SlnSES (unmasked) | S (masked) | ES (masked) | O (masked) | EO (masked) | SlnSES (masked) | RevComp (unmasked) | RC_Pos (unmasked) | Pal (unmasked) | PValues (unmasked) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CTCTTCTC | 871 | 614.433 | 992 | 668.648 | 303.928 | 883 | 669.295 | 972 | 729.203 | 244.68 | GAGAAGAG | 4 | No | -2.22E-16 |
| CTTTCTCT | 1154 | 1003.84 | 1293 | 1115.45 | 160.868 | 1204 | 1040.02 | 1327 | 1164.52 | 176.278 | AGAGAAAG | 15 | No | 1.14E-07 |
| AACAAAAA | 1051 | 920.535 | 1134 | 1018.31 | 139.302 | 1082 | 933.212 | 1157 | 1036.72 | 160.064 | TTTTTGTT | 16 | No | 0.000192 |
| TTTCTTCA | 611 | 492.734 | 631 | 532.75 | 131.443 | 808 | 714.439 | 849 | 780.981 | 99.4364 | TGAAGAAA | 227 | No | 1.88E-05 |
| GAGAAGAG | 316 | 211.511 | 360 | 225.309 | 126.863 | 305 | 219.262 | 327 | 231.047 | 100.664 | CTCTTCTC | 0 | No | 0 |
| TTCTCTCC | 455 | 346.314 | 464 | 371.543 | 124.193 | 504 | 412.082 | 517 | 440.518 | 101.482 | GGAGAGAA | 130 | No | 2.11E-06 |
| CTTTCTTC | 883 | 771.778 | 929 | 846.965 | 118.876 | 960 | 807.394 | 1006 | 888.66 | 166.197 | GAAGAAAG | 87 | No | 0.00285 |
| CTCTCTTT | 1229 | 1116.97 | 1351 | 1248.77 | 117.468 | 1284 | 1161.65 | 1410 | 1312.47 | 128.577 | AAAGAGAG | 9 | No | 0.002211 |
| TTTCTCTC | 1421 | 1308.64 | 1554 | 1478.35 | 117.051 | 1494 | 1385.35 | 1636 | 1591.45 | 112.808 | GAGAGAAA | 74 | No | 0.025997 |
| AAAGAGAG | 666 | 561.408 | 709 | 609.221 | 113.781 | 625 | 511.53 | 649 | 550.867 | 125.216 | CTCTCTTT | 7 | No | 4.30E-05 |
| AGAAAAAA | 1078 | 972.588 | 1154 | 1078.91 | 110.928 | 1097 | 983.999 | 1179 | 1097.24 | 119.255 | TTTTTTCT | 93 | No | 0.012195 |
| AAAGAAAA | 978 | 875.456 | 1093 | 966.097 | 108.328 | 1000 | 886.23 | 1111 | 981.116 | 120.779 | TTTTCTTT | 35 | No | 3.32E-05 |
| ATCTCTCA | 332 | 243.705 | 342 | 260.045 | 102.647 | 380 | 308.328 | 392 | 327.073 | 79.4223 | TGAGAGAT | 448 | No | 6.93E-07 |
| AAAAAACA | 759 | 663.266 | 803 | 723.672 | 102.333 | 774 | 675.404 | 814 | 736.19 | 105.466 | TGTTTTTT | 298 | No | 0.001952 |
| TTTTTCTT | 1020 | 923.944 | 1116 | 1022.27 | 100.884 | 1501 | 1398.57 | 1742 | 1608.22 | 106.097 | AAGAAAAA | 20 | No | 0.001995 |
| AGAGAAAG | 589 | 496.468 | 634 | 536.894 | 100.664 | 548 | 457.974 | 578 | 491.244 | 98.3457 | CTTTCTCT | 1 | No | 2.45E-05 |
| TTTTTGTT | 811 | 719.391 | 885 | 787.265 | 97.2085 | 1506 | 1441.03 | 1818 | 1662.31 | 66.4099 | AACAAAAA | 2 | No | 0.000332 |
| ACAAAAAA | 845 | 754.352 | 901 | 827.069 | 95.888 | 865 | 767.534 | 916 | 842.311 | 103.408 | TTTTTTGT | 37 | No | 0.005817 |
| TAAAAAAG | 231 | 152.899 | 238 | 162.371 | 95.3195 | 272 | 196.748 | 284 | 206.973 | 88.0952 | CTTTTTTA | 149 | No | 1.66E-08 |
| CAAAAACC | 357 | 273.395 | 362 | 292.183 | 95.2547 | 386 | 290.194 | 393 | 307.419 | 110.121 | GGTTTTTG | 59 | No | 4.45E-05 |
| AAGAAAAA | 1104 | 1013.1 | 1209 | 1126.3 | 94.8599 | 1134 | 1021.85 | 1230 | 1142.64 | 118.087 | TTTTTCTT | 14 | No | 0.007636 |
| CCTCTCTT | 351 | 268.225 | 358 | 286.579 | 94.4052 | 372 | 313.865 | 375 | 333.083 | 63.2147 | AAGAGAGG | 550 | No | 2.65E-05 |
| TCTTCTCC | 907 | 817.38 | 946 | 899.203 | 94.3624 | 899 | 804.147 | 934 | 884.875 | 100.239 | GGAGAAGA | 676 | No | 0.062179 |
| TTCTCTCA | 473 | 387.786 | 484 | 416.951 | 93.9572 | 538 | 481.457 | 555 | 517.331 | 59.7404 | TGAGAGAA | 126 | No | 0.000721 |

  1. Top 25 overrepresented words in the 5' untranslated regions (5'UTRs) of Arabidopsis thaliana. The Word attribute gives the short nucleotide sequence of a putative word. S and ES give the number of sequences in which a word occurs and the number of sequences in which it was expected to occur, respectively; O and EO give the total number of occurrences and the expected total number of occurrences. The SlnSES score is a statistical measure of how well a word covers the sequences in the set and is based on a Markov chain background model. Each set of attributes was computed for both the masked and the unmasked version of the corresponding segment, with emphasis on the unmasked version (i.e. the table is sorted by the unmasked SlnSES score).
  2. Further information about each word is provided by its reverse complement (RevComp), the position of the reverse complement in the set of results (RC_Pos), and a flag indicating whether the word is a genomic palindrome (Pal).
  3. Finally, PValues gives the p-value assigned to each word, which helps determine whether the word is statistically relevant or was flagged as interesting by random chance.
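
The derived columns described in the notes above can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation. The function names are hypothetical, and the formula SlnSES = S · ln(S / ES) is an assumption read off the column name; it does reproduce the tabulated values from S and ES (for example 871 and 614.433 give roughly 303.93). The palindrome check assumes the usual definition: a word is a genomic palindrome when it equals its own reverse complement.

```python
import math

# Translation table mapping each base to its Watson-Crick complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def sln_ses(s: float, es: float) -> float:
    """Assumed score: S * ln(S / ES), with S observed and ES expected sequence counts."""
    return s * math.log(s / es)


def reverse_complement(word: str) -> str:
    """Complement every base, then reverse the word (RevComp column)."""
    return word.translate(_COMPLEMENT)[::-1]


def is_genomic_palindrome(word: str) -> bool:
    """A word is a genomic palindrome if it equals its reverse complement (Pal column)."""
    return word == reverse_complement(word)


if __name__ == "__main__":
    # First row of the table: S = 871, ES = 614.433, tabulated SlnSES = 303.928.
    print(sln_ses(871, 614.433))
    print(reverse_complement("CTCTTCTC"))
    print(is_genomic_palindrome("CTCTTCTC"))
```

Note that CTCTTCTC reads the same forwards and backwards as a plain string, yet its reverse complement is GAGAAGAG, so it is not a genomic palindrome; this is consistent with the "No" in the Pal column.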