
Table 1 Top 25 words. The top 25 words for the bidirectional promoter set (a) and the unidirectional promoter set (b) of DNA-repair pathways. The words are sorted in descending order according to their statistical overrepresentation.

From: Word-based characterization of promoters involved in human DNA repair pathways

(a) Bidirectional

| Word | S | ES | O | EO | S·ln(S/ES) | RevComp | Position | Palindrome | P-Value |
|---|---|---|---|---|---|---|---|---|---|
| TCGCGCCA | 4 | 0.918299 | 4 | 0.9375 | 5.88611 | TGGCGCGA | 12538 | No | 0.015391 |
| TCCCGGGA | 8 | 3.97165 | 8 | 4.26667 | 5.60208 | TCCCGGGA | 2 | Yes | 0.068606 |
| GGCCCGCC | 10 | 5.85012 | 11 | 6.5 | 5.36123 | GGCGGGCC | 21073 | No | 0.066821 |
| TCCCGGCT | 6 | 2.54354 | 6 | 2.66667 | 5.14921 | AGCCGGGA | NA | No | 0.054084 |
| CAGGGGCC | 4 | 1.1085 | 4 | 1.13514 | 5.13315 | GGCCCCTG | 14546 | No | 0.028413 |
| AGGGCCGT | 5 | 1.80245 | 5 | 1.86667 | 5.10145 | ACGGCCCT | 613 | No | 0.04142 |
| TCTGAGGA | 5 | 1.84222 | 6 | 1.90909 | 4.99234 | TCCTCAGA | 5391 | No | 0.013499 |
| CGTGGGGG | 5 | 1.86693 | 5 | 1.93548 | 4.92572 | CCCCCACG | 20402 | No | 0.047015 |
| TGCTGAGA | 4 | 1.17067 | 4 | 1.2 | 4.91487 | TCTCAGCA | NA | No | 0.033766 |
| CGCGGCCG | 4 | 1.17067 | 4 | 1.2 | 4.91487 | CGGCCGCG | 20259 | No | 0.033766 |
| TCTGGGAT | 2 | 0.180188 | 2 | 0.181818 | 4.8138 | ATCCCAGA | 2854 | No | 0.014655 |
| GGGGCCGG | 5 | 1.92725 | 5 | 2 | 4.76672 | CCGGCCCC | 20866 | No | 0.052648 |
| AGGGAGGG | 6 | 2.73111 | 6 | 2.87234 | 4.7223 | CCCTCCCT | 9852 | No | 0.07159 |
| AGAAAAGA | 3 | 0.632564 | 3 | 0.642857 | 4.66976 | TCTTTTCT | NA | No | 0.027559 |
| CGACTCCG | 3 | 0.632564 | 3 | 0.642857 | 4.66976 | CGGAGTCG | NA | No | 0.027559 |
| GGGCCAGG | 7 | 3.61284 | 7 | 3.85714 | 4.6299 | CCTGGCCC | 19875 | No | 0.096315 |
| ACTCCAGC | 5 | 2.02051 | 5 | 2.1 | 4.53045 | GCTGGAGT | NA | No | 0.062121 |
| CGGGCCGA | 5 | 2.05153 | 5 | 2.13333 | 4.45426 | TCGGCCCG | 6128 | No | 0.065478 |
| TGCGGAAT | 2 | 0.220092 | 2 | 0.222222 | 4.41371 | ATTCCGCA | NA | No | 0.021321 |
| GCCCCTCC | 8 | 4.63031 | 9 | 5.03226 | 4.37454 | GGAGGGGC | 7041 | No | 0.070206 |
| GCCGGCGA | 3 | 0.707627 | 3 | 0.72 | 4.33335 | TCGCCGGC | 20143 | No | 0.036618 |
| TGAAGCCA | 4 | 1.38876 | 4 | 1.42857 | 4.23154 | TGGCTTCA | NA | No | 0.056996 |
| GGCAGGGA | 6 | 3.01111 | 6 | 3.18182 | 4.1367 | TCCCTGCC | 10531 | No | 0.103337 |
| TGCCCGCG | 5 | 2.19845 | 5 | 2.29167 | 4.10844 | CGCGGGCA | NA | No | 0.082773 |
| CAGCAGCC | 6 | 3.02748 | 6 | 3.2 | 4.10418 | GGCTGCTG | 19198 | No | 0.105399 |

(b) Unidirectional

| Word | S | ES | O | EO | S·ln(S/ES) | RevComp | Position | Palindrome | P-Value |
|---|---|---|---|---|---|---|---|---|---|
| ACCCGCCT | 4 | 0.716577 | 4 | 0.727273 | 6.87826 | AGGCGGGT | 19440 | No | 0.006562 |
| CTTCTTTC | 5 | 1.7686 | 5 | 1.81818 | 5.19624 | GAAAGAAG | 13567 | No | 0.037733 |
| AGGAAACA | 4 | 1.16659 | 4 | 1.19048 | 4.92885 | TGTTTCCT | 21667 | No | 0.032947 |
| GCAGGGCG | 6 | 2.75716 | 6 | 2.86957 | 4.66535 | CGCCCTGC | 1311 | No | 0.071337 |
| GGGGCTGC | 5 | 2.036 | 5 | 2.1 | 4.49226 | GCAGCCCC | 16359 | No | 0.062122 |
| TCTTCTTC | 4 | 1.30438 | 4 | 1.33333 | 4.48225 | GAAGAAGA | NA | No | 0.046491 |
| GGGGAGTA | 3 | 0.682407 | 3 | 0.692308 | 4.44222 | TACTCCCC | 17991 | No | 0.033211 |
| ATTAAAAT | 4 | 1.36853 | 4 | 1.4 | 4.29023 | ATTTTAAT | 16078 | No | 0.053723 |
| CGGAAACC | 3 | 0.750393 | 3 | 0.761905 | 4.15731 | GGTTTCCG | NA | No | 0.042101 |
| TGGGCGGA | 4 | 1.44679 | 4 | 1.48148 | 4.06778 | TCCGCCCA | NA | No | 0.063337 |
| CGGCGGCG | 3 | 0.787559 | 3 | 0.8 | 4.01229 | CGCCGCCG | 22091 | No | 0.047421 |
| TTTTTTGA | 3 | 0.787559 | 3 | 0.8 | 4.01229 | TCAAAAAA | NA | No | 0.047421 |
| TTTCTCCA | 4 | 1.48541 | 4 | 1.52174 | 3.96242 | TGGAGAAA | 2378 | No | 0.068398 |
| AGCCGGCT | 3 | 0.805285 | 3 | 0.818182 | 3.94551 | AGCCGGCT | 14 | Yes | 0.050071 |
| CCTCTTTA | 2 | 0.282982 | 2 | 0.285714 | 3.91104 | TAAAGAGG | NA | No | 0.033814 |
| CGCCCCTT | 6 | 3.12976 | 6 | 3.27273 | 3.90482 | AAGGGGCG | 21917 | No | 0.113859 |
| GCGCCGCG | 5 | 2.33164 | 5 | 2.41379 | 3.81433 | CGCGGCGC | 15062 | No | 0.097601 |
| ATTCCCAG | 3 | 0.843245 | 3 | 0.857143 | 3.80733 | CTGGGAAT | 21297 | No | 0.055985 |
| TCTCCCCT | 4 | 1.56036 | 4 | 1.6 | 3.7655 | AGGGGAGA | 18183 | No | 0.07881 |
| TCCGCCGG | 3 | 0.855341 | 3 | 0.869565 | 3.7646 | CCGGCGGA | NA | No | 0.057938 |
| CTCCCGCT | 3 | 0.867789 | 3 | 0.882353 | 3.72126 | AGCGGGAG | NA | No | 0.059981 |
| TGCGCCGA | 2 | 0.316812 | 2 | 0.32 | 3.68519 | TCGGCGCA | 3202 | No | 0.041483 |
| GGGCGCCC | 4 | 1.59514 | 4 | 1.63636 | 3.67732 | GGGCGCCC | 23 | Yes | 0.083901 |
| GTGCGTTT | 3 | 0.884961 | 3 | 0.9 | 3.66247 | AAACGCAC | NA | No | 0.062855 |
| TTGGTCTC | 4 | 1.60537 | 4 | 1.64706 | 3.65176 | GAGACCAA | NA | No | 0.085429 |
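
The derived columns in both panels can be recomputed from the raw ones. The following is a minimal Python sketch, not the authors' implementation. It assumes that S and ES are observed and expected counts (the table itself does not define them), that the score column is S multiplied by the natural logarithm of S/ES (which agrees with the tabulated values to within rounding), and that a word is marked as a palindrome when it equals its own reverse complement. The function names are hypothetical.

```python
import math

# Translation table for complementing DNA bases (A<->T, C<->G).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word):
    """Return the reverse complement of an uppercase DNA word."""
    return word.translate(COMPLEMENT)[::-1]


def is_palindrome(word):
    """A DNA word is palindromic if it equals its own reverse complement."""
    return word == reverse_complement(word)


def overrepresentation_score(s, es):
    """Score S * ln(S / ES); S and ES are assumed to be observed and expected counts."""
    return s * math.log(s / es)


# Three (Word, S, ES) triples copied from the table for illustration.
rows = [
    ("TCGCGCCA", 4, 0.918299),   # panel (a), first row
    ("TCCCGGGA", 8, 3.97165),    # panel (a), second row
    ("ACCCGCCT", 4, 0.716577),   # panel (b), first row
]

# Sort in descending order of score, mirroring the ordering described in the caption.
ranked = sorted(rows, key=lambda r: overrepresentation_score(r[1], r[2]), reverse=True)

for word, s, es in ranked:
    print(
        word,
        reverse_complement(word),
        "Yes" if is_palindrome(word) else "No",
        round(overrepresentation_score(s, es), 5),
    )
```

Recomputed scores may differ from the tabulated ones in the last decimal place, because the ES values shown in the table are themselves rounded.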