Table 1 Top 25 words. The top 25 words for the bidirectional promoter set (a) and the unidirectional promoter set (b) of DNA-repair pathways, sorted in descending order of statistical overrepresentation (the Sln(S/ES) column).

From: Word-based characterization of promoters involved in human DNA repair pathways

(a) Bidirectional
Word S ES O EO Sln(S/ES) RevComp Position Palindrome P-Value
TCGCGCCA 4 0.918299 4 0.9375 5.88611 TGGCGCGA 12538 No 0.015391
TCCCGGGA 8 3.97165 8 4.26667 5.60208 TCCCGGGA 2 Yes 0.068606
GGCCCGCC 10 5.85012 11 6.5 5.36123 GGCGGGCC 21073 No 0.066821
TCCCGGCT 6 2.54354 6 2.66667 5.14921 AGCCGGGA NA No 0.054084
CAGGGGCC 4 1.1085 4 1.13514 5.13315 GGCCCCTG 14546 No 0.028413
AGGGCCGT 5 1.80245 5 1.86667 5.10145 ACGGCCCT 613 No 0.04142
TCTGAGGA 5 1.84222 6 1.90909 4.99234 TCCTCAGA 5391 No 0.013499
CGTGGGGG 5 1.86693 5 1.93548 4.92572 CCCCCACG 20402 No 0.047015
TGCTGAGA 4 1.17067 4 1.2 4.91487 TCTCAGCA NA No 0.033766
CGCGGCCG 4 1.17067 4 1.2 4.91487 CGGCCGCG 20259 No 0.033766
TCTGGGAT 2 0.180188 2 0.181818 4.8138 ATCCCAGA 2854 No 0.014655
GGGGCCGG 5 1.92725 5 2 4.76672 CCGGCCCC 20866 No 0.052648
AGGGAGGG 6 2.73111 6 2.87234 4.7223 CCCTCCCT 9852 No 0.07159
AGAAAAGA 3 0.632564 3 0.642857 4.66976 TCTTTTCT NA No 0.027559
CGACTCCG 3 0.632564 3 0.642857 4.66976 CGGAGTCG NA No 0.027559
GGGCCAGG 7 3.61284 7 3.85714 4.6299 CCTGGCCC 19875 No 0.096315
ACTCCAGC 5 2.02051 5 2.1 4.53045 GCTGGAGT NA No 0.062121
CGGGCCGA 5 2.05153 5 2.13333 4.45426 TCGGCCCG 6128 No 0.065478
TGCGGAAT 2 0.220092 2 0.222222 4.41371 ATTCCGCA NA No 0.021321
GCCCCTCC 8 4.63031 9 5.03226 4.37454 GGAGGGGC 7041 No 0.070206
GCCGGCGA 3 0.707627 3 0.72 4.33335 TCGCCGGC 20143 No 0.036618
TGAAGCCA 4 1.38876 4 1.42857 4.23154 TGGCTTCA NA No 0.056996
GGCAGGGA 6 3.01111 6 3.18182 4.1367 TCCCTGCC 10531 No 0.103337
TGCCCGCG 5 2.19845 5 2.29167 4.10844 CGCGGGCA NA No 0.082773
CAGCAGCC 6 3.02748 6 3.2 4.10418 GGCTGCTG 19198 No 0.105399
(b) Unidirectional
Word S ES O EO Sln(S/ES) RevComp Position Palindrome P-Value
ACCCGCCT 4 0.716577 4 0.727273 6.87826 AGGCGGGT 19440 No 0.006562
CTTCTTTC 5 1.7686 5 1.81818 5.19624 GAAAGAAG 13567 No 0.037733
AGGAAACA 4 1.16659 4 1.19048 4.92885 TGTTTCCT 21667 No 0.032947
GCAGGGCG 6 2.75716 6 2.86957 4.66535 CGCCCTGC 1311 No 0.071337
GGGGCTGC 5 2.036 5 2.1 4.49226 GCAGCCCC 16359 No 0.062122
TCTTCTTC 4 1.30438 4 1.33333 4.48225 GAAGAAGA NA No 0.046491
GGGGAGTA 3 0.682407 3 0.692308 4.44222 TACTCCCC 17991 No 0.033211
ATTAAAAT 4 1.36853 4 1.4 4.29023 ATTTTAAT 16078 No 0.053723
CGGAAACC 3 0.750393 3 0.761905 4.15731 GGTTTCCG NA No 0.042101
TGGGCGGA 4 1.44679 4 1.48148 4.06778 TCCGCCCA NA No 0.063337
CGGCGGCG 3 0.787559 3 0.8 4.01229 CGCCGCCG 22091 No 0.047421
TTTTTTGA 3 0.787559 3 0.8 4.01229 TCAAAAAA NA No 0.047421
TTTCTCCA 4 1.48541 4 1.52174 3.96242 TGGAGAAA 2378 No 0.068398
AGCCGGCT 3 0.805285 3 0.818182 3.94551 AGCCGGCT 14 Yes 0.050071
CCTCTTTA 2 0.282982 2 0.285714 3.91104 TAAAGAGG NA No 0.033814
CGCCCCTT 6 3.12976 6 3.27273 3.90482 AAGGGGCG 21917 No 0.113859
GCGCCGCG 5 2.33164 5 2.41379 3.81433 CGCGGCGC 15062 No 0.097601
ATTCCCAG 3 0.843245 3 0.857143 3.80733 CTGGGAAT 21297 No 0.055985
TCTCCCCT 4 1.56036 4 1.6 3.7655 AGGGGAGA 18183 No 0.07881
TCCGCCGG 3 0.855341 3 0.869565 3.7646 CCGGCGGA NA No 0.057938
CTCCCGCT 3 0.867789 3 0.882353 3.72126 AGCGGGAG NA No 0.059981
TGCGCCGA 2 0.316812 2 0.32 3.68519 TCGGCGCA 3202 No 0.041483
GGGCGCCC 4 1.59514 4 1.63636 3.67732 GGGCGCCC 23 Yes 0.083901
GTGCGTTT 3 0.884961 3 0.9 3.66247 AAACGCAC NA No 0.062855
TTGGTCTC 4 1.60537 4 1.64706 3.65176 GAGACCAA NA No 0.085429
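
The derived columns can be reproduced from the raw ones. The sketch below is a minimal illustration rather than the authors' implementation: it assumes that the Sln(S/ES) column is S multiplied by the natural logarithm of S/ES (which matches the tabulated values), that RevComp is the reverse complement of the word, and that a word is flagged as a palindrome when it equals its own reverse complement. The function names are hypothetical, and treating S and ES as an observed and an expected count is an assumption, since the table itself does not define them.

```python
import math

# Complement map for the four DNA bases (assumes upper-case A, C, G, T only).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word: str) -> str:
    """Return the reverse complement of a DNA word."""
    return word.translate(_COMPLEMENT)[::-1]


def is_palindrome(word: str) -> bool:
    """A word is a reverse-complement palindrome if it equals its own reverse complement."""
    return word == reverse_complement(word)


def overrepresentation_score(s: float, es: float) -> float:
    """Compute S * ln(S / ES), the assumed definition of the Sln(S/ES) column."""
    return s * math.log(s / es)


# (word, S, ES) triples copied from the table above.
rows = [
    ("TCGCGCCA", 4, 0.918299),  # bidirectional, first row
    ("TCCCGGGA", 8, 3.97165),   # bidirectional, palindromic
    ("ACCCGCCT", 4, 0.716577),  # unidirectional, first row
]

for word, s, es in rows:
    print(
        word,
        reverse_complement(word),
        "Yes" if is_palindrome(word) else "No",
        round(overrepresentation_score(s, es), 5),
    )
```

For the three sample rows, the computed reverse complements and palindrome flags agree with the RevComp and Palindrome columns, and the scores agree with the tabulated Sln(S/ES) values to within rounding of the printed ES.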