
Table 11. Lookup results for interesting words in the promoters. Information about the regulatory function of the top 10 overrepresented words in the bidirectional and unidirectional promoter sets, based on lookups in the TRANSFAC and JASPAR databases.

From: Word-based characterization of promoters involved in human DNA repair pathways

(a) Bidirectional

Sequence  Transcription factor  Consensus[b] (top),   Matches[c]  Avg. score[d]  Score range[e]
          (matrix ID[a])        aligned sequence
                                (bottom)
-----------------------------------------------------------------------------------------------
TCGCGCCA  PF0112[f]             KTGGCGGGAA            4/6         89.0           86.5–96.8
                                 TGGCGCGA

TCCCGGGA  STAT5A                TTCYNRGAA             8/16        86.7           86.7–86.7
                                TCCCGGGA

GGCCCGCC  SP1 (V$SP1_01)        DRGGCRKGSW            8/13        90.2           86.5–90.8
                                  GGCGGGCC

TCCCGGCT  ELK1 (MA0028)         NNNMCGGAAR            3/6         86.9           86.5–87.7
                                 AGCCGGGA

CAGGGGCC  V$WT1_Q6              SVCHCCBVC             5/6         87.4           85.0–91.1
                                GGCCCCTG

AGGGCCGT  MYB (V$MYB_Q3)        NNNBNCMGTTN           2/7         91.2           89.8–92.6
                                 AGGGCCGT

TCTGAGGA  TFIIA (V$TFIIA_Q6)    TMTDHRAGGRVS          2/8         88.1           85.8–90.5
                                  TCTGAGGA

CGTGGGGG  E2F (V$E2F1_Q3)         BKTSSCGS            6/6         87.3           87.3–87.3
                                CGTGGGGG

TGCTGAGA  No match.

CGCGGCCG  No match.

(b) Unidirectional

Sequence  Transcription factor  Consensus[b] (top),   Matches[c]  Avg. score[d]  Score range[e]
          (matrix ID[a])        aligned sequence
                                (bottom)
-----------------------------------------------------------------------------------------------
ACCCGCCT  SP1 (V$SP1_01)        DRGGCRKGSW            4/7         86.2           85.9–87.3
                                 AGGCGGGT

CTTCTTTC  No match.

AGGAAACA  NFAT (V$NFAT_Q4_01)   NWGGAAANWB            5/5         87.3           85.8–88.1
                                 AGGAAACA

GCAGGGCG  PF0096[f]             YGCANTGCR             10/10       86.8           86.5–87.1
                                 GCAGGGCG

GGGGCTGC  LRF (V$LRF_Q2)         VDVRMCCCC            5/8         85.4           85.4–85.4
                                GCAGCCCC

TCTTCTTC  No match.

GGGGAGTA  FOXC1 (MA0032)        NNNVNGTA              4/4         95.5           95.5–95.5
                                GGGGAGTA

ATTAAAAT  OCT1 (V$OCT1_06)      MWNMWTKWSATRYN        4/9         86.9           86.5–87.5
                                   ATTTTAAT

CGGAAACC  AREB6 (V$AREB6_04)    VBGTTTSNN             3/3         92.2           88.3–95.8
                                 GGTTTCCG

TGGGCGGA  GC (V$GC_01)          NNDGGGYGGRGYBD        4/5         90.3           85.1–95.2
                                  TGGGCGGA

a. JASPAR ID or TRANSFAC ID.
b. The consensus is in IUPAC notation: R = G or A, Y = T or C, M = A or C, K = G or T, W = A or T, S = G or C, B = not A, D = not C, H = not G, V = not T, N = any base. The aligned (bottom) sequence is either the word itself or its reverse complement.
c. Number of occurrences of the matrix that scored greater than 85% in the dataset.
d. Average score of the occurrences meeting the 85% threshold.
e. Range of scores of the occurrences meeting the 85% threshold.
f. Profile extracted from phylogenetically conserved gene upstream elements.
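The consensus column is easier to read with two small operations in hand: expanding the IUPAC codes and forming the reverse complement of a word. The Python sketch below is an illustration under stated assumptions, not the TRANSFAC or JASPAR lookup itself (those lookups are based on weight matrices, not consensus strings). It counts how many positions of an aligned word are compatible with a consensus at a given offset. The helper names `reverse_complement` and `consensus_agreement` and the zero-based offset convention are hypothetical.

```python
# Standard IUPAC nucleotide codes mapped to the bases they stand for.
# D (A, G or T) is included because several consensus strings use it.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT",
    "W": "AT", "S": "CG",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}

# Translation table for Watson-Crick complements of unambiguous bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word: str) -> str:
    """Return the opposite strand of `word`, read 5' to 3'."""
    return word.translate(COMPLEMENT)[::-1]


def consensus_agreement(word: str, consensus: str, offset: int) -> int:
    """Count positions where `word`, placed `offset` columns into `consensus`,
    is compatible with the IUPAC code in that column.

    Hypothetical helper: it ignores the matrix weights on which the database
    lookups are based, so it is only a readability aid.
    """
    if offset < 0 or offset + len(word) > len(consensus):
        raise ValueError("word does not fit inside the consensus at this offset")
    window = consensus[offset:offset + len(word)]
    return sum(base in IUPAC[code] for base, code in zip(word, window))


if __name__ == "__main__":
    word = "TCGCGCCA"  # first bidirectional word in the table
    bottom = reverse_complement(word)
    print(bottom)  # TGGCGCGA
    print(consensus_agreement(bottom, "KTGGCGGGAA", 1), "of", len(bottom))  # 7 of 8
```

For TCGCGCCA the script prints the reverse complement TGGCGCGA, which is the sequence shown under the PF0112 consensus. It then reports agreement at 7 of 8 positions at offset 1. The single mismatch (C against G at the sixth position) shows that a consensus string only summarizes the underlying matrix.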
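Footnotes c to e describe three summaries computed over the occurrences that pass the 85% cut-off: a count, a mean, and a minimum-maximum range. The minimal Python sketch below shows that bookkeeping. It assumes the per-occurrence scores are already available as percentages. The function name `summarize_scores`, the `ThresholdSummary` record, and the example scores are hypothetical illustrations, not the authors' code or data. The strict "greater than" comparison follows the wording of footnote c.

```python
from statistics import mean
from typing import NamedTuple, Optional, Sequence


class ThresholdSummary(NamedTuple):
    hits: int                 # occurrences scoring above the threshold (footnote c)
    average: Optional[float]  # mean score of those occurrences (footnote d)
    low: Optional[float]      # lowest passing score (footnote e)
    high: Optional[float]     # highest passing score (footnote e)


def summarize_scores(scores: Sequence[float], threshold: float = 85.0) -> ThresholdSummary:
    """Summarize percentage scores that are strictly greater than `threshold`."""
    passing = [s for s in scores if s > threshold]
    if not passing:
        return ThresholdSummary(hits=0, average=None, low=None, high=None)
    return ThresholdSummary(
        hits=len(passing),
        average=mean(passing),
        low=min(passing),
        high=max(passing),
    )


if __name__ == "__main__":
    # Made-up scores for a single word/matrix pair; these are NOT data from the study.
    example = [79.4, 86.5, 88.0, 96.8, 84.9]
    s = summarize_scores(example)
    print(f"{s.hits} hits, avg {s.average:.1f}, range {s.low:.1f}-{s.high:.1f}")
    # 3 hits, avg 90.4, range 86.5-96.8
```

Under this reading, a score of exactly 85.0 would be excluded. Footnotes d and e speak of occurrences "meeting" the threshold, so whether the original analysis used > or ≥ is not settled by the table alone.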