Table 11. Lookup results for interesting words in the promoters. Information about the regulatory function of the top 10 overrepresented words in the bidirectional and unidirectional promoter sets, based on lookups in the TRANSFAC and JASPAR databases.

From: Word-based characterization of promoters involved in human DNA repair pathways

(a) Bidirectional
Sequence  Transcription factor  Consensus [b] (top),       Matches [c]  Avg. score [d]  Score range [e]
          (matrix ID [a])       aligned sequence (bottom)
TCGCGCCA  PF0112 [f]            KTGGCGGGAA                 4/6          89.0            86.5–96.8
                                 TGGCGCGA
TCCCGGGA  STAT5A                TTCYNRGAA                  8/16         86.7            86.7–86.7
                                TCCCGGGA
GGCCCGCC  SP1 (V$SP1_01)        DRGGCRKGSW                 8/13         90.2            86.5–90.8
                                  GGCGGGCC
TCCCGGCT  ELK1 (MA0028)         NNNMCGGAAR                 3/6          86.9            86.5–87.7
                                 AGCCGGGA
CAGGGGCC  V$WT1_Q6              SVCHCCBVC                  5/6          87.4            85.0–91.1
                                GGCCCCTG
AGGGCCGT  MYB (V$MYB_Q3)        NNNBNCMGTTN                2/7          91.2            89.8–92.6
                                 AGGGCCGT
TCTGAGGA  TFIIA (V$TFIIA_Q6)    TMTDHRAGGRVS               2/8          88.1            85.8–90.5
                                  TCTGAGGA
CGTGGGGG  E2F (V$E2F1_Q3)         BKTSSCGS                 6/6          87.3            87.3–87.3
                                CGTGGGGG
TGCTGAGA  No match.
CGCGGCCG  No match.
(b) Unidirectional
Sequence  Transcription factor  Consensus [b] (top),       Matches [c]  Avg. score [d]  Score range [e]
          (matrix ID [a])       aligned sequence (bottom)
ACCCGCCT  SP1 (V$SP1_01)        DRGGCRKGSW                 4/7          86.2            85.9–87.3
                                 AGGCGGGT
CTTCTTTC  No match.
AGGAAACA  NFAT (V$NFAT_Q4_01)   NWGGAAANWB                 5/5          87.3            85.8–88.1
                                 AGGAAACA
GCAGGGCG  PF0096 [f]            YGCANTGCR                  10/10        86.8            86.5–87.1
                                 GCAGGGCG
GGGGCTGC  LRF (V$LRF_Q2)         VDVRMCCCC                 5/8          85.4            85.4–85.4
                                GCAGCCCC
TCTTCTTC  No match.
GGGGAGTA  FOXC1 (MA0032)        NNNVNGTA                   4/4          95.5            95.5–95.5
                                GGGGAGTA
ATTAAAAT  OCT1 (V$OCT1_06)      MWNMWTKWSATRYN             4/9          86.9            86.5–87.5
                                   ATTTTAAT
CGGAAACC  AREB6 (V$AREB6_04)    VBGTTTSNN                  3/3          92.2            88.3–95.8
                                 GGTTTCCG
TGGGCGGA  GC (V$GC_01)          NNDGGGYGGRGYBD             4/5          90.3            85.1–95.2
                                  TGGGCGGA
a. JASPAR ID or TRANSFAC ID.
b. The consensus is in IUPAC notation: R = G or A, Y = T or C, M = A or C, K = G or T, W = A or T, S = G or C, B = not A, D = not C, H = not G, V = not T, N = any base.
c. Number of occurrences of the matrix that scored greater than 85% in the dataset.
d. Average score of the occurrences meeting the 85% threshold.
e. Range of scores of the occurrences meeting the 85% threshold.
f. A profile extracted from phylogenetically conserved gene upstream elements.
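
To make the IUPAC notation in note b and the aligned sequences in the table easier to read, the following is a minimal Python sketch. It is not the scoring procedure behind the table: the scores reported there come from matrix lookups, whereas this sketch only counts how many positions of an aligned word are compatible with the consensus string. The function names and the `offset` parameter are hypothetical, chosen for illustration. The code map also includes D (A, G or T), which occurs in several of the consensus strings.

```python
# IUPAC nucleotide codes mapped to the bases each code allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT",
    "W": "AT", "S": "CG",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word):
    """Return the reverse complement of a DNA word."""
    return word.translate(COMPLEMENT)[::-1]


def compatible_positions(word, consensus, offset=0):
    """Count positions where word[i] is allowed by consensus[offset + i].

    Illustrative helper (an assumption, not the paper's method): bases of
    the word that extend past the end of the consensus are ignored.
    """
    pairs = zip(word, consensus[offset:])
    return sum(base in IUPAC[code] for base, code in pairs)


# First bidirectional row: the word TCGCGCCA is shown as its reverse
# complement, shifted one position against the consensus KTGGCGGGAA.
aligned = reverse_complement("TCGCGCCA")
print(aligned)                                                   # TGGCGCGA
print(compatible_positions(aligned, "KTGGCGGGAA", offset=1))     # 7

# MYB row: AGGGCCGT aligned one position into NNNBNCMGTTN.
print(compatible_positions("AGGGCCGT", "NNNBNCMGTTN", offset=1)) # 8
```

In several rows the aligned (bottom) sequence is the reverse complement of the word in the first column, which indicates that the match was found on the opposite strand.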