All and only CpG containing sequences are enriched in promoters abundantly bound by RNA polymerase II in multiple tissues
© Rozenberg et al; licensee BioMed Central Ltd. 2008
Received: 08 October 2007
Accepted: 05 February 2008
Published: 05 February 2008
The promoters of housekeeping genes are well-bound by RNA polymerase II (RNAP) in different tissues. Although the promoters of these genes are known to contain CpG islands, the specific DNA sequences that are associated with high RNAP binding to housekeeping promoters has not been described.
ChIP-chip experiments from three mouse tissues, liver, heart ventricles, and primary keratinocytes, indicate that 94% of promoters have similar RNAP binding, ranging from well-bound to poorly-bound in all tissues. Using all 8-base pair long sequences as a test set, we have identified the DNA sequences that are enriched in promoters of housekeeping genes, focusing on those DNA sequences which are preferentially localized in the proximal promoter. We observe a bimodal distribution. Virtually all sequences enriched in promoters with high RNAP binding values contain a CpG dinucleotide. These results suggest that only transcription factor binding sites (TFBS) that contain the CpG dinucleotide are involved in RNAP binding to housekeeping promoters while TFBS that do not contain a CpG are involved in regulated promoter activity. Abundant 8-mers that are preferentially localized in the proximal promoters and exhibit the best enrichment in RNAP bound promoters are all variants of six known CpG-containing TFBS: ETS, NRF-1, BoxA, SP1, CRE, and E-Box. The frequency of these six DNA motifs can predict housekeeping promoters as accurately as the presence of a CpG island, suggesting that they are the structural elements critical for CpG island function. Experimental EMSA results demonstrate that methylation of the CpG in the ETS, NRF-1, and SP1 motifs prevent DNA binding in nuclear extracts in both keratinocytes and liver.
In general, TFBS that do not contain a CpG are involved in regulated gene expression while TFBS that contain a CpG are involved in constitutive gene expression with some CpG containing sequences also involved in inducible and tissue specific gene regulation. These TFBS are not bound when the CpG is methylated. Unmethylated CpG dinucleotides in the TFBS in CpG islands allow the transcription factors to find their binding sites which occur only in promoters, in turn localizing RNAP to promoters.
The promoter region of genes is typically divided into two regions: the core or basal promoter region and the proximal promoter. The core promoter region stretches from around -50 bp to +20 bp and is the location in the promoter where the pre-initiation complex forms and the general transcriptional machinery assembles, including RNA polymerase II (RNAP). The proximal promoter extends from -200 bp to the transcriptional start site (TSS) and contains transcription factor binding sites (TFBS) that are critical for the recruitment of RNA polymerase II (RNAP) to DNA [2–4]. In mammalian genomes, the CpG dinucleotide occurs at 20% of the expected frequency  and is typically methylated both in cell cuture and animal tissues [6, 7]. The exception is in CpG islands. CpG islands are defined as regions in the DNA at least 200 bp long where C+G comprise more than 50% of the nucleotides and CpG dinucleotides occur at greater than 60% the expected frequency (this represents roughly 8 or more CpGs in 200 bp) . The presence of CpG islands is associated with gene regulatory regions  and in the promoters of genes generally correlates with binding by RNA polymerase II (RNAP) . Promoters of housekeeping genes are constitutively bound by RNAP in all tissues while regulated promoters, either tissue specific or inducible, are selectively bound by RNAP in only certain tissue(s) or contexts respectively .
Three advances allow us to interrogate the genome-wide properties of promoters. First is the availability of complete genomic sequences. Second is the determination of full-length cDNAs that can identify the TSS and proximal promoter . Third is the determination of the chromatin architecture of the genome by the identification of hypersensitive sites [11, 12] or the location of particular proteins or their modified forms using chromatin immunoprecipitation followed by microarray analysis (ChIP-chip) . Although ChIP-chip experiments have identified the location of RNAP and components of the preinitiation complex in particular tissues [9, 14], these experiments have not been done systematically over a range of tissues.
We show that all and only CpG containing DNA sequences are associated with RNAP binding to the same promoter in multiple tissues. Many DNA sequences are more abundant near the TSS than elsewhere [15–18] and the six most abundant CpG containing sequences that are localized in proximal promoters are known TFBS and can predict RNAP binding to housekeeping promoters with similar accuracy as the presence of CpG islands.
Results and discussion
Binding of RNAP and H3K9me2 to mouse promoters in keratinocytes, liver, and heart ventricles
To identify promoters that had similar RNAP binding values in all three tissues, we excluded genes where RNAP binding between any pair of tissues was significantly different. This excluded 534 tissue-specific (356 in liver, 131 in heart, and 47 in keratinocytes) promoters, and 395 with high RNAP binding in two of the three tissues. The remaining 13,861 promoters (94%) have similar RNAP binding in all three tissues, some being well bound by RNAP and others having little RNAP at the promoter (Figure 1D–F). For each of these 13,861 promoters, termed common RNAP promoters, RNAP binding values from the three tissues were normalized and averaged, producing a single number representing RNAP binding to a promoter across the three tissues.
To investigate the DNA sequence properties of the 13,861 common promoters (-1,000 bp to +500 bp) and determine potential transcription factor binding sites (TFBS) that are responsible for RNAP binding we analyzed the occurrences of 8 bp-long DNA sequences (8-mers) in common RNAP promoters. 8-mers were chosen because their length is similar to that of known TFBS. 8-mers were counted on the sense and anti-sense strands because, with the exception of TATA , 8-mers are not restricted to a single strand. Of all 32,896 8-mers (38% contain CpG) we extensively characterized the 12,208 most abundant 8-mers (see materials and methods) of which only 20% contained a CpG highlighting that the CpG dinucleotide is underrepresented even in promoter regions .
All 8-mers enriched in promoters well bound by RNAP in multiple tissues contain a CpG dinucleotide
The 8-mers with and without a CpG were plotted separately to highlight the few 8-mers that are the exception to the general conclusion that only CpG containing sequences are associated with RNAP binding to a promoters (Figure 3D–E). The most notable exception is the GACCAATC 8-mer, a CCAAT sequence that is enriched in housekeeping promoters.
Previous work indicated that ~50% of human promoters bound by RNAP contain the INR and DPE consensus sequences between -200 bp and +200 bp . To see if these non-CpG-containing sequences were also exceptions to our general conclusion, we calculated the association-with-RNAP and association-with-H3K9me2 for TATA, INR and DPE in the set of promoters with similar RNAP binding values in the three tissues we have examined. This was accomplished by averaging the binding values of those promoters that contained the consensus sequence at the expected position . In mouse, the consensus TATA is uniquely positioned in only 1.8% of promoters and has a very high association-with-H3K9me2 binding to promoters. The INR was uniquely positioned in only 9% of promoters and is associated with H3K9me2 bound promoters. DPE is not uniquely positioned in promoters, but occurs in 19% of promoters at the expected location and is also associated with H3K9me2 binding (Figure 3C). This suggests that TATA, INR and the DPE are not important for RNAP binding to promoters in multiple tissues. Presumably these sequences are important for tissue-specific gene expression.
CpG sequences are also associated with mRNA expression
Sequences most enriched in tissue-specific promoters do not contain a CpG
Non-random distribution of 8-mers in promoters
Association of the 10 localized motifs with RNAP binding and mRNA expression.
CpG islands can be defined by two or more of the six CpG containing TFBS
Nuclear extracts do not bind TFBS with a methylated CpG
Methylation of CpG dinucleotides in CpG islands inhibits promoter activity and occurs in many cancers where the oncogenic event is the transcriptional suppression of tumor suppressor genes . One simple explanation is that CpG methylation inhibits TFs from binding their TFBS resulting in promoter inactivity. A more prevalent, but not mutually exclusive view suggests that a more active mechanism is functioning in which methyl binding proteins bind methylated CpGs to facilitate chromatin mediated occlusion of the promoter [30, 31]. The effect of CpG methylation on the function of five of the six CpG containing TFBS (DNA binding and/or transcriptional potential) that localize in the proximal promoter has been described. The one exception is BoxA, for which the effect of CpG methylation on DNA binding has not been reported in the literature. In general, methylation inhibits the activity of CpG containing TFBS . CpG methylation is reported to inhibit the function of a CRE , ETS , NRF-1 , and E-Box . Other CpG containing motifs are also inhibited via methylation including AP2  and CTCF . Methylation of the CpG in the SP1 motif, the most abundant CpG containing motif, is reported to either not affect DNA binding [39–41], affect binding when a cytosine flanking the CpG is methylated [41, 42] or inhibit binding .
We identified promoters that are bound similarly by RNAP in multiple tissues and determined the association between the presence of 8-mers in these promoters and the extent of RNAP binding to the promoter. Looking at RNAP binding to housekeeping promoters, we observed a bimodal distribution: only 8-mers with the CpG dinucleotide are in the class of sequences most associated with RNAP binding and only 8-mers without a CpG are in the class least associated with RNAP binding. An implication of this observation is that knowing if a TFBS contains a CpG reveals aspects of its biological function. If the TFBS contains a CpG, it is involved in constitutive gene expression and if the TFBS does not contain a CpG, it is involved in regulated gene expression. This insight will help identify potential functions for transcription factors when their TFBS is identified. Additionally, if a transcription factor shows degeneracy in its TFBS [44, 45], binding to a CpG sequence and a similar sequence without a CpG, it suggests that this transcription factor is involved in both constitutive and regulated gene expression. This is observed for the CRE and CRE-T sequences, two sequences that are localized in the proximal promoter and differ by a single base: CRE contains a CpG (TGACGTCA) while CRE-T does not (TGAT GTCA). The CREB protein binds both sequences well (data not shown) but the two sequences correlate very differently with RNAP binding suggesting that the CREB transcription factor can regulate either constitutive gene expression by binding the CRE sequence or regulated gene expression by binding the CRE-T sequence.
In vertebrates CpG dinucleotides are rare and usually are methylated on the cytosine but do occur at close to the expected frequency in clusters called CpG islands where the CpGs remain unmethylated [30, 46]. These CpG islands often occur in promoters of housekeeping genes [8, 9]. We show that the presence of two or more of any of the six CpG containing TFBS (SP1, ETS, NRF-1, CRE, E-Box, and BoxA) in the proximal promoter can predict RNAP binding to housekeeping promoters as accurately as the presence of a CpG island in the proximal promoter.
Methylation of the CpG in the TFBS has been found to inhibit the DNA binding for five of the six TFBS that are abundant and localize in proximal promoters suggesting this may be a general result for CpG containing TFBS. Methylation dependent inhibition of transcription factor binding to DNA has two implications. First, the transcription factors that are critical for the activation of housekeeping genes solve the problem of finding their TFBS in the genome by only binding to unmethylated TFBS. Since most CpGs in the genome are methylated, the only places these transcription factors can bind are in the unmethylated CpG islands in promoters. Second, the pathological methylation of CpG dinucleotides in CpG islands, as occurs in many cancers , would prevent these abundant transcription factors from binding their TFBS thus causing the promoters to become inactive. This could be a critical initial step that subsequently allows CpG methyl binding proteins to bind to methylated CpGs and actively repress a promoter .
Mouse (Mus musculus) annotation data and genomic DNA sequences for the region -1,000 bp to +500 bp, relative to the annotated transcription start site (TSS), were downloaded from the UCSC Genome Browser site (version mm5, May 2004). This dataset contains the putative promoter regions of 26,000 genes that are represented on the MM5 minimum promoter mouse Nimblegen ChiP-chip array. However, since the TSS for many of these genes is poorly annotated (e.g. the TSS is the same as the translation start), we refined this dataset to include only those genes where the distance between the TSS to the translation start (ATG) was greater than 30 nucleotides. This reduced the total number of putative promoter regions to 15,180. We further reduced this number by excluding promoter with gaps greater than 200 bps and the blastclust program was used to confirm that this dataset did not contain multiple copies of the same DNA sequences resulting in 14,790 promoters.
The 14,790 analyzed promoters are a biased subset of the 26,000 promoters on the ChIP-chip array. The annotated promoters are enriched 1.3 fold for the 20% of promoters best bound by RNAP and depleted by 2 fold for H3K9me2 bound promoters. This could reflect that the H3K9me2 genes are not universally expressed and full-length cDNA data does not exist for them, preventing identification of a TSS.
Clustering Factor (CF) calculation
To determine if a DNA sequence has a non-random distribution (i.e. clustered), we used an automated method of detecting and quantifying peak height as described previously . Abundant 8-mers contained 20 or more members in a 20 base pair window in the 14,970 examined promoters.
Cultures of primary keratinocytes
Primary keratinocytes were isolated from newborn FVB mice epidermis . Primary keratinocytes were seeded at a density of 0.6 pelt or 5 × 106 cells per 100-mm dish in Ca+2 and Mg+2 free EMEM (Cambrex Bio Science Walkersville, Inc), supplemented with 8% Chelex (Bio-Rad, Richmond, CA) treated FBS (Atlanta Biologicals, Inc), 0.2 mM Ca2+ and Antibiotic-antimycotic. After 20 h, cultures were washed with PBS and switched to the same medium containing 0.05 mM Ca+2. After three days cells were used for ChIP.
Liver and heart samples
Tissues from 5 adult FVB mice were frozen and ground in fine powder in liquid nitrogen. After nitrogen evaporation, samples were moved into a 50 ml conical tube and 10 mls of 1% formaldehyde in PBS was added and samples incubated for 10 minutes at 37°C with vortexing. 125 mM glycine was added for 5 minutes, cells were washed in PBS with 1 mM PMSF once, dounced in Lyzis buffer (5 mM PIPES pH 8.0 85 mM KCL 0.5% NP40 1 mM NF 1 mM NaVa Roche protease inhibitors cocktail) and re-suspended in 200 μ l Nuclear lysis buffer (50 mM Tris-Cl pH 8.1 10 mM EDTA, 1% SDS proteases and phosphates inhibitors as above). DNA was sheared by sonication to yield fragments from 3,000 to 300 bp. Samples were centrifuged and supernatants were diluted 6 times (0.01% SDS, 1.1% Trition × 100, 1.2 mM EDTA, 16.7 mM Tris-Cl pH 8.1, 167 mM NaCl) and used for ChIP.
Chromatin immunoprecipitation (ChIP) was performed using antibodies against RNAP from Covance, (8WG16) that recognizes the unphosphorylated form of RNAP, H3K9me2 from Upstate (07–441), and CREB using a mixture of antibodies from Santa Cruz (sc-186) and Upstate (06–863), c-Jun from Santa Cruz (sc-1694). The ChIP protocol was from P. Farnham [19, 48]. For immunoprecipitation, we used protein G agarose beads (Invitrogen). Starting with 2 × 106 cells, we typically isolate 1 ng of ChIP DNA for RNAP and 5 ng for histone H3K9me2.
ChIP DNA amplification and hybridization
Protocol for random DNA amplification [49, 50] was adapted from DeRisi lab. We used primers conjugated with Cy3 or Cy5. After amplification 10–15 ug of DNA was purified using Quiagen PCR purification Kit, concentrated by isopropanol precipitation and dried for 5 min under vacuum. DNA was dissolved in 3 μ l water, mixed with Component A and Hybridization buffer (Nimblegen) according to manufacturer instructions. Amplified ChIP DNA was hybridized to Nimblegen MM5 min Mouse promoter microarrays containing 400,000 oligos interrogating 26,000 promoters. Arrays were washed in 45C 0.2%SDS, 0.2%SSC for 15 sec, in the same buffer at room temperature for 2 min, 0.2%SSC for one minute, 0.05% SSC for 15 sec. Arrays were dried by centrifugation and scanned using Axon 4000B scanner. Images were processed with NIMBLESCAN (Nimblegen) using default settings. Average of enrichment for fifteen spots representing one promoter were used as a measure of "binding" for a protein. We averaged binding of RNAP and H3K9me2 from two independent hybridizations for each tissue using independent biological samples. Correlation coefficients for keratinocytes replicates were: RNAP – 0.79, H3K9me2 – 0.67 and for RNAP ChIP's from liver samples: 0.86; heart samples: 0.83.
Electrophoretic Mobility Shift Assay (EMSA)
Following PAGE purified 28 base pairs long oligonucleotides, the sense strand, with their complimentary strands were purchased from Sigma-genosys (USA).
SP-1: GTCAGTCAGGGGG(C/C m )GGGG CATCGGTCAG
ETS: GTCAGTCAGAC(C/C m )GGAAGT TATCGGTCAG
NRF-1: GTCAGTCAGA(C/C m )GCCTG(C/C m )G TATCGGTCAG
A single consensus binding site for each transcription factor containing either nonmethylated (C) or methylated cytosine (Cm) (1 methyleted cytosine in SP-1 and ETS and 2 in NRF-1) is underlined. Sense strands of non-methylated and methylated oligos were end labeled with (γ32P) ATP (5000 mCi/mmol; MP Biomedical) using T4 PNK enzyme (New England Biolabs). Equimolar labeled sense and complimentary cold anti-sense oligos were annealed by heating the mixture in annealing buffer to 65°C for 15 minutes and snap cooling it on ice for 2 minutes followed by incubation at room temperature for 15 min. Annealing resulted in four types of labeled double stranded oligos (1 non-methylated, 2 hemi-methylated oligos and 1 methylated oligo) and these were used for EMSA.
Nuclear extract was prepared from mouse liver and cultured mouse primary keratinocytes . In 20 μ l of reaction sample, 7 pg of labeled oligonucleotide (50,000 cpm) was added to 5 μ g of nuclear extract, and incubated in binding buffer (10 mM HEPES, 80 mM KCl, 0.05 mM EDTA, 6% glycerol, 1 mM DTT and 1 mM MgCl2) at 37°C for 20 min. Samples were separated on a 5% native PAGE gel in 0.25 × TBE at 150 V for 1.5 hrs. Gels were dried and exposed for autoradiography. For EMSA involving E. coli HU protein, a kind gift from Shankar Adhya, 30 nM of HPLC purified recombinant HU was incubated in binding buffer (25 mM Tris-HCl pH 8.0, 50 mM KCl, 0.5 mM EDTA, 2.5 mM DTT, 1 μ g BSA) with 7 pg of labeled double stranded oligo in a total volume of 20 μ l and complex was separated on 7.5% native page (0.25 × TBE, 150 V, 1.5 Hrs), dried and autoradiographed.
Where p is the promoter in question. ∂8pis equal to one if the 8-mer occurs in the promoter sequence and zero otherwise. Summing over p implies summing over all the promoters in the set in question.
Promoters with similar RNAP binding
In order to assure that the rotation was robust and not heavily influenced by outliers in the data set, we temporarily removed data more than one standard deviation from the original best fit line. If the best-fit line of the transposed data still maintained its 45-degree angle within some small error range, we concluded the data was successfully rotated. If not, then we repeated the procedure using the new rotated values and only those points within one standard deviation of the best-fit line to determine the new rotation angle and intercept adjustment. This was repeated until the best-fit line did not significantly alter with the removal of data points more than one standard deviation from 45 degree line.
In our case we had RNAP binding values for three distinct tissues: primary mouse keratinocytes, heart ventricle, and liver. We knew that the results are similar in all three tissues, with the exception of genes involved with tissue-specific expression in those tissues. We rotated the data by pairs in the method described above. This took several iterations since the rotation of one pair might affect the values of another pair. The end result was new "rotated binding values" for the promoters in each of the three tissues. These values were then averaged to produce the "Average RNAP binding" of that promoter in all three tissues.
Determining Tissue Specific Promoters
Promoters which were more than two standard deviations off of the 45-degree best-fit line (as determined above) through any of the three pair of data (liver-heart, liver-keratinocytes, and heart-keratinocytes), were considered "tissue-specific" (not commonly bound). Of our original set of 14,790 promotes, 929 were not commonly bound by RNAP in all three tissues, leaving 13,861 promoters which were commonly bound in all three tissues. Of 929 promoters that were not commonly bound by RNAP, tissue specific promoters were selected based on following criteria using the raw RNAP binding values:
356 liver specific promoters: L > 1.5 × H, L > 1.5 × K, H< 1.5 (raw RNAP binding value), K < 1.5
131 heart specific promoters: H > 1.3 × L, H > 1.3 × K, L < 1.5, K < 1.5
47 keratinocytes specific promoters: K > 1.5 × L, K > 1.5 × H, H < 1.5, L < 1.5
Where L stands for RNAP binding value in liver, H is RNAP binding in heart and K – RNAP binding in keratinocytes.
We thank Brian Oliver and Dave Leven for comments on an earlier version of this manuscript, Yatrik Shah for the cross-linked liver sample and Shankar Adhya for the purified HU protein.
- Web site of the Charles Vinson laboratory. [http://home.ccr.cancer.gov/metabolism/vinson/vinsonccr.htm]
- Smale ST, Kadonaga JT: The RNA polymerase II core promoter. Annu Rev Biochem. 2003, 72: 449-479. 10.1146/annurev.biochem.72.121801.161520.PubMedView ArticleGoogle Scholar
- Maston GA, Evans SK, Green MR: Transcriptional Regulatory Elements in the Human Genome. Annu Rev Genomics Hum Genet. 2006Google Scholar
- Heintzman ND, Ren B: The gateway to transcription: identifying, characterizing and understanding promoters in the eukaryotic genome. Cell Mol Life Sci. 2007, 64 (4): 386-400. 10.1007/s00018-006-6295-0.PubMedView ArticleGoogle Scholar
- Swartz MN, Trautner TA, Kornberg A: Enzymatic synthesis of deoxyribonucleic acid. XI. Further studies on nearest neighbor base sequences in deoxyribonucleic acids. J Biol Chem. 1962, 237: 1961-1967.PubMedGoogle Scholar
- Bird A, Taggart M, Frommer M, Miller OJ, Macleod D: A fraction of the mouse genome that is derived from islands of nonmethylated, CpG-rich DNA. Cell. 1985, 40 (1): 91-99. 10.1016/0092-8674(85)90312-5.PubMedView ArticleGoogle Scholar
- Bird A: DNA methylation patterns and epigenetic memory. Genes Dev. 2002, 16 (1): 6-21. 10.1101/gad.947102.PubMedView ArticleGoogle Scholar
- Gardiner-Garden M, Frommer M: CpG islands in vertebrate genomes. J Mol Biol. 1987, 196 (2): 261-282. 10.1016/0022-2836(87)90689-9.PubMedView ArticleGoogle Scholar
- Kim TH, Barrera LO, Zheng M, Qu C, Singer MA, Richmond TA, Wu Y, Green RD, Ren B: A high-resolution map of active promoters in the human genome. Nature. 2005, 436 (7052): 876-880. 10.1038/nature03877.PubMedPubMed CentralView ArticleGoogle Scholar
- Carninci P, Sandelin A, Lenhard B, Katayama S, Shimokawa K, Ponjavic J, Semple CA, Taylor MS, Engstrom PG, Frith MC, et al: Genome-wide analysis of mammalian promoter architecture and evolution. Nat Genet. 2006, 38 (6): 626-635. 10.1038/ng1789.PubMedView ArticleGoogle Scholar
- Sabo PJ, Humbert R, Hawrylycz M, Wallace JC, Dorschner MO, McArthur M, Stamatoyannopoulos JA: Genome-wide identification of DNaseI hypersensitive sites using active chromatin sequence libraries. Proc Natl Acad Sci USA. 2004, 101 (13): 4537-4542. 10.1073/pnas.0400678101.PubMedPubMed CentralView ArticleGoogle Scholar
- Crawford GE, Holt IE, Whittle J, Webb BD, Tai D, Davis S, Margulies EH, Chen Y, Bernat JA, Ginsburg D, et al: Genome-wide mapping of DNase hypersensitive sites using massively parallel signature sequencing (MPSS). Genome Res. 2006, 16 (1): 123-131. 10.1101/gr.4074106.PubMedPubMed CentralView ArticleGoogle Scholar
- Ren B, Robert F, Wyrick JJ, Aparicio O, Jennings EG, Simon I, Zeitlinger J, Schreiber J, Hannett N, Kanin E, et al: Genome-wide location and function of DNA binding proteins. Science. 2000, 290 (5500): 2306-2309. 10.1126/science.290.5500.2306.PubMedView ArticleGoogle Scholar
- Barski A, Cuddapah S, Cui K, Roh TY, Schones DE, Wang Z, Wei G, Chepelev I, Zhao K: High-resolution profiling of histone methylations in the human genome. Cell. 2007, 129 (4): 823-837. 10.1016/j.cell.2007.05.009.PubMedView ArticleGoogle Scholar
- FitzGerald PC, Shlyakhtenko A, Mir AA, Vinson C: Clustering of DNA sequences in human promoters. Genome Res. 2004, 14 (8): 1562-1574. 10.1101/gr.1953904.PubMedPubMed CentralView ArticleGoogle Scholar
- Bina M, Wyss P, Ren W, Szpankowski W, Thomas E, Randhawa R, Reddy S, John PM, Pares-Matos EI, Stein A, et al: Exploring the characteristics of sequence elements in proximal promoters of human genes. Genomics. 2004, 84 (6): 929-940. 10.1016/j.ygeno.2004.08.013.PubMedView ArticleGoogle Scholar
- Marino-Ramirez L, Spouge JL, Kanga GC, Landsman D: Statistical analysis of over-represented words in human promoter sequences. Nucleic Acids Res. 2004, 32 (3): 949-958. 10.1093/nar/gkh246.PubMedPubMed CentralView ArticleGoogle Scholar
- Xie X, Lu J, Kulbokas EJ, Golub TR, Mootha V, Lindblad-Toh K, Lander ES, Kellis M: Systematic discovery of regulatory motifs in human promoters and 3' UTRs by comparison of several mammals. Nature. 2005, 434 (7031): 338-345. 10.1038/nature03441.PubMedPubMed CentralView ArticleGoogle Scholar
- Weinmann AS, Farnham PJ: Identification of unknown target genes of human transcription factors using chromatin immunoprecipitation. Methods. 2002, 26 (1): 37-47. 10.1016/S1046-2023(02)00006-3.PubMedView ArticleGoogle Scholar
- Ptashne M, Gann A: Transcriptional activation by recruitment. Nature. 1997, 386 (6625): 569-577. 10.1038/386569a0.PubMedView ArticleGoogle Scholar
- Schubeler D, MacAlpine DM, Scalzo D, Wirbelauer C, Kooperberg C, van Leeuwen F, Gottschling DE, O'Neill LP, Turner BM, Delrow J, et al: The histone modification pattern of active genes revealed through genome-wide chromatin analysis of a higher eukaryote. Genes Dev. 2004, 18 (11): 1263-1271. 10.1101/gad.1198204.PubMedPubMed CentralView ArticleGoogle Scholar
- Guenther MG, Levine SS, Boyer LA, Jaenisch R, Young RA: A chromatin landmark and transcription initiation at most promoters in human cells. Cell. 2007, 130 (1): 77-88. 10.1016/j.cell.2007.05.042.PubMedPubMed CentralView ArticleGoogle Scholar
- Fitzgerald PC, Sturgill D, Shyakhtenko A, Oliver B, Vinson C: Comparative genomics of Drosophila and human core promoters. Genome Biol. 2006, 7 (7): R53-10.1186/gb-2006-7-7-r53.PubMedPubMed CentralView ArticleGoogle Scholar
- Noma K, Allis CD, Grewal SI: Transitions in distinct histone H3 methylation patterns at the heterochromatin domain boundaries. Science. 2001, 293 (5532): 1150-1155. 10.1126/science.1064150.PubMedView ArticleGoogle Scholar
- Litt MD, Simpson M, Gaszner M, Allis CD, Felsenfeld G: Correlation between histone lysine methylation and developmental changes at the chicken beta-globin locus. Science. 2001, 293 (5539): 2453-2455. 10.1126/science.1064413.PubMedView ArticleGoogle Scholar
- GNF Genome Informatics Applications & Datasets. [http://wombat.gnf.org]
- Bajic VB, Tan SL, Christoffels A, Schonbach C, Lipovich L, Yang L, Hofmann O, Kruger A, Hide W, Kai C, et al: Mice and men: their promoter properties. PLoS Genet. 2006, 2 (4): e54-10.1371/journal.pgen.0020054.PubMedPubMed CentralView ArticleGoogle Scholar
- Lonze BE, Ginty DD: Function and regulation of CREB family transcription factors in the nervous system. Neuron. 2002, 35 (4): 605-623. 10.1016/S0896-6273(02)00828-0.PubMedView ArticleGoogle Scholar
- Impey S, McCorkle SR, Cha-Molstad H, Dwyer JM, Yochum GS, Boss JM, McWeeney S, Dunn JJ, Mandel G, Goodman RH: Defining the CREB regulon: a genome-wide analysis of transcription factor regulatory regions. Cell. 2004, 119 (7): 1041-1054.PubMedGoogle Scholar
- Jones PA, Baylin SB: The epigenomics of cancer. Cell. 2007, 128 (4): 683-692. 10.1016/j.cell.2007.01.029.PubMedPubMed CentralView ArticleGoogle Scholar
- Bird AP, Wolffe AP: Methylation-induced repression – belts, braces, and chromatin. Cell. 1999, 99 (5): 451-454. 10.1016/S0092-8674(00)81532-9.PubMedView ArticleGoogle Scholar
- Tate PH, Bird AP: Effects of DNA methylation on DNA-binding proteins and gene expression. Curr Opin Genet Dev. 1993, 3 (2): 226-231. 10.1016/0959-437X(93)90027-M.PubMedView ArticleGoogle Scholar
- Weih F, Nitsch D, Reik A, Schutz G, Becker PB: Analysis of CpG methylation and genomic footprinting at the tyrosine aminotransferase gene: DNA methylation alone is not sufficient to prevent protein binding in vivo. Embo J. 1991, 10 (9): 2559-2567.PubMedPubMed CentralGoogle Scholar
- Gaston K, Fried M: CpG methylation and the binding of YY1 and ETS proteins to the Surf-1/Surf-2 bidirectional promoter. Gene. 1995, 157 (1–2): 257-259. 10.1016/0378-1119(95)00120-U.PubMedView ArticleGoogle Scholar
- Choi YS, Kim S, Kyu Lee H, Lee KU, Pak YK: In vitro methylation of nuclear respiratory factor-1 binding site suppresses the promoter activity of mitochondrial transcription factor A. Biochem Biophys Res Commun. 2004, 314 (1): 118-122. 10.1016/j.bbrc.2003.12.065.PubMedView ArticleGoogle Scholar
- Prendergast GC, Ziff EB: Methylation-sensitive sequence-specific DNA binding by the c-Myc basic region. Science. 1991, 251 (4990): 186-189. 10.1126/science.1987636.PubMedView ArticleGoogle Scholar
- Comb M, Goodman HM: CpG methylation inhibits proenkephalin gene expression and binding of the transcription factor AP-2. Nucleic Acids Res. 1990, 18 (13): 3975-3982. 10.1093/nar/18.13.3975.PubMedPubMed CentralView ArticleGoogle Scholar
- Bell AC, Felsenfeld G: Methylation of a CTCF-dependent boundary controls imprinted expression of the Igf2 gene. Nature. 2000, 405 (6785): 482-485. 10.1038/35013100.PubMedView ArticleGoogle Scholar
- Harrington MA, Jones PA, Imagawa M, Karin M: Cytosine methylation does not affect binding of transcription factor Sp1. Proc Natl Acad Sci USA. 1988, 85 (7): 2066-2070. 10.1073/pnas.85.7.2066.PubMedPubMed CentralView ArticleGoogle Scholar
- Holler M, Westin G, Jiricny J, Schaffner W: Sp1 transcription factor binds DNA and activates transcription even when the binding site is CpG methylated. Genes Dev. 1988, 2 (9): 1127-1135. 10.1101/gad.2.9.1127.PubMedView ArticleGoogle Scholar
- Clark SJ, Harrison J, Molloy PL: Sp1 binding is inhibited by (m)Cp(m)CpG methylation. Gene. 1997, 195 (1): 67-71. 10.1016/S0378-1119(97)00164-9.PubMedView ArticleGoogle Scholar
- Zhu WG, Srinivasan K, Dai Z, Duan W, Druhan LJ, Ding H, Yee L, Villalona-Calero MA, Plass C, Otterson GA: Methylation of adjacent CpG sites affects Sp1/Sp3 binding and activity in the p21(Cip1) promoter. Mol Cell Biol. 2003, 23 (12): 4056-4065. 10.1128/MCB.23.12.4056-4065.2003.PubMedPubMed CentralView ArticleGoogle Scholar
- Mancini DN, Singh SM, Archer TK, Rodenhiser DI: Site-specific DNA methylation in the neurofibromatosis (NF1) promoter interferes with binding of CREB and SP1 transcription factors. Oncogene. 1999, 18 (28): 4108-4119. 10.1038/sj.onc.1202764.PubMedView ArticleGoogle Scholar
- Berger MF, Philippakis AA, Qureshi AM, He FS, Estep PW, Bulyk ML: Compact, universal DNA microarrays to comprehensively determine transcription-factor binding site specificities. Nat Biotechnol. 2006, 24 (11): 1429-1435. 10.1038/nbt1246.PubMedPubMed CentralView ArticleGoogle Scholar
- Puckett JW, Muzikar KA, Tietjen J, Warren CL, Ansari AZ, Dervan PB: Quantitative Microarray Profiling of DNA-Binding Molecules. J Am Chem Soc. 2007, 129 (40): 12310-12319. 10.1021/ja0744899.PubMedPubMed CentralView ArticleGoogle Scholar
- Bird AP: CpG-rich islands and the function of DNA methylation. Nature. 1986, 321 (6067): 209-213. 10.1038/321209a0.PubMedView ArticleGoogle Scholar
- Dlugosz AA, Glick AB, Tennenbaum T, Weinberg WC, Yuspa SH: Isolation and utilization of epidermal keratinocytes for oncogene research. Methods Enzymol. 1995, 254: 3-20.PubMedView ArticleGoogle Scholar
- The Farnham laboratory. [http://www.genomecenter.ucdavis.edu/farnham/]
- Lippman Z, Gendrel AV, Colot V, Martienssen R: Profiling DNA methylation patterns using genomic tiling microarrays. Nat Methods. 2005, 2 (3): 219-224. 10.1038/nmeth0305-219.PubMedView ArticleGoogle Scholar
- Round A/B/C Random Amplification of DNA Protocol. [http://cat.ucsf.edu/pdfs/22_Round_A_B_C_protocol.pdf]
- Gerdes MJ, Myakishev M, Frost NA, Rishi V, Moitra J, Acharya A, Levy MR, Park SW, Glick A, Yuspa SH, et al: Activator protein-1 activity regulates epithelial tumor cell identity. Cancer Res. 2006, 66 (15): 7578-7588. 10.1158/0008-5472.CAN-06-1247.PubMedView ArticleGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.