YY1's DNA-Binding motifs in mammalian olfactory receptor genes
© Faulk and Kim; licensee BioMed Central Ltd. 2009
Received: 21 April 2009
Accepted: 3 December 2009
Published: 3 December 2009
YY1 is an epigenetic regulator for a large number of mammalian genes. While performing genome-wide YY1 binding motif searches, we discovered that the olfactory receptor (OLFR) genes have an unusual cluster of YY1 binding sites within their coding regions. The statistical significance of this observation was further analyzed.
About 45% of the olfactory genes in the mouse have a range of 4-8 YY1 binding sites within their respective 1 kb coding regions. Statistical analyses indicate that this enrichment of YY1 motifs has likely been driven by unknown selection pressures at the DNA level, but not serendipitously by some peptides enriched within the OLFR genes. Similar patterns are also detected in the OLFR genes of all mammals analyzed, but not in the OLFR genes of the fish lineage, suggesting a mammal-specific phenomenon.
YY1, or YY1-related transcription factors, may help regulate olfactory receptor genes. Furthermore, the protein-coding regions of vertebrate genes can contain cis-regulatory elements for transcription factor binding as well as codons.
The transcription factor YY1 is a Gli-Kruppel type zinc finger protein that is highly conserved from insects through vertebrates. YY1 can function as an activator, repressor, or initiator depending upon the other regulatory elements in the region. YY1 also interacts with a variety of proteins, including components of the RNA polymerase II complex, transcription factors, and histone-modifying complexes [3–5]. According to genome-wide surveys, about 10% of all human genes contain YY1 binding motifs in their promoter regions. Functionally, YY1 is involved in many biological processes, including embryonic development, cell cycle progression, apoptosis, B cell development, Polycomb group Gene (PcG)-mediated repression, genomic imprinting, and X chromosomal inactivation [2, 3, 7]. YY1 was initially identified as a factor controlling the transcriptional activity of the murine retrotransposon 'Intracisternal A Particle'. Since then, many retroposons, including SINE, LINE, and endogenous retrovirus families, have been shown to contain YY1 binding sites in their promoter regions [3, 4]. Due to this ubiquitous presence of YY1 binding sites in genome-wide repeats, YY1 has also been regarded as a surveillance gene that is responsible for repressing transcriptional background noise from these repeats.
The olfactory receptor (OLFR) genes of mammals encode short G protein-coupled receptors, each with a single coding exon, that are responsible for sensing a large number of air-borne scents. This gene family comprises over 800 and 1,300 members in human and mouse, respectively, forming the largest gene family in mammalian genomes [10–13]. The aquatic vertebrates, the teleost fish lineage, also have a similar odorant receptor gene family. However, the odorant receptor (OR) family of the fish lineage consists of a much smaller number of genes than that of the mammals, and these OR genes are also much more diverse in sequence identity than those of mammals. Mammalian OLFR genes are divided into Class I and Class II groups based on sequence identity. Class II genes make up ~90% of OLFRs and are thought to have expanded during the transition to land-based living.
In mammals, these olfactory receptors presumably expanded due to the selective advantage conferred by a well-developed sense of smell. While mice and other mammals retain function and expression of almost all OLFRs, the majority of these are pseudogenized in humans. The mammalian OLFR genes are highly tissue-specific and are expressed primarily in the olfactory epithelium, although a subset is expressed in a chemosensory role in other tissues such as kidney and sperm [17–19]. Furthermore, only one copy (allele) out of all 1,000 OLFR genes is selected and expressed in each neuron of the olfactory tissue. The unusual transcriptional control of the OLFR gene family is likely mediated through unknown trans-acting factors. The tissue-specific nature of their expression, coupled with their widespread duplication, requires a mitotically stable global silencing mechanism in all cell types. According to recent studies, potential cis-regulatory elements recruiting these trans factors are hypothesized to be located within the protein-coding regions of the OLFR genes rather than in their surrounding genomic regions [22, 23].
While performing genome-wide searches of the DNA-binding motifs of YY1, we discovered that the mammalian OLFR genes contain unusual clusters of YY1 binding sites within their protein-coding regions, whereas most YY1 binding sites are solitary and located upstream of a regulated gene. In this study, we further analyzed the significance of this discovery with several bioinformatic and statistical measures, which are described below. Specifically, we test whether the presence of the YY1 binding sites could be explained by DNA sequence or amino acid motif conservation.
YY1 DNA-binding motifs in the mammalian OLFR genes
We also repeated the above analyses using the mRNA data sets derived from human, cow, and zebrafish transcriptomes to test the evolutionary conservation of this unusual clustering of YY1 binding motifs within OLFR genes (Figure 2c & 2d). The results from the human and cow (data not shown) mRNA data sets showed a pattern consistent with the mouse data set: an unusual clustering of YY1 binding motifs within many OLFR genes. In humans, 57% (215 of 380) of OLFRs contain more than 4 YY1 binding sites, while the percentage in cow rose to 82% (740 of 900). It is important to note that the total number of human OLFR genes (380) in the figure is smaller than those of the other mammals, since a large fraction of the human OLFR genes are known to have become pseudogenes in recent evolutionary times and we removed all genes annotated as hypothetical. In contrast, the OLFR genes of zebrafish do not show a pattern similar to that of mammals. The 25 OLFR genes of zebrafish show as wide a range of YY1 density scores as seen in the non-OLFR genes (Figure 2d). These results confirm that the unusual clustering of YY1 binding motifs within the OLFR genes is a feature of the mammalian lineage.
Statistical tests for the clustering of YY1 binding motifs within OLFR genes
Dipeptide frequency in olfactory receptor proteins*
In the current study, we have shown an unusual enrichment of YY1 DNA-binding motifs in the OLFR gene family of mammals. About half of the members of the OLFR gene family have a range of 4-8 YY1 binding motifs within their protein-coding regions (Figures 1&2). Statistical analyses further confirmed that this enrichment of YY1 binding motifs is consistent with functional relevance and has likely been driven by unknown selection pressure at the DNA level. Overall, the current study suggests a potential role of YY1 or YY1-related transcription factors in the regulation of the mammalian OLFR genes. Also, this study provides further evidence that the protein-coding regions of vertebrate genes can both encode codon information and contain cis-regulatory elements for transcription factor binding.
The mammalian OLFR genes are expressed primarily in olfactory neurons, and only a single copy (allele) out of the entire 1,000 family members is expressed and functional in a given neuron. This highly tissue-specific expression pattern of the OLFR genes necessitates a global repression mechanism for the majority of the OLFR genes in neural cells and in the other cell types. This mechanism acts prior to, and is separate from, the negative feedback which prevents bi- and multi-allelic expression in olfactory neurons [29, 30]. Since unknown mechanisms are believed to repress a large number of the OLFR genes all the time in most cell types, it is likely that these repression mechanisms are mediated through epigenetic modifications. Since YY1 is a well-known epigenetic regulator in the animal genome, it is plausible to propose that the identified YY1 binding motifs may play a role in the predicted repression mechanisms for the OLFR genes. In that regard, it is important to note that YY1 is one of the Polycomb group Gene (PcG) members found in both invertebrates and vertebrates. Furthermore, recent studies hinted at the possibility that PcG-mediated repression mechanisms might be involved in the regulation of the OLFR genes. In mutant mouse embryonic stem cells lacking the embryonic ectoderm development (EED) protein, some OLFR genes do not replicate asynchronously as they differentiate, suggesting a loss of the typical pattern of monoallelically expressed genes. According to the results from another recent study, some cis-regulatory elements responsible for selecting one active OLFR copy in a given neuron are predicted to be located within the protein-coding regions of the OLFR genes. This is intriguing and consistent with the observation of the current study in that some critical cis-elements are located within the protein-coding regions of the OLFR genes.
The genetic code is optimal for containing multiple layers of encoded information within protein-coding regions. In sum, although further investigation is warranted, it is likely that the identified YY1 binding motifs are functionally important cis-regulatory elements for the regulation of the OLFR genes.
The YY1 binding motifs identified within the OLFR genes are unusual since they are localized within the protein-coding regions of these genes and are present at high density. This is quite different from the typical pattern in which transcription factor binding sites (cis-regulatory elements) are located in the genomic regions surrounding the protein-coding regions of genes. OLFR genes confer a fitness advantage through duplication and divergence while remaining active, yet must also retain regulatory information. According to our statistical analyses (Figures 3 & 4), the identified YY1 binding motifs within the OLFR genes likely represent evolutionarily selected cis-regulatory elements. Previous studies have shown high levels of YY1 binding affinity to the type of YY1 binding motifs found within OLFRs. Though the functionality of the identified YY1 binding motifs remains to be demonstrated in vivo, it is plausible that some functional constraints serve to maintain the YY1 binding motifs within the protein-coding regions of the OLFR genes. In one scenario, these motifs might be linked to the sudden expansion of this gene family in mammalian genomes. The copy number of the OLFR genes has increased dramatically in recent evolutionary times in mammalian genomes, providing a large number of receptor proteins for airborne scents. The clustering of the OLFR genes in chromosomal regions also suggests that this gene family may have been duplicated through in situ tandem duplications. Tandem duplication is known to be the most frequent mechanism for increasing copy numbers in gene families. However, it is not well understood how this mechanism carries over the proper information for the transcriptional regulation of duplicated gene copies.
In the case of the mammalian OLFR genes, the protein-coding regions may carry both codon information and cis-regulatory elements, so that the duplication of these genes would most likely preserve their coding potential as well as the associated transcriptional control. This might have been one functional constraint driving the co-evolution of the YY1 binding motifs with the protein-coding potential of the OLFR genes.
The current study reports that an unusual enrichment of YY1 binding sites, 4-8 binding sites per gene, is found in the coding regions of olfactory receptor genes in mammals. According to statistical analyses, these YY1 binding sites most likely have been selected as cis-regulatory elements. Also, similar patterns are found in other mammals, but not in fish, suggesting a mammalian-specific phenomenon. This study further suggests YY1 or YY1-related transcription factors as regulators of mammalian OLFR genes.
Visualization of YY1 binding sites in coding regions
A custom Perl script, matrix-bidirectional.pl, was run against mouse chromosome files available from Ensembl (version NCBIM37.49; ftp://ftp.ensembl.org/pub/current_fasta/mus_musculus/dna/) (see Additional file 1). The program calculates a score by matching a 10 bp window to a matrix of the likelihood for each position (Additional files 2 & 3) along the 10 bp consensus sequence (the Position Weight Matrix, PWM). Each base pair is given a value equivalent to the decimal percentage of its match to the known YY1 binding motifs. The four base pair core, CCAT, is scored at 100% for each base plus 2. Flanking bases vary in score from 0 to 1 based upon their frequency in known YY1 binding motifs. The total score for each 10 bp window is calculated and compared to our cutoff score of 8.0, which indicates a good match. Our previous work revealed that scores above 8.0 correlate with good YY1 binding in vitro [1, 35]. Output was generated in the WIG format for each chromosome, with each YY1 binding motif represented by its start and end position and a bar height corresponding to the PWM score. Position weight matrix scores and location information were loaded into the University of California, Santa Cruz (UCSC) Genome Browser for visualization of YY1 location (Figure 1).
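As an illustration of the window-scoring procedure described above, the following Python sketch scans both strands of a sequence with a 10 bp position weight matrix and reports the windows that reach the 8.0 cutoff. It is not the authors' matrix-bidirectional.pl script. The matrix values, the toy consensus GCCGCCATTT, and the position of the CCAT core within the window are assumptions made purely for illustration (the actual matrix is given in Additional files 2 & 3), and the "plus 2" adjustment for the core is not modelled.

```python
# Minimal sketch of bidirectional PWM scanning. All matrix values below are
# HYPOTHETICAL placeholders, not the matrix used in the study.
TOY_CONSENSUS = "GCCGCCATTT"  # assumed 10 bp consensus for illustration
CORE = "CCAT"                 # four base pair core named in the text
CORE_START = 4                # assumed offset of the core in the window
CUTOFF = 8.0                  # cutoff score stated in the text

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def build_toy_pwm():
    """Return 10 columns, each mapping a base to a score between 0 and 1."""
    pwm = []
    for i, best in enumerate(TOY_CONSENSUS):
        in_core = CORE_START <= i < CORE_START + len(CORE)
        # Core bases score 1.0 on a match; flanking values are invented.
        match, mismatch = (1.0, 0.0) if in_core else (0.8, 0.2)
        pwm.append({b: (match if b == best else mismatch) for b in "ACGT"})
    return pwm


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def score_window(window, pwm):
    """Sum the per-position scores; unknown bases such as N score 0."""
    return sum(col.get(base, 0.0) for base, col in zip(window, pwm))


def scan(seq, pwm, cutoff=CUTOFF):
    """Return (start, end, strand, score) for every window at or above cutoff."""
    seq = seq.upper()
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        for strand, w in (("+", window), ("-", reverse_complement(window))):
            score = score_window(w, pwm)
            if score >= cutoff:
                hits.append((start, start + width, strand, score))
    return hits


if __name__ == "__main__":
    for hit in scan("AAAAGCCGCCATTTAAAA", build_toy_pwm()):
        print(hit)
```

With this toy matrix a perfect match scores 8.8 and a single flanking mismatch still passes the cutoff at 8.2, whereas any mismatch in the core leaves the window below 8.0.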
Motif finding and scoring
Our Perl script was run against the mouse, human and zebrafish mRNA available from NCBI (ftp://ftp.ncbi.nih.gov/genomes/). Each gene was scored for the number and quality of YY1 motifs. Results were sorted by a YY1 density score, defined as the combined score of the YY1 motifs divided by the length of the gene. Predicted and hypothetical genes were removed from the mouse transcriptome: of 40,009 total genes, 19,818 were removed and 20,191 remained, comprising 19,083 non-olfactory and 1,108 olfactory genes. Hypothetical genes were removed from the human transcriptome, yielding 24,886 total genes with 24,506 non-olfactory and 380 olfactory genes. Hypothetical genes were removed from the zebrafish transcriptome, yielding 9,092 genes with 9,067 non-odorant genes and 25 odorant receptors. We removed the hypothetical and predicted genes because they may not contain complete ORFs. The upstream 1 kb regions of 20,419 RefSeq genes in the mouse were obtained from UCSC (http://hgdownload.cse.ucsc.edu/downloads.html).
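The density score defined above can be sketched in a few lines of Python. This is only an illustration of the stated definition (combined motif score divided by gene length); the function names and the example genes and scores are hypothetical and do not come from the study.

```python
def yy1_density(motif_scores, gene_length):
    """Combined PWM score of all motifs in a gene divided by its length in bp."""
    if gene_length <= 0:
        raise ValueError("gene_length must be positive")
    return sum(motif_scores) / gene_length


def rank_by_density(genes):
    """genes maps a gene name to (motif_scores, gene_length).

    Returns the gene names sorted from highest to lowest density.
    """
    return sorted(genes, key=lambda name: yy1_density(*genes[name]), reverse=True)


if __name__ == "__main__":
    # Hypothetical genes: names, motif scores and lengths are invented.
    example = {
        "olfr_like": ([8.5, 9.0, 8.2, 8.8], 1000),
        "other_gene": ([8.1], 3000),
        "no_motif_gene": ([], 2000),
    }
    for name in rank_by_density(example):
        print(name, yy1_density(*example[name]))
```

Dividing by length keeps long transcripts from ranking highly simply because they offer more windows to match.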
Histograms were made in Microsoft Excel 2007 by sorting the genes by YY1 density from high to low, then assigning a count to order the genes. Olfactory receptor genes were separated from non-olfactory receptor genes and the count numbers were used as position information to make a histogram which shows the distribution of OLFRs along the range of YY1 containing genes (Figure 2).
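The ranking step described above was carried out in Excel; a rough Python equivalent is sketched below under the assumption that the histogram bins genes by their rank position. The bin size and the gene names are hypothetical.

```python
def olfr_rank_histogram(densities, olfr_genes, bin_size):
    """Count olfactory receptor genes per bin of density rank.

    densities  : dict mapping gene name to YY1 density score
    olfr_genes : set of gene names treated as olfactory receptors
    bin_size   : number of consecutive ranks per bin (assumed parameter)
    """
    # Rank 0 is the gene with the highest density.
    ranked = sorted(densities, key=densities.get, reverse=True)
    n_bins = (len(ranked) + bin_size - 1) // bin_size
    counts = [0] * n_bins
    for rank, gene in enumerate(ranked):
        if gene in olfr_genes:
            counts[rank // bin_size] += 1
    return counts


if __name__ == "__main__":
    # Invented values for illustration only.
    densities = {"olfr_a": 0.020, "gene_x": 0.001, "olfr_b": 0.015, "gene_y": 0.005}
    print(olfr_rank_histogram(densities, {"olfr_a", "olfr_b"}, bin_size=2))
```

If olfactory receptor genes are enriched for YY1 motifs, their counts concentrate in the first bins, which is the pattern the histogram is meant to reveal.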
Protein motif correlation testing
The mouse proteome was downloaded from NCBI (ftp://ftp.ncbi.nih.gov/genomes/M_musculus/protein/protein.fa.gz); it contains 34,966 peptide sequences and nomenclature. Olfactory receptor proteins (1,178) were separated from non-olfactory proteins (33,788). A Perl script, dipep-singlefile.pl, was used to generate a count of each of the 400 possible dipeptides in each of these groups (see Additional file 4).
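A dipeptide tally of this kind can be sketched as follows. This is not the dipep-singlefile.pl script; it assumes that overlapping adjacent residue pairs are counted and that pairs containing non-standard residue codes are skipped, neither of which is stated in the text.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# 20 x 20 = 400 possible dipeptides
ALL_DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def count_dipeptides(proteins):
    """Count every overlapping dipeptide across an iterable of protein sequences."""
    counts = Counter({dp: 0 for dp in ALL_DIPEPTIDES})
    for seq in proteins:
        seq = seq.upper()
        for i in range(len(seq) - 1):
            dipeptide = seq[i:i + 2]
            if dipeptide in counts:  # skips pairs with X, *, or other codes
                counts[dipeptide] += 1
    return counts


if __name__ == "__main__":
    counts = count_dipeptides(["MPHW", "PH"])  # invented toy sequences
    print(counts["PH"], counts["MP"], counts["HW"])
```

Running the function once on the olfactory receptor proteins and once on the remaining proteins gives the two count tables that the comparison requires.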
A Z-test was performed according to the formula Z = (observed - μ)/δ where observed is the count of each possible dipeptide found in olfactory receptors, μ is the mean of the counts from the whole population of non-olfactory proteins, and δ is the standard deviation of the population count. Z-score units are given in standard deviations from the mean. Expected count was calculated by multiplying the frequency of each dipeptide from all non-olfactory receptor proteins by the total number of dipeptides seen in OLFR proteins. Figure 4 shows the plot generated in Microsoft Excel 2007 comparing the observed to expected ratio for all 400 possible dipeptides and the subset of 49 YY1 core-forming dipeptides which exhibits no difference in the distribution of the subset.
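The expected counts and the Z formula above can be written out as a short Python sketch. The function names are hypothetical, and the mean and standard deviation are passed in as arguments rather than estimated here, because the text does not specify exactly how they were computed from the non-olfactory population.

```python
def expected_counts(background_counts, olfr_total):
    """Expected count = background frequency x total dipeptides in OLFR proteins."""
    background_total = sum(background_counts.values())
    return {
        dipeptide: count / background_total * olfr_total
        for dipeptide, count in background_counts.items()
    }


def observed_over_expected(observed, expected):
    """Ratio of observed to expected counts, skipping zero expectations."""
    return {
        dipeptide: observed.get(dipeptide, 0) / value
        for dipeptide, value in expected.items()
        if value > 0
    }


def z_score(observed, mu, delta):
    """Z = (observed - mu) / delta, in units of standard deviations."""
    return (observed - mu) / delta


if __name__ == "__main__":
    # Invented counts for two dipeptides only.
    background = {"PH": 10, "AA": 30}
    observed = {"PH": 4, "AA": 4}
    expected = expected_counts(background, sum(observed.values()))
    print(expected, observed_over_expected(observed, expected))
```

A ratio above 1 means the dipeptide occurs more often in olfactory receptor proteins than the background frequency predicts.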
We found 21 of the 49 dipeptides which can make up a YY1 binding site had greater than expected values, but only 2 were over 3 standard deviations away from the mean (Table 1, Figure 4). Table 1 shows only the dipeptides in which the observed count in olfactory genes was higher than the expected count in mouse.
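The set of dipeptides that can make up a YY1 core can be enumerated directly from the standard genetic code. The sketch below assumes that a dipeptide qualifies if at least one of its six-nucleotide encodings contains the core CCAT or its reverse complement ATGG; this definition is our reading of the text rather than a stated rule.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
CORES = ("CCAT", "ATGG")  # YY1 core and its reverse complement


def core_forming_dipeptides():
    """Return the sorted dipeptides whose codon pairs can contain a core."""
    found = set()
    for codon1, codon2 in product(CODON_TABLE, repeat=2):
        hexamer = codon1 + codon2
        if not any(core in hexamer for core in CORES):
            continue
        aa1, aa2 = CODON_TABLE[codon1], CODON_TABLE[codon2]
        if "*" not in (aa1, aa2):  # ignore pairs involving a stop codon
            found.add(aa1 + aa2)
    return sorted(found)


if __name__ == "__main__":
    dipeptides = core_forming_dipeptides()
    print(len(dipeptides), dipeptides)
```

Under this assumption the enumeration yields 49 dipeptides, in agreement with the number given in the text; for example, Pro-His can be encoded as CCC CAT, which contains CCAT.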
Global alignment was done using ClustalW with 1,178 OLFR and 6 hist1 h4 amino acid sequences from mouse (Figure 5). All Perl scripts are available for download on our website at http://JooKimLab.lsu.edu.
SINE: Short interspersed element
LINE: Long interspersed element
PWM: Position weight matrix
UCSC: University of California at Santa Cruz
ORF: Open reading frame
NCBI: National Center for Biotechnology Information.
We thank Drs. John Caprio, Mike Hellberg, and Andrew Whitehead for critically reading the manuscript. We also thank the anonymous reviewers for helpful comments. This research was supported by National Institutes of Health R01 GM66225 (J.K.).
- Kim JD, Faulk C, Kim J: Retroposition and evolution of the DNA-binding motifs of YY1, YY2 and REX1. Nucleic Acids Res. 2007, 35 (10): 3442-3452. 10.1093/nar/gkm235.
- Shi Y, Lee JS, Galvin KM: Everything you have ever wanted to know about Yin Yang 1. Biochim Biophys Acta. 1997, 1332 (2): F49-F66.
- Gordon S, Akopyan G, Garban H, Bonavida B: Transcription factor YY1: structure, function, and therapeutic implications in cancer biology. Oncogene. 2006, 25 (8): 1125-1142. 10.1038/sj.onc.1209080.
- Thomas MJ, Seto E: Unlocking the mechanisms of transcription factor YY1: are chromatin modifying enzymes the key?. Gene. 1999, 236 (2): 197-208. 10.1016/S0378-1119(99)00261-9.
- Wilkinson FH, Park K, Atchison ML: Polycomb recruitment to DNA in vivo by the YY1 REPO domain. Proc Natl Acad Sci USA. 2006, 103 (51): 19296-19301. 10.1073/pnas.0603564103.
- Schug J, Schuller WP, Kappen C, Salbaum JM, Bucan M, Stoeckert CJ: Promoter features related to tissue specificity as measured by Shannon entropy. Genome Biol. 2005, 6 (4): R33. 10.1186/gb-2005-6-4-r33.
- Sui G, Affar el B, Shi Y, Brignone C, Wall NR, Yin P, Donohoe M, Luke MP, Calvo D, Grossman SR, Shi Y: Yin Yang 1 is a negative regulator of p53. Cell. 2004, 117 (7): 859-872. 10.1016/j.cell.2004.06.004.
- Satyamoorthy K, Park K, Atchison ML, Howe CC: The intracisternal A-particle upstream element interacts with transcription factor YY1 to activate transcription: pleiotropic effects of YY1 on distinct DNA promoter elements. Mol Cell Biol. 1993, 13 (11): 6621-6628.
- Humphrey GW, Englander EW, Howard BH: Specific binding sites for a pol III transcriptional repressor and pol II transcription factor YY1 within the internucleosomal spacer region in primate Alu repetitive elements. Gene Expr. 1996, 6 (3): 151-168.
- Buck L, Axel R: A novel multigene family may encode odorant receptors: a molecular basis for odor recognition. Cell. 1991, 65 (1): 175-187. 10.1016/0092-8674(91)90418-X.
- Zozulya S, Echeverri F, Nguyen T: The human olfactory receptor repertoire. Genome Biol. 2001, 2 (6): research0018.1-research0018.12. 10.1186/gb-2001-2-6-research0018.
- Zhang X, Rodriguez I, Mombaerts P, Firestein S: Odorant and vomeronasal receptor genes in two mouse genome assemblies. Genomics. 2004, 83 (5): 802-811. 10.1016/j.ygeno.2003.10.009.
- Niimura Y, Nei M: Evolutionary dynamics of olfactory and other chemosensory receptor genes in vertebrates. J Hum Genet. 2006, 51 (6): 505-517. 10.1007/s10038-006-0391-8.
- Alioto TS, Ngai J: The odorant receptor repertoire of teleost fish. BMC Genomics. 2005, 6: 173. 10.1186/1471-2164-6-173.
- Kambere MB, Lane RP: Co-regulation of a large and rapidly evolving repertoire of odorant receptor genes. BMC Neurosci. 2007, 8 (Suppl 3): S2. 10.1186/1471-2202-8-S3-S2.
- Keller A, Vosshall LB: Better smelling through genetics: mammalian odor perception. Curr Opin Neurobiol. 2008, 18 (4): 364-369. 10.1016/j.conb.2008.09.020.
- Feldmesser E, Olender T, Khen M, Yanai I, Ophir R, Lancet D: Widespread ectopic expression of olfactory receptor genes. BMC Genomics. 2006, 7: 121. 10.1186/1471-2164-7-121.
- Spehr M, Schwane K, Riffell JA, Zimmer RK, Hatt H: Odorant receptors and olfactory-like signaling mechanisms in mammalian sperm. Mol Cell Endocrinol. 2006, 250: 128-136. 10.1016/j.mce.2005.12.035.
- Pluznick JL, Dong-Jing Z, Xiaohong Z, Qingshang Y, Rodriguez-Gil DJ, Eisner C, Wells E, Greer CA, Wang T, Firestein S, Schnermann J, Caplan MJ: Functional expression of the olfactory signaling system in the kidney. Proc Natl Acad Sci USA. 2009, 106 (6): 2059-2064. 10.1073/pnas.0812859106.
- Chess A, Simon I, Cedar H, Axel R: Allelic inactivation regulates olfactory receptor gene expression. Cell. 1994, 78 (5): 823-834. 10.1016/S0092-8674(94)90562-2.
- Shykind BM: Regulation of odorant receptors: one allele at a time. Hum Mol Genet. 2005, 14 (Spec No 1): R33-R39. 10.1093/hmg/ddi105.
- Nguyen MQ, Zhishang Z, Marks CA, Ryba NJP, Belluscio L: Prominent roles for odorant receptor coding sequences in allelic exclusion. Cell. 2007, 131 (5): 1009-1017. 10.1016/j.cell.2007.10.050.
- Merriam LC, Chess A: cis-Regulatory elements within the odorant receptor coding region. Cell. 2007, 131 (5): 844-846. 10.1016/j.cell.2007.11.016.
- Kim J: YY1's longer DNA-binding motifs. Genomics. 2009, 93 (2): 152-158. 10.1016/j.ygeno.2008.09.013.
- Kang K, Chung JH, Kim J: Evolutionary Conserved Motif Finder (ECMFinder) for genome-wide identification of clustered YY1- and CTCF-binding sites. Nucleic Acids Res. 2009, 37 (6): 2003-2013. 10.1093/nar/gkp077.
- Kim JD, Hinz AK, Bergmann A, Huang JM, Ovcharenko I, Stubbs L, Kim J: Identification of clustered YY1 binding sites in imprinting control regions. Genome Res. 2006, 16 (7): 901-911. 10.1101/gr.5091406.
- Smit AFA, Hubley R, Green P: RepeatMasker Open-3.0. 1996.
- UCSC Genome Browser. [http://genome.ucsc.edu/]
- Lewcock JW, Reed RR: A feedback mechanism regulates monoallelic odorant receptor expression. Proc Natl Acad Sci USA. 2004, 101 (4): 1069-1074. 10.1073/pnas.0307986100.
- Serizawa S, Miyamichi K, Sakano H: Negative feedback regulation ensures the one neuron-one receptor rule in the mouse olfactory system. Chem Senses. 2005, 30 (Suppl 1): i99-100. 10.1093/chemse/bjh133.
- Alexander MK, Mlynarczyk-Evans S, Royce-Tolland M, Plocik A, Kalantry S, Magnuson T, Panning B: Differences between homologous alleles of olfactory receptor genes require the Polycomb Group protein Eed. J Cell Biol. 2007, 179 (2): 269-276. 10.1083/jcb.200706053.
- Ohno S: Evolution by gene duplication. 1970, London, NY: Allen & Unwin; Springer-Verlag.
- Itzkovitz S, Alon U: The genetic code is nearly optimal for allowing additional information within protein-coding sequences. Genome Res. 2007, 17 (4): 405-412. 10.1101/gr.5987307.
- Flicek P, et al: Ensembl 2008. Nucleic Acids Res. 2008, 36: D707-D714. 10.1093/nar/gkm988.
- Kim J: Multiple YY1 and CTCF binding sites in imprinting control regions. Epigenetics. 2008, 3 (3): 115-118. 10.4161/epi.3.3.6176.
- Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, Haussler D: The human genome browser at UCSC. Genome Res. 2002, 12 (6): 996-1006.
- Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, McWilliam H, Valentin F, Wallace IM, Wilm A, Lopez R, Thompson JD, Gibson TJ, Higgins DG: Clustal W and Clustal X version 2.0. Bioinformatics. 2007, 23 (21): 2947-2948. 10.1093/bioinformatics/btm404.
- The Joomyeong Kim Lab Website. [http://JooKimLab.lsu.edu]
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.