
Widespread Alu repeat-driven expansion of consensus DR2 retinoic acid response elements during primate evolution



Nuclear receptors are hormone-regulated transcription factors whose signaling controls numerous aspects of development and physiology. Many receptors recognize DNA hormone response elements formed by direct repeats of RGKTCA motifs separated by 1 to 5 bp (DR1-DR5). Although many such known response elements are conserved in the mouse and human genomes, it is unclear to what extent transcriptional regulation by nuclear receptors has evolved specifically in primates.


We have mapped the positions of all consensus DR-type hormone response elements in the human genome, and found that DR2 motifs, recognized by retinoic acid receptors (RARs), are heavily overrepresented (108,582 elements). 90% of these are present in Alu repeats, which also contain lesser numbers of other consensus DRs, including 50% of consensus DR4 motifs. Few DR2s are in potentially mobile AluY elements and the vast majority are also present in chimp and macaque. 95.5% of Alu-DR2s are distributed throughout subclasses of AluS repeats, and arose largely through deamination of a methylated CpG dinucleotide in a non-consensus motif present in AluS sequences. We find that Alu-DR2 motifs are located adjacent to numerous known retinoic acid target genes, and show by chromatin immunoprecipitation assays in squamous carcinoma cells that several of these elements recruit RARs in vivo. These findings are supported by ChIP-on-chip data from retinoic acid-treated HL60 cells revealing RAR binding to several Alu-DR2 motifs.


These data provide strong support for the notion that Alu-mediated expansion of DR elements contributed to the evolution of gene regulation by RARs and other nuclear receptors in primates and humans.


Alu repeats are SINEs (Short INterspersed Elements) whose original insertion in genomic sequences appears to have occurred shortly after the dawn of the primate lineage. While most SINEs arose from tRNA genes, Alu elements are derived from the 7SL RNA gene, which encodes a component of the protein signal recognition complex. The structure of modern Alu sequences arose from a duplication of primordial elements, which were approximately 200 bp in length and composed of an RNA polymerase III promoter at one extremity and a polyA tail at the other. Dimeric Alu elements expanded extensively throughout primate genomes as a result of a parasitic relationship with the transposition machinery encoded by L1 retrotransposons, which are LINEs (Long INterspersed Elements) [1, 2], and today occupy ~10% of the human genome.

Several lines of evidence indicate that a variety of events associated with Alu transposition, recombination and expansion have contributed to genome evolution and to altering gene expression by several mechanisms. Because of their high CpG content and the tendency of CpG dinucleotides to mutate, and because of their prevalence, Alu repeats contain a substantial percentage of the single nucleotide polymorphisms in the human genome [1]. Their polyA tails act to seed microsatellite formation and expansion [3]. Moreover, Alu transposition has been proposed to underlie the expansion of segmental duplications that comprise approximately 5% of the human genome [4]. Conservative estimates suggest that 0.3–0.5% of human genetic disorders arise from mobile element insertion or from element-driven recombination events [5, 6]. There is also evidence that Alu repeats inserted in an antisense orientation to gene transcripts can introduce alternative splicing sites [7, 8], and that insertion of Alu sequences may alter epigenetic regulation of gene expression [9].

In addition to the above, increasing evidence suggests that Alu repeats are a source of elements that regulate transcriptional initiation by RNA polymerase II [10].

We have been interested in genome-wide mapping of hormone response elements in the promoter-proximal regions of human genes [11, 12] to study their evolutionary conservation in the vicinity of genes, and their accessibility to cognate members of the nuclear receptor family. Nuclear receptors are the primary targets of a range of lipophilic signaling molecules such as steroid and thyroid hormones, vitamin D, retinoids, specific prostaglandins and cholesterol metabolites that regulate many aspects of physiology and metabolism, as well as embryonic development [13]. They directly regulate gene transcription by ligand-dependent recruitment of so-called coregulatory proteins that carry out the histone modifications, chromatin remodeling and binding of RNA polymerase II and ancillary factors necessary for initiation of transcription [13, 14].

Nuclear receptors are composed of a series of domains, with the DNA binding and ligand binding domains being the most highly conserved. Most nuclear receptors function as homo- or heterodimers, and many receptor DNA binding domains recognize variants of consensus RGKTCA motifs arranged as either direct or inverted repeats [15]. Many non-steroid receptors recognize direct repeats in the form of heterodimers with members of the retinoid X receptor (RXR) family. For example, retinoic acid receptors (RARs) heterodimerized with RXRs specifically recognize retinoic acid response elements (RAREs) as RGKTCA direct repeats separated by 1, 2 or 5 bp (DR1, DR2 or DR5 motifs), whereas heterodimerized receptors for vitamin D (VDR) and thyroid hormone (TR) recognize DR3 and DR4 elements, respectively.

Evidence is accumulating that Alu repeats constitute a significant source of hormone response elements [16]. Notably, a motif, AGGTCAnnAGTTCG, found within most subclasses of AluS sequences, corresponds to a non-consensus DR2 element recognized by RARs, and has been shown to function as a RARE [17]. Signaling through RARs is of particular interest because retinoids control many aspects of embryonic development in a wide variety of organisms [13, 18, 19].

Here, we used a genome-wide screen to map DR-type hormone response elements in the human genome, and found that consensus DR2 elements are massively overrepresented relative to DR1 and DR5 RAREs and DRs recognized by other nuclear receptors, due to the presence of consensus DR2 motifs in >100,000 Alu repeats. The vast majority of these correspond to AluS sequences, which drove the Alu element expansion in primates that started 35–40 Myr ago. We also show that these "Alu-DR2" elements are present in retinoic acid-regulated genes and bind RARs in vivo. Taken together, our data provide strong support for the notion that Alu-mediated expansion of DR elements contributed to the evolution of gene regulation by RARs and other nuclear receptors in primates and humans.


Mapping DR-type hormone response elements in the human and mouse genomes

A number of nuclear receptors recognize direct repeats of RGKTCA motifs spaced between 1 and 5 nucleotides as response elements, and individual receptors distinguish between DR elements based on inter-repeat spacings and subtle differences in motif sequence [15]. Sequencing of entire genomes has opened the way to mapping DNA response elements for nuclear hormone receptors on a genome-wide scale [11, 12]. Given that mice have been used extensively as genetic models to study nuclear receptor signaling (e.g. [19]), we compared the frequency and distribution of consensus direct repeat response elements [RGKTCAniRGKTCA, i = 1 to 5] in both the mouse and human genomes (Table 1). Although the frequency of these motifs should be comparable in random DNA sequences, our screen revealed that DR2 elements are about 10-fold more frequent than other DR sequences in the human genome, and are about 4-fold more common than in the mouse genome (108,582 vs 26,717). In addition, DR1 and DR4 elements occur more frequently than DR3 or DR5 elements (about 2-fold). To investigate the varying frequencies of different DR elements in the human and mouse genomes, we examined the proportions of each of these motifs found in transposable elements (TEs, Table 1). The vast majority of human DR2 motifs map within SINEs, in particular in Alu repeats (102,359/102,564 elements, see Figure 1 and Table 1), while approximately 50% of DR2 elements in the mouse (13,356) are found in retroviral LTRs, which only account for 1064 of the human elements (Table 1). DR4 motifs were also significantly associated with SINEs in the human genome, whereas DR1 motifs in TEs are found mostly in LINEs (Table 1). The frequencies of DR elements outside TEs are relatively similar, indicating that incorporation of DR elements within SINEs, LINEs or LTRs is the main cause of their variable distributions.
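The consensus search described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' program (the Methods cite a Perl program that is not shown); the function names `reverse_complement` and `find_direct_repeats` are hypothetical. It expands the IUPAC codes of RGKTCA (R = A or G, K = G or T), allows a spacer of 1 to 5 unambiguous bases, and scans both strands.

```python
import re

# IUPAC expansion of the RGKTCA half-site: R = A/G, K = G/T.
HALF_SITE = "[AG]G[GT]TCA"
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_direct_repeats(seq, min_spacer=1, max_spacer=5):
    """Find consensus DR1-DR5 motifs on both strands (hypothetical helper).

    Returns a sorted list of (start, strand, spacer_length, motif), where
    start is the 0-based position of the motif's leftmost base on the
    input (forward) strand and motif is read 5'->3' on the hit strand.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for n in range(min_spacer, max_spacer + 1):
            # A lookahead is used so that overlapping motifs are all reported.
            # Assumption: spacer bases must be unambiguous (no N).
            pattern = re.compile(
                "(?=(%s[ACGT]{%d}%s))" % (HALF_SITE, n, HALF_SITE)
            )
            for m in pattern.finditer(s):
                motif = m.group(1)
                start = m.start()
                if strand == "-":
                    # Convert back to forward-strand coordinates.
                    start = len(seq) - (m.start() + len(motif))
                hits.append((start, strand, n, motif))
    return sorted(hits)


# The predominant Alu DR2 sequence reported in the text, with flanking bases.
print(find_direct_repeats("TTTAGGTCAGGAGTTCATTT"))
```

Scanning the reverse complement rather than writing a second pattern keeps the half-site definition in one place; a genome-scale scan would stream chromosomes rather than hold them in memory.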

Table 1 Consensus RGKTCA direct repeats found in the human and mouse genomes.
Figure 1

Consensus DR1 to DR5 motifs embedded within Alu elements in the human genome. The numbers below each DR correspond to the total number of repeats in Alu elements/total number of repeats in the genome.

Over 100,000 consensus DR elements are present in Alu repeats

To assess whether the large over-representation of DR2s in SINEs reflects amplification of a specific sequence, we calculated the frequencies of all motifs corresponding to the DR2 consensus. Of the 102,359 consensus DR2 elements located in Alu repeats, 92,686 corresponded to the sequence AGGTCAnnAGTTCA, and of these, AGGTCAggAGTTCA accounted for 82,319 occurrences, along with 9,877 copies with only one variation in the spacer nucleotides (data not shown). Other consensus DR2 elements containing one variation in the repeated motifs were much less frequent (less than 3,000 copies each for the single variations at the R and K positions, as calculated from the percentages in DR2 elements listed in Table 2). Consensus DR1 or DR3 motifs resulting from removal or addition of one nucleotide from the highly represented DR2 sequence were even more infrequent in Alu repeats (less than 0.2% or about 200 copies, Table 2, line 2).

Table 2 RGKTCA direct repeats found in Alu elements.

The 102,359 DR2 elements, which are RAR target sequences, are by far the most frequent consensus DR motif present in Alu repeats (93.4%, Fig. 1), followed by DR4 elements, recognized specifically by TRs (5.7% or 6198 occurrences, Fig. 1). It is noteworthy that the 6198 Alu-DR4 elements represent ~50% of the total consensus DR4 elements in the genome (Fig. 1). Moreover, unlike in the DR2 consensus, RGGTCA motifs are found predominantly in the 3' repeat of the Alu DR4 sequences (Table 2, DR4 column). This position is bound by the TR when RXR-TR heterodimers bind to DR4 motifs. Since the TR has weak affinity for RGTTCA but recognizes RGGTCA motifs [20], these DR4 elements represent optimal TR binding sites.

The vast majority of Alu-DR elements are present in AluS sequences

Further analysis of human DR2-Alu elements revealed that their 5' ends are tightly clustered around positions 66–68 of the Alu repeat (Fig. 2). DR1 and DR3 elements showed largely similar distributions. In contrast, the 5' ends of DR4 elements were found predominantly at position 58 of the repeat (Fig. 2). While DR5 elements were somewhat more widely distributed, most motifs started at positions 58, 66 or 68. The results reveal that DR2 and DR4 elements originated from partially overlapping sequences, where the 5' half-site of the consensus DR2 element corresponds to the 3' half-site of the DR4 motif (see below).

Figure 2

Positions of DR motifs within Alu sequences in the human genome. For clarity, only every 10th position is indicated, along with positions 58, 66 and 68. The positions indicated correspond to the 5' nucleotide of the 5' half-site. Note the differences in scales of the ordinates.

The vast majority of consensus DR2 elements (97,779 or 95.5%; Fig. 3A) are present in AluS sequences, which represent just under 56% of total Alu sequences in the human genome. The overrepresentation of DR elements in AluS sequences, which largely drove the Alu repeat expansion 35–40 Myr ago [1, 2, 4], indicates that the majority of the consensus Alu-DR2 sequences (and related DR elements) present in the human genome multiplied through primate genomes during this AluS expansion. Consensus DR4 elements are similarly distributed primarily in AluS sequences (Fig. 3A). In contrast, consensus DR2 elements are markedly underrepresented in older AluJ and younger AluY sequences (2.3% and 1.7%, respectively; Fig. 3A). Of the 1.7% of DR2 motifs (1740 in total) present in potentially mobile AluY elements, only 41 motifs were not found to be present in the chimp and rhesus macaque genomes [see Additional file 1], indicating that more recent AluY-mediated transposition was not a major driving force in shaping retinoid-regulated gene expression in the human genome.

Figure 3

Consensus DR2 and DR4 elements are found predominantly in AluS sequences. (A) Distribution of consensus DR2 and DR4 elements within AluJ, S and Y families in the human genome. (B) Consensus sequences of AluJ, AluY and AluS subfamilies within the region containing the DR4 and DR2 element, and frequency of consensus DR2 RARE sequences within each subfamily. The position of box B of the polymerase III promoter is provided above. The sequences of the DR4 and DR2 consensus motifs are provided below. (C) Nucleotide substitution frequencies of the CpG dinucleotide present within AGGTCAnnAGGTCG motifs in AluS sequences.

Consensus Alu-DR2 elements arose predominantly via deamination of a methylated CpG dinucleotide

The distribution and frequency of DR2 elements in Alu sequences can be explained by the fact that the consensus DR2 RARE differs from the consensus sequence of several AluS subfamilies by a single G to A substitution in the 3'-most nucleotide of the DR2 repeat, whereas generation of an RARE consensus from Alu J or Y sequences would require multiple substitutions (Fig. 3B). Similarly, a single A to G or T substitution in the third position of the upstream GGATCA motif in AluS subfamilies Sx, Sq and Sp would generate a consensus DR4 element. Note however that although two different single mutations can lead to formation of a consensus DR4 versus one single replacement leading to a consensus DR2, the rate of conversion to a consensus DR2 is far superior to that to a DR4 element in all Alu subclasses.

It is noteworthy that generation of the DR2 RARE consensus could occur through deamination of a methylated C residue in the CpG dinucleotide of the AluS consensus, converting it to a T on the complementary strand of the direct repeat. A comparison of the frequencies of all four possible CpN dinucleotides in AGGTCAnnAGTTCN motifs present in AluS sequences indicates that CpA occurs far more frequently than CpT or CpC (Fig. 3C). Similarly, the conversion to T of the C residue of the CpG on the sense strand occurred far more frequently than other substitutions (Fig. 3C). Collectively, these data support the notion that the CpA dinucleotide in the DR2 element arose predominantly via a deamination-driven mechanism rather than by random base substitutions. The data are also consistent with other observations of high rates of CpG methylation [21] and of CpG dinucleotide decay generally in Alu repeats [22, 23]. Further, the rate of conversion to consensus DR2s was similar in different subclasses of AluS elements differing by only one nucleotide (about 15–20% except for the AluSg1, a minor subfamily of AluSg [24], for which the rate of conversion is about 60%, Fig. 3B), suggesting generation of DR2s via the same methylation/deamination mechanism throughout evolution of the family. In addition, it is worth noting that the conversion to a perfect DR2 from the consensus AluS sequence, while introducing a change in the sequence of the box B of the RNA polIII promoter, occurs without altering its consensus (GWTCRANNC), suggesting that Alu elements containing perfect DR2 may still be mobile [25]. In a recent paper Price et al. defined over 200 AluS subfamilies [26], seven of which contain consensus DR2 elements. One of these, AluSg_14 shares the presence of the consensus RARE DR2 with three subfamilies [see Additional file 4]. These AluS subfamilies are likely examples of DR2 elements amplified by transposition instead of spontaneous mutation of a CpG.
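The strand logic of this argument can be made concrete with a small Python sketch. It is illustrative only: the function names are hypothetical, and the spacer "GG" in the precursor is an assumption chosen to match the predominant AGGTCAggAGTTCA element. Deamination of the methylated C on the sense strand turns CpG into TpG, whereas deamination of the methylated C on the complementary strand is read on the sense strand as CpG to CpA, which is the change that creates the consensus DR2.

```python
import re

# Consensus DR2: two RGKTCA half-sites separated by exactly 2 bp.
CONSENSUS_DR2 = re.compile("[AG]G[GT]TCA[ACGT]{2}[AG]G[GT]TCA")


def is_consensus_dr2(motif):
    """True if the 14-mer matches RGKTCAnnRGKTCA."""
    return CONSENSUS_DR2.fullmatch(motif) is not None


def deaminate_cpg(seq, cpg_index, strand):
    """Sense-strand sequence after 5-methylcytosine deamination at one CpG.

    strand "+": the sense-strand C becomes T (CG -> TG).
    strand "-": the antisense-strand C becomes T, which after replication
                reads as G -> A on the sense strand (CG -> CA).
    """
    if seq[cpg_index:cpg_index + 2] != "CG":
        raise ValueError("no CpG at the given index")
    if strand == "+":
        product = "TG"
    elif strand == "-":
        product = "CA"
    else:
        raise ValueError("strand must be '+' or '-'")
    return seq[:cpg_index] + product + seq[cpg_index + 2:]


# Non-consensus AluS-type motif AGGTCAnnAGTTCG (spacer GG is an assumption).
precursor = "AGGTCAGGAGTTCG"
cpg = precursor.rfind("CG")

antisense_product = deaminate_cpg(precursor, cpg, "-")  # ...AGTTCA
sense_product = deaminate_cpg(precursor, cpg, "+")      # ...AGTTTG

print(antisense_product, is_consensus_dr2(antisense_product))
print(sense_product, is_consensus_dr2(sense_product))
```

Only the antisense-strand event yields a consensus DR2; the sense-strand event gives the AGTTTG variant, which moves further from the consensus.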

Distribution of Alu-DR2 elements relative to the 5' ends of human genes

Generation of consensus DR2 in Alu elements via a mechanism implicating methylation of CpG dinucleotides, which is associated with transcriptional silencing, raises the question of whether these motifs can be recognized by RARs as functional binding sites. To investigate whether these elements could contribute to gene regulation, we first characterized the distribution of repeats containing DR2 elements relative to the 5' ends of human genes (Fig. 4). The distribution of Alu-DR2 differed from that of Alu repeats (χ² test, p-value = 2.84E-19), with a sharper reduction in the number of elements between -2 and +2 kb of transcriptional start sites. We note that there was a 20–30% further relative reduction in elements containing DR2 motifs within 1 kb of known 5' ends of genes compared to the general distribution of Alu elements, suggestive of a selection pressure against the presence of promoter-proximal Alu-DR2 sequences. However, RAREs are known to function as enhancer elements at distances up to 10–20 kb. Our mapping studies revealed over 18,000 Alu-DR2 elements lying within -10 kb and +10 kb of the 5' ends of >10,000 genes [see Additional file 2], representing a substantial proportion of human genes.
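For readers unfamiliar with the test, the following Python sketch computes a Pearson χ² statistic for two rows of per-interval counts. It is an assumption-laden illustration: the Methods state only that R's chisq.test function was used, the exact layout of the authors' table is not given, and the counts below are hypothetical placeholders rather than data from the study.

```python
def chi_square_statistic(row_a, row_b):
    """Pearson chi-square statistic for a 2 x k contingency table.

    row_a and row_b hold counts per interval for two groups. Returns
    (statistic, degrees_of_freedom). Converting the statistic to a
    p-value needs the chi-square distribution, which is not implemented
    here.
    """
    if len(row_a) != len(row_b):
        raise ValueError("rows must have the same number of intervals")
    total_a, total_b = sum(row_a), sum(row_b)
    grand_total = total_a + total_b
    statistic = 0.0
    for a, b in zip(row_a, row_b):
        column_total = a + b
        if column_total == 0:
            raise ValueError("empty interval: merge or drop it first")
        for observed, row_total in ((a, total_a), (b, total_b)):
            # Expected count under the hypothesis of identical distributions.
            expected = row_total * column_total / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic, len(row_a) - 1


# Hypothetical counts for two intervals (NOT values from the paper).
stat, dof = chi_square_statistic([10, 20], [20, 10])
print(round(stat, 4), dof)
```

A large statistic relative to the degrees of freedom indicates that the two groups are distributed differently across intervals.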

Figure 4

Distribution of Alu sequences within the 10 kb flanking the 5' ends of genes. The 20 kb windows for all genes annotated in the human genome were subdivided into 500 bp intervals. The numbers of Alu sequences and of Alu sequences with a DR2 element were counted for each interval. Grey bars represent the number of Alu repeats within a given interval divided by the total number of Alu repeats within 10 kb of a gene, black bars represent the number of DR2-containing Alu repeats within a given interval divided by the total number of Alu repeats within 10 kb of a gene, and black squares indicate the number of Alu repeats with a DR2 element within a given interval divided by the total number of DR2-containing Alu repeats within 10 kb of a gene.
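The binning described in this legend can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function names are hypothetical, offsets are signed distances in bp from the 5' end of a gene, and they are assumed to be already corrected for gene strand. The offsets in the example are placeholders, not data from the study.

```python
def bin_offsets(offsets, window=10_000, bin_size=500):
    """Count elements per interval across [-window, +window) around a 5' end.

    With the defaults this gives 40 intervals of 500 bp; index 0 covers
    -10000..-9501 and index 39 covers +9500..+9999.
    """
    n_bins = (2 * window) // bin_size
    counts = [0] * n_bins
    for offset in offsets:
        if -window <= offset < window:
            counts[(offset + window) // bin_size] += 1
    return counts


def interval_fractions(counts, denominator):
    """Divide each interval count by a chosen total, as in the legend."""
    return [count / denominator for count in counts]


# Hypothetical offsets (placeholders only).
counts = bin_offsets([-10_000, -9_501, -1, 0, 499, 9_999, 10_000])
print(counts[0], counts[19], counts[20], counts[39])
print(interval_fractions(counts, sum(counts))[20])
```

The same counts can be normalized by different denominators, which is what distinguishes the black bars (total Alu repeats) from the black squares (total DR2-containing Alu repeats).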

Association of RARs with Alu-DR2 elements in vivo

It is highly likely that most of the genes with proximal Alu-DR2 elements are not implicated in RAR-dependent gene regulation in a given cell type, with access to the sites being potentially limited by chromatin structure or binding of other proteins. For example, chromatin immunoprecipitation studies followed by microarray analysis (ChIP-on-chip) of genes regulated by related estrogen receptors in estrogen-sensitive breast carcinoma cells have suggested that only a minority of potential high affinity estrogen response elements of a given chromosome are accessible to receptors [27]. Moreover, as mentioned above, Alu repeat sequences are highly susceptible to methylation [21], which correlates with a closed chromatin structure and transcriptional silencing. However, 363 Alu-DR2 were found near 193 known retinoic acid target genes [see Additional file 2], suggesting that some Alu-DR2 elements can drive retinoid-regulated gene expression. We therefore tested the accessibility of several of these Alu-DR2 sequences to RARs by chromatin immunoprecipitation (ChIP) assay. We chose multiple elements present in the vicinity of 5 genes known to be RA-responsive in cells of squamous epithelial origin, RAI1, GPRC5A (RAI3), SMYD5 (RAI15), RARRES1 and RARRES3, and assessed RAR binding to these elements in SCC25 cells, a relatively well-differentiated human head and neck squamous cell carcinoma (HNSCC) line [28]. SCC25 cells express RARs and are retinoid-responsive [28], and are arrested by RA in the G0/G1 phase of the cell cycle [29]. The genes RAI1 and GPRC5A were originally identified as RA-regulated genes in HNSCC [30]. Similarly, SMYD5 and RARRES 1 and 3 (retinoic acid receptor responder genes 1 and 3) were found to be induced by RA in human keratinocytes [31]. RARRES3 expression is lost in many tumors, and its product inhibits cell proliferation when transiently overexpressed [32].

ChIP analysis with an anti-panRAR antibody revealed RAR binding to all elements tested (Fig. 5), whereas no RAR binding was observed to control sequences lacking DR2 elements (data not shown). In most instances, binding was either not RA-dependent or weakly enhanced by ligand, consistent with the capacity of RARs to bind DNA in the absence of ligand [15]. However, immunoprecipitation of RARs associated with elements #3 in GPRC5A and #1 in SMYD5 was consistently ligand-dependent (Fig. 5, and data not shown). These studies are important as they confirm that all of the Alu-DR2 elements tested are in a chromatin environment that renders them accessible to RARs in vivo. We also confirmed that an oligonucleotide corresponding to the predominant DR2 element present in Alu repeats drove robust RA-inducible gene expression when cloned upstream of a truncated thymidine kinase promoter (data not shown), supporting its potential to function as an RARE in vivo.

Figure 5

RARs bind to Alu-DR2 motifs present in retinoic acid responsive genes in vivo. Upper panel: Schematic representation of the promoter regions of selected known retinoic acid-responsive genes containing multiple consensus DR2 RAREs (AGGTCAggAGTTCA) embedded within an Alu repeat sequence. Lower Panel: Results of ChIP assays for RAR binding to promoter regions containing DR2 motifs. DNA from all immunoprecipitates (RAR-specific and control IgG) was amplified under similar conditions (40–42 cycles of PCR).

In addition, we obtained further evidence for the function of Alu-DR2 elements as RA-dependent enhancer sequences from the publicly-available results of the ENCODE Project Consortium, which mapped transcription factor binding sites including those for RARs in 44 selected regions of the human genome (30 Mb in total) [33]. ChIP-on-chip studies of these regions using retinoic acid-treated human HL60 cells identified binding regions for RARα, 17 of which encompass Alu elements containing perfect DR2 motifs (Table 3). Of these, we note that expression of caveolin-1, the product of the CAV1 gene, was found to be upregulated in retinoic acid-treated HL60 cells [34]. Taken together, the data of Fig. 5 and Table 3 strongly support the capacity of Alu-DR2 sequences to function as RAREs.

Table 3 Alu-DR2 elements within ENCODE ChIP-on-chip RARα binding regions.


Nuclear receptor signaling, particularly that of retinoic acid receptors, has been widely studied in animal models, most notably the mouse. Our findings reveal striking differences in hormone response element frequency and distribution between rodent and primate (including human) genomes. Results of the genome-wide mapping of high affinity hormone response elements for members of the nuclear receptor family of transcription factors presented herein indicate that consensus DR2 retinoic acid response elements are vastly overrepresented relative to other DR response elements in the human genome, and are enriched approximately 4-fold in the human genome relative to the mouse. This enrichment arose as a result of the presence of DR2 motifs in Alu repeats, which are primate-specific. Movement of Alu repeats and other classes of transposable elements has been recognized as a potential mechanism for introduction of RNA polymerase II transcriptional regulatory elements for a variety of transcription factors into promoter regions of genes [10, 35, 36], thus altering the signal transduction pathways regulating specific genes. For example, a recent study [36] used DNA-binding profiles of representatives of different classes of transcription factors from the TRANSFAC database [37] to identify transcription factor binding sites in transposable elements located in promoter regions. This study did not screen for RAREs, for which matrices are not provided in the TRANSFAC database.

While the presence of imperfect RAREs in Alu elements and their capacity to function as RAREs in reporter assays have been reported previously [17], the main findings from our study are the unexpectedly large number of consensus DR2 RAREs, and the demonstration that several of these sites are functional RAR binding sites in vivo. It is striking that the genome-wide number of imperfect DR2s corresponding to the AGGTCAnnAGTTCG consensus, which matches the consensus sequences of most AluS subclasses, is not much higher (102,657) than that of the consensus DR2s (93,278), underlining the unexpectedly high number of perfect DR2 RAREs in the human genome (Fig. 3C). The generation of consensus DR2 elements by deamination of the CpG dinucleotide within the 3' AGTTCG motif is supported by the comparably high number of AGGTCAnnAGTTTG sequences (80,444), which diverge from a perfect DR2 more than the motif found in consensus AluS sequences but represent the other possible site generated by methylation/deamination at the CpG dinucleotide in the B box of Alu repeats.

Demonstration that Alu DR2 RAREs are functional binding sites in their normal chromatin context is crucial given the high frequency of methylation of Alu sequences in the human genome [21], and the association between DNA methylation and silencing of transcription in general, and of Alu repeats specifically [38]. For instance, expression and retinoic acid responsiveness of one of the genes studied above, RARRES1, is frequently inhibited in malignancies by hypermethylation [39], and the extent of methylation is inversely correlated with the state of differentiation of malignant cells [40].

The capacity of Alu-DR2 motifs to function as RAREs is supported by several lines of evidence. We identified 363 motifs within 10 kb of the 5' ends of 193 known retinoic acid-regulated genes [see Additional file 2]. It is also noteworthy that 8 Alu-DR2 within 10 kb of 9 genes overlap with DNase I-hypersensitive sites from primary human CD4+ T cells [41] [see Additional file 2]. We confirmed the capacity of several Alu-DR2 elements to bind RARs in vivo in retinoic acid-sensitive SCC25 cells by chromatin immunoprecipitation analysis of receptor binding to sequences located proximal to five genes known to be retinoic acid-responsive in squamous carcinoma cells. These studies indicate that access of RARs to the DR2 motifs under study was not impeded by their presence in Alu repeats in these cells. Moreover, our analyses of ChIP-on-chip studies from the ENCODE consortium showed that several Alu-DR2 elements in human HL60 cells are sites of RA-dependent recruitment of RARs (Table 3), further supporting the function of Alu-DR2 elements as RAREs.

We found that DR2 consensus elements were present in substantial numbers in multiple AluS subfamilies, indicating that they arose throughout the course of AluS expansion. The perfect RARE DR2 motif corresponds to the consensus sequence in seven divergent AluS subfamilies defined by Price et al. [26], in addition to being found sporadically throughout numerous other AluS subfamily sequences (Additional file 4 and unpublished results). It should be noted that the downstream half-site of the DR2 element lies within box B of the Alu RNA polymerase III promoter, but that the conversion of RGTTCG to RGTTCA to generate the consensus DR2 motif does not necessarily impair its competence for retrotransposition, consistent with transcriptional activity of Alu RNA polymerase III promoters containing RGTTCAANNC box B motifs [25, 42].

Consistent with their presence predominantly in AluS sequences, which represent relatively ancient Alu elements, we found that the vast majority of human Alu-DR2 motifs are present in the chimp and macaque genomes. A few Alu-DR2 elements were apparently unique to the human genome, although in some cases they corresponded to regions that were not sequenced in the simian genomes. While none of these uniquely human Alu-DR2s are adjacent to known retinoid-regulated genes in humans, it will be of interest in the future to characterize the potential of RA to regulate their expression in appropriate cell model systems. It is also noteworthy that one of these genes, CYP2A6, was shown to be regulated by the nuclear receptor hepatic nuclear factor 4 (HNF4) [43], which, like RARs, can recognize DR2 motifs [44].

Our previous work identified a consensus DR3 element recognized by the VDR in vitro and in vivo in the proximal promoter region of the cathelicidin antimicrobial peptide gene [45], which was found to be strongly regulated by vitamin D in a variety of human cell types. However, neither the regulation nor the response element was conserved in mouse, and studies by others [46] showed the DR3 to be conserved in primates, but not other species, and present in an Alu repeat. This represents a clear example of an Alu-driven divergence in nuclear receptor gene regulation between primates/humans and other species. Results presented here demonstrate that Alu repeat expansion has similarly led to the generation of functional binding sites for retinoic acid receptors. Further studies of DR2-containing Alu repeats will be necessary to determine precisely to what extent these elements have contributed to the primate-specific regulation of target genes by retinoic acid.


We find that consensus DR2 motifs are heavily overrepresented in the human genome relative to other DR response elements due to their presence in a subset of Alu repeats, in particular in AluS sequences. Although DR4s are found far less frequently in Alu repeats, those present account for 50% of consensus DR4 motifs in the human genome. Consensus Alu-DR2 elements arose predominantly through deamination of a methylated CpG dinucleotide present in AluS elements rather than through random base substitutions. While Alu elements can be transcriptionally repressed by methylation-driven mechanisms, our screen identified Alu-DR2 elements in promoter-proximal regions of numerous retinoid-regulated genes. Importantly, we found that Alu-DR2s are accessible to RARs in vivo in two RA-responsive human cell lines, indicating that the expansion of DR2 motifs within Alu elements during the course of primate evolution contributed to altering regulation of gene expression by retinoic acid.


Genome-wide identification of DR1-DR5 within Alu sequences

The human (hg17, May 2004) and mouse (mm6, March 2005) reference sequence chromosomes from the UCSC Genome Browser database [47, 48] were scanned for RGKTCA motifs separated by 1 to 5 bp (DR1-DR5) using a previously described Perl program [11, 12]. The RefSeq Genes track [49], downloaded September 21, 2005, was used to identify the DR2 elements located -10 to +10 kb from gene 5' ends. A custom SQL program used the chromosomal location of Alu sequences from the RepeatMasker track [50] to identify the DR1-DR5 elements they contain. The same program used the chromosomal location of regions bound by RARα from the ENCODE Affy Sites track to identify DR2 elements within these regions. Another SQL program was used to identify the DR2 elements in AluY that are not included within Human/Chimp (hg17 vs panTro1) and Human/Rhesus (hg17 vs rheMac2) pairwise BLASTZ alignments [51]. The χ² test to compare the distribution of Alu-DR2 to that of Alu repeats was carried out with the chisq.test function of the R statistical computing environment [52]. The cladogram of AluSg subfamilies was generated with PHYLIP 3.6 [53] using the consensus of AluS families from Repbase [54] and AluS subfamilies with RARE DR2 in their consensus from Price et al. [26]. All programs were run on the bioinformatics cluster of The Quebec Bioinformatics Network (BioneQ) and are available upon request.
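The containment step (deciding whether a DR motif lies inside an annotated Alu repeat) can be illustrated in Python. This is a sketch under stated assumptions and not the authors' SQL program, which is not shown: the function names are hypothetical, coordinates are assumed to be 0-based and half-open, repeat annotations on a chromosome are assumed not to overlap one another, and the example records are placeholders.

```python
from bisect import bisect_right
from collections import defaultdict


def index_intervals(intervals):
    """Group (chrom, start, end, name) records by chromosome, sorted by start."""
    by_chrom = defaultdict(list)
    for chrom, start, end, name in intervals:
        by_chrom[chrom].append((start, end, name))
    for records in by_chrom.values():
        records.sort()
    return dict(by_chrom)


def containing_repeat(index, chrom, start, end):
    """Return the name of the repeat that fully contains [start, end), or None.

    Assumes the repeats on each chromosome do not overlap, so only the
    last repeat starting at or before `start` needs to be checked.
    """
    records = index.get(chrom, [])
    i = bisect_right(records, (start, float("inf"), "")) - 1
    if i >= 0:
        repeat_start, repeat_end, name = records[i]
        if repeat_start <= start and end <= repeat_end:
            return name
    return None


# Hypothetical annotation records (placeholders, not real coordinates).
alu_index = index_intervals([
    ("chr1", 100, 400, "AluSx"),
    ("chr1", 1000, 1300, "AluY"),
])
# A 14 bp DR2 motif starting 66 bp into the first repeat.
print(containing_repeat(alu_index, "chr1", 166, 180))
# A motif that runs past the end of the repeat is not counted.
print(containing_repeat(alu_index, "chr1", 395, 409))
```

Sorting once and using binary search keeps each lookup logarithmic in the number of repeats per chromosome, which matters when more than a million Alu annotations are involved.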

ChIP assays

Approximately 2 × 10⁷ SCC25 cells were treated for 1 h with 10⁻⁶ M RA or vehicle. Cells were then rinsed with PBS and incubated for 10 min in PBS containing 1% formaldehyde at 37°C. Cells were washed with ice-cold PBS and the cell pellets were resuspended in 300 μl of lysis buffer (1% SDS, 10 mM EDTA, 50 mM Tris-HCl (pH 8.1) and 1× protease inhibitor cocktail (Roche Diagnostics, Laval, Que, Canada)). Sonication was performed by pulsing three times for 15 s to reduce DNA to fragments of ~300–500 bp in length. Centrifugation was then performed to obtain cleared cell lysates. Lysates were diluted 1:10 in ChIP dilution buffer (1% Triton X-100, 1 mM EDTA, 150 mM NaCl and 20 mM Tris-HCl pH 8.1). A portion of the lysate was taken as INPUT control sample. Pre-clearing was done by incubating the cell lysates with salmon sperm DNA (1 μg/ml), BSA (1 μg/ml) and protein A-agarose slurry (Santa Cruz Biotechnology, Santa Cruz CA) for 1 h. Cell lysates were then incubated by overnight rotation at 4°C with 4 μg of either normal rabbit IgG or primary antibody, followed by rotation-incubation with protein A-agarose for 2 h. The beads were rinsed three times sequentially with TSE I buffer (0.1% SDS; 1% Triton X-100; 2 mM EDTA; 150 mM NaCl; 20 mM Tris-HCl, pH 8.1), TSE II buffer (0.1% SDS; 1% Triton X-100; 2 mM EDTA; 500 mM NaCl; 20 mM Tris-HCl, pH 8.1), LiCl Wash Buffer (0.25 M LiCl; 1% NP-40; 1% deoxycholate (Na salt); 1 mM EDTA; 10 mM Tris-HCl, pH 8.1) and twice with TE buffer (10 mM Tris-HCl; 1 mM EDTA, pH 8.0). The beads were eluted twice with 250 μl elution buffer (1% SDS; 0.1 M NaHCO3). To reverse the cross-linking, the eluates were incubated in a 65°C incubator overnight and then the samples were purified with a PCR purification kit (Qiagen, Mississauga, Ont, Canada). PCR primers used for ChIP assays are described in Additional file 3.



ChIP: chromatin immunoprecipitation
DR: direct repeat
HNF4: hepatocyte nuclear factor 4
LINEs: Long INterspersed Elements
RA: retinoic acid
RAR: retinoic acid receptor
RARE: retinoic acid response element
RXR: retinoid X receptor
SINEs: Short INterspersed Elements
TR: thyroid hormone receptor
VDR: vitamin D receptor


  1. Hedges DJ, Batzer MA: From the margins of the genome: mobile elements shape primate evolution. Bioessays. 2005, 27 (8): 785-794. 10.1002/bies.20268.
  2. Jurka J: Evolutionary impact of human Alu repetitive elements. Curr Opin Genet Dev. 2004, 14 (6): 603-608. 10.1016/j.gde.2004.08.008.
  3. Arcot SS, Wang Z, Weber JL, Deininger PL, Batzer MA: Alu repeats: a source for the genesis of primate microsatellites. Genomics. 1995, 29 (1): 136-144. 10.1006/geno.1995.1224.
  4. Bailey JA, Liu G, Eichler EE: An Alu transposition model for the origin and expansion of human segmental duplications. Am J Hum Genet. 2003, 73 (4): 823-834. 10.1086/378594.
  5. Deininger PL, Batzer MA: Alu repeats and human disease. Mol Genet Metab. 1999, 67 (3): 183-193. 10.1006/mgme.1999.2864.
  6. Li X, Scaringe WA, Hill KA, Roberts S, Mengos A, Careri D, Pinto MT, Kasper CK, Sommer SS: Frequency of recent retrotransposition events in the human factor IX gene. Hum Mutat. 2001, 17 (6): 511-519. 10.1002/humu.1134.
  7. Sorek R, Ast G, Graur D: Alu-containing exons are alternatively spliced. Genome Res. 2002, 12 (7): 1060-1067. 10.1101/gr.229302.
  8. Dagan T, Sorek R, Sharon E, Ast G, Graur D: AluGene: a database of Alu elements incorporated within protein-coding genes. Nucleic Acids Res. 2004, 32 (Database issue): D489-92. 10.1093/nar/gkh132.
  9. Greally JM: Short interspersed transposable elements (SINEs) are excluded from imprinted regions in the human genome. Proc Natl Acad Sci U S A. 2002, 99 (1): 327-332. 10.1073/pnas.012539199.
  10. Shankar R, Grover D, Brahmachari SK, Mukerji M: Evolution and distribution of RNA polymerase II regulatory sites from RNA polymerase III dependant mobile Alu elements. BMC Evol Biol. 2004, 4 (1): 37. 10.1186/1471-2148-4-37.
  11. Bourdeau V, Deschenes J, Metivier R, Nagai Y, Nguyen D, Bretschneider N, Gannon F, White JH, Mader S: Genome-wide identification of high-affinity estrogen response elements in human and mouse. Mol Endocrinol. 2004, 18 (6): 1411-1427. 10.1210/me.2003-0441.
  12. Wang TT, Tavera-Mendoza LE, Laperriere D, Libby E, MacLeod NB, Nagai Y, Bourdeau V, Konstorum A, Lallemant B, Zhang R, Mader S, White JH: Large-scale in silico and microarray-based identification of direct 1,25-dihydroxyvitamin D3 target genes. Mol Endocrinol. 2005, 19 (11): 2685-2695. 10.1210/me.2005-0106.
  13. Chawla A, Repa JJ, Evans RM, Mangelsdorf DJ: Nuclear receptors and lipid physiology: opening the X-files. Science. 2001, 294 (5548): 1866-1870. 10.1126/science.294.5548.1866.
  14. McKenna NJ, O'Malley BW: Combinatorial control of gene expression by nuclear receptors and coregulators. Cell. 2002, 108 (4): 465-474. 10.1016/S0092-8674(02)00641-4.
  15. Claessens F, Gewirth DT: DNA recognition by nuclear receptors. Essays Biochem. 2004, 40: 59-72.
  16. Babich V, Aksenov N, Alexeenko V, Oei SL, Buchlow G, Tomilin N: Association of some potential hormone response elements in human genes with the Alu family repeats. Gene. 1999, 239 (2): 341-349. 10.1016/S0378-1119(99)00391-1.
  17. Vansant G, Reynolds WF: The consensus sequence of a major Alu subfamily contains a functional retinoic acid response element. Proc Natl Acad Sci U S A. 1995, 92 (18): 8229-8233. 10.1073/pnas.92.18.8229.
  18. Chambon P: A decade of molecular biology of retinoic acid receptors. Faseb J. 1996, 10 (9): 940-954.
  19. Mark M, Ghyselinck NB, Chambon P: Function of retinoid nuclear receptors: lessons from genetic and pharmacological dissections of the retinoic acid signaling pathway during mouse embryogenesis. Annu Rev Pharmacol Toxicol. 2006, 46: 451-480. 10.1146/annurev.pharmtox.46.120604.141156.
  20. Mader S, Chen JY, Chen Z, White J, Chambon P, Gronemeyer H: The patterns of binding of RAR, RXR and TR homo- and heterodimers to direct repeats are dictated by the binding specificities of the DNA binding domains. Embo J. 1993, 12 (13): 5029-5041.
  21. Kondo Y, Issa JP: Enrichment for histone H3 lysine 9 methylation at Alu repeats in human cells. J Biol Chem. 2003, 278 (30): 27658-27662. 10.1074/jbc.M304072200.
  22. Britten RJ, Baron WF, Stout DB, Davidson EH: Sources and evolution of human Alu repeated sequences. Proc Natl Acad Sci U S A. 1988, 85 (13): 4770-4774. 10.1073/pnas.85.13.4770.
  23. Labuda D, Striker G: Sequence conservation in Alu evolution. Nucleic Acids Res. 1989, 17 (7): 2477-2491. 10.1093/nar/17.7.2477.
  24. Kapitonov V, Jurka J: The age of Alu subfamilies. J Mol Evol. 1996, 42 (1): 59-65. 10.1007/BF00163212.
  25. Murphy MH, Baralle FE: Directed semisynthetic point mutational analysis of an RNA polymerase III promoter. Nucleic Acids Res. 1983, 11 (22): 7695-7700. 10.1093/nar/11.22.7695.
  26. Price AL, Eskin E, Pevzner PA: Whole-genome analysis of Alu repeat elements reveals complex evolutionary history. Genome Res. 2004, 14 (11): 2245-2252. 10.1101/gr.2693004.
  27. Carroll JS, Liu XS, Brodsky AS, Li W, Meyer CA, Szary AJ, Eeckhoute J, Shao W, Hestermann EV, Geistlinger TR, Fox EA, Silver PA, Brown M: Chromosome-wide mapping of estrogen receptor binding reveals long-range regulation requiring the forkhead protein FoxA1. Cell. 2005, 122 (1): 33-43. 10.1016/j.cell.2005.05.008.
  28. Hu L, Crowe DL, Rheinwald JG, Chambon P, Gudas LJ: Abnormal expression of retinoic acid receptors and keratin 19 by human oral and epidermal squamous cell carcinoma cell lines. Cancer Res. 1991, 51 (15): 3972-3981.
  29. Akutsu N, Lin R, Bastien Y, Bestawros A, Enepekides DJ, Black MJ, White JH: Regulation of gene expression by 1alpha,25-dihydroxyvitamin D3 and its analog EB1089 under growth-inhibitory conditions in squamous carcinoma cells. Mol Endocrinol. 2001, 15 (7): 1127-1139. 10.1210/me.15.7.1127.
  30. Cheng Y, Lotan R: Molecular cloning and characterization of a novel retinoic acid-inducible gene that encodes a putative G protein-coupled receptor. J Biol Chem. 1998, 273 (52): 35008-35015. 10.1074/jbc.273.52.35008.
  31. Nagpal S, Chandraratna RA: Vitamin A and regulation of gene expression. Curr Opin Clin Nutr Metab Care. 1998, 1 (4): 341-346. 10.1097/00075197-199807000-00005.
  32. DiSepio D, Ghosn C, Eckert RL, Deucher A, Robinson N, Duvic M, Chandraratna RA, Nagpal S: Identification and characterization of a retinoid-induced class II tumor suppressor/growth regulatory gene. Proc Natl Acad Sci U S A. 1998, 95 (25): 14811-14815. 10.1073/pnas.95.25.14811.
  33. The ENCODE Project Consortium: The ENCODE (ENCyclopedia Of DNA Elements) Project. Science. 2004, 306 (5696): 636-640. 10.1126/science.1105136.
  34. Navakauskiene R, Treigyte G, Gineitis A, Magnusson KE: Identification of apoptotic tyrosine-phosphorylated proteins after etoposide or retinoic acid treatment. Proteomics. 2004, 4 (4): 1029-1041. 10.1002/pmic.200300671.
  35. Oei SL, Babich VS, Kazakov VI, Usmanova NM, Kropotov AV, Tomilin NV: Clusters of regulatory signals for RNA polymerase II transcription associated with Alu family repeats and CpG islands in human promoters. Genomics. 2004, 83 (5): 873-882. 10.1016/j.ygeno.2003.11.001.
  36. Thornburg BG, Gotea V, Makalowski W: Transposable elements as a significant source of transcription regulating signals. Gene. 2006, 365: 104-110. 10.1016/j.gene.2005.09.036.
  37. Matys V, Kel-Margoulis OV, Fricke E, Liebich I, Land S, Barre-Dirrie A, Reuter I, Chekmenev D, Krull M, Hornischer K, Voss N, Stegmaier P, Lewicki-Potapov B, Saxel H, Kel AE, Wingender E: TRANSFAC and its module TRANSCompel: transcriptional gene regulation in eukaryotes. Nucleic Acids Res. 2006, 34 (Database issue): D108-10. 10.1093/nar/gkj143.
  38. Li TH, Kim C, Rubin CM, Schmid CW: K562 cells implicate increased chromatin accessibility in Alu transcriptional activation. Nucleic Acids Res. 2000, 28 (16): 3031-3039. 10.1093/nar/28.16.3031.
  39. Wu CC, Shyu RY, Chou JM, Jao SW, Chao PC, Kang JC, Wu ST, Huang SL, Jiang SY: RARRES1 expression is significantly related to tumour differentiation and staging in colorectal adenocarcinoma. Eur J Cancer. 2006, 42 (4): 557-565. 10.1016/j.ejca.2005.11.015.
  40. Youssef EM, Chen XQ, Higuchi E, Kondo Y, Garcia-Manero G, Lotan R, Issa JP: Hypermethylation and silencing of the putative tumor suppressor Tazarotene-induced gene 1 in human cancers. Cancer Res. 2004, 64 (7): 2411-2417. 10.1158/0008-5472.CAN-03-0164.
  41. Crawford GE, Holt IE, Whittle J, Webb BD, Tai D, Davis S, Margulies EH, Chen Y, Bernat JA, Ginsburg D, Zhou D, Luo S, Vasicek TJ, Daly MJ, Wolfsberg TG, Collins FS: Genome-wide mapping of DNase hypersensitive sites using massively parallel signature sequencing (MPSS). Genome Res. 2006, 16 (1): 123-131. 10.1101/gr.4074106.
  42. Paolella G, Lucero MA, Murphy MH, Baralle FE: The Alu family repeat promoter has a tRNA-like bipartite structure. Embo J. 1983, 2 (5): 691-696.
  43. Jover R, Bort R, Gomez-Lechon MJ, Castell JV: Cytochrome P450 regulation by hepatocyte nuclear factor 4 in human hepatocytes: a study using adenovirus-mediated antisense targeting. Hepatology. 2001, 33 (3): 668-675. 10.1053/jhep.2001.22176.
  44. Makita T, Hernandez-Hoyos G, Chen TH, Wu H, Rothenberg EV, Sucov HM: A developmental transition in definitive erythropoiesis: erythropoietin expression is sequentially regulated by retinoic acid receptors and HNF4. Genes Dev. 2001, 15 (7): 889-901. 10.1101/gad.871601.
  45. Wang TT, Nestel FP, Bourdeau V, Nagai Y, Wang Q, Liao J, Tavera-Mendoza L, Lin R, Hanrahan JW, Mader S, White JH: Cutting edge: 1,25-dihydroxyvitamin D3 is a direct inducer of antimicrobial peptide gene expression. J Immunol. 2004, 173 (5): 2909-2912.
  46. Gombart AF, Borregaard N, Koeffler HP: Human cathelicidin antimicrobial peptide (CAMP) gene is a direct target of the vitamin D receptor and is strongly up-regulated in myeloid cells by 1,25-dihydroxyvitamin D3. Faseb J. 2005, 19 (9): 1067-1077. 10.1096/fj.04-3284com.
  47. Karolchik D, Baertsch R, Diekhans M, Furey TS, Hinrichs A, Lu YT, Roskin KM, Schwartz M, Sugnet CW, Thomas DJ, Weber RJ, Haussler D, Kent WJ: The UCSC Genome Browser Database. Nucleic Acids Res. 2003, 31 (1): 51-54. 10.1093/nar/gkg129.
  48. Hinrichs AS, Karolchik D, Baertsch R, Barber GP, Bejerano G, Clawson H, Diekhans M, Furey TS, Harte RA, Hsu F, Hillman-Jackson J, Kuhn RM, Pedersen JS, Pohl A, Raney BJ, Rosenbloom KR, Siepel A, Smith KE, Sugnet CW, Sultan-Qurraie A, Thomas DJ, Trumbower H, Weber RJ, Weirauch M, Zweig AS, Haussler D, Kent WJ: The UCSC Genome Browser Database: update 2006. Nucleic Acids Res. 2006, 34 (Database issue): D590-8. 10.1093/nar/gkj144.
  49. Pruitt KD, Tatusova T, Maglott DR: NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 2005, 33 (Database issue): D501-4. 10.1093/nar/gki025.
  50. Smit AFA, Hubley R, Green P: RepeatMasker Open-3.0.
  51. Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W: Human-mouse alignments with BLASTZ. Genome Res. 2003, 13 (1): 103-107. 10.1101/gr.809403.
  52. R Development Core Team: R: A Language and Environment for Statistical Computing.
  53. Felsenstein J: PHYLIP - Phylogeny Inference Package (Version 3.2). Cladistics. 1989, 164-166.
  54. Jurka J, Kapitonov VV, Pavlicek A, Klonowski P, Kohany O, Walichiewicz J: Repbase Update, a database of eukaryotic repetitive elements. Cytogenet Genome Res. 2005, 110 (1-4): 462-467. 10.1159/000084979.


This work was supported by an operating grant from the Canadian Institutes of Health Research to JHW and SM (MOP-74571) and by a grant from the Natural Sciences and Engineering Research Council of Canada to SM. JHW and SM are Chercheurs-boursier of the Fonds de Recherche en Santé du Québec. DL was supported by a scholarship from a CIHR Strategic Training Program Grant in Bioinformatics.

Author information



Corresponding authors

Correspondence to John H White or Sylvie Mader.

Additional information

Authors' contributions

DL was responsible for the genome-wide analysis of response element distribution in the human and mouse genomes. TTW performed the ChIP assays. JHW and SM conceived the study, directed DL and TTW in the analysis of their work, and wrote the article. All authors read and approved the final manuscript.

Cite this article

Laperriere, D., Wang, TT., White, J.H. et al. Widespread Alu repeat-driven expansion of consensus DR2 retinoic acid response elements during primate evolution. BMC Genomics 8, 23 (2007).


  • Retinoic Acid
  • Retinoic Acid Receptor
  • SCC25 Cell
  • Hormone Response Element
  • AluS Sequence