
Nuclear Receptor HNF4α Binding Sequences are Widespread in Alu Repeats



Alu repeats, which account for ~10% of the human genome, were originally considered to be junk DNA. Recent studies, however, suggest that they may contain transcription factor binding sites and hence possibly play a role in regulating gene expression.


Here, we show that binding sites for a highly conserved member of the nuclear receptor superfamily of ligand-dependent transcription factors, hepatocyte nuclear factor 4alpha (HNF4α, NR2A1), are highly prevalent in Alu repeats. We employ high throughput protein binding microarrays (PBMs) to show that HNF4α binds > 66 unique sequences in Alu repeats that are present in ~1.2 million locations in the human genome. We use chromatin immunoprecipitation (ChIP) to demonstrate that HNF4α binds Alu elements in the promoters of target genes (ABCC3, APOA4, APOM, ATPIF1, CANX, FEM1A, GSTM4, IL32, IP6K2, PRLR, PRODH2, SOCS2, TTR) and luciferase assays to show that at least some of those Alu elements can modulate HNF4α-mediated transactivation in vivo (APOM, PRODH2, TTR, APOA4). HNF4α-Alu elements are enriched in promoters of genes involved in RNA processing, and a sizeable fraction are in regions of accessible chromatin. Comparative genomics analysis suggests that HNF4α binding sites may have been gained in Alu elements during evolution and that non-Alu repeats, such as Tiggers, also contain HNF4α sites.


Our findings suggest that HNF4α, in addition to regulating gene expression via high affinity binding sites, may also modulate transcription via low affinity sites in Alu repeats.


As much as 50% of the ~3 billion base pairs in the human genome may be derived from repetitive DNA sequence [1]. While repetitive DNA is often referred to as "junk" DNA, even when that term was originally coined it was hypothesized that junk DNA may play an active role in genome function [2]. The notion that repetitive DNA may play a regulatory role and be involved in the evolution of gene regulation was also postulated early on, although it was not until recently that there was evidence to support those ideas [3-5].

A major category of repetitive DNA is short interspersed nuclear elements (SINEs), which are believed to have originated from the 7SL RNA gene, whose product is the RNA component of the signal recognition particle [6]. In the human genome, the largest class of SINEs is the Alu repeats, which at ~1.2 million copies account for ~10% of the human genome [1]. Alu elements were first characterized as ~300 nucleotide repetitive sequences that contain a site (5'-AGCT-3') for the AluI restriction enzyme from the bacterium Arthrobacter luteus [7, 8]. Alu elements, which are still mobile in the human genome by virtue of the action of a LINE-1 reverse transcriptase [9], are a relatively recent occurrence evolutionarily. They are found exclusively in primates, including humans, and hence are postulated to have entered the mammalian genome ~60-65 million years ago [10].

Alu elements have been implicated in several human diseases including leukemia, hemophilia and breast cancer, suggesting that their impact on human health may be significant [11]. There are several well-characterized examples of Alu insertions affecting splicing patterns and hence protein function [12]. A variety of transcription factor (TF) binding sites (TFBSs) have also been characterized in Alu elements, including sites for YY1 [13], Sp1 [14], tumor suppressor p53 [15], homeodomain and TATA binding proteins [16]. Nuclear receptors (NR), which belong to a superfamily of ligand-dependent TFs, have also been found to have binding sites in Alu elements: retinoic acid receptor (RAR, NR1B) [17], estrogen receptor (ER, NR3A) [18, 19], progesterone receptor (PR, NR3C3) [20] and vitamin D receptor (VDR, NR1I1) [21]. Alu insertions have also been shown to alter the expression of at least six human genes: CD8a (CD8A), keratin 18 (KRT18), parathyroid hormone (PTH), Wilms tumor 1 (WT1), receptor for Fc fragment of IgE, high affinity I, gamma polypeptide (FCER1G) and breast cancer 1, early onset (BRCA1) [22]. Therefore, Alu sequences may regulate the level of transcripts and hence proteins in the cell, as well as the function of those proteins.

Hepatocyte nuclear factor 4 alpha (HNF4α, NR2A1) is a member of the NR superfamily that is highly expressed in the liver, as well as the kidney, intestine (large and small), pancreas and stomach [23]. HNF4α is best known for its role in the adult liver and pancreas, as well as in early development [24, 25]; it also has an emerging role in the gut [26-28]. The HNF4A gene is mutated in an inherited form of type 2 diabetes, maturity onset diabetes of the young 1 (MODY1) [29], and was recently identified as a susceptibility locus in inflammatory bowel disease (IBD) [30]. Mutations in HNF4α binding sites have also been directly linked to human diseases, including hemophilia and MODY3 [31, 32]. Many NRs are common drug targets [33]; the recent identification of an endogenous HNF4α ligand that binds reversibly also makes HNF4α a potential drug target [34, 35].

In addition to its medical relevance, HNF4α also appears to play a unique role in the evolution of NRs. It is highly conserved across species, with 100% amino acid conservation in the DNA binding domain of all mammalian HNF4α. While HNF4α is most similar to the retinoid X receptor alpha (RXRα, NR2B1), unlike many other NRs, it does not heterodimerize with RXR. Rather, it binds DNA in the form of direct repeats separated by one nucleotide (DR1, AGGTCAxAGGTCA) exclusively as a homodimer [36]. HNF4α has been found in every animal organism examined thus far, including sponge and coral [37], and has been postulated to be the ancestor of the entire NR family [38].

Many hundreds of HNF4α target genes have been identified by both classical promoter analysis and more modern genome-wide studies [32, 39-41]. During one such genomic study, we observed a very uneven frequency profile of individual HNF4α binding sequences [42]. Specifically, we noted that a certain DNA sequence designated H4.141 (5'-AGGCTGaAGTGCA-3') was > 100-fold overrepresented compared to other HNF4α binding sites in the human, but not the mouse, genome (see additional file 1: Figure S1). In the current study, we investigate the notion that these and other HNF4α binding sequences are in Alu repeats. We use the powerful high throughput technology of protein binding microarrays (PBMs) to show that HNF4α does indeed bind numerous sequences in Alu repeats in vitro. We perform ChIP and luciferase assays to show that HNF4α binds at least some Alu sequences in vivo and that those binding events are associated with transcriptional activation. Finally, we investigate the accessibility of these sites by correlation with DNase hypersensitivity data and their evolutionary conservation by comparative genomic analysis.


HNF4α binds Alu repeats in vitro

Since genome-wide location analysis (i.e., ChIP-chip/seq) often filters out or cannot distinguish the exact location of TF binding events in highly repetitive DNA, we took a combined in vitro/in silico approach to determine whether HNF4α binds Alu elements. We generated a custom protein binding microarray (PBM3) that contained 200 unique Alu-associated sequences (Figure 1). Since RAR was previously shown to bind DR2-like sequences (AGGTCAxxAGGTCA) in Alu repeats [17] and since we have previously shown that HNF4α, while preferring DR1s, can also bind DR2s [43], we also put on the PBM ~1470 permutations of DR1 and DR2 sequences as well as ~150 random controls and ~2000 additional sequences in the human genome predicted by a support vector machine (SVM) algorithm to bind HNF4α [42]. Each sequence was replicated four times for a total of more than 15,000 spots of DNA.

Figure 1

Schematic diagram of the protein binding microarray (PBM) designed to test the ability of HNF4α to bind Alu-derived DNA sequences. Top, schematic structure of a generic Alu element (~300 nt long) composed of two related but nonidentical monomers, the right and left arms (adapted from [75]). Boxes A and B are RNA Pol III internal promoters. Relative positions of the 200 Alu-derived 13-mers incorporated into PBM3 are also shown. Bottom, remaining DNA probes on PBM3 and workflow.

We found that human HNF4α2 bound 66 out of 200 Alu-derived 13-mers in a significant fashion (> 2 SD better than random controls, p-value < 0.045 for the lowest binder) (Figure 2A). It also bound 994 out of 3796 non-Alu-derived sequences, although eight of those sequences were subsequently found to be associated with Alu repeats at a frequency of > 90%. An exact match search of the entire human genome (hg18) with the 1060 sequences that bound HNF4α in the PBM (66 + 994) showed that there are a total of 1,320,513 occurrences of those HNF4α binding sites in the genome and that the vast majority (94.9%, 1,252,918) are in repetitive elements, of which most (95.7%, 1,198,534) are in Alu repeats (Figure 2A). This number is much greater than that previously found for RAR binding sites in Alu elements, most likely because that genomic search used strict DR1 and DR2 consensus sequences [17].
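An exact-match genome search of this kind can be sketched as follows. This is a minimal illustration, not the pipeline used for the counts above: the text does not say how strands, overlapping hits or palindromic sites were handled, so scanning both strands, allowing overlaps and counting a window that matches on both strands once are assumptions, and the function names are hypothetical.

```python
# Hypothetical sketch of an exact-match scan for PBM-bound 13-mers.
# Assumptions (not from the text): both strands are searched, overlapping
# hits are allowed, and a window matching on both strands is counted once.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(genome: str, kmers: set) -> list:
    """Return (start, strand) for every exact k-mer match on either strand.

    All k-mers must share one length k; starts are 0-based forward-strand
    coordinates of the leftmost base of the match.
    """
    forward = {m.upper() for m in kmers}
    lengths = {len(m) for m in forward}
    if len(lengths) != 1:
        raise ValueError("all k-mers must have the same length")
    k = lengths.pop()
    reverse = {reverse_complement(m) for m in forward}
    genome = genome.upper()
    hits = []
    for i in range(len(genome) - k + 1):
        window = genome[i:i + k]
        if window in forward:
            hits.append((i, "+"))
        elif window in reverse:  # only checked if the forward strand did not match
            hits.append((i, "-"))
    return hits


# Toy example: a DR1 consensus on the forward strand, then its reverse complement.
print(find_sites("TTAGGTCAAAGGTCATT", {"AGGTCAAAGGTCA"}))  # [(2, '+')]
print(find_sites("GGTGACCTTTGACCT", {"AGGTCAAAGGTCA"}))    # [(2, '-')]
```

Partitioning the hits into repeat and non-repeat fractions would additionally require intersecting the hit coordinates with a repeat annotation such as Repeat Masker.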

Figure 2

The vast majority of HNF4α binding sites in the human genome are found in Alu repeats. A. Numerical results from PBM3 described in Fig. 1 and in the text. B. Position weight matrices (PWMs) generated with Weblogo [76] of sequences bound by human HNF4α2 in the PBM, categorized by the type of sequence. The DR1-derived PWM was from 994 sequences bound by HNF4α2, the DR2-derived PWM from 50 sequences and the Alu-derived sequences from 66 sequences. The non canonical PWM is from Bolotin et al. [42].

The position weight matrices (PWMs) of the DR1- and DR2-derived sequences bound by HNF4α were essentially identical and suggested that, for HNF4α, the CAAAG core is more relevant than the AGGTCA half sites (Figure 2B). Interestingly, the PWM of the Alu-derived 13-mers bound by HNF4α did not contain a prominent CAAAG core but did contain an identifiable AGGTCA half site on the right hand side; the left hand portion was primarily C-rich. Overall, the Alu-derived PWM strongly resembled the noncanonical HNF4α PWM we identified in our previous PBM study [42], although the association of the noncanonical motif with Alu elements was not investigated in that study. A partial list of DNA sequences significantly bound by HNF4α in the PBM and their estimated frequencies in Alu repeats and the human genome (hg18) is given in Table 1 (see additional file 2: Table S1 for the complete list of HNF4α-bound motifs associated with Alu repeats).
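The logos in Figure 2B were generated with WebLogo; the sketch below is only a generic illustration of how a log-odds PWM can be built from aligned, equal-length bound sequences and used to score a candidate. The pseudocount and uniform 0.25 background are assumptions, not settings reported here, and the function names are hypothetical.

```python
import math

BASES = "ACGT"


def position_weight_matrix(seqs, pseudocount=0.5, background=0.25):
    """Build a log2-odds PWM from aligned, equal-length A/C/G/T sequences.

    Returns one dict per position mapping base -> log2(p(base) / background).
    The pseudocount and uniform background are illustrative assumptions.
    """
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    pwm = []
    for pos in range(length):
        counts = {b: pseudocount for b in BASES}  # pseudocounts avoid log(0)
        for s in seqs:
            counts[s[pos].upper()] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2(counts[b] / total / background) for b in BASES})
    return pwm


def pwm_score(pwm, seq):
    """Sum the per-position log-odds of `seq` under `pwm`."""
    return sum(column[base] for column, base in zip(pwm, seq.upper()))


# Toy alignment of AGGTCA-like half sites.
pwm = position_weight_matrix(["AGGTCA", "AGGTCA", "AGGTCA", "TGGTCA"])
print(pwm_score(pwm, "AGGTCA") > pwm_score(pwm, "TTTTTT"))  # True
```

Positive entries mark bases enriched over background at a position; a logo displays the same per-position information graphically.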

Table 1 Frequency of Alu-derived sequences bound by HNF4α in Alu repeats and the human genome.

HNF4α binds Alu repeats in the promoter region of target genes in vivo

To investigate HNF4α binding to Alu repeats in the promoters of HNF4α target genes in vivo, we performed a ChIP assay for HNF4α in human hepatocellular carcinoma HepG2 cells, which express HNF4α and many of its target genes. Several criteria were used to select potential Alu sequences for ChIP analysis. First, the Alu element had to contain a probable HNF4α binding site based on the PBM results. Second, the gene containing the Alu element had to be down regulated > 1.4-fold by HNF4α RNAi in HepG2 cells as determined by expression profiling [42]. Third, the Alu repeat had to be within -5 kb to +1 kb of the transcription start site (TSS) of the gene. Fourth, the Alu element had to be amenable to primer design and PCR amplification, a nontrivial criterion due to the repetitive nature of the sequences. Overall, 47 sets of primers for 35 genes were designed, of which 15 sets gave a specific signal from the input control, indicating appropriate amplification of the Alu sequence. Finally, of those 15 primer sets, the sets for 13 genes yielded a significant signal in the HNF4α ChIP assay compared to the corresponding negative control IgG (Figure 3). These results indicate that HNF4α binds the Alu elements in the promoter regions of these target genes in vivo.
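The first three selection criteria are computational and can be expressed as a simple filter; the fourth (primer design and amplification) is empirical and is not modeled. In this sketch the record type, its field names and the strand-aware offset convention (negative values are upstream of the TSS) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AluCandidate:
    """Hypothetical record for one Alu element near a candidate target gene."""
    gene: str
    alu_start: int         # genomic coordinate of the Alu element
    tss: int               # transcription start site of the gene
    strand: str            # "+" or "-" (strand of the gene)
    has_pbm_site: bool     # contains a 13-mer bound by HNF4α in the PBM
    rnai_fold_down: float  # fold down-regulation after HNF4α RNAi


def offset_from_tss(c: AluCandidate) -> int:
    """Strand-aware position relative to the TSS (negative = upstream)."""
    return c.alu_start - c.tss if c.strand == "+" else c.tss - c.alu_start


def passes_computational_criteria(c: AluCandidate, min_fold=1.4,
                                  upstream=5000, downstream=1000) -> bool:
    """Criteria 1-3 from the text; primer design (criterion 4) is empirical."""
    return (c.has_pbm_site
            and c.rnai_fold_down > min_fold
            and -upstream <= offset_from_tss(c) <= downstream)


# A hypothetical gene with an Alu 1 kb upstream of its TSS passes the filter.
print(passes_computational_criteria(AluCandidate("GENE_A", 9_000, 10_000, "+", True, 2.0)))
```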

Figure 3

HNF4α binds Alu elements in vivo. A. HNF4α chromatin immunoprecipitation (ChIP) of HepG2 cells using 16 sets of PCR primers as indicated. Shown is an ethidium-bromide-stained agarose gel of the qPCR products after ~40 cycles for graphical representation only. In, input control of genomic DNA. IgG, control IP with normal rabbit IgG. H4, IP with HNF4α antibody raised in rabbit. Fold change, ratio of the H4 to IgG signal determined by quantitative real time PCR (qPCR). Pos and neg control, regions of the CDKN1A promoter in which HNF4α was shown previously to bind or not, respectively [71]; Alu generic, amplification with generic primers that recognize all Alu elements. Shown are the results from one of two or more independent ChIP experiments performed in duplicate, except for APOM, GSTM4, PRLR and SOCS2 which are from one ChIP experiment. The largest fold change values obtained for a given gene are indicated. B. Schematic diagram of promoters of HNF4α target genes. Diamonds, position of HNF4α binding sites in Alu elements identified in this study; triangles, other HNF4α binding sites predicted by PBM from Bolotin et al. [42]; vertical lines, position of the PCR primers used in the ChIP; arrows, start sites of transcription. See additional file 2: Table S2 for sequences of all the PCR primers.
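The caption defines fold change as the ratio of the H4 to IgG signal measured by qPCR but does not give the calculation. One common way to obtain such a ratio is the comparative Ct method, sketched below under the assumptions of equal input per IP and ~100% PCR efficiency; the function name is hypothetical.

```python
from statistics import mean


def fold_enrichment(ct_ip, ct_igg, amplification=2.0):
    """Fold enrichment of the HNF4α IP over the IgG control from qPCR Ct values.

    ct_ip, ct_igg: replicate Ct values for the H4 and IgG immunoprecipitates.
    Assumes equal input per IP and a per-cycle amplification factor of
    `amplification` (2.0 means 100% PCR efficiency). A lower Ct means more
    template, so enrichment = amplification ** (mean Ct_IgG - mean Ct_IP).
    """
    return amplification ** (mean(ct_igg) - mean(ct_ip))


# Duplicate reactions: the HNF4α IP crosses threshold 3 cycles before IgG.
print(fold_enrichment([25.0, 25.0], [28.0, 28.0]))  # 8.0
```

If primer efficiencies differ appreciably from 100%, the measured efficiency for each primer set would replace the factor of 2.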

HNF4α activates transcription from Alu elements

To determine whether the binding of HNF4α to the Alu elements observed in vivo could drive transcription, we subcloned the PCR fragments containing the Alu element with the HNF4α binding site (HNF4α-Alu element) into a luciferase reporter construct with a minimal core promoter. Three of the genes ChIP'd by HNF4α in HepG2 cells were analyzed: APOM, TTR and PRODH2. Transient transfection into an HNF4α-responsive cell line (HEK 293T) showed that HNF4α2 significantly transactivates the luciferase constructs in a dose-dependent manner (Figure 4A). While the fold induction was not large (1.5- to 2.7-fold), it was comparable to that of two reporter constructs containing a single classical HNF4α response element (2.0- and 4.8-fold) (Figure 4B). To determine whether an HNF4α-Alu element could contribute to transcription of a native promoter, we analyzed an APOA4 promoter construct that contained both an HNF4α-Alu element and a classical HNF4α response element. The wildtype (WT) promoter was transactivated well by HNF4α (4.9-fold), and mutations in the HNF4α binding site in either the Alu element or the classical response element reduced the transactivation (to 3.4- and 3.2-fold, respectively) (Figure 4C). While the effect of the mutation in the HNF4α-Alu site was not large, it was statistically significant (p < 0.001) and comparable to that of the mutation in the classical site. Taken together, these results indicate not only that HNF4α binds Alu elements in the promoters of HNF4α target genes in vivo, but also that this binding can contribute to the overall transcriptional activity of the gene.

Figure 4

HNF4α activates transcription from Alu elements. A. Reporter gene assay with the HNF4α-Alu elements from the indicated human gene promoters fused to a minimal core promoter driving luciferase (pGL4.23). Shown is luciferase activity (relative light units, RLU) normalized to β-gal activity (normalized RLU) from HEK 293T cells transiently transfected with 1 μg of reporter and different amounts of either empty vector or human HNF4α2 expression vector (100, 200, 300 and 500 ng). Reporter constructs contain only the HNF4α-Alu element and immediately adjacent sequence; they do not contain any additional known HNF4α binding sites. Data are the mean normalized RLU of triplicate samples from one representative experiment of two or more that were performed. P-values of the HNF4α2 signal compared to the empty vector are indicated. Fold induction by HNF4α2 compared to the empty vector is indicated. B. As in (A) but with two different reporter constructs containing classical HNF4α response elements (RE-1 and RE-2). Shown is fold induction by 500 ng HNF4α2 compared to the parent construct (pGL4.23). C. As in (A) but of the native human APOA4 promoter (-1343 to +247) fused to luciferase (pGL4.10) without (WT) or with mutations (MUT) in either the HNF4α-Alu element (Alu) or a classical HNF4α site identified by a previous PBM analysis (HNF4α-PBM) (see Figure 3). Shown is the fold induction by 500 ng HNF4α2 compared to the empty expression vector from one experiment performed in six replicates. A second independent experiment performed in triplicate gave similar results. (B) and (C), sequences of the relevant HNF4α binding sites are given with the spacer nucleotide in lower case and mutations in red.
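The normalization and fold-induction calculation described in the caption can be sketched as below. This is an illustrative version only: the readings are invented toy numbers, and the statistical test behind the reported p-values is not named in the text (a two-sample t-test such as `scipy.stats.ttest_ind` would be one plausible choice, but that is an assumption).

```python
from statistics import mean


def normalized_rlu(luciferase, beta_gal):
    """Per-well luciferase RLU divided by beta-gal activity (transfection control)."""
    if len(luciferase) != len(beta_gal):
        raise ValueError("need one beta-gal reading per luciferase reading")
    return [luc / bgal for luc, bgal in zip(luciferase, beta_gal)]


def fold_induction(hnf4a_wells, empty_vector_wells):
    """Mean normalized RLU with HNF4α2 divided by the mean with empty vector."""
    return mean(hnf4a_wells) / mean(empty_vector_wells)


# Toy triplicate readings (invented numbers, for illustration only).
hnf4a = normalized_rlu([4000.0, 4400.0, 3600.0], [2.0, 2.0, 2.0])
empty = normalized_rlu([1000.0, 1000.0, 1000.0], [1.0, 1.0, 1.0])
print(fold_induction(hnf4a, empty))  # 2.0
```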

Frequency of HNF4α sites in Alu and non Alu repeats

To determine the prevalence of HNF4α binding sites in Alu elements, all the Alu repeats in the human genome (hg18) were searched with the ~1060 (66 + 994) sequences bound by HNF4α in PBM3; the vast majority of hits were obtained with the 66 + 8 Alu-derived sequences. Approximately 750,000 of the ~1,175,000 Alu repeats in Repeat Masker (~64%) were found to contain at least one DNA sequence to which HNF4α bound in PBM3; a substantial number of these (~338,000, ~45% of the site-bearing Alu repeats) contained more than one HNF4α binding site (Table 2). All told, there were nearly 1.2 million HNF4α binding sites in Alu repeats in the human genome.
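The per-element tallies above reduce to counting site hits per Alu element. A minimal sketch, assuming each hit has already been assigned the identifier of the Alu element that contains it (the function and key names are hypothetical):

```python
from collections import Counter


def alu_site_summary(site_alu_ids, total_alus):
    """Summarize HNF4α sites per Alu element.

    site_alu_ids: one Alu element ID per HNF4α site found inside an Alu
    (an element with three sites appears three times).
    total_alus: number of annotated Alu elements searched.
    """
    per_alu = Counter(site_alu_ids)
    with_site = len(per_alu)
    with_multiple = sum(1 for n in per_alu.values() if n > 1)
    return {
        "sites": sum(per_alu.values()),
        "alus_with_site": with_site,
        "fraction_of_all_alus": with_site / total_alus,
        "alus_with_multiple_sites": with_multiple,
        "fraction_of_site_bearing_alus": with_multiple / with_site if with_site else 0.0,
    }


# The genome-wide figures in the text give the same kind of ratios:
print(round(750_000 / 1_175_000, 2))  # 0.64 of all Alu repeats carry a site
print(round(338_000 / 750_000, 2))    # 0.45 of site-bearing Alus carry more than one
```

The ~45% figure is therefore a fraction of the site-bearing elements; relative to all ~1,175,000 Alu repeats it corresponds to roughly 29%.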

Table 2 Number of Alu repeats with HNF4α binding sites in the human genome (hg 18).

Different families of Alu repeats were found to have different frequencies of HNF4α sites (Table 3), and within a given Alu family there was a range of frequencies (Table 4). There was also a rough negative correlation between the percentage of Alu elements within a given family that contained an HNF4α binding site and the age of the family. The newest Alu family, AluY (~25 Mya), had the greatest percentage of HNF4α sites (~91%); the second newest family, AluS (~30-55 Mya), had the next highest percentage (~75%); and the oldest family, AluJ (~55-65 Mya), had the lowest percentage (~33%) (Table 3) [9]. This correlation held for the precursors to the Alu family as well. FAM (free Alu-like monomer) sequences are Alu precursors that gave rise to FRAM (free right Alu monomer) and FLAM (free left Alu monomer) sequences, which eventually joined to create the modern dimeric Alu element [44]. The frequency of HNF4α binding sites in FAM (0.34%), FRAM (11.84%) and FLAM_C (~33.77%) suggests that the HNF4α sites may have first appeared in Alu-like sequences in the FLAM family. Interestingly, not only does AluJ have a frequency of HNF4α sites similar to that of FLAM_C, but the HNF4α sites in AluJ are almost exclusively in the left arm at position 31 (Figure 5). In contrast, the newer AluS family has significant secondary sites at positions 62 and 200, while the newest Alu family, AluY, has essentially the same number of HNF4α sites at position 62 as at position 31, although the number of sites at position 200 has remained relatively low. All told, these results suggest that there has been a gain of HNF4α binding sites in Alu elements during the course of evolution. (See additional file 2: Table S3 for a complete list of Alu repeats with HNF4α binding sites and their frequency in the human genome.)

Table 3 Alu families in human genome (hg18) with HNF4α binding sites.
Table 4 Alu subfamilies in human genome (hg18) with HNF4α binding sites.
Figure 5

Distribution of HNF4α binding sites in Alu repeats. Frequency histogram showing the position of HNF4α binding sites in AluY, AluS, and AluJ families of SINEs as well as the precursors FLAM and FRAM. The approximate age of the repeats is indicated. The slight variation in the peaks at positions 31, 62 and 200 is due to small differences in the length of the Alu repeats.
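A positional histogram like Figure 5 requires converting each genomic hit into an offset from the 5' end of its Alu element, taking the element's orientation into account, and then binning the offsets. The sketch below assumes half-open coordinates and a 10-nt bin width; both are illustrative choices rather than reported methods, and the function names are hypothetical. Binning also absorbs the small shifts in peak position caused by length differences between Alu copies.

```python
from collections import Counter


def offset_in_alu(site_start, alu_start, alu_end, alu_strand, k=13):
    """0-based offset of a k-mer site from the 5' end of its Alu element.

    Coordinates are half-open, [alu_start, alu_end); the site occupies
    [site_start, site_start + k) on the forward strand. For a minus-strand
    Alu, the 5' end is at alu_end, so the offset is measured from there.
    """
    if alu_strand == "+":
        return site_start - alu_start
    return alu_end - (site_start + k)


def position_histogram(offsets, bin_width=10):
    """Count site offsets in fixed-width bins keyed by the bin's lower edge."""
    return Counter((off // bin_width) * bin_width for off in offsets)


# Sites near positions 31, 62 and 200 fall into three bins.
print(sorted(position_histogram([31, 33, 62, 64, 200]).items()))
# [(30, 2), (60, 2), (200, 1)]
```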

The human genome search also revealed ~54,000 occurrences of HNF4α sites in non-Alu repeats (Figure 2A). The non-Alu repeat families were numerically dominated by mammalian interspersed repeats (MIRs), LINE2 elements (L2) and Tigger elements (Table 5). However, while only ~1% of the MIRs and L2s possess an HNF4α binding site, more than 20% of Tiggers do. In addition, more than 50% of the SVA family of retrotransposons contain at least one HNF4α site, although this is not surprising since these elements contain a portion of an Alu element (see additional file 2: Table S4 for a complete list of frequencies of HNF4α binding sites in non-Alu repeats).

Table 5 Non Alu repeat families in human genome (hg18) with HNF4α binding sites.

Frequency of HNF4α-Alu elements in promoters and DNase hypersensitive sites

Others have shown that the region 5000 bp upstream of the TSS (+1) contains on average 3.63 Alu elements [45]. We analyzed the same promoter region and found that human genes have on average 2.91 HNF4α-Alu elements, consistent with the overall high proportion of Alu elements with an HNF4α site (Tables 3 and 4). To determine which Alu elements may be accessible, and hence potentially play a role in transcription regulation, we determined the number of HNF4α-Alu elements that reside within DNase hypersensitive regions using datasets from the ENCODE project [46, 47]. Genome-wide, 46,129 HNF4α-Alu elements (~6.2% of all HNF4α-Alu elements) are within DNase hypersensitive regions across multiple cell lines, with 5458 genes containing one or more HNF4α-Alu/DNase sites in their 5 kb promoter region. In HepG2 cells alone, ~7000 HNF4α-Alu elements are in DNase hypersensitive regions (6212 from Rep Track 1 and 8127 from Rep Track 2). While these findings may be an underestimate due to the difficulty of sequencing through repetitive elements, they nonetheless indicate that, although the majority of the ~750,000 HNF4α-Alu elements may not be accessible in most cell types, a sizeable portion are in regions of open chromatin and hence may be transcriptionally active.
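Both analyses in this paragraph are interval operations: defining a strand-aware 5 kb window upstream of each TSS, and testing whether an HNF4α-Alu element lies in a DNase hypersensitive region. The sketch below is one way to do this with the standard `bisect` module. It assumes 0-based, half-open coordinates, DNase peaks already merged into non-overlapping intervals, and "within" interpreted as any overlap; the text does not state these conventions, and the class and function names are hypothetical.

```python
import bisect


def promoter_window(tss, strand, upstream=5000):
    """Half-open [start, end) spanning `upstream` bp 5' of a 0-based TSS."""
    if strand == "+":
        return max(0, tss - upstream), tss
    return tss + 1, tss + 1 + upstream


def count_in_window(element_starts_sorted, window):
    """Number of elements whose start lies in the half-open window."""
    lo, hi = window
    return (bisect.bisect_left(element_starts_sorted, hi)
            - bisect.bisect_left(element_starts_sorted, lo))


class IntervalSet:
    """Sorted, non-overlapping half-open intervals (e.g., merged DNase peaks)."""

    def __init__(self, intervals):
        self.intervals = sorted(intervals)
        self.starts = [s for s, _ in self.intervals]

    def overlaps(self, start, end):
        """True if [start, end) overlaps any stored interval."""
        i = bisect.bisect_right(self.starts, start) - 1
        if i >= 0 and self.intervals[i][1] > start:
            return True  # the interval beginning at or before `start` extends into the query
        j = i + 1        # otherwise only the next interval can begin inside the query
        return j < len(self.intervals) and self.intervals[j][0] < end


dhs = IntervalSet([(100, 200), (500, 600)])
print(dhs.overlaps(190, 210), dhs.overlaps(200, 300))  # True False
```

Averaging `count_in_window` over all genes' promoter windows gives a per-gene mean of the kind reported above.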

Age of Alu repeats in HNF4α target genes

To estimate the age of the various HNF4α-Alu elements, we determined whether the Alu elements bound by HNF4α in the HepG2 ChIP assay are present in four sequenced primate genomes: marmoset, rhesus, orangutan and chimpanzee. The first primates originated ~60 million years ago (Mya). The marmoset monkey branched off from the human lineage ~35 Mya, the rhesus monkey ~25 Mya, the orangutan ~8 Mya and the chimpanzee ~5.5 Mya [48]. The results show that all of the HNF4α-Alu elements examined are older than humans (Figure 6), which is not surprising since only ~5,000 (~0.5%) of Alus are human-specific [9]. Five of the ChIP'd HNF4α-Alu elements (in ABCC3, ATPIF1, PRLR, TTR, SOD2) were common to all the primate genomes and thus fairly ancient (> 35 million years old). An additional two elements (in APOA4 and SOCS2) also appear to be about 35 million years old but may have been lost in the chimpanzee lineage after it diverged from the human lineage. In contrast, five of the HNF4α-Alu elements (in CANX, FEM1A, GSTM4, IP6K2, PRODH2) appear to be somewhat newer (~25 million years old), as they are present in all primates except marmoset. The two most recent elements (~8 million years old) appear to be in the IL32 and APOM genes since they are found only in orangutan, chimp and human. The AluSq2 element in the IL32 gene, however, could be older, because an entire region of the chromosome, including the IL32 gene, is missing in rhesus (Figure 7A). In the APOM gene, our ChIP results could not distinguish whether the HNF4α site is in AluJr or AluSg7; it is also curious that the AluSg7 element is only partially missing in both rhesus and marmoset (Figure 7B). The AluSp element in the PRODH2 promoter, in contrast, appears to have entered the primate lineage after the divergence of the marmoset (35 Mya) but before the divergence of the rhesus monkey (25 Mya), consistent with the reported age of the AluS subfamily (30-55 Mya) (Figure 7C).
Assuming that the absence of the HNF4α-Alu elements is not due simply to errors in genome assembly and/or misclassification of Alu elements, these results suggest that HNF4α-Alu elements could play a role in differential regulation of these genes in different primate species.
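The dating logic above can be written as a small parsimony rule: an element shared by human and species X is assumed to have inserted before the human-X split, so its minimum age is the oldest such divergence time, and absence in a more closely related species is read as a loss or an assembly gap. The sketch below encodes that rule using the divergence times given in the text; the function name is hypothetical, and the rule ignores independent insertions at the same site.

```python
# Divergence times from the human lineage (Mya), as given in the text.
DIVERGENCE_MYA = {
    "marmoset": 35.0,
    "rhesus": 25.0,
    "orangutan": 8.0,
    "chimpanzee": 5.5,
}


def minimum_age_mya(species_with_element):
    """Lower bound on the age of a human Alu element from presence/absence.

    Parsimony assumption: an element shared by human and species X was
    inserted before the human-X split (no independent insertions), so its
    age is at least the oldest such divergence time.
    """
    shared = [DIVERGENCE_MYA[sp] for sp in species_with_element if sp in DIVERGENCE_MYA]
    return max(shared, default=0.0)  # 0.0: no evidence the element predates humans


# Present in marmoset, rhesus and orangutan but not chimpanzee (the APOA4/SOCS2
# pattern): at least ~35 My old, with a presumed loss in the chimpanzee lineage.
print(minimum_age_mya({"marmoset", "rhesus", "orangutan"}))  # 35.0
```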

Figure 6

Gene-specific HNF4α-Alu sequences in primate genomes. Presence (+) and absence (-) of the HNF4α-Alu element in the indicated HNF4α target genes as determined by ChIP analysis (Figure 3) in all the sequenced primate genomes. Age in millions of years ago (MYA) of each species' divergence from the human lineage is given on the right.

Figure 7

Alu insertions in the promoters of IL32, APOM and PRODH2 genes. Screen shots from UCSC Genome Browser of human and other indicated primate genomes in the region of the HNF4α-Alu element ChIP'd by HNF4α in the IL32 (A), APOM (B) and PRODH2 (C) genes. Shown from top to bottom in each figure are non Alu HNF4α binding sites from Bolotin et al. [42] (Non Alu PBM); Alu sites from this study (Alu PBM); mRNA from RefSeq track (RNA); HNF4α ChIP signal in HepG2 from the Custom Track by UC Davis (ChIP-Seq); DNA sequence conservation in four primate genomes (Rhesus, Marmoset, Orangutan, Chimp); PCR product amplified after ChIP in Fig. 3 (ChIP product); repeats from Repeat Masker 3.2.7 with the relevant Alu sequence indicated (Repeats).


The functional relevance of repetitive DNA such as Alu repeats in the human genome has been debated ever since these repeats were first discovered several decades ago. In this study, we show that the nuclear receptor HNF4α binds Alu-derived 13-mers in vitro as well as Alu elements in the promoters of HNF4α target genes in vivo. We show that HNF4α sites in Alu elements can drive gene expression in luciferase assays and that HNF4α binding sites are found in ~64% of all known Alu repeats in the genome (~1.2 million HNF4α sites in ~750,000 Alu elements). Additionally, we found that while HNF4α sites are predominantly found in Alu repeats, they are also found in other repeats such as SVA elements, which contain a portion of an Alu repeat [49], as well as the L2, MIR and Tigger families of transposable elements.

Functionality of HNF4α-Alu elements

Perhaps the most important question is how many of the HNF4α-Alu elements are functional. Several recent studies suggest that Alu elements may indeed play a role in regulating gene expression: Alu elements are enriched in regions with genes [50], particularly in housekeeping and metabolism genes. However, they are underrepresented in developmental genes [45], suggesting that their presence in those genes may be detrimental. Binding sites for other NRs have also been found in Alu repeats and several of those sites were found to affect transcription [17, 19-21]. To determine what types of genes contain HNF4α-Alu elements, we performed a Gene Ontology (GO) analysis of genes enriched with HNF4α-Alu elements (> 8 per 5 kb promoter region) and found RNA processing and transcription regulation genes, as well as macromolecular catabolic processes and complex assembly genes (see additional file 2: Table S6 for a full list of significant GO categories and relevant genes). RNA processing is not a category previously associated with classical HNF4α binding sites, but Alu elements have been found to play a direct role in alternative splicing [51].

In a detailed, genome-wide analysis of functional targets of HNF4α and binding sites, we recently found that only 30% of genes down regulated in an HNF4α RNAi experiment contained a potential classical HNF4α binding site [42]. While the other 70% could be indirect targets, it is also possible that some of those genes are regulated by HNF4α-containing Alu elements, consistent with our finding here that on average every gene in the human genome contains ~2.91 HNF4α-Alu elements within 5000 bp upstream of the TSS. On an individual gene basis, we found that even though the HNF4α binding sites in Alu repeats are not high affinity sites compared to the majority of classical HNF4α sites, they are nonetheless capable of driving the expression of a heterologous gene on their own. In the context of the genome, however, the HNF4α-Alu elements are typically present in conjunction with other TFBS in the promoter, including other HNF4α binding sites, suggesting that they may act in more of a modulatory capacity than as the sole drivers of transcription, as we observed on the APOA4 promoter. These results are similar to those found for other NRs albeit on different binding sites within the Alu elements [19-21].

The functionality of HNF4α-Alu elements, as with any potential TFBS, will also depend on the state of the local chromatin and the accessibility of the site to HNF4α. While it has been reported that most Alu repeats in the human genome contain CpG dinucleotides that are methylated [52], potentially rendering them nonfunctional, the Alu elements that are hypomethylated tend to be in promoter regions, suggesting that they are accessible [52, 53]. Indeed, our analysis showed that there may be as many as ~46,000 HNF4α-Alu elements in DNase hypersensitive regions genome-wide, suggesting that they may be accessible for binding and therefore may affect transcription.

Alu repeats as a sink for HNF4α protein?

In addition to affecting transcription directly, it is tempting to speculate that the relatively large number of HNF4α-Alu elements, especially in regions of open chromatin, could act as a sink or reservoir for HNF4α protein. We have estimated by semi-quantitative immunoblotting that there may be as many as 450,000 molecules of HNF4α in the nucleus of an adult mouse hepatocyte (unpublished observation); this estimate is consistent with the fact that we originally had to purify HNF4α only ~5,000 to 10,000-fold from adult rat liver nuclei [54]. Assuming that human hepatocytes have similar levels of HNF4α protein and keeping in mind that HNF4α binds DNA only as a dimer [36], this suggests that the presence of ~7000 to 46,000 HNF4α-Alu elements in accessible regions of the genome would not have a significant impact on the availability of ~225,000 HNF4α protein dimers in a normal adult hepatocyte nucleus. However, conditions that significantly alter the accessibility of the ~750,000 HNF4α-Alu elements genome-wide, or the amount of HNF4α protein, could in theory result in a situation in which the stoichiometry of HNF4α-Alu sites to HNF4α protein is indeed relevant. For example, global loss of DNA methylation has been associated with cancer progression and there is at least one report in which certain Alu elements lose methylation during tumor progression [55]. Likewise, a decrease in the amount of functional HNF4α protein, such as that found in heterozygous MODY1 patients [31], activation of signaling pathways [56-61], DNA damage via p53 [62, 63], microRNAs [64], diet [35, 65, 66] and diseases such as colitis and cancer [67, 68] could tip the balance between HNF4α protein and potential binding sites, rendering the notion of Alu elements as a sink of HNF4α potentially relevant. The stoichiometry of HNF4α protein to total HNF4α binding sites may also differ in other tissues and developmental time points [69], which could alter the relevance of HNF4α-Alu elements.
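The stoichiometric argument above is simple arithmetic, checked below as a back-of-envelope sketch. It assumes one dimer per HNF4α-Alu element (many elements carry more than one site), treats all nuclear HNF4α as available, and ignores classical high-affinity sites; it is not a binding model, and the function name is hypothetical.

```python
def dimers_per_element(molecules, elements):
    """HNF4α dimers available per HNF4α-Alu element (HNF4α binds DNA as a dimer)."""
    return (molecules / 2) / elements


MOLECULES = 450_000  # estimated HNF4α molecules per hepatocyte nucleus (from the text)

for label, n in [("HepG2 DNase", 7_000),
                 ("all-cell-line DNase", 46_000),
                 ("all HNF4α-Alu elements", 750_000)]:
    print(f"{label}: {dimers_per_element(MOLECULES, n):.2f} dimers per element")
```

With ~225,000 dimers there are roughly 32 and 4.9 dimers per accessible element for the HepG2 and multi-cell-line estimates, but only ~0.3 dimers per element if all ~750,000 HNF4α-Alu elements were accessible, which is the situation in which a sink effect could become relevant.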


The ~1.2 million HNF4α binding sites in ~750,000 Alu elements in the human genome have the potential to affect the expression of HNF4α target genes. Therefore, it will be important to keep the HNF4α-Alu elements in mind when investigating HNF4α function, especially when using non primates as models for humans and when investigating conditions, such as cancer, where there may be genome-scale alterations in chromatin accessibility. These results join the increasing number of reports of NR and other TF binding sites in Alu or other repeat elements [70] and support the notion that repetitive DNA may be more than just "junk" DNA.


PBM design and analysis

A custom-designed 8x15k Alu PBM (PBM3), containing 8 grids of ~15,000 DNA spots each, was ordered from Agilent (Figure 1). An in silico Alu library of ~200 DNA sequences was made by extracting every unique 13-mer from every Alu element consensus in the RepBase database. The human genome (hg18) was searched with the Alu library, and the 100 most frequent sites were included on PBM3. The 13-mer Alu library was also scored with the support vector machine (SVM) model described in Bolotin et al. [42]. (The SVM is an algorithm trained on sequences bound by HNF4α in the PBM; it predicts HNF4α binding with a correlation of R² = 0.76.) The 100 top-scoring potential HNF4α binding sites from the SVM search were included on PBM3, for a total of 200 Alu-derived sequences. Another 704 sequences were generated by permuting three adjacent positions at a time, in every base combination, across the DR1 consensus (5'-AGGTCAaAGGTCA-3'), and 768 sequences by similar permutations of a DR2 consensus (5'-AGGTCAaaAGGTCA-3'). Additionally, 100 randomly generated 13-mers and 50 randomly generated 14-mers were included as negative controls for the DR1s and DR2s, respectively. Finally, an additional 2,061 unique sequences were generated from an SVM search of all human genes, for a total of 3,802 unique DNA sequences, each of which was replicated 4 times on the PBM for a total of 15,208 DNA spots. The linker and cap sequences were the same as those described in Bolotin et al. [42]. (See additional file 2: Table S5 for a list of all DNA sequences on PBM3 and the corresponding HNF4α binding scores.)
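The DR1 and DR2 counts quoted above are consistent with one reading of "permutations of three adjacent positions": slide a 3-nt window along the consensus and substitute all 4³ = 64 base combinations at each window position, giving 11 × 64 = 704 variants for the 13-nt DR1 and 12 × 64 = 768 for the 14-nt DR2. The Python sketch below implements that reading, together with the sliding-window 13-mer extraction used to build the Alu library. It is an illustration under this assumption, not the authors' script. Because neighboring windows overlap, the raw lists contain repeats (the consensus itself reappears once per window), so the number of distinct variants is smaller than the raw count.

```python
from itertools import product

BASES = "ACGT"


def unique_kmers(sequences, k=13):
    """Every distinct k-mer (sliding window, step 1) across a set of sequences."""
    kmers = set()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmers.add(seq[i:i + k])
    return kmers


def window_permutations(consensus, width=3):
    """Substitute all 4**width base combinations at each run of `width`
    adjacent positions. Repeats are kept so the raw count can be compared
    with the count quoted in the text."""
    consensus = consensus.upper()
    variants = []
    for start in range(len(consensus) - width + 1):
        for combo in product(BASES, repeat=width):
            variants.append(consensus[:start] + "".join(combo)
                            + consensus[start + width:])
    return variants


if __name__ == "__main__":
    dr1 = window_permutations("AGGTCAaAGGTCA")   # 13 nt: 11 windows x 64
    dr2 = window_permutations("AGGTCAaaAGGTCA")  # 14 nt: 12 windows x 64
    print(len(dr1), len(set(dr1)))  # 704 544
    print(len(dr2), len(set(dr2)))  # 768 592
    # Toy k-mer example; the real input would be the RepBase Alu consensus sequences.
    print(sorted(unique_kmers(["ACGTACGTAC"], k=4)))
```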

Crude nuclear extracts of COS-7 cells transfected with human HNF4α2 or HNF4α8 expression vectors were applied to PBM3 (~400 ng HNF4α protein per grid), and the arrays were visualized and analyzed as described in Bolotin et al. [42]. The primary antibody was a mouse monoclonal that recognizes the C-terminal region of HNF4α (H1415 from R&D Systems); the secondary antibody was NL-637 anti-mouse IgG (NL008 from R&D Systems). PBMs were scanned using a GenePix Axon 4000B scanner (Molecular Devices, Sunnyvale, CA) at 543 nm (Cy3-dUTP) and 633 nm (Cy5-conjugated secondary antibody). Since there was no significant difference between the HNF4α2 and HNF4α8 isoforms, which differ by ~30 amino acids in the N-terminal region but have identical DNA binding and dimerization/ligand binding domains, the average of the four grids (two with HNF4α2 and two with HNF4α8) was used as the final PBM3 score. Sequences with a score > 0.612 (i.e., 2 SD above the mean of the random controls, p < 0.045) were considered HNF4α binders.
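The scoring rule above (average across the four grids, then a cutoff 2 SD above the mean of the random controls) can be sketched in Python. The data layout (a dict from sequence to per-grid scores) and the use of the sample standard deviation are assumptions; the text does not specify how the per-grid scores were normalized, and the toy values are made up rather than taken from PBM3.

```python
from statistics import mean, stdev


def average_over_grids(grid_scores):
    """Collapse per-grid scores into one score per sequence.

    grid_scores maps a DNA sequence to its scores on each grid
    (assumed here: two HNF4α2 grids and two HNF4α8 grids)."""
    return {seq: mean(values) for seq, values in grid_scores.items()}


def control_cutoff(control_scores, n_sd=2.0):
    """Mean of the random-control scores plus n_sd sample standard deviations."""
    return mean(control_scores) + n_sd * stdev(control_scores)


def call_binders(scores, cutoff):
    """Sequences whose averaged score exceeds the cutoff."""
    return sorted(seq for seq, score in scores.items() if score > cutoff)


if __name__ == "__main__":
    # Hypothetical toy values, not PBM3 data.
    controls = [0.1, 0.2, 0.3]
    grids = {
        "SEQ_A": [0.50, 0.60, 0.50, 0.60],
        "SEQ_B": [0.30, 0.30, 0.30, 0.30],
    }
    cutoff = control_cutoff(controls)
    print(round(cutoff, 3), call_binders(average_over_grids(grids), cutoff))
```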

ChIP and RNAi expression profiling

HNF4α ChIP from HepG2 cells was performed as described in [71]. Quantitative PCR (qPCR) following the ChIP was performed using BioRad iQ SYBR Green Supermix. Each 23.5-μl reaction included 12.5 μl of Supermix, 0.25 μl of 100 nmol of each primer, 0.5 μl of template and 10 μl of ddH2O. The qPCR program was 95°C for 5 min (hot start), followed by 40 cycles of 95°C for 30 sec (melt) and 30 sec at the melting temperature (Tm) for annealing and extension, followed by a melt curve. The Tm was determined experimentally for each primer pair using a temperature-gradient qPCR, the products of which were visualized on an ethidium bromide-stained agarose gel to confirm product size. All qPCR was performed using BioRad iQ5 and myQ5 thermocyclers. (See additional file 2: Table S2 for a complete list of PCR primers that gave a positive ChIP signal.) Affymetrix expression profiling data for the HNF4α RNAi knockdown in HepG2 cells were obtained from Bolotin et al. [42].
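The reaction recipe above sums to the stated 23.5 μl (12.5 + 2 × 0.25 + 0.5 + 10). A small Python helper that checks this total and scales the non-template components into a master mix is sketched below. The 10% pipetting overage and the convention of adding template separately to each well are assumptions, not part of the described protocol.

```python
# Per-reaction volumes in microliters, as listed in the text.
PER_REACTION_UL = {
    "iQ SYBR Green Supermix": 12.5,
    "forward primer": 0.25,
    "reverse primer": 0.25,
    "template": 0.5,
    "ddH2O": 10.0,
}


def reaction_volume(recipe):
    """Total volume of one reaction."""
    return sum(recipe.values())


def master_mix(recipe, n_reactions, overage=0.10):
    """Scale every component except template for n reactions plus overage.

    The overage (extra fraction to cover pipetting loss) is an assumption."""
    factor = n_reactions * (1 + overage)
    return {name: round(vol * factor, 2)
            for name, vol in recipe.items() if name != "template"}


if __name__ == "__main__":
    print(reaction_volume(PER_REACTION_UL))  # 23.5
    for name, vol in master_mix(PER_REACTION_UL, n_reactions=24).items():
        print(f"{name}: {vol} μl")
```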

Luciferase assay

Human embryonic kidney (HEK 293T) cells were plated (0.25 × 10⁶ cells) in 12-well plates. After 24 hr, the cells were transfected using Lipofectamine 2000 (Invitrogen) according to the manufacturer's protocol with different amounts of empty vector (pcDNA3) or wild-type human HNF4α2 in pcDNA3, 1 μg of the luciferase reporter and 200 ng of a CMV.βgal control. Cells were harvested after 24 hr using Triton lysis buffer (1% Triton X-100, 25 mM Gly-Gly pH 7.8, 15 mM MgSO4, 4 mM EGTA, 1 mM DTT). Luciferase and β-gal activities were measured as described earlier [62]. Significant differences in luciferase activity between cells transfected with empty vector or human HNF4α2 were determined by Student's t-test. APOM, PRODH2 and TTR luciferase constructs were created by cloning PCR products of the Alu elements in the respective promoters into pGL4.23 (Promega): the APOM construct used SfiI restriction sites, and the PRODH2 and TTR constructs used NheI and KpnI sites. The APOA4.Luc construct was made by cloning a PCR product from the human APOA4 promoter (-1343 to +247) into the pGL4.10 vector (Promega) at the HindIII and NheI sites. Site-directed mutations were introduced into the HNF4α binding sites in the Alu and PBM elements using the QuikChange kit (Stratagene). Luciferase reporter constructs with classical HNF4α response elements (RE-1 and RE-2) were made by inserting the appropriate synthetic oligonucleotides into pGL4.23. All constructs were sequence verified. (See additional file 2: Table S2 for the sequences of the PCR primers and oligonucleotides used in the constructions.)
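The comparison described above (normalizing each well's luciferase signal to its β-gal control, then testing empty vector against HNF4α2 with Student's t-test) can be sketched with the Python standard library. The function names and toy readings are hypothetical. The sketch computes only the pooled-variance t statistic and its degrees of freedom; a p-value would be read from the t distribution, for example with `scipy.stats.ttest_ind`, whose default `equal_var=True` performs the same Student's test.

```python
from math import sqrt
from statistics import mean, variance


def normalize(luciferase, beta_gal):
    """Divide each well's luciferase reading by its β-gal transfection control."""
    return [luc / bgal for luc, bgal in zip(luciferase, beta_gal)]


def fold_activation(treated, control):
    """Mean normalized activity with HNF4α2 relative to empty vector."""
    return mean(treated) / mean(control)


def student_t(a, b):
    """Two-sample Student's t statistic assuming equal variances (pooled).

    Returns (t, degrees_of_freedom)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2


if __name__ == "__main__":
    # Hypothetical triplicate readings (arbitrary units), not data from this study.
    empty = normalize([1000, 1100, 950], [0.50, 0.55, 0.48])
    hnf4a = normalize([5200, 4800, 5600], [0.52, 0.50, 0.55])
    t, df = student_t(hnf4a, empty)
    print(f"fold = {fold_activation(hnf4a, empty):.2f}, t = {t:.2f}, df = {df}")
```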

Bioinformatic searches

Searches of human genome hg18, downloaded from the UCSC Genome Browser, were conducted with Seqmap [72] using all of the sequences that HNF4α bound in PBM3. Alu and non-Alu repeats with HNF4α sites were identified by comparing the genome-wide HNF4α search results to the repeat coordinates obtained from the RepeatMasker track (version 3.2.7) in the UCSC Genome Browser. The results were processed using custom Perl scripts and an SQL database. To determine the accessibility of HNF4α-Alu sites, we used the BEDTools software package [73] to cross-reference our list of ~750,000 HNF4α-Alu elements (Table 2) with the ENCODE Project DNase hypersensitivity tracks in UCSC Genome Bioinformatics, requiring an overlap of at least one nucleotide. We used both the clustered track, which contains data from multiple human cell lines, and the tracks for two different replicates of HepG2 cells. Gene Ontology analysis of genes containing HNF4α-Alu elements was done using DAVID [74]. As a cutoff, we used eight HNF4α-Alu elements within 5 kb upstream of +1, two SD above the average number of sites (2.91 + 4.22).
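The overlap and promoter-window logic described above can be sketched in plain Python. This is not the BEDTools/Perl/SQL pipeline used in the study; it is a naive illustration that assumes BED-style 0-based, half-open coordinates, represents intervals as hypothetical `(chrom, start, end)` tuples, takes `tss` as the 0-based position of +1, and counts an element as "within 5 kb upstream" if it overlaps that window by at least 1 bp. Only the cutoff of eight elements comes from the text.

```python
def overlaps(a, b, min_overlap=1):
    """True if two (chrom, start, end) half-open intervals share >= min_overlap bp."""
    (chrom_a, start_a, end_a), (chrom_b, start_b, end_b) = a, b
    return chrom_a == chrom_b and min(end_a, end_b) - max(start_a, start_b) >= min_overlap


def accessible(alu_elements, dhs_sites):
    """Alu elements overlapping at least one DNase hypersensitive site (naive O(n*m))."""
    return [alu for alu in alu_elements if any(overlaps(alu, d) for d in dhs_sites)]


def upstream_window(tss, strand, size=5000):
    """Half-open window covering `size` bp upstream of the +1 position."""
    if strand == "+":
        return tss - size, tss
    return tss + 1, tss + 1 + size


def promoter_counts(genes, alu_elements, size=5000):
    """genes: iterable of (name, chrom, tss, strand).
    Returns name -> number of HNF4α-Alu elements overlapping the upstream window."""
    counts = {}
    for name, chrom, tss, strand in genes:
        start, end = upstream_window(tss, strand, size)
        window = (chrom, start, end)
        counts[name] = sum(overlaps(window, alu) for alu in alu_elements)
    return counts


def genes_above_cutoff(counts, cutoff=8):
    """Genes with at least `cutoff` HNF4α-Alu elements upstream (input for GO analysis)."""
    return sorted(name for name, n in counts.items() if n >= cutoff)


if __name__ == "__main__":
    # Hypothetical toy intervals, not hg18 data.
    alus = [("chr1", 6000, 6300), ("chr1", 9990, 10290), ("chr1", 10100, 10400)]
    dhs = [("chr1", 6100, 6150)]
    print(accessible(alus, dhs))
    print(promoter_counts([("GENE1", "chr1", 10000, "+")], alus))
```

Note that under the half-open convention, intervals that merely abut (one ends where the next begins) share no bases and are not counted as overlapping.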


1. Lander ES, Linton LM, Birren B, Nusbaum C, et al: Initial sequencing and analysis of the human genome. Nature. 2001, 409 (6822): 860-921. 10.1038/35057062.

2. Ohno S: So much "junk" DNA in our genome. Brookhaven Symp Biol. 1972, 23: 366-370.

3. van de Lagemaat LN, Landry JR, Mager DL, Medstrand P: Transposable elements in mammals promote regulatory variation and diversification of genes with specialized functions. Trends Genet. 2003, 19 (10): 530-536. 10.1016/j.tig.2003.08.004.

4. Davidson EH, Britten RJ: Organization, transcription, and regulation in the animal genome. Q Rev Biol. 1973, 48 (4): 565-613. 10.1086/407817.

5. Orgel LE, Crick FH: Selfish DNA: the ultimate parasite. Nature. 1980, 284 (5757): 604-607. 10.1038/284604a0.

6. Ullu E, Tschudi C: Alu sequences are processed 7SL RNA genes. Nature. 1984, 312 (5990): 171-172. 10.1038/312171a0.

7. Rubin CM, Houck CM, Deininger PL, Friedmann T, Schmid CW: Partial nucleotide sequence of the 300-nucleotide interspersed repeated human DNA sequences. Nature. 1980, 284 (5754): 372-374. 10.1038/284372a0.

8. Houck CM, Rinehart FP, Schmid CW: A ubiquitous family of repeated DNA sequences in the human genome. J Mol Biol. 1979, 132 (3): 289-306. 10.1016/0022-2836(79)90261-4.

9. Batzer MA, Deininger PL: Alu repeats and human genomic diversity. Nat Rev Genet. 2002, 3 (5): 370-379. 10.1038/nrg798.

10. Liu GE, Alkan C, Jiang L, Zhao S, Eichler EE: Comparative analysis of Alu repeats in primate genomes. Genome Res. 2009, 19 (5): 876-885. 10.1101/gr.083972.108.

11. Deininger PL, Batzer MA: Alu repeats and human disease. Mol Genet Metab. 1999, 67 (3): 183-193. 10.1006/mgme.1999.2864.

12. Kreahling J, Graveley BR: The origins and implications of Aluternative splicing. Trends Genet. 2004, 20 (1): 1-4. 10.1016/j.tig.2003.11.001.

13. Humphrey GW, Englander EW, Howard BH: Specific binding sites for a pol III transcriptional repressor and pol II transcription factor YY1 within the internucleosomal spacer region in primate Alu repetitive elements. Gene Expr. 1996, 6 (3): 151-168.

14. Oei S-L, Babich VS, Kazakov VI, Usmanova NM, Kropotov AV, Tomilin NV: Clusters of regulatory signals for RNA polymerase II transcription associated with Alu family repeats and CpG islands in human promoters. Genomics. 2004, 83 (5): 873-882. 10.1016/j.ygeno.2003.11.001.

15. Cui F, Sirotin MV, Zhurkin VB: Impact of Alu repeats on the evolution of human p53 binding sites. Biol Direct. 2011, 6 (1): 2-10.1186/1745-6150-6-2.

16. Thornburg BG, Gotea V, Makałowski W: Transposable elements as a significant source of transcription regulating signals. Gene. 2006, 365: 104-110.

17. Laperriere D, Wang T-T, White JH, Mader S: Widespread Alu repeat-driven expansion of consensus DR2 retinoic acid response elements during primate evolution. BMC Genomics. 2007, 8: 23-10.1186/1471-2164-8-23.

18. Norris J, Fan D, Aleman C, Marks JR, Futreal PA, Wiseman RW, Iglehart JD, Deininger PL, McDonnell DP: Identification of a new subclass of Alu DNA repeats which can function as estrogen receptor-dependent transcriptional enhancers. J Biol Chem. 1995, 270 (39): 22777-22782. 10.1074/jbc.270.39.22777.

19. Mason CE, Shu FJ, Wang C, Session RM, Kallen RG, Sidell N, Yu T, Liu MH, Cheung E, Kallen CB: Location analysis for the estrogen receptor-alpha reveals binding to diverse ERE sequences and widespread binding within repetitive DNA elements. Nucleic Acids Res. 2010, 38 (7): 2355-2368. 10.1093/nar/gkp1188.

20. Jacobsen BM, Jambal P, Schittone SA, Horwitz KB: ALU repeats in promoters are position-dependent co-response elements (coRE) that enhance or repress transcription by dimeric and monomeric progesterone receptors. Mol Endocrinol. 2009, 23 (7): 989-1000. 10.1210/me.2009-0048.

21. Gombart AF, Saito T, Koeffler HP: Exaptation of an ancient Alu short interspersed element provides a highly conserved vitamin D-mediated innate immune response in humans and primates. BMC Genomics. 2009, 10: 321-10.1186/1471-2164-10-321.

22. Britten RJ: DNA sequence insertion and evolutionary variation in gene regulation. Proceedings of the National Academy of Sciences of the United States of America. 1996, 93 (18): 9374-9377. 10.1073/pnas.93.18.9374.

23. Bolotin E, Schnabl J, Sladek F: HNF4A (Homo sapiens). Transcription Factor Encyclopedia. 2009.

24. Hayhurst GP, Lee YH, Lambert G, Ward JM, Gonzalez FJ: Hepatocyte nuclear factor 4alpha (nuclear receptor 2A1) is essential for maintenance of hepatic gene expression and lipid homeostasis. Mol Cell Biol. 2001, 21: 1393-1403. 10.1128/MCB.21.4.1393-1403.2001.

25. Watt AJ, Garrison WD, Duncan SA: HNF4: a central regulator of hepatocyte differentiation and function. Hepatology. 2003, 37: 1249-1253. 10.1053/jhep.2003.50273.

26. Babeu JP, Darsigny M, Lussier CR, Boudreau F: Hepatocyte nuclear factor 4alpha contributes to an intestinal epithelial phenotype in vitro and plays a partial role in mouse intestinal epithelium differentiation. Am J Physiol Gastrointest Liver Physiol. 2009, 297 (1): G124-134. 10.1152/ajpgi.90690.2008.

27. Cattin AL, Le Beyec J, Barreau F, Saint-Just S, Houllier A, Gonzalez FJ, Robine S, Pincon-Raymond M, Cardot P, Lacasa M, et al: Hepatocyte nuclear factor 4alpha, a key factor for homeostasis, cell architecture, and barrier function of the adult intestinal epithelium. Mol Cell Biol. 2009, 29 (23): 6294-6308. 10.1128/MCB.00939-09.

28. Darsigny M, Babeu JP, Dupuis AA, Furth EE, Seidman EG, Levy E, Verdu EF, Gendron FP, Boudreau F: Loss of hepatocyte-nuclear-factor-4alpha affects colonic ion transport and causes chronic inflammation resembling inflammatory bowel disease in mice. PLoS One. 2009, 4 (10): e7609-10.1371/journal.pone.0007609.

29. Yamagata K, Furuta H, Oda N, Kaisaki PJ, Menzel S, Cox NJ, Fajans SS, Signorini S, Stoffel M, Bell GI: Mutations in the hepatocyte nuclear factor-4alpha gene in maturity-onset diabetes of the young (MODY1). Nature. 1996, 384 (6608): 458-460. 10.1038/384458a0.

30. Barrett JC, Lee JC, Lees CW, Prescott NJ, Anderson CA, Phillips A, Wesley E, Parnell K, Zhang H, Drummond H, et al: Genome-wide association study of ulcerative colitis identifies three new susceptibility loci, including the HNF4A region. Nat Genet. 2009, 41 (12): 1330-1334. 10.1038/ng.483.

31. Ryffel GU: Mutations in the human genes encoding the transcription factors of the hepatocyte nuclear factor (HNF)1 and HNF4 families: functional and pathological consequences. J Mol Endocrinol. 2001, 27 (1): 11-29. 10.1677/jme.0.0270011.

32. Sladek F, Seidel S: Hepatocyte nuclear factor 4 alpha. Nuclear Receptors and Genetic Diseases. 2001, London: Academic Press, 309-361.

33. Overington JP, Al-Lazikani B, Hopkins AL: How many drug targets are there?. Nat Rev Drug Discov. 2006, 5 (12): 993-996. 10.1038/nrd2199.

34. Hwang-Verslues WW, Sladek FM: HNF4alpha--role in drug metabolism and potential drug target?. Curr Opin Pharmacol. 2010, 10 (6): 698-705. 10.1016/j.coph.2010.08.010.

35. Yuan X, Ta TC, Lin M, Evans JR, Dong Y, Bolotin E, Sherman MA, Forman BM, Sladek FM: Identification of an endogenous ligand bound to a native orphan nuclear receptor. PLoS ONE. 2009, 4: e5609-10.1371/journal.pone.0005609.

36. Jiang G, Nepomuceno L, Hopkins K, Sladek FM: Exclusive homodimerization of the orphan receptor hepatocyte nuclear factor 4 defines a new subclass of nuclear receptors. Mol Cell Biol. 1995, 15: 5131-5143.

37. Sladek FM: What are nuclear receptor ligands?. Mol Cell Endocrinol. 2011, 334 (1-2): 3-13. 10.1016/j.mce.2010.06.018.

38. Bridgham JT, Eick GN, Larroux C, Deshpande K, Harms MJ, Gauthier ME, Ortlund EA, Degnan BM, Thornton JW: Protein evolution by molecular tinkering: diversification of the nuclear receptor superfamily from a ligand-dependent ancestor. PLoS Biol. 2010, 8 (10):

39. Ellrott K, Yang C, Sladek FM, Jiang T: Identifying transcription factor binding sites through Markov chain optimization. Bioinformatics (Oxford, England). 2002, 18 (Suppl 2): S100-109. 10.1093/bioinformatics/18.suppl_2.S100.

40. Odom DT, Zizlsperger N, Gordon DB, Bell GW, Rinaldi NJ, Murray HL, Volkert TL, Schreiber J, Rolfe PA, Gifford DK, et al: Control of pancreas and liver gene expression by HNF transcription factors. Science. 2004, 303: 1378-1381. 10.1126/science.1089769.

41. Odom DT, Dowell RD, Jacobsen ES, Nekludova L, Rolfe PA, Danford TW, Gifford DK, Fraenkel E, Bell GI, Young RA: Core transcriptional regulatory circuitry in human hepatocytes. Mol Syst Biol. 2006, 2: 2006 0017-

42. Bolotin E, Liao H, Ta TC, Yang C, Hwang-Verslues W, Evans JR, Jiang T, Sladek FM: Integrated approach for the identification of human hepatocyte nuclear factor 4alpha target genes using protein binding microarrays. Hepatology. 2010, 51 (2): 642-653. 10.1002/hep.23357.

43. Jiang G, Sladek FM: The DNA binding domain of hepatocyte nuclear factor 4 mediates cooperative, specific binding to DNA and heterodimerization with the retinoid X receptor alpha. J Biol Chem. 1997, 272: 1218-1225. 10.1074/jbc.272.2.1218.

44. Quentin Y: Origin of the Alu family: a family of Alu-like monomers gave birth to the left and the right arms of the Alu elements. Nucleic Acids Res. 1992, 20 (13): 3397-3401. 10.1093/nar/20.13.3397.

45. Polak P, Domany E: Alu elements contain many binding sites for transcription factors and may play a role in regulation of developmental processes. BMC Genomics. 2006, 7: 133-10.1186/1471-2164-7-133.

46. Sabo PJ, Kuehn MS, Thurman R, Johnson BE, Johnson EM, Cao H, Yu M, Rosenzweig E, Goldy J, Haydock A, et al: Genome-scale mapping of DNase I sensitivity in vivo using tiling DNA microarrays. Nat Methods. 2006, 3 (7): 511-518. 10.1038/nmeth890.

47. Sabo PJ, Hawrylycz M, Wallace JC, Humbert R, Yu M, Shafer A, Kawamoto J, Hall R, Mack J, Dorschner MO, et al: Discovery of functional noncoding elements by digital analysis of chromatin structure. Proceedings of the National Academy of Sciences of the United States of America. 2004, 101 (48): 16837-16842. 10.1073/pnas.0407387101.

48. Goodman M: The genomic record of Humankind's evolutionary roots. Am J Hum Genet. 1999, 64 (1): 31-39. 10.1086/302218.

49. Ostertag EM, Goodier JL, Zhang Y, Kazazian HH: SVA elements are nonautonomous retrotransposons that cause disease in humans. Am J Hum Genet. 2003, 73 (6): 1444-1451. 10.1086/380207.

50. Grover D, Mukerji M, Bhatnagar P, Kannan K, Brahmachari SK: Alu repeat analysis in the complete human genome: trends and variations with respect to genomic composition. Bioinformatics (Oxford, England). 2004, 20 (6): 813-817. 10.1093/bioinformatics/bth005.

51. Kreahling J, Graveley BR: The origins and implications of Aluternative splicing. Trends Genet. 2004, 20 (1): 1-4. 10.1016/j.tig.2003.11.001.

52. Xie H, Wang M, Bonaldo Mde F, Smith C, Rajaram V, Goldman S, Tomita T, Soares MB: High-throughput sequence-based epigenomic analysis of Alu repeats in human cerebellum. Nucleic Acids Res. 2009, 37 (13): 4331-4340. 10.1093/nar/gkp393.

53. Xie H, Wang M, de Andrade A, Bonaldo Mde F, Galat V, Arndt K, Rajaram V, Goldman S, Tomita T, Soares MB: Genome-wide quantitative assessment of variation in DNA methylation patterns. Nucleic Acids Res. 2011, 39 (10): 4099-4108. 10.1093/nar/gkr017.

54. Sladek FM, Zhong WM, Lai E, Darnell JE: Liver-enriched transcription factor HNF-4 is a novel member of the steroid hormone receptor superfamily. Genes Dev. 1990, 4 (12B): 2353-2365. 10.1101/gad.4.12b.2353.

55. Xie H, Wang M, Bonaldo Mde F, Rajaram V, Stellpflug W, Smith C, Arndt K, Goldman S, Tomita T, Soares MB: Epigenomic analysis of Alu repeats in human ependymomas. Proceedings of the National Academy of Sciences of the United States of America. 2010, 107 (15): 6952-6957. 10.1073/pnas.0913836107.

56. Sun K, Montana V, Chellappa K, Brelivet Y, Moras D, Maeda Y, Parpura V, Paschal BM, Sladek FM: Phosphorylation of a conserved serine in the deoxyribonucleic acid binding domain of nuclear receptors alters intracellular localization. Mol Endocrinol. 2007, 21 (6): 1297-1311. 10.1210/me.2006-0300.

57. Xie X, Liao H, Dang H, Pang W, Guan Y, Wang X, Shyy JY, Zhu Y, Sladek FM: Down-regulation of hepatic HNF4alpha gene expression during hyperinsulinemia via SREBPs. Mol Endocrinol. 2009, 23 (4): 434-443. 10.1210/me.2007-0531.

58. Hong YH, Varanasi US, Yang W, Leff T: AMP-activated protein kinase regulates HNF4alpha transcriptional activity by inhibiting dimer formation and decreasing protein stability. J Biol Chem. 2003, 278 (30): 27495-27501. 10.1074/jbc.M304112200.

59. Leclerc I, Lenzner C, Gourdon L, Vaulont S, Kahn A, Viollet B: Hepatocyte nuclear factor-4alpha involved in type 1 maturity-onset diabetes of the young is a novel target of AMP-activated protein kinase. Diabetes. 2001, 50 (7): 1515-1521. 10.2337/diabetes.50.7.1515.

60. Viollet B, Kahn A, Raymondjean M: Protein kinase A-dependent phosphorylation modulates DNA-binding activity of hepatocyte nuclear factor 4. Mol Cell Biol. 1997, 17 (8): 4208-4219.

61. Hatzis P, Kyrmizi I, Talianidis I: Mitogen-activated protein kinase-mediated disruption of enhancer-promoter communication inhibits hepatocyte nuclear factor 4alpha expression. Mol Cell Biol. 2006, 26 (19): 7017-7029. 10.1128/MCB.00297-06.

62. Maeda Y, Seidel SD, Wei G, Liu X, Sladek FM: Repression of hepatocyte nuclear factor 4alpha tumor suppressor p53: involvement of the ligand-binding domain and histone deacetylase activity. Mol Endocrinol. 2002, 16 (2): 402-410. 10.1210/me.16.2.402.

63. Maeda Y, Hwang-Verslues WW, Wei G, Fukazawa T, Durbin ML, Owen LB, Liu X, Sladek FM: Tumour suppressor p53 down-regulates the expression of the human hepatocyte nuclear factor 4alpha (HNF4alpha) gene. Biochem J. 2006, 400 (2): 303-313. 10.1042/BJ20060614.

64. Takagi S, Nakajima M, Kida K, Yamaura Y, Fukami T, Yokoi T: MicroRNAs regulate human hepatocyte nuclear factor 4alpha, modulating the expression of metabolic enzymes and cell cycle. J Biol Chem. 2010, 285 (7): 4415-4422. 10.1074/jbc.M109.085431.

65. Selva DM, Hogeveen KN, Innis SM, Hammond GL: Monosaccharide-induced lipogenesis regulates the human hepatic sex hormone-binding globulin gene. J Clin Invest. 2007, 117 (12): 3979-3987.

66. Chiang JY: Regulation of bile acid synthesis: pathways, nuclear receptors, and mechanisms. J Hepatol. 2004, 40 (3): 539-551. 10.1016/j.jhep.2003.11.006.

67. Tanaka T, Jiang S, Hotta H, Takano K, Iwanari H, Sumi K, Daigo K, Ohashi R, Sugai M, Ikegame C, et al: Dysregulated expression of P1 and P2 promoter-driven hepatocyte nuclear factor-4alpha in the pathogenesis of human cancer. J Pathol. 2006, 208: 662-672. 10.1002/path.1928.

68. Ahn SH, Shah YM, Inoue J, Morimura K, Kim I, Yim S, Lambert G, Kurotani R, Nagashima K, Gonzalez FJ, et al: Hepatocyte nuclear factor 4alpha in the intestinal epithelial cells protects against inflammatory bowel disease. Inflamm Bowel Dis. 2008, 14 (7): 908-920. 10.1002/ibd.20413.

69. Briancon N, Weiss MC: In vivo role of the HNF4alpha AF-1 activation domain revealed by exon swapping. Embo J. 2006, 25 (6): 1253-1262. 10.1038/sj.emboj.7601021.

70. Bourque G, Leong B, Vega VB, Chen X, Lee YL, Srinivasan KG, Chew JL, Ruan Y, Wei CL, Ng HH, et al: Evolution of the mammalian transcription factor binding repertoire via transposable elements. Genome Res. 2008, 18 (11): 1752-1762. 10.1101/gr.080663.108.

71. Hwang-Verslues WW, Sladek FM: Nuclear receptor hepatocyte nuclear factor 4alpha1 competes with oncoprotein c-Myc for control of the p21/WAF1 promoter. Mol Endocrinol. 2008, 22: 78-90.

72. Jiang H, Wong WH: SeqMap: mapping massive amount of oligonucleotides to the genome. Bioinformatics (Oxford, England). 2008, 24 (20): 2395-2396. 10.1093/bioinformatics/btn429.

73. Quinlan AR, Hall IM: BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics (Oxford, England). 2010, 26 (6): 841-842. 10.1093/bioinformatics/btq033.

74. Huang da W, Sherman BT, Lempicki RA: Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc. 2009, 4 (1): 44-57.

75. Jurka J: Evolutionary impact of human Alu repetitive elements. Curr Opin Genet Dev. 2004, 14 (6): 603-608. 10.1016/j.gde.2004.08.008.

76. Crooks GE, Hon G, Chandonia JM, Brenner SE: WebLogo: a sequence logo generator. Genome Res. 2004, 14: 1188-1190. 10.1101/gr.849004.


Acknowledgements and funding

We thank D. Mane-Padros and L. Vuong for the luciferase constructs with classical HNF4α response elements and B. Fang for predicting mutations in HNF4α binding sites. This work was funded by a PhRMA Foundation fellowship to EB, and grants to FMS from the UCR Institute for Integrative Genome Biology and the NIH (R21 MH087397, R01 DK053892). KC, W H-V, CY and JMS were supported by NIH R01 DK053892. EB was supported by NIH R21 MH087397. The funding bodies did not have any role in the study design, data collection, manuscript preparation or submission.

Author information



Corresponding author

Correspondence to Frances M Sladek.

Additional information

Authors' contributions

EB designed and carried out the PBM, designed the primers for the ChIP and carried out the PCR, made luciferase constructs, performed all the bioinformatics analysis and drafted the manuscript; KC carried out the luciferase assays and helped with figures; W H-V carried out the ChIP assay; CY made the initial observation of HNF4 sites in Alu elements; JMS made luciferase reporter constructs and mutants; FMS was involved in all aspects of the design of the experiments, analysis of the results and preparation of the manuscript. All authors proof-read the manuscript.

Electronic supplementary material


Additional file 1: Overrepresentation of Alu-related HNF4α binding motif (H4.141) in the human genome. Frequency profile of 217 HNF4α binding sites, identified by gel shift assays and derived from the literature, in the human and mouse genomes. (PDF 68 KB)


Additional file 2: Supplementary Tables. Six tables with frequencies of HNF4α binding sites in repetitive DNA, complete PBM and GO results, and primers used in this study. (XLS 2 MB)


Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


About this article

Cite this article

Bolotin, E., Chellappa, K., Hwang-Verslues, W. et al. Nuclear Receptor HNF4α Binding Sequences are Widespread in Alu Repeats. BMC Genomics 12, 560 (2011).


