Imprinted genes show unique patterns of sequence conservation
© Hutter et al; licensee BioMed Central Ltd. 2010
Received: 13 May 2010
Accepted: 22 November 2010
Published: 22 November 2010
Genomic imprinting is an evolutionarily conserved mechanism of epigenetic gene regulation in placental mammals that results in silencing of one of the parental alleles. In order to decipher interactions between allele-specific DNA methylation of imprinted genes and evolutionary conservation, we performed a genome-wide comparative investigation of genomic sequences and highly conserved elements of imprinted genes in human and mouse.
Evolutionarily conserved elements in imprinted regions differ from those associated with autosomal genes in various ways. Whereas for maternally expressed genes strong divergence of protein-encoding sequences is most prominent, paternally expressed genes exhibit substantial conservation of coding and noncoding sequences. Conserved elements in imprinted regions are marked by enrichment of CpG dinucleotides, and their low (TpG+CpA)/(2·CpG) ratios indicate reduced CpG deamination. Interestingly, paternally and maternally expressed genes can be distinguished by differences in G+C and CpG contents that might be associated with unusual epigenetic features. Noncoding conserved elements of paternally expressed genes in particular are exceptionally G+C and CpG rich. In addition, we confirmed a frequent occurrence of intronic CpG islands and observed a decelerated degeneration of ancient LINE-1 repeats. We also found a moderate enrichment of YY1 and CTCF binding sites in imprinted regions and identified several short sequence motifs in highly conserved elements that might act as additional regulatory elements.
We discovered several novel conserved DNA features that might be related to allele-specific DNA methylation. Our results hint at reduced CpG deamination rates in imprinted regions, a reduction that mostly affects noncoding conserved elements of paternally expressed genes. Pronounced differences between maternally and paternally expressed genes imply specific modes of evolution as a result of differences in epigenetic features and a special response to selective pressure. In addition, our data support the potential role of intronic CpG islands as epigenetic key regulatory elements and suggest that evolutionarily conserved LINE-1 elements fulfill regulatory functions in imprinted regions.
Imprinted genes are monoallelically expressed in a parent-of-origin way, i.e. one of the two alleles is silenced depending on its parental origin. They are often found in clusters around differentially methylated regions (DMRs) that are characterized by hypermethylated DNA on one chromosome but hypomethylated DNA on the other [1, 2]. The specific DNA methylation patterns are established during germ cell development and maintained after fertilization [3–5]. In human and mouse, a steadily growing number of approximately 100 imprinted genes have been identified to date [6, 7]. It is estimated that a few hundred genes may be subject to imprinting [8–10].
In order to decipher the particular epigenetic properties that distinguish imprinted genes from the majority of genes that are biallelically expressed, their DNA sequences have been intensely analyzed [11–16]. The major aim of such studies was to identify DNA sequence features that support the establishment and maintenance of allele-specific modifications. One of the most immediate findings was that repetitive elements show a particular behavior in imprinted regions: Short interspersed transposable elements (SINEs) are reduced in the vicinity of human and mouse imprinted genes whereas long ones (LINEs, especially of the L1 subfamily), long terminal repeats, simple repeats, and low complexity regions as well as tandem repeats occur more frequently. In combination with other sequence features, the distinct distribution of repetitive elements has subsequently been used to predict putative imprinted genes in the mouse and human genomes [9, 10].
Imprinted gene expression in mammalian species is strongly conserved in the sense that the orthologs of most imprinted genes are also monoallelically expressed in other species. For this reason, one might expect that DMRs, which represent the key regulatory elements in imprinted regions, also exhibit strong conservation of their DNA sequences. Interestingly, this is not the case. Instead, a common conserved feature of functionally orthologous DMRs is the presence of tandem repeats that can be composed of highly divergent motifs in the individual species [17, 18]. Indeed, detailed analyses revealed that CpG islands associated with imprinted genes contain tandem repeats more frequently than the CpG islands of randomly selected genes. Thus, the presence of tandem repeats, rather than conservation of the DNA sequence, appears to be a useful indicator for identifying imprinting centers. Nevertheless, highly conserved elements outside of genes or CpG islands have been identified in imprinted regions [17, 19], and some of these elements have been shown to act as additional regulatory elements such as tissue-specific enhancers.
Among transcription factors and chromatin organizers that bind to specific DNA motifs, CTCF and Yin-Yang 1 (YY1) appear to play prominent roles in genomic imprinting. YY1 has been suggested to recruit histone H3K27 tri-methylase to the repressed allele of imprinted genes. In line with this suggestion, conserved tandem repeats in the DMRs of Peg3 and at the Gnas locus contain YY1 binding sites, and the Prader-Willi/Angelman Syndrome region is also associated with YY1 binding sites. Furthermore, YY1 interacts with CTCF, a methylation-sensitive transcription factor that was shown to inhibit the interaction of the Igf2 promoter with the enhancers downstream of H19 [25, 26] by formation of chromatin loops. In addition, CTCF binding sites have been identified at several other imprinted loci [28–30].
Although detailed studies have addressed CpG islands and repetitive elements, little attention has been paid to the general issue of DNA sequence conservation in imprinted regions, especially of noncoding sequences. This is surprising, as monoallelic silencing of imprinted genes is a conserved mechanism of gene regulation, suggesting that many of their regulatory elements might be tightly conserved. Moreover, in a recent publication we showed that the monoallelic expression of imprinted genes is associated with unusual conservation patterns of protein-coding sequences, indicating that these genes differ from biallelically expressed genes in their reaction to natural selection.
The function of a gene is determined by the encoded protein or noncoding RNA sequence and its temporal or tissue-specific expression pattern, which is under the influence of various regulatory elements such as promoters, enhancers or silencers that reside in noncoding regions. Therefore, differences in response to natural selection between imprinted and non-imprinted genes might result in different conservation patterns of noncoding sequences. In addition, germline-specific DNA methylation and histone modifications may influence mutation rates and DNA repair efficiency. Taken together, these factors may result in specific patterns of sequence conservation of regulatory elements in noncoding regions of imprinted genes compared to biallelically expressed genes. Uncovering such differences is one of the important research questions of this study.
Addressing the issue of sequence conservation at imprinted loci, we have compared genomic sequences and highly conserved elements of imprinted genes to those of all autosomal genes in the human and mouse genomes. We found that they show differences in terms of length, conservation, G+C content, and CpG content. Furthermore, imprinted genes seem to be less affected by CpG deamination than other genes, indicating that differential methylation may correspond to either a relative hypomethylation or strong purifying selection at CpG positions. Enrichment of intronic CpG islands and ancient repetitive elements, particularly LINE-1 (L1), indicates that these elements constitute important functional elements of imprinting. Conserved intergenic and intronic regions are enriched in CpG-rich motifs, arguing for an open chromatin structure and possible functions as promoters of antisense or alternative transcripts. In contrast, sequence features of promoter regions suggest that the transcriptional regulation of imprinted genes on the active allele is in general similar to that of biallelically expressed genes.
In order to get a comprehensive picture of DNA sequence properties in imprinted regions, we compared a set of 58 protein-coding imprinted genes to all 17,916 protein-coding autosomal human genes from the UCSC Genome Browser RefSeq genes track. The human imprinted group consists of genes that are orthologous in human and mouse and for which imprinting has been reported in the literature in at least one of the two species. Applying the same procedure for the mouse yielded 18,772 genes on autosomes. The imprinted set in mouse excludes five orthologs that are not annotated as RefSeq genes; additionally, parental expression patterns are different for some orthologs. Information about the imprinted genes and their allele-specific expression is given in additional file 1.
Table 1: General sequence properties of human genes (rows: G+C content of genes; CpGobs/CpGexp of genes; gene length (bp); intron length (bp); length of intergenic regions (bp); coverage of introns with purely intronic PCSs; purely intronic PCSs per 10 kb of intron per gene; coverage of intergenic regions with PCSs; intergenic PCSs per 10 kb per gene)
In human, intergenic regions assigned to imprinted genes (median 50 kb) are longer than those of autosomal genes overall (Wilcoxon test, p < 0.01); in mouse, this size difference is only a trend (p < 0.05). As a consequence of longer introns and intergenic spaces, there is an increased statistical chance of encountering genomic features such as repetitive elements, CpG islands, or conserved elements in the vicinity of imprinted genes.
Sequence conservation in protein-coding regions gives clues about structural and functional conservation of the encoded proteins. In contrast, conserved DNA segments in noncoding sequences may serve as indicators for evolutionarily conserved regulatory elements. Hence, such elements, which can be up to several hundred base pairs long, are interesting subjects for investigations on relationships between sequence conservation and epigenetic regulation of imprinted genes. Addressing the conservation of DNA sequences on a genome-wide scale, we investigated the phastCons28wayPlacMammal most conserved sequences (PCSs) from the UCSC Genome Browser. These DNA elements are conserved among 18 eutherian mammals and have been identified through multiple sequence alignments of vertebrate genomes. After mapping 1,271,956 PCSs of at least 20 bp length onto autosomal human genes, 3969 PCSs were assigned to the imprinted group. Of these, 2102 belong to the 28 maternally expressed, and 1867 to the 30 paternally expressed genes, respectively. In order to verify that the results obtained are not biased by the properties of the human genome, we repeated the analyses for the mouse. From the phastCons30wayPlacMammal track we extracted 1,268,568 highly conserved elements of at least 20 bp, of which 3502 reside in the vicinity of the murine imprinted genes. In the following, we refer in most cases only to the human since we found essentially the same patterns in the mouse (Additional file 2).
Table 2: Features of different PCS classes in human (rows: number of PCSs; PCSs with ≥ 1 CpG; CpGobs/CpGexp of PCSs with ≥ 1 CpG; (TpG+CpA)/(2·CpG) ratio of PCSs with ≥ 1 CpG; overlapping with CpG islands)
We previously introduced an additional estimate of CpG deamination, the (TpG+CpA)/(2·CpG) ratio, which can be regarded as an indicator of CpG to TpG transition rates. High (TpG+CpA)/(2·CpG) values hint at an accumulation of the potential deamination products of methylated cytosines, whereas low values indicate maintenance of CpGs, which might result from reduced methylation levels. In contrast to the CpGobs/CpGexp ratio, the calculation of (TpG+CpA)/(2·CpG) is independent of the G+C content. Regarding the CpG-containing PCSs in imprinted regions, the median (TpG+CpA)/(2·CpG) ratio is lower than on the autosomal level (Wilcoxon test, p < 0.001). This effect is mostly caused by lower (TpG+CpA)/(2·CpG) ratios in PCSs in intronic and intergenic regions. Detailed numbers are given in Table 2.
Enrichment of CpG dinucleotides is typical for CpG islands, which are believed to be epigenetic key regulatory elements. For this study, we used the UCSC annotations of CpG islands that are close to the original criteria established by Gardiner-Garden and Frommer but are based on a higher CpG content and exclude repetitive elements. Strengthening our previous findings of enrichment of intronic CpG islands in imprinted genes, we found that in human, 15 out of 57 imprinted genes (29.82%) and in mouse, 11 out of 53 (20.75%) possess at least one intronic CpG island that can be regarded as a potential promoter for antisense transcripts. This is significantly more than the 8.50% and 3.77% for autosomal human and mouse genes, respectively (χ2 test, p < 0.001).
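The three sequence measures used throughout this section can be illustrated with a minimal Python sketch. The (TpG+CpA)/(2·CpG) ratio follows the formula given in the text. The CpGobs/CpGexp formula shown here (CpG count times length, divided by the product of C and G counts) is the commonly used Gardiner-Garden and Frommer form and is an assumption, since the exact normalization is not spelled out in this section. The function name `sequence_stats` is hypothetical.

```python
def sequence_stats(seq):
    """Return G+C content, CpG observed/expected, and (TpG+CpA)/(2*CpG).

    Assumption: CpGobs/CpGexp = (#CpG * N) / (#C * #G); the article does
    not state its exact normalization in this section.
    """
    s = seq.upper()
    n = len(s)
    c = s.count("C")
    g = s.count("G")
    # str.count is non-overlapping, which is safe here because a
    # dinucleotide made of two different bases cannot overlap itself.
    cpg = s.count("CG")
    tpg = s.count("TG")
    cpa = s.count("CA")

    gc_content = (c + g) / n if n else None
    cpg_obs_exp = (cpg * n) / (c * g) if c and g else None
    # Undefined for sequences without any CpG, which is why the article
    # restricts this ratio to PCSs with at least one CpG.
    deamination_ratio = (tpg + cpa) / (2 * cpg) if cpg else None

    return {
        "gc_content": gc_content,
        "cpg_obs_exp": cpg_obs_exp,
        "deamination_ratio": deamination_ratio,
    }


stats = sequence_stats("ACGTTGCACG")
print(stats)
```

A ratio near 1 means that the combined TpG and CpA counts equal twice the CpG count; larger values are what one would expect after more CpGs have been converted into their deamination products.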
Addressing the conservation of CpG islands, we observed that eight percent of the PCSs in human imprinted regions overlap with CpG islands whereas the autosomal ratio is only four percent (χ2 test, p < 0.001). In the mouse, the values are six and three percent, respectively (p < 0.001). For both species, the enrichment is most prominent for the group of intronic PCSs. However, CpG islands in imprinted regions do not exhibit special levels of sequence conservation: 66% of the 137 human and 84% of the 64 murine CpG islands overlap with PCSs, which is similar to the autosomal rate of 68% and 86%, respectively (χ2 test, p > 0.8). The percentage by which CpG islands are covered by PCSs as well as their conservation score are virtually identical for all groups (Wilcoxon test, p > 0.8). With a median of 0.69 in human and 0.65 in mouse, the (TpG+CpA)/(2·CpG) ratio of imprinted CpG islands is increased in comparison to the ratio in all autosomal CpG islands (median 0.60 for human, 0.57 for mouse; Wilcoxon test, p < 0.005). In contrast, the CpGobs/CpGexp ratio is not significantly different (p > 0.05). This discrepancy might be due to the fact that CpG islands have to exceed a certain CpGobs/CpGexp ratio threshold by definition. In summary, the CpG richness of PCSs does not coincide with stronger conservation or elevated CpG contents of CpG islands.
Interestingly, especially PCSs that overlap with CpG islands associated with paternally expressed genes have a lower G+C content and lower CpGobs/CpGexp ratio and a higher (TpG+CpA)/(2·CpG) ratio. For maternally expressed genes, we observed opposite patterns, which are however not statistically significant (Table 2). This observation suggests that maternally and paternally expressed genes may differ in their epigenetic marks.
The 130 PCSs of imprinted genes that overlap with L1 elements do not show distinctive features in terms of G+C and CpG content. Nevertheless, when the entire sequences of intergenic L1 elements were investigated, for the imprinted set we observed an elevation of their CpGobs/CpGexp ratio (median 0.14 vs. 0.13; Wilcoxon test, p < 0.0002) and their (TpG+CpA)/(2·CpG) ratio is significantly reduced (median 12.00 vs. 12.70; p < 0.0003), indicating a rather mild loss of CpGs. This is of particular interest when regarding their age distribution: 81% of these L1 elements belong to the ancient L1 M subgroup whose origin predates the mammalian radiation whereas in autosomal intergenic regions, only 76% of the L1 elements belong to the L1 M subgroup (χ2 test, p < 0.001).
Corresponding to the previously reported depletion of SINE elements [11, 16], PCSs overlapping with SINEs are reduced in murine imprinted regions (χ2 test, p < 0.001) but not in human (p > 0.1). PCSs that overlap with other types of repetitive elements do not show significant differences.
The coding parts of exons are of similar length in imprinted and autosomal genes (Wilcoxon test, p > 0.8). Interestingly, those of maternally expressed genes (median 131 bp) tend to be longer (p < 0.02) and those of paternally expressed ones (median 111 bp) are shorter (p < 0.007) than those of the autosomes (median 125 bp). Thus, shorter exons are not responsible for a decreased length of PCSs in maternally expressed genes. The proportions by which PCSs overlap with coding exons are even higher for imprinted genes compared to the rate for all protein-coding human genes (Wilcoxon test, p < 0.0001).
In order to differentiate between the contribution of protein-coding sequences and adjacent intronic parts to PCSs, we separately investigated the subsets of PCSs that are completely located in coding exons. They comprise 51% of those in the imprinted group and 41% of the autosomal ones (χ2 test, p < 0.001). Here, the weak conservation of PCSs in all imprinted genes and in maternally expressed genes was less significant (Wilcoxon test, p < 0.02) and the lengths became similar (p > 0.05). In contrast, PCSs that only partially overlap with coding exons are significantly shorter and less conserved, especially in maternally expressed genes (p < 0.0002). Together with the increased exon overlap rate, this implies that intronic sequences near exon boundaries contribute substantially to the differences between PCSs in coding exons of imprinted genes and those of biallelically expressed genes.
Conserved elements in the untranslated regions (UTRs) of mature mRNAs might act as regulatory elements at the pre- and post-transcriptional level. Such elements may be important for RNA stability. Interestingly, the UTRs of imprinted genes seem to be only marginally conserved: Among 6537 PCSs in UTRs, only five belong to imprinted genes (χ2 test, p < 0.001). This low number makes a detailed statistical analysis of these PCSs impossible.
Outside of transcribed regions, conserved elements might influence the promoter activities of nearby genes. After normalization by sequence length, we observed a slightly reduced coverage with PCSs in intergenic regions (Table 1, Additional file 2), but the differences did not reach statistical significance (p > 0.05). Here, all groups showed a highly similar pattern of decreasing PCS content with increasing distance from the next gene (data not shown). In general, gene distance is uncorrelated with conservation score or length of the PCSs (Pearson's r < 0.06) and PCSs in different distance windows do not show consistent differences.
Interestingly, intergenic PCSs assigned to paternally and maternally expressed genes, respectively, differ from each other in terms of their sequence features: The latter are shorter and have a lower conservation score and G+C content, and a higher CpGobs/CpGexp ratio (Table 2).
Regulatory elements outside of promoter regions are assumed to reside only rarely in exons or repetitive elements. Hence, conserved elements outside of exons, repetitive elements, or CpG island promoters might function as regulatory enhancers or silencers. In order to address differences in conservation of DNA sequences and in G+C and CpG contents of such elements, we defined a separate class of unique PCSs that do not overlap with exons, repetitive elements, or CpG islands. These unique PCS elements also show distinguishing features for imprinted genes (Table 2). Elevated G+C content and presence of at least one CpG are characteristic for PCSs assigned to paternally, but not maternally expressed genes. Unique PCSs of both maternally and paternally expressed genes are shorter and possess decreased conservation scores in comparison to unique PCSs associated with autosomal genes.
Promoters contain transcription factor binding sites that directly mediate gene expression. Detailed analyses of transcription factor binding sites in the promoter regions of imprinted genes have been reported elsewhere. Therefore, we focused on general sequence patterns. In the promoter region, defined as the sequence from -1000 to the most upstream transcriptional start site, 67% of the imprinted genes and 61% of the autosomal ones have at least one PCS (χ2 test, p > 0.4). Also in terms of general sequence features such as G+C content, CpGobs/CpGexp and (TpG+CpA)/(2·CpG) ratios, overlap with CpG islands, and conservation scores of PCSs, promoters of imprinted genes are highly similar to those of all autosomal genes (data not shown).
We next aimed at identifying short sequence motifs that are overrepresented in promoters of imprinted genes compared to both the genomic background and promoters of autosomal genes. Using the program K-Factor, we detected two 6 bp motifs with a significant enrichment (K-Factor score ≥ 3.5) in the regions 1000 bp upstream of the transcriptional start site in human imprinted genes (tgcgta and gcgtat) and seven different ones in mouse imprinted genes (atagcg, atcgca, cgtacg, ctacga, tgcgtg, tgtcga, ttggcg). All of these motifs contain a CpG dinucleotide, indicating their association with CpG islands. Furthermore, the occurrence of TpG hints at possible effects of deamination. When scanning the motifs with the TransFac tool Match, we found that two murine motifs correspond well to known transcription factor binding sites, namely CCAAT box (ttggcg) and AhR/Arnt (tgcgtg).
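The general idea of scoring 6-mers for overrepresentation in a foreground set relative to a background set can be sketched as follows. This is not the K-Factor algorithm, whose scoring scheme is not described in this section; it is a plain frequency ratio with a pseudocount, chosen here only for illustration. The function names and the pseudocount value are assumptions.

```python
from collections import Counter


def kmer_counts(sequences, k=6):
    """Count all k-mers (sliding window) that consist only of a, c, g, t."""
    counts = Counter()
    for seq in sequences:
        s = seq.lower()
        for i in range(len(s) - k + 1):
            kmer = s[i:i + k]
            if set(kmer) <= set("acgt"):  # skip windows containing N etc.
                counts[kmer] += 1
    return counts


def enrichment_ratios(foreground, background, k=6, pseudocount=1.0):
    """Ratio of smoothed k-mer frequencies, foreground over background.

    Illustrative stand-in for a motif enrichment score; not K-Factor.
    """
    fg = kmer_counts(foreground, k)
    bg = kmer_counts(background, k)
    fg_total = sum(fg.values())
    bg_total = sum(bg.values())
    n_possible = 4 ** k  # number of distinct k-mers, used for smoothing
    ratios = {}
    for kmer, count in fg.items():
        f = (count + pseudocount) / (fg_total + pseudocount * n_possible)
        b = (bg[kmer] + pseudocount) / (bg_total + pseudocount * n_possible)
        ratios[kmer] = f / b
    return ratios


# Toy input: a made-up "promoter" and a made-up background sequence.
ratios = enrichment_ratios(["tgcgtatgcgta"], ["aaaaaaaaaaaa"])
top = sorted(ratios.items(), key=lambda kv: kv[1], reverse=True)[:3]
print(top)
```

In a real analysis the foreground would be the 1000 bp upstream regions of the imprinted genes and the background either genomic sequence or the promoters of autosomal genes, matching the two comparisons named in the text.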
Table: 6-mers enriched in intronic PCSs of imprinted genes (columns: 6-mer; score against genomic background; score against autosomal PCSs; the two score columns appear twice; matching transcription factor(s)). Motifs listed: cgccgc, gccgtc, gcgccg, gcgtcg
As CTCF and YY1 are supposed to act as regulators of imprinted genes [22–26], we analyzed the association of imprinted genes with potential binding sites for these factors in more detail. Focusing on a set of CTCF binding sites that were identified in an unbiased genome-wide analysis, we found CTCF binding sites in the introns of 20 imprinted genes (34.48%), which is a slight enrichment compared to 21.65% of the autosomal genes (χ2 test, p < 0.05). With regard to intergenic regions, 17 imprinted genes (29.31%) and 4734 autosomal genes (26.42%) have a nearby CTCF binding site (p > 0.8). In total, CTCF binding sites are present within or in the vicinity of 55.17% of the human imprinted genes and 40.96% of the autosomal ones (p < 0.05). When requiring these sites to overlap with PCSs, the numbers drop considerably: Only 16 imprinted and 4290 autosomal genes are associated with a conserved CTCF binding site (27.59% and 23.95%, respectively; χ2 test, p > 0.6). Hence, CTCF binding sites are apparently not part of highly conserved regulatory modules.
Predicted YY1 binding sites that are conserved between human, mouse, and rat are found in introns of 16 human imprinted genes, including both previously reported genes and new ones, and 3347 autosomal genes (27.59% vs. 18.68%, p > 0.1). In intergenic regions, 20 imprinted genes (34.48%) possess a YY1 binding site compared to the autosomes with 23.58% (p > 0.05). If all locations are taken into account, the ratio increases to 53.45% for imprinted and 35.52% for autosomal genes (p < 0.01). Since CTCF and YY1 interact physically, a combined occurrence of binding sites for both proteins might be particularly meaningful. This is the case for 34.48% of the human imprinted genes as opposed to 19.66% on autosomes (p < 0.01). However, the unchanged p value indicates that the combination of both binding sites did not result in an increased enrichment. Hence, the co-occurrence of both binding sites is apparently not a prominent feature of imprinted gene regulation.
In this study, we have identified highly conserved DNA elements in imprinted genes and compared them to all autosomal genes in the human and mouse genomes. We observed some characteristic features that appear to be related to their allele-specific DNA methylation. Analyses of general sequence features confirm previous data such as longer introns, enrichment of intronic CpG islands, depletion of SINE repeats, and enrichment of LINE repeats [11–16, 34, 42]. Moreover, imprinted genes are more distant from their neighboring genes. They differ in conservation of protein-encoding and noncoding sequences from autosomal genes. In addition, short sequences that are highly conserved in mammals (PCSs) show differences in the accumulation of CpG mutations that relate sequence conservation to epigenetic features such as DNA methylation. Lastly, paternally and maternally expressed genes can be distinguished by their sequence conservation patterns in coding and noncoding sequences. It should be noted that features such as G+C and CpG content and repetitive elements correlate or anticorrelate with each other. For example, the LINE-1 content is higher in G+C poor sequences than in G+C rich sequences. Hence, an interesting topic of future research might be an in-depth analysis of interactions between different sequence features in imprinted regions.
PCSs in imprinted regions not only show elevated G+C and CpG contents and increased overlap with CpG islands but also a reduced (TpG+CpA)/(2·CpG) ratio, which can be regarded as an indicator of low C to T transition rates. This effect is, however, not associated with a stronger conservation of CpG islands. Instead, CpG islands of imprinted genes are characterized by elevated (TpG+CpA)/(2·CpG) ratios. This suggests that their methylation levels in the germline and subsequent deamination rates might be higher than those of CpG islands of normal autosomal genes, which are usually unmethylated. In contrast, high G+C and CpG contents outside of CpG islands might result from hypomethylation compared to autosomal genes. Such a scenario is reminiscent of observations for the inactive X chromosome in females, which is hypermethylated only at CpG islands but hypomethylated in regions outside of these regulatory elements. Therefore, our observations may indicate allele-specific or germline-specific DNA methylation outside of the known DMRs of imprinted genes. An alternative explanation for the elevated CpG content might be the need for a certain CpG density in these elements that allows establishment and maintenance of methylation marks.
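The gene-level comparisons in this section are χ2 tests on 2×2 tables (genes with or without a feature, imprinted versus autosomal). A minimal sketch of such a test in plain Python is given below. It assumes a Pearson χ2 statistic without continuity correction and treats the two gene sets as disjoint groups; the article does not state either detail, so the resulting p values need not match the reported ones. The function name `chi_square_2x2` is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test for the 2x2 table [[a, b], [c, d]].

    Assumption: no continuity correction. With 1 degree of freedom the
    upper-tail probability of the chi-square distribution equals
    erfc(sqrt(chi2 / 2)).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("a row or column of the table sums to zero")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Intergenic CTCF sites, counts taken from the text:
# 17 of 58 imprinted genes and 4734 of 17,916 autosomal genes.
imprinted_with, imprinted_total = 17, 58
autosomal_with, autosomal_total = 4734, 17916

chi2, p = chi_square_2x2(
    imprinted_with, imprinted_total - imprinted_with,
    autosomal_with, autosomal_total - autosomal_with,
)
print(f"{100 * imprinted_with / imprinted_total:.2f}% vs "
      f"{100 * autosomal_with / autosomal_total:.2f}%")
print(f"chi2 = {chi2:.3f}, p = {p:.3f}")
```

With only 58 imprinted genes, several expected cell counts are small, so an exact test could be a reasonable alternative; that choice is outside what the text specifies.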
Our results are stable towards the inclusion of additional imprinted genes: After addition of five new imprinted genes, the p values were found to be stable or even smaller for most differences, except for those of the lengths of intergenic and unique PCSs. Additional analyses based on randomly sampled gene sets confirmed that significant differences between imprinted and autosomal genes are indicated by p values below 0.005 whereas trends with higher p values have to be interpreted with caution (data not shown).
In a previous study, we concluded that the increased divergence of maternally expressed genes occurred most likely due to reduced selective pressure . Here, we show that paternally expressed genes, which are in terms of sequence conservation highly similar to all autosomal genes, have a slightly higher coverage with intronic PCSs. As this effect is most pronounced in exon-near regions, the strict conservation of paternally expressed protein-encoding sequences might be supported by conservation of splice signals. Most interestingly, some classes of PCSs exhibit pronounced differences in terms of G+C and CpG content between maternally and paternally expressed genes. The high CpG content of unique PCSs of paternally expressed genes, however, is not associated with elevated conservation scores. Hence, a conventional conservation of regulatory elements that would coincide with the strong conservation on protein level is probably not the driving force. Lastly, in the parental germlines, paternally and maternally expressed genes might acquire different, temporal methylation marks outside of known DMRs. Such transient marks may later be removed by epigenetic reprogramming processes. Due to different mutation rates and repair efficiencies, differential methylation might result in differences in the evolutionary retention of CpGs in paternally and maternally expressed genes.
Conserved elements in imprinted regions overlap with repetitive elements more frequently than those of all autosomal genes. Taking into account that most SINEs are either primate or rodent specific, it is not surprising that the overlap of PCSs with SINEs in imprinted regions is similarly low as that of the whole autosomal genome. In contrast, more PCSs overlap with L1 elements and these PCSs show elevated conservation scores. Interestingly, the proportion of L1 elements that belong to older classes of repetitive elements is elevated at imprinted loci. However, a rather low number of PCSs overlap with L1 elements. Therefore it is not clear if the elevated level of ancient L1 elements is solely due to stronger conservation, or if increased integration rates in early mammals might have been its major cause as suggested by other studies .
We also evaluated putative transcription factor binding sites in imprinted regions. Imprinted genes exhibit a pronounced divergence in terms of their tissue-specific expression patterns . Thus, it is not surprising that among the overrepresented 6mer motifs in promoters we found only one motif, tgcgtg, which is identical to the consensus sequence of a transcription factor binding site, namely the aryl hydrocarbon receptor nuclear translocator (ARNT), which is also known as hypoxia-inducible factor 1-beta (HIF1-β). A causative linkage between placental hypoxia, pre-eclampsia and misregulation of imprinted genes has been suggested [46, 47]. Hence, ARNT binding sites might be an indicator for placental or embryonic key function of a number of imprinted genes. The fact that the pattern is only overrepresented in the mouse may be related to different placenta morphologies in human and mouse . Our analyses confirm an enrichment of putative conserved YY1 binding sites in imprinted regions whereas experimentally validated CTCF binding sites are only moderately enriched and rarely contain PCSs. By analyzing DNA sequences of PCSs, we identified some 6mer motifs that are overrepresented in both human and mouse imprinted genes and characterized by their high CpG content. Interestingly, especially in intronic PCSs there are C+G rich motifs with similarity to GC boxes, and motifs that are similar to the binding sites of activating transcription factors (ATFs). In line with the enrichment of intronic CpG islands, such sequence motifs may indicate promoter elements for alternative or antisense transcripts that are frequent features of imprinted genes .
In summary, we discovered pronounced differences in the conservation patterns of imprinted and autosomal genes. Changes in CpG densities and evidence for reduced CpG deamination suggest that imprinted genes differ in their DNA methylation patterns from biallelically expressed, not only at previously identified DMRs but also in coding regions, CpG islands and repetitive elements.
From the Otago Catalogue of Imprinted Genes  and the literature we selected 58 genes for which imprinting effects have been observed at least in one species and for which orthologous sequences of human and mouse could be localized with the UCSC Genome Browser  for human hg18 (NCBI build 36.1, March 2006 assembly), and mouse mm9 (NCBI build 37.1, July 2007 assembly). These genes were compared to all RefSeq genes that are located on autosomes. If there were several transcripts for one gene, we took the most 5' annotated transcriptional start site and the most 3' annotated transcriptional termination site to construct the longest possible transcript. Similarly, splice variants and overlapping exons were merged in a way that the largest possible coding regions could be constructed. The genomic sequence that was assigned to a gene contained the transcribed sequence and intergenic regions upstream and downstream of the transcription unit. For determining the intergenic region, the DNA sequence between two genes was cut into two halves, each half was assigned to the nearest gene.
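The two construction steps described above, building the longest possible transcript and splitting each intergenic gap halfway between neighboring genes, can be sketched as follows. Coordinates are assumed to be 0-based and half-open, genes are assumed to lie on a single chromosome, and the handling of chromosome ends, overlapping genes and strand is a simplification that the text does not specify. All function names are hypothetical.

```python
def longest_transcript(transcripts):
    """Merge (start, end) transcripts of one gene into the widest span."""
    starts = [start for start, _ in transcripts]
    ends = [end for _, end in transcripts]
    return min(starts), max(ends)


def assign_territories(genes, chrom_length):
    """Assign each gene its transcribed span plus half of each flanking gap.

    genes: list of (name, start, end) on one chromosome.
    Assumptions (not from the article): the first and last gene extend to
    the chromosome ends, and overlapping neighbors get no intergenic
    sequence on the overlapping side. Nested genes are not handled.
    """
    ordered = sorted(genes, key=lambda gene: (gene[1], gene[2]))
    territories = {}
    for i, (name, start, end) in enumerate(ordered):
        if i == 0:
            left = 0
        else:
            prev_end = ordered[i - 1][2]
            # Midpoint of the gap to the upstream neighbor, if a gap exists.
            left = start if prev_end >= start else (prev_end + start) // 2
        if i == len(ordered) - 1:
            right = chrom_length
        else:
            next_start = ordered[i + 1][1]
            right = end if next_start <= end else (end + next_start) // 2
        territories[name] = (left, right)
    return territories


span = longest_transcript([(100, 200), (150, 300)])
territories = assign_territories(
    [("geneA", 100, 200), ("geneB", 400, 500)], chrom_length=1000
)
print(span, territories)
```

Splitting at the midpoint means every intergenic base belongs to exactly one gene, which is what allows intergenic PCSs to be counted per gene without double counting.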
As a set of sequences with high conservation in eutherian mammals, we used the UCSC phastCons28wayPlacMammal most conserved sequences (PCSs). Such highly conserved regions were originally identified from a genome-wide multiple alignment of 28 vertebrate species by the Phast program and afterwards projected onto a reference genome. The PCSs analyzed here are a subset of these regions showing conservation in 18 eutherian mammals. We assigned them to the longest possible RefSeq transcripts based on the human genome March 2006 assembly (hg18). In some cases most of a conserved region is absent in human, leaving it anchored to one or a few bases followed by a gap in the human genome relative to the other genomes. We therefore excluded elements that comprise less than 20 bp in the human genome, thereby reducing the number of PCSs by one third to 1,271,956.
The phastCons30wayPlacMammal most conserved sequences based on the mm9 mouse assembly were analyzed likewise. G+C content, CpGobs/CpGexp as a measure of normalized CpG content, and the (TpG+CpA)/(2·CpG) ratio were calculated for the corresponding human or mouse sequences. A PCS that resides between the transcriptional start site and the transcriptional termination site of the respective reference gene was termed intronic if it did not overlap with an exon and coding if it overlapped by at least one base pair with a coding exon. Intergenic PCSs are located between genes and were assigned to the nearest gene.
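As an illustration of the three sequence measures named above, the following sketch computes them for a single sequence. The text does not state how CpGobs/CpGexp was computed; the sketch assumes the widely used definition (number of CpGs × sequence length) / (number of Cs × number of Gs). The function name and the choice to ignore non-ACGT characters are likewise assumptions.

```python
def sequence_metrics(seq):
    """Return (G+C fraction, CpG obs/exp, (TpG+CpA)/(2*CpG)) for a DNA sequence.

    Ratios whose denominator would be zero are returned as None.
    """
    s = seq.upper()
    n = sum(s.count(base) for base in "ACGT")  # ambiguous bases such as N are ignored
    c, g = s.count("C"), s.count("G")
    # None of these dinucleotides can overlap itself, so str.count is exact here.
    cpg, tpg, cpa = s.count("CG"), s.count("TG"), s.count("CA")

    gc_fraction = (c + g) / n if n else None
    # Assumed definition of CpGobs/CpGexp: (CpG * N) / (C * G)
    cpg_obs_exp = (cpg * n) / (c * g) if c and g else None
    # TpG and CpA are the deamination products of methylated CpG on either strand.
    deamination_ratio = (tpg + cpa) / (2 * cpg) if cpg else None
    return gc_fraction, cpg_obs_exp, deamination_ratio
```

A low third value means that few TpG and CpA dinucleotides are present relative to CpG, which is how the text reads it as a sign of reduced CpG deamination.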
Using a local installation of the UCSC hg18 and mm9 databases and the bioinformatics tools collection from UCSC, we searched for overlaps between genomic regions, PCSs, and transcription factor binding sites that are conserved between human, mouse, and rat (tfbsConsSites). Additionally, using annotations from UCSC, we identified overlaps with CpG islands and repetitive elements. We also identified overlaps with experimentally validated CTCF binding sites. To be counted as possessing a given feature, a PCS had to overlap with the corresponding annotation by at least 1 bp.
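The 1 bp overlap criterion and the coding/intronic/intergenic classification of PCSs can be sketched as below. Intervals are assumed to be half-open `(start, end)` tuples on the same chromosome. The label `"other"` for a PCS that touches only a noncoding exon, and the treatment of any PCS without overlap to the transcript as intergenic, are assumptions of this sketch; the text does not spell out these cases.

```python
def overlap_bp(a_start, a_end, b_start, b_end):
    """Number of shared base pairs between two half-open intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def has_feature(pcs, annotations, min_overlap=1):
    """True if the PCS overlaps any annotation by at least min_overlap bp."""
    return any(overlap_bp(*pcs, *ann) >= min_overlap for ann in annotations)


def classify_pcs(pcs, transcript, coding_exons, all_exons):
    """Classify one PCS relative to its assigned gene."""
    if overlap_bp(*pcs, *transcript) == 0:
        return "intergenic"
    if has_feature(pcs, coding_exons):
        return "coding"
    if not has_feature(pcs, all_exons):
        return "intronic"
    return "other"  # assumption: overlaps only a noncoding (UTR) exon
```

The same `has_feature` check would apply to CpG islands, repeats, or binding-site annotations once they are loaded as interval lists.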
We performed χ² tests or Fisher's exact tests to assess whether proportions of features (e.g. relative numbers of PCSs) in the imprinted, maternally expressed or paternally expressed group were significantly higher or lower than those in the autosomal group. Wilcoxon tests were applied to test whether the distributions of features (e.g. length of PCSs) differed. Since differing lengths of genomic regions influence the content of PCSs, we divided the number of PCSs by the summed length of the analyzed sequences per gene. We report raw p values. With a Bonferroni correction for multiple testing, a feature would be highly significant if p < 0.005. However, we also consider p values between 0.005 and 0.01 as moderately significant, and we refer to 0.01 < p < 0.05 as indicating a trend.
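To make the proportion test and the significance tiers concrete, here is a small standard-library sketch of a two-sided Fisher's exact test on a 2×2 table together with the labelling scheme given above. It is not the software used in the study, which is not named in the text; the function names are hypothetical, and the treatment of the boundary values 0.005 and 0.01 as "moderately significant" is an assumption.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def significance_label(p):
    """Map a raw p value onto the tiers described in the text."""
    if p < 0.005:
        return "highly significant"
    if p <= 0.01:
        return "moderately significant"
    if p < 0.05:
        return "trend"
    return "not significant"
```

For example, rows could hold the imprinted and autosomal groups and columns the counts of PCSs with and without a given feature.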
To test whether including additional genes in the analysis would substantially change the results, we repeated the analyses shown in Table 1 and Table 2 including five additional imprinted genes (BLCAP, DLGAP2, PRIM2A, TFPI2, and ZNF597). To investigate possible background effects, we compared randomly selected sets of autosomal genes of the same size as the imprinted set (i.e. 58 genes) with the set of all autosomal genes. For each feature shown in Table 1 and Table 2, 100 such comparisons were performed. We then counted how often the randomly selected gene sets reached the same level of significance in the Wilcoxon tests as the imprinted genes.
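The background control described above amounts to a simple resampling loop, sketched here under stated assumptions. The callable `is_significant` is a hypothetical stand-in for the Wilcoxon comparison at the chosen significance level; the fixed seed and sampling without replacement are also assumptions, as the text gives neither.

```python
import random


def background_hits(all_values, is_significant, set_size=58, n_draws=100, seed=0):
    """Count how many random gene sets reach significance against all genes.

    all_values     -- list with one feature value per autosomal gene
    is_significant -- hypothetical callable(sample, all_values) -> bool,
                      e.g. a wrapper around a Wilcoxon test and a p threshold
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        sample = rng.sample(all_values, set_size)  # without replacement
        if is_significant(sample, all_values):
            hits += 1
    return hits
```

A hit count close to zero for a given feature suggests that the significance seen for the 58 imprinted genes is unlikely to be an artifact of the small set size.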
To investigate the enrichment of sequence motifs, we used K-Factor with default settings and custom Perl scripts. Sequences of PCSs in intronic or intergenic regions, respectively, were concatenated per gene and separated by six Ns each to prevent artificial sequence combinations. Converting repetitive elements to Ns to exclude potential motifs in repeats did not alter the motifs and only marginally influenced their scores. Possible transcription factor binding sites from the TransFac database were identified with the Match tool using high quality matrices for vertebrate species, the "best selection" profile, matrix similarity = 0.7 and core similarity = 0.75. Since the original 6mers were too short to produce hits, we added five Ns to each end.
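The role of the N spacers can be illustrated with a plain 6mer count. This sketch is not K-Factor and does not reproduce its scoring; it only shows that a spacer of six Ns guarantees that every 6mer window spanning two PCSs contains an N and can be skipped, and how a 6mer is padded before matrix matching. All function names are hypothetical.

```python
from collections import Counter


def concatenate_pcs(seqs, spacer="N" * 6):
    """Join the PCS sequences of one gene, separated by N spacers."""
    return spacer.join(s.upper() for s in seqs)


def count_kmers(seq, k=6):
    """Count k-mers consisting only of A, C, G, T; windows containing N are skipped."""
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def pad_motif(motif, flank=5):
    """Add N flanks to a short motif before searching it against binding-site matrices."""
    return "N" * flank + motif.upper() + "N" * flank
```

Because the spacer is as long as the window, no counted 6mer can combine bases from two different PCSs.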
CpGobs/CpGexp: ratio of observed number of CpGs to expected number of CpGs
DMR: differentially methylated region
LINE: long interspersed transposable element
PCS: phastCons28wayPlacMammal most conserved sequence
SINE: short interspersed transposable element
YY1: transcription factor Yin-Yang 1
We would like to thank Katja Schmitt (Saarland University) for critically reading the manuscript. We highly appreciate the work of numerous sequencing and bioinformatics centers that made the data used in this study publicly available. This work was supported by the Deutsche Forschungsgemeinschaft (PA 750/3-1).
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.