Genomic regions with distinct genomic distance conservation in vertebrate genomes
© Sun et al; licensee BioMed Central Ltd. 2009
Received: 21 October 2008
Accepted: 27 March 2009
Published: 27 March 2009
A number of vertebrate highly conserved elements (HCEs) have been detected, and their genomic interval distances have been reported to be more conserved than those of protein coding genes among mammalian genomes. A characteristic of the human – non-mammalian comparisons is a bimodal distribution of the relative distance difference of conserved consecutive HCE pairs, and it is difficult to attribute such a profile to random assortment. We therefore undertook an analysis of the human genomic regions confined by consecutive HCE pairs common to eight genomes (human, mouse, rat, chicken, frog, zebrafish, tetraodon and fugu).
Among HCE pairs, we found that some consistently preserve a highly conserved interval distance among genomes, while others have relatively low distance conservation. Using a partition method, we detected two groups of inter-HCE regions (IHRs) with distinct distance conservation patterns in vertebrate genomes: IHR1s, which are bordered by HCE pairs with relatively small distance variation, and IHR2s, which have larger distance difference values. Compared to the random background, annotated repeat sequences are significantly less frequent in IHR1s than in IHR2s, which reflects a correlation between repeat sequences and the length expansion of IHRs. Both groups of IHRs are unexpectedly more enriched in human indel (i.e. insertion and deletion) polymorphisms than the random background. The correlation between the percentage of conserved sequence and human IHR length was stronger for IHR1s than for IHR2s. Both groups of IHRs are significantly enriched for CpG islands.
The data suggest that subsets of HCE pairs may undergo different evolutionary paths in light of their genomic distance conservation, and that sets of genomic regions pertaining to HCEs, as well as the regions in which HCEs reside, should be treated as integrated domains.
Comparative sequence analysis has become an essential component for studying genome function. Genome-wide comparison of human and rodent DNA sequences discovered more than 5,000 ultraconserved elements (UCEs) of absolute identity and >100 bp in length, of which 77 percent were located outside annotated exons, and long-range comparison between man and pufferfish also identified high numbers of similar genomic elements [2, 3]. A common characteristic of these highly conserved elements is their strong tendency to occur in clusters along the mammalian chromosomes [1–4], with their relative order as conserved as that of protein coding genes. A relative distance difference (RDD) measure was defined to evaluate the genomic distance change ratio of pairs of UCEs found at adjacent positions in the human, mouse and dog genomes, showing that the genomic distance between such elements is significantly more conserved than the corresponding genomic distance between orthologous protein coding genes. An intriguing observation is that the conservation of the genomic distance between pairs of UCEs also exists in preliminary analyses of evolutionarily more distant vertebrates, and displays a distinct pattern, with a fraction of UCE pairs displaying nearly unaltered genomic distances over long evolutionary distances and a second fraction of UCE pairs with less conserved genomic distances. It is hard to imagine that such a genomic distance profile could arise at random, as it was consistently found in comparisons between several genomes; this suggests that genomic distance conservation may take on different characteristics in different genomic regions.
Genomic sequences have been divided into functional blocks based on various characteristics, e.g. biological roles or sequence conservation. Early research mainly focused on synteny blocks, defined according to conserved DNA sequences or orthologous genes, to study the evolutionary history of rearrangements in entire genomes [6–8]. Concordant higher-order patterns from functional genomic studies suggest that the human genome and other large genomes are organized into higher-order functional domains. For example, genomic regulatory blocks (GRBs), spanned by highly conserved noncoding elements, developmental regulatory target gene(s), and phylogenetically and functionally unrelated "bystander" genes, are conserved between mammals and fishes. The identification and study of these higher-order functional domains within the human genome is of great importance. Our early observation of distinct two-peak distribution profiles in genomic distance conservation suggested that some other aspect of the DNA sequences adjacent to the UCE pairs may likewise differ, and that these sequences might thus be treated as two distinct types of genomic regions or blocks.
Different sets of highly conserved elements (HCEs) vary somewhat with respect to their locations in the human genome: some are found exclusively in non-protein-coding sequence (e.g. CNEs and UCRs) and others contain exons of protein-coding genes (UCEs). Though it has been suggested that exonic UCEs represent a distinct subset, no satisfactory explanations for the extreme degree of sequence conservation of exonic UCEs have been presented. In our previous study, we showed that with respect to distance conservation there were no substantial differences between different data sets (i.e. UCEs, CNEs and UCRs) or between UCE pairs with exonic and non-exonic locations. We have therefore integrated the three datasets [1–3] and undertaken an analysis of the human genomic regions confined by HCE pairs common to eight genomes (human, mouse, rat, chicken, frog, zebrafish, tetraodon and fugu).
In our previous work, we calculated a relative distance difference (RDD; Methods) value to assess whether the genomic distances between consecutive UCEs show less change than those between other adjacent genomic elements (i.e. genes and exons) in the human, mouse and dog genomes. The analysis showed that, in addition to an extreme level of sequence conservation, UCEs also display strong conservation of mutual genomic distances among mammalian species. The conservation of distance between pairs of UCEs is also found between evolutionarily more distant vertebrates, but the distributions of RDD values consistently show distinct two-peak profiles in all mammal – non-mammal comparisons, with one peak close to zero and another at a more negative value. Only a small number of UCEs was used in the previous analysis; a larger data set is therefore warranted to validate these findings.
To facilitate this investigation, we constructed a dataset integrated from three independent works [1–3]. A direct element by element comparison shows that two-thirds of the non-exonic UCEs do not overlap with HCEs from either of the two other data sets [see Additional file 1]. The smallest data set, of ~1,400 conserved non-coding elements (CNEs), had the highest fraction of overlaps (~80%) with the other two data sets, whereas the set of ultraconserved regions (UCRs) has ~50% overlap with the others. We combined these three published data sets [1–3] to form an integrated data set consisting of 7,570 distinct highly conserved elements (HCEs) in the human genome. We used BLASTn with non-stringent parameters, together with criteria for order and genomic distance conservation, to locate all occurrences of the same HCEs in the mouse, rat, chicken, frog, zebrafish, fugu and tetraodon genomes [see details in Methods; Additional file 2]. The resulting number of orthologous HCEs that can be located uniquely in the different genomes is variable: more than 95 percent of human HCEs could be anchored to the rodent genomes, 71 percent to the chicken genome, and around 24 to 30 percent to the fish genomes [see Additional file 3]. In the comparisons with the human genome, more than 99 percent of HCEs were found linked with at least one other HCE in all other genomes, including linkage with quite a number of HCEs in the fish genomes. More than sixty percent of HCEs were found ordered together with at least 5 individual elements [see Additional file 4], which indicates the tendency for HCEs to preserve their order among vertebrate species.
We calculated RDD values between pairs of HCEs and compared them with RDD values for pairs of genes and for exons of these genes. Similar to what has been reported for mammalian comparisons, the absolute relative distance difference (|RDD|) values were significantly lower for HCE pairs than for pairs of genes or exons [see Additional file 5; Wilcoxon unpaired test, p value < 2.2e-16]. The median |RDD| for HCE pairs in the human-chicken comparison was 0.46, which is about half that for gene pairs (0.91) and exons (0.95) [see Additional file 5]. The difference between the distance conservation of HCE pairs and gene pairs is most pronounced for the human – zebrafish comparison, the median |RDD| for HCE pairs being only 32 percent of that for gene pairs. HCE-HCE absolute distance differences are also significantly smaller than exon-exon distance differences (within genes), the latter being only slightly different from the gene-gene relative distance differences.
The RDD distribution profiles were also markedly different for the three different pair comparisons (HCE-HCE, gene-gene, and exon-exon). The RDD distributions for HCE pairs show distinct two-peak profiles, with one peak close to zero and another at a more negative value. RDD values for gene pairs, in contrast, show only one peak, skewed toward more negative values. The distributions of exon RDD values are wider than those for both HCE and gene pairs. The distributions of all three data sets show a peak at relatively low RDD values (-1 to -2) for all four human – chicken/fish comparisons [see Additional file 6]. However, the distribution of HCE RDD values consistently shows an additional, dominant peak around zero, indicating the existence of a subset of HCE-HCE pairs whose distances have been conserved across vertebrate evolution. Even for Fugu and Tetraodon, whose genome sizes are only around 13 percent of that of the human genome, the result indicates that around 30 percent of the analyzed HCE pairs have largely unaltered distances (i.e. |RDD| within ± 0.116~0.409) compared to the human genome [see Additional file 7].
Inter-HCE regions with distinctive distance conservation patterns
Number of HCE pairs with different genomic locations
Distance between intergenic IHRs and their nearest genes (rows: intergenic IHR1s with CpG islands; intergenic IHR2s with CpG islands).
We also identified a few human genomic regions that are spanned by the same type of IHRs, indicating that the distance variation of HCEs in these regions is probably associated [see Additional file 13]. An intriguing observation is that ten IHR1s are clustered in a region close to 1 Mb, and the corresponding eight HCE pairs are all located in intergenic regions.
Enrichment of DNA repeat sequences
Percentage of repeated base pairs within IHRs (columns: percentage of IHRs containing repeats (%); average percentage of repeated base pairs (%)).
We also found that both types of IHRs contain significantly less SINE (4.3% for IHR1s, 11.0% for IHR2s), LINE (2.4% for IHR1s, 13.4% for IHR2s) and LTR (0.6% for IHR1s, 4.7% for IHR2s) sequence than the random backgrounds (Table 3; p value < 0.001); however, both types of IHRs are significantly enriched in low complexity DNA sequences (4.9% for IHR1s, 0.7% for IHR2s) (Table 3; p value < 0.001 for IHR1; p value = 0.016 for IHR2). We also tested the enrichment of long transposon-free regions (TFRs) in IHR1s and IHR2s. TFRs have been reported to be associated with both protein coding genes and UCEs. Of the 188 IHR1s, 60 percent intersect with TFRs (2.6% for the random background), and 52 percent of the 215 IHR2s intersect with TFRs [see Additional file 14; 12% for the random background]. Both groups of IHRs show a significant enrichment of TFRs compared with randomly selected regions, indicating a complex relationship between TFRs and distance conservation.
Unexpected enrichment of indel variation
Since HCEs are highly conserved not only at the sequence level but also in their genomic organization (e.g. order and distance), we suspected that IHRs might not tolerate extensive rearrangements. We therefore asked whether there are any differences in the distribution of human indel (i.e. insertion and deletion) polymorphisms in the IHRs.
Enrichment of human indels within IHRs (columns: average percentage of deleted base pairs (%); average percentage of inserted base pairs (%)).
Conserved sequences within IHRs
A previous observation is that |RDD| and sequence conservation are to some extent positively correlated. We used the datasets of phastCons elements provided by the UCSC online server to test the conservation characteristics within the IHRs. There are presently no phastCons data for Tetraodon and Fugu from the UCSC online service, so these two genomes were excluded from the sequence conservation analysis.
Length of conserved fractions and distance between two consecutive conserved fractions within IHRs (columns: length of conserved fractions (bp); distance between two consecutive conserved fractions (bp); total number of conserved fractions; p value (IHR1, IHR2)).
IHRs and CpG islands
Both groups of IHRs are significantly enriched for CpG islands compared with the corresponding random backgrounds in the human genome: about 10 percent of IHR1s (0.5% for the random background) and 14 percent of IHR2s (2.3% for the random background) were found to contain CpG islands [see Additional file 16, Additional file 17; p value < 0.001]. We further compared the percentage of IHR length covered by CpG islands and observed a difference: on average 45% for IHR1s and 7% for IHR2s. This percentage differs significantly between IHR1s and IHR2s [see Additional file 16; Wilcoxon test, p value < 5.5e-06].
For both IHR1s and IHR2s with CpG islands, the pair-wise genomic loci of HCEs are significantly sparse only in the "intronic-intronic" class [see Additional file 18; Hypergeometric test, p value = 0.0024], which is readily understood given that exonic sequences reside between the HCEs and that promoter elements (i.e. CpG islands) are less likely to be located in exonic regions. We next checked the environment of the intergenic-intergenic IHRs with CpG islands: eleven intergenic IHR1s and fifteen intergenic IHR2s were found to contain CpG islands (Table 2). Eight intergenic IHR1s with CpG islands are more than 8 Kb away from the closest gene. Of the fifteen intergenic IHR2s with CpG islands, only three reside less than 10 Kb away from the nearest gene. A high percentage of intergenic IHRs are more than 10 Kb away from the nearest genes.
HCEs are frequently found in relatively gene poor regions, and their distances are conserved among the mammalian genomes. Our data show that IHRs shared by the six vertebrates are also enriched in gene poor regions of their genomes. CpG islands are generally associated with human promoters, and most promoter-associated CpG islands that have been reported are located within 2 Kb regions around transcription start sites [19, 20]. The enrichment of CpG islands in the IHRs over the random background genomic regions suggests the possible existence of target genes, and the long distance between the IHRs and the nearest gene indicates that putative targets might be located in a wider genomic range, or that the CpG islands residing in the IHRs, along with the two flanking HCEs, could together perform important roles either as regulatory blocks or in other unknown functions.
Our data suggest that subsets of HCE pairs may undergo different evolutionary paths with respect to their genomic distance conservation. We also examined a few features of the functional regions constituted by HCEs and their interior or adjacent sequences, and found the precise spacing of HCEs to be an important aspect of the HCE structures. The highly conserved structural relationship of HCEs among genomes indicates the possibility that HCEs are not independent and that two or more HCEs may function together with adjacent sequences as a combined unit. We are not the first to propose that a portion of a genomic region functions as a united block. Chromosomal segments termed "genomic regulatory blocks (GRBs)" have been annotated in the human genome, formed by conserved relationships between HCEs and their assumed target genes. Higher-order functional architecture also illuminates the functional domain structure of the ENCODE regions.
Early observations of HCEs strongly suggested that they act as vertebrate cis-regulatory elements (cREs) of early developmental genes [2, 21, 22]; however, cREs are not necessarily strongly conserved and have been regarded as more 'evolvable' than coding sequences. Highly complex correlations between HCEs and their putative target genes also call into question the idea that the primary function of HCEs is as cis-regulatory elements. Recently, a function as "counting units" has been suggested for such elements. Overlapping, multiple functions have been suggested by several studies to account for the extreme sequence conservation of HCEs. HCE-rich regions are reported to be associated with histone methylation. Increasing evidence suggests a functional association with chromatin remodeling, accompanying the involvement of HCEs in other functions such as cREs. Most intergenic IHRs are located far away from annotated protein coding genes. Both long-distance and relatively close associations between HCEs and genes have been identified. If the IHRs contain elements for chromatin structure and thus perform epigenetic regulation of gene transcription, this would indicate either a form of long distance regulatory action, or that other functional elements (not protein coding genes) are associated with these IHRs, or that the IHRs are per se functional units independent of target genes.
We detected 188 IHR1s with extremely conserved distances among deeply divergent species. Distance conservation between highly divergent organisms implies an extreme constraint on the evolution of the IHR1 lengths, which strongly suggests that their distances are functionally important. One possible interpretation of the lower conservation of IHR2 lengths is that some functional elements were inserted or deleted in the IHR2s; alternatively, the expanded distance may reflect differing requirements of their potential biological function among genomes.
In this study, the bimodal distribution profiles of RDD values persisted when the integrated data set of HCEs was mapped onto the five non-mammalian genomes. We detected two groups of genomic regions confined by HCE pairs with distinct distance conservation patterns in vertebrate genomes. The data suggest that these IHRs may function as combined units, and that subsets of IHRs with distinct spacing conservation should be treated differently.
Genome sequences were downloaded from the UCSC GoldenPath database for the eight species: human (hg18), mouse (mm7), rat (rn4), chicken (galGal2), frog (xenTro2), zebrafish (danRer3), tetraodon (tetNig1) and fugu (fr1). The UCE and CNE datasets were obtained from the respective authors. The UCR dataset was obtained from http://mordor.cgb.ki.se/GeneReg.net%20Home.html. The TFR (>5 kb) data set was obtained from http://jsm-group.imb.uq.edu.au/tfr/. The collections of annotated genes and the transposon, repeat and CpG island annotation files for the human genome were downloaded from the UCSC GoldenPath database http://hgdownload.cse.ucsc.edu/goldenPath. Collections of pairwise orthologous groups between human and other genomes were downloaded from the InParanoid database.
The three datasets of conserved elements were integrated together. Using the human genome as the reference, we extended physical loci to the most remote start/end positions of those elements that have intersections with each other, and we obtained 7,570 highly conserved elements (HCEs) without overlap.
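The merging step described above can be illustrated with a minimal Python sketch. It assumes each element is a (chromosome, start, end) tuple in human coordinates with inclusive ends, and that two elements intersect when they share at least one base; the function name `merge_elements` and the sample coordinates are hypothetical and are not taken from the original pipeline.

```python
from typing import List, Tuple

# Hypothetical record layout: (chromosome, start, end), ends inclusive.
Element = Tuple[str, int, int]


def merge_elements(elements: List[Element]) -> List[Element]:
    """Merge intersecting elements into non-overlapping HCEs."""
    merged: List[Element] = []
    # Sorting by chromosome and start makes intersecting elements adjacent.
    for chrom, start, end in sorted(elements):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Extend the previous element to the most remote end position.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


# Made-up coordinates standing in for elements pooled from the three sets.
example = [
    ("chr1", 100, 300),
    ("chr1", 250, 400),
    ("chr1", 500, 600),
    ("chr2", 100, 200),
]
merged_example = merge_elements(example)
print(merged_example)
```

Because the start of each merged element is the smallest start in its group and the end is the largest end, the result covers the same bases as the input without overlap.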
Assignment of unique homologous HCE hits
The human HCE sequences were mapped onto the rodent (mouse and rat) and five non-mammalian vertebrate genomes (chicken, frog, zebrafish, fugu and tetraodon) with non-stringent BLASTn parameters (mismatch penalty -1, gap open penalty 1, word size 9, and soft masking). Hits for each HCE with an e-value ≤ 10^-5 were considered to be under constraints of sequence conservation and kept for further analysis.
A number of HCEs have multiple BLASTn hits in the non-mammalian genomes [see Additional file 19]. Determining which hit is potentially the orthologous one is difficult with sequence similarity information alone. The relative order of UCEs along the chromosomes has been found to be nearly identical among mammalian genomes, at a level similar to that of genes, and strong conservation of mutual distances among vertebrate species was also found. Thus, criteria of consecutiveness and distance conservation were added to locate the HCEs uniquely in the non-mammalian genomes [see Additional file 2]. In cases where some HCEs had multiple alignment hits and some had no BLASTn hit in the query genome, two hits were regarded as one pair with respect to the query genome if fewer than two other HCEs were located between the two consecutive HCEs in the non-human genome. RDD values were calculated to measure the conservation of distance between the HCE pairs. The pairs that were unique in the non-mammalian genome were kept and divided into three categories according to their linkage relationship with other HCE pairs or associated orthologous genes. For HCE pairs with multiple BLASTn hits, we treated hits as the corresponding HCEs in the non-mammalian genome provided they were linked with other HCE pairs or orthologous genes. Because HCEs tend to be located in clusters, the linkage condition of HCE pairs is the first screening step; the corresponding |RDD| value might therefore not be the minimum. If no linkage existed, the two consecutive HCEs with the minimum |RDD| value were kept and used to position the corresponding HCEs in the query genome.
Assignment of homologous element pairs
Two HCEs or genes were regarded as a conserved pair if they were found as neighbors in the genomes of both (or all) the species compared [see Additional file 20].
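A minimal Python sketch of this neighbor criterion follows. It assumes each genome is given as a mapping from chromosome name to element identifiers in positional order, ignores orientation, and considers only elements located in every genome compared; these choices, the function names and the sample data are illustrative assumptions, not details stated in the text.

```python
from typing import Dict, FrozenSet, List, Set

# Hypothetical layout: chromosome -> element IDs in positional order.
Genome = Dict[str, List[str]]


def adjacent_pairs(genome: Genome, keep: Set[str]) -> Set[FrozenSet[str]]:
    """Unordered pairs of kept elements that are direct neighbors."""
    pairs: Set[FrozenSet[str]] = set()
    for ordered_ids in genome.values():
        kept = [e for e in ordered_ids if e in keep]
        for left, right in zip(kept, kept[1:]):
            pairs.add(frozenset((left, right)))
    return pairs


def conserved_pairs(*genomes: Genome) -> Set[FrozenSet[str]]:
    """Pairs that are neighbors in every genome supplied."""
    # Assumption: restrict to elements that were located in all genomes.
    shared = set.intersection(
        *({e for ids in g.values() for e in ids} for g in genomes)
    )
    return set.intersection(*(adjacent_pairs(g, shared) for g in genomes))


# Made-up element orders: "c" lies on a different chromosome in the query.
human = {"chr1": ["a", "b", "c", "d"]}
query = {"chrA": ["b", "a", "d"], "chrB": ["c"]}
pairs_example = conserved_pairs(human, query)
print(pairs_example)
```

In this example only the pair (a, b) is adjacent in both genomes, so it is the only conserved pair returned.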
Calculation of relative distance differences between HCE pairs
To investigate the conservation of distances between the HCE pairs, we used the same definition as presented in our previous work, i.e. RDD = (dq-dh)/[(dq+dh)/2], where dq and dh are the distances between the mid-points of the two HCEs of a pair in the query (non-human) and human genomes, respectively [see Additional file 20].
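The RDD definition can be written out as a short Python sketch. The (start, end) tuple layout, the function names and the coordinates are made up for illustration; only the formula itself comes from the text.

```python
from typing import Tuple

Interval = Tuple[int, int]  # hypothetical (start, end) of one HCE


def midpoint(interval: Interval) -> float:
    start, end = interval
    return (start + end) / 2


def pair_distance(first: Interval, second: Interval) -> float:
    """Distance between the mid-points of the two HCEs of a pair."""
    return abs(midpoint(second) - midpoint(first))


def rdd(dq: float, dh: float) -> float:
    """RDD = (dq - dh) / [(dq + dh) / 2]."""
    if dq + dh == 0:
        raise ValueError("both distances are zero; RDD is undefined")
    return (dq - dh) / ((dq + dh) / 2)


# Made-up coordinates: the pair is 10,000 bp apart in human and
# 2,000 bp apart in the query genome.
dh_example = pair_distance((1000, 1200), (11000, 11200))
dq_example = pair_distance((500, 700), (2500, 2700))
rdd_example = rdd(dq_example, dh_example)
print(round(rdd_example, 3))
```

With these distances the result is (2000 - 10000) / 6000, i.e. about -1.333. Identical distances give an RDD of 0, values are bounded between -2 and 2, and an RDD of -1 corresponds to a query distance one third of the human distance.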
Partitioning HCE pairs into two groups
Using the R clustering function 'pam', we partitioned the 403 HCE pairs shared by the five pairwise genome comparisons into two groups based on the dissimilarity matrix of their |RDD| values. The 'pam' algorithm is a more robust version of K-means, and it is based on the search for 'k' (the number of clusters specified; we set k to 2) representative objects, or medoids, among the observations of the dataset. These observations should represent the structure of the data. After finding a set of 'k' medoids, 'k' clusters are constructed by assigning each observation to the nearest medoid. The goal is to find 'k' representative objects that minimize the sum of the dissimilarities of the observations to their closest representative object.
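The objective that 'pam' minimizes can be illustrated in Python for the k = 2 case. This sketch is not the R implementation: it searches all medoid pairs exhaustively instead of using the build and swap heuristics, and it assumes Euclidean distance between |RDD| vectors, since the dissimilarity measure is not stated. The function name and the sample values are hypothetical.

```python
from itertools import combinations
from math import dist
from typing import List, Sequence, Tuple


def two_medoid_partition(
    points: Sequence[Sequence[float]],
) -> Tuple[Tuple[int, int], List[int]]:
    """Find the two medoids minimizing total dissimilarity (k = 2)."""
    n = len(points)
    # Dissimilarity matrix (assumed Euclidean).
    d = [[dist(p, q) for q in points] for p in points]
    best_cost, best_medoids = float("inf"), (0, 1)
    for i, j in combinations(range(n), 2):
        # Each observation contributes its distance to the closer medoid.
        cost = sum(min(d[i][k], d[j][k]) for k in range(n))
        if cost < best_cost:
            best_cost, best_medoids = cost, (i, j)
    i, j = best_medoids
    # Assign each observation to its nearest medoid.
    labels = [0 if d[i][k] <= d[j][k] else 1 for k in range(n)]
    return best_medoids, labels


# Made-up |RDD| vectors (one value per genome comparison) for six pairs.
abs_rdd = [
    (0.05, 0.10), (0.08, 0.12), (0.10, 0.05),  # small distance variation
    (1.20, 1.50), (1.40, 1.30), (1.30, 1.60),  # large distance variation
]
medoids_example, labels_example = two_medoid_partition(abs_rdd)
print(medoids_example, labels_example)
```

The exhaustive search evaluates every pair of candidate medoids, which is feasible for a few hundred observations but slow in pure Python; it is used here only because it makes the objective explicit.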
We defined inter-HCE regions (IHRs) as the regions bounded by the HCE pairs, excluding the HCEs themselves.
We randomly selected the same number of regions, with the same sizes as the IHRs, as a negative control. To evaluate the statistical significance of the features of the IHRs, the analysis was repeated 1,000 times with independent, randomly sampled data sets. The fraction of times in which the random sample sets had higher (or lower) average scores than those of the IHRs provided the basis for the statistical significance.
Statistical analyses were carried out using the R language.
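The resampling procedure can be sketched in Python as follows. The sampler is a hypothetical stand-in: in the real analysis it would draw size-matched genomic regions and score them, whereas here it simply draws made-up scores. The function name, the seed and all numbers are illustrative assumptions.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence


def empirical_p_value(
    observed_scores: Sequence[float],
    sample_background: Callable[[int, random.Random], List[float]],
    n_iter: int = 1000,
    direction: str = "higher",
    seed: int = 0,
) -> float:
    """Fraction of random sets whose mean is higher (or lower) than observed."""
    rng = random.Random(seed)
    observed = mean(observed_scores)
    hits = 0
    for _ in range(n_iter):
        # One random set with as many regions as there are IHRs.
        random_mean = mean(sample_background(len(observed_scores), rng))
        if direction == "higher" and random_mean > observed:
            hits += 1
        elif direction == "lower" and random_mean < observed:
            hits += 1
    return hits / n_iter


def fake_background(n: int, rng: random.Random) -> List[float]:
    """Hypothetical sampler: made-up repeat fractions for n random regions."""
    return [rng.uniform(0.2, 0.6) for _ in range(n)]


# Made-up repeat fractions for four IHRs, all well below the background.
ihr_scores = [0.05, 0.08, 0.02, 0.04]
p_lower = empirical_p_value(ihr_scores, fake_background, direction="lower")
p_higher = empirical_p_value(ihr_scores, fake_background, direction="higher")
print(p_lower, p_higher)
```

When none of the 1,000 random sets is as extreme as the observed value, the fraction is 0, which can only be reported as p value < 0.001 at this number of iterations.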
We thank Gill Bejerano and Adam Woolfe for kindly making available the UCE and CNE data sets. We also thank two anonymous reviewers for a number of constructive suggestions. Hong Sun is supported by the Wyeth-Zhongxin Joint postdoctoral program. This work is supported by grants from the National High-Tech R&D Program (863) (2007AA02Z330, 2006AA02Z330), the National Basic Research Program of China (2006CB910700, 2003CB715901), the Key Research Program (CAS) KSCX2-YW-R-112 and the Shanghai Committee of Science and Technology (07JC14048).
- Bejerano G, Pheasant M, Makunin I, Stephen S, Kent WJ, Mattick JS, Haussler D: Ultraconserved elements in the human genome. Science. 2004, 304 (5675): 1321-1325. 10.1126/science.1098119.
- Woolfe A, Goodson M, Goode DK, Snell P, McEwen GK, Vavouri T, Smith SF, North P, Callaway H, Kelly K, et al: Highly conserved non-coding sequences are associated with vertebrate development. PLoS Biol. 2005, 3 (1): e7. 10.1371/journal.pbio.0030007.
- Sandelin A, Bailey P, Bruce S, Engstrom PG, Klos JM, Wasserman WW, Ericson J, Lenhard B: Arrays of ultraconserved non-coding regions span the loci of key developmental genes in vertebrate genomes. BMC Genomics. 2004, 5 (1): 99. 10.1186/1471-2164-5-99.
- Boffelli D, Nobrega MA, Rubin EM: Comparative genomics at the vertebrate extremes. Nat Rev Genet. 2004, 5 (6): 456-465. 10.1038/nrg1350.
- Sun H, Skogerbo G, Chen R: Conserved distances between vertebrate highly conserved elements. Hum Mol Genet. 2006, 15 (19): 2911-2922. 10.1093/hmg/ddl232.
- Eichler EE, Sankoff D: Structural dynamics of eukaryotic chromosome evolution. Science. 2003, 301 (5634): 793-797. 10.1126/science.1086132.
- Bourque G, Pevzner PA, Tesler G: Reconstructing the genomic architecture of ancestral mammals: lessons from human, mouse, and rat genomes. Genome Res. 2004, 14 (4): 507-516. 10.1101/gr.1975204.
- Bourque G, Zdobnov EM, Bork P, Pevzner PA, Tesler G: Comparative architectures of mammalian and chicken genomes reveal highly variable rates of genomic rearrangements across different lineages. Genome Res. 2005, 15 (1): 98-110. 10.1101/gr.3002305.
- Thurman RE, Day N, Noble WS, Stamatoyannopoulos JA: Identification of higher-order functional domains in the human ENCODE regions. Genome Res. 2007, 17 (6): 917-927. 10.1101/gr.6081407.
- Kikuta H, Laplante M, Navratilova P, Komisarczuk AZ, Engstrom PG, Fredman D, Akalin A, Caccamo M, Sealy I, Howe K, et al: Genomic regulatory blocks encompass multiple neighboring genes and maintain conserved synteny in vertebrates. Genome Res. 2007, 17 (5): 545-555. 10.1101/gr.6086307.
- Derti A, Roth FP, Church GM, Wu CT: Mammalian ultraconserved elements are strongly depleted among segmental duplications and copy number variants. Nat Genet. 2006, 38 (10): 1216-1220. 10.1038/ng1888.
- Simons C, Pheasant M, Makunin IV, Mattick JS: Transposon-free regions in mammalian genomes. Genome Res. 2006, 16 (2): 164-172. 10.1101/gr.4624306.
- Mills RE, Luttig CT, Larkins CE, Beauchamp A, Tsui C, Pittard WS, Devine SE: An initial map of insertion and deletion (INDEL) variation in the human genome. Genome Res. 2006, 16 (9): 1182-1190. 10.1101/gr.4565806.
- Lunter G, Ponting CP, Hein J: Genome-wide identification of human functional DNA using a neutral indel model. PLoS Comput Biol. 2006, 2 (1): e5. 10.1371/journal.pcbi.0020005.
- Weber JL, David D, Heil J, Fan Y, Zhao C, Marth G: Human diallelic insertion/deletion polymorphisms. Am J Hum Genet. 2002, 71 (4): 854-862. 10.1086/342727.
- Initial sequence of the chimpanzee genome and comparison with the human genome. Nature. 2005, 437 (7055): 69-87. 10.1038/nature04072.
- Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S, et al: Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 2005, 15 (8): 1034-1050. 10.1101/gr.3715005.
- Gardiner-Garden M, Frommer M: CpG islands in vertebrate genomes. J Mol Biol. 1987, 196 (2): 261-282. 10.1016/0022-2836(87)90689-9.
- Carninci P, Sandelin A, Lenhard B, Katayama S, Shimokawa K, Ponjavic J, Semple CA, Taylor MS, Engstrom PG, Frith MC, et al: Genome-wide analysis of mammalian promoter architecture and evolution. Nat Genet. 2006, 38 (6): 626-635. 10.1038/ng1789.
- Ioshikhes IP, Zhang MQ: Large-scale human promoter mapping using CpG islands. Nat Genet. 2000, 26 (1): 61-63. 10.1038/79189.
- Shashikant CS, Kim CB, Borbely MA, Wang WCH, Ruddle FH: Comparative studies on mammalian Hoxc8 early enhancer sequence reveal a baleen whale-specific deletion of a cis-acting element. PNAS. 1998, 95 (26): 15446-15451. 10.1073/pnas.95.26.15446.
- Nobrega MA, Ovcharenko I, Afzal V, Rubin EM: Scanning human gene deserts for long-range enhancers. Science. 2003, 302 (5644): 413. 10.1126/science.1088328.
- Wray GA, Hahn MW, Abouheif E, Balhoff JP, Pizer M, Rockman MV, Romano LA: The evolution of transcriptional regulation in eukaryotes. Mol Biol Evol. 2003, 20 (9): 1377-1419. 10.1093/molbev/msg140.
- Sun H, Skogerbo G, Wang Z, Liu W, Li Y: Structural relationships between highly conserved elements and genes in vertebrate genomes. PLoS ONE. 2008, 3 (11): e3727. 10.1371/journal.pone.0003727.
- Bernstein BE, Mikkelsen TS, Xie X, Kamal M, Huebert DJ, Cuff J, Fry B, Meissner A, Wernig M, Plath K, et al: A bivalent chromatin structure marks key developmental genes in embryonic stem cells. Cell. 2006, 125 (2): 315-326. 10.1016/j.cell.2006.02.041.
- O'Brien KP, Remm M, Sonnhammer EL: Inparanoid: a comprehensive database of eukaryotic orthologs. Nucleic Acids Res. 2005, 33 (Database issue): D476-480.
- Ihaka R, Gentleman R: A language for data analysis and graphics. J Comput Graph Statist. 1996, 5: 299-314. 10.2307/1390807.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.