Distribution of segmental duplications in the context of higher order chromatin organisation of human chromosome 7

  • Grit Ebert (1, 2)
  • Anne Steininger (1, 2)
  • Robert Weißmann (3)
  • Vivien Boldt (1, 2)
  • Allan Lind-Thomsen (4)
  • Jana Grune (1)
  • Stefan Badelt (1, 5)
  • Melanie Heßler (1)
  • Matthias Peiser (6)
  • Manuel Hitzler (6)
  • Lars R Jensen (3)
  • Ines Müller (1)
  • Hao Hu (1)
  • Peter F Arndt (1)
  • Andreas W Kuss (3)
  • Katrin Tebel (1)
  • Reinhard Ullmann (1)

                                    BMC Genomics 2014, 15:537

                                    DOI: 10.1186/1471-2164-15-537

                                    Received: 9 December 2013

                                    Accepted: 17 June 2014

                                    Published: 29 June 2014

                                    Abstract

                                    Background

                                    Segmental duplications (SDs) are not evenly distributed along chromosomes. The reasons for this biased susceptibility to SD insertion are poorly understood. Accumulation of SDs is associated with increased genomic instability, which can lead to structural variants and genomic disorders such as the Williams-Beuren syndrome. Despite these adverse effects, SDs have become fixed in the human genome. Focusing on chromosome 7, which is particularly rich in interstitial SDs, we have investigated the distribution of SDs in the context of evolution and the three dimensional organisation of the chromosome in order to gain insights into the mutual relationship of SDs and chromatin topology.

                                    Results

                                    Intrachromosomal SDs preferentially accumulate in those segments of chromosome 7 that are homologous to marmoset chromosome 2. Although this formerly compact segment has been re-distributed to three different sites during primate evolution, we can show by means of public data on long distance chromatin interactions that these three intervals, and consequently the paralogous SDs mapping to them, have retained their spatial proximity in the nucleus. Focusing on SD clusters implicated in the aetiology of the Williams-Beuren syndrome locus we demonstrate by cross-species comparison that these SDs have inserted at the borders of a topological domain and that they flank regions with distinct DNA conformation.

                                    Conclusions

                                    Our study suggests a link between nuclear architecture and the propagation of SDs across chromosome 7: either nuclear architecture promotes regional SD insertion, or SDs themselves contribute to the establishment of higher order chromatin organisation. The latter could compensate for the high risk of structural rearrangements and thus may have contributed to the evolutionary fixation of SDs in the human genome.

                                    Keywords

                                    Higher order chromatin organisation; Segmental duplication; Williams-Beuren syndrome; Chromosome evolution; Hi-C

                                    Background

                                    Segmental duplications (SDs) are DNA sequences larger than 1 kb that occur at least twice in the genome with more than 90% sequence similarity. They are a feature of various eukaryotic genomes, but they have accumulated particularly during primate evolution [1–3]. Thus the percentage of SDs has increased from about 2% in the genome of the New World monkey marmoset (Callithrix jacchus) [4] to approximately 5% in the human genome [5]. It is not clear what has triggered this recent burst of SDs, but the simultaneous decrease in point mutation and retrotransposition rates argues against a general increase of mutability as the cause [2]. Although SDs pose a serious threat to genomic integrity by promoting non-allelic homologous recombination (NAHR), this specific type of DNA copy number variant has been fixed in the genome. One reason for the persistence of SDs could be their preferential location in gene-rich genomic segments and their high gene content [6, 7]. Several of the duplicated exons appear to be subject to accelerated evolution [8, 9], which has led to neofunctionalisation and subfunctionalisation of duplicated genes [10–14]. However, in most cases mutations have resulted in pseudogenisation of duplicated genes [4, 15, 16], which nevertheless can show remarkably high transcriptional activity [4, 17]. Yet, the large fraction of pericentromeric SDs, which is less gene-rich [18], points at alternative factors that could support positive selection of SDs. For example, SD insertion could also impact gene expression by demarcating euchromatin from transcriptionally inactive heterochromatin [19, 20]. Moreover, it has been discussed that SDs, which frequently map to synteny breaks [21–25], may have mediated evolutionary rearrangements that have led to reproductive isolation of their carriers [26]. However, the temporal order of events argues against the impact of SDs on the generation of evolutionary rearrangements in many cases [27, 28]. On the contrary, a recent study supports the idea that the accumulation of SDs may also be the consequence of evolutionary rearrangements rather than their cause [20].

                                    SDs are not evenly distributed across the genome. Instead there are profound differences within and among chromosomes [29, 30]. Apart from large SD clusters in the subtelomeric and pericentromeric regions of most chromosomes, SDs can also accumulate in interstitial hubs [4, 18]. These hubs are characterised by an increased genomic instability, which manifests itself in a high probability of further SD insertion in their flanking regions, a phenomenon termed SD shadowing [31]. Furthermore, such hubs favour the presence of numerous structural variants with many of them having pathological relevance [32]. Yet, it is still uncertain what mechanisms have driven SD aggregation in the first place [33] and whether the pro rata contribution of any such mechanism remained the same throughout evolution [34]. A pivotal first step preceding formation of SD hubs may have been the insertion of core SDs [29]. Recombination between repetitive elements may play a role too, as nearly 27% of all SDs are flanked by Alu repeats [35]. In addition, the association of SDs with G4 motifs and other sequence features promoting non-B DNA conformations [19] points at the possible relevance of chromatin conformation for SD insertion.

                                    However, studies investigating SD distribution across the genome have so far based their analysis on the linear genome and have not taken into account its complex three-dimensional organisation. Therefore, in this study we combined publicly available data on the three-dimensional organisation of the nucleus [36] with our own experimental data in order to explore the distribution of SDs in relation to higher order chromatin organisation. Focusing on chromosome 7 with its particularly high content of intrachromosomal and interstitial SDs [7, 22, 37], we demonstrate that paralogous SDs that have been separated in the course of evolution are still in close spatial proximity. Building on this observation we have explored a possible role of SDs in sequence-directed chromatin organisation and discuss how this may impact the emergence of genomic disorders such as the Williams-Beuren syndrome (WBS).

                                    Results

                                    Filtering and bundling of Hi-C interaction bins

                                    We have inferred spatial proximities of intrachromosomal SDs from normalised Hi-C data for chromosome 7 [36] at a resolution of 20 kb. Hi-C is a derivative of the chromosome conformation capture protocol (3C) [38, 39] and facilitates the genome-wide analysis of chromatin interactions within the nucleus. It is a proximity ligation based technology, where DNA is cut, re-ligated and the products are analysed by paired-end sequencing. The frequency of two DNA sequences co-occurring in the same paired-end reads reflects their contact probability within the nucleus across a large population of heterogeneous cells in all phases of the cell cycle.

                                    In order to concentrate on the most prevalent Hi-C interactions and to minimise the influence of random noise, we applied different criteria to filter Hi-C data bins by varying 1) the normalised number of reads required to confirm the interaction of two given bins and 2) the minimal genomic distance between interacting bins. For each of these data sets, adjacent interaction bins were merged into interaction bundles using Circos tools [40] if both their start sites and their target sites were located within an interval of 500 kb. Bundling all long distance interactions confirmed by at least 15 interaction counts (= normalised number of paired-end reads) and spanning at least 25 Mb, with the bundling criterion “at least five interaction bins mapping within 500 kb at the start and the target site”, resulted in 33 bundles covering 37.2 Mb in total (i.e. 23.4% of chromosome 7, Additional file 1). In line with the literature, these long distance interaction bundles preferentially connect regions with high transcriptional activity and open chromatin [36, 39, 41] as demonstrated by our RNA-seq and H4K8ac data (Figure 1 and Additional file 2).
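The two filters and the bundling criterion described above can be illustrated with a minimal sketch. It is not the Circos tools implementation used in the study: the `Interaction` record, the function names and the greedy grouping rule are assumptions made for illustration, and only the thresholds (15 counts, 25 Mb, 500 kb, five bins) come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    start: int    # start coordinate (bp) of the first 20 kb bin
    target: int   # start coordinate (bp) of the interacting bin
    count: float  # normalised number of paired-end reads


def filter_interactions(interactions, min_count=15, min_span=25_000_000):
    """Keep interactions with enough reads that span at least min_span bp."""
    return [ia for ia in interactions
            if ia.count >= min_count and abs(ia.target - ia.start) >= min_span]


def bundle(interactions, window=500_000, min_members=5):
    """Greedy grouping (a simplification, not the Circos algorithm).

    An interaction joins the current bundle while both its start and its
    target lie within `window` bp of the bundle's first member. Bundles
    with fewer than `min_members` interactions are discarded.
    """
    bundles, current = [], []
    for ia in sorted(interactions, key=lambda x: (x.start, x.target)):
        if current and (ia.start - current[0].start <= window
                        and abs(ia.target - current[0].target) <= window):
            current.append(ia)
        else:
            if len(current) >= min_members:
                bundles.append(current)
            current = [ia]
    if len(current) >= min_members:
        bundles.append(current)
    return bundles
```

Calling `bundle(filter_interactions(data))` on a list of `Interaction` records returns a list of bundles, each a list of the member interactions. Changing `min_count` and `min_span` reproduces the idea of testing several threshold combinations.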
                                    Figure 1

                                    Distribution of segmental duplications (SDs) and bundled long distance interactions in relation to acetylation of H4K8, transcriptional activity and lamina associated domains on human chromosome 7 (derived from IMR90 unless indicated otherwise). A) H4K8 acetylation profile, dark yellow: hyperacetylation of H4K8; blue: hypoacetylation of H4K8. B) the red and blue curve represent RNA-seq read counts/100 kb bin for coding and non-coding RNA, respectively (IMR91L). C) grey areas underlying the two histograms mark lamina associated domains (LADs, Tig3 cells). D) idiogram of chromosome 7, the Williams-Beuren syndrome region is highlighted in yellow beside the idiogram (at 72-74 Mb, hg18). E) transparent blue shading of the idiogram illustrates the inversion-affected segments of chromosome 7 depicted in Figure 2A-C. Bundled long distance interactions (F) and segmental duplications (G) are depicted in the inner circle; green ribbons: long distance interactions between genomic regions; grey: SDs with sequence similarity <98%; yellow: SDs with sequence similarity 98-99%; orange: SDs with sequence similarity >99%.

                                    In accordance with the preferential insertion of SDs into the gene-rich euchromatic portion of the genome, SD regions have a higher probability to be located within long distance interaction bundles (for chr7: adjusted p-value = 1.3332 × 10⁻⁴, for all chromosomes: adjusted p-value = 1.3332 × 10⁻⁴, 10000 simulations; Additional file 3). In two out of 1474 instances, the start and target sites of long distance interaction bins directly coincide with the location of two SD paralogs (Additional file 2). Although the initial sequence alignment of Hi-C reads, as performed by Dixon et al. [36], employed a mapping quality score chosen to accept unique reads only, there is an apparent risk that some of these long distance interactions are due to erroneous sequence alignment. Thus, we added a third filter for the Hi-C data bins, namely 3) the exclusion of genomic bins overlapping with SDs. We tested the consequences on the bundling pattern after removing all interacting bins that connect two given SD paralogs (termed IA bins w/o SD paralogs in Additional file 4), as well as after ignoring all interaction bins that overlap with any SD at all (termed IA bins w/o any SD in Additional file 4). These filter options are aimed at excluding all short distance interactions that have been misinterpreted as long distance interactions due to false alignment of one side of a paired-end read. While this reduced the number of interaction bins by 0.01% and 9.75% (and 0.14% and 59.77% when only considering long distance interactions; see Additional file 1), interactions of bins adjacent to the removed ones were sufficient to retain the basic triangular interaction pattern (Additional file 4C-F and H). In addition to the filtering of SD overlapping interaction bins at the resolution of 20 kb, we also performed a filtering at the level of paired-end reads starting from the raw Hi-C data [36]. After exclusion of 369559 intrachromosomal paired-end reads that ambiguously mapped to chromosome 7 (affecting 5.11% of intrachromosomal 20 kb interaction bins), data were normalised and bundled (Additional files 1 and 4J).
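The third filter, with its two variants (removing only interactions that connect two SD paralogs, or removing every interaction touching any SD), might look roughly like the sketch below. The data layout, the function names and the `mode` keyword are assumptions for illustration; only the 20 kb bin size is taken from the text.

```python
def overlaps(bin_start, bin_size, regions):
    """True if the bin [bin_start, bin_start + bin_size) overlaps any (start, end) region."""
    bin_end = bin_start + bin_size
    return any(start < bin_end and end > bin_start for start, end in regions)


def filter_sd_bins(interactions, sd_pairs, bin_size=20_000, mode="any"):
    """Drop interactions that may stem from ambiguous alignment within SDs.

    interactions: list of (start, target) bin coordinates in bp.
    sd_pairs: list of ((s1, e1), (s2, e2)) intervals of paralogous SDs.
    mode="paralogs": drop interactions connecting the two copies of an SD pair.
    mode="any": drop interactions where either bin overlaps any SD.
    """
    all_sds = [interval for pair in sd_pairs for interval in pair]
    kept = []
    for start, target in interactions:
        if mode == "any":
            drop = (overlaps(start, bin_size, all_sds)
                    or overlaps(target, bin_size, all_sds))
        elif mode == "paralogs":
            drop = any(
                (overlaps(start, bin_size, [a]) and overlaps(target, bin_size, [b]))
                or (overlaps(start, bin_size, [b]) and overlaps(target, bin_size, [a]))
                for a, b in sd_pairs
            )
        else:
            raise ValueError("mode must be 'any' or 'paralogs'")
        if not drop:
            kept.append((start, target))
    return kept
```

The stricter `"any"` mode removes far more bins than `"paralogs"`, which mirrors the difference between the two reported reductions.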

                                    In order to avoid threshold-induced interpretation bias we tested a total of 12 different combinations of cut-offs and filter criteria for the bundling of Hi-C data (Additional files 1 and 4), varying the interaction counts per bin, the interaction distance and the handling of genomic bins overlapping with known SDs. The intersection of these 12 data sets revealed a core pattern of interactions independent of the threshold used (Additional files 4H and 5). Therefore it is unlikely that the observed proximities of paralogous SDs are solely the result of ambiguous sequence alignments within segmental duplications. However, we want to emphasise that, given the paucity of reliable interaction counts within SDs, this statement heavily depends on the interaction patterns of adjacent bins that lack any SDs and is supported by shared regions of interactions as indicated by triangular interaction patterns.

                                    Chromosomal regions separated in the course of evolution retain spatial proximity

                                    SDs preferentially map to regions that are rich in long distance interactions. At the same time they are known to accumulate at synteny breakpoints [23, 25, 42]. This prompted us to search for particularities of long distance interaction patterns with respect to evolutionary breakpoints. We focused on two recent rearrangements of chromosome 7 that occurred during hominoid evolution and are not present in the homologous chromosome of orang-utan: a pericentric inversion in the common ancestor of human/gorilla followed by a paracentric inversion in the human/chimpanzee ancestor. As depicted in Figure 2A-C, synteny breakpoints coincide with changes in the characteristics of interaction patterns. To mimic the linear order of segments in gorilla and orang-utan we then recalculated the genomic coordinates of human chromosome 7 based on the fine-mapped evolutionary breakpoints (human/orang-utan, see Additional file 6). Figure 2A-C visualises the evolutionary split and relocation of a compact segment to three distant chromosomal regions in human and shows that these three formerly adjacent segments remain connected by long distance interactions. These segments comprise almost all sequences of human chromosome 7 that are syntenic to a large block (17.9 Mb) of marmoset chromosome 2 (Figure 2D; Ensembl v67 [43]). Genomic bins covering sequences of marmoset chromosome 2 were significantly overrepresented in regions rich in SDs as indicated by low probability scores based on minimum hypergeometric statistics [44] (p-value = 3.5 × 10⁻¹²; Figure 2E). Similarly, a significant enrichment was detected in regions with a high frequency of Alu repeats (p-value = 2.3 × 10⁻¹⁴; Figure 2F), as well as G4 DNA motifs (p-value = 2.3 × 10⁻¹⁴; Figure 2G).
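The recalculation of coordinates for the in silico reversion of the two inversions can be sketched as follows. This is only an illustration of the principle: the function names are hypothetical and no real breakpoint coordinates are used (the fine-mapped breakpoints are given in Additional file 6, which is not reproduced here).

```python
def invert(pos, left, right):
    """Map a position through an inversion of the half-open interval [left, right)."""
    if left <= pos < right:
        return left + right - 1 - pos
    return pos


def revert_inversions(pos, inversions):
    """Apply a series of inversions, most recent event first.

    inversions: list of (left, right) breakpoints. Each pair must be given
    in the coordinate system that results from the preceding reversions.
    """
    for left, right in inversions:
        pos = invert(pos, left, right)
    return pos
```

Because an inversion is its own inverse, the same function both introduces and reverts a rearrangement. For the case described above, the paracentric inversion would be reverted first and the pericentric inversion second.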
                                    Figure 2

                                    Long distance interactions of human chromosome 7 connect sequences syntenic to the most proximal 17.9 Mb of marmoset chromosome 2 and cluster in regions rich in SDs, Alu repeats and G4 motifs. A-C) Circos plots showing the patterns of long distance interactions (green bundles) in relation to SDs (following the colouring scheme of Figure 1) within the three segments of human chromosome 7 affected by the pericentric and paracentric inversions (as highlighted in blue in the idiogram of Figure 1); (A) before and (B) after in silico reversion of the paracentric inversion and (C) after reverting the pericentric inversion. The partial red and blue shading of the idiogram in A and B indicates the genomic interval inverted by the paracentric and pericentric inversion, respectively. D) distribution of SDs, long distance interactions (LDIs), G4 motifs and Alu repeats across human (Hs) chromosome 7 (100 kb bins) and its relation to marmoset (Cj) chromosome 2 syntenic regions (green blocks). Pink blocks highlight sequences syntenic to regions of marmoset chromosome 8. E-G) enrichment of SDs, Alu repeats and G4 motifs within chromosome 7 segments homologous to sequences of marmoset chromosome 2 (highlighted in blue). Chromosome 7 segments (binned in 200 kb windows) are displayed in ranked order according to feature count. The red curve and red dot above each plot indicate the hypergeometric score and its minimum (mHG), respectively.
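The hypergeometric score and its minimum (mHG) shown in panels E-G can be computed along the following lines. This is a minimal sketch, assuming that bins are ranked by feature count and labelled 1 if they are syntenic to marmoset chromosome 2; the function names are hypothetical. It returns the raw minimum score only and does not perform the multiple-testing correction needed to turn that score into a p-value.

```python
from math import comb


def hypergeom_tail(N, B, n, b):
    """P(X >= b) for X ~ Hypergeometric(N items, B labelled, n drawn)."""
    upper = min(n, B)
    favourable = sum(comb(B, k) * comb(N - B, n - k) for k in range(b, upper + 1))
    return favourable / comb(N, n)


def mhg_score(labels):
    """Minimum hypergeometric score over all prefixes of a ranked 0/1 list.

    Returns (score, n), where n is the prefix length at which the minimum
    is reached (0 if no prefix scores below 1).
    """
    N, B = len(labels), sum(labels)
    best, best_n, b = 1.0, 0, 0
    for n, label in enumerate(labels, start=1):
        b += label
        p = hypergeom_tail(N, B, n, b)
        if p < best:
            best, best_n = p, n
    return best, best_n
```

For a ranked list in which all labelled bins come first, for example `[1, 1, 1, 0, 0, 0, 0, 0, 0, 0]`, the minimum is reached at the third position.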

                                    Chromatin organisation of the Williams-Beuren region

                                    One of the three segments affected by the evolutionary rearrangement described above, the segment closest to the centromere, contains three SD clusters (indicated by green boxes in the idiogram track in Figure 3), two of which are involved in the aetiology of the Williams-Beuren syndrome (WBS). Together these three SD clusters are encompassed by a 4.8 Mb genomic interval at 7q11.22-q11.23 (see Figure 3), referred to as the 7q11 segment in the following. The most proximal SD cluster in the 7q11 segment starts at a transition of heterochromatin to euchromatin as demonstrated by our H4K8ac ChIP data and corroborated by numerous public data sets on posttranslational chromatin modifications (a selection of them is displayed in Figure 3 and Additional file 7). This heterochromatin to euchromatin switch is accompanied by changed probabilities of DNA attachment to the nuclear membrane [45] (Figure 3) and is also reflected by altered characteristics of replication timing and DNA degradation during early phases of apoptosis. In general, and in line with the literature, genome-wide analysis of apoptotic DNA degradation revealed significant correlation with both lamina attachment (ρ = −0.62, p-value < 2.2 × 10⁻¹⁶; Additional file 8) and replication timing (ρ = 0.65, p-value < 2.2 × 10⁻¹⁶) as defined by Spearman’s rank correlation test (Additional file 7). The patterns of apoptotic DNA degradation and its correlation to H4K8 acetylation were highly reproducible between two different cell lines (Additional file 9).
                                    Figure 3

                                    Higher order chromatin organisation and SD localisation around the Williams-Beuren syndrome region. All data refer to genome release hg19 and are derived from IMR90 unless indicated otherwise. The proximal, central and distal SD clusters (P, C, D) of the 7q11 segment encompassing 4.8 Mb are highlighted in green within the chromosome banding track. A-C) localisation of SDs; colouring according to sequence similarity; grey: <98%, yellow: 98%-99%; orange: >99%; D) genomic interval commonly deleted in WBS and the distal 7q11.23 deletion syndrome; E) topological domains as defined by Dixon et al. [36]; F) topological domains identified in the corresponding region in mouse [36] after conversion to human hg19. Note that the murine topological domain homologous to sequences deleted in the distal 7q11.23 syndrome is not fully represented due to a break of synteny within this genomic interval. See Figure 4 for details; G-H) heatmap and arc view of CTCF binding sites as detected by ChIA-PET in MCF7; I) number of G4 motifs/100 kb bin; J) average GC-content within 100 kb bins; K) number of Alu repeats/100 kb bin; L) number of structural variants as annotated by the Database of Genomic Variants (DGV) [104], *maximum of 1080 CNVs not shown; M) log2 ratio scores of the LaminB1 DamID Map (Tig3 cells) as reported by Guelen et al. [45]; N) log2 ratio scores of DNA regions prone to early apoptotic DNA degradation in 20 kb windows, turquoise: degraded DNA segments; O) log2 ratio scores of H4K8 acetylation profile in 20 kb windows, blue: hyperacetylation, grey: hypoacetylation; P) red curve representing the sum of all intrachromosomal interaction counts/bin divided by the median number of interactions for all bins of chromosome 7; Q) percentage of interactions categorised according to their interaction span size; light grey: <0.5 Mb, grey: 0.5-1 Mb, light blue: 1–5 Mb, light brown: 5–10 Mb, dark grey: 10–25 Mb, black: ≥25 Mb. Gaps in this plot are due to alignment problems of Hi-C data in regions harbouring SDs with high sequence similarity.

                                    Given the reported association of gene density and chromatin organisation [46], we compared gene distribution and intron size inside and outside of the 7q11 segment. Gene density in the genomic region of this segment is higher than in 100000 randomly simulated intervals of chromosome 7 (23.86 vs. an average of 9.38 genes per Mb, estimated p-value < 0.0441). This difference in gene density was even more pronounced when focusing on the immediate genomic neighbourhood of the 7q11 segment; regions 4.8 Mb upstream and downstream contain an average of 1.45 genes per Mb (p-value = 5.829 × 10⁻¹⁴, two-tailed Fisher’s exact test) and 5.19 genes per Mb (p-value = 4.661 × 10⁻⁷, two-tailed Fisher’s exact test), respectively. At the same time, intron size of the 7q11 segment is decreased when compared to the average of 100000 simulations (3760 vs. 9827 bp, estimated p-value < 0.0453) and to the same number of genes (as located within the 7q11 segment) upstream and downstream of the segment (13772 and 9420 bp, p-value < 2.2 × 10⁻¹⁶, two-tailed Fisher’s exact test).
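A comparison against randomly simulated intervals, as used above for gene density, could be set up roughly as in the following sketch. The text does not specify how intervals were sampled or how the p-value was estimated, so uniform placement of equally sized intervals, the add-one estimator and all names below are assumptions.

```python
import random
from bisect import bisect_left


def genes_in_interval(gene_starts, start, end):
    """Count genes whose start lies in [start, end); gene_starts must be sorted."""
    return bisect_left(gene_starts, end) - bisect_left(gene_starts, start)


def empirical_p(gene_starts, chrom_len, seg_start, seg_len, n_sim=100_000, seed=0):
    """Estimate how often a random interval holds at least as many genes.

    Returns (observed gene count, estimated one-sided p-value). The
    (hits + 1) / (n_sim + 1) estimator never reports a p-value of zero.
    """
    rng = random.Random(seed)
    observed = genes_in_interval(gene_starts, seg_start, seg_start + seg_len)
    hits = 0
    for _ in range(n_sim):
        s = rng.randrange(0, chrom_len - seg_len + 1)
        if genes_in_interval(gene_starts, s, s + seg_len) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_sim + 1)
```

Dividing the observed count by the segment length in Mb gives the density in genes per Mb. The same scheme applies to mean intron size if the comparison is reversed to count intervals with values at least as small.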

                                    GC-content is another aspect that is tightly linked to chromatin conformation. GC-content within the 7q11 segment is 47.5% on average with a standard deviation of 4.4% based on 100 kb windows. We observed a considerable drop of GC-content (down to 36.3%) within the most distal SD block and public data suggest that this interval of about 295 kb is located next to the nuclear membrane if mapped correctly. G4 motifs show variable enrichment within the 7q11 segment, which is most prominent outside the SD blocks. We also observed a relative depletion of G4 motifs within the central block of SDs which is not reflected in a corresponding change of GC-content (Figure 3).

                                    Next we asked whether this distinct DNA conformation is also reflected in the Hi-C data set. The classification of Hi-C interaction data for chromosome 7 into six categories based on their interaction span size (ranging from less than 0.5 Mb to 25 Mb or more) revealed that the change of chromatin state close to the WBS locus is also reflected by an increased proportion of interactions spanning less than 0.5 Mb (Figure 3), predominantly at the expense of interactions between 0.5-5 Mb and 10–25 Mb. This shift of span size characteristics is not accompanied by a general decrease of absolute interaction frequencies (red curve in Figure 3). It also lacks any symmetry around the gaps within the Hi-C data set (which are due to SDs with high sequence similarity); such symmetry would be expected if the observed changes in average span size were a consequence of mapping problems associated with the presence of SDs (Figure 3).
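The classification into six span size categories can be expressed compactly. The sketch below uses the category boundaries given in the caption of Figure 3; the input layout (one record per bin and interaction partner) and the function names are assumptions.

```python
from bisect import bisect_right
from collections import Counter

# Category boundaries in Mb, as listed in the caption of Figure 3.
EDGES_MB = [0.5, 1, 5, 10, 25]
LABELS = ["<0.5 Mb", "0.5-1 Mb", "1-5 Mb", "5-10 Mb", "10-25 Mb", ">=25 Mb"]


def span_category(start, target):
    """Return the span size category of an interaction between two positions (bp)."""
    span_mb = abs(target - start) / 1e6
    return LABELS[bisect_right(EDGES_MB, span_mb)]


def span_profile(interactions):
    """Percentage of interaction counts per category for every start bin.

    interactions: iterable of (start, target, count) with positive counts.
    Each pair is expected once per bin of interest, so a symmetric matrix
    would be listed in both directions.
    """
    per_bin = {}
    for start, target, count in interactions:
        per_bin.setdefault(start, Counter())[span_category(start, target)] += count
    return {
        bin_start: {label: 100 * counts[label] / sum(counts.values()) for label in LABELS}
        for bin_start, counts in per_bin.items()
    }
```

Plotting the resulting percentages along the chromosome gives a track comparable to the one described for panel Q of Figure 3.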

                                    Furthermore, Hi-C interaction patterns suggest that the recurrent deletion involved in the aetiology of WBS removes one topological domain, which is flanked by SDs with highest sequence similarities. In order to validate this assumption and to rule out that domain border definition at this site simply reflects sequence read depletion in large SD blocks, we performed an interspecies comparison of the human WBS locus and its homologous region in mouse. Topological domains were reported to have a high degree of evolutionary conservation. Indeed, the corresponding region in mice (5qG2) comprises a distinct topological domain and the large SD blocks present in humans have inserted at sites that are homologous to murine topological domain borders (Figure 4).
                                    Figure 4

                                    Cross-species comparison showing that SDs next to the WBS locus have inserted at topological domain borders. Hi-C interactions and topological domains in the human fetal fibroblast cell line IMR90 are shown in dark green in the upper part as triangle view and bars, respectively. SDs with sequence similarity of 98%-99% and above 99% (shown in yellow and orange, respectively, in the SDs track) coincide with gaps within the Hi-C data. SD distribution and Hi-C data of the corresponding region in mouse are given in the lower part of the image. The positions of FKBP6 and WBSCR16, the human orthologues of the two genes next to the murine topological domain borders, are highlighted in green and red, respectively. The intervals commonly affected in WBS and the distal 7q11.23 syndrome are indicated by pale red bars. Note that the region distal to SRRM3, including the distal SD block, is homologous to a different mouse chromosome.

                                    Discussion

                                    In this study we have investigated the relation between chromatin organisation of human chromosome 7 and the distribution of segmental duplications.

                                    Our study reveals that SDs preferentially map to those regions of chromosome 7 that are homologous to a 17.9 Mb segment of marmoset chromosome 2. In the course of evolution, this formerly compact chromosomal segment split up and relocated to human chromosome 7p22, 7q11 and 7q22 by a pericentric and a paracentric inversion in the common ancestor of human/gorilla and human/chimpanzee, respectively [47, 48]. Our analysis indicates that, despite these structural rearrangements, the three regions have retained their nuclear neighbourhood. This observation corroborates findings of evolutionarily conserved principles of nuclear organisation at the resolution of interphase FISH [49] and is in line with a recent report on an increased Hi-C interaction probability between murine syntenic breakpoint regions on human chromosomes, a phenomenon which has been termed spatial synteny [50]. As a consequence of spatial synteny, SD paralogs that are separated by structural rearrangements and appear distant on the linear chromosome are still in close spatial proximity in the interphase nucleus.

                                    A possible role for SDs in spatial synteny

                                    In light of the observed conservation of nuclear architecture, we asked what factors could account for spatial synteny and whether the biased distribution of SDs might play a role therein and in nuclear organisation in general [51, 52]. It is still unclear whether nuclear architecture is determined by a nuclear scaffold, represents the outcome of self-organisation choreographed by intrinsic properties of the chromatin itself (reviewed in [53]), or results from a combination thereof. Although several DNA-protein interactions and epigenetic marks clearly correlate with specific features of chromatin organisation, DNA sequence by itself is likely to play a crucial role [36, 53–55]. One DNA sequence feature significantly enriched in those segments of chromosome 7 that are syntenic to a large block of marmoset chromosome 2 is the G4 DNA motif ($G_{\geq 3}N_xG_{\geq 3}N_xG_{\geq 3}N_xG_{\geq 3}$) [56] (Figure 2D and G). These motifs can establish highly stable intramolecular and intermolecular connections via Hoogsteen pairing between four guanines and have already been implicated in telomere organisation and in meiotic chromosome pairing [56–59]. The non-random distribution of G4 motifs along human chromosome 7, as shown in this study, could point at a possible function of quadruplex structures in the retention of spatial proximities also in interphase nuclei. High frequency of Alu repeats is another, partly interrelated, sequence feature that we have found significantly enriched in these highly interacting regions (Figure 2D and F). Alu repeat distribution is not the result of regional insertion preferences, but more likely the consequence of selective pressure on GC-content biased removal [60–63]. Against this background, Alu repeats have been implicated in higher order chromatin organisation [64, 65]. However, the overall presence of both Alu repeats and G4 motifs throughout the genome raises the question of how such a sequence-directed organisation of the nucleus might obtain its specificity in the first place. The observed spatial proximity of SD paralogs (Figure 1), as well as their preferential insertion within Alu repeat and G4 motif-rich areas [18] (Figure 2), makes SDs ideal candidates to introduce sequence specificity into this process. For example, temporal somatic pairing could influence polymer dynamics and in this way accelerate the establishment of higher order chromatin organisation. Allelic or ectopic somatic pairing of homologous sequences is a widespread phenomenon in eukaryotes that is known to impact gene regulation and nuclear architecture ([66, 67], reviewed in [68]). Chromosomal structures enriched for interchromosomal SDs such as the telomeres and centromeres have already been reported to colocalise in interphase nuclei [69–74]. Notably, paralogous SDs show a remarkably high rate of interlocus gene conversion [75], which may indicate a high contact probability within the nucleus.
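Counting G4 motifs of the form given above (four runs of at least three guanines separated by loops) can be sketched with a regular expression. The loop length is not stated in the text; the range of 1 to 7 nucleotides used here is a common convention and is an assumption, as are the function names. Matches are counted without overlap on both strands.

```python
import re


def g4_pattern(min_run=3, max_loop=7):
    """Compile a regex for four G-runs separated by loops of 1..max_loop bases."""
    run = f"G{{{min_run},}}"
    loop = f"[ACGT]{{1,{max_loop}}}"
    return re.compile(run + (loop + run) * 3)


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_g4(seq, pattern=None):
    """Count non-overlapping G4 motifs on the given and the opposite strand."""
    pattern = pattern or g4_pattern()
    seq = seq.upper()
    forward = len(pattern.findall(seq))
    reverse = len(pattern.findall(reverse_complement(seq)))
    return forward + reverse
```

Applying `count_g4` to consecutive 100 kb windows of a chromosome sequence would yield a density track of the kind shown in Figures 2 and 3. A telomeric repeat such as `GGGTTAGGGTTAGGGTTAGGG` is a simple positive example.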

                                    SD distribution at the heterochromatin to euchromatin boundary at 7q11.22

                                    Previous studies have reported the occurrence of SDs at the transition of heterochromatin to euchromatin [76–78]. This prompted us to re-evaluate the distribution of SDs in the context of new models of chromatin organisation, particularly the concept of topological domains. These megabase-sized domains of highly interacting chromatin are remarkably stable between different cell types and highly conserved between mice and humans [36]. We have focused on the three SD blocks localised at the border of 7q11.22 to 7q11.23. These SDs are of special interest to human geneticists as non-allelic homologous recombination between them underlies the development of Williams-Beuren syndrome (WBS, OMIM 194050), the 7q11.23 duplication syndrome (OMIM 609757 [79]), the inversion that predisposes to the WBS deletion [80] and the distal 7q11.23 deletion syndrome (OMIM 613729 [81]).

Several observations indicate that the 7q11 segment containing these three SD blocks has a particular DNA conformation. This segment meets all criteria that have been defined for RIDGES (regions of increased gene expression; [82]), i.e. highly transcribed, GC-rich and gene-rich sequences with short introns and a high content of Alu repeats. RIDGES have a different degree of DNA compaction, as suggested by computational analysis [83], an assumption that is backed by the fact that the genomic characteristics of RIDGES largely overlap with those recently defined for DNA domains in an underwound state [84]. One factor for establishing and maintaining this specific chromatin conformation in this highly transcribed region may be G4 motifs, which are frequent in the 7q11 segment and have been reported to stabilise open chromatin [85]. Remarkably, sequences covered by the central and the distal SD cluster in the 7q11 segment show a lower G4 motif density and thus disrupt the continuity of G4 motif enrichment. Proceeding on the assumption that sequence reads were mapped unequivocally, the most distal SD block also has a high probability of being attached to the nuclear membrane (Figure 3).

Evaluation of CTCF interaction characteristics and the re-analysis of Hi-C data with focus on average interaction span sizes mirror the particularities of chromatin conformation in the 7q11 segment (Figure 3). Moreover, Hi-C data [36] suggest that the genomic interval typically deleted in WBS patients comprises a distinct topological domain, which is flanked by SDs at its borders. Clearly, the paucity of Hi-C data mapping to SDs with the highest sequence similarities complicates the interpretation of SD-related interaction patterns and may have compromised the precise definition of topological domains. In search of strategies that could enable us to discriminate SD-associated technical artefacts from biologically relevant SD insertion at domain borders, we exploited the facts that topological domains are highly conserved between mice and humans [36] and that the syntenic region in mice lacks these large SD blocks [23, 27, 86]. Our cross-species comparison revealed that the single copy sequences deleted in WBS indeed compose a distinct topological domain in mice, and that the large SD blocks present in humans have inserted at sites homologous to the murine domain borders. This insertion of DNA sequences with different characteristics, for example in terms of G4 motif density or preference for attachment to the nuclear membrane (see Figure 3), could emphasise the separation of topological domains. Thus SDs may impact chromatin organisation at the level of topological domains in a way that is reminiscent of what has been proposed for pericentric SDs at the chromosomal level, namely to facilitate differential gene regulation and to protect from the regulatory influence of adjacent sequences [19, 20]. The reciprocal event, a deletion of domain borders and linker region, has already been shown experimentally to provoke significant changes in the interaction pattern of two adjacent topological domains [87]. 
Further support for this assumption is provided by recent reports on the impact of WBS deletions on the interaction patterns of its adjacent topological domains [88].

Interestingly, although many SDs show accelerated rates of sequence divergence [26], SDs involved in the aetiology of WBS and several other genomic disorders show a high rate of gene conversion, which preserves their sequence similarity [89–92] and, as a consequence, the risk of recombination events that cause the genomic disorder [93, 94]. On the one hand, recurrent recombination of paralogous SDs, which causes the high rate of intrachromosomal deletions and inversions in the WBS region, supports the assumption of a high contact probability between these paralogous SDs within the nucleus. On the other hand, it raises the question of whether sequence similarity might serve a function that could compensate for the associated high susceptibility to structural rearrangements mediated by SDs with high sequence similarity. For example, SDs could influence chromatin organisation by somatic pairing, as discussed above, or by RNA-based mechanisms. The latter option would be one explanation for the reported high transcriptional activity of pseudogenes mapping to SDs [4], with many of them regulated in a tissue-specific manner [17]. Notably, the frequent interaction of the Prader-Willi syndrome imprinting centre (15q13) with two adjacent SDs has already inspired discussions on the functional impact of SDs on chromatin organisation [95].

                                    Conclusions

Our study suggests a link between nuclear architecture and the propagation of SDs across chromosome 7. Higher contact probabilities could promote regional SD insertion, but SDs could also themselves be a factor in nuclear organisation, which would promote their propagation and evolutionary fixation in the genome.

                                    Methods

                                    Analysis of long distance interactions

We downloaded normalised intrachromosomal Hi-C data (hg18) of autosomes with 20 kb resolution derived from the human fetal lung fibroblast cell line IMR90 (replicate 1; [36]). A stringent cut-off was used to remove interaction (IA) bins represented by fewer than 15 independent sequence counts. Long distance interactions of chromosome 7 were defined by a minimal span size of 25 Mb. “Circos utilities/bundlelinks” [40] was employed to fuse long distance interactions into one bundle when at least five interaction bins were within a maximum distance of 500 kb at the start and target sites. We applied different combinations of filter options in terms of interaction counts per bin (at least 10, at least 15, and 10–50 IA/bin) and minimum span sizes (10 and 25 Mb) to evaluate the impact of thresholds on the bundle pattern (see Additional files 1 and 4). Moreover, we introduced a third filter based on the overlap of a given bin with SDs in order to correct for interactions that are due to erroneous sequence alignments. BEDTools “pairToPair” [96] was used to remove all interaction bins that connect two SD paralogs (removed IA bins: n = 159) or that overlap with any SD at all (removed IA bins: n = 126883) (see scheme in Additional file 4I). The remaining interactions were bundled using adapted criteria to account for the reduced total number of interactions.
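The count and span filters described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the record layout (`InteractionBin`), the function name and the definition of span size as the distance between the two bin starts are assumptions made for illustration only, and the bundling step performed with bundlelinks is not reproduced.

```python
from typing import NamedTuple


class InteractionBin(NamedTuple):
    """Hypothetical record for one pair of interacting 20 kb genomic bins."""
    start_a: int   # start coordinate of the first bin (bp)
    start_b: int   # start coordinate of the second bin (bp)
    count: float   # normalised interaction count for this bin pair


def filter_long_distance(bins, min_count=15, min_span=25_000_000):
    """Keep bin pairs with at least min_count counts and a span of at least min_span bp.

    Span size is taken here as the distance between the two bin starts (assumption).
    """
    return [
        b for b in bins
        if b.count >= min_count and abs(b.start_b - b.start_a) >= min_span
    ]


example_bins = [
    InteractionBin(1_000_000, 30_000_000, 20.0),  # passes both filters
    InteractionBin(1_000_000, 30_000_000, 9.0),   # removed: fewer than 15 counts
    InteractionBin(1_000_000, 5_000_000, 40.0),   # removed: span below 25 Mb
]
kept = filter_long_distance(example_bins)
```

Calling the function with `min_count=10` or `min_span=10_000_000` would correspond to the alternative thresholds mentioned in the text.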

Besides this filtering of Hi-C data at the level of genomic bins covering SDs, we repeated our filtering and bundling analysis at the level of paired-end reads mapping to SD regions. On the basis of the method of SUNs (Single Unique Nucleotides) discovery [97] we merged all regions covered by SDs, divided them into 30 bp long reads and remapped them to the human reference genome using RazerS 3 [98]. 30mer alignments mapping only once and with a maximum edit distance of 2 bp were considered as unique sequences. This data set was used to filter out ambiguously mapped paired-end reads within the data set of Dixon et al. [36] mapping to these regions. The remaining read pairs were binned into 20 kb genomic windows and the resulting observed interaction counts per bin were re-normalised using the expected contact probability for the unfiltered read pairs as calculated by hicpipe [41]. The re-normalised interaction bins were filtered for long distance interactions (at least 15 interaction counts per bin, spanning more than 25 Mb) and these were bundled applying the criteria described above. Long distance interaction bundles were visualised by means of Circos plots [40].
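The binning of read pairs into 20 kb windows can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually applied: read pairs are represented as pairs of mapping positions on chromosome 7, and the re-normalisation is simplified to an observed/expected ratio with expected values supplied by the caller, whereas the study derived expected contact probabilities with hicpipe.

```python
from collections import Counter

BIN_SIZE = 20_000  # 20 kb genomic windows, as in the text


def bin_read_pairs(read_pairs, bin_size=BIN_SIZE):
    """Count read pairs per unordered pair of genomic windows.

    read_pairs: iterable of (pos_a, pos_b) mapping positions in bp.
    Returns a Counter keyed by (lower_bin_index, higher_bin_index).
    """
    counts = Counter()
    for pos_a, pos_b in read_pairs:
        bin_a = pos_a // bin_size
        bin_b = pos_b // bin_size
        counts[(min(bin_a, bin_b), max(bin_a, bin_b))] += 1
    return counts


def renormalise(observed, expected):
    """Simplified observed/expected ratio per bin pair (assumption, not hicpipe)."""
    return {
        key: obs / expected[key]
        for key, obs in observed.items()
        if expected.get(key, 0) > 0
    }


pairs = [(10_500, 45_000), (19_999, 41_000), (45_000, 10_000), (100, 200)]
observed = bin_read_pairs(pairs)
renormalised = renormalise(observed, {(0, 2): 1.5, (0, 0): 2.0})
```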

                                    Public data sets

Our analysis took advantage of various publicly available data sets (segmental duplications [5, 86], [36, 45, 99–105], GSM935404, GSM970215, GSM469974, GSM469968, GSM521915, GSM521900, GSM469970, GSM521884, GSM521883, GSM521897, GSM469966, GSM521890, see Additional files 10 and 11 for details), which were downloaded from the UCSC Table Browser [106], the annotation database of the UCSC Genome Browser [107], the non-B database [100] and from the website given in Dixon et al. [36].

                                    SD distribution and intrachromosomal interaction patterns

Segmental duplications of all sequence similarities were categorised into those with their paralog mapping exclusively to the same chromosome (intra) and those with their paralog mapping intrachromosomal and genome-wide. Additionally, in line with the colouring scheme used in the UCSC Genome Browser [108], segmental duplications were categorised into those with sequence similarities below 98% (grey), between 98% and 99% (yellow) and above 99% (orange), respectively, and all three categories combined. Enrichment of the above-mentioned SD categories within long distance interaction bundles was tested. For this purpose the base pair overlap of SD-covering regions of chromosome 7 with the bundle intervals of chromosome 7 (data set obtained with the cut-offs: >15 interaction counts/bin, interaction distance > 25 Mb) was determined and compared to 10000 random intervals employing the following strategy. First, BEDTools “mergeBed” [96] was used to combine overlapping intervals within a given SD or bundle data set, respectively. Second, the base pair overlap of SD data sets with long distance interaction bundles was calculated (observed base pair overlap; BEDTools “coverageBed”). As a control, a resampling of the SD categories was performed (10000×; BEDTools “shuffleBed”) with the following conditions for the random intervals: location on the same chromosome and the same interval sizes as the input SD data set, non-overlapping intervals and exclusion of annotation gaps. Subsequently the base pair overlap of each of the 10000 random data sets with the long distance interaction bundles was calculated (expected base pair overlaps). The fold change of the observed base pair overlap was calculated as the ratio of the observed base pair overlap to the mean of the 10000 expected base pair overlaps. 
The number of expected base pair overlaps greater than or equal to the observed base pair overlap was counted for each SD category and used to calculate the p-value as described for Monte Carlo resampling in [109]. The p-value adjustment was performed according to the Benjamini-Hochberg method. Histograms of the expected base pair overlaps for each SD category were drawn using the R package ‘ggplot2’ [110].
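The statistics of this resampling scheme can be summarised in a short Python sketch. It assumes that the observed overlap and the list of expected overlaps have already been computed (in the study with BEDTools). The estimator (r + 1)/(n + 1) is a commonly used form of the Monte Carlo p-value; whether it is identical to the formula in [109] is an assumption. Function names are illustrative.

```python
def fold_change(observed, expected):
    """Ratio of the observed overlap to the mean of the expected overlaps."""
    return observed / (sum(expected) / len(expected))


def empirical_p_value(observed, expected):
    """Monte Carlo p-value (r + 1) / (n + 1), where r is the number of
    resampled overlaps greater than or equal to the observed overlap."""
    r = sum(1 for value in expected if value >= observed)
    return (r + 1) / (len(expected) + 1)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value to enforce monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


toy_expected = [1, 2, 3, 10]          # toy values; the study used 10000 resamplings
toy_fold = fold_change(10, toy_expected)
toy_p = empirical_p_value(10, toy_expected)
toy_adjusted = benjamini_hochberg([0.01, 0.04, 0.03])
```

With 10000 resamplings, the smallest p-value attainable under this estimator is 1/10001.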

                                    In addition, SD enrichment within interaction bundles (data set obtained with the cut-offs: >15 interaction counts/bin, interaction distance > 25 Mb) was determined for all chromosomes using SDs with paralogs exclusively mapping to the same chromosome, or intrachromosomal and genome-wide.

                                    Finally, SD enrichment within regions where bins are part of all bundle data sets (obtained by intersection of all twelve data sets resulting from different filter criteria, see Additional file 3) was calculated using SDs with paralogs mapping intrachromosomal and genome-wide.

                                    Fine-mapping of evolutionary breakpoints and mimicking interaction patterns in orang-utan and gorilla

Alignments were retrieved from the Ensembl database (version 67) using the Perl API [43]. As the paracentric inversion is not represented in the current version of the gorilla genome (Gorilla gorilla gorilla; gorGor3.1; May 2011), the proximal and distal breakpoints of both inversions were determined by plotting the orang-utan genome (Pongo abelii; WUGSC2.0.2/ponAbe2; July 2007) versus the human genome (GRCh37/hg19; February 2009). A corresponding dot plot, which uses the UCSC colouring scheme for the chromosome numbers, is shown in Additional file 6. Segmental duplications were superimposed onto the dot plot following the colouring scheme introduced above (Additional file 6). The fine-mapped coordinates of the paracentric and pericentric inversion of chromosome 7 derived from this analysis (para: chr7:76646908 and chr7:102118853, peri: chr7:6875820 and chr7:80857936; hg18) were used to recalculate the genomic coordinates of long distance interactions and SDs in order to mimic the situation in gorilla and orang-utan. The three segments surrounding the evolutionary breakpoints, and the positional changes of SDs and long distance interactions after in silico reversion, were visualised by means of Circos plots [40].
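The recalculation of coordinates after in silico reversion of a single inversion can be sketched as a reflection of positions within the inverted interval. The following Python sketch uses the paracentric breakpoints quoted above; the function names are hypothetical, intervals are assumed to lie entirely inside or entirely outside the inversion, and the combined handling of the two overlapping inversions is not modelled here.

```python
# Paracentric inversion breakpoints on chromosome 7 (hg18), as quoted in the text.
PARA_START, PARA_END = 76_646_908, 102_118_853


def revert_position(pos, inv_start, inv_end):
    """Reflect a position within [inv_start, inv_end]; leave other positions unchanged."""
    if inv_start <= pos <= inv_end:
        return inv_start + inv_end - pos
    return pos


def revert_interval(start, end, inv_start, inv_end):
    """Revert an interval and return it with start <= end.

    Assumes the interval does not span a breakpoint (simplification).
    """
    new_a = revert_position(start, inv_start, inv_end)
    new_b = revert_position(end, inv_start, inv_end)
    return min(new_a, new_b), max(new_a, new_b)


example = revert_interval(80_000_000, 80_100_000, PARA_START, PARA_END)
```

Applying the same reflection twice restores the original coordinates, which provides a simple consistency check.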

                                    Synteny of human chromosome 7 and enrichment analysis for SDs, Alu repeats and G4 motifs

Syntenic regions of human chromosome 7 and marmoset (Callithrix jacchus) were obtained from the Ensembl database (version 67) [43] and converted to hg18 coordinates using the default settings of the LiftOver tool [108]. We divided chromosome 7 into 200 kb bins (n = 795), of which 125 comprise sequences homologous to marmoset chromosome 2. The minimum hypergeometric score and its exact p-value were calculated as described by Eden et al. [44]. In brief, we shuffled the natural order of genomic bins in order to minimise the influence of the genomic order of bins with identical values. Then we ranked all bins in ascending order according to their counts for the respective feature (Alu, SD, G4). The enrichment of marmoset chromosome 2 sequences within the highest scoring bins was quantified by means of the hypergeometric score and the p-value was calculated for the minimum hypergeometric score (mHG). The distribution of SDs, long distance interactions, G4 DNA motifs, Alu repeats and syntenic regions of human chromosome 7 and marmoset was visualised in the UCSC Genome Browser [108] (upper part in Figure 2D) and combined with further information on synteny derived from the Ensembl Genome Browser (lower part in Figure 2D).
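The minimum hypergeometric score can be illustrated with a brief Python sketch. It takes a list of 0/1 labels ordered from the top of the ranking (highest scoring bin first), where 1 marks a bin homologous to marmoset chromosome 2, and returns the smallest hypergeometric tail probability over all cut-offs. The function names are illustrative, and the exact p-value correction described by Eden et al. [44] is not implemented, so the returned score must not be read as a p-value.

```python
from math import comb


def hypergeom_tail(population, successes, draws, hits):
    """P(X >= hits) for a hypergeometric variable X with the given parameters."""
    total = comb(population, draws)
    upper = min(draws, successes)
    favourable = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(hits, upper + 1)
    )
    return favourable / total


def minimum_hypergeometric_score(labels):
    """Smallest tail probability over all prefixes of the ranked 0/1 label list."""
    population = len(labels)
    successes = sum(labels)
    best = 1.0
    hits = 0
    for draws, label in enumerate(labels, start=1):
        hits += label
        best = min(best, hypergeom_tail(population, successes, draws, hits))
    return best


enriched = minimum_hypergeometric_score([1, 1, 0, 0])   # labelled bins at the top
depleted = minimum_hypergeometric_score([0, 0, 1, 1])   # labelled bins at the bottom
```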

                                    Chromatin immunoprecipitation

Human fetal lung fibroblast cell lines IMR91L (male) and IMR90 (female) were obtained from the Coriell Institute for Medical Research. Both cell lines were cultured in Eagle’s minimum essential medium (EMEM) supplemented with 10% fetal bovine serum (Sigma-Aldrich, Saint Louis, USA), 2 mM UltraGlutamine 1 (Lonza, Walkerville, USA), 1 mM sodium pyruvate and 100 units/mL penicillin/streptomycin. The fibroblasts were maintained at 37°C in a humidified atmosphere of 5% CO2 and ambient oxygen. Chromatin immunoprecipitation was done according to the Transcription Factor ChIP kit protocol (Diagenode, Liège, Belgium). In brief, lysed cells were sonicated using the Bioruptor UCD-200 device (Diagenode, Liège, Belgium), followed by overnight incubation of 1 × 10^6 cells with 5 μg of antibody against histone H4 lysine 8 acetylation (pAb-103-050; Diagenode, Liège, Belgium). The subsequent chromatin reverse crosslinking, elution and purification of ChIP DNA and input DNA were done employing the IPure Kit (Diagenode, Liège, Belgium).

                                    Analysis of DNA degradation during early phases of apoptosis

Apoptosis of IMR90 and IMR91L cells was induced by exposing 2 × 10^6 cells to either 1 μmol/L staurosporine (Cell Signaling Technology, Inc., Danvers, USA)/0.1% DMSO or 0.1% DMSO alone (as control) for four hours at 37°C. An aliquot of about 5–10 × 10^6 cells/mL was co-stained with Annexin V-APC (BD Biosciences, San Jose, USA) and 7-Aminoactinomycin D (7-AAD, BD Biosciences, San Jose, USA) for 15 minutes to monitor the progress of apoptosis by FACS analysis.

The remaining cells were treated with lysis buffer (0.40 M Tris–HCl pH 8.0, 0.06 M Na-EDTA, 0.15 M NaCl, 1% SDS) and RNA was digested for 1 hour at 37°C using 15 μg/mL RNase A. Sodium perchlorate (1 M) and one volume of chloroform were added to deproteinise cell lysates. DNA fragmentation was checked using the Genomic DNA ScreenTape on an Agilent 2200 TapeStation (Agilent, Santa Clara, USA) (see Additional file 9).

High molecular weight DNA (>48 kb) and degraded apoptotic DNA (~4 kb) were extracted by cutting slices out of a preparative 1% low melt agarose gel and subsequent digestion with β-Agarase I according to the manufacturer’s protocol (New England Biolabs, Ipswich, USA).

                                    Microarray hybridisation

Purified DNA from ChIP and apoptotic DNA degradation experiments was amplified by means of the GenomePlex Whole Genome Amplification Kit (Sigma, Saint Louis, USA). Regional preferences in apoptotic DNA degradation and H4K8 acetylation were determined by co-hybridising high molecular weight (>48 kb) and degraded apoptotic DNA (~4 kb), and ChIP DNA and input DNA, onto a 400 k whole genome oligonucleotide array (GPL9777) and a region-specific custom oligonucleotide array covering the interval chr7:69936560–70795513 (hg19) with an average oligo spacing of 198 bp (GPL17964), respectively, following the protocols for array CGH provided by the manufacturer (Agilent, Santa Clara, USA). Image analysis, normalisation and annotation were done with Feature Extraction 10.5.1.1 (Agilent, Santa Clara, USA) using the default settings. Data visualisation and further analysis were performed with GenomeCAT (Tebel et al., manuscript in preparation; http://www.molgen.mpg.de/204904/GenomeCAT) and the Human Epigenome Browser [111, 112].

                                    RNA expression profiling

Expression profiling was performed by next-generation sequencing on a SOLiD 5500xl Genetic Analyzer (Life Technologies, Carlsbad, USA). Total RNA was extracted from IMR91L cell cultures using TRIzol (Life Technologies, Carlsbad, USA). 10 μg of each total RNA sample was spiked with ERCC spike-in control mixes (Life Technologies, Carlsbad, USA) prior to removal of the rRNA by use of the RiboMinus Kit (Life Technologies, Carlsbad, USA). The RNA was then prepared for sequencing using the protocol and components provided with. In brief, the rRNA-depleted RNA was fragmented by chemical hydrolysis, phosphorylated and purified. Adaptors were then hybridised and ligated to the RNA fragments, which were reverse transcribed into cDNA. The cDNA was then purified and size-selected using two rounds of Agencourt AMPure XP bead purification (Beckman Coulter Genomics, Danvers, USA) and released from the beads. The sample was then amplified by 12 PCR cycles in a T3 Thermocycler (Biometra, Göttingen, Germany) in the presence of primers that contained unique sequences (barcoding) in order to determine the origin of the sequence after pooling of the fragments and sequencing. The size distribution and concentration of the fragments were determined with an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, USA) and quantitative PCR using a LightCycler 480 Real-Time PCR System (Roche Applied Science, Penzberg, Germany) and the KAPA Library Quant ABI SOLiD kit (Peqlab Biotechnologie GmbH, Erlangen, Germany).

                                    The cDNA fragments were then pooled in equimolar amounts and diluted to 61 pg/μL corresponding to a concentration of 500 pM. 50 μL of this dilution was mixed with a freshly prepared oil emulsion, P1 and P2 reagents and P1 beads in a SOLiD EZ Bead Emulsifier prepared according to the E80 scale protocol (Life Technologies, Carlsbad, USA). The emulsion PCR was carried out in a SOLiD EZ Bead Amplifier (Life Technologies, Carlsbad, USA) using the E80sm setting. To enrich for the beads that carried amplified template DNA, the beads were purified on a SOLiD EZ Bead Enricher using the recommended chemistry and software (Life Technologies, Carlsbad, USA).

                                    The purified beads were then loaded onto a SOLiD 6-lane Flowchip and incubated upside down for 1 hour at 37°C. The Flowchip was then positioned in the 5500xl SOLiD System and the DNA was sequenced using 50 nucleotides in the forward direction and 35 nucleotides in the reverse direction and the recommended chemistry (Life Technologies, Carlsbad, USA).

Sequence reads mapping to RefSeq coding exons and matching the coding strand were counted towards coding RNAs; all other mapping reads were counted towards non-coding RNAs.

                                    Genomic characterisation of the Williams-Beuren region

Our own experimental results and public data (Additional files 10 and 11) were combined in the Human Epigenome Browser hosted by Washington University [111, 112]. Regional characteristics of lamin B1 interaction sites [45], replication timing [101, 102] and apoptotic DNA degradation (log2 ratio) were compared for 20 kb bins using Spearman's rank correlation test implemented in R [113].
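For readers who wish to reproduce the comparison outside R, Spearman's rank correlation coefficient can be sketched in plain Python as the Pearson correlation of average ranks. This is an illustrative re-implementation, not the R routine used in the study; it returns only the coefficient and no p-value, and the function names are hypothetical.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the average ranks."""
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    n = len(rank_x)
    mean_x, mean_y = sum(rank_x) / n, sum(rank_y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(rank_x, rank_y))
    var_x = sum((a - mean_x) ** 2 for a in rank_x)
    var_y = sum((b - mean_y) ** 2 for b in rank_y)
    return covariance / (var_x * var_y) ** 0.5


rho_up = spearman_rho([1, 2, 3, 4], [10, 20, 30, 40])
rho_down = spearman_rho([1, 2, 3, 4], [4, 3, 2, 1])
```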

For calculation of gene density and intron size of genes on chromosome 7 within the 7q11 segment or the immediate neighbourhood, genomic coordinates of known canonical genes and their introns were downloaded from the UCSC Table Browser. The number of genes and the intron length within each region were determined by means of “BEDTools/intersectBed” [96]. Gene density for each region was calculated as the number of genes per megabase. Statistical significance was estimated using 100000 random simulations or Fisher’s exact test.

                                    Calculation of average span sizes of intrachromosomal interactions of chromosome 7

All intrachromosomal interaction bins of chromosome 7 indicated by at least one normalised interaction count between two genomic bins according to Dixon et al. [36] were categorised into six classes based on their span size: i) <500 kb, ii) 500 kb to less than 1 Mb, iii) 1 Mb to less than 5 Mb, iv) 5 Mb to less than 10 Mb, v) 10 Mb to less than 25 Mb and vi) span sizes equal to or greater than 25 Mb.

For each bin and span size category we summed up the scores separately. The relative contribution of each category to the total score of interaction counts/bin was calculated by dividing the category score by the total score of each bin. For the purpose of comparability within Figure 3, genomic coordinates were converted to hg19 using the default settings of the LiftOver tool [108].
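The classification into span size categories and the calculation of relative contributions per bin can be sketched in Python as follows. The input layout (bin start, bin start, normalised count), the category labels and the decision to credit each interaction to both of its bins are assumptions made for illustration.

```python
from bisect import bisect_right
from collections import defaultdict

# Upper bounds (exclusive) of the first five span size classes, in bp.
UPPER_BOUNDS = [500_000, 1_000_000, 5_000_000, 10_000_000, 25_000_000]
LABELS = ["<500 kb", "500 kb-<1 Mb", "1-<5 Mb", "5-<10 Mb", "10-<25 Mb", ">=25 Mb"]


def span_category(span):
    """Return the label of the span size class that contains the given span (bp)."""
    return LABELS[bisect_right(UPPER_BOUNDS, span)]


def relative_contributions(interactions):
    """Fraction of each bin's total interaction score per span size class.

    interactions: iterable of (bin_a_start, bin_b_start, count).
    Each interaction is credited to both of its bins (assumption).
    """
    per_bin = defaultdict(lambda: defaultdict(float))
    for bin_a, bin_b, count in interactions:
        label = span_category(abs(bin_b - bin_a))
        per_bin[bin_a][label] += count
        per_bin[bin_b][label] += count
    result = {}
    for bin_start, scores in per_bin.items():
        total = sum(scores.values())
        result[bin_start] = {label: score / total for label, score in scores.items()}
    return result


toy = relative_contributions([(0, 100_000, 3.0), (0, 30_000_000, 1.0)])
```

In this toy example, three quarters of the score of the bin starting at position 0 stems from interactions spanning less than 500 kb and one quarter from interactions spanning at least 25 Mb.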

                                    Topological domains in mice

                                    Coordinates of mouse (mm9) topological domains were obtained from [36] and converted to hg19 using the default settings of the LiftOver tool [108]. Both the original and the converted mouse domains were visualised within the Human Epigenome Browser [112] in the mm9 and hg19 assembly, respectively. Orthologous genes located at the murine domain borders were plotted at the corresponding location in the human genome employing the Multi-Genome Synteny Viewer (mGSV) [114].

                                    Availability of supporting data

Microarray data generated in this study have been submitted to NCBI GEO (http://www.ncbi.nlm.nih.gov/geo/) under accession number GSE41356.

RNA sequencing data have been submitted to the Sequence Read Archive (SRA) (http://www.ncbi.nlm.nih.gov/Traces/sra/) under accession number SRS366467.

                                    Abbreviations

IA: Interaction

NAHR: Non-allelic homologous recombination

SD: Segmental duplication

WBS: Williams-Beuren syndrome

                                    Declarations

                                    Acknowledgments

This study was supported by the Deutsche Forschungsgemeinschaft (UL 342/2-2) and the Wilhelm Sander-Stiftung (2011.066.1). We thank Udo Georgi for helpful comments and Markus Gassmann (Agilent Technologies) for assistance with the Agilent 2200 TapeStation.

                                    Authors’ Affiliations

(1) Max Planck Institute for Molecular Genetics

(2) Department of Biology, Chemistry and Pharmacy, Free University Berlin

(3) Department of Human Genetics, University Medicine Greifswald, and Interfaculty Institute of Genetics and Functional Genomics, University of Greifswald

(4) Wilhelm Johannsen Centre for Functional Genome Research, Department of Cellular and Molecular Medicine, University of Copenhagen

(5) Institute for Theoretical Chemistry, University of Vienna

(6) Unit Experimental Research, Department of Product Safety, Federal Institute for Bundeswehr Institute of Radiobiology affiliated, the University of Ulm

                                    References

1. Jiang Z, Tang H, Ventura M, Cardone MF, Marques-Bonet T, She X, Pevzner PA, Eichler EE: Ancestral reconstruction of segmental duplications reveals punctuated cores of human genome evolution. Nat Genet 2007, 39:1361–1368.
2. Marques-Bonet T, Kidd JM, Ventura M, Graves TA, Cheng Z, Hillier LW, Jiang Z, Baker C, Malfavon-Borja R, Fulton LA, Alkan C, Aksay G, Girirajan S, Siswara P, Chen L, Cardone MF, Navarro A, Mardis ER, Wilson RK, Eichler EE: A burst of segmental duplications in the genome of the African great ape ancestor. Nature 2009, 457:877–881.
3. Stankiewicz P, Shaw CJ, Withers M, Inoue K, Lupski JR: Serial segmental duplications during primate evolution result in complex human genome architecture. Genome Res 2004, 14:2209–2220.
4. She X, Liu G, Ventura M, Zhao S, Misceo D, Roberto R, Cardone MF, Rocchi M, Program NCS, Green ED, Archidiacano N, Eichler EE: A preliminary comparative analysis of primate segmental duplications shows elevated substitution rates and a great-ape expansion of intrachromosomal duplications. Genome Res 2006, 16:576–583.
5. Bailey JA, Gu ZP, Clark RA, Reinert K, Samonte RV, Schwartz S, Adams MD, Myers EW, Li PW, Eichler EE: Recent segmental duplications in the human genome. Science 2002, 297:1003–1007.
6. Bailey JA, Yavor AM, Viggiano L, Misceo D, Horvath JE, Archidiacono N, Schwartz S, Rocchi M, Eichler EE: Human-specific duplication and mosaic transcripts: The recent paralogous structure of chromosome 22. Am J Hum Genet 2002, 70:83–100.
7. Hillier LW, Fulton RS, Fulton LA, Graves TA, Pepin KH, Wagner-McPherson C, Layman D, Maas J, Jaeger S, Walker R, Wylie K, Sekhon M, Becker MC, O'Laughlin MD, Schaller ME, Fewell GA, Delehaunty KD, Miner TL, Nash WE, Cordes M, Du H, Sun H, Edwards J, Bradshaw-Cordum H, Ali J, Andrews S, Isak A, Vanbrunt A, Nguyen C, Du F, et al.: The DNA sequence of human chromosome 7. Nature 2003, 424:157–164.
8. Lorente-Galdos B, Bleyhl J, Santpere G, Vives L, Ramirez O, Hernandez J, Anglada R, Cooper GM, Navarro A, Eichler EE, Marques-Bonet T: Accelerated exon evolution within primate segmental duplications. Genome Biol 2013, 14:R9.
9. Pegueroles C, Laurie S, Alba MM: Accelerated evolution after gene duplication: a time-dependent process affecting just one copy. Mol Biol Evol 2013, 30:1830–1842.
10. Zhang JZ: Evolution by gene duplication: an update. Trends Ecol Evol 2003, 18:292–298.
11. Bekpen C, Tastekin I, Siswara P, Akdis CA, Eichler EE: Primate segmental duplication creates novel promoters for the LRRC37 gene family within the 17q21.31 inversion polymorphism region. Genome Res 2012, 22:1050–1058.
12. Giannuzzi G, Siswara P, Malig M, Marques-Bonet T, Mullikin JC, Ventura M, Eichler EE, Progra NCS: Evolutionary dynamism of the primate LRRC37 gene family. Genome Res 2013, 23:46–59.
13. He XL, Zhang JZ: Rapid subfunctionalization accompanied by prolonged and substantial neofunctionalization in duplicate gene evolution. Genetics 2005, 169:1157–1164.
14. Ohno S: Evolution by gene duplication. Berlin, New York: Springer-Verlag; 1970.
15. Newman T, Trask BJ: Complex evolution of 7E olfactory receptor genes in segmental duplications. Genome Res 2003, 13:781–793.
16. Malnic B, Godfrey PA, Buck LB: The human olfactory receptor gene family. Proc Natl Acad Sci U S A 2004, 101:2584–2589.
17. Kalyana-Sundaram S, Kumar-Sinha C, Shankar S, Robinson DR, Wu YM, Cao X, Asangani IA, Kothari V, Prensner JR, Lonigro RJ, Iyer MK, Barrette T, Shanmugam A, Dhanasekaran SM, Palanisamy N, Chinnaiyan AM: Expressed pseudogenes in the transcriptional landscape of human cancers. Cell 2012, 149:1622–1634.
18. Bailey JA, Eichler EE: Primate segmental duplications: crucibles of evolution, diversity and disease. Nat Rev Genet 2006, 7:898–898.
19. Eichler EE, Archidiacono N, Rocchi M: CAGGG repeats and the pericentromeric duplication of the hominoid genome. Genome Res 1999, 9:1048–1058.
20. Giannuzzi G, Pazienza M, Huddleston J, Antonacci F, Malig M, Vives L, Eichler EE, Ventura M: Hominoid fission of chromosome 14/15 and the role of segmental duplications. Genome Res 2013, 23:1763–1773.
21. Samonte RV, Eichler EE: Segmental duplications and the evolution of the primate genome. Nat Rev Genet 2002, 3:65–72.
22. Scherer SW, Cheung J, MacDonald JR, Osborne LR, Nakabayashi K, Herbrick JA, Carson AR, Parker-Katiraee L, Skaug J, Khaja R, Zhang J, Hudek AK, Li M, Haddad M, Duggan GE, Fernandez BA, Kanematsu E, Gentles S, Christopoulos CC, Choufani S, Kwasnicka D, Zheng XH, Lai Z, Nusskern D, Zhang Q, Gu Z, Lu F, Zeesman S, Nowaczyk MJ, Teshima I, et al.: Human chromosome 7: DNA sequence and biology. Science 2003, 300:767–772.
23. Armengol L, Pujana MA, Cheung J, Scherer SW, Estivill X: Enrichment of segmental duplications in regions of breaks of synteny between the human and mouse genomes suggest their involvement in evolutionary rearrangements. Hum Mol Genet 2003, 12:2201–2208.
24. Zhao H, Bourque G: Recovering genome rearrangements in the mammalian phylogeny. Genome Res 2009, 19:934–942.
                                    25. Kehrer-Sawatzki H, Cooper DN: Structural divergence between the human and chimpanzee genomes. Hum Genet 2007, 120:759–778.PubMedView Article
                                    26. Armengol L, Marques-Bonet T, Cheung J, Khaja R, Gonzalez JR, Scherer SW, Navarro A, Estivill X: Murine segmental duplications are hot spots for chromosome and gene evolution. Genomics 2005, 86:692–700.PubMedView Article
                                    27. Bailey JA, Baertsch R, Kent WJ, Haussler D, Eichler EE: Hotspots of mammalian chromosomal evolution. Genome Biol 2004, 5:R23.PubMed CentralPubMedView Article
                                    28. Scally A, Dutheil JY, Hillier LW, Jordan GE, Goodhead I, Herrero J, Hobolth A, Lappalainen T, Mailund T, Marques-Bonet T, McCarthy S, Montgomery SH, Schwalie PC, Tang YA, Ward MC, Xue Y, Yngvadottir B, Alkan C, Andersen LN, Ayub Q, Ball EV, Beal K, Bradley BJ, Chen Y, Clee CM, Fitzgerald S, Graves TA, Gu Y, Heath P, Heger A, et al.: Insights into hominid evolution from the gorilla genome sequence. Nature 2012, 483:169–175.PubMed CentralPubMedView Article
                                    29. Eichler EE: Recent duplication, domain accretion and the dynamic mutation of the human genome. Trends Genet 2001, 17:661–669.PubMedView Article
                                    30. Zhang L, Lu HH, Chung WY, Yang J, Li WH: Patterns of segmental duplication in the human genome. Mol Biol Evol 2005, 22:135–141.PubMedView Article
                                    31. Cheng Z, Ventura M, She X, Khaitovich P, Graves T, Osoegawa K, Church D, DeJong P, Wilson RK, Paabo S, Rocchi M, Eichler EE: A genome-wide comparison of recent chimpanzee and human segmental duplications. Nature 2005, 437:88–93.PubMedView Article
                                    32. Lupski JR: Genomic disorders: structural features of the genome can lead to DNA rearrangements and human disease traits. Trends Genet 1998, 14:417–422.PubMedView Article
                                    33. Marques-Bonet T, Girirajan S, Eichler EE: The origins and impact of primate segmental duplications. Trends Genet 2009, 25:443–454.PubMed CentralPubMedView Article
                                    34. Kim PM, Lam HY, Urban AE, Korbel JO, Affourtit J, Grubert F, Chen X, Weissman S, Snyder M, Gerstein MB: Analysis of copy number variants and segmental duplications in the human genome: Evidence for a change in the process of formation in recent evolutionary history. Genome Res 2008, 18:1865–1874.PubMed CentralPubMedView Article
                                    35. Bailey JA, Liu G, Eichler EE: An Alu transposition model for the origin and expansion of human segmental duplications. Am J Hum Genet 2003, 73:823–834.PubMed CentralPubMedView Article
                                    36. Dixon JR, Selvaraj S, Yue F, Kim A, Li Y, Shen Y, Hu M, Liu JS, Ren B: Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature 2012, 485:376–380.PubMed CentralPubMedView Article
                                    37. Cheung J, Estivill X, Khaja R, MacDonald JR, Lau K, Tsui LC, Scherer SW: Genome-wide detection of segmental duplications and potential assembly errors in the human genome sequence. Genome Biol 2003, 4:R25.PubMed CentralPubMedView Article
                                    38. Dekker J, Rippe K, Dekker M, Kleckner N: Capturing chromosome conformation. Science 2002, 295:1306–1311.PubMedView Article
                                    39. Lieberman-Aiden E, van Berkum NL, Williams L, Imakaev M, Ragoczy T, Telling A, Amit I, Lajoie BR, Sabo PJ, Dorschner MO, Sandstrom R, Bernstein B, Bender MA, Groudine M, Gnirke A, Stamatoyannopoulos J, Mirny LA, Lander ES, Dekker J: Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 2009, 326:289–293.
                                    40. Krzywinski M, Schein J, Birol I, Connors J, Gascoyne R, Horsman D, Jones SJ, Marra MA: Circos: An information aesthetic for comparative genomics. Genome Res 2009, 19:1639–1645.
                                    41. Yaffe E, Tanay A: Probabilistic modeling of Hi-C contact maps eliminates systematic biases to characterize global chromosomal architecture. Nat Genet 2011, 43:1059–1065.
                                    42. Capozzi O, Carbone L, Stanyon RR, Marra A, Yang F, Whelan CW, de Jong PJ, Rocchi M, Archidiacono N: A comprehensive molecular cytogenetic analysis of chromosome rearrangements in gibbons. Genome Res 2012, 22:2520–2528.
                                    43. Flicek P, Amode MR, Barrell D, Beal K, Brent S, Carvalho-Silva D, Clapham P, Coates G, Fairley S, Fitzgerald S, Gil L, Gordon L, Hendrix M, Hourlier T, Johnson N, Kahari AK, Keefe D, Keenan S, Kinsella R, Komorowska M, Koscielny G, Kulesha E, Larsson P, Longden I, McLaren W, Muffato M, Overduin B, Pignatelli M, Pritchard B, Riat HS, et al.: Ensembl 2012. Nucleic Acids Res 2012, 40:D84-D90.
                                    44. Eden E, Lipson D, Yogev S, Yakhini Z: Discovering motifs in ranked lists of DNA sequences. PLoS Comput Biol 2007, 3:e39.
                                    45. Guelen L, Pagie L, Brasset E, Meuleman W, Faza MB, Talhout W, Eussen BH, de Klein A, Wessels L, de Laat W, van Steensel B: Domain organization of human chromosomes revealed by mapping of nuclear lamina interactions. Nature 2008, 453:948–951.
                                    46. Kupper K, Kolbl A, Biener D, Dittrich S, von Hase J, Thormeyer T, Fiegler H, Carter NP, Speicher MR, Cremer T, Cremer M: Radial chromatin positioning is shaped by local gene density, not by gene expression. Chromosoma 2007, 116:285–306.
                                    47. Müller S, Finelli P, Neusser M, Wienberg J: The evolutionary history of human chromosome 7. Genomics 2004, 84:458–467.
                                    48. Yunis JJ, Prakash O: The origin of man: a chromosomal pictorial legacy. Science 1982, 215:1525–1530.
                                    49. Neusser M, Schubel V, Koch A, Cremer T, Muller S: Evolutionarily conserved, cell type and species-specific higher order chromatin arrangements in interphase nuclei of primates. Chromosoma 2007, 116:307–320.
                                    50. Véron AS, Lemaitre C, Gautier C, Lacroix V, Sagot MF: Close 3D proximity of evolutionary breakpoints argues for the notion of spatial synteny. BMC Genomics 2011, 12:303.
                                    51. Bolzer A, Kreth G, Solovei I, Koehler D, Saracoglu K, Fauth C, Muller S, Eils R, Cremer C, Speicher MR, Cremer T: Three-dimensional maps of all chromosomes in human male fibroblast nuclei and prometaphase rosettes. PLoS Biol 2005, 3:826–842.
                                    52. Cremer T, Cremer M: Chromosome territories. Cold Spring Harb Perspect Biol 2010, 2:a003889.
                                    53. Misteli T: Beyond the sequence: cellular organization of genome function. Cell 2007, 128:787–800.
                                    54. Bickmore WA, van Steensel B: Genome architecture: domain organization of interphase chromosomes. Cell 2013, 152:1270–1284.
                                    55. Cook PR: A model for all genomes: the role of transcription factories. J Mol Biol 2010, 395:1–10.
                                    56. Maizels N, Gray LT: The G4 genome. PLoS Genet 2013, 9:e1003468.
                                    57. Bochman ML, Paeschke K, Zakian VA: DNA secondary structures: stability and function of G-quadruplex structures. Nat Rev Genet 2012, 13:770–780.
                                    58. Horvath JE, Bailey JA, Locke DP, Eichler EE: Lessons from the human genome: transitions between euchromatin and heterochromatin. Hum Mol Genet 2001, 10:2215–2223.
                                    59. Sen D, Gilbert W: Formation of parallel four-stranded complexes by guanine-rich motifs in DNA and its implications for meiosis. Nature 1988, 334:364–366.
                                    60. Batzer MA, Deininger PL: Alu repeats and human genomic diversity. Nat Rev Genet 2002, 3:370–379.
                                    61. Boissinot S, Davis J, Entezam A, Petrov D, Furano AV: Fitness cost of LINE-1 (L1) activity in humans. Proc Natl Acad Sci U S A 2006, 103:9590–9594.
                                    62. Hackenberg M, Bernaola-Galvan P, Carpena P, Oliver JL: The biased distribution of Alus in human isochores might be driven by recombination. J Mol Evol 2005, 60:365–377.
                                    63. Jurka J: Evolutionary impact of human Alu repetitive elements. Curr Opin Genet Dev 2004, 14:603–608.
                                    64. Klimopoulos A, Sellis D, Almirantis Y: Widespread occurrence of power-law distributions in inter-repeat distances shaped by genome dynamics. Gene 2012, 499:88–98.
                                    65. Tang S-J: Chromatin Organization by Repetitive Elements (CORE): A Genomic Principle for the Higher-Order Structure of Chromosomes. Genes 2011, 2:502–515.
                                    66. Haaf T, Steinlein K, Schmid M: Preferential somatic pairing between homologous heterochromatic regions of human chromosomes. Am J Hum Genet 1986, 38:319–329.
                                    67. Schneider R, Grosschedl R: Dynamics and interplay of nuclear architecture, genome organization, and gene expression. Genes Dev 2007, 21:3027–3043.
                                    68. Barzel A, Kupiec M: Finding a match: how do homologous sequences get together for recombination? Nat Rev Genet 2008, 9:27–37.
                                    69. Linardopoulou EV, Williams EM, Fan Y, Friedman C, Young JM, Trask BJ: Human subtelomeres are hot spots of interchromosomal recombination and segmental duplication. Nature 2005, 437:94–100.
                                    70. Chuang TC, Moshir S, Garini Y, Chuang AY, Young IT, Vermolen B, van den Doel R, Mougey V, Perrin M, Braun M, Kerr PD, Fest T, Boukamp P, Mai S: The three-dimensional organization of telomeres in the nucleus of mammalian cells. BMC Biol 2004, 2:12.
                                    71. Nagele RG, Velasco AQ, Anderson WJ, McMahon DJ, Thomson Z, Fazekas J, Wind K, Lee H: Telomere associations in interphase nuclei: possible role in maintenance of interphase chromosome topology. J Cell Sci 2001, 114:377–388.
                                    72. Stout K, van der Maarel S, Frants RR, Padberg GW, Ropers HH, Haaf T: Somatic pairing between subtelomeric chromosome regions: implications for human genetic disease? Chromosome Res 1999, 7:323–329.
                                    73. Louis SF, Vermolen BJ, Garini Y, Young IT, Guffei A, Lichtensztejn Z, Kuttler F, Chuang TC, Moshir S, Mougey V, Chuang AY, Kerr PD, Fest T, Boukamp P, Mai S: c-Myc induces chromosomal rearrangements through telomere and chromosome remodeling in the interphase nucleus. Proc Natl Acad Sci U S A 2005, 102:9613–9618.
                                    74. Weierich C, Brero A, Stein S, von Hase J, Cremer C, Cremer T, Solovei I: Three-dimensional arrangements of centromeres and telomeres in nuclei of human and murine lymphocytes. Chromosome Res 2003, 11:485–502.
                                    75. Dumont BL, Eichler EE: Signals of historical interlocus gene conversion in human segmental duplications. PLoS One 2013, 8:e75949.
                                    76. Darai-Ramqvist E, Sandlund A, Muller S, Klein G, Imreh S, Kost-Alimova M: Segmental duplications and evolutionary plasticity at tumor chromosome break-prone regions. Genome Res 2008, 18:370–379.
                                    77. Grunau C, Buard J, Brun ME, De Sario A: Mapping of the juxtacentromeric heterochromatin-euchromatin frontier of human chromosome 21. Genome Res 2006, 16:1198–1207.
                                    78. Kirsch S, Munch C, Jiang Z, Cheng Z, Chen L, Batz C, Eichler EE, Schempp W: Evolutionary dynamics of segmental duplications from human Y-chromosomal euchromatin/heterochromatin transition regions. Genome Res 2008, 18:1030–1042.
                                    79. Somerville MJ, Mervis CB, Young EJ, Seo EJ, del Campo M, Bamforth S, Peregrine E, Loo W, Lilley M, Perez-Jurado LA, Morris CA, Scherer SW, Osborne LR: Severe expressive-language delay related to duplication of the Williams-Beuren locus. N Engl J Med 2005, 353:1694–1701.
                                    80. Osborne LR, Li M, Pober B, Chitayat D, Bodurtha J, Mandel A, Costa T, Grebe T, Cox S, Tsui LC, Scherer SW: A 1.5 million-base pair inversion polymorphism in families with Williams-Beuren syndrome. Nat Genet 2001, 29:321–325.
                                    81. Ramocki MB, Bartnik M, Szafranski P, Kolodziejska KE, Xia Z, Bravo J, Miller GS, Rodriguez DL, Williams CA, Bader PI, Szczepanik E, Mazurczak T, Antczak-Marach D, Coldwell JG, Akman CI, McAlmon K, Cohen MP, McGrath J, Roeder E, Mueller J, Kang SH, Bacino CA, Patel A, Bocian E, Shaw CA, Cheung SW, Mazurczak T, Stankiewicz P: Recurrent distal 7q11.23 deletion including HIP1 and YWHAG identified in patients with intellectual disabilities, epilepsy, and neurobehavioral problems. Am J Hum Genet 2010, 87:857–865.
                                    82. Versteeg R, van Schaik BD, van Batenburg MF, Roos M, Monajemi R, Caron H, Bussemaker HJ, van Kampen AH: The human transcriptome map reveals extremes in gene density, intron length, GC content, and repeat pattern for domains of highly and weakly expressed genes. Genome Res 2003, 13:1998–2004.
                                    83. Mateos-Langerak J, Bohn M, de Leeuw W, Giromus O, Manders EM, Verschure PJ, Indemans MH, Gierman HJ, Heermann DW, van Driel R, Goetze S: Spatially confined folding of chromatin in the interphase nucleus. Proc Natl Acad Sci U S A 2009, 106:3812–3817.
                                    84. Naughton C, Avlonitis N, Corless S, Prendergast JG, Mati IK, Eijk PP, Cockroft SL, Bradley M, Ylstra B, Gilbert N: Transcription forms and remodels supercoiling domains unfolding large-scale chromatin structures. Nat Struct Mol Biol 2013, 20:387–395.
                                    85. Du Z, Zhao Y, Li N: Genome-wide analysis reveals regulatory role of G4 DNA in gene transcription. Genome Res 2008, 18:233–241.
                                    86. Bailey JA, Yavor AM, Massa HF, Trask BJ, Eichler EE: Segmental duplications: Organization and impact within the current Human Genome Project assembly. Genome Res 2001, 11:1005–1017.
                                    87. Nora EP, Lajoie BR, Schulz EG, Giorgetti L, Okamoto I, Servant N, Piolot T, van Berkum NL, Meisig J, Sedat J, Gribnau J, Barillot E, Bluthgen N, Dekker J, Heard E: Spatial partitioning of the regulatory landscape of the X-inactivation centre. Nature 2012, 485:381–385.
                                    88. Gheldof N, Witwicki RM, Migliavacca E, Leleu M, Didelot G, Harewood L, Rougemont J, Reymond A: Structural variation-associated expression changes are paralleled by chromatin architecture modifications. PLoS One 2013, 8:e79973.
                                    89. Hurles ME, Lupski JR: Recombination Hotspots in Nonallelic Homologous Recombination. In Genomic Disorders. Edited by: Lupski JR, Stankiewicz P. Totowa, New Jersey: Humana Press; 2006:341–355.
                                    90. Pavlicek A, House R, Gentles AJ, Jurka J, Morrow BE: Traffic of genetic information between segmental duplications flanking the typical 22q11.2 deletion in velo-cardio-facial syndrome/DiGeorge syndrome. Genome Res 2005, 15:1487–1495.
                                    91. Hurles ME, Willey D, Matthews L, Hussain SS: Origins of chromosomal rearrangement hotspots in the human genome: evidence from the AZFa deletion hotspots. Genome Biol 2004, 5:R55.
                                    92. Fawcett JA, Innan H: The role of gene conversion in preserving rearrangement hotspots in the human genome. Trends Genet 2013, 29:561–568.
                                    93. Dutly F, Schinzel A: Unequal interchromosomal rearrangements may result in elastin gene deletions causing the Williams-Beuren syndrome. Hum Mol Genet 1996, 5:1893–1898.
                                    94. Schubert C: The genomic basis of the Williams-Beuren syndrome. Cell Mol Life Sci 2009, 66:1178–1197.
                                    95. Yasui DH, Scoles HA, Horike S, Meguro-Horike M, Dunaway KW, Schroeder DI, Lasalle JM: 15q11.2–13.3 chromatin analysis reveals epigenetic regulation of CHRNA7 with deficiencies in Rett and autism brain. Hum Mol Genet 2011, 20:4311–4323.
                                    96. Quinlan AR, Hall IM: BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 2010, 26:841–842.
                                    97. Sudmant PH, Kitzman JO, Antonacci F, Alkan C, Malig M, Tsalenko A, Sampas N, Bruhn L, Shendure J, 1000 Genomes Project, Eichler EE: Diversity of human copy number variation and multicopy genes. Science 2010, 330:641–646.
                                    98. Weese D, Holtgrewe M, Reinert K: RazerS 3: faster, fully sensitive read mapping. Bioinformatics 2012, 28:2592–2599.
                                    99. RepeatMasker Open-3.0 http://www.repeatmasker.org/
                                    100. Cer RZ, Donohue DE, Mudunuri US, Temiz NA, Loss MA, Starner NJ, Halusa GN, Volfovsky N, Yi M, Luke BT, Bacolla A, Collins JR, Stephens RM: Non-B DB v2.0: a database of predicted non-B DNA-forming motifs and its associated tools. Nucleic Acids Res 2013, 41:D94-D100.
                                    101. Hansen RS, Thomas S, Sandstrom R, Canfield TK, Thurman RE, Weaver M, Dorschner MO, Gartler SM, Stamatoyannopoulos JA: Sequencing newly replicated DNA reveals widespread plasticity in human replication timing. Proc Natl Acad Sci U S A 2010, 107:139–144.
                                    102. Thurman RE, Day N, Noble WS, Stamatoyannopoulos JA: Identification of higher-order functional domains in the human ENCODE regions. Genome Res 2007, 17:917–927.
                                    103. Li G, Fullwood MJ, Xu H, Mulawadi FH, Velkov S, Vega V, Ariyaratne PN, Mohamed YB, Ooi HS, Tennakoon C, Wei CL, Ruan Y, Sung WK: ChIA-PET tool for comprehensive chromatin interaction analysis with paired-end tag sequencing. Genome Biol 2010, 11:R22.
                                    104. Iafrate AJ, Feuk L, Rivera MN, Listewnik ML, Donahoe PK, Qi Y, Scherer SW, Lee C: Detection of large-scale variation in the human genome. Nat Genet 2004, 36:949–951.
                                    105. Firth HV, Richards SM, Bevan AP, Clayton S, Corpas M, Rajan D, Van Vooren S, Moreau Y, Pettett RM, Carter NP: DECIPHER: Database of Chromosomal Imbalance and Phenotype in Humans Using Ensembl Resources. Am J Hum Genet 2009, 84:524–533.
                                    106. Karolchik D, Hinrichs AS, Furey TS, Roskin KM, Sugnet CW, Haussler D, Kent WJ: The UCSC Table Browser data retrieval tool. Nucleic Acids Res 2004, 32:D493-D496.
                                    107. Meyer LR, Zweig AS, Hinrichs AS, Karolchik D, Kuhn RM, Wong M, Sloan CA, Rosenbloom KR, Roe G, Rhead B, Raney BJ, Pohl A, Malladi VS, Li CH, Lee BT, Learned K, Kirkup V, Hsu F, Heitner S, Harte RA, Haeussler M, Guruvadoo L, Goldman M, Giardine BM, Fujita PA, Dreszer TR, Diekhans M, Cline MS, Clawson H, Barber GP, et al.: The UCSC Genome Browser database: extensions and updates 2013. Nucleic Acids Res 2013, 41:D64-D69.
                                    108. Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, Haussler D: The human genome browser at UCSC. Genome Res 2002, 12:996–1006.
                                    109. Phipson B, Smyth GK: Permutation P-values should never be zero: calculating exact P-values when permutations are randomly drawn. Stat Appl Genet Mol Biol 2010, 9:Article39.
                                    110. Wickham H: ggplot2: Elegant Graphics for Data Analysis. New York: Springer-Verlag; 2009.
                                    111. The Human Epigenome Browser http://epigenomegateway.wustl.edu/browser/
                                    112. Zhou X, Maricque B, Xie M, Li D, Sundaram V, Martin EA, Koebbe BC, Nielsen C, Hirst M, Farnham P, Kuhn RM, Zhu J, Smirnov I, Kent WJ, Haussler D, Madden PA, Costello JF, Wang T: The Human Epigenome Browser at Washington University. Nat Methods 2011, 8:989–990.
                                    113. R Development Core Team: R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria 2012.
                                    114. Revanna KV, Munro D, Gao A, Chiu C-C, Pathak A, Dong Q: A web-based multi-genome synteny viewer for customized data. BMC Bioinformatics 2012, 13:190.
                                    115. Ernst J, Kellis M: Discovery and characterization of chromatin states for systematic annotation of the human genome. Nat Biotechnol 2010, 28:817–825.
                                    116. Ernst J, Kheradpour P, Mikkelsen TS, Shoresh N, Ward LD, Epstein CB, Zhang X, Wang L, Issner R, Coyne M, Ku M, Durham T, Kellis M, Bernstein BE: Mapping and analysis of chromatin state dynamics in nine human cell types. Nature 2011, 473:43–49.
                                    117. Maechler M, Rousseeuw P, Struyf A, Hubert M, Hornik K: cluster: Cluster Analysis Basics and Extensions. R package version 1.14.2; 2002.
                                    118. Warnes GR, with contributions from Bolker B, Bonebakker L, Gentleman R, Huber W, Liaw A, Lumley T, Maechler M, Magnusson A, Moeller S, Schwartz M, Venables B: gplots: Various R programming tools for plotting data. R package version 2.11.0; 2012.
                                    119. RStudio: RStudio: Integrated development environment for R (Version 0.95.265). 2012.
                                    120. Venables WN, Ripley BD: Modern Applied Statistics with S. 4th edition. New York: Springer; 2002.

                                    Copyright

                                    © Ebert et al.; licensee BioMed Central Ltd. 2014

                                    This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.