Analysis of long distance interactions
We downloaded normalised intrachromosomal Hi-C data (hg18) for the autosomes at 20 kb resolution, derived from the human fetal lung fibroblast cell line IMR90 (replicate 1). A stringent cut-off was used to remove interaction (IA) bins represented by fewer than 15 independent sequence counts. Long distance interactions of chromosome 7 were defined by a minimum span size of 25 Mb. “Circos utilities/bundlelinks” was employed to fuse long distance interactions into one bundle when at least five interaction bins were within a maximum distance of 500 kb at both the start and the target sites. We applied different combinations of filter options in terms of interaction counts per bin (at least 10, at least 15, and 10–50 IA/bin) and minimum span sizes (10 and 25 Mb) to evaluate the impact of these thresholds on the bundle pattern (see Additional files 1 and 4). Moreover, we introduced a third filter based on the overlap of a given bin with SDs in order to correct for interactions that arise from erroneous sequence alignments. BEDTools “pairToPair” was used to remove all interaction bins that connect two SD paralogs (removed IA bins: n = 159) or that overlap with any SD at all (removed IA bins: n = 126883) (see scheme in Additional file 4I). The remaining interactions were bundled using adapted criteria to account for the reduced total number of interactions.
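The count, span and SD filters followed by bundling can be sketched as below. This is a simplified, hypothetical re-implementation for illustration only: the names `Link`, `filter_links` and `bundle_links` are ours, the SD filter shown is the stricter "overlaps any SD" variant, and the greedy grouping rule only approximates what Circos `bundlelinks` does with its gap and minimum-size settings.

```python
from dataclasses import dataclass

BIN = 20_000  # Hi-C bin size in bp

@dataclass(frozen=True)
class Link:
    start: int    # start coordinate (bp) of the source bin
    end: int      # start coordinate (bp) of the target bin
    count: float  # interaction count of this bin pair

def overlaps_any(pos, intervals, bin_size=BIN):
    """True if the bin starting at pos overlaps any half-open (s, e) interval."""
    return any(pos < e and s < pos + bin_size for s, e in intervals)

def filter_links(links, min_count=15, max_count=None, min_span=25_000_000, sds=()):
    """Keep links passing the count window, the minimum span and the SD filter."""
    kept = []
    for link in links:
        if link.count < min_count or (max_count is not None and link.count > max_count):
            continue
        if abs(link.end - link.start) < min_span:
            continue
        # stricter SD filter (assumed variant): drop pairs with either bin overlapping an SD
        if overlaps_any(link.start, sds) or overlaps_any(link.end, sds):
            continue
        kept.append(link)
    return kept

def bundle_links(links, max_gap=500_000, min_links=5):
    """Greedy grouping: a link joins the first bundle whose most recent link lies
    within max_gap at both ends; bundles with fewer than min_links are discarded.
    Returns (source_start, source_end, target_start, target_end, n_links) tuples."""
    bundles = []
    for link in sorted(links, key=lambda x: (x.start, x.end)):
        for bundle in bundles:
            last = bundle[-1]
            if abs(link.start - last.start) <= max_gap and abs(link.end - last.end) <= max_gap:
                bundle.append(link)
                break
        else:
            bundles.append([link])
    return [(min(x.start for x in b), max(x.start for x in b) + BIN,
             min(x.end for x in b), max(x.end for x in b) + BIN, len(b))
            for b in bundles if len(b) >= min_links]
```

Changing `min_count`, `max_count` (for the 10–50 IA/bin variant) and `min_span` reproduces the kind of threshold grid described above.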
Besides this filtering of Hi-C data at the level of genomic bins covering SDs, we repeated our filtering and bundling analysis at the level of paired-end reads mapping to SD regions. Based on the method used for the discovery of SUNs (singly unique nucleotides), we merged all regions covered by SDs, divided them into 30 bp reads and remapped these to the human reference genome using RazerS 3. 30-mers that mapped only once, allowing a maximum edit distance of 2 bp, were considered unique sequences. This data set was used to filter out ambiguously mapped paired-end reads in the Dixon et al. data set that map to these regions. The remaining read pairs were binned into 20 kb genomic windows, and the resulting observed interaction counts per bin were re-normalised using the expected contact probability calculated by hicpipe for the unfiltered read pairs. The re-normalised interaction bins were filtered for long distance interactions (at least 15 interaction counts per bin, spanning more than 25 Mb), and these were bundled applying the criteria described above. Long distance interaction bundles were visualised by means of Circos plots.
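The tiling, read-pair filtering and 20 kb binning steps can be sketched as follows. This is a hypothetical illustration: the RazerS 3 remapping is not reproduced (the list of uniquely placed 30-mers is assumed to come from that external step), non-overlapping tiling is an assumption, and the rule in `make_keep` is our own simplification of how unique sequence was used to flag ambiguous read ends. The hicpipe re-normalisation is not covered.

```python
from collections import Counter

K = 30          # 30-mer length used for the uniqueness screen
BIN = 20_000    # Hi-C bin size in bp

def tile_kmers(intervals, k=K, step=K):
    """Cut merged SD intervals (half-open, bp) into k-mer windows.
    step=k gives non-overlapping tiles (assumption); step=1 would give a sliding window."""
    for s, e in intervals:
        for p in range(s, e - k + 1, step):
            yield (p, p + k)

def make_keep(sd_intervals, unique_kmers):
    """Hypothetical rule: a read end is unambiguous if it lies outside SDs, or inside an SD
    at a position covered by a 30-mer that mapped only once."""
    def keep(pos):
        in_sd = any(s <= pos < e for s, e in sd_intervals)
        return not in_sd or any(s <= pos < e for s, e in unique_kmers)
    return keep

def bin_pairs(pairs, keep, bin_size=BIN):
    """Count read pairs per (bin_i, bin_j) window pair after dropping ambiguous pairs."""
    counts = Counter()
    for a, b in pairs:
        if keep(a) and keep(b):
            counts[tuple(sorted((a // bin_size, b // bin_size)))] += 1
    return counts
```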
Public data sets
Our analysis took advantage of various publicly available data sets (segmental duplications [5, 86], [36, 45, 99–105], GSM935404, GSM970215, GSM469974, GSM469968, GSM521915, GSM521900, GSM469970, GSM521884, GSM521883, GSM521897, GSM469966, GSM521890; see Additional files 10 and 11 for details), which were downloaded from the UCSC Table Browser, the annotation database of the UCSC Genome Browser, the non-B database and the website given in Dixon et al.
SD distribution and intrachromosomal interaction patterns
Segmental duplications of all sequence similarities were categorised into those whose paralog maps exclusively to the same chromosome (intra) and those whose paralogs map both intrachromosomally and genome-wide. Additionally, in line with the colouring scheme used in the UCSC Genome Browser, segmental duplications were categorised into those with sequence similarities below 98% (grey), between 98% and 99% (yellow) and above 99% (orange), as well as all three categories combined. Enrichment of these SD categories within long distance interaction bundles was tested. For this purpose, the base pair overlap of SD-covered regions of chromosome 7 with the bundle intervals of chromosome 7 (data set obtained with the cut-offs: >15 interaction counts/bin, interaction distance >25 Mb) was determined and compared with that of 10000 sets of random intervals, using the following strategy. First, BEDTools “mergeBed” was used to combine overlapping intervals within a given SD or bundle data set. Second, the base pair overlap of the SD data sets with the long distance interaction bundles was calculated with BEDTools “coverageBed” (observed base pair overlap). As a control, each SD category was resampled 10000 times with BEDTools “shuffleBed” under the following conditions for the random intervals: located on the same chromosome, with the same interval sizes as the input SD data set, non-overlapping, and excluding annotation gaps. Subsequently, the base pair overlap of each of the 10000 random data sets with the long distance interaction bundles was calculated (expected base pair overlaps). The fold change was calculated as the ratio of the observed base pair overlap to the mean of the 10000 expected base pair overlaps.
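The merge, overlap, shuffle and fold-change steps can be sketched in plain Python as below. This is an illustrative stand-in for the BEDTools commands, not a reproduction of them: `shuffle` uses simple rejection sampling to mimic the same-chromosome, same-size, non-overlapping and gap-excluding conditions, and all function names are hypothetical.

```python
import random

def merge(intervals):
    """Merge overlapping or book-ended half-open intervals (the mergeBed default)."""
    out = []
    for s, e in sorted(intervals):
        if out and s <= out[-1][1]:
            out[-1][1] = max(out[-1][1], e)
        else:
            out.append([s, e])
    return [tuple(x) for x in out]

def overlap_bp(a, b):
    """Total bp shared by two sorted, merged interval lists (two-pointer sweep)."""
    i = j = total = 0
    while i < len(a) and j < len(b):
        total += max(0, min(a[i][1], b[j][1]) - max(a[i][0], b[j][0]))
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total

def shuffle(sizes, chrom_len, excluded, rng, max_tries=1000):
    """Place intervals of the given sizes at random on one chromosome, avoiding
    excluded regions (e.g. assembly gaps) and each other."""
    blocked, placed = list(excluded), []
    for size in sizes:
        for _ in range(max_tries):
            s = rng.randrange(0, chrom_len - size + 1)
            if all(s + size <= bs or be <= s for bs, be in blocked):
                placed.append((s, s + size))
                blocked.append((s, s + size))
                break
        else:
            raise RuntimeError("could not place interval")
    return merge(placed)

def enrichment(sds, bundles, chrom_len, gaps, n_perm=10000, seed=1):
    """Observed overlap, list of expected overlaps and fold change (observed / mean expected)."""
    sds, bundles = merge(sds), merge(bundles)
    observed = overlap_bp(sds, bundles)
    sizes = [e - s for s, e in sds]
    rng = random.Random(seed)
    expected = [overlap_bp(shuffle(sizes, chrom_len, gaps, rng), bundles)
                for _ in range(n_perm)]
    mean_expected = sum(expected) / n_perm
    fold = observed / mean_expected if mean_expected else float("inf")
    return observed, expected, fold
```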
The number of expected base pair overlaps greater than or equal to the observed base pair overlap was counted for each SD category and used to calculate the p-value as described previously for Monte Carlo resampling. The p-values were adjusted according to the Benjamini-Hochberg method. Histograms of the expected base pair overlaps for each SD category were drawn using the R package ‘ggplot2’.
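The p-value and adjustment steps can be illustrated as follows. We assume the common Monte Carlo estimator (r + 1)/(n + 1), where r is the number of expected overlaps at least as large as the observed one and n the number of resamplings; the text does not spell out the formula, so this is an assumption. The Benjamini-Hochberg step follows the standard step-up definition, as implemented for example by R's `p.adjust(method = "BH")`.

```python
def monte_carlo_p(observed, expected):
    """(r + 1) / (n + 1): r = resampled overlaps >= observed, n = number of resamplings."""
    r = sum(1 for x in expected if x >= observed)
    return (r + 1) / (len(expected) + 1)

def benjamini_hochberg(pvals):
    """Step-up BH adjustment: p_adj(k) = min over j >= k of m * p_(j) / j, capped at 1."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running = 1.0
    for rank in range(m, 0, -1):        # walk from the largest p-value down
        i = order[rank - 1]
        running = min(running, pvals[i] * m / rank)
        adjusted[i] = running
    return adjusted
```

Adding one to numerator and denominator keeps the estimate above zero even when no resampled overlap reaches the observed value.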
In addition, SD enrichment within interaction bundles (data set obtained with the cut-offs: >15 interaction counts/bin, interaction distance >25 Mb) was determined for all chromosomes using SDs whose paralogs map either exclusively to the same chromosome or both intrachromosomally and genome-wide.
Finally, SD enrichment within regions whose bins are part of all bundle data sets (obtained by intersecting all twelve data sets resulting from the different filter criteria; see Additional file 3) was calculated using SDs whose paralogs map both intrachromosomally and genome-wide.
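Regions shared by all bundle data sets can be obtained by repeatedly intersecting interval lists. The sketch below assumes each data set is already sorted and merged (for example with the `merge` step of the enrichment analysis); the function names are hypothetical.

```python
from functools import reduce

def intersect(a, b):
    """Intersection of two sorted, merged lists of half-open (start, end) intervals."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        lo, hi = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if lo < hi:
            out.append((lo, hi))
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out

def common_regions(datasets):
    """Regions covered by every data set (e.g. the twelve filter combinations)."""
    return reduce(intersect, datasets)
```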
Fine-mapping of evolutionary breakpoints and mimicking interaction patterns in orang-utan and gorilla
Alignments were retrieved from the Ensembl database (version 67) using the Perl API. As the paracentric inversion is not represented in the current version of the gorilla genome (Gorilla gorilla gorilla; gorGor3.1; May 2011), the proximal and distal breakpoints of both inversions were determined by plotting the orang-utan genome (Pongo abelii; WUGSC2.0.2/ponAbe2; July 2007) against the human genome (GRCh37/hg19; February 2009). A corresponding dot plot, which uses the UCSC colouring scheme for chromosome numbers, is shown in Additional file 6. Segmental duplications were superimposed onto the dot plot following the colouring scheme introduced above (Additional file 6). The fine-mapped coordinates of the paracentric and pericentric inversions of chromosome 7 derived from this analysis (para: chr7:76646908 and chr7:102118853; peri: chr7:6875820 and chr7:80857936; hg18) were used to recalculate the genomic coordinates of long distance interactions and SDs in order to mimic the situation in gorilla and orang-utan. The three segments surrounding the evolutionary breakpoints and the positional changes of SDs and long distance interactions after in silico reversion were visualised by means of Circos plots.
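Reverting an inversion in silico amounts to reflecting every coordinate inside the inverted segment: a position p between breakpoints a and b maps to a + b − p. The sketch below assumes the breakpoint coordinates are 1-based and mark the first and last inverted base. Because the two inversions overlap (roughly 76.6–80.9 Mb), the order in which they are reverted changes the result; the default order used here is an assumption, not taken from the text. Splitting intervals that straddle a breakpoint is our own choice.

```python
# Breakpoints from the text (hg18); treating them as inclusive 1-based bounds is an assumption.
INVERSIONS = {"para": (76_646_908, 102_118_853), "peri": (6_875_820, 80_857_936)}

def invert_pos(p, a, b):
    """Reflect a position through an inversion spanning a..b (inclusive)."""
    return a + b - p if a <= p <= b else p

def invert_interval(s, e, a, b):
    """Map an inclusive interval; pieces outside, inside and beyond the inversion are split."""
    pieces = []
    if s < a:
        pieces.append((s, min(e, a - 1)))
    lo, hi = max(s, a), min(e, b)
    if lo <= hi:
        pieces.append((a + b - hi, a + b - lo))   # inside: reflected and reversed
    if e > b:
        pieces.append((max(s, b + 1), e))
    return sorted(pieces)

def revert(p, order=("para", "peri")):
    """Apply the inversions in the given (assumed) order to a single position."""
    for name in order:
        p = invert_pos(p, *INVERSIONS[name])
    return p
```

For a long distance interaction, each anchor position would be passed through `revert` independently.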
Synteny of human chromosome 7 and enrichment analysis for SDs, Alu repeats and G4 motifs
Syntenic regions of human chromosome 7 and marmoset (Callithrix jacchus) were obtained from the Ensembl database (version 67) and converted to hg18 coordinates using the default settings of the LiftOver tool. We divided chromosome 7 into 200 kb bins (n = 795), of which 125 comprise sequences homologous to marmoset chromosome 2. The minimum hypergeometric (mHG) score and its exact p-value were calculated as described by Eden et al. In brief, we first shuffled the natural order of the genomic bins to minimise the influence of genomic position on the ranking of bins with identical values. We then ranked all bins in ascending order according to their counts for the respective feature (Alu, SD, G4). The enrichment of marmoset chromosome 2 sequences within the highest scoring bins was quantified by means of the hypergeometric score, and the p-value was calculated for the minimum hypergeometric score. The distribution of SDs, long distance interactions, G4 DNA motifs, Alu repeats and syntenic regions of human chromosome 7 and marmoset was visualised in the UCSC Genome Browser (upper part of Figure 2D) and combined with further information on synteny derived from the Ensembl Genome Browser (lower part of Figure 2D).
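The mHG procedure can be sketched with the Python standard library as below. This is our own minimal re-implementation of the idea described by Eden et al., not the authors' code. The mHG score is the minimum, over all cut-offs n, of the hypergeometric tail probability of seeing at least the observed number of labelled bins (here, bins homologous to marmoset chromosome 2) among the top n. The exact p-value counts label orders whose prefixes never reach that score. The helper names, the seeded shuffle for tie-breaking and the choice to place the highest counts first (since enrichment is assessed among the highest scoring bins) are assumptions.

```python
import math
import random

def _log_comb(n, k):
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def hg_tails(N, B, n):
    """tail[b] = P(X >= b), X ~ Hypergeometric(N bins, B labelled, n drawn), b = 0..B."""
    pmf = [0.0] * (B + 2)
    for i in range(max(0, n - (N - B)), min(n, B) + 1):
        pmf[i] = math.exp(_log_comb(B, i) + _log_comb(N - B, n - i) - _log_comb(N, n))
    tail = [0.0] * (B + 2)
    for b in range(B, -1, -1):
        tail[b] = tail[b + 1] + pmf[b]
    return tail[: B + 1]

def ranked_labels(counts, labels, seed=0):
    """Shuffle to break ties randomly, then stable-sort bins by count, highest first."""
    idx = list(range(len(counts)))
    random.Random(seed).shuffle(idx)
    idx.sort(key=lambda i: counts[i], reverse=True)
    return [int(labels[i]) for i in idx]

def mhg_score(labels):
    """Minimum over cut-offs n = 1..N-1 of the tail probability for the top-n bins."""
    N, B = len(labels), sum(labels)
    best, ones = 1.0, 0
    for n in range(1, N):
        ones += labels[n - 1]
        best = min(best, hg_tails(N, B, n)[ones])
    return best

def mhg_pvalue(N, B, mhg, rtol=1e-9):
    """Exact P(mHG <= observed), counting 0/1 orders whose prefixes never reach mhg."""
    ways = [1] + [0] * B                  # ways[b]: valid prefixes of length n with b ones
    for n in range(1, N + 1):
        tail = hg_tails(N, B, n)
        new = [0] * (B + 1)
        for b in range(min(n, B) + 1):
            if n - b > N - B:             # more zeros than unlabelled bins exist
                continue
            w = ways[b] + (ways[b - 1] if b else 0)
            if n < N and tail[b] <= mhg * (1 + rtol):
                w = 0                     # this prefix already attains the observed score
            new[b] = w
        ways = new
    total = math.comb(N, B)
    return (total - ways[B]) / total      # exact integer subtraction keeps small p precise
```

With per-bin feature counts `counts` and 0/1 marmoset labels, the score would be `mhg_score(ranked_labels(counts, labels))` and its p-value `mhg_pvalue(795, 125, score)`.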
Human fetal lung fibroblast cell lines IMR91L (male) and IMR90 (female) were obtained from the Coriell Institute for Medical Research. Both cell lines were cultured in Eagle's minimum essential medium (EMEM) supplemented with 10% fetal bovine serum (Sigma-Aldrich, Saint Louis, USA), 2 mM UltraGlutamine 1 (Lonza, Walkersville, USA), 1 mM sodium pyruvate and 100 units/mL penicillin/streptomycin. The fibroblasts were maintained at 37°C in a humidified atmosphere of 5% CO2 and ambient oxygen. Chromatin immunoprecipitation was performed according to the Transcription Factor ChIP kit protocol (Diagenode, Liège, Belgium). In brief, lysed cells were sonicated using the Bioruptor UCD-200 device (Diagenode, Liège, Belgium), followed by overnight incubation of 1 × 10⁶ cells with 5 μg of antibody against histone H4 acetylated at lysine 8 (pAb-103-050; Diagenode, Liège, Belgium). The subsequent reverse crosslinking of chromatin, elution and purification of ChIP DNA and input DNA were performed using the IPure Kit (Diagenode, Liège, Belgium).
Analysis of DNA degradation during early phases of apoptosis
Apoptosis of IMR90 and IMR91L cells was induced by exposing 2 × 10⁶ cells to either 1 μmol/L staurosporine (Cell Signaling Technology, Inc., Danvers, USA) in 0.1% DMSO or 0.1% DMSO alone (control) for four hours at 37°C. An aliquot of about 5–10 × 10⁶ cells/mL was co-stained with Annexin V-APC (BD Biosciences, San Jose, USA) and 7-aminoactinomycin D (7-AAD, BD Biosciences, San Jose, USA) for 15 minutes to monitor the progress of apoptosis by FACS analysis.
The remaining cells were treated with lysis buffer (0.40 M Tris–HCl pH 8.0, 0.06 M Na-EDTA, 0.15 M NaCl, 1% SDS), and RNA was digested for 1 hour at 37°C using 15 μg/mL RNase A. Cell lysates were deproteinised by adding 1 M sodium perchlorate and one volume of chloroform. DNA fragmentation was checked using the Genomic DNA ScreenTape on an Agilent 2200 TapeStation (Agilent, Santa Clara, USA) (see Additional file 9).
High-molecular-weight DNA (>48 kb) and degraded apoptotic DNA (~4 kb) were extracted by cutting slices out of a preparative 1% low-melt agarose gel, followed by digestion with β-Agarase I according to the manufacturer's protocol (New England Biolabs, Ipswich, USA).
Purified DNA from the ChIP and apoptotic DNA degradation experiments was amplified by means of the GenomePlex Whole Genome Amplification Kit (Sigma, Saint Louis, USA). Regional preferences in apoptotic DNA degradation and H4K8 acetylation were determined by co-hybridising high-molecular-weight (>48 kb) and degraded apoptotic DNA (~4 kb), and ChIP DNA and input DNA, onto a 400 k whole genome oligonucleotide array (GPL9777) and a region-specific custom oligonucleotide array covering the interval chr7:69936560–70795513 (hg19) with an average oligo spacing of 198 bp (GPL17964), respectively, following the array CGH protocols provided by the manufacturer (Agilent, Santa Clara, USA). Image analysis, normalisation and annotation were done with Feature Extraction 10.5.1.1 (Agilent, Santa Clara, USA) using the default settings. Data visualisation and further analysis were performed with GenomeCAT (Tebel et al., manuscript in preparation; http://www.molgen.mpg.de/204904/GenomeCAT) and the Human Epigenome Browser [111, 112].
RNA expression profiling
Expression profiling was performed by next-generation sequencing on a SOLiD 5500xl Genetic Analyzer (Life Technologies, Carlsbad, USA). Total RNA was extracted from IMR91L cell cultures using TRIzol (Life Technologies, Carlsbad, USA). 10 μg of each total RNA sample was spiked with ERCC spike-in control mixes (Life Technologies, Carlsbad, USA) prior to removal of rRNA with the RiboMinus Kit (Life Technologies, Carlsbad, USA). The RNA was then prepared for sequencing using the protocol and components provided with. In brief, the rRNA-depleted RNA was fragmented by chemical hydrolysis, phosphorylated and purified. Adaptors were then ligated and hybridised to the RNA fragments, which were reverse transcribed into cDNA. The cDNA was purified and size-selected using two rounds of Agencourt AMPure XP bead purification (Beckman Coulter Genomics, Danvers, USA) and released from the beads. The sample was then amplified by 12 PCR cycles in a T3 Thermocycler (Biometra, Göttingen, Germany) with primers containing unique sequences (barcodes), so that the origin of each sequence could be determined after pooling and sequencing. The size distribution and concentration of the fragments were determined with an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, USA) and by quantitative PCR using a LightCycler 480 Real-Time PCR System (Roche Applied Science, Penzberg, Germany) and the KAPA Library Quant ABI SOLiD kit (Peqlab Biotechnologie GmbH, Erlangen, Germany).
The cDNA fragments were then pooled in equimolar amounts and diluted to 61 pg/μL, corresponding to a concentration of 500 pM. 50 μL of this dilution was mixed with a freshly prepared oil emulsion, P1 and P2 reagents and P1 beads in a SOLiD EZ Bead Emulsifier according to the E80 scale protocol (Life Technologies, Carlsbad, USA). The emulsion PCR was carried out in a SOLiD EZ Bead Amplifier (Life Technologies, Carlsbad, USA) using the E80 setting. To enrich for beads carrying amplified template DNA, the beads were purified on a SOLiD EZ Bead Enricher using the recommended chemistry and software (Life Technologies, Carlsbad, USA).
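The two figures given for the library dilution (61 pg/μL and 500 pM) are linked through the mean fragment length. The sketch below assumes the common approximation of 660 g/mol per base pair of double-stranded DNA; under that assumption the two numbers imply a mean fragment size of roughly 185 bp. This length is derived here for illustration and is not reported in the text.

```python
AVG_BP_MASS = 660.0  # g/mol per bp of double-stranded DNA (common approximation, assumed)

def molarity_pm(conc_pg_per_ul, mean_length_bp, bp_mass=AVG_BP_MASS):
    """Convert a dsDNA mass concentration (pg/uL) into molarity (pM)."""
    grams_per_litre = conc_pg_per_ul * 1e-6           # 1 pg/uL = 1e-6 g/L
    mol_per_litre = grams_per_litre / (mean_length_bp * bp_mass)
    return mol_per_litre * 1e12

def implied_length_bp(conc_pg_per_ul, molarity_pm_value, bp_mass=AVG_BP_MASS):
    """Mean fragment length consistent with a given mass concentration and molarity."""
    return conc_pg_per_ul * 1e-6 / (molarity_pm_value * 1e-12 * bp_mass)
```

For example, `implied_length_bp(61, 500)` gives about 184.8 bp, and `molarity_pm(61, 185)` gives about 499.6 pM.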
The purified beads were then loaded onto a SOLiD 6-lane Flowchip and incubated upside down for 1 hour at 37°C. The Flowchip was then positioned in the 5500xl SOLiD System, and the DNA was sequenced with 50-nucleotide forward reads and 35-nucleotide reverse reads using the recommended chemistry (Life Technologies, Carlsbad, USA).
Sequence reads mapping to RefSeq coding exons and matching the coding strand were counted as coding RNAs; all other mapped reads were counted as non-coding RNAs.
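The counting rule can be sketched as a strand-aware overlap test. Two details are assumptions: any overlap with a coding exon counts as "mapping to" it, and the `sense` flag decides whether a read must lie on the same strand as the exon or the opposite one, since the relationship between read strand and transcript strand depends on the library protocol.

```python
def is_coding(read, exons, sense=True):
    """read and exons are (chrom, start, end, strand) with half-open coordinates."""
    chrom, s, e, strand = read
    for c, es, ee, est in exons:
        if c == chrom and s < ee and es < e:      # any overlap with a coding exon
            if (strand == est) == sense:          # strand rule (protocol-dependent)
                return True
    return False

def count_reads(reads, exons, sense=True):
    """Split mapped reads into coding and non-coding counts."""
    coding = sum(is_coding(r, exons, sense) for r in reads)
    return {"coding": coding, "non-coding": len(reads) - coding}
```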
Genomic characterisation of the Williams-Beuren region
Our own experimental results and public data (Additional files 10 and 11) were integrated in the Human Epigenome Browser hosted by Washington University [111, 112]. Regional characteristics of lamin B1 interaction sites, replication timing [101, 102] and apoptotic DNA degradation (log2 ratio) were compared for 20 kb bins using Spearman's rank correlation test implemented in R.
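The per-bin comparison can be illustrated as follows: probe-level values are averaged into 20 kb bins, and Spearman's rho is computed as the Pearson correlation of average ranks on the bins shared by two tracks. Averaging probes per bin is an assumption about how values were aggregated, and the sketch returns only rho; the test statistic and p-value from R's `cor.test` are not reproduced. Function names are hypothetical.

```python
import math
from collections import defaultdict

BIN = 20_000

def bin_means(positions, values, bin_size=BIN):
    """Average probe values per 20 kb bin (assumed aggregation)."""
    acc = defaultdict(list)
    for p, v in zip(positions, values):
        acc[p // bin_size].append(v)
    return {b: sum(v) / len(v) for b, v in acc.items()}

def average_ranks(xs):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

def correlate_tracks(track_a, track_b):
    """Spearman's rho over the bins present in both binned tracks."""
    shared = sorted(set(track_a) & set(track_b))
    return spearman_rho([track_a[k] for k in shared], [track_b[k] for k in shared])
```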
To calculate the gene density and intron size of genes on chromosome 7 within the 7q11 segment or the intermediate neighbourhood, genomic coordinates of known canonical genes and their introns were downloaded from the UCSC Table Browser. The number of genes and the intron lengths within each region were determined using “BEDTools/intersectBed”. Gene density for each region was calculated as the number of genes per megabase. Statistical significance was estimated using either 100000 random simulations or Fisher's exact test.
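Gene density and intron lengths per region can be sketched as below. Counting a gene once if it overlaps the region at all (similar in spirit to `intersectBed -u`) and reporting full, unclipped intron lengths are assumptions; the function names are hypothetical.

```python
def overlapping(features, region):
    """Features (start, end) overlapping a half-open region (start, end), each kept once."""
    s, e = region
    return [f for f in features if f[0] < e and s < f[1]]

def gene_density(genes, region):
    """Number of overlapping genes per megabase of the region."""
    s, e = region
    return len(overlapping(genes, region)) / ((e - s) / 1e6)

def intron_lengths(introns, region):
    """Full lengths (bp) of introns overlapping the region."""
    return [e - s for s, e in overlapping(introns, region)]
```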
Calculation of average span sizes of intrachromosomal interactions of chromosome 7
All intrachromosomal interaction bins of chromosome 7 with at least one normalised interaction count between two genomic bins according to Dixon et al. were categorised into six classes based on their span size: i) <500 kb, ii) 500 kb to less than 1 Mb, iii) 1 Mb to less than 5 Mb, iv) 5 Mb to less than 10 Mb, v) 10 Mb to less than 25 Mb and vi) span sizes equal to or greater than 25 Mb.
For each bin, the scores were summed separately for each span size category. The relative contribution of each category to the total score of interaction counts per bin was calculated by dividing the category score by the total score of that bin. For comparability within Figure 3, genomic coordinates were converted to hg19 using the default settings of the LiftOver tool.
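The span-size classification and per-bin relative contributions can be sketched as follows. Crediting each interaction to both of its anchor bins is an assumption about how "each bin" was defined; the class labels and function names are hypothetical.

```python
import bisect
from collections import defaultdict

EDGES = [500_000, 1_000_000, 5_000_000, 10_000_000, 25_000_000]
LABELS = ["<0.5Mb", "0.5-1Mb", "1-5Mb", "5-10Mb", "10-25Mb", ">=25Mb"]

def span_class(span):
    """Lower bounds inclusive, upper bounds exclusive, matching classes i) to vi)."""
    return LABELS[bisect.bisect_right(EDGES, span)]

def relative_contributions(interactions):
    """interactions: (bin_a, bin_b, score) with bins given as start coordinates (bp).
    Returns, per bin, the fraction of its total score falling into each span class."""
    sums = defaultdict(lambda: defaultdict(float))
    for a, b, score in interactions:
        cls = span_class(abs(b - a))
        for anchor in {a, b}:            # assumption: credit both anchors (once if a == b)
            sums[anchor][cls] += score
    return {anchor: {c: s / sum(cats.values()) for c, s in cats.items()}
            for anchor, cats in sums.items()}
```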
Topological domains in mice
Coordinates of mouse (mm9) topological domains were obtained from previously published data and converted to hg19 using the default settings of the LiftOver tool. Both the original and the converted mouse domains were visualised in the Human Epigenome Browser in the mm9 and hg19 assemblies, respectively. Orthologous genes located at the murine domain borders were plotted at the corresponding locations in the human genome using the Multi-Genome Synteny Viewer (mGSV).