Human genomic DNA samples
DNA donors for Southern blotting and PRT analysis of TUF regions were of north European origin and had given informed consent, with ethical approval from the Leicestershire, Northamptonshire and Rutland Research Ethics Committee (LNRREC Ref. No. 6659 UHL). DNA was prepared from fresh blood as follows. Whole blood (20 ml) was centrifuged at 1300 g at 4°C for 15 minutes. The buffy coat was extracted and incubated at 37°C in 15 ml lysis buffer (10 mM Tris-Cl (pH 8.0), 0.1 M EDTA (pH 8.0), 0.5% w/v SDS) for 1 hour. Proteinase K (final concentration 100 μg/ml) was added and mixed gently, followed by incubation at 50°C overnight. After the sample had cooled to room temperature, an equal volume of phenol equilibrated with 0.1 M Tris HCl was added and mixed slowly on a Stuart Rotator SB3 for 10 minutes. The phases were separated by centrifugation at 5600 g for 15 minutes. The aqueous phase was transferred to a fresh tube and the phenol extraction repeated twice. To the final aqueous phase, 1/10th volume of 5 M ammonium acetate and 2 volumes of 100% ethanol were added. Samples were mixed very slowly and carefully by inversion. The precipitated DNA was spooled using a glass hook, dried briefly and dissolved in water to a final concentration of 200 ng/μl. DNA quality and quantity were assessed by gel electrophoresis and on a NanoDrop ND-8000 spectrophotometer.
Paralogue ratio test (PRT)
PRTs were designed according to information from Armour et al. All PRT oligonucleotide primers are described in Table 1. Each 10 μl PRT PCR contained 1 x PCR buffer (75 mM Tris HCl (pH 8.8), 20 mM (NH4)2SO4, 0.01% v/v Tween) (Abgene, Epsom, Surrey, UK), 1.5 mM MgCl2 (Abgene), 0.15 μM of each primer (Biomers), 0.2 mM dNTPs (Promega), 0.3 U Taq polymerase (Kapa Biosystems, Boston, MA, USA) and 10 to 25 ng DNA. PCRs were initially heated to 94°C for 30 seconds, and then cycled 25 to 35 times as follows: 94°C for 30 seconds; annealing temperature for 30 seconds; 72°C for 1 minute. A final extension was carried out at 72°C for 5 minutes. Where required, restriction enzyme digests were performed to allow visualisation of similar sized PRT products. When additives were used (DMSO up to 50%, betaine up to 2 M), the annealing temperature was re-optimised for each assay. Recommended PCR conditions for TUF regions are 1.5 M betaine, 5 U/μl Taq polymerase, 0.01 U/μl pfu enzyme and a 98°C denaturing temperature in all cycles. Higher concentrations of betaine may be appropriate for individual PCRs.
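The cycling conditions above can be written out as a simple programme, which makes the total programmed hold time for a given cycle number easy to check. The following is a minimal sketch only: the names `Step`, `prt_program` and `hold_time_seconds` are hypothetical, the 60°C annealing temperature in the usage line is an arbitrary placeholder (annealing temperatures are assay-specific), and ramp times between steps are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float   # block temperature in degrees Celsius
    seconds: int    # programmed hold time


def prt_program(annealing_c, cycles):
    """Expand the PRT cycling conditions into a flat list of steps.

    annealing_c is assay-specific and must be supplied by the user;
    cycles must lie in the 25 to 35 range given in the text.
    """
    if not 25 <= cycles <= 35:
        raise ValueError("cycles must be between 25 and 35")
    initial = [Step("initial denaturation", 94.0, 30)]
    cycle = [
        Step("denaturation", 94.0, 30),
        Step("annealing", annealing_c, 30),
        Step("extension", 72.0, 60),
    ]
    final = [Step("final extension", 72.0, 300)]
    return initial + cycle * cycles + final


def hold_time_seconds(program):
    """Sum of programmed hold times (ramping between steps is ignored)."""
    return sum(step.seconds for step in program)


# Hypothetical usage: 60 degrees C is a placeholder annealing temperature.
for n in (25, 35):
    minutes = hold_time_seconds(prt_program(60.0, n)) / 60
    print(f"{n} cycles: {minutes:.1f} min of programmed holds")
```

On these assumptions the programmed holds come to 55.5 minutes for 25 cycles and 75.5 minutes for 35 cycles, excluding ramping.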
Agarose gel peak height quantification
Gels were documented using a GBOX HR gel documentation system (Syngene, Cambridge, Cambridgeshire, UK) with the EDR function and the maximum resolution setting (5.52 M pixels). Peaks were identified and peak heights quantified using the Gene Tools programme version 4.00 (A) (Syngene). For peak height analysis, the rolling disc method (diameter = 30 pixels) was used to determine the peak baseline.
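The idea behind a rolling disc baseline can be illustrated on a one-dimensional lane profile. The sketch below is not the Gene Tools implementation, whose details are not given here. It is a generic version in which a disc is rolled underneath the intensity trace and the baseline is the highest point its upper edge reaches at each pixel. The function names are hypothetical, and the code assumes that one intensity unit is equivalent to one pixel when sizing the disc.

```python
import math


def rolling_disc_baseline(profile, diameter=30):
    """Estimate a baseline by rolling a disc underneath a 1-D profile.

    Assumption: intensity units and pixel units are treated as equal,
    so the disc is circular in (pixel, intensity) space.
    """
    radius = diameter / 2.0
    reach = int(radius)
    offsets = range(-reach, reach + 1)
    # Height of the disc's upper edge above its centre at each offset.
    cap = {j: math.sqrt(radius * radius - j * j) for j in offsets}
    n = len(profile)

    # Highest disc centre at each pixel that keeps the disc under the trace.
    centres = [
        min(profile[i + j] - cap[j] for j in offsets if 0 <= i + j < n)
        for i in range(n)
    ]
    # Baseline: highest point reached by the disc's upper edge at each pixel.
    return [
        max(centres[i + j] + cap[j] for j in offsets if 0 <= i + j < n)
        for i in range(n)
    ]


def peak_height(profile, baseline):
    """Return (pixel index, height above baseline) of the tallest peak."""
    heights = [y - b for y, b in zip(profile, baseline)]
    apex = max(range(len(heights)), key=heights.__getitem__)
    return apex, heights[apex]


# Synthetic lane profile: flat background of 10 with one narrow band.
profile = [10 + 50 * math.exp(-((x - 100) ** 2) / 18) for x in range(200)]
baseline = rolling_disc_baseline(profile, diameter=30)
apex, height = peak_height(profile, baseline)
print(apex, round(height, 1))
```

Because the disc can rise slightly into the foot of a band, the height measured above the baseline is somewhat smaller than the band's true amplitude. The disc diameter therefore needs to be large relative to the band width.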
Pre-PCR heat denaturation
High temperature denaturing was performed in a 96 well format heat block set to the desired temperature. Sierra Antifreeze/coolant (Peak performance products, Northbrook, IL, USA) was used to maintain a liquid contact between the tubes, thermometer and heat block. The DNA was denatured in either water or in buffered conditions (1 x PCR buffer, as above) in tubes with the lids sealed tightly with Nescofilm to prevent evaporation at temperatures greater than 100°C. Samples were heated for 1 minute and snap cooled on ice for 5 minutes. Samples were stored at −20°C and thawed on ice prior to use.
Sonication of DNA
Aliquots of genomic DNA (200 ng/μl) were sonicated for 30 second intervals (with a 30 second gap), using a Bioruptor (Diagenode, Liège, Belgium) until the desired size range (0.3 to 3.0 kbp) was reached (visualised by agarose gel electrophoresis).
Adapted Illumina protocol
Using conditions recommended by Illumina, 200 ng samples of genomic DNA (with or without pre-processing as necessary for each experiment) were hybridised to human370CNV Infinium HD BeadChips (Illumina Inc., San Diego, CA, USA).
Whole genome amplification
Whole genome amplification was performed using the REPLI-g Mini Kit (Qiagen) to amplify a range of masses of human genomic DNA, generating >8 μg of DNA in each case. Samples were prepared using the isothermal amplification reaction in PCR tubes incubated at 30°C for 16 hours and then at 65°C for 3 minutes in a thermal cycler. Amplified products were quantified using a NanoDrop spectrophotometer and visualised on a 0.8% LE agarose gel with ethidium bromide.
Restriction enzyme digestion for Southern blotting
Six μg of genomic DNA was digested using selected enzymes supplied by New England Biolabs (NEB) (Hitchin, Hertfordshire, UK) under the conditions recommended by the supplier with the addition of 4 mM Spermidine pH 7.4. Double digests were performed in the most suitable buffer, and the quantity of the least active enzyme per reaction was doubled if required.
DNA denaturing prior to Southern blotting
Heat denaturation was performed in a water-bath at 100°C for between 40 seconds and 4 minutes, as stated. Samples were snap cooled on ice for 5 minutes prior to gel electrophoresis.
Alkaline denaturation was performed by addition of 0.4 M NaOH to 0.32 M (~ 240 μl added to 54 μl of sample), and incubation at room temperature for 10 minutes. 1 M Tris HCl (pH 8) was added to 0.02 M prior to neutralisation (pH 8 to 8.5) with 0.4 M HCl. Samples were ethanol precipitated and dissolved in distilled water.
Southern blotting and hybridisation
Digested DNA was run at 3 V/cm in 0.7% agarose gels (LE agarose, Seakem; 1 x TAE (4.84 g Tris base, 11.4 ml glacial acetic acid, 3.7 g EDTA pH 8.0 per litre)). The resulting gels were soaked twice in denaturing solution (1.5 M NaCl, 0.5 M NaOH) for 30 minutes, and twice in neutralising solution (0.5 M Tris pH 7.2, 1 M NaCl) for 30 minutes. The denatured DNA was transferred onto uncharged nylon membranes (MAGNA, Nylon, Transfer Membrane, 0.45 Micron; GE Water & Process Technologies, Trevose, PA, USA) using 10 x SSC as the transfer buffer and fixed to the membranes by baking at 80°C for 1 hour in a Sanyo MOV drying oven (Sanyo E&E Europe BV, Biomedical Division, Loughborough, Leicestershire, UK).
PCR amplified probes (Table 1) were purified using a Qiagen MinElute PCR purification kit (Qiagen). 75 ng of probe was labelled for 15 minutes with α-32P-dCTP (Perkin Elmer, Waltham, MA, USA) using the Rediprime II random prime labelling system (Amersham Biosciences, Little Chalfont, Buckinghamshire, UK), purified using illustra NICK Columns Sephadex DNA grade (GE Healthcare, Little Chalfont, Buckinghamshire, UK), and eluted in 400 μl column wash (1 x TE, 0.1% w/v SDS). 75 μg of human Cot I DNA (Invitrogen, Paisley, Renfrewshire, UK) was added prior to denaturation at 100°C for 6 minutes and snap cooling on ice for 5 minutes.
Hybridisation was performed in 20 ml Church buffer (0.5 M sodium phosphate, pH 7.2, 7% SDS, 1 mM EDTA, 1% BSA) with 2 mg heat denatured (100°C for 5 minutes, ice for 5 minutes) salmon sperm DNA. Pre-hybridisation was performed at 65°C in a rolling bottle for 2 hours prior to hybridisation for 10 hours. Hybridised blots were washed for 10 minutes at 65°C in 0.1 x SSC, 0.1% SDS. Counts were recorded using a phosphorimager screen (Amersham Biosciences) for between 12 and 60 hours. Further washing was performed at 68°C or 72°C, depending on the number of background counts.
Regression analysis of LRR and G + C/CpG content for varying window sizes
The log probe intensity ratio (LRR) value for each SNP or CNV assay provides data on probe intensity relative to that of the estimated genotype-specific cluster location. LRR values estimated by the GenomeStudio software were corrected for bias due to the properties of the assay chemistry and fluorescent dyes used in the probes. We implemented a method similar to that described by Staaf et al. to re-estimate LRR after applying quantile normalization, with an enhanced multiple linear regression model incorporating within-chip signal re-scaling terms and a polynomial correction for GC and CpG waves. The correction model is an extension of the method described in Diskin et al., with terms for the proportion of GC and CpG content at multiple window sizes around the genomic location of each set of probes. The GC and CpG terms in the regression model are the proportions of GC and CpG content for window sizes of 50 bp, 100 bp, 500 bp, 1 kb, 10 kb, 50 kb, 100 kb, 250 kb and 1 Mb centered around the genomic location of each assay, based on locations annotated in the Illumina manifest files and sequence context from the NCBI build 36 reference genome sequence. The model is estimated per sample, as the phenomenon is modulated by TUF, the concentration of the DNA input, and possibly other factors. The final LRR was re-computed using the resulting quantile-normalized and GC/CpG-corrected values as shown in Peiffer et al. The reduction in variance of the LRR values is shown in Figure 6.
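The two core ingredients of such a correction, window-based GC/CpG covariates and removal of their fitted contribution by multiple linear regression, can be sketched as follows. This is an illustration under stated assumptions rather than the model used here: the function names are hypothetical, CpG content is taken as the fraction of dinucleotide positions that are CG (the exact definition is not given above), and the quantile normalization, within-chip re-scaling and polynomial terms are omitted. Only the list of window sizes is taken from the text.

```python
# Window sizes (bp) listed in the text.
WINDOW_SIZES_BP = (50, 100, 500, 1_000, 10_000, 50_000,
                   100_000, 250_000, 1_000_000)


def window_fractions(seq, centre, window):
    """GC and CpG fractions in a window centred on a 0-based position.

    Assumption: CpG fraction = CG dinucleotides / dinucleotide positions.
    Windows are truncated at the ends of the sequence.
    """
    half = window // 2
    sub = seq[max(0, centre - half):min(len(seq), centre + half)].upper()
    if not sub:
        return 0.0, 0.0
    gc = sum(base in "GC" for base in sub) / len(sub)
    cpg = sub.count("CG") / max(len(sub) - 1, 1)
    return gc, cpg


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design: drop collinear terms")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def ols_residuals(features, y):
    """Regress y on features plus an intercept; return (residuals, beta).

    The residuals play the role of covariate-corrected LRR values.
    """
    X = [[1.0] + list(row) for row in features]
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    beta = solve_linear(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in X]
    return [yi - fi for yi, fi in zip(y, fitted)], beta


# Toy usage on a short made-up sequence with one small window.
gc, cpg = window_fractions("ACGTCGGA", centre=4, window=4)
print(gc, cpg)
```

In practice the design matrix would hold one GC and one CpG column per window size for every assay, fitted separately for each sample. Covariates from neighbouring window sizes are strongly correlated, so a numerically robust least-squares routine is preferable to the normal equations used in this sketch.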