Characterization of whole genome amplified (WGA) DNA for use in genotyping assay development

Background: Genotyping assays often require substantial amounts of DNA. To overcome the problem of limiting amounts of available DNA, Whole Genome Amplification (WGA) methods have been developed. The multiple displacement amplification (MDA) method using Φ29 polymerase has become the preferred choice due to its high processivity and low error rate. However, the uniformity and fidelity of the amplification process across the genome have not been extensively characterized.
Results: To assess amplification uniformity, we used array-based comparative genomic hybridization (aCGH) to evaluate DNA copy number variations (CNVs) in DNAs amplified by two MDA kits: GenomiPhi and REPLI-g. The Agilent Human CGH array containing nearly one million probes was used in this study together with DNAs from a normal subject and 2 cystic fibrosis (CF) patients. Each DNA sample was amplified 4 independent times and compared to its native unamplified DNA. Kolmogorov distances and Phi correlations showed a high consistency within each sample group. Fewer than 2% of the probes showed more than 2-fold CNV introduced by the amplification process. The two amplification kits, REPLI-g and GenomiPhi, generate very similar amplified DNA samples despite the differences between the unamplified and amplified DNA samples. The results from aCGH analysis indicated that there were no obvious CNVs in the CFTR gene region due to WGA when compared to unamplified DNA. This was confirmed by quantitative real-time PCR copy number assays at 10 locations within the CFTR gene. DNA sequencing analysis of a 2-kb region within the CFTR gene showed no mutations introduced by WGA.
Conclusion: The relatively high uniformity and consistency of the WGA process, coupled with the low replication error rate, suggest that WGA DNA may be suitable for accurate genotyping. Regions of the genome that were consistently under-amplified were found to contain higher than average GC content. Because of the consistent differences between the WGA DNA and the native unamplified DNA, characterization of the genomic region of interest, as described here, will be necessary to ensure the reliability of genotyping results from WGA DNA.


Background
Advances in genomic technologies have enabled development of many novel genome analysis methods that may have applications in the understanding, diagnosis, and management of genetic diseases and cancer. Comprehensive high-throughput assays are available for detection of single nucleotide polymorphisms (SNPs) [1][2][3], DNA copy number variation (CNV) [4], microsatellite expansion or contraction [5] and loss of heterozygosity (LOH) [6], all of which detect sometimes subtle genomic alterations associated with disease. Some of these assays require micrograms of DNA which may be difficult to obtain for many clinical samples. In addition, inadequate DNA template may prevent the performance of multiple assays on a single sample [7]. Limited availability of DNA also poses challenges for manufacturers and regulators of genetic diagnostic devices. One of these challenges is availability of sufficient quantities of DNA samples to appropriately validate a particular test under investigation, especially when uncommon mutations for a rare disease severely limit access to patient sample specimens.
Since the U.S. Food and Drug Administration (FDA)'s clearance of the first genotyping assay for cystic fibrosis (CF) in 2005, the Office of In Vitro Diagnostic Device Evaluation and Safety within the Center for Devices and Radiological Health (CDRH) has reviewed a number of other genotyping assays for inherited disorders. Some of these genotyping assays are intended to detect relatively rare heritable diseases consisting of multiple disease-causing alleles (mutations) for each disease, while others may detect more common diseases, but are intended to test very rare as well as common mutations. To show the accuracy of these assays, manufacturers would generally use patient samples (whole blood or archived DNA). For rare mutations or alleles, it is often difficult to obtain sufficient quantities of such clinical samples to adequately assess test performance. Therefore, there is an interest in using whole genome amplified (WGA) DNA samples created from patient samples instead of native DNA samples to increase the availability of appropriate samples to query the performance of the assays. WGA is a method that amplifies small amounts of genomic DNA several thousand-fold in vitro. The WGA process has the potential, however, to result in non-uniform amplification of the DNA in which some regions of the genome are overrepresented and others are under-represented. Such biased amplification could make the WGA DNA unsuitable for some of the studies designed to assess the clinical assay performance.
Several WGA methods based on the polymerase chain reaction (PCR) with Taq polymerase were initially developed [8][9][10][11][12]. These methods included the use of primers directed at highly repetitive sequences [11], ligation of linkers to fragmented DNA [12], degenerate oligonucleotide primed PCR [9], and primer extension preamplification [10]. All of these methods suffer from a relatively high level of mutations in the amplified DNA (error rate 3 × 10⁻⁵ [13]) and highly non-uniform amplification due to the low fidelity and low processivity of the Taq polymerase, respectively. In 2002, Dean et al. [14] described the multiple displacement amplification (MDA) technique. This method of WGA takes advantage of the high processivity and low error rate of the Φ29 bacteriophage DNA polymerase. This polymerase has a 3'-5' proof-reading activity and adds an average of 70,000 templated nucleotides to a primer [15], resulting in higher fidelity and less biased amplification than with the Taq polymerase methods. The WGA process using Φ29 polymerase is isothermal and uses random primers to target the entire genome. The polymerase has strong strand displacement activity so that exponential amplification occurs through a branching mechanism [14], resulting in a high yield of DNA. The MDA process has been recently reviewed and shown to be superior to other DNA amplification methods with regard to genotyping, genomic coverage, and amplification bias [16,17].
MDA-based WGA has been frequently used in DNA sample preparation for genotyping and sequencing in recent years [18][19][20][21]. High call rates (97.5%) and excellent concordance rates were achieved from WGA samples using high-density SNP arrays [19,20]. There was a very low error rate (1 SNP genotyping error per 1000 assays) when high quality DNA was used as template [21]. MDA-based WGA has also been used for analyses of single cells [22,23]. Jiang et al. successfully amplified DNA from single sperm at least 250 fold with a single round of MDA [24]. DNA amplified from single lymphocytes was used for multiple analyses of 20 different loci including the ΔF508 deletion in exon 10 and two intragenic microsatellite markers in the CF gene [25]. Short tandem repeats (STR) and Human Leucocyte Antigen typing were performed using DNA amplified by MDA from a single cell [26]. MDA-WGA has also been increasingly used in the field of forensic testing [27,28]. Even in the often degraded DNA samples, WGA showed the capability and potential to increase the quality and quantity of DNA from difficult samples in forensic casework [29]. Although MDA-based WGA has been successfully applied in many studies, there are several potential problems that may affect interpretation of results. These include the finding that large amounts of nonspecific DNA amplification can be generated during MDA, mostly due to primer-directed DNA synthesis [5,30], and the quantity [21,31] and quality [32] of input DNA into the MDA reaction can affect genotyping results.
Despite the advantages of the MDA-based WGA method, the uniformity of the amplification process across the genome has not been extensively characterized [3,4,33]. In 2004, Paez et al. [33] examined DNAs amplified using a commercially available MDA method, REPLI-g, using 10 k Affymetrix SNP arrays and direct sequencing of ~500,000 bp of DNA and showed near-complete genome representation, as well as low replication error rate. In 2006, Pinard et al. [34] assessed the bias of WGA methods on bacterial genomes using massively parallel sequencing and found statistically significant amplification bias, although the MDA methods produced the least bias. Arriola et al. (2007) [35], using a second commercially available MDA method, GenomiPhi, evaluated the WGA DNA with low density array comparative genomic hybridization (aCGH), a method with higher resolution (~200 kb in this case) than the traditional CGH method [36]. Copy number biases were found, with the extent dependent on the degree of amplification. While these studies and others [16,17] have consistently shown non-uniformity of amplification by the MDA methods (although less than with other WGA methods), the uniformity of amplification has not been examined with high resolution aCGH at a resolution of several kb. In addition, the consistency of the amplification process has not been evaluated in replicate amplifications of the same sample, an important parameter when such material may be used for validation of genetic diagnostic devices or tests.
In this study, we evaluated two commonly used methods of WGA for their ability to produce large quantities of uniformly amplified DNA with minimal introduced mutations. Biases during the amplification process were determined by measurements of DNA CNVs at nearly 1 million positions in the genome using aCGH analysis. Particular attention was given to a single gene, CFTR, which encodes cystic fibrosis transmembrane conductance regulator. Mutations in CFTR can cause cystic fibrosis, a rare disease manifested by thick, sticky mucus and salty sweat, which usually leads to lung transplant or early death [37]. In addition to aCGH, copy number variation in the CFTR gene was examined by a second method, quantitative real-time PCR. Finally, the introduction of mutations during the WGA process was examined by comparing the DNA sequence of the resulting amplified DNA to that of the unamplified DNA. The consistency of the WGA process was evaluated by examining four replicate amplified DNAs derived from three individuals, one healthy and two with cystic fibrosis.

Results
CNV introduced by the WGA process
To characterize the consistency and faithfulness of whole genome amplification methods based on Φ29 polymerase, three human DNA samples (Table 1) were amplified using two commercially available kits, REPLI-g and GenomiPhi. The DNA samples were derived from a healthy individual (Normal) and two patients with cystic fibrosis (CF1 and CF2). The experimental design is shown in Figure 1. Each of the three DNA samples was amplified 4 independent times by both REPLI-g and GenomiPhi kits. DNA copy number changes introduced into the amplified DNA were evaluated by comparing the amplified DNA to the native unamplified DNA utilizing aCGH. The Agilent Human CGH array was used with the assumption that as a research tool it could comprehensively evaluate and reveal gross copy number changes at nearly 1 million loci. In addition, unamplified Normal samples were compared to unamplified Normal samples four times to define the technical variability of the aCGH process. Any copy number variation in such a self-self hybridization would be due to technical limits of the aCGH process since the DNA samples were identical. A total of 28 CGH arrays were used in this study. Figure 2 shows box plots of the average log2 ratios of amplified or unamplified test samples versus the corresponding native unamplified sample. The first plot shows the distribution of average log2 ratios of unamplified Normal DNA vs. unamplified Normal DNA (self-self hybridization). The distribution is centered at 0 and is very tight, indicating that little variability (i.e., CNV "noise") is introduced by the aCGH process itself. Supporting this conclusion, the derivative log2 ratio spread (which is the average log ratio difference between adjacent probes of an array) for the 4 replicate self-self hybridizations was low (0.165). Thus, any change in the distribution of log2 ratios in amplified samples compared to unamplified samples could be considered to be due primarily to the WGA process.
A notably wider distribution of the ratios was observed for the amplified samples compared to the unamplified samples although the median was still close to 0. These results suggest that CGH arrays can be sensitive tools for this type of CNV analysis. More importantly, WGA resulted in CNV changes for all three samples. The majority of the CNV changes were due to under-amplification.
To better understand differences in copy number introduced by the WGA process, we compared the distributions of log2 ratios of the amplified and unamplified samples using Kolmogorov distances (Table 2). The median pair-wise Kolmogorov distances between the REPLI-g-amplified vs. unamplified distribution and the GenomiPhi-amplified vs. unamplified distribution are 0.407 and 0.368, respectively, consistent with differences between the distributions of log2 ratios before and after WGA. Another measure of the uniformity of the WGA process is the percentage of probes with ratios below or above a certain cut-off value when the WGA DNA is compared to the unamplified DNA. Table 3 shows the percentage of probes on the CGH arrays with ratios greater than 1.5-fold, 2-fold, 2.5-fold, 3-fold, and 4-fold for the three DNA samples and the two WGA methods. The percentages are the average of the 4 replicates. Also shown is the percentage of probes above these cut-offs for the unamplified Normal DNA compared to itself; all ratios were within 2-fold, consistent with low variability of the aCGH method. Both amplification methods resulted in less than 1% of probes showing CNV of greater than 2-fold, with the exception of the Normal sample amplified by GenomiPhi (1.273%). If the fold-change criterion was relaxed to 3-fold, less than approximately 0.1% of probes showed CNV. Table 4 shows regions of the genome that were consistently under-amplified by at least 3-fold across all samples by both amplification methods. These under-amplified regions represent a small portion of the total genome (2.1%) and contain about 1,099 genes (approximately 4.4%). Thus, while the MDA technology resulted in changes in copy number, the changes were relatively small in most regions of the genome, and were concentrated in a relatively small number of chromosomal loci.
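The cut-off percentages described for Table 3 can be illustrated with a minimal sketch. It assumes each probe is represented by a log2 ratio (amplified vs. unamplified) and that a fold change counts in either direction (over- or under-amplification); the function name and the toy data are hypothetical and are not taken from the study.

```python
import math

def percent_beyond_cutoffs(log2_ratios, cutoffs=(1.5, 2.0, 2.5, 3.0, 4.0)):
    """Percentage of probes whose fold change exceeds each cut-off.

    Assumption: a probe exceeds an N-fold cut-off when |log2 ratio| > log2(N),
    so over- and under-amplification are treated symmetrically.
    """
    n = len(log2_ratios)
    if n == 0:
        raise ValueError("at least one probe is required")
    result = {}
    for cutoff in cutoffs:
        threshold = math.log2(cutoff)
        hits = sum(1 for r in log2_ratios if abs(r) > threshold)
        result[cutoff] = 100.0 * hits / n
    return result

# Toy log2 ratios for ten hypothetical probes (not real array data).
toy_ratios = [0.0, 0.1, -0.2, 0.7, -1.2, -1.7, 2.1, 0.3, -0.05, 0.0]
toy_percentages = percent_beyond_cutoffs(toy_ratios)
for cutoff, pct in toy_percentages.items():
    print(f"> {cutoff}-fold: {pct:.1f}% of probes")
```

On a real array the same calculation would be run per replicate and then averaged across the 4 replicates, as described for Table 3.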

Consistency of the WGA process
Differences in the distribution of ratios within each sample group were calculated using the Kolmogorov distance. Table 5 shows pair-wise comparisons of Kolmogorov distances within each sample group, as well as the median value for each group. The median Kolmogorov value for the four self-self hybridizations (Normal sample, unamplified vs. unamplified) was 0.025 and the median Kolmogorov values within the other sample groups were similarly low. Thus, the WGA process appears to be reproducible in terms of uniformity of amplification throughout the genome. The Phi correlation was calculated to evaluate the consistency of the position of the CNV loci (defined as having a ratio of greater than 2). Higher values (greater than 0.7) indicate the ratios at each probe location are very similar.

Comparisons between the two whole genome amplification kits
Both WGA kits used in this study, REPLI-g and GenomiPhi, use the same bacterial phage Φ29 DNA polymerase. However, the assay procedures are considerably different with regard to reaction time (4 hr with GenomiPhi and 10 hr with REPLI-g) and method of DNA denaturation (heat with GenomiPhi and alkaline with REPLI-g). Thus, comparison of outcomes of these procedures is important. The median values of pair-wise Kolmogorov distances for Normal, CF1, and CF2 samples between REPLI-g and GenomiPhi amplified samples are 0.046, 0.055, and 0.046, respectively (Table 7), which are similar to the Kolmogorov distances from samples amplified by the same method (range: 0.013-0.085; see Table 5). The median pair-wise Phi correlations for Normal, CF1, and CF2 samples between REPLI-g and GenomiPhi amplified samples after CBS smoothing are 0.739, 0.789 and 0.792, respectively (Table 8). Again, these values are similar to those obtained when comparing within an amplification method (see Table 6). These results indicate that the two amplification methods generate similar amplified DNA samples despite the procedural differences. Visual comparison of CNV generated by these amplification methods across chromosome 7 is illustrated in Figure 3. The pattern of CNVs is similar between the 2 amplification methods and among the 3 DNA samples.

Quantitative real-time PCR copy number assay within the CFTR gene region
Subsequent to array analysis, a quantitative PCR method was used to further evaluate the variability between the amplified and unamplified DNA samples. Ten TaqMan Copy Number PCR probes were selected from Applied Biosystems' pre-designed research assays across the length of the CFTR gene, a well-studied genetic disease locus, representing 9 introns and 1 exon (Table 9 and Figure 4). Relative copy number was calculated for each probe in reference to the unamplified DNA sample for each subject. Each of the ten probes gave comparable results showing less than 2-fold difference (considering both over- and under-amplification) in fold-change between amplified and unamplified DNA samples (Figure 5). This low level of variability was also evident when comparing results between the two amplification methods (REPLI-g and GenomiPhi), suggesting comparable results irrespective of the manufacturer's protocol. Furthermore, the amount of variability across the three human samples was similar, suggesting robust amplification results regardless of possible biological differences. Thus, the TaqMan Copy Number PCR results are in agreement with the aCGH data (Figure 3) in identifying low levels (below 2-fold change) of amplification variability introduced by the WGA process in the CFTR locus.
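The text does not spell out how relative copy number was derived from the Ct values. One common approach for reference-normalized assays is the comparative Ct (ΔΔCt) calculation; the sketch below assumes that approach, assumes 100% PCR efficiency, and uses the unamplified DNA as calibrator with RNase P as the reference assay. The function names and Ct values are hypothetical.

```python
def relative_copy_number(ct_target, ct_reference,
                         ct_target_calibrator, ct_reference_calibrator):
    """Fold change of a target relative to a calibrator sample (comparative Ct).

    Assumptions (not stated in the source): 100% PCR efficiency, so each
    cycle corresponds to a 2-fold change; the reference assay (e.g. RNase P)
    normalizes for input DNA amount.
    """
    delta_ct_sample = ct_target - ct_reference
    delta_ct_calibrator = ct_target_calibrator - ct_reference_calibrator
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)

def within_two_fold(fold_change):
    """True when the change is below 2-fold in both directions."""
    return 0.5 < fold_change < 2.0

# Hypothetical Ct values: the amplified sample crosses threshold half a
# cycle later than the unamplified calibrator at the target probe.
fold = relative_copy_number(ct_target=25.5, ct_reference=25.0,
                            ct_target_calibrator=25.0,
                            ct_reference_calibrator=25.0)
print(f"relative copy number: {fold:.3f}, within 2-fold: {within_two_fold(fold)}")
```

A value below 1 would correspond to under-amplification at that probe and a value above 1 to over-amplification, matching the "both over- and under-amplification" criterion used above.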

DNA sequence analysis in the CFTR gene region
Bidirectional Sanger DNA sequencing was performed on an approximately 2 kb region of the CFTR gene to examine the fidelity of the WGA process using primers shown in Table 10 (see Methods). DNA sequences from the WGA samples were compared with the respective unamplified DNA samples. The results are shown in Table 11; there were no detectable mutations introduced by either WGA method in the 3 DNA samples (approximately 7,900 nucleotides examined for each sample/WGA method). Thus, the level of detectable mutations introduced by the GenomiPhi and REPLI-g WGA methods in this analysis was less than 1 mutation per 23,000 nucleotides examined. These results suggest that both GenomiPhi and REPLI-g appear to be robust and accurate methods for amplifying nanogram quantities of starting DNAs to microgram levels.
… Table 8). The small red box highlights the 2 kb region of the CFTR gene that was sequenced. This DNA segment includes part of intron 11 and exon 12, which account for 1% of the gene.
Figure 5 Quantitative real-time PCR copy number assays. TaqMan assays were used to examine copy number variations at 10 locations within the CFTR gene in both the unamplified and amplified DNA samples. Average calculated copy number values are plotted with bars representing minimum and maximum values from replicate measurements (n = 4). Within each amplification / sample group, the order of the probe results is I-X (see Figure 4). UnAmp (no amplification); Amp-R (REPLI-g amplified); Amp-G (GenomiPhi amplified).
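The per-nucleotide bound reported for the sequencing analysis follows from simple arithmetic. The sketch below reproduces it under the assumption that the approximately 7,900 nucleotides per sample are pooled across the 3 DNA samples for each WGA method; the variable names are illustrative.

```python
# Approximate number of nucleotides examined per sample and WGA method (from the text).
nucleotides_per_sample = 7_900
samples_per_method = 3

# Pooled nucleotides examined per WGA method (assumed pooling across samples).
total_nucleotides = nucleotides_per_sample * samples_per_method

# With zero mutations observed, the detectable rate is below one event
# per total number of nucleotides examined.
upper_bound = 1 / total_nucleotides

print(f"nucleotides examined per method: {total_nucleotides}")
print(f"mutation rate < {upper_bound:.2e} per nucleotide")
```

The pooled total of 23,700 is consistent with the "1 mutation per 23,000 nucleotides" figure, and the resulting bound is on the order of 4 × 10⁻⁵ per nucleotide.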

Discussion
The sequencing of the human genome, coupled with advances in genomics technologies, is having a major impact on understanding basic human biology, as well as the molecular causes of diseases and toxicities [38]. In fact, the leaders of the U.S. Food and Drug Administration (FDA) and the National Institutes of Health (NIH) have recently outlined the opportunities in clinical medicine that are being created by advances in basic science, including genomics [39]. The two agencies have announced a new collaborative effort to focus regulatory and translational sciences on bringing medical products and therapies into the age of personalized medicine. This growing store of knowledge has enormous potential application to the development of devices and tests for use in the diagnosis, mitigation, treatment, cure, and prevention of disease and other conditions. As of 2010, more than 100,000 germline mutations in more than 3700 genes have been associated with human inherited disease, with about 300 new disease genes and 10,000 mutations being identified annually [40]. In principle, DNA tests could be developed for each of these conditions. Development and approval of such tests for clinical diagnostics often require that accuracy be demonstrated on patient samples. That is, the test needs to detect the disease-causing mutation with high precision and accuracy. In addition, measures of proficiency are required by testing laboratories to ensure continued accuracy of the results. The Clinical Laboratory Improvement Advisory Committee has recently issued a good practices report for molecular genetic testing and has stressed the need for performance assessments [41]. Such recommendations are supportive of the 2008 Report of the Secretary's Advisory Committee on Genetics, Health, and Society (SACGHS) [42]. Additionally, the SACGHS has identified the development of genetic reference materials as one of five critical gaps in the oversight system of genetic testing [42]. 
The device approval process, use of reference and control materials, and ongoing proficiency assessments require relatively large quantities of clinical samples in order to assure test performance is adequate and maintained over time. The availability of sufficient samples with appropriate mutations has been recognized as a critical issue in genetic testing, given the paucity of validated clinical inherited disease gene samples [43]. Therefore, methods that can expand the limited supply of validated clinical samples would have major impacts on both the test and device performance evaluation and approval process, and the on-going proficiency assessment of the test providers.
The importance of the amount of template DNA for MDA-based WGA has been thoroughly discussed for SNP assays and STR genotyping in previous studies [28,31]. For example, a large number of SNPs could be accurately detected from as little as 0.01 ng of DNA template [27,28] and even degraded DNA samples can be used for forensic SNP typing [29], while over 100 ng of DNA template was needed for optimal STR genotyping [31]. In this study, we mainly focused on evaluating the uniformity and fidelity of WGA DNAs using CGH arrays, TaqMan copy number assays, and DNA sequencing. The results indicated the amplified DNA and its native unamplified DNA that we examined were similar, although not identical, in terms of DNA copy number variation. Importantly, no detectable introduced mutations were found under our experimental conditions. The Agilent Human CGH array used in this study includes approximately 970,000 probes at 1 to 2 kb intervals throughout the human genome. To ensure the quality of this study, four independent replications for both the WGA and aCGH processes were used. The self-self hybridization of unamplified Normal DNA samples provided us the base-line measurement of noise for aCGH technology. The box plot of unamplified Normal samples indicated less than 0.004% of the approximately 970,000 probes on the CGH array have fold changes greater than 1.5 and no CNVs can be detected by aCGH (Figure 2 and Table 3). The low Kolmogorov distances (0.025) and high Phi correlations (1.00) between the four replicates of unamplified samples further showed the aCGH technology was reproducible and could be a reliable tool to assess variability of the WGA process.
Primer locations (chromosome 7) in Table 10 are based on NCBI reference sequence NG_016465.1. "F", forward primer; "R", reverse primer.
The WGA technologies have evolved over the years from Taq DNA polymerase-based (PCR-based) to bacteriophage Φ29 DNA polymerase-based methods (MDA). We focused on MDA methods in this study because of the processivity and reportedly low replication error rate of the Φ29 DNA polymerase. In this study, we compared two MDA-based commercial WGA kits: the REPLI-g and GenomiPhi kits. The comparison between the unamplified and amplified DNA samples showed that relatively few probes (approximately 1% or less) were over- or under-amplified by more than 2-fold (Table 3). GenomiPhi-amplified DNA, however, showed consistently more CNVs than the REPLI-g-amplified DNA. Kolmogorov distances also indicated that differences in the distribution of DNA copy number existed between the unamplified and amplified samples (Table 2), and that these differences were greater than the differences between the REPLI-g and GenomiPhi amplified samples (Table 7). Thus, even though the kits differ in method of DNA denaturation (heat vs. alkaline), buffer composition, and reaction time (4 hr vs. 10 hr), they produce similar amplified DNA, likely the result of the common Φ29 DNA polymerase.
The data in this report indicate that the two WGA methods examined can consistently amplify small amounts of DNA (ng) to large quantities (~40 μg) with relatively small changes in DNA copy number along the chromosomes. Changes in DNA copy number of greater than 3-fold are evident as under-amplified regions at the ends of chromosomes, as illustrated in Figure 3, and in discrete regions on many of the chromosomes ( Table 4). Because of the consistency of the DNA amplification, knowledge of the positions of likely under-amplification can prevent the inappropriate use of WGA DNA. Results presented in Table 4 may serve as a reference guide to avoid target genes in these regions. Inspection of the GC content within these consistently under-amplified regions showed an average GC content of nearly 53% which was higher than the overall GC content of the human genome of 41%. Further investigations will be needed to evaluate the significance of this observation.
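The text does not state how GC content was computed for the under-amplified regions. As a minimal sketch, GC content can be taken as the percentage of G and C among the unambiguous bases of a region's sequence; the function name and example sequences below are hypothetical.

```python
def gc_content(sequence):
    """Percentage of G and C among unambiguous bases (A, C, G, T).

    Assumption: ambiguous or masked bases such as N are ignored rather
    than counted in the denominator.
    """
    bases = [b for b in sequence.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(1 for b in bases if b in "GC")
    return 100.0 * gc / len(bases)

# Toy sequences (not taken from the under-amplified regions in Table 4).
gc_rich = gc_content("GGCCAT")
with_gaps = gc_content("ATGCNNat")
print(f"{gc_rich:.1f}% and {with_gaps:.1f}%")
```

Applied to the reference sequence of each region in Table 4, a calculation of this kind would allow comparison against the genome-wide value of 41% cited above.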
Cystic fibrosis (CF) is caused by severe dysfunction of cystic fibrosis transmembrane conductance regulator (CFTR), which commonly leads to progressive lung disease and a shortened life [44]. Currently, there is no cure available for CF, even though multiple interventions have been developed to slow its progression. Since the FDA cleared the first genotyping assay for CF in 2005, manufacturers have developed a number of genotyping tests for rare heritable diseases caused by multiple mutant alleles, using patient samples to assess performance. There is an increased interest in using WGA DNA samples created from patient samples to support the performance of the assays. This study shows that within certain limits, the WGA process produces large quantities of DNA that may be useful for this purpose. To address concerns about changes introduced by WGA, TaqMan copy number assays were used to analyze CNVs at 10 locations within the CFTR gene region, and a randomly selected 2 kb region in the CFTR gene was also sequenced. Results from the TaqMan copy number assay are in excellent agreement with the aCGH results. The DNA sequencing analysis showed there were no mutations induced by WGA in this approximately 2 kb region of the CFTR gene (mutation induction was less than 4 × 10⁻⁵), which indicates that WGA DNAs can be used for enrichment of DNA samples for cystic fibrosis genotyping assays.
The objective of the study presented here was to determine whether WGA amplified samples may be a reliable alternative to native clinical specimens for assessing the performance of a test under investigation. The conclusions from this study provide scientific input that may serve to support regulatory decisions in the ascertainment of safety and effectiveness of diagnostic products that use whole genome amplified samples in clinical studies. This study may serve as a guide to the technical qualification of WGA DNA for assessing the performance of genotyping assays.

Conclusions
In summary, WGA generates large quantities of DNA with relatively high uniformity and low replication error rate when compared to unamplified DNA. This suggests that WGA DNA may be suitable for accurate genotyping. However, because there are consistent differences between the WGA DNA and the native unamplified DNA, characterization of the genomic region of interest, as described here, will be necessary to ensure the reliability of genotyping results from WGA DNA.

Methods
DNA Samples: DNA samples used in this study were purchased from PrecisionMed, Inc. (San Diego, CA) and are shown in Table 1. All three DNA samples were extracted from human blood and were dissolved in TE buffer (1 mM EDTA, 10 mM Tris at pH 8.0). The concentration and purity (A260/280 ratio) of chromosomal DNA was measured by the NanoDrop 1000 and PicoGreen methods (Life Technologies, Carlsbad, CA). DNA samples were aliquoted to four batches and stored at −20°C before use.
Whole Genome Amplification (WGA): Two WGA kits, GenomiPhi (GE Healthcare, Piscataway, NJ ) and REPLI-g (Qiagen, Inc., Hilden, Germany), were used in this study. 20 ng of DNA template was used for the amplification process using the manufacturers' recommended protocols (Illustra GenomiPhi HY DNA Amplification kit protocol and REPLI-g Mini/Midi Handbook). The GenomiPhi reaction was allowed to proceed for 4 hr while the REPLI-g reaction was allowed to proceed for 10 hr. Both kits use the same phage Φ29 DNA polymerase to amplify the DNA templates, although the buffer components are proprietary. The average yields after WGA with GenomiPhi and REPLI-g were 37.5 μg and 14.0 μg DNA, respectively.
Array Comparative Genome Hybridization (aCGH): The Agilent (Santa Clara, CA) 1 M human CGH array used in this study is based on NCBI Build 37 (UCSC) with 963,029 biological features and 6,685 controls. The majority of the probes on this array have 1 to 2 kb spacing along the human chromosomes. DNA labeling and hybridization were performed following the Agilent Oligonucleotide Array-Based CGH for Genomic DNA Analysis protocol (V 6.1, 2009). One μg of genomic DNA was labeled with fluorescent dyes (Cy3 or Cy5-dUTP). In this study, the amplified DNA samples were labeled with Cy3 and unamplified DNA samples were labeled with Cy5. Labeled DNAs with specific activity greater than 20 pmol of dye/μg DNA were used for aCGH. Labeled amplified (Cy3) and unamplified (Cy5) DNA samples were paired and co-hybridized to the arrays at 65°C for 40 hr, then washed at room temperature following the same Agilent protocol. The hybridized array was immediately scanned with an Agilent DNA Microarray Scanner (Agilent Technologies, Inc.) at 2 μm resolution. The resulting images were analyzed by quantifying the Cy3 and Cy5 fluorescence intensity at each feature on the array using the Agilent Feature Extraction Software (V10.5). The fluorescence intensity of each pixel within the feature was determined and the median fluorescence of these pixel measurements was taken as the measure of fluorescence for the whole feature after subtraction of background. Dye bias was removed by linear normalization using the Agilent Feature Extraction Software before the intensity values were used to calculate ratios at each feature.
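The per-feature quantification described above (median pixel intensity, background subtraction, then a Cy3/Cy5 ratio) can be sketched as follows. This is an illustration only and not the Agilent Feature Extraction algorithm: the dye-bias normalization step is omitted, the handling of non-positive signals is an assumption, and all names and pixel values are hypothetical.

```python
import math
from statistics import median

def feature_signal(pixels, background):
    """Median pixel intensity of a feature after background subtraction."""
    return median(pixels) - background

def feature_log2_ratio(cy3_pixels, cy3_background, cy5_pixels, cy5_background):
    """log2 of amplified (Cy3) over unamplified (Cy5) signal for one feature.

    Assumption: features whose background-subtracted signal is not positive
    in either channel are reported as None instead of a ratio.
    """
    cy3 = feature_signal(cy3_pixels, cy3_background)
    cy5 = feature_signal(cy5_pixels, cy5_background)
    if cy3 <= 0 or cy5 <= 0:
        return None
    return math.log2(cy3 / cy5)

# Hypothetical pixel intensities for a single feature.
ratio = feature_log2_ratio(cy3_pixels=[210, 200, 190], cy3_background=100,
                           cy5_pixels=[310, 300, 290], cy5_background=100)
print(ratio)  # -1.0: Cy3 signal is half the Cy5 signal for this feature
```

Because the amplified DNA is in the Cy3 channel, a negative log2 ratio in this convention indicates under-amplification at that feature.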
aCGH Data Analysis: Copy number variation (CNV) was calculated at each locus along each chromosome as the ratio of the sample (either unamplified or amplified) to the unamplified sample intensities. A modified algorithm [45] of the circular binary segmentation smoothing method (CBS, [46]) was used to partition these ratio measurements into chromosomal regions containing loci with equal copy numbers. Regions with ratios less than 2 were then defined as being unchanged (no CNV). Two approaches were used to measure the similarity of CNV patterns. To measure the difference between the distributions of ratios from different experimental groups, Kolmogorov distances were calculated, which are the sup-distances (the largest absolute differences) between the distribution functions [47]. Low values indicate little difference between distributions. Phi correlation, which is a measure of association for two binary variables [48], was calculated to measure the consistency of locations of the CNVs. A low Kolmogorov distance coupled with a high Phi correlation indicates highly uniform amplification. Agilent Genomic Workbench 6.0 software (Agilent Technologies, Inc.) was also used to analyze CNVs along the chromosomes utilizing the CBS module. Raw data were imported into Agilent Genomic Workbench 6.0 for CNV analysis. Before analysis, the four replicates for each sample were combined in Agilent Genomic Workbench using weights proportional to the quality of each replicate. Average aberration reports (minimum of 3 consecutive probes within each section) for Normal, CF1, and CF2 after WGA were generated using the CBS method with a fold change cut-off at 3-fold. Cytobands with aberrations greater than 3-fold across the six amplified samples are summarized in Table 4.
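The two similarity measures described above can be sketched in a few lines. This is a minimal illustration, not the analysis code used in the study: it assumes the Kolmogorov distance is computed between empirical distribution functions of per-probe log2 ratios, that CNV calls are binary per probe (fold change above 2 in either direction), and that an undefined Phi (no variation in one of the vectors) is returned as NaN. All function names and toy data are hypothetical.

```python
import math
from bisect import bisect_right

def kolmogorov_distance(a, b):
    """Largest absolute difference between two empirical distribution functions."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    if not a_sorted or not b_sorted:
        raise ValueError("both samples must be non-empty")
    largest = 0.0
    # Both functions are step functions, so the supremum occurs at a data point.
    for x in a_sorted + b_sorted:
        fa = bisect_right(a_sorted, x) / len(a_sorted)
        fb = bisect_right(b_sorted, x) / len(b_sorted)
        largest = max(largest, abs(fa - fb))
    return largest

def cnv_calls(log2_ratios, fold=2.0):
    """Binary CNV call per probe: 1 if the fold change exceeds `fold`, else 0."""
    threshold = math.log2(fold)
    return [1 if abs(r) > threshold else 0 for r in log2_ratios]

def phi_correlation(x, y):
    """Phi coefficient for two equal-length binary vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have equal length")
    n11 = sum(1 for p, q in zip(x, y) if p and q)
    n10 = sum(1 for p, q in zip(x, y) if p and not q)
    n01 = sum(1 for p, q in zip(x, y) if not p and q)
    n00 = sum(1 for p, q in zip(x, y) if not p and not q)
    denominator = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denominator == 0:
        return float("nan")  # assumption: undefined when a vector has no variation
    return (n11 * n00 - n10 * n01) / denominator

# Toy log2 ratios for two hypothetical replicate amplifications.
replicate_1 = [0.1, -1.4, 0.0, 1.2, -0.3, -1.8]
replicate_2 = [0.0, -1.1, 0.2, 0.4, -0.2, -1.6]
distance = kolmogorov_distance(replicate_1, replicate_2)
phi = phi_correlation(cnv_calls(replicate_1), cnv_calls(replicate_2))
print(f"Kolmogorov distance: {distance:.3f}, Phi correlation: {phi:.3f}")
```

In this reading, the Kolmogorov distance summarizes whether two arrays have the same overall spread of ratios, while the Phi correlation checks whether the CNV calls fall at the same probe positions.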
Quantitative Real Time PCR Copy Number Assays: Unamplified and amplified genomic DNAs (10 ng) were used as templates in TaqMan Copy Number Assays (Applied Biosystems, Carlsbad, CA) according to the manufacturer's protocol. Ten probes, distributed across the length of the CFTR gene as illustrated in Figure 4, were used to quantify copy number in each DNA sample using RNase P as the reference assay. The Applied Biosystems probe IDs and locations in the CFTR gene can be found in Table 9. Cycle threshold (Ct) values were exported into Applied