Normalization of array-CGH data: influence of copy number imbalances
 Johan Staaf^{1},
 Göran Jönsson^{1},
 Markus Ringnér^{1} and
 Johan Vallon-Christersson^{1}
DOI: 10.1186/1471-2164-8-382
© Staaf et al; licensee BioMed Central Ltd. 2007
Received: 15 June 2007
Accepted: 22 October 2007
Published: 22 October 2007
Abstract
Background
High-resolution microarray-based comparative genomic hybridization (CGH) techniques have successfully been applied to study copy number imbalances in a number of settings such as the analysis of cancer genomes. For normalization of array-CGH data, methods initially developed for gene expression microarray analysis have, in general, been directly adopted and used. However, these methods are designed to work under assumptions that may not be valid for array-CGH data when copy number imbalances are present. We therefore sought to investigate the effect on normalization imposed by copy number imbalances.
Results
Here we demonstrate that copy number imbalances correlate with intensity in array-CGH data, thereby causing problems for conventional normalization methods. We propose a strategy to circumvent these problems by taking copy number imbalances into account during normalization, and we test the proposed strategy using several data sets from the analysis of cancer genomes. In addition, we show how the strategy can be applied to conveniently define adaptive sample-specific boundaries between balanced copy number, losses, and gains to facilitate management of variation in tissue heterogeneity when calling copy number changes.
Conclusion
We highlight the importance of considering copy number imbalances during normalization of array-CGH data, and show how failure to do so can deleteriously affect data and hamper interpretation.
Background
Microarray-based techniques for genome-wide investigation of copy number aberrations (CNAs) have recently gained much attention. Initially employing arrays developed for gene expression analysis [1], or low-density arrays produced from large-insert genomic clones such as bacterial artificial chromosomes (BACs) [2], the application has evolved rapidly. Currently, specialized high-density arrays with oligonucleotide probes or probes derived from BAC clones are predominantly used. Two-channel array-based comparative genomic hybridization (aCGH) is a direct successor to conventional metaphase CGH [3]. In both cases, DNA from two samples is differentially labeled with fluorescent dyes and co-hybridized to immobilized genomic capture probes. By use of aCGH, DNA derived from tumor tissue can be compared with reference DNA, e.g., normal whole blood DNA, and genomic imbalances can effectively be investigated. The main advantage of aCGH over conventional CGH is the increased resolution achieved by microarrays with a large number of individual probes, routinely up to hundreds of thousands, covering the entire genome [4]. The power of aCGH has been demonstrated in tumor studies [5–8], as well as in the field of clinical genetics [9], and the basis of the technique is reviewed elsewhere [10]. In essence, relative ratios of copy number between two DNA samples are obtained by comparing the two fluorescent signal intensities for each probe under the assumption that intensities reflect the amount of corresponding genomic DNA in the respective sample.
Using self-self comparisons, in which a sample is compared with itself, it has been observed that other forms of technical bias, e.g., spatial or plate bias, exist that can skew measured M values enough to invalidate the aforementioned normalization methods [17]. Both methods have therefore been implemented in ways that include stratification of M values into groups of data that are individually subjected to the correction. Stratification can be performed based on, e.g., spatial probe location or probe source [17]. The general thought is that stratification will result in groups, i.e., populations, of data in which the validity of the normalization method is upheld. It has also been observed that the assumptions required for conventional normalization methods to work can fail as a result of a true biological distribution of M, e.g., in situations where the majority of probes measure true differences between compared samples [18].
We here highlight a well-known and commonly displayed property of tumor cells, namely the presence of biologically true CNAs. Figure 1b shows a genome plot of raw M values obtained by aCGH of a female breast cancer tumor xenograft [19] compared with male normal whole blood DNA. In the genome plot, M is plotted as a function of the genomic location of the probe sequence. In figure 1b, several genomic regions with different and discrete M can readily be observed. We sought to investigate the effect on normalization imposed by this property of aCGH data. We show that this property results in consequential drawbacks when using conventional normalization methods, and we propose a strategy that incorporates any populations present in the data into the normalization.
The proposed strategy can be integrated with any of several existing normalization methods and results in improved data quality. Also, spatial effects resulting in non-biological, but relevant, populations that can bias normalization are handled when calculating corrections. We also note that part of the procedure can be applied to assign adaptive sample-specific thresholds for calling copy number changes. The proposed normalization strategy, as well as the adaptive sample-specific level scaling, provides powerful and convenient means for improved copy number analysis using aCGH.
Results and Discussion
This study is outlined as follows with results and discussion presented accordingly. To investigate the influence of copy number imbalances on normalization we first created a set of mimicked data representing states of an increasing fraction of genomic gain. Using the mimicked data we demonstrate the effects of gain on normalization using Median and Lowess. We then evaluated an alternative normalization strategy in which data is stratified into separate populations representing gain and balanced copy number respectively. Whereas mimicked data provide prior knowledge facilitating stratification, most experiments lack this information. Therefore, we developed a method for stratification of data and evaluated the method using previously characterized cases. By applying our procedure for stratification and normalization to tumor specimens on different aCGH platforms we compare performance with standard methods. We investigate the implication of technical spatial effects and propose a strategy for improved normalization. In addition, we evaluate the possibility to apply our method to assess noise levels in data and assign samplespecific thresholds for detection of copy number imbalances.
Normalization of aCGH data using Median
Genomic imbalances correlate with intensity in aCGH data
Normalization of aCGH data using population-based intensity-based lowess
We sought to develop a method that corrects for intensity dependence of M due to technical bias while retaining intensity dependence of biological relevance. We reasoned that if we could stratify aCGH ratios from an experiment with respect to copy number populations, we could use this information to circumvent the drawbacks of Lowess. One way to do this would be to run Lowess on one selected population and then apply the resulting correction line to all M values. We refer to this general strategy of considering copy number populations when using Lowess as population-based intensity-based lowess (popLowess). Applying popLowess would serve two purposes. Firstly, data would be centered at a copy number population rather than at a mean or median of a mixture of different and possibly diverse copy number levels. Secondly, correlations between M and A related to technical bias would be identified and corrected for without affecting the intensity dependence due to different copy numbers. To test this strategy, we subjected the mimicked XXX/XX data sets to popLowess. Since we had prior knowledge about this case we could stratify values into copy number populations based on chromosome mapping. All values for autosomes were considered to comprise one population and all values from the X chromosome another.
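The first of these two purposes, centering on one copy number population instead of on a mixture, can be illustrated with a minimal sketch. The values are invented for illustration (a balanced population and a gained population near log2(3/2), both carrying the same technical offset) and are not taken from the mimicked XXX/XX data; the function names are hypothetical.

```python
import math
from statistics import median

def center_global(m_values):
    """Median normalization: subtract the median of all M values."""
    shift = median(m_values)
    return [m - shift for m in m_values]

def center_on_population(m_values, labels, chosen):
    """Subtract the median of one chosen population from all M values."""
    shift = median(m for m, lab in zip(m_values, labels) if lab == chosen)
    return [m - shift for m in m_values]

# Hypothetical data: 5 balanced probes and 5 gained probes (3 vs 2 copies),
# all carrying the same technical offset of 0.10 in M.
offset = 0.10
gain = math.log2(3 / 2)
m_raw = [offset] * 5 + [offset + gain] * 5
labels = ["balanced"] * 5 + ["gain"] * 5

m_global = center_global(m_raw)                          # centered on the mixture
m_pop = center_on_population(m_raw, labels, "balanced")  # centered on one population

print("global median:  balanced at", round(m_global[0], 3), "gain at", round(m_global[-1], 3))
print("population:     balanced at", round(m_pop[0], 3), "gain at", round(m_pop[-1], 3))
```

With half of the probes gained, the global median falls between the two populations, so neither ends up at M = 0; centering on the balanced population leaves the gain at its expected log2 ratio. The sketch covers centering only, not the intensity-dependent part of the correction.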
Stratification of M values into copy number populations
Comparison of popLowess enriched population assignment to karyotyping data for eight hyperdiploid cases [21]
Case   Gain (called/karyotype)*   Diploid (called/karyotype)**   Gain (fraction of karyotype called)***
1      0.97                       1.05                           0.99
2      0.87                       1.09                           1.00
3      1.00                       1.00                           0.85
4      1.14                       0.88                           0.85
5      0.89                       1.21                           0.99
6      0.97                       1.08                           1.00
7      0.95                       1.11                           0.80
8      1.09                       0.76                           0.63
A procedure for normalization of aCGH data using popLowess
Once data is stratified into sets of enriched copy number populations we can select one, e.g., the largest, to perform Lowess normalization on. The generated correction curve must be generalized to cover the full range of A, allowing for correction of all M values (Figure 6e). This procedure ensures that the lowess-derived correction line trails one population and remains unaffected by adjacent ones. We refer to this action as popLowess-o (where the letter o is a mnemonic for one) as it makes use of one population to derive a correction line for all data. The complete procedure of data stratification and popLowess normalization is shown in figure 5, steps 1–8. Naturally, once data is stratified, alternative variants of calculating normalization corrections are imaginable. For example, one could fit lowess lines to each population and correct them individually, or one could individually center populations and then use the combined data to create a lowess-derived correction line. We refer to these alternatives as popLowess-i (where the letter i is a mnemonic for individual) and popLowess-c (where the letter c is a mnemonic for common), respectively. The latter alternative has the added advantage of reducing the degree to which the correction line needs to be extrapolated to cover the full range of A. Both alternatives require an additional step to center a selected copy number population at M = 0. The variants popLowess-o and popLowess-c rely on the assumption that the intensity-based curvature in MA space is reasonably shared between populations.
Selecting a population to represent intrinsic copy number
The normalization procedure presented herein will center a population with unknown copy number at M = 0. The rationale for selecting an appropriate population for this purpose can differ depending on the samples analyzed and the aim of a project. For instance, in the field of cytogenetics, gains and losses in tumors are by convention described as net changes relative to intrinsic balanced copy number, i.e., relative to ploidy. As the number of centromeres determines ploidy, a parallel rationale would be to relate imbalances to the largest identified population and therefore center this population at M = 0. However, in some applications it might be more appropriate to relate imbalances to a normal diploid state. Thus, selecting a population to center data at can include using prior knowledge about regions with known copy number or selecting the middle population out of three, if present. Irrespective of how data are best centered, the proposed popLowess procedure will alleviate the normalization problems related to mixed copy number populations. Importantly, when performing focused aCGH with specialized arrays that do not cover the entire genome, or that comprise probes with a disproportionate focus on specific genomic regions, even CNAs that affect a minor part of the genome can introduce a significant correlation between copy number and intensity, and can result in misinterpretations of how a given ratio level relates to copy number.
Application to tumor specimens on different aCGH platforms
Comparison of popLowess strategy to standard normalization methods
We set out to test if the popLowess strategy could systematically reduce variation in M within copy number populations in different aCGH data sets. We hypothesized that a larger variation in M is obtained after normalization when correction curves cross, or do not accurately track, copy number populations, or when intensity-based curvature is not properly addressed. To this end, we compared the performance of the popLowess strategy versus Median and Lowess using seven different aCGH data sets (data sets 1–6, 8). The data sets cover three different types of aCGH platforms hybridized with a variety of cell line and tumor samples displaying a large variation of CNAs.
We used the strategy in figure 5 to identify copy number populations in each of the data sets. We then normalized each data set in parallel using popLowess, Lowess, and Median. After normalization, we calculated standard deviations of M for each identified population for each method and compared results.
Comparison of effect on population variance between different normalization strategies
P-values for data sets
Data set                                 1 [23]     2 [25]     3 [20]     4 [8]      5               6              8 [21]
Nbr of samples                           7          28         10         52         8               8              8
Platform                                 BAC 32 K   BAC 32 K   BAC 32 K   BAC 1 Mb   Agilent 244 K   Agilent 44 K   BAC 32 K
popLowess vs Lowess   All populations    1.1e-4     7.0e-12    5.6e-8     3.4e-28    2.5e-05         1.6e-4         7.2e-5
                      Population 1       7.8e-3     1.4e-5     9.8e-4     9.9e-32    2.0e-3          2.0e-3         3.9e-3
                      Population 2       7.8e-3     1.5e-6     9.8e-4     7.4e-4     2.0e-3          0.09           3.5e-2
                      Population 3       0.23       6.3e-3     2.0e-2     2.5e-7     0.25            0.09           0.25
popLowess vs Median   All populations    < 1e-32    < 1e-32    < 1e-32    < 1e-32    < 1e-32         < 1e-32        < 1e-32
                      Population 1       7.8e-3     3.7e-9     9.8e-4     9.9e-32    2.0e-3          2.0e-3         3.9e-3
                      Population 2       6.3e-2     1.4e-5     0.17       3.8e-2     0.50            0.09           0.14
                      Population 3       0.23       9.0e-5     0.25       0.28       0.25            0.50           0.75
Since we do not have prior knowledge of CNAs in most of the cases, we cannot evaluate variation within confirmed genomic regions of similar copy number. Therefore, one could argue that the better performance of popLowess, resulting in lower variation within populations when compared with conventional normalization, is biased by the fact that populations are inferred from the data. However, from looking at the data in table 1, and at the genome plots in figure 7 (panels g and h), we note that the identified populations reflect regions with discrete copy number levels. Therefore, we argue that decreased intra-population variation is beneficial to both interpretation and downstream analysis and provides improved data quality.
Spatial effects
Presence of technical artifacts in array data resulting in correlation between M and spatial probe location on the array is a well-known and previously described phenomenon. We focused on two plausible consequences of such spatial effects in aCGH data. Firstly, affected values can introduce populations that compromise normalization in the same way as copy number populations. Secondly, affected values will be incorrectly scaled compared to non-affected values.
We reasoned that ratios biased by spatial artifacts are controlled for by our proposed popLowess strategy as it filters outlier data guided by genomic mapping. Thus, when calculating an intensity-dependent correction for normalization, our strategy would not be compromised by spatial bias as affected values are disregarded together with values from break points, high-level amplifications, and homozygous deletions. On the other hand, popLowess does not correct for spatial effects, and affected values would remain incorrectly scaled after normalization even if the intensity bias is removed.
As the proposed popLowess strategy does not correct for spatial effects, we reasoned that a pre-normalization step might be appropriate for data displaying spatially related bias in order to properly scale affected values. This could be accomplished by applying one of many available spatial correction methods [15–17], or variations thereof, prior to popLowess. However, since we have shown that genomic imbalances correlate with intensity, we are cautious about addressing spatial effects using pre-normalization algorithms that are intensity-based.
To test our reasoning we applied popLowess to data set 7. Samples in this set have little to no genomic alterations, but the data display variation in MA curvature and spatial effects. Data set 7 was normalized using popLowess, block-based Median followed by popLowess, or block-based Lowess followed by popLowess. For popLowess, by itself or in combination with a pre-normalization step, a merge cluster criterion of 0.3 in M was employed to account for the presence of only two copy number populations.
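The block-based Median pre-normalization is not specified in detail here. As a rough sketch, assuming it amounts to subtracting the median M of each array block from the M values in that block, it could look as follows; the function name and the block labels are hypothetical.

```python
from collections import defaultdict
from statistics import median

def block_median_center(m_values, blocks):
    """Center M within each array block by subtracting that block's median M.

    m_values: M value per probe; blocks: block identifier per probe (same order).
    Assumption of this sketch: a block is any hashable label, e.g. a print-tip index.
    """
    by_block = defaultdict(list)
    for m, b in zip(m_values, blocks):
        by_block[b].append(m)
    shifts = {b: median(values) for b, values in by_block.items()}
    return [m - shifts[b] for m, b in zip(m_values, blocks)]

# Hypothetical example: block 1 carries a spatial offset of +0.4 in M.
m_demo = [0.1, 0.2, 0.3, 0.5, 0.6, 0.7]
block_demo = [0, 0, 0, 1, 1, 1]
m_centered = block_median_center(m_demo, block_demo)
```

Because this step does not use intensity, it is the kind of pre-normalization that avoids the concern raised above about intensity-based algorithms; it would still shift a block whose probes happen to be dominated by one copy number level.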
Effect of pre-normalization to correct spatial bias prior to applying popLowess
Sample        unnormalized*   popLowess**   pre-normalization by block-based Median***   pre-normalization by block-based Lowess****
XY vs XY      0.062           0.062         0.009                                        0.003
XY vs XY      0.067           0.067         0.003                                        0.003
XX vs XX      0.097           0.069         0.035                                        0.003
XX vs XX      0.125           0.127         0.031                                        0.004
XXX vs XX     0.051           0.049         0.003                                        0.004
XX vs XY      0.046           0.045         0.003                                        0.003
XXXX vs XX    0.060           0.033         0.004                                        0.004
XXX vs XY     0.058           0.058         0.007                                        0.005
XXXX vs XY    0.060           0.058         0.008                                        0.004
We conclude that the proposed popLowess strategy is robust in the sense that it can handle the presence of otherwise deleterious populations without relying on them. We also conclude that, whereas popLowess is inert to spatial effects in the sense that they do not compromise calculation of an intensity-dependent correction, a pre-normalization step that corrects for spatial bias is warranted.
Adaptive sample-specific thresholds for calling copy number change
During development of the popLowess strategy, we recognized that the sample-specific cutoff value (Figure 5, step 3) could be used to assess noise level in data and to assign thresholds for copy number imbalances on a sample-specific basis. Several reports [5, 8, 22, 23] have utilized global thresholds in M for calling CNAs as gains or losses. These thresholds are assigned by adding or subtracting a value in M from a baseline, typically at M = 0. Determining suitable thresholds may be problematic in large sample sets with samples of varying quality and heterogeneity, as is often the case for tumor studies [10], and may result in setting overly conservative thresholds for certain samples in order to avoid erroneous CNA calls. Deriving sample-specific threshold values, scalable for desired stringency, in an automated fashion is then of relevance.
A parallel can be made to the derivative log ratio spread (DLR) value calculated by the Agilent CGH Analytics software. The DLR value can be used to assess hybridization quality and provide a sample-scalable threshold for calling CNAs using, e.g., the Z-scoring algorithm in the CGH Analytics software.
Normalization affects downstream analysis
Conclusion
We show that the presence of copy number populations in aCGH data deleteriously affects normalization using curve-generating algorithms such as intensity-based lowess and may cause erroneous centering of data. We demonstrate that genomic imbalances correlate with intensity in aCGH data and therefore must be accounted for during normalization in order to correct for intensity dependence of M due to technical bias while retaining intensity dependence of biological relevance. Here we propose a population-based normalization strategy that accounts for the presence of copy number populations. We show that benefits of a population-based normalization approach are clearly recognized for data displaying numerous CNAs. We also demonstrate that the proposed procedure can be applied to assign adaptive sample-specific thresholds for calling copy number changes. We appreciate that the suggested strategy represents only one conceivable way of implementing population-based normalization and that any implementation that effectively discerns copy number populations in aCGH data, whether utilizing prior knowledge regarding samples or inference from the data itself, could be used. In addition, once copy number populations are identified, this information can be used in a variety of ways to circumvent highlighted problems related to conventional normalization of aCGH data. Taken together, we demonstrate that copy number populations in aCGH data should be accounted for during normalization and that the proposed normalization strategy, as well as the adaptive sample-specific level scaling, provides powerful and convenient means for improved copy number analysis using aCGH.
Methods
Data sets
We used eight data sets derived from BAC arrays and from Agilent 244 K oligonucleotide CGH arrays to evaluate normalization methods. Data set 1 consists of seven breast cancer cell lines analyzed using tiling 32 K BAC arrays [23]. Data set 2 consists of 28 lung cancer cell lines analyzed using tiling 32 K BAC arrays [25]. Data set 3 consists of ten breast cancer cell lines analyzed using tiling 32 K BAC arrays [20]. Data set 4 consists of 52 breast cancer tumors analyzed in dye-swaps on 1 Mb BAC arrays [8]. Data set 5 consists of 8 breast cancer tumors and one dye-swap analyzed using Agilent 244 K oligonucleotide CGH arrays [26]. These tumors displayed DLR values between 0.196 and 0.364 when analyzed with Agilent CGH Analytics software ver 3.4.27 [26]. Data set 6 was created from data set 5 by matching the oligonucleotide probe IDs from the 244 K arrays to the Agilent 44B probe IDs available through Agilent eArray [27], thus creating a virtual 44 K oligonucleotide CGH array. Of 42,447 genome-mapped probe IDs on the 44B array, 41,599 were found on the 244 K arrays (98%). Data set 7 consists of nine hybridizations of chromosome X aberrant cell lines with karyotype 47, XXX and 48, XXXX, and male 46, XY and female 46, XX samples in various combinations [20]. Samples in data set 7 are expected to display a normal karyotype for chromosomes 1–22. Data set 8 consists of eight hyperdiploid childhood ALL cases analyzed using tiling 32 K BAC arrays [21].
Pre-filtering and conventional normalization of aCGH data
All data sets were loaded into BioArray Software Environment (BASE) [28] for analysis. Positive and non-saturated spots were background corrected using the median foreground minus the median background signal intensity for each channel, and log ratios (M) were calculated from the background corrected intensities. In all analyses we used M = log_{2}(int1/int2) and A = log_{10}(sqrt(int1*int2)), where int1 and int2 are background corrected intensities from the investigated sample and reference, respectively. Data sets 1–4 and 7–8 were filtered for signal-to-noise ratio for each spot in both channels according to published reports, and the remaining data sets for signal-to-noise ratio > 5 in both channels, before BASE-implemented software plugins of the different normalization strategies were employed. A lowess smooth factor of 0.33, delta of 0.1, and four iterations were used for standard Lowess, popLowess and block-based lowess normalization. Block group size was set to 1 for all block-based normalizations.
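As a small worked example of the two quantities defined above, the following sketch computes M and A for one probe. The intensities are invented, and the function name is hypothetical.

```python
import math

def m_and_a(int1, int2):
    """Return (M, A) for background-corrected intensities.

    int1: investigated sample, int2: reference.
    M = log2(int1/int2), A = log10(sqrt(int1*int2)).
    """
    if int1 <= 0 or int2 <= 0:
        raise ValueError("intensities must be positive after background correction")
    m = math.log2(int1 / int2)
    a = math.log10(math.sqrt(int1 * int2))
    return m, a

# Hypothetical probe: sample signal 3000, reference signal 2000 (ratio 3:2).
m_value, a_value = m_and_a(3000.0, 2000.0)
print(round(m_value, 3), round(a_value, 3))
```

A 3:2 intensity ratio gives M = log2(1.5), about 0.585, which is the value expected for a single-copy gain against a two-copy reference if intensities scale linearly with DNA amount.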
Population-based intensity-based lowess
A schematic overview of the proposed popLowess normalization strategy is shown in figure 5. The approach is applied on a per sample basis starting with genomic mapping and raw intensities (int1 and int2) for N probe IDs (step 1, Figure 5). The probes are sorted according to genomic position, and M and A are calculated for each probe (step 2, Figure 5). Next, a standard deviation in M is calculated for each probe in sliding windows of user-defined size along the genome. The resulting distribution of N standard deviations is subjected to a cutoff criterion generating K probes with standard deviations < cutoff for continued population analysis (step 3, Figure 5). A moving window size of 11 probes was used, and the median of the standard deviation distribution was used as cutoff value. This selection criterion is sample adaptive, avoiding problems with using a global cutoff criterion. The K selected probes are next segmented on a per chromosome basis using, e.g., the CGHplotter algorithm [14] or the faster circular binary segmentation (CBS) algorithm [13] (step 4, Figure 5). Herein, the segmentation algorithm proposed by Autio et al. was used with the constant for computing the number of changes (c-parameter) set to 10 [14]. Segmented values are used to cluster the K probes into three distinct clusters by means of robust k-means clustering (step 5, Figure 5). After clustering, there is an option to merge clusters with cluster centers close to each other. Merging is typically useful for samples not displaying three populations, e.g., samples with 1 or 2 copy number populations. When indicated, a merge cluster criterion of 0.2 or 0.3 in M was used. The resulting data consist of 1–3 distinct populations that contain information about the genomic mapping, M, and A for each probe. The largest population is selected for lowess normalization [29], generating a population-specific correction curve (step 6, Figure 5).
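Step 3 can be sketched as follows. This is a minimal illustration and not the BASE plugin code: the handling of window edges (truncation) and the use of the sample standard deviation are assumptions of the sketch, and the function names and demo profile are hypothetical.

```python
from statistics import median, stdev

def window_sd(m_sorted, window=11):
    """Standard deviation of M in a window centred on each probe.

    m_sorted: M values ordered by genomic position (needs at least 2 probes).
    Windows are truncated at the ends of the profile (assumption of this sketch).
    """
    half = window // 2
    n = len(m_sorted)
    sds = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        sds.append(stdev(m_sorted[lo:hi]))
    return sds

def select_quiet_probes(m_sorted, window=11):
    """Return (indices of probes with SD below the median SD, the cutoff value)."""
    sds = window_sd(m_sorted, window)
    cutoff = median(sds)  # sample-adaptive cutoff: median of the SD distribution
    keep = [i for i, sd in enumerate(sds) if sd < cutoff]
    return keep, cutoff

# Hypothetical profile: 20 probes near M = 0, then 20 probes near M = 0.58,
# each with a small alternating noise term of +/- 0.01.
m_demo = [0.01 * (-1) ** i for i in range(20)]
m_demo += [0.58 + 0.01 * (-1) ** i for i in range(20)]
keep, cutoff = select_quiet_probes(m_demo)
print(keep, round(cutoff, 4))
```

In this toy profile the probes whose windows span the level shift receive a large standard deviation and are excluded, which is the behavior described for break points; the probes in truncated edge windows are also dropped here, a side effect of the edge handling chosen for the sketch.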
The correction curve is next extrapolated to the entire range of A and used to correct M for all N reporters, similar to Lowess (step 7, Figure 5). The extrapolation is done conservatively at the end points of A by using the first/last data point of the population-specific correction curve to level out the global correction curve horizontally in the MA plot, thereby moderating the impact of extreme points or missing values. After lowess correction, one population is selected as the center population and all data is shifted such that this population obtains median M equal to 0. Selection of a center population can be based on different assumptions. Finally, the normalized int1 and int2 intensities are returned (step 8, Figure 5). By not segmenting the entire set of observations, and by setting the crucial segmentation parameters for detecting breakpoints in the lower scale, speed is gained while still retaining robustness as long as the standard deviation cutoff is not set too low. The purpose of segmentation is to refine large regions with identical copy number and not to detect small complex copy number alterations.
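Steps 6 and 7 and the final centering can be sketched as below. The smoother is a plain tricube-weighted local linear fit written for this illustration; it omits the robustness iterations and the delta shortcut of the R lowess function, so it is a simplified stand-in rather than the method used in the paper. All function names and the demo data are hypothetical.

```python
import math
from bisect import bisect_left
from statistics import median

def lowess_fit(x, y, frac=0.33):
    """Tricube-weighted local linear fit evaluated at each x (no robustness iterations)."""
    n = len(x)
    k = min(n, max(2, int(round(frac * n))))
    fitted = []
    for x0 in x:
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1e-12  # bandwidth: distance to the k-th nearest point
        w = [max(0.0, 1 - (abs(xi - x0) / h) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted

def extend_curve(a_pop, curve, a_all):
    """Evaluate the population curve at any A; hold it flat outside the fitted range."""
    pts = sorted(zip(a_pop, curve))
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    out = []
    for a in a_all:
        if a <= xs[0]:
            out.append(ys[0])        # level out horizontally below the range
        elif a >= xs[-1]:
            out.append(ys[-1])       # level out horizontally above the range
        else:
            j = bisect_left(xs, a)   # xs[j-1] < a <= xs[j]
            t = (a - xs[j - 1]) / (xs[j] - xs[j - 1])
            out.append(ys[j - 1] + t * (ys[j] - ys[j - 1]))
    return out

def pop_lowess(m, a, in_pop, frac=0.33):
    """Fit on the selected population, correct all M, then center that population at 0."""
    a_pop = [ai for ai, keep in zip(a, in_pop) if keep]
    m_pop = [mi for mi, keep in zip(m, in_pop) if keep]
    curve = lowess_fit(a_pop, m_pop, frac)
    correction = extend_curve(a_pop, curve, a)
    m_corr = [mi - ci for mi, ci in zip(m, correction)]
    shift = median(mc for mc, keep in zip(m_corr, in_pop) if keep)
    return [mc - shift for mc in m_corr]

# Hypothetical data: a linear technical bias in M shared by both populations.
def bias(a_val):
    return 0.2 * (a_val - 2.5)

a_bal = [2.0 + 0.05 * i for i in range(21)]
a_gain = [2.2 + 0.1 * j for j in range(6)]
a_all = a_bal + a_gain
m_all = [bias(v) for v in a_bal] + [bias(v) + math.log2(1.5) for v in a_gain]
in_pop = [True] * len(a_bal) + [False] * len(a_gain)
m_norm = pop_lowess(m_all, a_all, in_pop)
```

In this constructed case the balanced population ends up at M = 0 and the gained probes keep their log2(3/2) offset, because the curve is fitted without them. The demo deliberately uses a bias that is shared between populations, which is the assumption the text attaches to deriving one correction line for all data.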
Comparison of normalization methods
For comparisons, the R-implemented lowess function was used to create lowess-normalized data. For each identified population (steps 1–5, Figure 5) in every sample in data sets 1–6 and 8, the standard deviations in M of the reporters in the population after Lowess, popLowess, and no normalization (equal to Median) were calculated separately. The number of populations in a data set for which the popLowess strategy rendered a lower standard deviation compared to the competitor was calculated. To evaluate if popLowess resulted in a significant number of populations with lower standard deviations, one-sided p-values were calculated using the binomial distribution with p = 0.5. This binomial test corresponds to the null hypothesis that lower standard deviations for popLowess are obtained by chance. This comparison was done both when studying all populations as a whole and for each population individually.
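The one-sided binomial p-value described above can be computed directly. The sketch below assumes the p-value is the probability of observing at least the counted number of popLowess "wins" out of n comparisons under p = 0.5; the function name and the example counts are hypothetical.

```python
from math import comb

def binomial_one_sided_p(wins, n, p=0.5):
    """P(X >= wins) for X ~ Binomial(n, p).

    wins: number of populations where popLowess gave the lower standard deviation.
    n:    number of populations compared.
    """
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(wins, n + 1))

# Hypothetical counts: popLowess better in 7 of 7, and in 6 of 8, comparisons.
p_all_seven = binomial_one_sided_p(7, 7)
p_six_of_eight = binomial_one_sided_p(6, 8)
print(p_all_seven, p_six_of_eight)
```

With seven comparisons the smallest attainable p-value is 0.5^7, about 7.8e-3, so small data sets bound how significant a per-population result can be regardless of effect size.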
Sample adaptive gain/loss thresholds
Sample adaptive thresholds for calling gain or loss can be generated by performing steps 1–3 in Figure 5 using the same form of data input and standard deviation cutoff criteria. The identified standard deviation cutoff value can be scaled by multiplicative factors to generate sample-specific gain/loss thresholds of desired stringency for downstream applications, e.g., calling CNAs after segmentation. Before creating sample adaptive thresholds, data was pre-filtered and normalized using the popLowess strategy. Sample adaptive thresholds for the Ca13928 breast tumor were created before and after applying a smoothing window of 250 kbp for 32 K BAC data and 50 kbp for Agilent 244 K data. Thresholds were estimated using a chromosomal moving window of size 1% of the total probe number for each chromosome separately, and the standard deviation cutoff value was selected as the median of the standard deviation distribution. The cutoff value was scaled by a factor of 2 to create the ± thresholds in M displayed in figure 9.
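The scaling of the cutoff value into ± thresholds can be sketched in a few lines. The cutoff value and segment levels below are invented for illustration, the baseline at M = 0 assumes data already centered by normalization, and the function names are hypothetical.

```python
def adaptive_thresholds(sd_cutoff, factor=2.0, baseline=0.0):
    """Return (lower, upper) thresholds in M as baseline -/+ factor * sd_cutoff."""
    return baseline - factor * sd_cutoff, baseline + factor * sd_cutoff

def call_segment(segment_m, lower, upper):
    """Label a segment level in M as 'loss', 'gain', or 'balanced'."""
    if segment_m > upper:
        return "gain"
    if segment_m < lower:
        return "loss"
    return "balanced"

# Hypothetical sample with a standard deviation cutoff value of 0.1 in M.
lower, upper = adaptive_thresholds(0.1, factor=2.0)
calls = [call_segment(level, lower, upper) for level in (-0.45, 0.05, 0.30)]
print(lower, upper, calls)
```

A noisier sample yields a larger cutoff value and therefore wider thresholds, which is what makes the calls sample adaptive; a larger factor gives more conservative calls.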
Availability and requirements
An implementation of popLowess in R (http://www.r-project.org) is available both as a plugin to the BioArray Software Environment (BASE) [28] and as a stand-alone version.
Project name: popLowess
Project home page: http://baseplugins.thep.lu.se/wiki/se.lu.onk.popLowess
Operating system(s): Platform independent
Programming language: R
License: GNU GPL
List of abbreviations
aCGH: array-based CGH
ALL: acute lymphoblastic leukemia
BAC: bacterial artificial chromosome
BASE: BioArray Software Environment
CGH: comparative genomic hybridization
CNA: copy number aberration
CNV: copy number variation
FISH: Fluorescence in situ hybridization
IQR: Inter Quartile Range
Lowess: Global intensity-based lowess normalization
Median: Global median normalization
popLowess: population-based intensity-based lowess normalization
SKY: Spectral karyotyping technique
Declarations
Acknowledgements
We wish to thank Patrik Edén and Mattias Höglund for helpful comments on the manuscript. This work was supported by the Knut and Alice Wallenberg Foundation via the SWEGENE program (JS and JVC), the Swedish Cancer Society (GJ), the American Cancer Society (GJ and JVC), John och Augusta Perssons stiftelse (GJ and JVC), and the Swedish Foundation for Strategic Research through CREATE Health – the Lund Strategic Centre for Clinical Cancer Research (MR).
References
 Pollack JR, Perou CM, Alizadeh AA, Eisen MB, Pergamenschikov A, Williams CF, Jeffrey SS, Botstein D, Brown PO: Genomewide analysis of DNA copynumber changes using cDNA microarrays. Nat Genet. 1999, 23 (1): 4146. 10.1038/12640.PubMedView ArticleGoogle Scholar
 Pinkel D, Segraves R, Sudar D, Clark S, Poole I, Kowbel D, Collins C, Kuo WL, Chen C, Zhai Y, Dairkee SH, Ljung BM, Gray JW, Albertson DG: High resolution analysis of DNA copy number variation using comparative genomic hybridization to microarrays. Nat Genet. 1998, 20 (2): 207211. 10.1038/2524.PubMedView ArticleGoogle Scholar
 Kallioniemi A, Kallioniemi OP, Sudar D, Rutovitz D, Gray JW, Waldman F, Pinkel D: Comparative genomic hybridization for molecular cytogenetic analysis of solid tumors. Science. 1992, 258 (5083): 818821. 10.1126/science.1359641.PubMedView ArticleGoogle Scholar
 Ylstra B, van den Ijssel P, Carvalho B, Brakenhoff RH, Meijer GA: BAC to the future! or oligonucleotides: a perspective for micro array comparative genomic hybridization (array CGH). Nucleic Acids Res. 2006, 34 (2): 445450. 10.1093/nar/gkj456.PubMed CentralPubMedView ArticleGoogle Scholar
 Hicks J, Krasnitz A, Lakshmi B, Navin NE, Riggs M, Leibu E, Esposito D, Alexander J, Troge J, Grubor V, Yoon S, Wigler M, Ye K, BorresenDale AL, Naume B, Schlicting E, Norton L, Hagerstrom T, Skoog L, Auer G, Maner S, Lundin P, Zetterberg A: Novel patterns of genome rearrangement and their association with survival in breast cancer. Genome Res. 2006, 16 (12): 14651479. 10.1101/gr.5460106.PubMed CentralPubMedView ArticleGoogle Scholar
 Fridlyand J, Snijders AM, Ylstra B, Li H, Olshen A, Segraves R, Dairkee S, Tokuyasu T, Ljung BM, Jain AN, McLennan J, Ziegler J, Chin K, Devries S, Feiler H, Gray JW, Waldman F, Pinkel D, Albertson DG: Breast tumor copy number aberration phenotypes and genomic instability. BMC Cancer. 2006, 6: 9610.1186/14712407696.PubMed CentralPubMedView ArticleGoogle Scholar
 Chin K, DeVries S, Fridlyand J, Spellman PT, Roydasgupta R, Kuo WL, Lapuk A, Neve RM, Qian Z, Ryder T, Chen F, Feiler H, Tokuyasu T, Kingsley C, Dairkee S, Meng Z, Chew K, Pinkel D, Jain A, Ljung BM, Esserman L, Albertson DG, Waldman FM, Gray JW: Genomic and transcriptional aberrations linked to breast cancer pathophysiologies. Cancer Cell. 2006, 10 (6): 529541. 10.1016/j.ccr.2006.10.009.PubMedView ArticleGoogle Scholar
 Jonsson G, Naylor TL, Vallon-Christersson J, Staaf J, Huang J, Ward MR, Greshock JD, Luts L, Olsson H, Rahman N, Stratton M, Ringner M, Borg A, Weber BL: Distinct genomic profiles in hereditary breast tumors identified by array-based comparative genomic hybridization. Cancer Res. 2005, 65 (17): 7612-7621.
 Vissers LE, Veltman JA, van Kessel AG, Brunner HG: Identification of disease genes by whole genome CGH arrays. Hum Mol Genet. 2005, 14 (Spec No 2): R215-223. 10.1093/hmg/ddi268.
 Pinkel D, Albertson DG: Comparative genomic hybridization. Annu Rev Genomics Hum Genet. 2005, 6: 331-354. 10.1146/annurev.genom.6.080604.162140.
 Quackenbush J: Microarray data normalization and transformation. Nat Genet. 2002, 32 (Suppl): 496-501. 10.1038/ng1032.
 Jong K, Marchiori E, Meijer G, Vaart AV, Ylstra B: Breakpoint identification and smoothing of array comparative genomic hybridization data. Bioinformatics. 2004, 20 (18): 3636-3637. 10.1093/bioinformatics/bth355.
 Olshen AB, Venkatraman ES, Lucito R, Wigler M: Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics. 2004, 5 (4): 557-572. 10.1093/biostatistics/kxh008.
 Autio R, Hautaniemi S, Kauraniemi P, Yli-Harja O, Astola J, Wolf M, Kallioniemi A: CGH-Plotter: MATLAB toolbox for CGH-data analysis. Bioinformatics. 2003, 19 (13): 1714-1715. 10.1093/bioinformatics/btg230.
 Neuvial P, Hupe P, Brito I, Liva S, Manie E, Brennetot C, Radvanyi F, Aurias A, Barillot E: Spatial normalization of array-CGH data. BMC Bioinformatics. 2006, 7: 264. 10.1186/1471-2105-7-264.
 Khojasteh M, Lam WL, Ward RK, MacAulay C: A stepwise framework for the normalization of array CGH data. BMC Bioinformatics. 2005, 6: 274. 10.1186/1471-2105-6-274.
 Smyth GK, Speed T: Normalization of cDNA microarray data. Methods. 2003, 31 (4): 265-273. 10.1016/S1046-2023(03)00155-5.
 Oshlack A, Emslie D, Corcoran LM, Smyth GK: Normalization of boutique two-color microarrays with a high proportion of differentially expressed probes. Genome Biol. 2007, 8 (1): R2. 10.1186/gb-2007-8-1-r2.
 Johannsson OT, Staff S, Vallon-Christersson J, Kytola S, Gudjonsson T, Rennstam K, Hedenfalk IA, Adeyinka A, Kjellen E, Wennerberg J, Baldetorp B, Petersen OW, Olsson H, Oredsson S, Isola J, Borg A: Characterization of a novel breast carcinoma xenograft and cell line derived from a BRCA1 germline mutation carrier. Lab Invest. 2003, 83 (3): 387-396.
 Jonsson G, Staaf J, Olsson E, Heidenblad M, Vallon-Christersson J, Osoegawa K, de Jong P, Oredsson S, Ringner M, Hoglund M, Borg A: High-resolution genomic profiles of breast cancer cell lines assessed by tiling BAC array comparative genomic hybridization. Genes Chromosomes Cancer. 2007, 46 (6): 543-558. 10.1002/gcc.20438.
 Paulsson K, Heidenblad M, Morse H, Borg A, Fioretos T, Johansson B: Identification of cryptic aberrations and characterization of translocation breakpoints using array CGH in high hyperdiploid childhood acute lymphoblastic leukemia. Leukemia. 2006, 20 (11): 2002-2007. 10.1038/sj.leu.2404372.
 de Leeuw RJ, Davies JJ, Rosenwald A, Bebb G, Gascoyne RD, Dyer MJ, Staudt LM, Martinez-Climent JA, Lam WL: Comprehensive whole genome array CGH profiling of mantle cell lymphoma model genomes. Hum Mol Genet. 2004, 13 (17): 1827-1837. 10.1093/hmg/ddh195.
 Shadeo A, Lam WL: Comprehensive copy number profiles of breast cancer cell model genomes. Breast Cancer Res. 2006, 8 (1): R9. 10.1186/bcr1370.
 Lipson D, Ben-Dor A, Yakhini Z: Determining the center of array-CGH data. Computational aspects of DNA copy number measurement. 2007, Technion – Israel Institute of Technology, Computer Science Department, 105-110.
 Garnis C, Lockwood WW, Vucic E, Ge Y, Girard L, Minna JD, Gazdar AF, Lam S, Macaulay C, Lam WL: High resolution analysis of non-small cell lung cancer cell lines by whole genome tiling path array CGH. Int J Cancer. 2006, 118 (6): 1556-1564. 10.1002/ijc.21491.
 Agilent Technologies. [http://www.agilent.com]
 Agilent eArray. [http://earray.chem.agilent.com/earray]
 Saal LH, Troein C, Vallon-Christersson J, Gruvberger S, Borg A, Peterson C: BioArray Software Environment (BASE): a platform for comprehensive management and analysis of microarray data. Genome Biol. 2002, 3 (8): SOFTWARE0003. 10.1186/gb-2002-3-8-software0003.
 Yang MC, Ruan QG, Yang JJ, Eckenrode S, Wu S, McIndoe RA, She JX: A statistical method for flagging weak spots improves normalization and ratio estimates in microarrays. Physiol Genomics. 2001, 7 (1): 45-53.
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.