- Research article
- Open Access
Chromothripsis-like patterns are recurring but heterogeneously distributed features in a survey of 22,347 cancer genome screens
BMC Genomics volume 15, Article number: 82 (2014)
Chromothripsis is a recently discovered phenomenon of genomic rearrangement, possibly arising during a single genome-shattering event. This could provide an alternative paradigm in cancer development, replacing the gradual accumulation of genomic changes with a “one-off” catastrophic event. However, the term has been used with varying operational definitions, with the minimal consensus being a large number of locally clustered copy number aberrations. The mechanisms underlying these chromothripsis-like patterns (CTLP) and their specific impact on tumorigenesis are still poorly understood.
Here, we identified CTLP in 918 cancer samples from a dataset of more than 22,000 oncogenomic arrays covering 132 cancer types. Fragmentation hotspots were located on chromosomes 8, 11, 12 and 17. Among the various cancer types, soft-tissue tumors exhibited particularly high CTLP frequencies. Genomic context analysis revealed that CTLP rearrangements frequently occurred in genomes that additionally harbored multiple copy number aberrations (CNAs). An investigation into the affected chromosomal regions showed a large proportion of arm-level pulverization and telomere-related events, which would be compatible with a number of underlying mechanisms. We also report evidence that these genomic events may be correlated with patient age, stage and survival.
Through a large-scale analysis of oncogenomic array data sets, this study characterized features associated with genomic aberration patterns compatible with the spectrum of “chromothripsis” definitions used previously. Our quantification of clustered genomic copy number aberrations in cancer samples indicates an underlying biological heterogeneity behind these chromothripsis-like patterns, beyond a single well-defined “chromothripsis” phenomenon.
Somatically acquired genomic rearrangements, which may result in complex patterns of regional copy number changes, are a consistent hallmark of human cancer genomes [1, 2]. These alterations have the potential to disrupt or activate multiple genes and have consequently been implicated in cancer development. Analysis of genomic rearrangements is essential for understanding the biological mechanisms of oncogenesis and for identifying rational points of pharmacological intervention [4, 5]. Several large-scale efforts have been undertaken to relate genomic rearrangements to genome architecture as well as to the progression dynamics of cancer genomes [6, 7]. At present, the stepwise development of cancer through the gradual accumulation of multiple genetic alterations is the most widely accepted model.
Recently, using state-of-the-art genome analysis techniques, a phenomenon termed “chromothripsis” was characterized in cancer genomes, defined by the occurrence of tens to hundreds of clustered genomic rearrangements arising in a single catastrophic event. In this model, contiguous chromosomal regions are fragmented into many pieces via as yet unknown mechanisms. These segments are then randomly fused together by the cell’s DNA repair machinery. It has been proposed that this “shattering” and aberrant repair of a multitude of DNA fragments could provide an alternative oncogenetic route, in contrast to the step-by-step paradigm of cancer development [8–10]. The initial study reported 24 chromothripsis cases, with some evidence of a high prevalence in bone tumors.
Besides human cancers, recent studies have also reported chromothripsis events in germline and non-human genomes [11–13]. However, due to the overall low incidence of this phenomenon, most studies were limited to relatively small numbers of observed events. For example, in a study screening 746 multiple myelomas by SNP arrays, only 10 cases with chromothripsis-like genome patterns were detected. Larger sample numbers are required to gain further insights into the features and mechanisms of these events in different cancers.
In contrast to a strict definition of chromothripsis events relying on sequencing-based detection of specific genomic rearrangements, other studies [7, 14, 16] have described chromothripsis events based on genomic array analysis without support from whole-genome sequencing data. Overall, the minimal consensus of array-based studies is the detection of a large number of locally clustered CNA events. In Table 1, we provide an overview of studies that have so far reported instances of “chromothripsis” in human cancers [7, 9, 11, 13, 14, 16–35].
Here, we present a statistical model for the discovery of clustered genomic aberration patterns, similar to those previously labeled as “chromothripsis” events, from genomic array data sets. For the scope of this article, we introduce the term “chromothripsis-like patterns” (CTLP) when discussing those events.
Applying our methodology to 22,347 genomic arrays from 402 experimental series derived from GEO (Gene Expression Omnibus), we were able to detect 918 chromothripsis-like cases and to determine the frequency and genomic distribution of CTLP events in this dataset. Our collection of oncogenomic array data represents 132 cancer types as defined using the ICD-O 3 (International Classification of Diseases for Oncology) coding scheme, enabling us to estimate the incidence of CTLP in diverse tumor types. Among the CTLP cases, we found varying distributions of fragmented chromosomal regions as well as an abundance of large non-CTLP copy number aberration (CNA) regions, and we investigated the genomic context of chromothripsis-like events. Finally, we evaluated clinical associations of CTLP-containing samples, based on the available clinical information. Overall, this study characterizes heterogeneous features of chromothripsis-like events through a large-scale analysis of oncogenomic array data sets and provides a better understanding of clustered genomic copy number patterns in cancer development.
Detection of chromothripsis-like patterns from oncogenomic arrays
We collected 402 GEO series, encompassing 22,347 high-quality genomic arrays of human cancer samples. A procedure was employed to detect CTLP from these arrays (Figure 1A). The annotated information of the arrays, including normalized probe intensities, segmentation data and quality evaluation, was obtained from our arrayMap database (see Methods for the array processing pipeline). After removing technical repeats (e.g. multiple platforms for one sample), a total of 18,394 cases representing 132 cancer types remained. The input data are summarized at the array and case level in Additional file 1: Table S1 and Additional file 2: Table S2, respectively. The segmentation data and array profiles can be accessed and visualized through the arrayMap website (http://www.arraymap.org).
According to previous studies, segmental copy number status changes and significant breakpoint clustering are two relevant features of chromothripsis [9, 23]. For automatic identification of CTLP, we developed a scan-statistic-based algorithm. We employed a maximum likelihood ratio score, which is commonly used to detect clusters of events in time and/or space and to determine their statistical significance (see Methods). For each chromosome, the algorithm uses a series of sliding windows to identify the genomic region with the highest likelihood ratio as the CTLP candidate. To test the performance of the algorithm, 23 previously published chromothripsis cases with available raw array data were collected and used as a training set. These data contained 31 chromothriptic and 475 non-chromothriptic chromosomes, which served as positive and negative controls, respectively (Additional file 3: Table S3). Comparing the numbers of copy number status changes and the likelihood ratios showed that chromothriptic chromosomes could reliably be distinguished from non-chromothriptic ones (Additional file 1: Figure S1). We generated a receiver operating characteristic (ROC) curve from the training set results and selected cutoff values based on this curve (number of copy number status changes ≥ 20 and log10 likelihood ratio ≥ 8) (Figure 1B). Furthermore, the sliding window scan statistic accurately identified the genomic regions involved (Additional file 1: Figure S2). Applying this algorithm to the complete input data set, a total of 1,269 chromosomes from 918 cases passed our thresholds and were marked as CTLP events (Additional file 1: Figure S3, Additional file 4: Table S4).
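The two-threshold rule and its evaluation against positive and negative control chromosomes can be sketched in Python as below. This is a minimal illustration, not the authors' R script: it assumes each chromosome has already been summarized by its number of copy number status changes and its best log10 likelihood ratio, and the names `is_ctlp` and `roc_point` as well as the toy control data are hypothetical. Sweeping the cutoff pair over a grid and plotting the resulting (false positive rate, true positive rate) points traces an ROC curve.

```python
from typing import Sequence, Tuple

# Cutoffs reported in the text (chosen from the ROC curve on the training set).
MIN_STATUS_CHANGES = 20
MIN_LOG10_LR = 8.0


def is_ctlp(n_changes: int, log10_lr: float,
            min_changes: int = MIN_STATUS_CHANGES,
            min_log10_lr: float = MIN_LOG10_LR) -> bool:
    """Flag a chromosome only if it passes both cutoffs."""
    return n_changes >= min_changes and log10_lr >= min_log10_lr


def roc_point(labels: Sequence[bool], changes: Sequence[int],
              log10_lrs: Sequence[float], min_changes: int,
              min_log10_lr: float) -> Tuple[float, float]:
    """Return (true positive rate, false positive rate) for one cutoff pair.

    labels[i] is True for a known chromothriptic chromosome (positive control)
    and False for a non-chromothriptic one (negative control).
    """
    tp = fp = n_pos = n_neg = 0
    for is_pos, n, lr in zip(labels, changes, log10_lrs):
        called = is_ctlp(n, lr, min_changes, min_log10_lr)
        if is_pos:
            n_pos += 1
            tp += int(called)
        else:
            n_neg += 1
            fp += int(called)
    return tp / n_pos, fp / n_neg


# Hypothetical toy controls, not the published training set.
labels = [True, True, True, False, False, False, False]
changes = [35, 22, 18, 25, 4, 10, 2]
lrs = [15.2, 9.1, 11.0, 3.0, 0.5, 8.5, 0.1]
for cut in [(10, 0.0), (20, 8.0)]:
    print(cut, roc_point(labels, changes, lrs, *cut))
```

Requiring both criteria at once means a chromosome with many scattered changes but no local clustering (low likelihood ratio) is not called, which matches the two features named in the text.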
Chromothripsis-like patterns across diverse tumor types
When evaluating the 1,269 CTLP events, we found a pronounced preference for some chromosomes; this preference showed only limited association with chromosome size (Figure 2A). CTLP occurred more frequently on chromosome 17 than on any other chromosome. This observation is in accordance with data reporting an association between chromothripsis and TP53 mutations in Sonic-Hedgehog medulloblastoma and acute myeloid leukemia. TP53 is located on the short (p) arm of chromosome 17 and is involved in cell cycle control, genome maintenance and apoptosis [40, 41]. Our dataset showed TP53 losses in 438 of 918 (~48%) CTLP cases, compared with 3,274 of 17,476 (~19%) cases in the non-CTLP group (p < 2.2 × 10⁻¹⁶; two-tailed Fisher’s exact test; Additional file 2: Table S2). Of the 438 TP53 deletions, 45 were part of a CTLP, supporting TP53 loss as a recurring event with possible involvement in CTLP formation. Other chromosomes with relatively high incidences of CTLP included chromosomes 8, 11 and 12.
In our study, genomic projection of regional CTLP frequencies revealed a heterogeneous distribution across cancer types (Figure 2B). The total length of fragmented genomic regions (including both CNA segments and interspersed normal segments) accounted for 1% to 14% of the corresponding genomes (Figure 2C). The large size of our input data set, and the resulting high number of CTLP cases, permitted an investigation of the frequency and genomic distribution of these patterns in different cancer types. Our input samples represented 65 “diagnostic groups”, as defined by a combination of ICD-O morphology and topography codes. The majority of samples (18,238) came from 50 diagnostic groups, each represented by more than 25 arrays. In total, we observed CTLP in 918 of the 18,394 cases, an overall prevalence of ~5%. The 17 diagnostic groups represented by at least 45 cases and with CTLP frequencies higher than 4% (“CTLP high”) are listed in Table 2 (full list in Additional file 5: Table S5).
The initial study by Stephens et al. hypothesized that chromothripsis has a high incidence in bone tumors. Notably, several soft tissue tumor types appeared in our “CTLP high” frequency set (6 of 17), including the 3 types with the highest frequencies. Moreover, the high prevalence of CTLP in soft tissue tumors was reflected in the ICD-O specific frequencies (Additional file 6: Table S6). The genesis and/or effect of multiple localized chromosomal breakage-fusion events may be related to specific molecular mechanisms in these tumor types. In contrast to most other solid tumors, gene fusions are well-documented recurring events in sarcomas, and a local clustering of genomic rearrangements has previously been reported for liposarcomas. So far, more than 40 fusion genes have been recognized in sarcomas and considered as potential diagnostic and prognostic markers. Possibly, the double-strand breaks and random fragment stitching underlying chromothripsis-like events promote the generation of oncogenic fusion genes. Further sequencing-based efforts will be needed to identify the true extent of fusion gene generation and to elucidate its functional impact in chromothripsis-like cases.
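The TP53 comparison can be recomputed from the counts given above as a 2 × 2 table (CTLP vs non-CTLP, TP53 loss vs no loss). Below is a minimal, self-contained Python sketch of a two-sided Fisher's exact test; it is an illustration, not the authors' code, and the function names are hypothetical. Following the usual convention (as in R's `fisher.test`), the two-sided p-value sums the probabilities of all tables with the same margins that are no more likely than the observed one. The sum is done in log space because the p-value here is far below 2.2 × 10⁻¹⁶, which is double-precision machine epsilon and the floor R uses when printing p-values.

```python
import math


def log_comb(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def log_hypergeom_pmf(a: int, row1: int, col1: int, n: int) -> float:
    """Log probability of top-left cell a in a 2x2 table with fixed margins."""
    return log_comb(col1, a) + log_comb(n - col1, row1 - a) - log_comb(n, row1)


def fisher_two_sided_log10_p(a: int, b: int, c: int, d: int) -> float:
    """log10 of the two-sided Fisher exact p-value for the table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    log_obs = log_hypergeom_pmf(a, row1, col1, n)
    logs = [log_hypergeom_pmf(x, row1, col1, n) for x in range(lo, hi + 1)]
    # Keep tables at most as likely as the observed one (small relative tolerance).
    kept = [lp for lp in logs if lp <= log_obs + 1e-7]
    m = max(kept)
    log_p = m + math.log(sum(math.exp(lp - m) for lp in kept))  # log-sum-exp
    return log_p / math.log(10)


# Rows: CTLP (918 cases), non-CTLP (17,476 cases); columns: TP53 loss, no loss.
a, b = 438, 918 - 438
c, d = 3274, 17476 - 3274
print(round(100 * a / (a + b)), round(100 * c / (c + d)))  # 48 19
print(fisher_two_sided_log10_p(a, b, c, d))
```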
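The selection of “CTLP high” diagnostic groups combines a minimum group size with a frequency cutoff. A minimal Python sketch of that filter, assuming per-group counts of CTLP cases and total cases are available; the function name `ctlp_high_groups` and the group counts in the test are hypothetical, not values from Table 2:

```python
from typing import Dict, List, Tuple


def ctlp_high_groups(counts: Dict[str, Tuple[int, int]], min_cases: int = 45,
                     min_freq: float = 0.04) -> List[Tuple[str, float]]:
    """Return (group, CTLP frequency) for groups passing both filters, highest first.

    counts maps a diagnostic group to (number of CTLP cases, number of cases).
    Groups need at least min_cases cases and a frequency strictly above min_freq.
    """
    selected = []
    for group, (n_ctlp, n_cases) in counts.items():
        if n_cases >= min_cases and n_ctlp / n_cases > min_freq:
            selected.append((group, n_ctlp / n_cases))
    return sorted(selected, key=lambda item: item[1], reverse=True)


# Overall prevalence from the counts in the text: 918 of 18,394 cases.
print(f"overall CTLP prevalence: {918 / 18394:.1%}")  # 5.0%
```

The minimum group size guards against small groups whose frequency estimate is dominated by one or two CTLP cases.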
Genomic context of chromothripsis-like events
It has been hypothesized that chromothripsis is a one-off cellular crisis generating a malignant clone in a very short time [9, 44]. However, in many of the CTLP samples in our study, highly fragmented chromosomal regions were embedded in larger CNA regions showing variations in patterns and overall extent (Figure 3A). To test whether CTLP-generating events are associated with overall genomic instability, we examined the extent of all copy number imbalances detected in our dataset. Comparing the 918 CTLP-positive cases with the remaining 17,476 CTLP-negative cases, we found that CTLP samples tended to have a higher proportion of their genomes covered by CNAs (p < 2.2 × 10⁻¹⁶; Kolmogorov-Smirnov test) (Figure 3B,C). This indicates that chromothripsis-like events frequently co-occur with other types of copy number aberrations. Plausible and non-exclusive explanations are that CTLP might frequently arise from previously established defects in the maintenance of genomic stability, or that chromothriptic aberrations involving genome maintenance genes may predispose to the acquisition of additional CNAs. Because such additional non-CTLP CNA events are frequent, their possible contribution to oncogenesis has to be considered when modeling the role of chromothripsis-like events in cancer development.
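The comparison above rests on two quantities: the fraction of each genome covered by CNAs and the two-sample Kolmogorov-Smirnov statistic comparing those fractions between groups. A minimal Python sketch under stated assumptions: segments are non-overlapping, given in one linear genome coordinate system, and already labeled as gain, loss or normal; the helper names are hypothetical. Only the D statistic is computed here; for a p-value, `scipy.stats.ks_2samp` can be used where SciPy is available.

```python
from bisect import bisect_right
from typing import Sequence, Tuple

Segment = Tuple[int, int, str]  # (start, end, status); status is "gain", "loss" or "normal"


def cna_coverage(segments: Sequence[Segment], genome_length: int) -> float:
    """Fraction of the genome covered by gain or loss segments.

    Assumes non-overlapping segments given in one linear genome coordinate system.
    """
    covered = sum(end - start for start, end, status in segments if status != "normal")
    return covered / genome_length


def ks_statistic(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov D: largest gap between the two empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    # The supremum of |F_x - F_y| is reached at one of the pooled data points.
    for v in xs + ys:
        fx = bisect_right(xs, v) / len(xs)
        fy = bisect_right(ys, v) / len(ys)
        d = max(d, abs(fx - fy))
    return d


# Hypothetical coverage fractions for a few CTLP and non-CTLP samples.
ctlp_cov = [0.31, 0.42, 0.27, 0.55]
other_cov = [0.05, 0.12, 0.30, 0.08, 0.02]
print(ks_statistic(ctlp_cov, other_cov))
```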
Potential mechanisms for chromosome shattering
While the mechanism(s) responsible for the generation of chromothripsis remain elusive, a number of studies have proposed hypotheses including ionizing radiation, DNA replication stress, breakage-fusion-bridge cycles [9, 23, 46], premature chromosome compaction, failed apoptosis [48, 49] and micronucleus formation. Some of these proposed mechanisms are associated with features that could be addressed in our study.
In our dataset, although most CTLP cases (76%) showed CTLP on a single chromosome, in approximately 24% CTLP affected at least two chromosomes (Figure 4A). For certain candidate mechanisms, e.g. micronucleus formation due to mitotic delay, this observation would imply more than one event, whereas it appears compatible with, e.g., an aborted apoptosis process.
To relate CTLP to cytogenetic aberration mechanisms, we additionally explored the extent of CTLP regions normalized to the length of their respective chromosomes. Affected regions were classified as “arm-level” (≥ 90% of arm length), “chromosome-level” (≥ 80% of chromosome length) or “localized” (Figure 4B). Arm-level CTLP events were observed with a relatively high frequency (~19%). In the arm-level patterns, the CTLP rearrangements were concentrated on one chromosome arm, with the other arm of the same chromosome remaining normal or showing isolated CNAs. Since arm-level events involve both peri-centromeric and telomeric regions, cytogenetic events involving these chromosomal structures are candidate causative mechanisms.
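The three-way classification can be expressed as a small function. This Python sketch assumes the CTLP region, the centromere position and the chromosome length are given in bases on one chromosome, and it checks the chromosome-level criterion before the arm-level one; that precedence is an assumption, since the text does not say how a region meeting both criteria is labeled. The function name and the toy coordinates are hypothetical.

```python
def classify_ctlp_region(start: float, end: float, centromere: float,
                         chrom_length: float) -> str:
    """Classify a CTLP region as 'chromosome-level', 'arm-level' or 'localized'.

    The p arm spans [0, centromere) and the q arm [centromere, chrom_length).
    Thresholds follow the text: >= 80% of the chromosome, or >= 90% of either arm.
    """
    def overlap(a0: float, a1: float, b0: float, b1: float) -> float:
        return max(0.0, min(a1, b1) - max(a0, b0))

    # Assumed precedence: the chromosome-level test is applied first.
    if (end - start) / chrom_length >= 0.8:
        return "chromosome-level"
    p_frac = overlap(start, end, 0.0, centromere) / centromere
    q_frac = overlap(start, end, centromere, chrom_length) / (chrom_length - centromere)
    if max(p_frac, q_frac) >= 0.9:
        return "arm-level"
    return "localized"


# Hypothetical 80 Mb chromosome with its centromere at 24 Mb.
print(classify_ctlp_region(0, 23e6, 24e6, 80e6))  # arm-level
```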
Notably, one model that closely conforms to this pattern involves breakage-fusion-bridge cycles [9, 23, 46, 47, 51–54]. In general, such cycles start with telomere loss and end-to-end chromosome fusion. When the resulting dicentric chromosomes are pulled to opposite poles during anaphase, the chromatin bridge breaks, and the double-strand DNA break acts as the starting point for the next cycle. Chromosomal rearrangements would gradually accumulate during successive cycles and should be concentrated on one chromosome arm, particularly near the affected telomere. In our dataset, up to 44% of all CTLP chromosomes involved telomere regions. We performed simulations to test whether this telomere enrichment could be explained by chance. In brief, for each sample, we retained the location of the CTLP region in the genome and shuffled the telomere position of each chromosome while keeping the length of each chromosome constant. In contrast to the actual observations, the simulations did not result in telomeric CTLP enrichment (p < 0.0001; 10,000 simulations; see Methods). CTLP generation through breakage-fusion-bridge cycles is therefore a viable candidate hypothesis, compatible both with the statistically significant telomere enrichment and with the high proportion of arm-level pulverization. However, for arm-level CTLP events, centromere-related instability mechanisms should also be considered in future work.
Based on clinical associations of “chromothripsis” patterns, it has been claimed that these events may correlate with poor outcome in the context of the respective tumor type [14, 25, 55]. In our meta-analysis, we explored a general relation of CTLP with clinical parameters across the wide range of cancer entities reflected in our input data set. Clinical data were collected from GEO and from the publications of the respective series (Additional file 2: Table S2 and Additional file 1: Table S7), and parameters available for at least 1,000 cases were considered. In our dataset, CTLP seemed to occur at a more advanced patient age than non-CTLP samples (Figure 5A). Most CTLP cases (70%) were at stage II or III, compared with 55.2% of all samples, and the stage distribution of CTLP cases differed significantly from that of the total sample set (p = 0.0149; Chi-square test) (Figure 5B). No difference in grade distribution was observed (p = 0.425; Chi-square test): like the bulk of all samples (~80%), CTLP samples were predominantly grade 2 or 3. We also found that CTLP was overrepresented in cell lines compared with primary tumors (p < 2.2 × 10⁻¹⁶; two-tailed Fisher’s exact test).
For a subset of 1,203 patients, we were able to determine basic follow-up parameters (follow-up time and survival status). For 72 of these individuals, CTLP was detected in their tumor genomes. Notably, patients with CTLP had significantly shorter survival than those without (p = 0.0039; log-rank test; Figure 5C). Note that this analysis was based on a sample of convenience pooled across cancer types, stages and grades. When the dataset is broken down by cancer type, the numbers are too small to provide statistical confidence (Additional file 1: Figure S4). While the cancer-type-independent association of CTLP patterns with poor outcome is intriguing, potential clinical effects of chromothripsis-like genome disruption should be evaluated in larger and clinically more homogeneous data sets.
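For reference, the two-group log-rank test used for the survival comparison can be written in a few lines. This is a minimal, self-contained Python sketch, not the authors' analysis code: at each distinct event time it accumulates observed minus expected deaths in the first group and the hypergeometric variance, and converts the resulting chi-square statistic (1 degree of freedom) to a p-value as erfc(sqrt(χ²/2)). Subjects censored at an event time are counted as still at risk at that time, a common convention assumed here; the function name and toy data are hypothetical, and the sketch assumes at least one informative event so that the variance is non-zero.

```python
import math
from typing import Sequence, Tuple


def logrank_test(times_a: Sequence[float], events_a: Sequence[bool],
                 times_b: Sequence[float], events_b: Sequence[bool]) -> Tuple[float, float]:
    """Two-group log-rank test; returns (chi-square statistic, p-value).

    events_* are True for an observed death and False for a censored follow-up.
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)] + \
           [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    o_minus_e = 0.0
    var = 0.0
    for t in event_times:
        at_risk = [g for ti, _, g in data if ti >= t]  # censored at t still at risk
        n = len(at_risk)
        n_a = at_risk.count(0)
        d = sum(1 for ti, e, _ in data if ti == t and e)
        d_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        o_minus_e += d_a - d * n_a / n
        if n > 1:
            var += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / var
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Hypothetical toy data: group A dies early, group B late.
print(logrank_test([1, 2, 3], [True] * 3, [4, 5, 6], [True] * 3))
```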
Sensitivity of array platforms for detection of chromothripsis-like patterns
Presumed chromothripsis events have been reported from genomic datasets generated through different array- and sequencing-based techniques (see Table 1). We analyzed the platform distribution of our CTLP samples to estimate the detection bias among various genomic array platforms. As the resolution of a platform depends on both the type and the density of the probes on an array, we divided the platforms into 4 groups according to their probe numbers and techniques (BAC/P1, DNA/cDNA, oligonucleotide ≤ 200K and oligonucleotide > 200K). Although CTLP were detected by all types of genomic arrays, a higher fraction of CTLP samples was found using data from high-resolution oligonucleotide arrays (Figure 6), possibly due to the increased sensitivity related to higher probe density. Indeed, in platform simulations, the sensitivity of CTLP detection improved with increasing probe numbers (Additional file 1: Figures S5 and S6; see Methods). According to these simulations, array platforms with more than 250K probes should be preferred when screening for CTLP events. Since our analysis relied on a variety of array platforms, including lower-resolution ones, the overall prevalence of CTLP in cancer is likely higher than the ~5% of samples reported here.
The description of the “chromothripsis” phenomenon has initiated a lively discussion about clustered genomic aberration events and their role in cancer development [52, 55, 56]. While chromothripsis sensu stricto has been characterized as a type of focally clustered genomic aberration generated in a one-time cellular event and limited to a defined set of copy number states, other operational definitions based on clustered aberrations have been employed [7, 16, 23, 45, 55, 57]. It seems likely that some of the previous discussions of “chromothripsis” referred to a number of underlying event types, all resulting in localized genome fragmentation and re-assembly. For instance, DNA double-strand breaks followed by end-joining-mediated repair may result in a restricted number of copy number levels, whereas aberrant replication-based mechanisms will lead to a more diverse set of copy number aberrations [45, 55]. Here, we introduce the term “chromothripsis-like patterns” (CTLP) when referring to clustered genomic events, to accommodate both common labelling and the presumed biological variability of clustered genomic copy number aberrations.
At this time, owing to the lack of a sufficiently large number of cancer data sets from whole-genome sequencing, a meta-analysis of “strict” chromothripsis cases is not feasible. We have therefore followed a pragmatic approach to quantify the occurrence of CTLP from genomic array data sets. Our algorithm implements the two key features shared by different operational chromothripsis definitions, namely copy number status changes and breakpoint clustering, both of which can be measured well by array-based technologies. Previous studies provided various algorithms to detect “chromothripsis” events [9, 15, 58]. Besides its application to an extensive data set, the specific advantage of the method presented here is its ability to detect regions of shattering with limited influence from the varying sizes of affected chromosomes. Since the step length of our scanning window is 5 Mb, the boundaries of detected CTLP regions are theoretically accurate to within ±5 Mb. Note that the performance of this algorithm may be influenced by poor-quality arrays, especially those with highly scattered and unevenly distributed probe signal intensities.
In this study, we identified 918 CTLP-containing genome profiles, based on an analysis of copy number aberration patterns in 22,347 oncogenomic arrays representing 132 cancer types. Despite the inherent limitations of such a meta-analysis approach, we were able to provide several new insights regarding the distribution of clustered genomic copy number aberrations and to produce a comprehensive estimate of CTLP incidence in a large range of cancer entities.
In our analysis, CTLP exhibited an uneven distribution along tumor genomes, with disease-related local enrichment. These “CTLP dense” chromosomal regions may reveal associations between disease-related cancer genes and the molecular mechanisms behind genome shattering events. This potential correlation is exemplified by the prevalence of mutant TP53 in “chromothriptic” Sonic-Hedgehog medulloblastomas associated with Li-Fraumeni syndrome. As the extent of CTLP-related deletions of the TP53 locus indicates, CTLP-related gene dosage changes may predispose to double-hit effects on specific tumor suppressors. In contrast, we found regional enrichment for CTLP with predominant copy number gains on chromosomes 8, 11 and 12. In the initial study, chromosome 8 shattering was found in a small cell lung cancer cell line. This event involved the MYC oncogene, which has been shown to be amplified in 10-20% of small cell lung cancers. Moreover, strong overexpression of MYC located in a “chromothripsis” region was also detected in a neuroblastoma sample. In a study of colorectal tumors, chromosomes 8 and 11 were involved in concurrent pulverization events that generated fusion genes involving, e.g., SAPS3 and ZFP91. In a study on hepatocellular carcinoma, CCND1 amplification was embedded within a “chromothriptic” event on chromosome 11. Therefore, the overall uneven distribution of CTLP may point to specific driver mutations that contribute to CTLP generation, and/or to a class of cancer-promoting mutations arising from regional genome shattering events.
When comparing cancer types, we observed a high CTLP prevalence in a limited set of entities, particularly among soft tissue tumors. This finding supports and refines a previous prediction of a particularly high “chromothripsis” rate in bone tumors. The uneven distribution of CTLP across cancer types also strongly suggests a disease-related selection of specific genomic aberrations, supporting their involvement in the oncogenetic process.
In the initial study, the authors stated that chromothripsis could be a one-off cataclysmic event that generates multiple concurrent mutations and rearrangements. However, the role of chromothripsis as a “shortcut” to cancer genome generation is still elusive. We note that additional and complex non-CTLP genome rearrangements exist in the majority of CTLP samples. The number and uneven distribution of affected chromosomes in CTLP support the biological heterogeneity of cancer samples with CTLP-containing genome profiles. Furthermore, the normalized spatial distribution of shattered chromosomal regions, as well as the significant overlap between telomeric and pulverized regions, supports breakage-fusion-bridge cycles as one of the mechanisms acting in a subset of samples. Further efforts are needed to investigate the temporal order of chromothripsis and non-chromothripsis events in complex samples, and to determine whether a dichotomy exists between “one-off” chromothripsis and other classes of localized genome shattering events, all resulting in clustered genomic copy number aberrations.
In our associated clinical data, CTLP were associated with more advanced tumor stages and overall worse prognosis compared with non-CTLP cases. One possible explanation is that the numerous concurrent genetic alterations induced by genome shattering events disturb a large number of genes and contribute to more aggressive tumor phenotypes. By themselves, these observations do not distinguish whether CTLP arise as early events promoting aggressive tumor behavior, with fast growth rates and reduced response to therapeutic interventions, or whether this observation reflects underlying primary mutations predisposing to genomic instability and aggressive clinical behavior, with CTLP as a resulting epiphenomenon. Interestingly, the high rate of TP53 involvement would by itself support both possibilities for this gene, i.e. chromothripsis as a result of TP53 mutation as well as chromothriptic events involving the TP53 locus promoting aggressive clinical behavior.
Table 1 suggests that array-based technologies are, in general, less sensitive than whole-genome sequencing for calling chromothripsis-like events. This is partly due to the very limited ability of most array platforms to detect balanced genomic aberrations, such as inversions and translocations. In the future, the accumulation of large-scale sequencing data should provide further insights into localized genome shattering events.
CTLP represent a striking feature occurring in a limited set of cancer genomes and can be detected in array-based copy number screening experiments using biostatistical methods. The observed clustered genomic copy number aberrations may reflect heterogeneous biological phenomena beyond a single class of “chromothripsis” events, and probably vary in their specific impact on oncogenesis. Fragmentation hotspots derived from our large-scale data set could promote the detection of markers associated with genome shattering, or may be used for assigning disease-related effects to CTLP-induced genomic events.
Genome-wide microarrays and data preparation
In this study, we screened 402 GEO series, encompassing 22,347 high-quality genomic arrays (Additional file 2: Table S2). All selected arrays were human cancer samples hybridized onto genome-wide array platforms. The normalized probe intensities, segmented data and quality information were obtained from the arrayMap database, a publicly available reference database for copy number profiling data. In brief, the annotated data were obtained through the following processing pipeline: for Affymetrix arrays, the aroma.affymetrix R package was employed to generate log2-scale probe-level data; for non-Affymetrix arrays, the available probe intensity files were processed; the CBS (Circular Binary Segmentation) algorithm was then applied to obtain segmented copy number data. The probe locations were mapped to the human reference genome (UCSC build hg18). In the case of technical repeats (e.g. one sample hybridized on multiple platforms), only one of the arrays was considered for analysis (preferably the one with the highest resolution and/or best overall quality). The array profiles can be visualized through the arrayMap website.
Scan-statistic based chromothripsis-like pattern detection algorithm
To detect chromothripsis-like cases, we formulated an algorithm identifying clustering of copy number status changes in the genome. Several parameters were considered to define the alteration of copy number status:
i) The log2 ratio thresholds for calling genomic gains and losses. These values were array-specific and stored in the arrayMap database. For each array, the thresholds were obtained from related publications or assigned empirically based on the log2 ratio distribution.
ii) The intensity distance between adjacent segments. Due to local correlation effects between probes or background noise, segmentation profiles occasionally exhibit subtle striation patterns consisting of a large number of small segments, which are unlikely to reflect a biological phenomenon. To reduce artificial copy number status changes, a minimum signal intensity distance between adjacent segments was used as a threshold, defined here as the sum of the absolute values of the log2 ratio thresholds used to call gains and losses. If the log2 ratios of two adjacent segments differed by less than this threshold, the copy number status change was not counted.
iii) Segment size. The resolution of a platform depends on the density of probes on the array. One of the highest-density platforms in our dataset is the Affymetrix SNP6, which contains 1.8 million polymorphic and non-polymorphic markers with a mean inter-marker distance of 1.7 kb and provides a practical resolution of 10 to 20 kb. Therefore, segments smaller than 10 kb were removed in this study.
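The three parameters above can be combined into a counter of copy number status changes along one chromosome. The Python sketch below is one reading of the text, not the authors' R implementation: the name `count_status_changes` is hypothetical, segments are assumed to be (start, end, log2 ratio) tuples from a single chromosome, segments under 10 kb are dropped before neighbours are compared, and whether a value exactly at a threshold counts as a gain or loss is an assumption.

```python
from typing import Sequence, Tuple

Segment = Tuple[int, int, float]  # (start, end, log2 ratio) on one chromosome


def count_status_changes(segments: Sequence[Segment], gain_thr: float,
                         loss_thr: float, min_size: int = 10_000) -> int:
    """Count copy number status changes along one chromosome.

    gain_thr > 0 and loss_thr < 0 are the array-specific log2 ratio cutoffs (i).
    A change between neighbouring segments is counted only if their log2 ratios
    differ by at least gain_thr + |loss_thr| (ii). Segments shorter than
    min_size bases are discarded first (iii).
    """
    def status(log2: float) -> int:
        # Boundary handling (>= / <=) is an assumption.
        if log2 >= gain_thr:
            return 1
        if log2 <= loss_thr:
            return -1
        return 0

    kept = [s for s in sorted(segments) if s[1] - s[0] >= min_size]
    min_distance = gain_thr + abs(loss_thr)
    changes = 0
    for (_, _, prev), (_, _, cur) in zip(kept, kept[1:]):
        if status(prev) != status(cur) and abs(cur - prev) >= min_distance:
            changes += 1
    return changes


# Hypothetical segments: the 5 kb segment is dropped before counting.
segs = [(0, 50_000, 0.0), (50_000, 100_000, 0.5), (100_000, 105_000, -0.6),
        (105_000, 200_000, -0.4), (200_000, 300_000, 0.1)]
print(count_status_changes(segs, 0.15, -0.15))  # 3
```

The distance criterion suppresses striation artifacts: a small step such as 0.0 to 0.2 crosses the gain threshold but is not counted, because it is smaller than 0.15 + 0.15.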
In order to identify clustering of copy number status changes, a scan-statistic likelihood ratio based on the Poisson model was employed. In our implementation, a fixed-size window was moved along the genome, and for each window the likelihood ratio was computed from the observed and expected numbers of copy number status changes. Let $G$ be the genome represented linearly, and $W$ a window of fixed size. As $W$ moves over $G$, it defines a collection of zones $Z$, where $Z \subset G$. Let $n_W$ denote the observed number of copy number status changes in window $W$, and $n_G$ the total number of observed status changes in $G$. The expected number of status changes in $W$ is $\mu_W = (|W|/|G|) \times n_G$. The likelihood ratio is expressed as

$$
\mathrm{LR}(W) =
\begin{cases}
\left(\dfrac{n_W}{\mu_W}\right)^{n_W}\left(\dfrac{n_G - n_W}{n_G - \mu_W}\right)^{n_G - n_W}, & n_W > \mu_W,\\[1ex]
1, & \text{otherwise.}
\end{cases}
$$

The zone that maximizes this ratio is the one most likely to be a cluster.
Because the size of a CTLP region is not known a priori, we predefined a series of window sizes from 30 Mb to 247 Mb based on chromosome sizes (Additional file 1: Table S8). The scanning process was repeated with each window size for each sample. When W moved over G, the step length was set to 5 Mb, and no window spanned more than one chromosome. In this way, for each genome, the collection Z contained 4,414 windows of various sizes. The window that maximized the likelihood ratio defined the most probable CTLP region, so that both the location and the size of the cluster are detected. When analyzing the complete input dataset, the window with the highest likelihood ratio on each chromosome was selected as that chromosome's CTLP candidate, for each of the 22,347 arrays. The R script for detecting CTLP cases can be provided upon request.
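The multi-size sliding window scan can be sketched in Python as follows; this is an illustration, not the authors' R script. It assumes the standard Poisson form of the scan-statistic likelihood ratio (Kulldorff's), log LR = n_W ln(n_W/μ_W) + (n_G − n_W) ln((n_G − n_W)/(n_G − μ_W)) when n_W > μ_W and 0 otherwise, with μ_W = (|W|/|G|) × n_G. Windows are stepped by 5 Mb within each chromosome, and a window that would run past the chromosome end is truncated there, which is an assumption that will not reproduce the exact window count of 4,414 given above. Function names and toy data are hypothetical.

```python
import math
from bisect import bisect_left
from typing import Dict, Sequence, Tuple

MB = 1_000_000


def log10_lr(n_w: int, mu_w: float, n_g: int) -> float:
    """log10 of the Poisson scan-statistic likelihood ratio (0 unless n_w > mu_w)."""
    if n_w <= mu_w:
        return 0.0
    llr = n_w * math.log(n_w / mu_w)
    if n_g > n_w:  # the second term vanishes when every change lies in W
        llr += (n_g - n_w) * math.log((n_g - n_w) / (n_g - mu_w))
    return llr / math.log(10)


def best_window_per_chromosome(
        changes: Dict[str, Sequence[int]], chrom_lengths: Dict[str, int],
        window_sizes: Sequence[int], step: int = 5 * MB
) -> Dict[str, Tuple[float, int, int, int]]:
    """For each chromosome return (log10 LR, start, end, n_w) of the best window.

    changes[c] holds the positions of copy number status changes on chromosome c.
    Windows never cross chromosome boundaries.
    """
    genome = sum(chrom_lengths.values())
    n_g = sum(len(v) for v in changes.values())
    best = {}
    for chrom, length in chrom_lengths.items():
        pos = sorted(changes.get(chrom, []))
        top = (0.0, 0, 0, 0)
        for size in window_sizes:
            start = 0
            while True:
                end = min(start + size, length)  # truncate at chromosome end
                n_w = bisect_left(pos, end) - bisect_left(pos, start)
                mu_w = (end - start) / genome * n_g
                score = log10_lr(n_w, mu_w, n_g)
                if score > top[0]:
                    top = (score, start, end, n_w)
                if end == length:
                    break
                start += step
        best[chrom] = top
    return best


# Hypothetical genome: 30 clustered changes on chr1, 5 scattered on chr2.
changes = {"chr1": [10 * MB + i * 300_000 for i in range(30)],
           "chr2": [10 * MB, 30 * MB, 50 * MB, 70 * MB, 90 * MB]}
lengths = {"chr1": 100 * MB, "chr2": 100 * MB}
print(best_window_per_chromosome(changes, lengths, [10 * MB, 50 * MB]))
```

Because the expected count scales with window length, small windows are favored only when they really concentrate changes, which is how the scan recovers both the location and the size of the cluster.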
Analysis of fragment enrichment in telomere region
To test whether DNA fragments were enriched in telomere regions, telomere positions were simulated. For each case, the CTLP region was kept at its location in the genome, while the locations of the chromosome ends were randomly reassigned with the length of each chromosome preserved. The genomic interval within 5 Mb of a chromosome end was considered the telomere region. The simulation was repeated 10,000 times.
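The text does not state exactly how the random chromosome-end locations were drawn. The Python sketch below assumes a uniform random offset of the chromosome frame relative to the fixed CTLP region. This is equivalent to placing a region of the same width uniformly within a chromosome of unchanged length. The statistic (number of CTLP regions reaching into a 5 Mb telomere interval) and the add-one empirical p-value are likewise illustrative choices. Function names are hypothetical.

```python
import random
from typing import List, Tuple

TELOMERE_BP = 5_000_000  # 5 Mb from each chromosome end counts as telomere region


def touches_telomere(start: int, end: int, chrom_len: int, tel: int = TELOMERE_BP) -> bool:
    """True if [start, end) reaches into the first or last `tel` bp of the chromosome."""
    return start < tel or end > chrom_len - tel


def telomere_enrichment(
    cases: List[Tuple[int, int, int]], n_sim: int = 10_000, seed: int = 0
) -> Tuple[int, float]:
    """cases: (region_start, region_end, chrom_len), with 0 <= start < end <= chrom_len.

    Returns the observed number of telomere-touching CTLP regions and an
    empirical one-sided p-value for enrichment.
    """
    rng = random.Random(seed)
    observed = sum(touches_telomere(s, e, length) for s, e, length in cases)
    at_least_as_extreme = 0
    for _ in range(n_sim):
        simulated = 0
        for s, e, length in cases:
            width = e - s
            # Re-draw where the chromosome ends sit relative to the fixed region,
            # i.e. a uniform offset of the region inside a chromosome of equal length.
            offset = rng.randint(0, length - width)
            simulated += touches_telomere(offset, offset + width, length)
        if simulated >= observed:
            at_least_as_extreme += 1
    return observed, (at_least_as_extreme + 1) / (n_sim + 1)
```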
Simulation of platform resolution
The 15 Affymetrix SNP6 CTLP chromosomes in the training set were used for the simulation. For each genome, a given number of probes was randomly drawn from the original probe set. The reduced probe set approximated the profile that would have been obtained had the same sample been hybridized to a platform of the corresponding resolution. The CTLP detection algorithm was then applied to the simulated arrays, and the number of cases that passed the thresholds was recorded.
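The probe-thinning step can be sketched in Python as below. This assumes probes are stored as (chromosome, position, log2 ratio) tuples in genomic order. The `detect` argument is a hypothetical stand-in for the full CTLP pipeline of segmentation, filtering, scan statistic and thresholds, which is not reimplemented here.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical probe record: (chromosome, position in bp, log2 ratio)
Probe = Tuple[str, int, float]


def subsample_probes(probes: Sequence[Probe], n_keep: int, rng: random.Random) -> List[Probe]:
    """Randomly keep n_keep probes without replacement, preserving genomic order."""
    keep = sorted(rng.sample(range(len(probes)), n_keep))
    return [probes[i] for i in keep]


def count_detected(
    samples: Sequence[Sequence[Probe]],
    n_keep: int,
    detect: Callable[[List[Probe]], bool],  # stand-in for segmentation + CTLP thresholds
    seed: int = 0,
) -> int:
    """Number of samples still called CTLP after thinning each to n_keep probes."""
    rng = random.Random(seed)
    return sum(1 for probes in samples if detect(subsample_probes(probes, n_keep, rng)))
```

Repeating `count_detected` over a range of `n_keep` values traces how many of the original CTLP cases remain detectable as the simulated platform resolution drops.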
The difference in the frequency of TP53 loss between CTLP and non-CTLP cases was assessed using a two-tailed Fisher's exact test. We performed a Kolmogorov-Smirnov test to compare the distributions of the proportion of the genome affected by copy number aberrations between CTLP and all other cases. Chi-square tests were used to compare the distributions of patient stage and grade between CTLP cases and the whole input dataset. The association between CTLP status and the proportion of cell lines was tested by a two-tailed Fisher's exact test. The difference in survival curves between two subgroups was evaluated by the log-rank test.
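For reference, the two-tailed Fisher's exact test used for the TP53 and cell-line comparisons can be written with the Python standard library alone. In practice one would call R's `fisher.test` or SciPy's `scipy.stats.fisher_exact`. This sketch uses the same "sum all tables no more likely than the observed one" rule for the two-sided p-value. The example counts are Fisher's classic tea-tasting table, not data from this study.

```python
from math import comb


def fisher_exact_two_tailed(a: int, b: int, c: int, d: int) -> float:
    """Two-tailed Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)

    def prob(x: int) -> float:
        # P(top-left cell = x) with all row and column totals fixed
        return comb(row1, x) * comb(n - row1, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # small relative tolerance so ties are not lost to floating-point error
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
```

For the table [[3, 1], [1, 3]], `fisher_exact_two_tailed(3, 1, 1, 3)` returns 34/70, about 0.486.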
Albertson DG, Collins C, McCormick F, Gray JW: Chromosome aberrations in solid tumors. Nat Genet. 2003, 34 (4): 369-376. 10.1038/ng1215.
Baudis M: Genomic imbalances in 5918 malignant epithelial tumors: an explorative meta-analysis of chromosomal CGH data. BMC Cancer. 2007, 7 (1): 226-10.1186/1471-2407-7-226.
Yates LR, Campbell PJ: Evolution of the cancer genome. Nat Rev Genet. 2012, 13 (11): 795-806. 10.1038/nrg3317.
Chen JM, Cooper DN, Férec C, Kehrer-Sawatzki H, Patrinos GP: Genomic rearrangements in inherited disease and cancer. Semin Cancer Biol. 2010, 20 (4): 222-233. 10.1016/j.semcancer.2010.05.007.
Chin L, Gray JW: Translating insights from the cancer genome into clinical practice. Nature. 2008, 452 (7187): 553-563. 10.1038/nature06914.
Beroukhim R, Mermel CH, Porter D, Wei G, Raychaudhuri S, Donovan J, Barretina J, Boehm JS, Dobson J, Urashima M, Mc Henry KT, Pinchback RM, Ligon AH, Cho YJ, Haery L, Greulich H, Reich M, Winckler W, Lawrence MS, Weir BA, Tanaka KE, Chiang DY, Bass AJ, Loo A, Hoffman C, Prensner J, Liefeld T, Gao Q, Yecies D, Signoretti S, et al: The landscape of somatic copy-number alteration across human cancers. Nature. 2010, 463 (7283): 899-905. 10.1038/nature08822.
Kim TM, Xi R, Luquette LJ, Park RW, Johnson MD, Park PJ: Functional genomic analysis of chromosomal aberrations in a compendium of 8000 cancer genomes. Genome Res. 2013, 23 (2): 217-227. 10.1101/gr.140301.112.
Stratton MR, Campbell PJ, Futreal PA: The cancer genome. Nature. 2009, 458 (7239): 719-724. 10.1038/nature07943.
Stephens PJ, Greenman CD, Fu B, Yang F, Bignell GR, Mudie LJ, Pleasance ED, Lau KW, Beare D, Stebbings LA, McLaren S, Lin ML, McBride DJ, Varela I, Nik-Zainal S, Leroy C, Jia M, Menzies A, Butler AP, Teague JW, Quail MA, Burton J, Swerdlow H, Carter NP, Morsberger LA, Iacobuzio-Donahue C, Follows GA, Green AR, Flanagan AM, Stratton MR, et al: Massive genomic rearrangement acquired in a single catastrophic event during cancer development. Cell. 2011, 144 (1): 27-40. 10.1016/j.cell.2010.11.055.
Kitada K, Taima A, Ogasawara K, Metsugi S, Aikawa S: Chromosome-specific segmentation revealed by structural analysis of individually isolated chromosomes. Genes Chromosom Cancer. 2011, 50: 217-227.
Chiang C, Jacobsen JC, Ernst C, Hanscom C, Heilbut A, Blumenthal I, Mills RE, Kirby A, Lindgren AM, Rudiger SR, McLaughlan CJ, Bawden CS, Reid SJ, Faull RLM, Snell RG, Hall IM, Shen Y, Ohsumi TK, Borowsky ML, Daly MJ, Lee C, Morton CC, MacDonald ME, Gusella JF, Talkowski ME: Complex reorganization and predominant non-homologous repair following chromosomal breakage in karyotypically balanced germline rearrangements and transgenic integration. Nat Genet. 2012, 44 (4): 390-397. 10.1038/ng.2202.
Deakin JE, Bender HS, Pearse AM, Rens W, O’Brien PCM, Ferguson-Smith MA, Cheng Y, Morris K, Taylor R, Stuart A, Belov K, Amemiya CT, Murchison EP, Papenfuss AT, Graves JAM: Genomic restructuring in the Tasmanian devil facial tumour: chromosome painting and gene mapping provide clues to evolution of a transmissible tumour. PLoS Genet. 2012, 8 (2): e1002483-10.1371/journal.pgen.1002483.
Kloosterman WP, Guryev V, van Roosmalen M, Duran KJ, de Bruijn E, Bakker SCM, Letteboer T, van Nesselrooij B, Hochstenbach R, Poot M, Cuppen E: Chromothripsis as a mechanism driving complex de novo structural rearrangements in the germline. Hum Mol Genet. 2011, 20 (10): 1916-1924. 10.1093/hmg/ddr073.
Magrangeas F, Avet-Loiseau H, Munshi NC, Minvielle S: Chromothripsis identifies a rare and aggressive entity among newly diagnosed multiple myeloma patients. Blood. 2011, 118 (3): 675-678. 10.1182/blood-2011-03-344069.
Korbel JO, Campbell PJ: Criteria for inference of chromothripsis in cancer genomes. Cell. 2013, 152 (6): 1226-1236. 10.1016/j.cell.2013.02.023.
Northcott PA, Shih DJH, Peacock J, Garzia L, Morrissy AS, Zichner T, Stuetz AM, Korshunov A, Reimand J, Schumacher SE, et al: Subgroup-specific structural variation across 1,000 medulloblastoma genomes. Nature. 2012, 488 (7409): 49-56. 10.1038/nature11327.
Le LP, Nielsen GP, Rosenberg AE, Thomas D, Batten JM, Deshpande V, Schwab J, Duan Z, Xavier RJ, Hornicek FJ, Iafrate AJ: Recurrent chromosomal copy number alterations in sporadic chordomas. PLoS ONE. 2011, 6 (5): e18846-10.1371/journal.pone.0018846.
Bass AJ, Lawrence MS, Brace LE, Ramos AH, Drier Y, Cibulskis K, Sougnez C, Voet D, Saksena G, Sivachenko A, Jing R, Parkin M, Pugh T, Verhaak RG, Stransky N, Boutin AT, Barretina J, Solit DB, Vakiani E, Shao W, Mishina Y, Warmuth M, Jimenez J, Chiang DY, Signoretti S, Kaelin WG, Spardy N, Hahn WC, Hoshida Y, Ogino S, et al: Genomic sequencing of colorectal adenocarcinomas identifies a recurrent VTI1A-TCF7L2 fusion. Nat Genet. 2011, 43 (10): 964-968. 10.1038/ng.936.
Kloosterman WP, Hoogstraat M, Paling O, Tavakoli-Yaraki M, Renkens I, Vermaat JS, van Roosmalen MJ, van Lieshout S, Nijman IJ, Roessingh W, van’t Slot R, van de Belt J, Guryev V, Koudijs M, Voest E, Cuppen E: Chromothripsis is a common mechanism driving genomic rearrangements in primary and metastatic colorectal cancer. Genome Biol. 2011, 12 (10): R103-10.1186/gb-2011-12-10-r103.
Zhang J, Ding L, Holmfeldt L, Wu G, Heatley SL, Payne-Turner D, Easton J, Chen X, Wang J, Rusch M, Lu C, Chen SC, Wei L, Collins-Underwood JR, Ma J, Roberts KG, Pounds SB, Ulyanov A, Becksfort J, Gupta P, Huether R, Kriwacki RW, Parker M, McGoldrick DJ, Zhao D, Alford D, Espy S, Bobba KC, Song G, Pei D, et al: The genetic basis of early T-cell precursor acute lymphoblastic leukaemia. Nature. 2012, 481 (7380): 157-163. 10.1038/nature10725.
Kitada K, Aida S, Aikawa S: Coamplification of multiple regions of chromosome 2, including MYCN, in a single patchwork amplicon in cancer cell lines. Cytogenet Genome Res. 2012, 136: 30-37. 10.1159/000334349.
Poaty H, Coullin P, Peko JF, Dessen P, Diatta AL, Valent A, Leguern E, Prévot S, Gombé-Mbalawa C, Candelier JJ, Picard JY, Bernheim A: Genome-wide high-resolution aCGH analysis of gestational choriocarcinomas. PLoS ONE. 2012, 7: e29426-10.1371/journal.pone.0029426.
Rausch T, Jones DTW, Zapatka M, Stütz AM, Zichner T, Weischenfeldt J, Jäger N, Remke M, Shih D, Northcott PA, Pfaff E, Tica J, Wang Q, Massimi L, Witt H, Bender S, Pleier S, Cin H, Hawkins C, Beck C, von Deimling A, Hans V, Brors B, Eils R, Scheurlen W, Blake J, Benes V, Kulozik AE, Witt O, Martin D, et al: Genome sequencing of pediatric medulloblastoma links catastrophic DNA rearrangements with TP53 mutations. Cell. 2012, 148 (1–2): 59-71.
Jiang Z, Jhunjhunwala S, Liu J, Haverty PM, Kennemer MI, Guan Y, Lee W, Carnevali P, Stinson J, Johnson S, Diao J, Yeung S, Jubb A, Ye W, Wu TD, Kapadia SB, de Sauvage FJ, Gentleman RC, Stern HM, Seshagiri S, Pant KP, Modrusan Z, Ballinger DG, Zhang Z: The effects of hepatitis B virus integration into the genomes of hepatocellular carcinoma patients. Genome Res. 2012, 22 (4): 593-601. 10.1101/gr.133926.111.
Molenaar JJ, Koster J, Zwijnenburg DA, van Sluis P, Valentijn LJ, van der Ploeg I, Hamdi M, van Nes J, Westerman BA, van Arkel J, Ebus ME, Haneveld F, Lakeman A, Schild L, Molenaar P, Stroeken P, van Noesel MM, Øra I, Santo EE, Caron HN, Westerhout EM, Versteeg R: Sequencing of neuroblastoma identifies chromothripsis and defects in neuritogenesis genes. Nature. 2012, 483 (7391): 589-593. 10.1038/nature10910.
Lapuk AV, Wu C, Wyatt AW, McPherson A, McConeghy BJ, Brahmbhatt S, Mo F, Zoubeidi A, Anderson S, Bell RH, Haegert A, Shukin R, Wang Y, Fazli L, Hurtado-Coll A, Jones EC, Hach F, Hormozdiari F, Hajirasouliha I, Boutros PC, Bristow RG, Zhao Y, Marra MA, Fanjul A, Maher CA, Chinnaiyan AM, Rubin MA, Beltran H, Sahinalp SC, Gleave ME, et al: From sequence to molecular pathology, and a mechanism driving the neuroendocrine phenotype in prostate cancer. J Pathol. 2012, 227 (3): 286-297. 10.1002/path.4047.
Berger MF, Hodis E, Heffernan TP, Deribe YL, Lawrence MS, Protopopov A, Ivanova E, Watson IR, Nickerson E, Ghosh P, Zhang H, Zeid R, Ren X, Cibulskis K, Sivachenko AY, Wagle N, Sucker A, Sougnez C, Onofrio R, Ambrogio L, Auclair D, Fennell T, Carter SL, Drier Y, Stojanov P, Singer MA, Voet D, Jing R, Saksena G, Barretina J, et al: Melanoma genome sequencing reveals frequent PREX2 mutations. Nature. 2012, 485 (7399): 502-506.
Natrajan R, Mackay A, Lambros MB, Weigelt B, Wilkerson PM, Manie E, Grigoriadis A, A’Hern R, van der Groep P, Kozarewa I, Popova T, Mariani O, Turajlic S, Furney SJ, Marais R, Rodrigues DN, Flora AC, Wai P, Pawar V, McDade S, Carroll J, Stoppa-Lyonnet D, Green AR, Ellis IO, Swanton C, van Diest P, Delattre O, Lord CJ, Foulkes WD, Vincent-Salomon A, et al: A whole-genome massively parallel sequencing analysis of BRCA1 mutant oestrogen receptor-negative and -positive breast cancers. J Pathol. 2012, 227: 29-41. 10.1002/path.4003.
Nik-Zainal S, Van Loo P, Wedge DC, Alexandrov LB, Greenman CD, Lau KW, Raine K, Jones D, Marshall J, Ramakrishna M, Shlien A, Cooke SL, Hinton J, Menzies A, Stebbings LA, Leroy C, Jia M, Rance R, Mudie LJ, Gamble SJ, Stephens PJ, McLaren S, Tarpey PS, Papaemmanuil E, Davies HR, Varela I, McBride DJ, Bignell GR, Leung K, Butler AP, et al: The life history of 21 breast cancers. Cell. 2012, 149 (5): 994-1007.
Kloosterman WP, Tavakoli-Yaraki M, van Roosmalen MJ, van Binsbergen E, Renkens I, Duran K, Ballarati L, Vergult S, Giardino D, Hansson K, Ruivenkamp CAL, Jager M, van Haeringen A, Ippel EF, Haaf T, Passarge E, Hochstenbach R, Menten B, Larizza L, Guryev V, Poot M, Cuppen E: Constitutional chromothripsis rearrangements involve clustered double-stranded DNA breaks and nonhomologous repair mechanisms. Cell Reports. 2012, 1 (6): 648-655. 10.1016/j.celrep.2012.05.009.
Wu C, Wyatt AW, McPherson A, Lin D, McConeghy BJ, Mo F, Shukin R, Lapuk AV, Jones SJM, Zhao Y, Marra MA, Gleave ME, Volik SV, Wang Y, Sahinalp SC, Collins CC: Poly-gene fusion transcripts and chromothripsis in prostate cancer. Genes Chromosom Cancer. 2012, 51 (12): 1144-1153. 10.1002/gcc.21999.
Jones DTW, Jäger N, Kool M, Zichner T, Hutter B, Sultan M, Cho YJ, Pugh TJ, Hovestadt V, Stütz AM, Rausch T, Warnatz HJ, Ryzhova M, Bender S, Sturm D, Pleier S, Cin H, Pfaff E, Sieber L, Wittmann A, Remke M, Witt H, Hutter S, Tzaridis T, Weischenfeldt J, Raeder B, Avci M, Amstislavskiy V, Zapatka M, Weber UD, et al: Dissecting the genomic complexity underlying medulloblastoma. Nature. 2012, 488 (7409): 100-105. 10.1038/nature11284.
Stevens-Kroef M, Weghuis DO, Croockewit S, Derksen L, Hooijer J, Elidrissi-Zaynoun N, Siepman A, Simons A, Kessel AGV: High detection rate of clinically relevant genomic abnormalities in plasma cells enriched from patients with multiple myeloma. Genes Chromosom Cancer. 2012, 51 (11): 997-1006. 10.1002/gcc.21982.
Govindan R, Ding L, Griffith M, Subramanian J, Dees ND, Kanchi KL, Maher CA, Fulton R, Fulton L, Wallis J, Chen K, Walker J, McDonald S, Bose R, Ornitz D, Xiong D, You M, Dooling DJ, Watson M, Mardis ER, Wilson RK: Genomic landscape of non-small cell lung cancer in smokers and never-smokers. Cell. 2012, 150 (6): 1121-1134. 10.1016/j.cell.2012.08.024.
Zehentner BK, Hartmann L, Johnson KR, Stephenson CF, Chapman DB, de Baca ME, Wells DA, Loken MR, Tirtorahardjo B, Gunn SR, Lim L: Array-based karyotyping in plasma cell neoplasia after plasma cell enrichment increases detection of genomic aberrations. Am J Clin Pathol. 2012, 138 (4): 579-589. 10.1309/AJCPKW31BAIMVGST.
Barrett T, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M, Marshall KA, Phillippy KH, Sherman PM, Holko M, Yefanov A, Lee H, Zhang N, Robertson CL, Serova N, Davis S, Soboleva A: NCBI GEO: archive for functional genomics data sets–update. Nucleic Acids Res. 2013, 41 (D1): D991-D995. 10.1093/nar/gks1193.
Cai H, Kumar N, Baudis M: arrayMap: a reference resource for genomic copy number imbalances in human malignancies. PLoS ONE. 2012, 7 (5): e36944-10.1371/journal.pone.0036944.
Naus JI: The distribution of the size of the maximum cluster of points on a line. J Am Stat Assoc. 1965, 60: 532-538. 10.1080/01621459.1965.10480810.
Kulldorff M: A spatial scan statistic. Commun Stat Theory Methods. 1997, 26 (6): 1481-1496. 10.1080/03610929708831995.
Forbes SA, Bindal N, Bamford S, Cole C, Kok CY, Beare D, Jia M, Shepherd R, Leung K, Menzies A, Teague JW, Campbell PJ, Stratton MR, Futreal PA: COSMIC: mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer. Nucleic Acids Res. 2011, 39 (Database): D945-D950. 10.1093/nar/gkq929.
Vogelstein B, Lane D, Levine AJ: Surfing the p53 network. Nature. 2000, 408: 307-310. 10.1038/35042675.
Mitelman F, Johansson B, Mertens F: The impact of translocations and gene fusions on cancer causation. Nat Rev Cancer. 2007, 7 (4): 233-245. 10.1038/nrc2091.
Taylor BS, Barretina J, Maki RG, Antonescu CR, Singer S, Ladanyi M: Advances in sarcoma genomics and new therapeutic targets. Nat Rev Cancer. 2011, 11 (8): 541-557. 10.1038/nrc3087.
Maher CA, Wilson RK: Chromothripsis and human disease: piecing together the shattering process. Cell. 2012, 148 (1–2): 29-32.
Liu P, Erez A, Nagamani SCS, Dhar SU, Kołodziejska KE, Dharmadhikari AV, Cooper ML, Wiszniewska J, Zhang F, Withers MA, Bacino CA, Campos-Acevedo LD, Delgado MR, Freedenberg D, Garnica A, Grebe TA, Hernández-Almaguer D, Immken L, Lalani SR, McLean SD, Northrup H, Scaglia F, Strathearn L, Trapane P, Kang SHL, Patel A, Cheung SW, Hastings PJ, Stankiewicz P, Lupski JR, et al: Chromosome catastrophes involve replication mechanisms generating complex genomic rearrangements. Cell. 2011, 146 (6): 889-903. 10.1016/j.cell.2011.07.042.
Sorzano COS, Pascual-Montano A, de Diego AS, Martinez-A C, van Wely KH: Chromothripsis: Breakage-fusion-bridge over and over again. Cell Cycle. 2013, 12 (13): 1-8.
Meyerson M, Pellman D: Cancer genomes evolve by pulverizing single chromosomes. Cell. 2011, 144: 9-10. 10.1016/j.cell.2010.12.025.
Fullwood MJ, Lee J, Lin L, Li G, Huss M, Ng P, Sung WK, Shenolikar S: Next-generation sequencing of apoptotic DNA breakpoints reveals association with actively transcribed genes and gene translocations. PLoS ONE. 2011, 6 (11): e26054-10.1371/journal.pone.0026054.
Tubio JMC, Estivill X: When catastrophe strikes a cell. Nature. 2011, 470: 476-477.
Crasta K, Ganem NJ, Dagher R, Lantermann AB, Ivanova EV, Pan Y, Nezi L, Protopopov A, Chowdhury D, Pellman D: DNA breaks and chromosome pulverization from errors in mitosis. Nature. 2012, 482 (7383): 53-58. 10.1038/nature10802.
Artandi SE, DePinho RA: Telomeres and telomerase in cancer. Carcinogenesis. 2010, 31: 9-18. 10.1093/carcin/bgp268.
Holland AJ, Cleveland DW: Chromoanagenesis and cancer: mechanisms and consequences of localized, complex chromosomal rearrangements. Nat Med. 2012, 18 (11): 1630-1638. 10.1038/nm.2988.
McClintock B: The production of homozygous deficient tissues with mutant characteristics by means of the aberrant mitotic behavior of ring-shaped chromosomes. Genetics. 1938, 23: 315-376.
McClintock B: The stability of broken ends of chromosomes in Zea mays. Genetics. 1941, 26: 234-282.
Forment JV, Kaidi A, Jackson SP: Chromothripsis and cancer: causes and consequences of chromosome shattering. Nat Rev Cancer. 2012, 12 (10): 663-670. 10.1038/nrc3352.
Jones MJK, Jallepalli PV: Chromothripsis: chromosomes in crisis. Dev Cell. 2012, 23 (5): 908-917. 10.1016/j.devcel.2012.10.010.
Malhotra A, Lindberg M, Faust GG, Leibowitz ML, Clark RA, Layer RM, Quinlan AR, Hall IM: Breakpoint profiling of 64 cancer genomes reveals numerous complex rearrangements spawned by homology-independent mechanisms. Genome Res. 2013, 23 (5): 762-776. 10.1101/gr.143677.112.
Zack TI, Schumacher SE, Carter SL, Cherniack AD, Saksena G, Tabak B, Lawrence MS, Zhang CZ, Wala J, Mermel CH, Sougnez C, Gabriel SB, Hernandez B, Shen H, Laird PW, Getz G, Meyerson M, Beroukhim R: Pan-cancer patterns of somatic copy number alteration. Nat Genet. 2013, 45 (10): 1134-1140. 10.1038/ng.2760.
Sher T, Dy GK, Adjei AA: Small cell lung cancer. Mayo Clin Proc. 2008, 83 (3): 355-367. 10.4065/83.3.355.
Bengtsson H, Simpson K, Bullard J, Hansen K: aroma.affymetrix: a generic framework in R for analyzing small to very large Affymetrix data sets in bounded memory. Tech Report #745. 2008, Berkeley: Department of Statistics, University of California
Olshen AB, Venkatraman ES, Lucito R, Wigler M: Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics. 2004, 5 (4): 557-572. 10.1093/biostatistics/kxh008.
The authors would like to thank Henrik Bengtsson and Ni Ai for useful discussions.
The authors declare that they have no competing interests.
HC, MDR and MB conceived and designed the experiments. HC and NK analyzed the data. HC, NK, HCB, CvM, MDR and MB contributed reagents/materials/analysis tools. All authors contributed to drafting the manuscript and approved the final manuscript.