Combined subtractive cDNA cloning and array CGH: an efficient approach for identification of overexpressed genes in DNA amplicons

Background Activation of proto-oncogenes by DNA amplification is an important mechanism in the development and maintenance of cancer cells. Until recently, identification of the targeted genes relied on labour intensive and time consuming positional cloning methods. In this study, we outline a straightforward and efficient strategy for fast and comprehensive cloning of amplified and overexpressed genes. Results As a proof of principle, we analyzed neuroblastoma cell line IMR-32, with at least two amplification sites along the short arm of chromosome 2. In a first step, overexpressed cDNA clones were isolated using a PCR based subtractive cloning method. Subsequent deposition of these clones on a custom microarray and hybridization with IMR-32 DNA, resulted in the identification of clones that were overexpressed due to gene amplification. Using this approach, amplification of all previously reported amplified genes in this cell line was detected. Furthermore, four additional clones were found to be amplified, including the TEM8 gene on 2p13.3, two anonymous transcripts, and a fusion transcript, resulting from 2p13.3 and 2p24.3 fused sequences. Conclusions The combinatorial strategy of subtractive cDNA cloning and array CGH analysis allows comprehensive amplicon dissection, which opens perspectives for improved identification of hitherto unknown targeted oncogenes in cancer cells.


Background
Human cancers frequently manifest amplification of large stretches of DNA, cytogenetically detectable as homogeneously staining regions (HSR) or double minute chromatin bodies (dmin). DNA amplification is considered to be a consequence of the intrinsic genomic instability of cancer cells, and it is presumed that overexpression of a single or few amplified genes confers a selective advantage on these HSR or dmin bearing clones. Consequently, activation of proto-oncogenes by amplification is thought to play an important role in the development and maintenance of many human solid tumours [1][2][3]. Detection of amplification of many different chromosomal regions in various tumour types has led to the identification of the targeted oncogenes and has greatly contributed to our current understanding of the genetic basis of cancer. Furthermore, amplified genes often act as markers of tumour behaviour, drug response and patient outcome, and may represent targets for future molecular cancer therapy.
In the past, various strategies have been used for the detection of amplified chromosomal regions and DNA sequences in cancer. Comparative genomic hybridization (CGH) [4] has been particularly useful for detection of amplified sequences and assignment of the chromosomal position [5]. This approach allows whole genome screening for chromosomal imbalances up to 5-10 Mb and gene amplification of sufficiently large amplicons and/or highly overrepresented regions. Due to this limited resolution, a time consuming mapping is required following amplicon identification in order to pinpoint the putative target genes. Recently, two methods were introduced that allow mapping of the genomic content of amplicons with a 10-100 fold increased resolution. Array CGH employs arrayed fragments of genomic DNA clones (with partial or complete sequence information) instead of metaphase chromosomes [6,7]. Digital karyotyping is a SAGE (serial analysis of gene expression) based method to enumerate genomic DNA tags [8]. This method allows identification of specific amplifications and deletions that were not previously detected by conventional CGH or other methods. An important limitation of both methods is the inability to directly identify the overexpressed genes that are targeted by the amplification. This limitation was overcome by another variant on the classic CGH approach in which the normal metaphase chromosomes were replaced by a large number of microarrayed cDNA clones [9]. This approach has the advantage that losses and gains are mapped by their gene position rather than chromosomal band (as with conventional CGH) or genomic position (as with array CGH and digital karyotyping). The analysis immediately provides a list of candidate genes that occur within the region of interest. 
Another advantage is the ability to perform expression profiling on the same slides using the cDNA microarray approach, which enables the investigator to correlate copy number and gene expression, in order to identify candidate oncogenes that are both amplified and overexpressed. Moreover, the small size and large number of the arrayed cDNA clones provide a higher resolution in contrast to current PAC and BAC arrays for which the resolution is limited because of the relatively large size of the clones (120-200 kb). Two limitations of the cDNA array CGH approach are the confined analysis of genes that are present on the array and the analytical challenges in terms of sensitivity by the complexity of the probe and the small sizes of the arrayed target cDNAs (0.5-2 kb) (signal intensities in genomic hybridizations being proportional to the length of the target DNA).
In this paper, we propose a fast and straightforward approach to identify overexpressed genes in amplified regions, enabling direct identification of the relevant targeted oncogene(s). The approach is based on the abovementioned cDNA array CGH method, but includes a preceding selection of differentially expressed genes. In a first step, subtractive cDNA cloning is performed on an amplified tumour sample to isolate cDNA clones that are overexpressed. Subsequent CGH analysis on a cDNA microarray containing the subtracted clones allows detection of differentially expressed genes which are amplified at the DNA level. As a proof of principle, neuroblastoma cell line IMR-32 with at least two amplification sites along the short arm of chromosome 2 (including the MYCN locus) was used as a model system [10,11].
In this study, the combinatorial power and efficacy of subtractive cDNA cloning and high-throughput DNA copy number determination using array CGH was demonstrated for identification of amplified and overexpressed genes. In addition to those genes which were already known to be amplified and overexpressed in IMR-32, we also detected hitherto unknown genes, which were not previously described to be amplified in neuroblastoma.

Identification of differentially expressed genes by suppression subtractive hybridisation (SSH)
In a first step, overexpressed genes in neuroblastoma cell line IMR-32 were isolated (of which some have increased expression due to amplification) through a PCR select cDNA subtraction with IMR-32 as tester and SK-N-SH as driver (the latter being a neuroblastoma cell line without DNA amplification [12]). This yielded a cDNA library of 960 clones. By comparing the unsubtracted and subtracted cDNA library for the abundance of an internal control gene GAPD, the enrichment was estimated to be 100 fold. Upon hybridization of the subtracted and reverse subtracted probe on nylon filters containing all subtracted clones (i.e. differential screening according to the manufacturer), false positive clones (non-differentially expressed genes) could be identified, resulting in the retention of 281 IMR-32 overexpressed clones. After sequencing, alignment, EST contig building and UniGene database search, a non-redundant list was obtained containing 126 known genes, 22 UniGene clusters, and 10 anonymous ESTs. For each unique gene, transcript or EST contig, at least one representative clone was selected for re-arraying, insert amplification and spotting on a custom microarray.

CGH on cDNA microarray and characterization of the clones
In order to determine which of the overexpressed cDNA clones were amplified in IMR-32, array CGH analysis was performed on a custom cDNA microarray using DNA of IMR-32 as test probe and DNA of a normal male lymphoblastoid cell line as normal control probe. All clones that were found to be amplified were located in one of the two known amplified regions on the short arm of chromosome 2 (2p13-14 and 2p24) (Figures 1 and 2). In addition to the 3 known genes on 2p24 that are frequently co-amplified in neuroblastoma (i.e. MYCN, DDX1 and NAG), ten other partial cDNA clones on the microarray were shown to be amplified in cell line IMR-32. One clone (g6f6) was part of the MEIS1 homeobox gene (2p14) that was recently shown also to be amplified in IMR-32 [13,14]. Two other clones (g1h7 and g8f10) belonged to the TEM8 gene on 2p13.3. A fourth clone (g4d5) is located between the MEIS1 and the TEM8 gene and is part of an as yet uncharacterized gene. RT-PCR analysis revealed that this clone is not part of the neighbouring (not yet fully annotated) ETAA16 gene. No homology was found for g4d5 with other EST sequences or known genes. Another clone (g10d12) is located 500 kb telomeric to NAG and also displayed no homology to any known sequence. Two other transcripts (g9d9 and g10e3) are probably part of the NSE1 gene as demonstrated by alignment of the clones to NSE1 transcript variants (Figure 2) and RT-PCR assays using a forward primer in the subtracted clone and a reverse primer in the NSE1 gene. Two clones (g1c2, g6d4) were located in the large 150 kb intron of the 4.5 kb NAG sequence reported by Wimmer et al. [15] (acc. no. AF056195) between exon 4 and 5. BLAST analysis of the human EST database with exon 4 and 5 of the NAG gene as a query sequence failed to identify an EST clone that contained both exons. Furthermore, RT-PCR with a forward primer in exon 4 and a reverse primer in exon 7 failed to yield the expected band of 341 bp in IMR-32 and SK-N-SH.
In contrast, a sharp and single band of approximately 3.5 kb was amplified. Furthermore, Northern blot analysis estimated the NAG transcript size to be approximately 2.5 kb longer compared to the published sequence (data not shown). These data further support the recent observation that the published NAG gene (acc. no. AF056195) is misannotated and should contain 21 more exons between former exon 4 and 5 [16]. Hence, clones g1c2 and g6d4 (present on our cDNA array) and clone g3e7 are in fact part of the newly annotated NAG gene (acc. no. AF388385).
The tenth amplified clone (g2h10a) is of particular interest because one part of the sequence aligns to the TEM8 gene on 2p13.3 and the other part aligns to a sequence in band 2p24.3 (Figure 2). The fusion nature of this clone was confirmed by RT-PCR on cell line IMR-32 using a primer in the first part of the transcript on 2p13.3 and a primer in the other part of the transcript on 2p24.3. Cloning and sequencing of the PCR product revealed that IMR-32 contains at least two different splice variants of the fusion transcript, i.e. g2h10b (acc. no. CD664535) and g2h10c (acc. no. CF384614) (splice variants are detected in the part that aligns to 2p24.3). Most splicing sites are surrounded by consensus splice site sequences (data not shown).

Confirmation of amplification status
Real-time quantitative PCR on IMR-32 was performed in order to validate the amplification status of all sequences that were catalogued as amplified by array CGH analysis: five genes that were previously reported to be amplified in IMR-32 (MYCN, DDX1, NAG, NSE1 and MEIS1), one newly amplified gene (TEM8), 2 anonymous expressed sequences (g10d12 and g4d5) and 1 fusion transcript. Amplification of all these genes and clones was confirmed in IMR-32 (Table 2).
Using FISH analysis, it was demonstrated that the MYCN, DDX1, NAG, MEIS1, NSE1 and TEM8 genes and the g10d12 clone are present as multiple copies on all 3 known HSRs in IMR-32 (Figure 3). This suggests that the 3 HSRs originate from the same complex amplicon.
To verify whether the subtracted clones that were shown to be amplified are indeed overexpressed at the mRNA level in IMR-32, real-time quantitative RT-PCR was performed and demonstrated that all genes were highly overexpressed (range 10^1 to 10^4 fold overexpression) (Table 2). The fusion transcript was only expressed in cell line IMR-32.
Three genes were shown to be amplified in the 2p13.3-14 amplicon (of which only MEIS1 was previously reported). To our surprise, more known genes are located between amplified clone g4d5 and TEM8, but those were not present in our subtracted cDNA library. To test whether our approach failed to identify these genes or whether these genes were indeed not amplified in IMR-32, we randomly selected 3 genes (PPP3R1, PLEK and BMP10) and determined their copy number and expression level in IMR-32. Neither amplification nor overexpression could be detected for these genes, demonstrating that the 2p13.3-14 amplicon in IMR-32 is complex and discontinuous.
A recent study reported that the DNMT3A gene on chromosome band 2p23.3 is amplified in IMR-32 and is probably part of a third amplicon on 2p [17]. As our approach did not identify this gene, we decided to evaluate the DNMT3A gene copy number and expression level with real-time quantitative PCR. Neither amplification nor overexpression could be detected in cell line IMR-32.

Extended gene copy number and mRNA expression analysis of the novel amplified genes in a panel of neuroblastoma cell lines
Real-time quantitative PCR was performed in order to analyse the mRNA expression level and gene copy number of novel amplified genes TEM8, g10d12, g10e3, and g4d5, and already known amplified genes MYCN, DDX1, NAG and MEIS1 in 30 NB cell lines and 9 normal human tissue samples (Table 3 and Figure 4). These analyses showed that g10e3 and g4d5 were only amplified and overexpressed in cell line IMR-32. Clone g10d12 was also found to be amplified and overexpressed in cell line SJNB-6. Subsequent gene copy number determination of g10d12 in primary tumour samples indicated a co-amplification frequency with MYCN of 12% (9/75 tested MYCN amplified tumour samples). The mRNA expression and gene amplification pattern for TEM8 resembles that of MEIS1 ([13] and this study): high expression in a number of cell lines, independent of DNA amplification.

Figure 1. Array CGH based haploid copy number of SSH clones mapping on chromosome 2. Base position of the SSH clones on chromosome 2 (with exception of fusion transcript clone g2h10) was determined according to the human genome browser at UCSC (April 2003 freeze [33]). Two clear amplification sites along the short arm emerge. Insert: detail of the array CGH (IMR-32 in red and control DNA in green), amplified clones are indicated.

Discussion
In this study, we demonstrate that subtractive cDNA cloning followed by CGH on cDNA microarrays containing the subtracted clones is a powerful strategy for rapid and efficient isolation of amplified genes that are overexpressed. As a proof of principle, we analysed neuroblastoma cell line IMR-32 which contains at least two distinct amplification sites on the short arm of chromosome 2 [10,11].
Upon subtractive cDNA cloning and array CGH analysis, fifteen partial cDNA clones located on these sites on 2p were found to be amplified in IMR-32, representing 9 different transcripts. Five of these constitute genes that were previously reported to be amplified in IMR-32 (Table 4), i.e. MYCN [18], DDX1 [19], NAG [15] and NSE1 [17] on chromosome band 2p24, and MEIS1 [13,14] on 2p14, demonstrating the validity and success of our approach.
We not only confirmed NAG amplification, but also isolated two partial cDNA clones located within a large intron of the NAG gene. Subsequent analyses demonstrated that these clones are part of the NAG gene that was initially misannotated and should in fact contain 21 additional exons, as recently confirmed in another study [16]. We also identified 4 newly amplified transcripts, including the tumour endothelial marker gene TEM8 (2 partial cDNA clones), encoding a protein highly expressed in tumour endothelial cells but not in normal endothelial cells [20]. Two other transcripts show no homology to any known sequence. The detailed characterization of these anonymous transcripts was beyond the scope of this study.
Our amplicon dissection strategy clearly provides a comprehensive view on the gene content and complex structure of the HSRs (homogeneously staining regions) present in cell line IMR-32. All three HSRs appear to contain the same genes as visualized in a series of FISH mappings, and presumably arise from a single non-syntenic amplification event. The amplified genes are located on two different regions, of which the 2p24 region appears to be amplified contiguously, while the other region is amplified in a discontinuous manner as demonstrated by the presence of a single copy region on 2p13-14, flanked on both sides by amplified sequences. The non-syntenic amplification and inherent fusion of the 2 amplification sites may have caused the formation of a fusion transcript, with activation of cryptic exons, which are not transcribed under normal circumstances. This amplified and highly expressed fusion transcript contains part of the TEM8 gene on 2p13.3, fused to anonymous spliced sequences located in BAC clone RP11-314E10 on band 2p24.3. The occurrence of a fusion transcript as a result of amplicon formation has been described in a breast cancer cell line MCF7 (caused by non-syntenic co-amplification of two common amplification sites in breast cancer, i.e. 17q23 and 20q13) [21]. However, the significance of these fusion transcripts is at present unclear as no similar fusion transcripts have been detected in other neuroblastoma or breast cancer cells.
Our study clearly demonstrates the power, speed and efficacy of combined subtractive cDNA cloning and DNA copy number determination using array CGH for the identification, within 4 weeks, of clones that are overexpressed and part of the amplicon. Moreover, the procedure results in the infinite availability of the subtracted cDNA clones, suitable for downstream analyses, such as Northern blot, in situ hybridization or RNA interference using diced double stranded RNA. As a further improvement and simplification of the proposed strategy, we recommend sequencing only the amplified genes detected on the array, instead of sequencing all subtracted clones as performed in this proof-of-principle study. The proposed strategy will not allow isolation of genes which are amplified but not overexpressed. However, one can question the relevance of these genes, as these will most probably not have any biological effect. To our knowledge, such amplification events have not been reported yet.
Oncogene identification consisting of prior selection of differentially expressed genes has already been reported in other cancer cell lines but, unlike our strategy, was severely hampered by a rate-limiting step for the verification of amplification by radiation hybrid mapping of the subtracted clones [22]. Table 4 summarizes the different strategies used in the past for the identification of amplified genes in neuroblastoma cells. Some of these reports employed a laborious and/or technically challenging method to identify or clone only one single amplified gene. In contrast, a recent study provided a global gene content analysis of the observed amplicons in IMR-32 cells, using CGH on cDNA microarrays [17]. However, this approach was restricted to the identification of genes that were present on the microarray and consequently missed some genes as compared to our strategy (such as the known amplified DDX1 gene, previously unannotated NAG exons, the TEM8 gene and fusion transcript). Amplification of DNMT3A, located at 2p23.3, was also reported in the above-referenced CGH on cDNA microarray study. As the subtractive cDNA cloning procedure did not yield a clone for this gene, we performed real-time quantitative PCR analyses which clearly showed that neither DNMT3A amplification nor overexpression was present in the IMR-32 cells investigated in this study. An explanation for this discrepancy may be cell heterogeneity, as it has been reported that a third amplification site was only present in a minor portion of IMR-32 cells [10].
Investigation of the amplification status of IMR-32 amplified genes in other NB cell lines revealed that three of the nine genes were also amplified in other samples, albeit always co-amplified with MYCN. However, it remains an unsolved question whether co-amplified genes represent silent passengers, or co-determinants of phenotype [23]. The frequently co-amplified gene DDX1 is a nice example, as no correlation between amplification and patient outcome could be established [24], but nevertheless, the gene appears to have oncogenic properties [23]. Six of the nine identified genes were only amplified in cell line IMR-32. However, the amplification of a gene in only a single sample does not preclude its possible role in tumour biology. An interesting example is the MEIS1 oncogene, with proven oncogenic properties (reviewed in [25]). Albeit amplified in only one neuroblastoma sample, the gene is overexpressed in about one quarter of other tested neuroblastoma tumour samples ([13] and this study). A similar situation occurs for TEM8, a tumour-specific endothelial marker that has been implicated in colorectal cancer [26]. Besides amplification and overexpression in IMR-32, high TEM8 expression independent of gene amplification is observed, suggesting alternative pathways for gene activation and a possible role in neuroblastoma pathogenesis. Further evidence that one or more genes in the 2p13-14 amplicon play a role in neuroblastoma comes from the observation of genomic amplification at chromosome bands 2p13-14 in 3 primary tumour samples, from a large European multicentre CGH study of 204 cases [27]. Unfortunately, no material was available for further investigation of these samples. Clearly, more detailed analyses of the amplified genes (amongst others in a large cohort of uniformly treated primary tumour samples) and functional studies are required to establish a possible role of one of the new genes in tumourigenesis.

Conclusions
The present study shows that the combinatorial method of subtractive cDNA cloning followed by array CGH allows straightforward and efficient isolation of overexpressed genes located in amplification sites. The validity of our approach is clearly illustrated by the detection of all genes that were previously found to be amplified in neuroblastoma cell line IMR-32, the identification of 3 newly amplified genes and a fusion transcript, and the generation of new data on gene content and structure of the amplicon.

DNA and RNA isolation
DNA from cultured neuroblastoma cells and 75 MYCN amplified tumours was extracted using the Easy DNA kit following the instructions of the manufacturer (Invitrogen). Total RNA of cultured cell lines was isolated using the RNeasy Midi kit (Qiagen), and mRNA was extracted from SK-N-SH and IMR-32 with the FastTrack kit (Invitrogen), both according to the manufacturer's instructions.
DNA and RNA concentrations were determined using the PicoGreen and RiboGreen reagents, respectively (Molecular Probes), on a TD-360 fluorometer (Turner Designs).

Suppression subtractive hybridization (SSH)
Starting from 2 µg of mRNA from cell lines SK-N-SH (driver) and IMR-32 (tester), SSH was performed with the PCR-Select cDNA Subtraction kit (BD Biosciences, Clontech) as described by the manufacturer. The PCR product mixture of putative differentially expressed genes was subcloned into the pGEM-T Easy vector (Promega) and propagated in DH5α E. coli. 960 clones were picked, grown in 96-well plates and stored as glycerol stocks at -80°C for further analysis. Differential screening was performed to eliminate possible false positive clones according to the guidelines described in the Differential Screening kit (BD Biosciences, Clontech).

DNA sequencing and analysis
SSH clones were PCR amplified using SP6 and T7 vector specific sequences flanking the cloning site. PCR products were exonuclease and phosphatase treated and cycle sequenced using BigDyeTerminator chemistry on an ABI377 (Applied Biosystems) with primers that annealed to the SP6 or T7 sequences. Similarity searches were performed using the BLAST algorithm [28] after removing vector and masking repeat sequences using RepeatMasker [29]. Sequence alignment and EST contig building were performed using the freely available BioEdit package [30].

CGH on cDNA microarray
This protocol is based on a previously published CGH on cDNA microarray protocol [31] and a CGH on BAC microarray protocol [32]. The slides were scanned in a GMS418 scanner (MWG Biotech) and images were analyzed using ImaGene v5.5 software (BioDiscovery). After background subtraction, spots (background signal < signal, for the 2 colours) were normalized with the geometric mean of selected data points (signal > background signal + 3 × standard deviation of all background signals, for the 2 colours). Ratios were calculated using these normalized data and plotted against the base position of the clone according to the human genome browser at UCSC (April 2003 freeze [33]).
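The spot filtering and normalization described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the ImaGene workflow used in the study: the function names, the dictionary keys (`clone`, `test`, `test_bg`, `ref`, `ref_bg`) and the choice to apply the geometric mean to the test/control ratios of the selected spots (which is equivalent to dividing each channel by its own geometric mean) are assumptions.

```python
import math
from statistics import stdev


def geometric_mean(values):
    """Geometric mean of a list of positive numbers."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def normalized_ratios(spots):
    """Return {clone: normalized test/control ratio}.

    `spots` is a list of dicts with hypothetical keys:
    clone, test, test_bg (IMR-32 channel) and ref, ref_bg (control channel).
    """
    # Standard deviation of all background signals, per colour.
    test_sd = stdev([s["test_bg"] for s in spots])
    ref_sd = stdev([s["ref_bg"] for s in spots])

    # Keep only spots whose signal exceeds the background in both colours.
    kept = [s for s in spots
            if s["test"] > s["test_bg"] and s["ref"] > s["ref_bg"]]

    # Background-subtracted raw ratio for every retained spot.
    raw = {s["clone"]: (s["test"] - s["test_bg"]) / (s["ref"] - s["ref_bg"])
           for s in kept}

    # Spots used for normalization: signal > background + 3 SD in both colours.
    selected = [raw[s["clone"]] for s in kept
                if s["test"] > s["test_bg"] + 3 * test_sd
                and s["ref"] > s["ref_bg"] + 3 * ref_sd]

    factor = geometric_mean(selected)
    return {clone: ratio / factor for clone, ratio in raw.items()}
```

With this convention the geometric mean of the normalized ratios of the selected spots equals 1, so amplified clones stand out as ratios well above 1 when plotted against base position.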

Real-time quantitative PCR based copy number determination and gene expression analysis
The gene copy number of known genes MYCN, DDX1, NAG, MEIS1, TEM8, BMP10, PLEK, PPP3R1 and DNMT3A and anonymous SSH clones g10e3, g9d9, g10d12, g4d5 and g2h10a was determined in 32 other neuroblastoma cell lines with listed primers (Table 1) according to a previously described protocol with BCMA and SDC4 as normalizing control genes and normal human genomic DNA (Roche) as calibrator sample [24]. Clones that were found to be amplified in cell lines other than IMR-32 were also tested in 75 MYCN amplified tumours. PCR reactions were performed on an ABI 5700 SDS (Applied Biosystems). Amplification mixtures (25 µl) contained template DNA (approximately 10 ng), 1 × qPCR MasterMix for SYBR Green I (Eurogentec) and 300 nM of each primer. The cycling conditions comprised 10 min polymerase activation at 95°C, 40 cycles at 95°C for 15 sec and 60°C for 1 min. A dissociation curve was run after each PCR reaction in order to verify amplification specificity.
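As an illustration of how a copy number can be derived from such measurements, the sketch below applies the common comparative Ct calculation with two reference genes and a calibrator sample. The details of the protocol in [24] are not given here, so the formula, the assumed amplification efficiency of 2 per cycle, the function name and all Ct values are assumptions made for the example only.

```python
def relative_copy_number(ct_target, ct_refs, cal_ct_target, cal_ct_refs,
                         efficiency=2.0):
    """Copy number of a target relative to a calibrator sample.

    ct_target / ct_refs: Ct of the target and of the reference genes
    (e.g. BCMA and SDC4) in the test sample.
    cal_ct_target / cal_ct_refs: the same measurements in the calibrator
    (normal genomic DNA).
    efficiency: assumed amplification factor per cycle (2.0 = 100%).
    """
    # Averaging the reference Ct values corresponds to taking the
    # geometric mean of the reference gene quantities.
    delta_sample = ct_target - sum(ct_refs) / len(ct_refs)
    delta_calibrator = cal_ct_target - sum(cal_ct_refs) / len(cal_ct_refs)
    return efficiency ** -(delta_sample - delta_calibrator)


# Hypothetical Ct values: the target crosses the threshold 5 cycles earlier
# in the test sample than in the calibrator, at equal reference gene Ct.
fold = relative_copy_number(20.0, [25.0, 25.0], 25.0, [25.0, 25.0])
```

In this made-up example the target is 2^5 = 32 times more abundant than in normal DNA, i.e. roughly 32 copies per haploid genome if the calibrator carries one.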
The relative expression levels of the clones were determined in the neuroblastoma cell line panel and on 9 normal tissue samples (RNA obtained from BD Biosciences, Clontech) using the above listed primer pairs according to an optimized two-step real-time SYBR Green I RT-PCR assay [34]. The gene expression levels were normalized using the geometric mean of 4 stable housekeeping genes in neuroblastoma (SDHA, UBC, GAPD and HPRT1) as described previously [35].
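The normalization step can be sketched as follows, assuming that relative quantities (not raw Ct values) are already available for the gene of interest and for the four housekeeping genes. The function names and the numbers are hypothetical and serve only to show the arithmetic.

```python
import math


def normalization_factor(reference_quantities):
    """Geometric mean of the relative quantities of the housekeeping genes."""
    logs = [math.log(q) for q in reference_quantities.values()]
    return math.exp(sum(logs) / len(logs))


def normalized_expression(target_quantity, reference_quantities):
    """Relative quantity of the target divided by the normalization factor."""
    return target_quantity / normalization_factor(reference_quantities)


# Hypothetical relative quantities for one sample.
references = {"SDHA": 1.0, "UBC": 4.0, "GAPD": 2.0, "HPRT1": 2.0}
level = normalized_expression(20.0, references)
```

Here the geometric mean of the four reference quantities is 2, so the normalized expression level of the target is 10. The geometric mean is less sensitive than the arithmetic mean to a single outlying reference gene.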

Further characterization of anonymous SSH clones
RT-PCR assays on IMR-32 cDNA were designed to test whether an anonymous SSH clone and a neighbouring (putatively not yet fully annotated) transcript are part of the same gene. Taking into account the orientation of the sequences, a forward primer was designed in the SSH clone and a reverse primer in the known transcript: forward primer in clone g10e3 5'AGTCACTGAGACAGAAAAGAGGTGGAATGC3' and reverse primer in gene NSE1 5'GGAGGAAGATGGCGCTGCGAATTC3'; forward primer in clone g9d9 5'CCACAGAAGGTGTTTCACACCCAGCCT3' and reverse primer in gene NSE1 5'GGAGGAAGATGGCGCTGCGAATTC3'; forward primer in clone g10d12 5'GACAGGCTTGCCAATTTTCACAGTGTGG3' and reverse primer in gene NSE1 5'CCCGACCCGCAGTTCGTCCTTTT3'; forward primer in clone g4d5 5'AGCTAGGCTCGCAAACAACGTTTCCAGA3' and reverse primer in gene ETAA16 5'GCCAAGAACTGCCAGAGGCTTTTTGGA3'. To determine the NAG transcript length between exon 4 and 7 (acc. no. AF056195), RT-PCR with a forward primer in exon 4 and a reverse primer in exon 7 was performed (F 5'GCTCCCTGATGGACTGGTTCGCTTGGT3' and R 5'CCGGCCAGTGTGCCTCGTCAATCTA3'). Examination of the fusion transcript was done with a forward primer in the first part of the transcript (F 5'CACACTGTTCTGACGGTTCCA3') and a reverse primer in the other part (R 5'CAAAGTAGAATATAGTTGTCCAAAACACAA3').
RT-PCR amplification on random hexamer primed IMR-32 cDNA was performed with the Advantage 2 PCR Kit (Clontech, BD Biosciences) according to the manufacturer.
PCR fragments run on a 1.5% TBE-agarose gel were excised and purified on a GenElute Minus EtBr Spin Column (Sigma-Aldrich). Cycle sequencing was performed on purified amplicons (3-10 ng) with the above-mentioned primers at a concentration of 80 nM and the ABI PRISM BigDye Terminators v3.0 Cycle Sequencing Kit (Applied Biosystems) according to the manufacturer, with the following thermocycling conditions: 25 cycles at 92°C for 10 sec, 55°C for 5 sec and 60°C for 3.5 min. Sequencing of the fusion transcript was preceded by cloning of the PCR product with the TOPO TA cloning kit for sequencing (Invitrogen). After ethanol precipitation, the products were run on an automated sequencer ABI3100 and analyzed with the Sequencing Analysis software v3.7 (Applied Biosystems).

Authors' contributions
JV oversaw the project and performed SSH, differential screening and sequencing, in collaboration with GB and KS in the lab of FVR. KDP and FP were involved in the microarray production, array CGH analysis and further characterization of SSH clones by quantitative RT-PCR. BM helped with fine-tuning of the array CGH protocol. KDP and JV performed further analysis on the amplified SSH clones and drafted the manuscript; all other authors have reviewed the manuscript and FS and ADP were the final editors of the manuscript.