Genome-wide genetic aberrations of thymoma using cDNA microarray based comparative genomic hybridization

Background Thymoma is a group of tumors that is heterogeneous in biology and clinical behavior. Even though thymoma is divided into five subgroups by the World Health Organization classification, the nature of the disease is mixed within the subgroups. Results We investigated the molecular characteristics of genetic variation in thymoma using cDNA microarray based-comparative genomic hybridization (CGH) with a 17 K cDNA microarray in an indirect, sex-matched design. Genomic DNA from 39 paraffin embedded thymoma tissues (A 6, AB 11, B1 7, B2 7, B3 8) labeled with Cy-3 was co-hybridized with reference placenta gDNA labeled with Cy-5. Using the CAMVS software, we identified deletions on chromosomes 1, 2, 3, 4, 5, 6, 8, 12, 13 and 18 across the thymomas. We then evaluated the genetic variations of thymoma according to subgroup and clinical behavior. First, 36 significant genes differentiating the five subgroups were selected by Significance Analysis of Microarray. Based on these genes, type AB was suggested to be heterogeneous at the molecular level as well as histologically. Next, we observed that thymoma was divided into A, B (1, 2) and B3 subgroups with 33 significant genes. In addition, we selected 70 genes differentiating types A and B3, which differ largely in clinical behavior. Finally, the 11 heterogeneous AB cases could be assigned to the A and B (1, 2) types based on their genetic characteristics. Conclusion In our study, we observed the genome-wide chromosomal aberrations of thymoma and identified significant gene sets with genetic variations related to thymoma subgroups, which might provide useful information for thymoma pathobiology.


Background
Thymoma is a thymic epithelial cell tumor having organotypic features and no overt cytologic atypia. Although controversy still remains, the classical histological classifications of thymoma based on the proportion of reactive lymphocytes and tumor epithelial cells have been replaced by a histogenetic classification that basically subdivides thymoma into medullary, cortical, and mixed types, according to cytological features of epithelial cells [1][2][3][4][5]. Several studies supported the validity of the histogenetic classification and the World Health Organization (WHO) classification adopted five subtypes of thymoma stratified in histogenetic classification [6][7][8][9]. The histological subtypes in the WHO classification have been reported to have independent prognostic significance, and types A and AB demonstrate indolent biological behavior compared with type B [6][7][8]. Several studies have tried to demonstrate the underlying pathogenetic mechanisms to explain the different biological behaviors of thymoma, according to the histological subtype and stages by applying several markers, but no conclusive results have been demonstrated yet [10][11][12][13][14][15][16][17][18].
A recent report on genetic alterations of thymoma using comparative genomic hybridization (CGH) and fluorescent in situ hybridization methods demonstrated that type B3 thymoma frequently occurred with losses of chromosomes 6, 13q, 16q and gains of chromosome 1q, while type A thymoma rarely showed cytogenetic abnormalities [19]. Subsequent studies by the same group based on loss of heterozygosity (LOH) analyses inferred two different genetic pathways of tumorigenesis of thymoma, and heterogeneous genetic alterations in subtypes of thymoma, excluding type A, were identified by CGH and LOH analyses [20,21]. However, a recent CGH study identified several new chromosomal imbalances even in a significant proportion of type A thymomas [22].
The application of DNA microarray technology has provided high-throughput evaluation of the whole genome as well as significant genetic information at the single gene level, and has enabled the classification of different neoplasms based on their characteristic genetic patterns. So far there has been no report focusing on differences of genetic alterations between subtypes of thymoma using a cDNA microarray based-CGH method (microarray-CGH). In our study, genetic alterations of all WHO-defined subtypes of thymoma were investigated using high resolution microarray-CGH followed by hierarchical cluster analyses of the data to identify specific patterns of genetic aberrations and genes associated with each subtype.

Chromosome analysis of thymoma
To evaluate the pattern of genetic aberration in thymoma, 13,248 unique genes were obtained after preprocessing, of which 8,411 were mapped by SOURCE. Commonly amplified or deleted regions were identified by averaging the log2 ratio values of each gene across the 39 thymoma patients and plotting the averages at their chromosomal locations using the Chromosome Analyzer and Map Viewer using S-plus (CAMVS) (Figure 1A). When we evaluated the overall chromosome patterns based on the cut-off value of ± 0.3, deletions in chromosomes 1, 2, 3, 4, 5, 6, 8, 12, 13 and 18 were identified.
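The averaging step described above can be sketched as follows. This is a minimal illustration, not the CAMVS implementation: the function names, the gene identifiers and the use of `None` for missing values are assumptions made for the example, and the ± 0.3 cut-off is taken from the text.

```python
from statistics import mean

CUTOFF = 0.3  # gain/loss cut-off on the log2 ratio, as stated in the text


def mean_log2_per_gene(ratios_by_gene):
    """Average each gene's log2 ratio over all samples.

    Missing values are encoded as None (an assumption of this sketch) and
    skipped; genes with no observed value are left out of the result.
    """
    averaged = {}
    for gene, values in ratios_by_gene.items():
        observed = [v for v in values if v is not None]
        if observed:
            averaged[gene] = mean(observed)
    return averaged


def call_change(value, cutoff=CUTOFF):
    """Label a mean log2 ratio as 'loss', 'gain' or 'no change'."""
    if value < -cutoff:
        return "loss"
    if value > cutoff:
        return "gain"
    return "no change"


# Hypothetical toy data: three genes measured in three samples.
example = {
    "gene_1": [-0.5, -0.4, None],
    "gene_2": [0.1, -0.1, 0.0],
    "gene_3": [None, None, None],
}
calls = {g: call_change(m) for g, m in mean_log2_per_gene(example).items()}
```

In the study the averaged values were plotted along each chromosome; the sketch stops at the per-gene call.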

Overall genetic pattern analysis
To investigate the genetic differences among the five subtypes of thymoma, we selected 36 distinctive genes at a false discovery rate (FDR) of 2.5% by multi-class SAM [23] (Additional file 1). Supervised hierarchical clustering of the microarray-CGH data from the 39 thymoma cases with the selected 36 genes demonstrated that types A and B were separately clustered (Figure 3). However, the branches of types B1, B2 and B3 were intermingled with each other. In addition, eight samples of type AB were dispersed in various B branches, and three samples of type AB were included in type A. These results suggest that type AB genetically has combined characteristics of types A and B (Figure 3). Hence, type AB was excluded from further analyses to reduce the error rate in understanding the genetic characteristics of thymoma subtypes. Among the 36 selected genes, 16 are ESTs with unknown functions and six are located in the 1q region. Cervical cancer oncogene-4 (HCC-4, 2q24.2) was amplified in all subtypes except for type B1. T-cell receptor gamma locus (TARP, 7p15-p14) was deleted in all subtypes except for type A (AB: 63%, B1: 33%, B2: 50%, B3: 12.5%).
Figure caption: Microarray-CGH profiles using CAMVS (Chromosome Analyzer and Map Viewer using S-plus)

Comparison of genetic aberration patterns between subtypes of thymoma
A) Comparison between type A and type B
SAM was performed with types A and B in order to identify genetic alterations distinguishing tumors composed of medullary and cortical epithelium. We selected 50 genes at an FDR of 1.7%, and the hierarchical clustering showed clear separation of type A and type B (Figure 4-A, Additional file 2). Small branch pattern analyses showed more similarity between type B1 and type B2 than between type B2 and type B3, and some similarity between type A and type B3. Among the 50 genes selected, chromosomes 1 and 5 each accounted for 10% (5/50), and chromosomes 6, 11, 13 and 17 each accounted for 6% (3/50). We observed that these genes were closely located within each chromosome, suggesting possible fragmental losses of thymoma chromosomes. For example, three genes were located nearby on 1p31, and four genes were located nearby on the 5q arm. Among the genes on chro-

B) Comparison between type A and type B3
SAM was performed with types A and B3 in order to identify genetic alterations responsible for the markedly different biologic behaviors of types A and B3. Seventy genes with different copy numbers were selected with an FDR of 1.2%, and the cluster analysis using these 70 genes showed clear separation between type A and type B3 (Figure 4-B, Table 1). Gains of these genes were prevalent in type A while losses were prevalent in type B3. Twenty-five of the 70 genes were ESTs, and 12 genes were located on chromosome 6, most of which were concentrated in the 6p21.3-6p25 regions. In type B3, frequent deletions were observed in chromosome 6 open reading frame 10 (C6orf10, 6p21.3), butyrophilin subfamily 3, member A2 (BTN3A2, 6p22.1), and thiopurine S-methyltransferase (TPMT, 6p22.3), with frequencies of 75% (6/8), 87.5% (7/8), and 75% (6/8), respectively. All the selected genes on chromosome 5 were on the 5q arm, and we observed that cervical cancer oncogene 4 (HCC-4, 2q24.2) was more amplified in all type A cases than in type B3 cases.
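The per-gene frequencies quoted above (for example 7/8 = 87.5%) are counts of cases whose log2 ratio lies beyond the ± 0.3 cut-off. A minimal sketch of that tally follows; the function name `incidence` and the eight example values are hypothetical and are not data from the study.

```python
def incidence(log2_values, cutoff=0.3, direction="loss"):
    """Return (hits, total, percent) of cases beyond the cut-off.

    direction="loss" counts values below -cutoff, direction="gain" counts
    values above +cutoff. Missing values (None) are ignored.
    """
    observed = [v for v in log2_values if v is not None]
    if not observed:
        raise ValueError("no observed values for this gene")
    if direction == "loss":
        hits = sum(1 for v in observed if v < -cutoff)
    elif direction == "gain":
        hits = sum(1 for v in observed if v > cutoff)
    else:
        raise ValueError("direction must be 'loss' or 'gain'")
    return hits, len(observed), 100.0 * hits / len(observed)


# Hypothetical log2 ratios of one gene in eight cases of one subtype.
one_gene = [-0.5, -0.45, -0.6, -0.35, -0.8, -0.4, -0.31, 0.1]
hits, total, percent = incidence(one_gene, direction="loss")
```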

C) Comparison of cortical subtypes
Based on the previous results demonstrating that, among cortical subtypes, the pattern of small branches showed more similarity between types B1 and B2 than between types B2 and B3, we divided the cortical subtypes into types B1+B2 and B3 for evaluating genetic alterations. We selected 48 significant genes with an FDR of 11% by SAM (Figure 4-C, Additional file 3). As the FDR value of differential gene selection is relatively high, type B1+B2 and type B3 were not clustered clearly, but they showed a tendency of separation. Twenty-nine of the 48 selected genes were on chromosome 1, and nine of them were ESTs. Among the 29 genes on chromosome 1, 26 (86%, 26/29) were amplified in type B3, while only four (13%, 4/29) were amplified in type B1+2. The selected genes were concentrated in the 1q32 and 1q42-q43 regions.

D) Comparison between type A, type B 1+2, and type B3
SAM was carried out with the three subgroups of types A, B1+B2, and B3, and 33 significant genes were selected at 2.6% FDR, most of which are located on chromosomes 1 and 6 (Table 2). When a clustering analysis was carried out with these 33 genes (Figure 4-D), type A was clearly separated from types B1+2 and B3.

E) Prediction analysis of thymoma subtypes
Based on the previous results, thymoma could be divided into three genetically distinct subgroups (types A, B1+B2, and B3) and the genetically heterogeneous type AB. Prediction Analysis of Microarray (PAM) [24], as described in the Methods, was carried out to predict the subgroup to which each type AB case belongs. To balance the groups, the training set included six cases each from types A, B1+2, and B3. Eleven cases of type AB were used as a test set in the prediction analysis. Cross-validation of the selected 44 subgroup classifier genes showed 100% accuracy for types A and B1+B2 and 50% accuracy for type B3 in the training set (Figure 5-A). The prediction analysis demonstrated that among the 11 cases of type AB, three cases were classified into type A, and the remaining eight cases into type B (1, 2) (Figure 5-B), which is consistent with the dendrogram in Figure 3.
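The train-then-predict workflow above can be illustrated with a plain nearest-centroid classifier. This is a deliberately simplified stand-in: PAM additionally shrinks the class centroids toward the overall centroid, and that step is omitted here. Leave-one-out cross-validation is also an assumption, since the text does not state which cross-validation scheme was used. All names and the toy profiles are hypothetical.

```python
from statistics import mean


def centroids(samples, labels):
    """Per-class mean profile over the classifier genes."""
    by_class = {}
    for profile, label in zip(samples, labels):
        by_class.setdefault(label, []).append(profile)
    return {label: [mean(column) for column in zip(*profiles)]
            for label, profiles in by_class.items()}


def predict(profile, class_centroids):
    """Assign the class whose centroid is nearest (squared Euclidean distance)."""
    def distance(label):
        return sum((x - c) ** 2 for x, c in zip(profile, class_centroids[label]))
    return min(sorted(class_centroids), key=distance)


def loocv_accuracy(samples, labels):
    """Leave-one-out cross-validation accuracy, reported per class."""
    correct, total = {}, {}
    for i, (profile, label) in enumerate(zip(samples, labels)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        guess = predict(profile, centroids(train_x, train_y))
        total[label] = total.get(label, 0) + 1
        correct[label] = correct.get(label, 0) + (guess == label)
    return {label: correct[label] / total[label] for label in total}


# Hypothetical training set: log2 ratios of two classifier genes per case.
train_x = [[0.5, 0.4], [0.6, 0.5], [-0.5, -0.4], [-0.6, -0.5]]
train_y = ["A", "A", "B", "B"]
model = centroids(train_x, train_y)
test_case = [0.4, 0.3]  # stands in for one type AB case
assigned = predict(test_case, model)
```

In the study the training set held six cases per subgroup and the 11 type AB cases formed the test set; the sketch only mirrors that structure on toy numbers.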

Discussion
A number of systematic genetic analyses of thymoma have been reported in the past few years, following a few case reports on the chromosomal abnormalities of thymoma [19][20][21][22], [25][26][27]. Zettl et al. demonstrated for the first time recurrent chromosomal imbalances in type B3 thymoma with CGH and FISH methods [19]. They identified 16 cases of type B3 that showed chromosomal gains (1q, Xq, and 8p12) and losses (6 and 13q), while 12 cases of type A did not reveal any chromosomal abnormalities, with the exception of one case with a partial loss of 6p. Subsequently, the same group, using CGH and microsatellite analyses, inferred two pathways of thymoma tumorigenesis by demonstrating that type A (3/8 cases) presented with consistent LOH in the region 6p23.3-25.5 only, while type B3 revealed various aberrations at loci such as APC on chromosome 5q21 (3/14 cases), RB on 13q14 (5/14 cases), and the p53 gene on 17p13.1 (4/14 cases), as well as LOH in the region 6p23.3-25.5 (5/14 cases) [20]. They expanded their samples to include types AB and B2 in addition to types A and B3 by using laser-assisted microdissection or short-term thymic epithelial tumor cell cultures and found that 1) the various WHO-defined subtypes of thymoma exhibited different profiles of genetic alterations, 2) only type A was genetically homogeneous with abnormalities mainly involving chromosome 6 while other subtypes showed genetic heterogeneity, and 3) some cases of type B2 were genetically closely related to type B3, and might arise from this type by gain of genetic aberrations [21]. In addition, Penzel et al. [22] reported a main cluster characterized by a gain of 1q and losses of 6q and 16q occurring only in type B3 (3/4 cases). Three of eight cases of type A thymoma demonstrated various types of chromosomal imbalances, in contrast to the study results reported by Zettl et al. [19].
Namely, the results of the CGH studies on thymoma by two groups demonstrated some discrepancies [19][20][21][22]. We used cDNA microarray based-CGH to investigate the genome-wide genetic aberrations of thymoma. In contrast to CGH and LOH analyses, cDNA microarray based-CGH provides a high-throughput evaluation of the whole genome together with significant genetic information at the single gene level. Other advantages of microarray-CGH are that DNA from paraffin embedded tissue can be used, that only a small amount of genomic DNA is required, and that direct comparison with RNA expression is possible. With this technique there was no need to amplify the DNA, a step that may sometimes blur the results, in order to obtain genetic information on more than 10,000 genes. Compared to RNA expression analysis, which requires fresh, well-stored tissue samples (the major limitation of tissue procurement), the use of paraffin embedded tissues allowed us to perform a retrospective study on a large number of samples.
The main purpose of our study was to identify differential genetic patterns that explain the distinct morphologic findings and different biologic behaviors of the thymoma subtypes. For this, we included all WHO-defined subtypes of thymoma for genetic analysis, as in the previous reports [25][26][27][28][29]. The previous studies using CGH and microsatellite analyses excluded subtypes with large numbers of lymphocytes (type AB, B1, and B2) from the samples because CGH analyses generally require a tumor cell content of more than 50% [19,20]. We consider that the clinicopathological behavior of thymoma arises from the complex biology of thymic spindle epithelial cells and the surrounding microenvironment, including lymphocytes. Hence, to understand the phenotypic biology of thymoma, whole tissues were used for genetic evaluation in this study. The results from the current data might support an in-depth understanding of tumor biology with more clinically relevant information, resulting in potential clinically useful biomarker candidates for subtype classification. However, careful interpretation is needed when inferring the pathogenesis of thymoma from these data, given the influence of the lymphocytes.
The CAMVS, which was developed in our institute for the effective analysis and visualization of microarray-CGH results, was used to identify the significant genetic aberrations of thymoma. Deletions in chromosomes 1, 2, 3, 4, 5, 6, 8, 12, 13 and 18 were observed by comparing the mean log2 ratios after microarray-CGH of a total of 39 cases, suggesting the characteristic genetic aberrations of thymoma. Chromosome 6, which is known to possess many tumor suppressor genes, contains well-established deletion sites in thymoma such as 6q21, 6q23, and 6q25-27, and the genes selected in our study were also frequently located on 6q, confirming the previous data [29].
Subtype specific analyses demonstrated that losses of chromosomes 2, 4, 6, and 13 were identified in all subtypes of thymoma. Type A thymoma had the least number of chromosomal abnormalities while type B had many more chromosomal abnormalities, in accordance with the previously reported data [19][20][21][22]. Thymoma type B revealed increased genetic aberrations suggesting a rupture in chromosomal stability, and these findings correlate well with the indolent biological behavior of type A. We could confirm the type B3 specific 1q gain and losses of 6, 13, and 16 in all three cortical subtypes, as in previous reports [19][20][21][22]. However, losses of chromosomes 2q, 4, 5, 8, 13 and 18 were also identified in all three cortical subtypes. So our results also confirm the presence of various other overlapping chromosomal abnormalities in the three cortical subtypes in addition to the well established B3 specific 1q gain and chromosome 6, 13, and 16 losses. Furthermore, we identified chromosomal abnormalities of type B1 for the first time. Type B1 shared a similar pattern of chromosome losses with B2 but showed a 9q gain, which was identified only in type B1. These diverse genetic variations in our data support the underlying genetic influence on the biology of thymoma subtypes, resulting in various clinical behaviors.
Table footnote: The ID indicates the GenBank ID; the symbol and the cytoband information are from SOURCE (http://source.stanford.edu). The incidence is the number of cases having a log2 ratio meeting our ± 0.3 criterion. ESTs are expressed sequence tags, clones of unknown function. Genes are listed in order of FDR value.
Figure caption: Prediction analysis of thymoma subgroups
After a systematic overview of the genetic aberrations of thymoma was obtained, the types were dissected into genetically related subgroups according to their molecular characteristics for further detailed analyses. First, we identified that type AB is genetically heterogeneous, so we excluded type AB from further analyses. Then, we identified that type A was distinct from type B, based on the molecular characteristics.
The cortical subtypes did not demonstrate clear separation by hierarchical clustering analysis. Among cortical subtypes, type B1 morphologically maintains distinct corticomedullary differentiation with indistinct tumor epithelial cells and usually shows indolent biologic behavior, as compared with types B2 and B3. Type B2, which definitely shows more aggressive biologic behavior than type B1, has more prominent tumor epithelial cells than type B1, and gray zones are sometimes present between types B2 and B3 histologically. Furthermore, tumors having both type B2 and B3 areas are commonly present. So we expected genetic similarities between type B2 and type B3. However, the pattern of small branches showed more similarities between types B1 and B2 than between types B2 and B3.
Based on the hierarchical clustering analysis, we could assume that thymoma could be divided into four genetically different subgroups of types A, AB, B1+2, and B3. We tried to assign the 11 genetically heterogeneous cases of type AB into types A and B using the prediction analysis. The prediction analysis results were consistent with the clustering analysis results, showing that the three cases clustered in type A were predicted to be type A, and the eight cases clustered in type B were designated as type B. Type AB was reported to be genetically more heterogeneous than type A, and some chromosomal aberrations characteristic of type B were reported to be present in type AB [21,22]. Type AB is defined as an organotypic thymoma, showing features of both medullary and cortical thymoma. In fact, there is a wide morphologic spectrum in type AB. Some show a distinct nesting of spindled epithelium as in the Regaud type of nasopharyngeal carcinoma, while in others, epithelial cells are sprinkled individually as in the Schmincke type of nasopharyngeal carcinoma.
The WHO classification of thymoma is reported to be associated with the invasiveness and recurrence of thymoma. Among the various gene sets, the 70 genes distinguishing types A and B3 might be related to the malignancy of thymoma, because B3 is the typical malignant tumor with a metastatic property among the five subtypes. Among the 70 genes, a significant number were related to cell structure and adhesion. NEDD9 (T61428, neural precursor cell expressed developmentally down regulating 9) and CTNNB (AA442092, cadherin-associated protein beta 1), which are known to be adhesion-related genes, were deleted in type B3, suggesting increased cell motility for metastasis. Consistent with the report by Penzel et al. of genetic amplification in chromosome 1 in malignant types B2 and B3, frequently amplified genes on chromosome 1 were observed in B3 [22].
There have been no standard methods for detecting significant genes that differ among distinct groups in microarray-CGH. Several reports have described a correlation between CGH and expression profiling, suggesting that a similar approach could be used for both in selecting significant genes [33,34]. Among the analytic methods for gene expression profiling, the SAM method, which is based on the t-test, has been used as one of the standard methods. Therefore, the SAM method was applied in this investigation.
Recently, a few reports have suggested new methods to detect significant genes between distinct groups in microarray-CGH [35,36]. As there is no validated method for analyzing differentially gained or lost genes in microarray-CGH, more appropriate methods with biological validation should be evaluated.
One concern of this study is the limited number of samples in each type. Especially with microarray-CGH, which can evaluate thousands of genes simultaneously, sample size is a significant matter. One way to overcome the over-fitting problem is to divide the samples into a training set and an independent test set. However, the sample size was not large enough to select genes with this approach, requiring us to apply cross-validation within the same training set. As all these efforts to define the subgroups at the clinicopathologic or molecular level result from the lack of good prognostic markers and treatment strategies for thymoma, the predictive capacity of the currently proposed gene sets needs to be validated prospectively in more samples in accordance with the clinical parameters.

Conclusion
In this study, we evaluated the genetic characteristics of the five WHO-defined subtypes of thymoma using microarray-CGH. We observed that thymoma could be divided into four genetically distinct groups of A, AB, B1+2, and B3. Type AB was found to be heterogeneous genetically as well as morphologically. We identified sets of genes that characterize the molecular subgroups, as a basis for understanding thymoma biology and as candidate biomarkers of each group. In conclusion, this study provides significant information on the genetic background of thymoma for classification purposes.

Patients
Thirty-nine cases of thymoma tissue samples (A : 6, AB : 11, B1 : 7, B2 : 7, B3 : 8) were obtained as paraffin embedded tissue blocks from the files of the Department of Pathology at the Severance Hospital, Yonsei University College of Medicine. The median age of patients at diagnosis was 50 years (range 26-71 years), and the male versus female ratio was 23:16.

DNA preparation
Ten serial 10 µm thick tissue sections were cut from representative paraffin blocks for genomic DNA extraction. The pathologist (Professor Yang WI) confirmed the diagnoses and subtypes according to the WHO classification [9], and localized tumor areas in the corresponding hematoxylin-eosin (H&E) stained tissue section. Fresh normal placental tissues from healthy newborns were snap-frozen for reference samples. Genomic DNA extraction from the slide was performed according to the conventional protocol [30]. Briefly, scraped tissue from the slides was deparaffinized by washing it twice with 1 µl of xylene at 55°C for five minutes. After two washes with 100% ethanol, the samples were dried for two hours at 50°C. The tissues for reference samples were incubated with 400 µl of DNA lysis buffer [10 mM Tris pH 7.6, 10 mM EDTA, 50 mM NaCl, 0.2% SDS, 200 µg/ml Proteinase K] at 42°C for 12 to 24 hours. The incubated products were treated with the same amount of phenol/chloroform/isoamylalcohol (Gibco-BRL, Gaithersburg, MD, USA) to isolate the nucleic acid from the proteins. The DNA was precipitated with 100% ethyl alcohol containing a 1/3 volume of 10 M ammonium acetate and 2 µl of glycogen. After being rinsed with 70% ethyl alcohol, the DNA was dried at room temperature and then dissolved in ultra-pure water.
The quantity and quality of the DNA were evaluated using the Gene Spec III (Hitachi, Japan) and the Gel Documentation-Photo System (Vilber Lourmat, France).

cDNA microarray based -CGH (Microarray-CGH)
In this study, we used 17K cDNA microarrays (CMRC-GenomicTree, Daejeon, Korea) that included 15,720 unique genes. Of these genes, 11,552 were mapped by SOURCE, a web based database provided by the Genetics Department of Stanford University [37]. Microarray-CGH experiments were performed with the indirect design to determine the genome-wide genetic aberrations in thymoma subtypes using sex-matched placental tissue as a reference. Labeling of DNA was performed following the institutional protocol as described previously [30,31]. Briefly, four µg of placenta and thymoma tissue DNA were fluorescently labeled with Cy3 or Cy5-dUTP (Dupont NEN Life Sciences, Boston, MA, USA), respectively, using the BioPrime DNA Labeling System (Invitrogen, Carlsbad, CA, USA). Labeled products were purified with a PCR Purification Kit (Qiagen, Dusseldorf, Germany) and combined with 30 µg of human Cot-1 DNA (Gibco BRL, Gaithersburg, MD, USA), 100 µg of yeast tRNA (Gibco BRL, Gaithersburg, MD, USA) and 20 µg of poly(dA-dT) (Sigma, Saint Louis, MO, USA). Then, the hybridization mixture was concentrated using a Micro-con 30 (Millipore, Bedford, MA, USA) and hybridized to the 17K cDNA microarray at 65°C for 16 to 18 hours. After washing, the microarray was scanned using GenePix 4000B (Axon Inc, Foster, CA, USA). The microarray data were obtained with GenePix Pro 4.1 software (Axon Inc, Foster, CA, USA).

Data Analysis
Raw data preprocessing and normalization
Fluorescent spot signals were obtained by subtracting background intensity from the total spot intensity. Genes with missing values in more than one of the experiments were removed from further analysis. The variation from the different labeling efficiencies was corrected using within-slide global normalization, which subtracted the median of the log2(R/G) intensity ratios from the log2-transformed data.
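The preprocessing steps above (background subtraction, log2 transformation, within-slide median centering and the missing-value filter) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, non-positive background-corrected signals are treated as missing (`None`), and the intensities are invented.

```python
import math
from statistics import median


def log2_ratios(red, green, red_bg, green_bg):
    """Background-subtract each channel and return log2(R/G) per spot.

    Spots whose corrected signal is not positive in either channel are
    returned as None (an assumption of this sketch).
    """
    ratios = []
    for r, g, rb, gb in zip(red, green, red_bg, green_bg):
        r_net, g_net = r - rb, g - gb
        if r_net > 0 and g_net > 0:
            ratios.append(math.log2(r_net / g_net))
        else:
            ratios.append(None)
    return ratios


def global_median_normalize(ratios):
    """Subtract the slide-wide median log2 ratio from every observed spot."""
    center = median(x for x in ratios if x is not None)
    return [None if x is None else x - center for x in ratios]


def keep_gene(values_across_arrays, max_missing=1):
    """Keep a gene unless it is missing in more than max_missing experiments."""
    return sum(v is None for v in values_across_arrays) <= max_missing


# Hypothetical raw intensities for four spots on one slide.
raw = log2_ratios(red=[1100, 2100, 4100, 90], green=[1100, 1100, 1100, 1100],
                  red_bg=[100, 100, 100, 100], green_bg=[100, 100, 100, 100])
normalized = global_median_normalize(raw)
```

Median centering assumes that most spots on a slide are unchanged, so the slide-wide median estimates the labeling bias between the two dyes.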

Analysis of Microarray-CGH
The obtained data were analyzed using the Chromosome Analyzer and Map Viewer using S-plus (CAMVS) developed by the Cancer Metastasis Research Center (CMRC), Yonsei University College of Medicine, Seoul, Korea. To evaluate the general genetic pattern of whole chromosomes, a span of 0.025 was used to compute a weighted mean over the neighboring 250 probes, so that the ratios of 250 probes were evaluated simultaneously. The smoothing line gives more weight to nearer probes. In this study, we used cut-off values for significant copy number changes that had been predetermined in a previous study by comparing the level of genetic changes among normal tissue DNAs, including placenta, lymphocyte and gastric tissues [32]. Based on the assumption that the single-copy difference in the sex chromosomes is the only change between XX and XY normal gastric tissue DNA, the changes in the autosomes were minimal, within the range -0.3 < log2(R/G) < 0.3 (mean ± 1SD), while the changes in the sex chromosomes were within -0.64 < log2(R/G) < 0.64. Hence, cut-off values of |log2(R/G)| > 0.64 (mean ± 1SD) for amplification or deletion and |log2(R/G)| > 0.3 (mean ± 1SD) for gain or loss were determined.
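The two ideas in this paragraph, a distance-weighted running mean along the chromosome and a two-level cut-off, can be sketched as below. This is not the CAMVS code: the tricube weighting, the window of span × number of probes, and the function names are assumptions chosen for illustration. Only the cut-off values 0.3 and 0.64 come from the text.

```python
def smooth(values, span=0.025):
    """Weighted running mean over probes ordered by chromosomal position.

    The window covers about span * len(values) probes centred on each probe;
    tricube weights give nearer probes more influence (an assumed kernel).
    """
    n = len(values)
    half = max(1, int(span * n) // 2)
    smoothed = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        weights = [(1 - (abs(j - i) / (half + 1)) ** 3) ** 3 for j in range(lo, hi)]
        total = sum(w * values[j] for w, j in zip(weights, range(lo, hi)))
        smoothed.append(total / sum(weights))
    return smoothed


def classify(log2_ratio, gain_cut=0.3, amp_cut=0.64):
    """Map a log2(R/G) value to a copy number call using the two cut-offs."""
    magnitude = abs(log2_ratio)
    if magnitude <= gain_cut:
        return "normal"
    if log2_ratio > 0:
        return "amplification" if magnitude > amp_cut else "gain"
    return "deletion" if magnitude > amp_cut else "loss"


# Hypothetical profile: 400 probes with a lost segment in the middle.
profile = [0.0] * 150 + [-0.5] * 100 + [0.0] * 150
calls = [classify(v) for v in smooth(profile)]
```

With 10,000 probes a span of 0.025 corresponds to a window of 250 probes, which matches the figure given in the text; how CAMVS treats window edges and chromosome boundaries is not described and is not modelled here.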
The Significance Analysis of Microarray (SAM) was used to identify the specific genes that showed differences between the groups [23] and that were altered in more than 15% of the cases in the group. To evaluate how accurately the selected genes represent the thymoma subtypes, we performed a prediction analysis using the Prediction Analysis of Microarray (PAM) software [24]. The obtained data were clustered using the software program Cluster version 2.10 and visualized with Treeview version 1.47 [38].
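For orientation, the two-class SAM score is a moderated t-statistic: the difference in group means divided by the pooled standard error plus a small constant s0. The sketch below computes that score and a frequency filter. It is a simplified illustration, not the SAM software: the permutation-based FDR estimate is omitted, s0 is supplied by the caller rather than estimated from the data, and reading the 15% rule as "fraction of cases beyond ± 0.3" is an assumption. All names and values are hypothetical.

```python
import math
from statistics import mean


def sam_d(group1, group2, s0=0.0):
    """Two-class SAM-style relative difference for one gene.

    d = (mean1 - mean2) / (s + s0), where s is the pooled standard error.
    With s0 = 0 this is the ordinary two-sample t-statistic; it is undefined
    if both groups are constant and s0 is 0.
    """
    n1, n2 = len(group1), len(group2)
    m1, m2 = mean(group1), mean(group2)
    squares = (sum((x - m1) ** 2 for x in group1)
               + sum((x - m2) ** 2 for x in group2))
    s = math.sqrt((1 / n1 + 1 / n2) * squares / (n1 + n2 - 2))
    return (m1 - m2) / (s + s0)


def passes_frequency(values, cutoff=0.3, min_fraction=0.15):
    """True if more than min_fraction of cases exceed the cut-off in magnitude."""
    altered = sum(1 for v in values if abs(v) > cutoff)
    return altered / len(values) > min_fraction


# Hypothetical log2 ratios of one gene in two subtypes.
type_a = [1.0, 2.0, 3.0]
type_b3 = [-1.0, 0.0, 1.0]
score = sam_d(type_a, type_b3)
moderated = sam_d(type_a, type_b3, s0=0.5)
```

A positive s0 pulls the score of low-variance genes toward zero, which is why the moderated value is smaller in magnitude than the plain t-statistic.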