Genome wide analysis of DNA copy number neutral loss of heterozygosity (CNNLOH) and its relation to gene expression in esophageal squamous cell carcinoma

Background
Genomic instability plays an important role in human cancers. We previously characterized genomic instability in esophageal squamous cell carcinomas (ESCC) in terms of loss of heterozygosity (LOH) and copy number (CN) changes in tumors using the Affymetrix GeneChip Human Mapping 500K array in 30 cases from a high-risk region of China. In the current study we focused on copy number neutral (CN = 2) LOH (CNNLOH) and its relation to gene expression in ESCC.

Results
Overall we found that 70% of all LOH observed was CNNLOH. Ninety percent of ESCCs showed CNNLOH (median frequency in cases = 60%) and this was the most common type of LOH in two-thirds of cases. CNNLOH occurred on all 39 autosomal chromosome arms, with highest frequencies on 19p (100%), 5p (96%), 2p (95%), and 20q (95%). In contrast, LOH with CN loss represented 19% of all LOH, occurred in just half of ESCCs (median frequency in cases = 0%), and was most frequent on 3p (56%), 5q (47%), and 21q (41%). LOH with CN gain was 11% of all LOH, occurred in 93% of ESCCs (median frequency in cases = 13%), and was most common on 20p (82%), 8q (74%), and 3q (42%). To examine the effect of genomic instability on gene expression, we evaluated RNA profiles from 17 pairs of matched normal and tumor samples (a subset of the 30 ESCCs) using Affymetrix U133A 2.0 arrays. In CN neutral regions, expression of 168 genes (containing 1976 SNPs) differed significantly in tumors with LOH versus tumors without LOH, including 101 genes that were up-regulated and 67 that were down-regulated.

Conclusion
Our results indicate that CNNLOH has a profound impact on gene expression in ESCC, which in turn may affect tumor development.

ESCC is a common malignancy worldwide and one of the most common cancers in the Chinese population; Shanxi Province in north central China has some of the highest esophageal cancer rates in the world [20,21]. Previously, we identified several regions of LOH and CN alteration in ESCC using microsatellite markers and low- and high-density SNP arrays [22][23][24][25][26][27], where the majority of ESCC patients from this high-risk population were found to have high genomic instability and high frequency of LOH on several chromosome arms. However, we have not found causal mutations in candidate genes within the LOH regions identified. For example, 82% of 56 ESCCs showed LOH when tested with four microsatellite markers flanking ANXA1 (9q11-q21), but no somatic mutations were detected in these patients [28]. Another example is BRCA2, which also showed frequent LOH in ESCC (57% for D13S260, 83% for D13S767), but only infrequent somatic mutations in these cancer patients (2/56, 3.5%) [29,30]. Contrary to expectation, expression of BRCA2 was often increased (unpublished data).
In the present study, we analyzed DNA from 30 microdissected ESCC tumors, adjacent normal tissue, and blood DNA from the same patient using the Affymetrix 500K SNP array to identify the distribution of complex DNA alterations, including CNNLOH, and we related CNNLOH to expression of the genes affected as assessed with the Affymetrix U133A 2.0 array in these patients.

Case selection
This study was approved by the Institutional Review Boards of the Shanxi Cancer Hospital and the US National Cancer Institute (NCI). Cases diagnosed with ESCC between 1998 and 2001 in the Shanxi Cancer Hospital in Taiyuan, Shanxi Province, PR China, and considered candidates for curative surgical resection were identified and recruited to participate in this study. None of the cases had prior therapy and Shanxi was the ancestral home for all. After obtaining informed consent, cases were interviewed to obtain information on demographics, cancer risk factors (eg, smoking, alcohol drinking, and detailed family history of cancer), and clinical information. The cases evaluated here were part of a larger case-control study of upper gastrointestinal cancers conducted in Shanxi Province [31][32][33].

Biological specimen collection and processing
Venous blood (10 ml) was taken from each case prior to surgery and germ-line DNA from whole blood was extracted and purified using the standard phenol/chloroform method.
Tumor and adjacent normal tissues were dissected at the time of surgery and stored in liquid nitrogen until used. One 5-micron section was H&E stained and reviewed by a pathologist from the NCI to guide the micro-dissection. Five to ten consecutive 8-micron sections were cut from fresh frozen tumor and adjacent normal tissues. Tumor and normal cells were manually micro-dissected under light microscopy. DNA was extracted from micro-dissected tumor as previously described [34] using the protocol from the Puregene DNA Purification Tissue Kit (Gentra Systems, Inc., Minneapolis, MN). RNA was extracted from 17 of these micro-dissected tumor and matched normal tissue pairs using the protocol from the PureLink Micro-to-Midi Total RNA Purification System (Catalog number 12183-018, Invitrogen, Carlsbad, CA). RNA quality and quantity were determined using the RNA 6000 Labchip/Agilent 2100 Bioanalyzer (Agilent Technologies, Germantown, MD). The same tissue blocks were used for extraction of both DNA and RNA for each case studied.

Target preparation for GeneChip Human Mapping 500 K array set
The Affymetrix GeneChip Human Mapping 500 K array set contains ~262,000 (Nsp I array) and ~238,000 (Sty I array) SNPs (mean probe spacing = 5.8 Kb, mean heterozygosity = 27%). A detailed gene chip protocol can be found at http://www.affymetrix.com/support/downloads/manuals/500k_assay_manual.pdf.
Experiments were conducted according to the protocol (GeneChip Mapping Assay manual) supplied by Affymetrix, Inc. (Santa Clara, CA). Genotype calls were generated by GTYPE v 4.0 software (Affymetrix). Germline, tumor and adjacent normal DNA from each case were run together in parallel in the same experiment (ie, same batch, same day). The GEO accession numbers for these array data are GSE15526 and GSE20347.

Probe preparation and hybridization for Human Genome U133A 2.0 array
The Affymetrix Human Genome U133A 2.0 array is a single array used to interrogate expression of 14,500 well-characterized human genes. Array experiments were performed using 1-5 μg total RNA each. We followed the protocol provided by the manufacturer to carry out reverse transcription, labeling, and hybridization.

GeneChip 500 K array data analysis
Probe intensity data from Affymetrix 500 K SNP arrays were used to identify DNA alterations in the present study. To avoid gender-related issues, SNPs mapped to either the X or Y chromosome were excluded.
Copy number (CN) loss or gain was based on comparisons of either adjacent normal to germ-line DNA or tumor to germ-line DNA. Microarray data were first normalized using the gtype-probeset-genotype package included in Affymetrix Power Tools version 1.85. Each tumor sample was individually normalized via the BRLMM algorithm along with 99 blood samples. These blood samples were obtained from the 30 ESCC cases evaluated in the present study plus 69 healthy controls (age-, sex-, and region-matched to cases) who were all part of a larger case-control study of upper gastrointestinal cancers conducted in Shanxi Province (as noted above). Paired CN analysis was then performed on each sample using the Affymetrix Power Tools paired-copynumber workflow, which implements the Affymetrix Copy Number Analysis Tool (CNAT) algorithm. DNA obtained from the blood of each case served as the normal control; a sliding window of 100 kb was chosen to optimize the identification of extended regions of CN alteration (see http://www.affymetrix.com/support/technical/whitepapers/cnat_4_algorithm_whitepaper.pdf). The output of the CNAT program is a CN state rather than an absolute CN prediction: a state of 2 corresponds to normal CN; states 0 and 1 correspond to CN loss; and states 3 and 4 correspond to CN gain.
In the present study, we modified the method for identifying LOH used in our previous studies [26,27]. Here, LOH was determined using the Affymetrix Power Tools copynumber-pipeline program paired-LOH workflow. Input was *.CHP files generated with the gtype-probeset-genotype package as described above. Matched blood DNA served as the reference for LOH analysis for each tumor and normal adjacent sample.

Combination of LOH and CN alterations
We defined six combinations of copy number state and LOH status. LOH positive loci may show CN loss (CN ≤ 1), be CN neutral (CNNLOH, CN = 2), or show CN gain (CN ≥ 3); likewise, LOH negative loci may show CN loss, neutrality, or gain. LOH and CN segments were defined independently for each sample as contiguous blocks of informative SNPs that possessed the same LOH and CN state. Endpoints of LOH/CN segments were defined by informative SNPs. Some uninformative SNPs were located between these LOH/CN segments; we considered these SNPs to have an undefined LOH/CN state (see Additional file 1/Figure S1). Segment sizes were empirically observed from the data.
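The six LOH/CN combinations and the segment definition above can be illustrated with a minimal Python sketch. This is not the pipeline used in the study: the function names (`cn_class`, `combine`, `segments`), the tuple layout, and the SNP positions and calls are all hypothetical, and the input is assumed to contain only informative SNPs from one chromosome of one sample, sorted by position.

```python
from itertools import groupby


def cn_class(cn_state):
    """Map a CN state (0-4) to 'loss' (0 or 1), 'neutral' (2), or 'gain' (3 or 4)."""
    if cn_state <= 1:
        return "loss"
    if cn_state == 2:
        return "neutral"
    return "gain"


def combine(loh, cn_state):
    """Return one of the six LOH/CN combinations; ('LOH', 'neutral') is CNNLOH."""
    return ("LOH" if loh else "no LOH", cn_class(cn_state))


def segments(snps):
    """Group informative SNPs into contiguous blocks with the same LOH/CN state.

    snps: list of (position, loh, cn_state) tuples, sorted by position.
    Returns a list of (start, end, combination, n_snps); segment endpoints
    are the first and last informative SNP of each block, so any stretch
    between two segments is left undefined.
    """
    out = []
    for state, block in groupby(snps, key=lambda s: combine(s[1], s[2])):
        block = list(block)
        out.append((block[0][0], block[-1][0], state, len(block)))
    return out


# Hypothetical informative SNPs (positions and calls are made up).
example = [
    (1000, True, 2), (5200, True, 2), (9100, True, 2),   # CNNLOH block
    (15000, True, 1), (21000, True, 1),                  # LOH with CN loss
    (30000, False, 2),                                   # no LOH, CN neutral
]
for seg in segments(example):
    print(seg)
```

Because only informative SNPs enter the grouping, uninformative SNPs never extend or break a segment, which mirrors the treatment of undefined regions described in the text.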

Comparison of CN status in DNA from blood versus micro-dissected adjacent normal tissue
DNA isolated from normal adjacent tissue is frequently used as a control in microarray experiments. In the present study we used DNA isolated from peripheral blood. We expected peripheral blood DNA to be a superior control for two reasons: first, unlike adjacent normal tissue, it does not run the risk of being contaminated with tumor cells; second, adjacent normal tissue may actually be precancerous and contain genetic lesions. To examine whether blood DNA and adjacent normal esophageal DNA were equivalent controls, we compared copy number state calls for blood and normal adjacent tissue from each of the 30 ESCC patients. We found that the two controls were equivalent: across patients, 99.29% to 99.99% of all copy number calls were identical. Overall, 99.96% of SNPs in blood and 99.93% of SNPs in normal adjacent tissue had a CN state of 2.
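As an illustration of the comparison above, per-patient concordance between two sets of CN state calls can be computed as in the following sketch. The function names and the ten example calls are hypothetical; the study's actual comparison was run on the full SNP set for each of the 30 patients.

```python
def cn_concordance(blood_states, normal_states):
    """Percentage of SNPs with identical CN state calls in the two controls.

    Both arguments are equal-length sequences of CN states (0-4) for the
    same SNPs, in the same order, from one patient.
    """
    if len(blood_states) != len(normal_states):
        raise ValueError("call vectors must cover the same SNPs")
    if not blood_states:
        raise ValueError("no SNPs to compare")
    same = sum(b == n for b, n in zip(blood_states, normal_states))
    return 100.0 * same / len(blood_states)


def pct_neutral(states):
    """Percentage of SNPs called CN = 2."""
    return 100.0 * sum(s == 2 for s in states) / len(states)


# Hypothetical calls for ten SNPs in one patient.
blood = [2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
normal = [2, 2, 2, 3, 2, 2, 2, 2, 2, 2]
print(cn_concordance(blood, normal), pct_neutral(blood), pct_neutral(normal))
```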

Human Genome U133A 2.0 array data analysis and relation between CNNLOH and mRNA expression
The Robust Multiarray Average (RMA) algorithm [35,36] implemented in Bioconductor in R (http://www.bioconductor.org) was used for background correction and normalization across all samples. For each sample, log2 fold changes in gene expression were calculated by subtracting the adjacent normal RMA value from the corresponding tumor RMA value.
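Because RMA expression values are on the log2 scale, the subtraction described above yields a log2 fold change directly. A minimal sketch follows; the function name, probe set identifiers, and values are hypothetical.

```python
def log2_fold_changes(tumor_rma, normal_rma):
    """Tumor minus matched normal RMA value for each shared probe set.

    Both arguments map probe set ID -> RMA value (log2 scale) for one
    patient. A result of 1.0 means two-fold higher expression in tumor;
    -1.0 means two-fold lower.
    """
    return {
        probe: tumor_rma[probe] - normal_rma[probe]
        for probe in tumor_rma
        if probe in normal_rma
    }


# Hypothetical RMA values for two probe sets in one tumor/normal pair.
tumor = {"probe_A": 9.5, "probe_B": 6.0}
normal = {"probe_A": 8.5, "probe_B": 7.0}
print(log2_fold_changes(tumor, normal))
```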
To determine whether any gene showed a difference in the tumor versus normal gene expression fold change that was dependent on LOH state, we performed the following steps: (i) First, genes assayed by the U133A microarray were mapped onto each LOH/CN segment of each sample. Map locations of genes were taken from the Affymetrix version na29 microarray annotation file. Note that probe sets from the same gene may have different reference sequences which differ in their chromosomal locations. Also, not every gene will map to every sample: in a particular sample, a gene may map to a gap between LOH/CN regions. (ii) Next, we identified genes for which at least two of the 17 ESCC samples with expression data were LOH negative and at least two samples were LOH positive. (iii) We then performed two-sided unpaired t-tests comparing the log2 fold changes for a probe set in LOH positive and LOH negative samples. A P-value < 0.01 was considered significant. (iv) Finally, SNPs on the 500 K microarray were mapped to the reference sequence for each expression probe set. Since probe sets from the same gene may have different reference sequences, they may differ in the number of SNPs assigned to them (Additional file 2/Figure S2).
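Steps (ii) and (iii) above can be sketched in Python as follows. Several points are assumptions rather than details given in the text: the pooled-variance (Student) form of the unpaired t-test is used, the P-value is obtained by numerically integrating the t density instead of calling a statistics library, and all function names and example values are hypothetical.

```python
import math
from statistics import mean, variance


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2))
    c /= math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, n=2000):
    """Two-sided P-value, using Simpson's rule on [0, |t|] (n must be even)."""
    a = abs(t)
    if a == 0:
        return 1.0
    h = a / n
    s = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1 - 2 * s * h / 3)


def loh_t_test(fold_changes, loh_status, min_per_group=2):
    """Compare log2 fold changes of one probe set between LOH groups.

    Returns (t, p), or None when either group has fewer than
    min_per_group samples (step ii) or the pooled variance is zero.
    """
    pos = [fc for fc, loh in zip(fold_changes, loh_status) if loh]
    neg = [fc for fc, loh in zip(fold_changes, loh_status) if not loh]
    if len(pos) < min_per_group or len(neg) < min_per_group:
        return None
    df = len(pos) + len(neg) - 2
    pooled = ((len(pos) - 1) * variance(pos) + (len(neg) - 1) * variance(neg)) / df
    se = math.sqrt(pooled * (1 / len(pos) + 1 / len(neg)))
    if se == 0:
        return None
    t = (mean(pos) - mean(neg)) / se
    return t, two_sided_p(t, df)


# Hypothetical log2 fold changes for one probe set in eight samples.
fcs = [1.0, 1.2, 0.8, 1.1, -0.1, 0.1, 0.0, 0.2]
loh = [True, True, True, True, False, False, False, False]
result = loh_t_test(fcs, loh)
print(result, result is not None and result[1] < 0.01)
```

In practice a statistics package would supply the t-test; the hand-rolled integration here only keeps the sketch free of external dependencies.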

Results
In the present study we determined copy number and loss of heterozygosity (LOH) status in DNA isolated from germ-line and micro-dissected tumor and matched adjacent normal samples from 30 ESCC patients using the Affymetrix 500 K SNP array. The average genotype call rate was 96% (89-99%): the 250 K Nsp I array was 96% (90-98%) and 250 K Sty I array was 95% (89-99%). Genotype call rates were similar for all three tissue types examined. We first analyzed whether copy numbers were similar between DNAs from the two normal tissues: germ-line (blood) and micro-dissected adjacent normal samples. Our analysis indicated that DNA CN values were similar between the two normal tissues (Additional file 3 - Table S1), as expected. Our results indicate that germ-line DNA can be used as a normal control in studies of CN alteration; it is more readily available than matched adjacent normal tissue.

Relation between genomic alterations and gene expression
The average present call rate on the Human Genome U133A array was 53% (range 51-61%) for the 34 chips from the 17 sample pairs with sufficient tissue for RNA isolation and testing. To investigate the relation between LOH/CNV and gene expression levels, we intersected genes on the Affymetrix U133A chip with SNPs on the 500 K SNP array. SNPs that mapped within genes are summarized in Additional file 6/Table S4 and include 169,687 SNPs within 12,225 genes. We were interested in identifying differentially expressed genes between LOH and non-LOH groups in genes that were CN neutral. A total of 4,572 genes qualified for this analysis (see Methods). Among these genes, 168 genes showed significant differences in expression between tumors with and without LOH (P < 0.01) (Additional file 7/Table S5). Based on chance alone (at the P < 0.01 level), differences in only 45 genes would be expected; therefore, expression differences were observed in over three times as many genes as expected. One hundred and one (60%) of the 168 genes showed lower expression levels in CNNLOH than in the normal group (ie, CNN, no LOH), whereas 67 genes (40%) showed higher expression levels in CNNLOH (Additional file 7/Table S5). Twenty-eight of the 101 down-regulated genes (32 probes) and 18 of the 67 up-regulated genes (19 probes) showed expression differences ≥ 2-fold (Table 3). These findings suggest that in the CN neutral state, LOH can affect gene expression. We also compared expression between tumors with and without LOH for genes in the CN loss state. We identified six of 600 genes which showed significantly different expression between the LOH groups. All six genes showed increased expression in tumors with LOH (Table 4a).
Finally, we compared gene expression in the CN gain state between tumors with and without LOH. We found that six of 354 genes showed significant differences in expression between the two groups, including two down-regulated and four up-regulated genes (Table 4b).
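The chance expectation quoted for the CN neutral analysis is simply the number of genes tested multiplied by the significance threshold. The short sketch below reproduces that arithmetic from the figures reported above; the function name is hypothetical, and the calculation assumes independent tests with no multiple-testing adjustment.

```python
def expected_by_chance(n_tested, alpha=0.01):
    """Number of tests expected to reach P < alpha if no true differences exist."""
    return n_tested * alpha


n_tested, n_significant = 4572, 168  # CN neutral genes tested and significant
expected = expected_by_chance(n_tested)
print(f"expected {expected:.1f}, observed {n_significant}, "
      f"ratio {n_significant / expected:.1f}")
```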

Discussion
We characterized ESCC tumors for complex DNA alterations (LOH and CNV) and related these genomic alterations to gene expression. To our knowledge, this is the first report to comprehensively address the distribution of complex DNA alterations in ESCC and its relation to gene expression on a genome-wide scale.
Ninety percent of cases showed CNNLOH in their tumors and, over all cases, CNNLOH was found on all 39 autosomal chromosome arms.
The frequency of CNNLOH observed here in ESCC was much less than has been reported in other cancers [3][4][5][6][7][8][9][10][11][12][13][14][15][16][17][18][19]. For example, in colon cancer and basal cell carcinoma nearly all LOH was associated with copy number neutral regions [3,10]. In general, CNNLOH occurs with variable frequency in different genomic regions in tumors of different origin. There are several differences between the study reported here and previous studies which likely influenced the results. First, DNA from micro-dissected tumor and adjacent normal tissue was used in the present study, while either cancer DNA without matched controls or cancer cell lines were used in most other reported studies. Second, we examined LOH and CN alterations using the same SNP array platform, while other studies used SNPs for LOH and CGH arrays for CN analyses. Third, the criteria for identifying LOH differed among the studies reported. Finally, the types of cancers studied previously differ from the present study, which is the first report of CNNLOH in ESCC.
In previous LOH studies, we reported high-frequency LOH on several chromosome arms, including 3p, 4p, 4q, 9p, 9q, 13q, 17p, and 17q [23,26,27]. By integrating LOH and CN alteration data in the present study, we can now say that the LOH on 3p is primarily due to CN loss LOH, while the LOH on the other seven chromosome arms is predominantly due to CNNLOH.
Our results showed that CNNLOH can change expression levels of genes in ESCC, either increasing or decreasing them. We do not know why CNNLOH changes gene expression, but one possibility is that the two alleles may have different gene expression levels. For example, if allele A expression is greater than allele B expression, the expression level for the three genotypes would be ordered as AA > AB > BB. CNNLOH with retention of two B alleles (genotype BB) would then show lower expression than genotype AB. Conversely, CNNLOH with loss of allele B would result in two copies of allele A and a higher level of expression than that of AB cells. Another possibility is that the two alleles have different expression due to different epigenetic states, with LOH resulting in two copies that share the same extreme epigenetic state. A third possibility is that one allele harbors a mutation and subsequent LOH leads to a homozygous mutant. Several studies have shown that CNNLOH regions can harbor mutated genes. For example, JAK2 V617F, FLT3-ITD, AML1/RUNX1, WT1, and NPM1 mutations were all found in CNNLOH regions in AML [15]. These various hypotheses merit testing in the future.
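The first hypothesis above can be made concrete with a toy model. The sketch assumes that each allele copy contributes independently and additively to total expression, which is a simplification introduced here for illustration; the per-allele values are hypothetical.

```python
def genotype_expression(per_allele, genotype):
    """Total expression as the sum of each allele copy's contribution."""
    return sum(per_allele[allele] for allele in genotype)


# Hypothetical per-copy expression: allele A is expressed more than allele B.
per_allele = {"A": 3.0, "B": 1.0}
levels = {g: genotype_expression(per_allele, g) for g in ("AA", "AB", "BB")}
print(levels)

# CNNLOH converts a heterozygous AB cell into AA or BB at unchanged copy
# number, so expression moves up or down depending on which allele is retained.
print(levels["AA"] > levels["AB"] > levels["BB"])
```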
The study design in the present study has several important features: (i) we compared CN status between DNA from germ-line and micro-dissected adjacent normal tissue; (ii) we used micro-dissected DNA from tumor tissue; (iii) we assessed both LOH and CN alterations simultaneously using the same array platform; and (iv) we integrated complex DNA alterations and gene expression data on a genome-wide level using both high density SNP and expression arrays in the same cases. A noteworthy weakness of our study is the relatively small number of cases evaluated (including a particularly small number of cases with both LOH and RNA expression data to evaluate, due in part to the 500K chip mean heterozygosity of 27%), which limited our power to detect significant differences in loci between LOH and non-LOH groups. In addition, findings for ESCC from this high-risk region may not be generalizable to populations elsewhere in the world.
In summary, we investigated the distribution of complex DNA alterations in ESCCs at the genome-wide level and determined that CN neutral is the most common CN state in LOH, and that CNNLOH is a very common phenomenon overall. Importantly, we also showed that CNNLOH could alter the expression level of genes affected in ESCC.