Novel SNP improves differential survivability and mortality in non-small cell lung cancer patients

Background: Non-small cell lung cancer (NSCLC) is a major cause of cancer-related death worldwide owing to poor patient prognosis and clinical outcome. Here, we studied the genetic variations underlying NSCLC pathogenesis based on their association with patient outcome after gemcitabine therapy.
Results: Bioinformatics analysis was used to investigate the possible effects of POLA2 G583R (POLA2+1747 GG/GA, dbSNP ID: rs487989) on protein function. Using biostatistics, POLA2+1747 GG/GA (rs487989, POLA2 G583R) was identified as strongly associated with mortality rate and survival time among NSCLC patients. Green fluorescent protein (GFP) tagging and confocal laser scanning microscopy further showed that POLA2+1747 GG/GA is functionally significant for protein localization: the single nucleotide polymorphism (SNP) causes DNA polymerase alpha subunit B to localize in the cytoplasm instead of the nucleus. This inhibits DNA replication in cancer cells and confers a protective effect in individuals with this SNP.
Conclusions: The results suggest that POLA2+1747 GG/GA may be used as a prognostic biomarker of patient outcome in NSCLC pathogenesis.

Background
Non-small cell lung cancer (NSCLC) is a leading cause of cancer mortality worldwide, with over one million deaths annually [1]. It accounts for 75% of lung cancer cases and consists of three major subtypes: adenocarcinoma, large-cell carcinoma and squamous-cell carcinoma [2]. Despite the recent introduction of targeted therapy and the increasing number of available chemotherapeutic regimens, such as platinums, taxanes and gemcitabine, NSCLC patients are not effectively cured: response to treatment varies and drug toxicity occurs [3,4]. In addition, prognosis remains dismal in NSCLC patients despite careful evaluation of the clinico-pathological factors that determine patient response to therapy, such as tumor, nodes and metastasis (TNM) staging, performance status, gender and weight loss. Long-term survival is low, with only 14% of patients surviving five years after diagnosis [5], and the risk of relapse is high.
Gemcitabine is a third-generation chemotherapeutic agent that has shown activity in NSCLC. Preclinical studies have shown that the compound is a potent radiosensitizer, and responses have been observed in stage III NSCLC [6]. Gemcitabine can be administered as a single agent or in platinum and non-platinum combinations. For adenocarcinoma NSCLC, it can also be combined with the chemotherapy drug pemetrexed, as well as with a vascular endothelial growth factor (VEGF) inhibitor. Owing to its significant benefit and favourable toxicity profile, gemcitabine has become one of the most commonly used agents in lung cancer chemotherapy.
In recent years, much effort has been expended to identify genetic determinants of patient outcome, both to improve clinical treatment decisions and to guide the design of therapeutic agents. Epidermal growth factor receptor (EGFR) mutations, for instance, are common in patients with NSCLC [7] and are known to confer a survival benefit and better clinical outcome when patients are treated with EGFR tyrosine kinase inhibitors (TKIs) [8,9]. To date, however, no genetic variants have been reported that could help determine the dose and clinical outcome in NSCLC patients receiving gemcitabine chemotherapy.
Here, we studied polymorphisms in genes involved in gemcitabine transport, metabolism and activity, based on their association with patient outcome after gemcitabine therapy. We show for the first time that the single nucleotide polymorphism (SNP) POLA2+1747 GG/GA (rs487989) is a key determinant of mortality and survival outcome in gemcitabine-treated NSCLC patients. The POLA2 gene encodes human DNA polymerase alpha subunit B, which is involved in the initiation of chromosomal DNA replication [10-12]. The SNP causes DNA polymerase alpha subunit B to localize in the cytoplasm instead of the nucleus. This inhibits DNA replication in cancer cells and confers a protective effect in individuals with this SNP. The results suggest that POLA2+1747 GG/GA (rs487989) may be used as a prognostic biomarker of patient outcome in NSCLC pathogenesis.

Results and discussion
Association of genotypes with the mortality of NSCLC patients after gemcitabine therapy
Understanding how genetic variations affect the survival outcome of NSCLC patients after gemcitabine therapy is important for improving clinical treatment decisions and for the design of therapeutic agents. Using Fisher's exact probability test and the chi-squared test, we show for the first time that POLA2+1747 GG/GA (rs487989) is the SNP most significantly associated with mortality (Table 1), with a P value of 0.0406. The POLA2 gene corresponds to the p68 subunit of mouse DNA polymerase alpha, which couples the catalytic subunit of polymerase alpha to the primases and translocates the polymerase alpha/primase complex from the cytoplasm to the nucleus for chromosomal DNA replication [13].
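To make the test concrete, the following is a minimal Python sketch of a two-sided Fisher's exact test on a 2x2 genotype-by-outcome table (rows: genotypes such as GG and GA; columns: died and alive). It is not the SPSS implementation used in the study; the function name `fisher_exact_two_sided` is hypothetical, and the two-sided P value is computed with one common convention: the sum of the probabilities of all tables with the same margins that are no more likely than the observed table.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are genotypes (e.g. GG, GA); columns are outcomes (died, alive).
    """
    row1, row2 = a + b, c + d
    col1 = a + c                      # total number of deaths
    denom = comb(row1 + row2, col1)

    def table_prob(x):
        # Hypergeometric probability of x deaths in row 1 with fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = table_prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = [table_prob(x) for x in range(lo, hi + 1)]
    # Sum the tables no more likely than the observed one; the small
    # tolerance keeps floating-point ties from being dropped.
    return min(1.0, sum(p for p in probs if p <= p_observed * (1 + 1e-7)))


# Hypothetical counts for illustration only (not the study data).
print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))  # 0.4857
```

Because every table with the observed margins is enumerated, the test stays exact even for small cohorts such as the 43 patients studied here.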

POLA2+1747 GG/GA imposes differential effects on mortality and survival time of NSCLC patients after gemcitabine therapy
Using conditional probabilities, we studied the effect of this POLA2 variant on the mortality of patients after gemcitabine therapy. The probability of death was significantly higher for patients with the wild-type GG genotype (89.19%) than for those with the GA variant (50%) (P value = 0.0128).
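The conditional probability of death given a genotype is the fraction of patients with that genotype who died, P(death | genotype) = deaths among carriers / number of carriers. A minimal Python sketch follows; the function name is hypothetical and the records are invented solely to show the arithmetic, they are not the study data.

```python
from collections import Counter


def death_probability_by_genotype(records):
    """Estimate P(death | genotype) from (genotype, died) records."""
    patients = Counter(genotype for genotype, _ in records)
    deaths = Counter(genotype for genotype, died in records if died)
    return {g: deaths[g] / patients[g] for g in patients}


# Hypothetical cohort, not the study data: 10 GG patients (8 died), 4 GA (2 died).
records = (
    [("GG", True)] * 8 + [("GG", False)] * 2
    + [("GA", True)] * 2 + [("GA", False)] * 2
)
print(death_probability_by_genotype(records))  # {'GG': 0.8, 'GA': 0.5}
```

The probabilities only describe the difference between genotypes; whether that difference is significant is a separate question answered by a test such as the chi-squared test.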
Next, we studied the interactions among the 21 non-synonymous SNPs and how they affect the overall survival time of NSCLC patients. Table 2 shows the top-ranked SNP-SNP interactions (ranked by P value) associated with the overall survival time of NSCLC patients. Among them, statistically significant differences in survival time between the wild-type GG and variant GA genotypes are observed for POLA2+1747 GG/GA together with SLC28A2+65 CC (P value = 0.0004) and for POLA2+1747 GG/GA together with SLC28A2+225 CC (P value = 0.0010). Notably, all the top-ranked SNP-SNP interactions listed in Table 2 involve POLA2, which underscores the importance of POLA2 as a potential biomarker. These differences in survival time are depicted in the Kaplan-Meier plots in Figure 1: patients with the POLA2+1747 GA variant exhibit improved overall survival compared with their GG counterparts.
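The survival comparisons between genotype groups rely on the log-rank test. As an illustration only, the sketch below implements the standard two-group log-rank statistic in plain Python: at each distinct death time it compares the observed deaths in group A with the number expected if both groups shared the same hazard, sums the differences and their hypergeometric variances, and refers the resulting statistic to a chi-squared distribution with one degree of freedom. The function name and toy data are hypothetical; the study itself used SPSS.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test.

    events: 1 if death observed, 0 if censored.
    Returns (chi-squared statistic, P value) with 1 degree of freedom.
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    death_times = sorted({t for t, e, _ in data if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in death_times:
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)  # at risk in A
        n_b = sum(1 for ti, _, g in data if ti >= t and g == 1)  # at risk in B
        d_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        d = sum(1 for ti, e, _ in data if ti == t and e)
        n = n_a + n_b
        # Observed minus expected deaths in group A at this time point.
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of the number of deaths in group A.
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero; groups cannot be compared")
    statistic = observed_minus_expected ** 2 / variance
    # Upper tail of chi-squared(1): P(Z^2 > x) = erfc(sqrt(x / 2)).
    return statistic, math.erfc(math.sqrt(statistic / 2))


# Hypothetical data: group A dies early, group B later (times in months).
stat, p = logrank_test([1, 2], [1, 1], [3, 4], [1, 1])
print(round(stat, 4), round(p, 4))
```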
POLA2+1747 GA together with SLC28A2+65 CC is associated with increased median survival time (Figure 1A): for POLA2+1747 GG/GA together with SLC28A2+65 CC, the median overall survival time is 7.39 months for patients with the GG genotype and 13.18 months for patients with the GA genotype (P value = 0.0004). Likewise, POLA2+1747 GA together with SLC28A2+225 CC is associated with increased median survival time (Figure 1B).
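Median survival times such as those quoted above are read off Kaplan-Meier curves. A minimal sketch of the Kaplan-Meier product-limit estimator follows, with the median taken as the earliest time at which the estimated survival probability falls to 0.5 or below. That is a common definition, but it is an assumption that SPSS reports exactly this quantity; the function names and follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier product-limit estimate.

    times: follow-up times; events: 1 if death observed, 0 if censored.
    Returns [(t, S(t))] at each distinct death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients whose follow-up ends at this time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # deaths and censored patients leave the risk set
    return curve


def median_survival(curve):
    """Earliest time at which S(t) <= 0.5, or None if never reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Hypothetical follow-up in months; 0 marks a censored patient.
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
print(median_survival(curve))  # 3
```

Censored patients do not reduce the survival estimate themselves; they only shrink the number at risk for later death times.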

Computational prediction of functional effects of POLA2 G583R
PolyPhen-2 [14] predicted POLA2 G583R (rs487989) to be "possibly damaging" with a score of 0.474 (sensitivity: 0.89; specificity: 0.90), and SNAP (Bromberg et al., 2008) classified it as "non-neutral". In contrast, when we used either the rsid or our alignment of orthologues (Additional File 1: Fig S1) as input for SIFT (Ng and Henikoff, 2001), both predictions were "tolerated", with scores of 0.45 and 0.50, respectively (scores <0.05 are classed as "deleterious"). Figure 2A shows the homology model of POLA2 in complex with the carboxyl-terminal domain of DNA polymerase alpha (as seen in the yeast crystal structure). The SNP is located in a surface loop on the opposite side of the protein, far from the evolutionarily widely conserved interaction surface with the catalytic subunit. This region is not particularly conserved among remote orthologues ranging from human to yeast and plants; hence, its functional importance, if any, would be restricted to a subset of species more closely related to human.
When we used FoldX (Schymkowitz et al., 2005) to predict the effect of the SNP on protein structure stability, the average free energy change caused by the SNP was significantly elevated (3.79 kcal/mol, SD = 0.31), indicating that the SNP destabilizes the protein structure (Figure 2B). Because this result was derived from an energy-minimized but static homology model, we tested whether the effect could also be reproduced in a dynamic model through molecular dynamics simulations. Indeed, five repetitions each of wild-type and mutant POLA2 simulations over 10 ns in explicit water showed that the surface loop region harbouring the mutation is consistently destabilized and made more flexible by G583R (Additional File 2: Fig S2 and Additional File 3: S3). In the absence of presumably catalytic residues around the mutation site, the surface-exposed nature and taxon-restricted conservation of the loop suggest a possible role for this region in protein interactions.
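The text does not state which metric was used to judge loop flexibility in the molecular dynamics runs. A common choice is the per-residue root-mean-square fluctuation, RMSF_i = sqrt(mean over frames of |r_i(t) - <r_i>|^2), where <r_i> is the average position of residue i. The hypothetical sketch below assumes every frame has already been superposed on a reference structure and lists one coordinate (for example the C-alpha atom) per residue; consistently higher RMSF over the loop residues in mutant runs than in wild-type runs would indicate increased flexibility.

```python
import math


def rmsf(frames):
    """Per-residue root-mean-square fluctuation.

    frames: list of frames; each frame is a list of (x, y, z) tuples, one per
    residue, already superposed on a common reference structure.
    """
    n_frames = len(frames)
    n_residues = len(frames[0])
    values = []
    for i in range(n_residues):
        # Average position of residue i over the trajectory.
        mean = [sum(frame[i][k] for frame in frames) / n_frames for k in range(3)]
        # Mean squared displacement from that average position.
        msd = sum(
            sum((frame[i][k] - mean[k]) ** 2 for k in range(3)) for frame in frames
        ) / n_frames
        values.append(math.sqrt(msd))
    return values


# Toy two-frame trajectory: residue 0 is static, residue 1 moves 2 units along x.
toy = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
]
print(rmsf(toy))  # [0.0, 1.0]
```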
Based on our structural modelling and simulations, the altered conformation and increased flexibility caused by the G583R mutation could potentially disrupt protein interactions. Complex formation with various partners is known to influence POLA2 nuclear shuttling. We therefore hypothesized that a possible functional effect of G583R could be altered POLA2 localization in the cell. To test this, we examined GFP-tagged wild-type and mutant POLA2 in HEK 293 cells by confocal laser scanning microscopy (Figure 3). Untransfected HEK 293 cells show no green fluorescence. In cells transfected with the pEGFP-N3 control, green fluorescent protein is predominantly localized in the cytoplasm. We find that wild-type DNA polymerase alpha subunit B is localized in the nucleus (DAPI-stained), whereas the DNA polymerase alpha subunit B mutant (POLA2 G583R) is predominantly localized in the cytoplasm.

Discussion
In a recent study on genes involved in gemcitabine pharmacology in ethnic Asian populations [15], we reported on the use of a statistical approach to examine associations between genotypes and the outcome of NSCLC patients including response rate, time to progression, gemcitabine toxicity and overall survival.
We have now extended the study to an aspect of NSCLC patient outcome that was not examined previously, namely mortality, and shown here that POLA2+1747 GG/GA (rs487989) is strongly associated with mortality rate and survival time among NSCLC patients treated with gemcitabine. We previously showed that this SNP by itself did not have a significant effect on survival time [15]. Now, we found that its interaction with SLC28A2+65 CC and SLC28A2+225 CC led to an increase in the overall survival time of NSCLC patients. The POLA2+1747 variant (rs487989) is present not only in European and African populations [16] but is also prevalent in the Asian population, among Chinese, Indians and Malays [15]. This SNP encodes a glycine to arginine amino acid change (G583R), where G is the ancestral allele; the substitution replaces glycine, which has no side chain, with the polar, positively charged side chain of arginine. Here, we showed through biostatistics that individuals with the ancestral G allele of POLA2 tend to have lower survival rates in NSCLC pathogenesis than individuals with the GA polymorphism. To unravel possible molecular mechanisms underlying the functional effects of this mutation, we used multiple computational approaches based on evolutionary conservation, structural modelling and molecular dynamics simulations. Because the mutation lies in a surface loop of the structure and causes flexible rearrangements of this surface area, we hypothesized that it could disrupt protein interactions that may be important for subcellular localization. Indeed, we showed experimentally that this point mutation is functionally significant: it changes the localization of the protein, which is likely to affect its regulatory activity and to induce better survival in NSCLC patients treated with gemcitabine.
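Whether a G-to-A change at nucleotide +1747 can produce G583R is easy to check, assuming (the text does not say so) that +1747 is counted from the A of the ATG start codon. Under that assumption, position 1747 is the first base of codon 583, and a G>A change at the first codon position turns glycine into arginine only for the GGA and GGG codons (GGT and GGC would give serine). The sketch below builds the standard genetic code and performs both steps; the helper names are hypothetical, and the actual codon at position 583 is not given in the text.

```python
# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def codon_of(cds_position):
    """Map a 1-based coding position to (codon number, position in codon).

    Assumes numbering starts at the A of the ATG start codon.
    """
    return (cds_position - 1) // 3 + 1, (cds_position - 1) % 3 + 1


def substitute(codon, position, base):
    """Replace the base at a 1-based position within a codon."""
    return codon[:position - 1] + base + codon[position:]


codon_number, codon_pos = codon_of(1747)
print(codon_number, codon_pos)  # 583 1

# Effect of a G>A change at codon position 1 on each glycine codon.
for wild_type in ("GGT", "GGC", "GGA", "GGG"):
    mutant = substitute(wild_type, codon_pos, "A")
    print(wild_type, CODON_TABLE[wild_type], "->", mutant, CODON_TABLE[mutant])
```

The loop prints GGA G -> AGA R and GGG G -> AGG R, consistent with the reported G583R, while GGT and GGC would yield serine.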
Wild-type POLA2, which is known to facilitate nuclear DNA replication, is predominantly found in the nucleus, whereas the mutant POLA2 G583R protein [12], which is strongly associated with better survival in NSCLC patients, is mainly localized in the cytoplasm. DNA polymerase alpha subunit B is required for cell viability [12]. When the subunit localizes in the cytoplasm, nuclear DNA polymerase alpha activity is inhibited. This confers a protective effect in NSCLC patients who carry the POLA2+1747 GA genotype, as the tumour DNA cannot replicate. This inhibits tumour cell proliferation and ultimately results in tumour cell death.

Conclusions
In summary, we established that POLA2+1747 GG/GA (rs487989) is a genetic determinant of clinical outcome in NSCLC patients receiving gemcitabine treatment. Just as EGFR mutations are used to profile NSCLC patients treated with EGFR tyrosine kinase inhibitors, the findings in this article can become a stepping stone towards new options for gemcitabine-based therapy. Given the lack of genetic variants that could help determine the dose and clinical outcome in NSCLC patients receiving gemcitabine chemotherapy, such biomarkers would help doctors treat patients more efficiently to achieve satisfactory clinical outcomes and better survival.

Study population
The study population consists of 43 Chinese NSCLC patients from our previous study [15]. Table 3 gives further details of the study population used in this work.

Statistical analyses
Fisher's exact probability test was used to assess the relationship between each of the 21 SNPs and the mortality of the 43 NSCLC patients, with P values computed for the comparison between genotypes. The conditional probability of death given the genotype of a SNP was used to characterize differential effects on mortality, and the chi-squared test was employed to confirm the significance (P value) of the difference between genotypes. Differences were considered statistically significant when the P value was less than 0.05. All statistical tests were two-sided. The Kaplan-Meier method and the log-rank test were used to compare overall survival time for interaction pairs. SPSS software version 14.0 (SPSS Inc., Chicago, IL) was used.
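As a sketch of the chi-squared step, the Pearson statistic for a 2x2 genotype-by-outcome table sums (observed - expected)^2 / expected over the four cells, with expected counts derived from the row and column totals; with one degree of freedom the upper-tail P value equals erfc(sqrt(statistic / 2)). The text does not say whether a continuity correction was applied, so the uncorrected version below is an assumption, and the function name and table are hypothetical. With small expected counts, as can arise in a cohort of 43 patients, this approximation is less reliable than Fisher's exact test.

```python
import math


def chi_squared_2x2(table):
    """Pearson chi-squared test (no continuity correction) for a 2x2 table.

    Returns (statistic, P value) with 1 degree of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    # Upper tail of chi-squared(1): P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical table for illustration: rows GG/GA, columns died/alive.
stat, p = chi_squared_2x2([[20, 0], [0, 20]])
print(stat)  # 40.0
```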

Bioinformatics analysis
In order to investigate the possible effects of POLA2 G583R (POLA2+1747 GG/GA, dbSNP ID: rs487989) on protein function, we analysed the mutation with PolyPhen-2 version 2.2.2 using the rsid (rs487989) of POLA2 G583R as input, SNAP using the amino acid …

Isolation of total RNA from HEK 293 cell culture
HEK 293 cells were lysed directly in a 10 cm culture dish using TRIZOL® Reagent (Invitrogen, Carlsbad, CA, USA). Total RNA was isolated and used for further …

Reverse transcription of total RNA and POLA2 gene amplification by PCR
Using the SuperScript™ III One-Step RT-PCR System with Platinum® Taq High Fidelity (Invitrogen), total RNA was reverse transcribed into complementary DNA (cDNA), followed by amplification of wild-type POLA2 using a forward primer containing a KpnI restriction site (5'-AAGGTACCATGTCCGCATCCGCCCAGCA-3') and a reverse primer containing a BamHI restriction site (5'-AAGGATCCGATCCTGACGACCTGCACAGCA-3'). Amplicon size was verified by running the PCR product alongside the GeneRuler™ 100 bp DNA ladder on a 0.8% agarose gel.
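The restriction sites in the PCR primers can be confirmed directly from their sequences. The sketch below scans both primers for the KpnI (GGTACC) and BamHI (GGATCC) recognition sequences and checks that the forward primer places the ATG start codon immediately after the KpnI site. Both recognition sequences are palindromic, so scanning only the given 5'->3' strand is sufficient. The two extra A bases at each 5' end presumably give the enzymes room to cut near the end of the amplicon; that is an assumption, as the text does not explain them.

```python
# Recognition sequences (5'->3') of the two enzymes named in the text.
SITES = {"KpnI": "GGTACC", "BamHI": "GGATCC"}

FORWARD = "AAGGTACCATGTCCGCATCCGCCCAGCA"
REVERSE = "AAGGATCCGATCCTGACGACCTGCACAGCA"


def find_sites(seq, sites=SITES):
    """Return 0-based start positions of each recognition sequence in seq."""
    return {
        name: [i for i in range(len(seq) - len(site) + 1) if seq[i:i + len(site)] == site]
        for name, site in sites.items()
    }


print(find_sites(FORWARD))  # {'KpnI': [2], 'BamHI': []}
print(find_sites(REVERSE))  # {'KpnI': [], 'BamHI': [2]}

# The start codon should directly follow the 6-bp KpnI site.
kpn_start = find_sites(FORWARD)["KpnI"][0]
print(FORWARD[kpn_start + 6:kpn_start + 9])  # ATG
```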

Cloning of POLA2 amplicon and sequence verification
The POLA2 amplicon was cloned into the pGEM®-T Easy vector (Promega, USA) and subsequently transformed into Escherichia coli DH5α bacteria. The plasmid DNA was extracted and purified using the QIAprep Spin Miniprep Kit (QIAGEN, Germany), and its concentration and purity were measured using a NanoDrop spectrophotometer (Thermo Fisher Scientific, USA). The resulting plasmids were digested with EcoRI and run on a 0.8% agarose gel to identify recombinant clones. The integrity of the wild-type POLA2 constructs was verified by sequencing. The point mutation was introduced into POLA2 using the QuikChange XL Site-Directed Mutagenesis Kit, and the POLA2+1747 GG/GA construct was verified by sequencing. Wild-type POLA2 and POLA2+1747 GG/GA were then cloned into pEGFP-N3 (Clontech, USA) at the KpnI and BamHI restriction sites.
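EcoRI screening works because the pGEM-T Easy cloning site is flanked by EcoRI sites, so a recombinant plasmid releases the insert as a separate band. The hypothetical sketch below predicts fragment sizes for a circular plasmid cut at every EcoRI site (G^AATTC). The toy sequence is invented for illustration; with the real construct, any EcoRI site inside the POLA2 insert would add extra fragments.

```python
def circular_digest(seq, site="GAATTC", cut_offset=1):
    """Predicted fragment lengths after cutting a circular DNA at every site.

    cut_offset is the cut position within the site (EcoRI cuts G^AATTC, so 1).
    """
    n = len(seq)
    # Append the first len(site) - 1 bases so sites spanning the origin are found.
    extended = seq + seq[:len(site) - 1]
    cuts = sorted({(i + cut_offset) % n for i in range(n) if extended[i:i + len(site)] == site})
    if not cuts:
        return [n]  # uncut: the plasmid stays circular
    # Distance from each cut to the next one around the circle
    # (a single cut simply linearizes the plasmid, giving length n).
    return sorted((cuts[(k + 1) % len(cuts)] - cuts[k]) % n or n for k in range(len(cuts)))


# Hypothetical 42-bp "plasmid": two EcoRI sites flanking a 10-bp insert.
toy_plasmid = "GAATTC" + "A" * 10 + "GAATTC" + "T" * 20
print(circular_digest(toy_plasmid))  # [16, 26]
```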

Western blotting
Equal volumes (20 µl) of cell lysates were loaded onto 8% SDS-PAGE gels to resolve proteins. Proteins were then transferred onto a PVDF membrane, blocked with 5% non-fat dry milk for 1 hour to reduce non-specific binding, and incubated overnight at 4°C with mouse anti-GFP (Roche) at 1:2500 dilution in 0.5% non-fat dry milk, followed by goat anti-mouse IgG-HRP (Santa Cruz Biotechnology, USA) at 1:10000 dilution in 0.5% non-fat dry milk for 1 hour at room temperature. Immunoblots were developed using Amersham™ ECL™ Prime Western Blotting Detection Reagent (GE Healthcare, Sweden), following the manufacturer's protocol.
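As a worked example of the antibody dilutions, a 1:N working solution is usually prepared as 1 volume of antibody stock per N total volumes (1:N is occasionally read as 1 part stock plus N parts diluent, which gives slightly different volumes). The 10 ml working volume and the helper name below are hypothetical choices, not from the text.

```python
def stock_volume_ul(final_volume_ml, dilution_factor):
    """Microlitres of antibody stock for final_volume_ml of a 1:dilution_factor solution."""
    return final_volume_ml * 1000 / dilution_factor


# Hypothetical 10 ml of 0.5% non-fat dry milk for each antibody incubation.
primary = stock_volume_ul(10, 2500)     # mouse anti-GFP at 1:2500
secondary = stock_volume_ul(10, 10000)  # goat anti-mouse IgG-HRP at 1:10000
print(primary, secondary)  # 4.0 1.0
```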