Exome Sequencing of Normal and Isogenic Transformed Human Colonic Epithelial Cells (HCECs) Reveals Novel Genes Potentially Involved in the Early Stages of Colorectal Tumorigenesis

Background: We have generated a series of isogenically derived immortalized human colonic epithelial cell lines (HCEC 1CT and HCEC 2CT), including parental un-immortalized normal cell strains. The CDK4 and hTERT immortalized colonic epithelial cell line (HCEC 1CT) is initially karyotypically normal diploid and expresses a series of epithelial cell markers, including stem cell markers. Under stressful tissue culture conditions, a spontaneous aneuploidy event occurred in the HCEC 1CT line, resulting in a single chromosomal change that produced a stable trisomy 7 cell line (1CT7). Trisomy 7 occurs in about 40% of all benign human adenomas (polyps), and thus this specific chromosomal change in diploid HCEC 1CT cells appears to be nonrandom. In addition, we have partially transformed the HCEC 1CT line by introducing stable knockdown of wild-type APC and TP53, and by ectopically introducing mutant KRAS V12 and a mutant version of APC (A1309), all commonly found alterations in colorectal cancer (CRC).
Methods: Whole exome sequencing and bioinformatic analyses were performed to comprehensively examine the genetic background of these isogenic cell lines.
Results: Exome sequencing of these experimentally progressed cell lines recapitulates a list of genes previously reported to be involved in CRC tumorigenesis. In addition, sequencing revealed a collection of novel genes specifically detected in 1CT7 and A1309 cells but not in normal diploid 1CT cells.
Conclusion: This study demonstrates the utility of isogenic, experimentally derived HCEC lines as a model to recapitulate CRC initiation and progression. Exome sequencing reveals a collection of novel genes that may play important roles in CRC tumorigenesis.


Background
Colorectal cancer (CRC) is the third most commonly diagnosed cancer and the third leading cause of cancer-related mortality in the United States. It is well established that sporadic colorectal cancers (CRCs) arise through the acquisition of a series of sequential genetic mutations in both tumor suppressor genes and oncogenes [1]. Mutational activation of oncogenes together with inactivation of tumor suppressor genes (TSGs) contributes to colorectal tumor formation. It has been proposed that a minimum of four sequential genetic alterations are required for colorectal cancer evolution, with one oncogene (KRAS) and three TSGs (APC, SMAD4, TP53) as the main targets [2]. The dominant or recessive nature of these genes predicts that at least seven mutations (KRAS and six additional ones) are required for complete inactivation of important TSG function [2]. The TSG mutations occur in most tumors, whereas KRAS mutations are found in approximately 50% of sporadic adenomas and carcinomas [3,4]. However, additional changes are required to convert a normal colonic epithelial cell into a malignant carcinoma. While most CRCs have ~100 or more genomic changes, many of these are believed to be incidental or "passenger" alterations, and it is estimated that up to 15 "driver" oncogenic changes are required for transformation to full malignancy [5]. Many of these changes are not frequently observed in CRC, and thus it remains to be determined which less frequently mutated genes are involved in CRC initiation and development.
Recent advances in next generation sequencing (NGS) technology have allowed for rapid and efficient analysis of causative mutations in rare Mendelian disorders [6]. Several studies have demonstrated the utility of exome sequencing in identifying novel driver mutations in various cancer types [7][8][9][10]. In particular, whole-exome and even whole-genome sequencing of colorectal tumors has delineated a comprehensive landscape of genetic alterations in CRC [5,11,12]. However, the mutational events that contribute to CRC initiation are less well studied, partly due to the lack of appropriate cellular reagents for validating important changes. We reasoned that the landscape of genomic changes occurring as early events in CRC initiation could be examined by introducing specific alterations into normal diploid HCECs. In the present study, we applied exome sequencing to a series of isogenically derived immortalized human colonic epithelial cell (HCEC) lines generated from the same individual with defined genetic manipulations. Analysis of the mutation spectrum of these cell lines reveals expected changes and a list of novel candidate genes that may be involved in the early stages of CRC tumorigenesis.

qRT-PCR
Total RNA was isolated from cells using the RNeasy Mini Kit (Qiagen, Chatsworth, CA) according to the manufacturer's protocol. Then 1 µg of RNA was converted to cDNA using a First Strand cDNA Synthesis Kit (Roche, Indianapolis, IN). Real-time quantitative PCR reactions were set up in triplicate with SsoFast Master Mix (Bio-Rad, Hercules, CA) and run on a LightCycler® 480 (Roche, Indianapolis, IN).

Whole-exome sequencing
Exome capture using 3 µg of genomic DNA from each cell line was performed using the TargetSeq(TM) Exome Enrichment system (A14061) from Life Technologies according to the manufacturer's protocol. Sequencing was performed on the SOLiD(TM) 5500XL platform. Mapping to the hg19 version of the human genome and identification of single nucleotide variants (SNVs) and small indels were performed using the default settings of the LifeScope software (Life Technologies, Carlsbad, CA). High quality variants (with coverage >=10x and MQV >=20) were annotated and filtered using the SNP and Variation Suite (SVS) version 7 from Golden Helix. Novel and rare variants (with MAF <1%) were identified by filtering against the NHLBI Exome Sequencing Project database. SNVs were predicted to be damaging using the SIFT, PolyPhen or MutationTaster software within the SVS7 pipeline.
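The filtering criteria described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the LifeScope or SVS pipeline: the `Variant` record, its field names, and the function names are hypothetical, and treating a variant as damaging when any one of the three predictors flags it is our reading of the text. The thresholds (coverage >=10x, MQV >=20, MAF <1%) are taken from the paragraph above.

```python
from dataclasses import dataclass
from typing import List, Optional

# Thresholds quoted in the text.
MIN_COVERAGE = 10
MIN_MQV = 20
MAX_MAF = 0.01


@dataclass
class Variant:
    """Hypothetical variant record; field names are illustrative only."""
    gene: str
    coverage: int
    mqv: int
    maf: Optional[float]  # None = absent from the reference database (novel)
    sift_damaging: bool
    polyphen_damaging: bool
    mutationtaster_damaging: bool


def is_high_quality(v: Variant) -> bool:
    return v.coverage >= MIN_COVERAGE and v.mqv >= MIN_MQV


def is_novel_or_rare(v: Variant) -> bool:
    return v.maf is None or v.maf < MAX_MAF


def is_predicted_damaging(v: Variant) -> bool:
    # Assumption: a call from any one predictor is sufficient.
    return v.sift_damaging or v.polyphen_damaging or v.mutationtaster_damaging


def filter_variants(variants: List[Variant]) -> List[Variant]:
    """Keep variants that pass the quality, frequency and prediction filters."""
    return [
        v for v in variants
        if is_high_quality(v) and is_novel_or_rare(v) and is_predicted_damaging(v)
    ]
```

Keeping the three criteria as separate predicates makes it straightforward to tighten one of them (for example, requiring agreement between two predictors) without touching the others.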

Characteristics of the sequenced HCEC lines
The HCEC 1CT line used in these experiments was derived from non-malignant colonic tissue from a patient with a previous history of CRC who was undergoing routine colonoscopy screening. The cells derived from explants were immortalized with ectopic expression of CDK4 and hTERT as previously described [13]. This cell line maintains a stable normal karyotype (46, XY) when continuously propagated in 2% oxygen and medium containing 2% serum [13]. 1CT7 cells were spontaneously generated from 1CT cells after prolonged passage under serum-free conditions [14]. Trisomy of chromosome 7 is one of the earliest events, occurring in up to ~40% of benign colonic adenomas [15][16][17]. 1CT7 cells have enhanced cell migration (in a scratch-wound assay) compared to 1CT cells when cultured under low (2%) oxygen conditions (data not shown). Additionally, compared to 1CT cells, 1CT7 cells have significant up-regulation of EGFR and c-Met, two receptor tyrosine kinases encoded on chromosome 7 [14]. 1CTRPA A1309 (abbreviated as A1309) is a partially transformed cell line harboring TP53 and APC knockdowns (>90%), as well as ectopic expression of oncogenic KRAS V12 and truncated APC1309, all of which are common alterations detected in CRC tumors [1,2]. This cell line exhibits enhanced cellular proliferation, anchorage-independent growth and invasion through Matrigel® compared with the 1CT line, which does not have any detectable tumorigenic characteristics. Neither the 1CT7 nor the A1309 cell line is fully transformed, because both lack the ability to form tumors in immunocompromised mice (data not shown).

Exome capture and sequencing results
Exome capture was performed on the three isogenic 1CT cell lines, followed by sequencing on the SOLiD(TM) 5500XL platform. A summary of the sequencing results is provided in Table 1. On average, 57.6% of the targeted bases were covered to 10X. After mapping to the hg19 version of the human genome (http://genome.ucsc.edu), the average read depth in the target region was 19X, 21X and 11X for the three samples. The average number of variants observed in the three samples was 11,582. To filter out neutral variants, SIFT, PolyPhen or MutationTaster analysis was performed to predict the functional consequences of all the mutations. We focused our analysis on the 240 and 280 genes with a minimum of three "deleterious" variant reads that are specifically mutated in 1CT7 and A1309 cells, respectively (Additional file 1). Thirty-two genes were altered in both cell types.
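The selection of cell-line-specific genes described above amounts to a threshold followed by set operations. The following Python sketch illustrates the idea under stated assumptions: the per-gene read-count dictionaries, the function names, and the rule that any deleterious read in the 1CT control excludes a gene are hypothetical choices made for illustration; only the minimum of three "deleterious" reads comes from the text.

```python
from typing import Dict, Set

MIN_DELETERIOUS_READS = 3  # threshold quoted in the text


def genes_passing(read_counts: Dict[str, int],
                  min_reads: int = MIN_DELETERIOUS_READS) -> Set[str]:
    """Genes with at least `min_reads` deleterious variant reads."""
    return {gene for gene, n in read_counts.items() if n >= min_reads}


def specific_genes(sample_counts: Dict[str, int],
                   control_counts: Dict[str, int],
                   min_reads: int = MIN_DELETERIOUS_READS) -> Set[str]:
    """Genes passing the threshold in the sample but unmutated in the control.

    Assumption: any deleterious read in the control excludes the gene.
    """
    mutated_in_control = {gene for gene, n in control_counts.items() if n > 0}
    return genes_passing(sample_counts, min_reads) - mutated_in_control


def shared_genes(specific_a: Set[str], specific_b: Set[str]) -> Set[str]:
    """Genes specifically mutated in both derived cell lines."""
    return specific_a & specific_b
```

With `control`, `line_a` and `line_b` as placeholder count dictionaries, `shared_genes(specific_genes(line_a, control), specific_genes(line_b, control))` would correspond to the set of genes altered in both derived lines.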

Mutation spectrum of the isogenic 1CT cell lines
To examine whether the mutations identified in 1CT7 and A1309 cells are relevant to CRC initiation or progression, we compared the high confidence mutations specifically present within 1CT7 or A1309 cells, as listed in Additional file 1 (at least six "deleterious" reads), with the TCGA CRC tumor dataset for 212 cases (http://www.cbioportal.org/publicportal/). This analysis shows that the 1CT7 specific mutated genes are altered in 30.4% of all CRC cases, whereas the A1309 specific mutated genes are altered in 73.6% of all CRC cases, among which five genes are known cancer genes, i.e. PBRM1, MYB, PRDM16, BCR and NUP214 (Figure 1). In particular, PTPRT (protein tyrosine phosphatase receptor type T), one of the 1CT7 specific mutated genes, which may be involved in cell adhesion, is altered in 16.7% of the CRC cases [18]. Another example is the A1309 specific mutated gene CSMD1 (CUB and sushi multiple domains 1), a TSG altered in 15.6% of the CRC cases [19]. Other frequently mutated genes, such as SYNE1 and MUC16, have been found to be mutated in other cancer types and may be novel candidate driver mutations in the early stages of CRC tumorigenesis [20,21].
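The percentages quoted above are of the form "fraction of cases with an alteration in the gene set". A minimal Python sketch of that calculation follows. It assumes, as our interpretation rather than a documented detail of the portal query, that a case counts as altered when at least one gene in the set is altered; the input mapping and the function name are hypothetical.

```python
from typing import Dict, Set


def fraction_altered(case_alterations: Dict[str, Set[str]],
                     gene_set: Set[str]) -> float:
    """Fraction of cases with an alteration in at least one gene of `gene_set`.

    `case_alterations` maps a case identifier to the set of genes altered
    in that case (hypothetical input format).
    """
    if not case_alterations:
        raise ValueError("no cases supplied")
    altered = sum(1 for genes in case_alterations.values() if genes & gene_set)
    return altered / len(case_alterations)
```

Because a case is counted once regardless of how many genes in the set it carries, the fraction for a gene set is not the sum of the per-gene fractions, which is why a large gene set can reach a high percentage even when each individual gene is altered in relatively few cases.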
Previously, exome sequencing of 24 randomly selected colorectal adenomas revealed mutations involved in multiple known CRC related pathways, such as Wnt signaling, cadherin signaling, integrin signaling, inflammation, and angiogenesis [22]. Comparison with the cell autonomous specific mutations found in the present study shows that a subset of the genes implicated in these pathways overlap (Additional file 2). Additionally, the distribution of all the high confidence hits in the present study across biological processes exhibited a pattern similar to that of the mutations detected in those adenomas (Additional file 3). Among these, metabolic processes, cell communication and transport are the most highly represented. Interestingly, a subset of mutations detected in 1CT7 or A1309 cells clustered on chromosome 11p15 (Additional file 4), consistent with a previous report of numerous aberrations detected on chromosome 11 in colorectal adenocarcinoma [23]. Taken together, these results can be interpreted to suggest that the existence of trisomy 7 and the other introduced genetic alterations lead to the acquisition of additional mutations that may drive CRC initiation and progression. 1CT7 and partially transformed A1309 cells may harbor a genetic background mimicking the early stages of CRC. In addition, since many of these mutations are detected sequentially in an experimentally manipulated in vitro setting, this suggests that these mutations occur in a cell autonomous manner and are not dependent on the extracellular microenvironment present in vivo.

Identification of novel candidate genes involved in CRC tumorigenesis
To identify the genetic alterations that may be most relevant in CRC tumorigenesis, we prioritized our candidate genes using the ToppGene suite. This web-based tool has been shown to be a useful portal for identifying novel disease candidate genes [24,25]. We built the training gene set using the sequencing data from the 24 colorectal adenomas [22] and the test set using the high confidence 1CT7 or A1309 specific mutations. Protein-protein interaction (PPIN)-based methods, including K-Step Markov, Hits with Priors, and PageRank with Priors, as well as functional annotation-based prioritization, were used for the analyses (Additional file 5). The intersection of the top 20 genes from each method for 1CT7 or A1309 is represented as a Venn diagram in Figure 2. This analysis reveals 13 genes in 1CT7 cells and 14 genes in A1309 cells that can be designated as hits using more than three methods. To investigate whether this collection of top ranked novel candidate genes is potentially important in CRC biology, we compared these 27 genes with the TCGA dataset. We found that the 27 genes are altered in 35% of all the CRC cases and that the cases with these alterations show poorer overall survival in Kaplan-Meier analyses (Additional file 6) compared to the cases without these alterations. We then placed the 27 genes as central nodes and overlaid them with the TCGA dataset. This led to the generation of the interaction network shown in Additional file 7.
Within this network, 15 of the 27 genes interact either directly or indirectly, and most, if not all, of the interactors are altered in CRC tumors. A subset of these interacting genes, such as AXIN2, FBXW7 and PIK3CA, are known to be involved in CRC tumorigenesis, whereas the rest of the interacting genes do not have established roles in CRC. Further Gene Set Enrichment Analysis (http://www.broadinstitute.org/gsea/msigdb/annotate.jsp) of these 27 genes using the C2 curated gene sets (excluding the chemical and genetic perturbation category) reveals enrichment in multiple pathways, such as EGFR, endocytosis, FGFR, spliceosome and apoptosis (Additional file 8). Taken together, these results could be interpreted to suggest that these 27 genes and their interactors may be novel candidate genes involved in CRC tumorigenesis. Further mechanistic investigation of these genes and the pathways in which they are implicated may give insights into their role in CRC initiation and progression and perhaps lead to the identification of novel therapeutic targets.
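The consensus step used in this section, taking the top 20 genes from each prioritization method and keeping those supported by several methods, can be sketched as follows. This Python example is an illustration only and does not call ToppGene: the input format, the function name and the gene labels are hypothetical. Because the phrase "more than three methods" could be read in more than one way, the support threshold `min_methods` is left as an explicit argument rather than fixed.

```python
from collections import Counter
from typing import Dict, List, Set


def consensus_hits(rankings: Dict[str, List[str]],
                   min_methods: int,
                   top_n: int = 20) -> Set[str]:
    """Genes that appear in the top `top_n` of at least `min_methods` rankings.

    `rankings` maps a method name to its gene list, best-ranked first
    (hypothetical input format).
    """
    support = Counter()
    for ranked in rankings.values():
        # Use a set so a gene is counted at most once per method.
        support.update(set(ranked[:top_n]))
    return {gene for gene, n in support.items() if n >= min_methods}
```

For example, with four rankings, `consensus_hits(rankings, min_methods=4)` returns only genes ranked in the top 20 by every method, whereas `min_methods=3` also admits genes missed by one method.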

Identification of INCENP polymorphism in 1CT isogenic series
The exome capture identified three single nucleotide polymorphisms (SNPs) in the INCENP gene in 1CT7 cells, and one of the variants, p.M506T, is predicted to be deleterious by SIFT analysis (Table 2). INCENP is a member of the chromosomal passenger complex (CPC), which also consists of Aurora B, Survivin and Borealin [26]. Overexpression of INCENP is observed in several colorectal cancer cell lines [27]. Validation by Sanger sequencing confirmed that variant p.M506T is present in all of the 1CT series, i.e. 1CT, 1CT7 and A1309, as well as in the pre-immortalized HCEC1 cells, and that this variant occurs at a highly conserved position (Figure 3). Interestingly, this INCENP variant does not occur in the 2CT cell line, which is an independent CDK4 and hTERT immortalized colonic epithelial cell line derived from a patient with no CRC history. The 2CT line did not acquire trisomy 7, as 1CT did, when cultured under the same serum-deprived conditions. Since INCENP plays important roles in mitosis [28], it is possible that variants in this gene may be one of the contributing factors that lead to aneuploidy and the occurrence of trisomy 7 cells in the 1CT cell population. Further functional investigation is warranted to delineate its potential role in aneuploidy and as an early event in CRC initiation.

Discussion
We performed whole exome sequencing of a series of isogenically derived human colonic epithelial cell lines (HCECs), including the non-cancerous diploid parental 1CT cells, the 1CT7 cells with spontaneously occurring trisomy 7, which is frequently observed as an early event in CRC, and partially transformed A1309 cells harboring mutations commonly found in CRC. On average, ~60% of the targeted bases were covered to 10X, with over 10,000 variants detected in these samples. We chose 1CT as the control cell line, rather than 2CT or another independent cell line, because 1CT, 1CT7 and A1309 are isogenically derived from the original pre-immortalized patient cells. Thus, the genes specifically mutated in 1CT7 and A1309 cells are more likely to be candidate "driver" genes rather than "passengers" in CRC tumorigenesis. Based on the TCGA datasets examined, the 1CT7 specific mutated genes are altered in 30.4% of all CRC cases, whereas the A1309 specific mutated genes are altered in 73.6% of all CRC cases. 1CT7 is a premalignant cell line containing only one early molecular change that occurs in about 40% of CRC cases. Additionally, many of the mutations found in this cell line are likely to be incidental events. Therefore, it is very likely that amplification of chromosome 7 is an important early event in a reasonable fraction of sporadic CRC. The top ranked genes, PTPRT and CSMD1, which are unique to 1CT7 and A1309 cells, respectively, have previously been reported to be mutated in colorectal tumors [7,19]. Comparison of our sequencing data with a previous exome sequencing study of 24 colorectal adenomas reveals the overlap of a subset of genetic mutations involved in CRC related pathways. These results suggest that the existence of trisomy 7 and the introduction of other genetic manipulations can lead to the acquisition of additional genetic mutations that may contribute to CRC progression. Interestingly, knockdown of TP53 and expression of KRAS V12 in 1CT7 cells results in the emergence of trisomy 20, another nonrandom aneuploidy observed in ~85% of CRC [14]. Therefore, 1CT7 and partially transformed A1309 cells may harbor a genetic background mimicking susceptibility to early stage colon cancer initiation and progression. These cell lines represent an ideal cell autonomous model to delineate the molecular events that contribute to CRC tumorigenesis.

Figure 2
Venn diagram comparing the top 20 ranked candidate genes for CRC tumorigenesis derived from 1CT7 and A1309 sequencing data using functional annotation-based and protein-protein interaction (PPIN)-based methods. Functional annotation-based prioritization was done using the ToppGene server. For PPIN-based methods, K-Step Markov, Hits with Priors, and PageRank with Priors were used.
Utilizing the ToppGene portal, we prioritized the candidate gene list based on protein-protein interactions and other functional annotations. A total of 27 genes are putative CRC hits using more than three methods. Many of these genes are frequently mutated in CRC tumors, and patients with alterations in these genes exhibit poorer overall survival. Network analysis of these genes reveals additional and perhaps novel interactors that are also altered in CRC tumors. Therefore, this set of genes and their interacting partners may play important roles in CRC tumorigenesis. Epigenetic regulation of gene silencing is another pathway by which tumor suppressor genes are inactivated [29]. Aberrant DNA methylation has been reported to contribute to colon cancer progression through the CpG Island Methylator Phenotype (CIMP) [29,30]. We can speculate that the 1CT7 and A1309 cell lines may harbor more aberrantly methylated genes than their normal isogenic counterpart, 1CT cells. Future investigation of the epigenetic signatures of these cell lines compared to pre-immortalized normal epithelial cells as well as authentic CRC samples is warranted.

Figure 3
Validation of "deleterious" INCENP variants in the 1CT series. The predicted "deleterious" mutation p.M506T was confirmed by Sanger sequencing of PCR products from the 1CT series. The c.1517T>C mutation (highlighted in red box) was detected in all of the 1CT series as well as in the pre-immortalized HCEC1 cells, and it occurs at a highly conserved position.
In conclusion, the present study revealed the comprehensive mutation spectra of a series of isogenically derived HCEC lines. This has led to the identification of known CRC genes as well as a collection of novel candidate CRC genes, demonstrating the potential of these isogenic HCEC lines to unravel the early cell autonomous events that contribute to CRC initiation and progression. These newly identified candidate CRC "driver" genes could potentially be utilized as biomarkers for diagnostic and prognostic applications. A collection of these candidate genes may be further pursued as novel therapeutic targets for CRC prevention and intervention.

Availability of supporting data
Raw sequencing data can be retrieved from DOI: 10.6070/H44M92HV.