Gene network-based analysis identifies two potential subtypes of small intestinal neuroendocrine tumors

Background Tumor transcriptomes contain information of critical value to understanding the different capacities of a cell at both a physiological and pathological level. In terms of clinical relevance, they provide information regarding the cellular “toolbox” e.g., pathways associated with malignancy and metastasis or drug dependency. Exploration of this resource can therefore be leveraged as a translational tool to better manage and assess neoplastic behavior. The availability of public genome-wide expression datasets, provide an opportunity to reassess neuroendocrine tumors at a more fundamental level. We hypothesized that stringent analysis of expression profiles as well as regulatory networks of the neoplastic cell would provide novel information that facilitates further delineation of the genomic basis of small intestinal neuroendocrine tumors. Results We re-analyzed two publically available small intestinal tumor transcriptomes using stringent quality control parameters and network-based approaches and validated expression of core secretory regulatory elements e.g., CPE, PCSK1, secretogranins, including genes involved in depolarization e.g., SCN3A, as well as transcription factors associated with neurodevelopment (NKX2-2, NeuroD1, INSM1) and glucose homeostasis (APLP1). The candidate metastasis-associated transcription factor, ST18, was highly expressed (>14-fold, p < 0.004). Genes previously associated with neoplasia, CEBPA and SDHD, were decreased in expression (-1.5 – -2, p < 0.02). Genomic interrogation indicated that intestinal tumors may consist of two different subtypes, serotonin-producing neoplasms and serotonin/substance P/tachykinin lesions. QPCR validation in an independent dataset (n = 13 neuroendocrine tumors), confirmed up-regulated expression of 87% of genes (13/15). Conclusions An integrated cellular transcriptomic analysis of small intestinal neuroendocrine tumors identified that they are regulated at a developmental level, have key activation of hypoxic pathways (a known regulator of malignant stem cell phenotypes) as well as activation of genes involved in apoptosis and proliferation. Further refinement of these analyses by RNAseq studies of large-scale databases will enable definition of individual master regulators and facilitate the development of novel tissue and blood-based tools to better understand diagnose and treat tumors. Electronic supplementary material The online version of this article (doi:10.1186/1471-2164-15-595) contains supplementary material, which is available to authorized users.


Background
Neuroendocrine neoplasms (NENs) or NETs represent 1-2% of all neoplasia and are comparable in incidence to testicular cancer, gliomas and Hodgkin's lymphoma [1]. The most common variety, constituting approximately 29% of all NETs, develops within the small intestine or "midgut" and are the most common tumor of the small intestine [2,3]. Although previously considered to be benign, they are indolent cancers (~60% overall five year survival rate) exhibiting a better survivals than adenocarcinomas of the same location [2,4]. Although their biological behavior is generally non-aggressive, metastatic invasion is evident in 50% of tumors <1 cm [2]. The modest prognosis reflects the inherent clinical difficulty in diagnosis of small intestinal malignancy; disease may often have been present for some time before identification [2].
NETs are considered to be derived from neuroendocrine cells within the diffuse neuroendocrine system [5]. Like normal neuroendocrine cells, tumors exhibit a functional secretory apparatus e.g., chromogranins and proteins involved in amine uptake e.g., VMATs, as well as vesicular trafficking and fusions e.g., SNAP25 [6][7][8][9]. In addition, well-described signaling pathways involving G-protein coupled receptors such as somatostatin and dopamine have been defined e.g., cAMP/PKA [10,11]. These have provided the basis for establishment of a histological classification, the development of targeted agents e.g., peptide receptor radiotherapy, as well as imaging strategies that utilize identification of cellular amine uptake mechanisms [12,13]. The transcriptomic basis of tumor development and malignancy, however, remains largely unknown.
Chromosomal-based studies [14,15] e.g., CGH and high resolution SNP arrays [16] and molecular profiling through exome analyses have identified alterations e.g., loss of 18q22-mer [17,18] or SMAD4 LOH [19], that may be associated with neuroendocrine neoplasia. Similarly, gene expression profiling has identified a plethora of "marker genes" that include NAP1L1 [20], NKX2-3 [21], TGFβR2 [22] and CD302 [23]. However, no studies have been undertaken to generate an integrated molecular view of these neoplasmsthe "interactome". The relevance of such an analysis is that the delineation of the transcriptome, as a global measure, offers a complete overview of the cellular machinery at an RNA levelthe cellular "toolbox". This information provides the basis whereby network analysis can be utilized to identify specific interactive pathways associated with e.g., proliferation and metastasis rather than individual components. The establishment of the integrative pathways regulating the biological functions that constitute malignancy will likely have substantial translational applications.
Transcriptomic analysis can thus be utilized to provide a better understanding of tumor development as well as neoplasia. Such analyses have been demonstrated to be of considerable utility in other tumor types e.g., breast, particularly when translated to the clinical setting. Thus, considerable advance has occurred by upgrading histopathology, where gene-based analyses have allowed for the development of PCR-based arrays as well as custom-built chips to assess breast cancer classification [24][25][26], metastases [27] as well as predict therapeutic responsiveness [28]. Circulating tumor cells can readily be detected through PCR applicationssuch approaches appear to be more sensitive than current capture-based techniquesand may be more informative especially because multiple, biologically informative genes identified from RNA analyses can be assessed e.g., in non-small cell lung cancer [29], prostate cancer [30] or colon cancer [31]. Finally, a logical framework for the development of therapeutic targets can be generated through in silico-based reverse engineering of transcriptome datathis has previously been used to identify signaling pathways e.g., CREB targets [10] as well as master regulatorscardinal, potentially targetable genes that regulate nodes in pathways [32,33].
Given the absence of any large-scale transcriptome study and the lack of analytical homogeneity between different NET transcriptome studies, we reanalyzed two publically available small intestinal NET microarray datasets [20,21] (ArrayExpress: E-GEOD-6272/E-TABM-389). In order to identify genes that constitute the intestinal "NETwork", we used a strategy that included stringent quality control techniques consistent with differential expression and validated network-based approaches [10,[34][35][36]. Thereafter, we undertook qPCR to corroborate transcript alterations in candidate targets in an independent collection of NETs. Finally, we screened public databases (e.g., [37]) and published literature (e.g., [38]) to focus on validated signaling pathways and critical transcription factors. This approach allowed us to confirm or reconsider known disruptions in signaling pathways in small intestinal NETs and identify pathways involved in development as well as novel transcription targets with putative therapeutic and biomarker potential.

PCR validation in independent set
qPCR analysis confirmed up regulated expression of 13/15 (87%) genes in small intestinal NETs compared to normal   Table S1 and Additional file 6: Table S2). E. QPCR analysis of secretome-related transcripts in the independent set identified significant over-expression of all eight genes (ranging from APLP1 to SCN3A). *p <0.05 vs. normal mucosa. 3F. QPCR analysis of highly expressed transcripts in the independent set identified significant over-expression of ADCY2, AKT3 and ST18. Mean ± SEM, *p < 0.05 vs. normal mucosa. Tumors n = 13, normal mucosa n = 8.

Discussion
The precise basis of small intestinal tumor genomic profile has proven to be a complex subject and an integrated, cellular transcriptomic appreciation of neuroendocrine tumors has heretofore not been possible. This reflects a number of issues namely the paucity of studies available, the low number of tumor samples analyzed, the divergent analytical tools utilized and dissimilar focuses of the investigative groups e.g., focus on identifying metastatic genes [20]. We sought to define the issue using an integrated transcriptome analysis based on gene network-approaches that has successfully been proven to identify associations not previously apparent [10,[34][35][36]. Additionally, while it is likely that the current paradigm in tumor sequencing calls for tumor samples to be matched with control samples from the same individual [44], we hypothesized that comparing diverse population may shed light on tumorspecific behavior rather than on sample-specific behavior.
Overall, the information derived (from two independent datasets) demonstrates four areas of novelty and considerable interest. Firstly, expression of core regulatory secretory regulatory elements, including genes involved in depolarization, was identified. The data therefore provide a complete overview of genes involved in regulated secretion and demonstrate the conservation of secretory apparatus in these tumors. Secondly, a set of transcription factors associated with neurodevelopmental processes including INSM1, NKX2-2 and BEX1 were identified indicating that the regulation of neuroendocrine differentiation occurs in tumors and that aberrations of this process may be of biological relevance in the evolution of the neoplastic phenotype. Thirdly, we confirmed loss of SDHD expression, a phenomenon associated with "benign" conditions in other tumors e.g., paragangliomas [39]. Finally, our data may suggest that at a genomic level small intestinal NETs may be distinguished by at least two distinct, secretory subtypes, serotonin-producing neoplasms and serotonin/substance P (TAC1/tachykinin)-producing lesions. As such, this is supported by previous studies in small intestinal NETs with "carcinoid syndrome" i.e., produce excess serotonin which suggests at least two subtypes of tumors. These include: 1) the demonstration that elevated luminal concentrations of substance P (secreted from mucosal sources) are only measured in 12% of patients [45]; 2) fasting circulating substance P concentrations are elevated in <20% of carcinoids [46]; and 3) at least two distinct serotonin producing NET lesions have been identifiedserotonin producing NETs in the pancreas are TAC1/substance P negative [47].
Serotonin/substance P (TAC1)-secreting tumors (Set 2) A reanalysis of the microarray data [21] identified overexpression of common genes with Set 1 including APLP1, SCN3A, BEX, INSM1 and ST18. However, the most highly and uniquely expressed gene was TAC1, or substance P/tachykinins. Our secretory subnetwork analysis suggests that these tumors may not be classical serotonin-producing lesions.

Combinatorial-analysis
This interactome assessment of the highly expressed genes identified canonical elements of secretory regulation including secretogranins, vesicle trafficking and hormone processing. The chromogranins (CgA and CgB), secretogranins (secretogranin II and secretogranin III), and additional related proteins e.g., PCSK1 and 2 (which are found within dense core secretory granules in endocrine and neuroendocrine cells and process several hormones and neuropeptide precursors), PNMA2 (a secreted protein that may generate autoantibodies [48]), APLP1 (which colocalizes with APLP2 and synaptophysin [49]), as well as carboxypeptidase E (CPE) have essential roles in the regulated secretory pathway or as products of this pathway [50]. Elevated expression of these genes was confirmed by qPCR in an independent set and provides evidence corroborating the secretome fingerprint of the tumor cells. Of interest was the identification of high expression of SCN3A (Nav1.3). This tetrodotoxin-sensitive voltage-gated sodium channel gene mediates membrane depolarization in excitable cells [51]. This suggests that this gene may be involved in regulating aspects of neuroendocrine secretion which mechanistically require a depolarization event. It is clinically well recognized that small intestinal tumors are sensitized to paroxysmal increased release of serotonin or substance P/tachykinins by secretagogues [52]. In this respect, Nav1.3 is increased in expression following nerve injury with the concomitant phenomenon of hyperalgesia in dorsal root ganglia [53]. We speculate that this elevated expression of Nav1.3 in neuroendocrine tumors may be related. An assessment of the twenty-nine enteroendocrinerelated transcription factors [38] identified that ST18, INSM1 and NKX2-2 were commonly expressed in both tumor sets. ST18 (Myt3) is a candidate tumor suppressor in breast cancer; ectopic expression in MCF-7 breast cancer cells strongly inhibits colony formation in soft agar and the formation of tumors in a xenograft mouse model [54]; it is also known to function as an proapoptotic effector [55]. This gene, however, is involved in neuronal differentiation [56] as well as in normal pancreatic islet cell development [57]. Interactome analysis of small intestinal NET transcriptomes identified neuroendocrine developmental pathways to be a key feature of these lesions. INSM1, NKX2-2, and NEUROD1 were all identified to co-exist and elevated expression levels of these genes were confirmed by qPCR. Identification of other genes for example, TBX family members, in each transcriptome dataset supports a common activation of developmental pathways in these lesions and suggested the existence of a network of transactivating factors that function together to regulate the neuroendocrine phenotype. Further support for this is provided by overexpression of BEX1 which is considered a regenerationassociated gene [58] and may be involved in tumorigenesis [59]. Bex1 is epigenetically activated in neurosphere cells and is considered relevant as a marker of reactivation of stem cell and pluripotency-associated genes; Bex1 expression enlarges the differentiation potential of precursor cells [60]. These data suggest that transcription factors that regulate neuroendocrine cell development or lineage specification are upregulated in neuroendocrine tumors as has been noted in lung tumors [61]. This may indicate an active control of the neuroendocrine phenotype in tumors but also raises the question as to whether an abnormal phenotype (i.e. less well-differentiated tumor) could occur as a consequence of a disruption in the TFs (e.g., through methylationmediated repression) that co-ordinate the neurodevelopmental pathway. A similar phenomenon has been identified for tumor progenitor cells in small cell lung cancer [62].
At a developmental level, INSM1, apart from regulating neural and olfactory development [63], is essential for proper specification of both gastrointestinal and pancreatic endocrine cells [64] through interruption of cell cycle signaling, and cellular proliferation inhibition [65]. Endocrine transdifferentiation in BON cells is mediated by INSM1 through activation of NGN3 [66]. The plasticity of the neuroendocrine phenotype is controlled by NKX2-2 which regulates cell fate choices within the intestinal enteroendocrine population [67]. When this transcription factor is down-regulated, pancreatic alphaand beta-cell development is impaired; the ghrelinexpressing cell population, in contrast, is augmented [68]. Upregulation of NKX2-2 is considered one of the primary regulatory events required for the maintenance of beta-cell identity [69]. Although the precise role of these genes in NETs is unclear, given the known roles in neuroendocrine development, it seems plausible that activation of neuroedevelopmental pathway (s) can be implicated in NET proliferation. INSM1, at least, functions through disruption of the cell cycle by targeting the CDK4/CyclinD1 complex.
A second gene linked to this complex is CEBPA (CCAAT/enhancer binding protein alpha (C/EBPalpha). This is a basic/leucine zipper transcription factor that integrates transcription with proliferation to regulate the differentiation of tissues involved in energy balance. In the pituitary, C/EBPalpha functions to prolong the cell cycle in G1 and S in pituitary progenitor cells [70]. An assessment of the 487 genes in the COSMIC database verified to be associated in a dominant or recessive fashion with cancer identified that CEBPA was downregulated in both NET groups we studied. QPCR confirmed decreased expression of this gene (~50% of mucosal expression). Loss of function of this gene is associated with AML and MDS, largely through regulation of differentiation; this gene product inhibits CDK2/4 and the cyclin D1 pathway [71]. We postulate that a similar mechanism exists in small intestinal NETs; elevations in cdks and cyclin expression are well-recognized in NETs particularly as a consequence of IGF-1 stimulation [72]. It is noteworthy that inhibition of proliferation using interferons specifically inhibits these effectors in vitro [73].
A consistent loss or decrease in expression of SDHD, a recessive gene involved in paragangliomas, was noted in both tumor sets. Mutations in SDHD result in loss of complex II function and are associated with loss of stabilization of HIF1 under normoxia and generation of reactive oxygen species [74]. Mutations in this gene are considered to result in a "benign" phenotype in paraganglioma, the mechanisms of which are considered to be due to activation of cellular hypoxia responses [39]. Although no mutations have been detected in SDHD in intestinal NETs [75], LOH has been identified in~30% of lesions [76]. Interestingly, LOH alone could lead to a complete loss of function since SDHD is an imprinted gene [39]. QPCR, in an independent dataset, confirmed decreased expression (~50% of normal mucosal levels) of SDHD indicating a potential role for hypoxia in intestinal tumor biology.

Conclusions
We have identified two subtypes of intestinal neuroendocrine tumors, both associated with metastases, that express common signaling pathways involved in neuroendocrine secretion, nervous system and neuroendocrine development, as well as hypoxia and cyclin/CDK4 regulation. Transcriptome analyses have previously been leveraged to identify markers either of metastases [77] or blood-based antigens [48] or circulating transcripts [78]. The latter has evolved from a single transcript approach to a multiple gene screen -51 marker genesthat are closely correlated with neuroendocrine tumor biology [79] and overlap with genes e.g., APLP1 family, PNMA2 and CD59, in the current study. Detection of this enhanced gene signature has been shown to be significantly more effective than measurements of chromogranin A by ELISA as a peripheral blood tool for detecting NETs [79]. In addition, because it is based on assessment of multiple NET transcriptomes it is also effective at identifying all gastroenteropancreatic lesions irrespective of the organ of origin and tumors including in the absence of metastasis.
This manuscript provides an integrated transcriptomic view of small intestinal neuroendocrine tumors and identifies that these lesions are regulated at a developmental level, have key activation of hypoxic pathways (a known regulator of malignant stem cell phenotypes) as well as activation of genes involved in apoptosis and proliferation. Further analyses and leverage of these data should provide novel tissue and blood-based tools to better understand, diagnose and ultimately treat these neoplasms.

Methods
Please refer to the Additional file 1: Supplementary Methods for detailed description of computational protocols.

Gene expression arrays and independent validation set
All samples were collected following informed consent and analyzed according to Ethics Committee requirements of Yale University (IRB: 0805003870; expires 6/ 18/2015) in accordance with the World Medical Association Declaration of Helsinki regarding ethical conduct of research involving human subjects [79]). Clinical details regarding the three samples sets are included in Table 3. No statistically significant differences were noted in distribution of gender, age or treatment received between each of the sets.

Gene expression analyses
Individual analyses were performed using the web-based GeneProfiler tool (GeneProfiler, Bering Limited http://beringresearch.com/). Primary tumors were compared with non-matched normal mucosal samples. Sample set 1 consisted of 22,283 probes and 12 arrays, while sample set 2 consisted of 54,675 probes and 12 arrays. Probe sets that were unlikely to be reliable were eliminated using detection of Present/Absent calls. Probes present in more than 50% of samples were retained [80]. Raw probe intensities were normalized using the Robust Microarray Average (RMA) approach [81]. Array outlier detection was performed in the arrayQualityMetrics package [82] using the Kolmogorov-Smirnov statistic between each array's distribution and the distribution of the pooled data. To enhance microarray annotation, probe identifiers (IDs) were mapped to Entrez Gene IDs (accessed April 7, 2013) [83]. In cases were multiple probes mapped to the same Entrez ID, the average probe intensity was calculated. Probes without an Entrez record were removed from analysis. Genes that were consistently identified as differentially expressed using multiple ranking algorithms [84] (fold change ranking, ordinary t-statistic, shrinkage t-statistic, limma, significance analysis of microarrays) were called significant and retained for further analysis. This approach ensured that differential expression analysis was: 1) unbiased, and 2) consistent across different array platforms.

Functional gene expression analysis
Differentially expressed genes were enriched for Gene Ontology (GO) Biological Process (BP), Cellular Component (CC), and Molecular Function (MF) terms using the topGO  Bioconductor package [85]. To ensure enrichment accuracy, terms with fewer than 10 assigned genes were not included in the analysis. Differentially expressed genes were also assessed at the Reactome pathway level (version 47) [86] using model-based gene set enrichment analysis [87]. For secondary analyses of selected genes, expression of genes relevant to carcinoma were assessed using the Sanger COSMIC database [37], while candidate enteroendocrine transcription factors were assessed against murine orthologs identified through transcriptome profiling of highly enriched populations [38]. The aim of these analyses was to assess the capacity to which differential expression analysis could identify previously known oncogenes and transcription factors.

Protein-protein interaction network analysis
Differentially expressed genes (seed nodes) were mapped to human interactions obtained from the BioGRID database (version 3.2.109, n = 15,068 proteins and n = 124,370 interactions) [88]. High-scoring differential subnetworks were extracted and visualized to identify putative signaling regulators (see Additional file 1: Supplementary Methods, Additional file 2: Figure S1, Additional file 3: Figure S2 and Additional file 4: Figure S3 for a full description of the methods). Briefly, for each differential expression analysis, network nodes were assigned a weight of -log 10 (p-value). Subsequently, all shortest paths were calculated between seed nodes. Each shortest path was assigned a weight, expressed as the sum of nodes on that shortest path. A subnetwork was extracted by selecting seed nodes and "linker" nodes that fell on the highest weighted shortest path between the seed nodes.
Pairwise interaction network similarity was assessed by network community detection and subsequent calculation of inter-community similarity. For each network, protein communities were identified by optimizing the network modularity [89] (Additional file 1: Supplementary Methods, Additional file 2: Figure S1, Additional file 3: Figure S2 and Additional file 4: Figure S3). Similarity between protein communities was expressed using the Jaccard coefficient, computed as a ratio of the number of common proteins in any two network communities to the total number of proteins in these communities. Disparate and identical communities would correspond to Jaccard coefficient of 0 and 1 respectively.

Real-time PCR validation (Independent Set)
To validate candidate genes, we measured transcript expression in an independent Set 3 (SI NETs: n = 13, normal mucosa: n = 8) using real-time PCR. RNA was extracted (TRIZOL®, Invitrogen, USA) [90,91] and real time RT-PCR analysis was performed using Assays-on-Demand™ products and the ABI 7900 Sequence Detection System according to the manufacturer's suggestions [90,91]. Primer probe sets are included in Table 4. Cycling was performed under standard conditions (TaqMan Universal PCR Master Mix Protocol) and data normalized (using ALG9 and the ΔΔC T method (Microsoft Excel). Non-parametric Mann-Whitney and Spearman correlations were used to compare samples and the Fisher's test was used for binary comparison (GraphPad Prism 5).