Investigation gene and microRNA expression in glioblastoma
© Dong et al. 2010
Published: 01 December 2010
Glioblastoma is the most common primary brain tumor in adults. Although much research has focused on this disease, the causes and pathogenesis of glioblastoma have not been clearly identified.
We identified 1,236 significantly differentially expressed genes, and 30 pathways enriched in that gene set, among 243 tumor and 11 normal samples. We also identified 97 differentially expressed microRNAs among 240 tumor and 10 normal samples, 22 of which have been reported to affect glioblastoma and 50 of which have been implicated in other cancers and brain diseases. We regressed gene expression on microRNA expression across 237 tumor tissues and 10 normal tissues. We found two experimentally validated microRNA targets and 1,094 miRNA-target gene pairs predicted by the miRanda algorithm in our datasets; 8 of the target genes were tumor suppressor genes and 3 were oncogenes. Further functional analysis of the target genes suggested that microRNAs most frequently targeted genes associated with cell signalling and the nervous system.
We investigated gene and microRNA expression in glioblastoma and carried out a comprehensive functional study of the differentially expressed genes and microRNAs in glioblastoma patients. These findings provide important clues for studying the carcinogenic process in glioblastoma.
Glioblastoma multiforme (GBM) is the most common and most aggressive type of primary brain tumor, accounting for 52% of all primary brain tumor cases and 20% of all intracranial tumors. Primary GBM arises de novo, without any history of a pre-existing lower-grade tumor, whereas secondary GBM shows clinical, radiologic, or histopathologic evidence of malignant progression from a pre-existing lower-grade tumor. Over the past two decades, the molecular mechanisms, genetics and paths to treatment of glioblastoma have been studied extensively. However, the causes and pathogenesis of glioblastoma have not been clearly identified. With the continuing improvement of high-throughput genomic technologies, it is now feasible to survey human cancer genomes comprehensively. The Cancer Genome Atlas (TCGA) aims to catalogue and discover major cancer-causing genome alterations in large cohorts of human tumors through integrated multi-dimensional analyses. Glioblastoma is the first cancer studied by TCGA. To identify the genetic alterations in glioblastoma, we investigated the expression profiles of genes and microRNAs.
MicroRNAs (miRNAs) are short, single-stranded non-coding RNA molecules of about 22 nucleotides in length, which usually repress gene expression by binding to the 3’UTR region of the target gene. The expression of microRNAs has been found to differ greatly across organ development and tissue differentiation. Moreover, many microRNAs have been found to be associated with apoptosis and cancer, suggesting that they function as oncogenes or tumor suppressor genes. In our study, we examined the expression levels of 470 human miRNAs in glioblastoma and identified a group of microRNAs whose expression is significantly altered in this tumor. We also identified significantly altered genes and pathways related to glioblastoma.
All data were acquired from the TCGA project (http://cancergenome.nih.gov/dataportal/data/about/). Gene expression microarrays were run on the Affymetrix HT Human Genome U133 Array Plate Set by the Massachusetts Institute of Technology (MIT). We used level-three data, which give per-sample gene calls after probeset-level and gene-level Robust Multiarray Analysis (quantile normalization and background correction), as of the most recent update on Sep. 05, 2008. After averaging the expression values of duplicated samples, 243 tumor tissue samples, 10 normal tissue samples and 1 cell line sample from glioblastoma patients were used for differential expression analysis. MicroRNA expression experiments were run on the Agilent 8 x 15K Human microRNA-specific microarray by the University of North Carolina (UNC). The level-three data (after quantile normalization and batch adjustment) contained 534 microRNAs (470 human microRNAs) for 240 tumor tissue samples and 10 normal tissue samples, as of the most recent update on Nov. 10, 2008. Because it is very difficult to obtain brain tissue samples from healthy people, the control samples all come from the adjacent normal tissues of glioblastoma patients. We therefore focus on detecting the effect of somatic differences on disease, which is also a common approach in many other cancer studies. We used 254 samples for gene expression and pathway analysis, 250 samples for microRNA expression analysis, and the 247 samples common to the microRNA and gene expression datasets for miRNA target analysis.
The top ten GO terms most enriched in the differentially expressed gene list
transmission of nerve impulse
establishment of localization
ion transmembrane transporter activity
DAVID can also cluster functionally similar GO terms together. The two most enriched GO term groups in the differentially expressed gene list both consisted of terms relevant to the brain and neurons. They were:
1) GOTERM Cellular Component, including five terms: neuron projection, cell projection, dendrite, cell soma, and axon. 53 genes belong to this cluster, including CDK5, SNCG, UCHL1 and FREQ. According to NCBI Entrez Gene annotation, deregulation of CDK5 causes neuronal death and neurodegenerative diseases. SNCG encodes a member of the synuclein family of proteins, which are believed to be involved in the pathogenesis of neurodegenerative diseases; mutations in this gene have also been associated with breast tumor development. UCHL1 is specifically expressed in neurons and in cells of the diffuse neuroendocrine system; mutations in this gene may be associated with Parkinson disease. FREQ encodes a calcium-binding protein expressed predominantly in neurons; the encoded protein is associated with secretory granules and modulates synaptic transmission and synaptic plasticity.
2) GOTERM Biological Process, including twenty-one terms: synaptic transmission; transmission of nerve impulse; neurotransmitter secretion; regulated secretory pathway; generation of a signal involved in cell-cell signaling; regulation of neurotransmitter levels; neurological system process; cell-cell signaling; exocytosis; SNARE binding; secretory pathway and so on. A total of 336 genes belong to this cluster.
Detailed information for these two GO term groups is given in Additional file 1.
22 MicroRNAs related to glioblastoma/GBM/Neuroblastoma
To further examine the function of these significant miRNAs, we needed to find the target genes of the miRNAs associated with glioblastoma. We therefore carried out a regression analysis of miRNA and gene expression.
miRNAs are thought to promote degradation of target mRNAs or to suppress translation of the corresponding proteins by base-pairing with the mRNA in the 3’-UTR region [20–23]. miRNAs perform various biological functions through this regulation of gene expression. To reveal how miRNAs regulate gene expression in GBM, we identified target genes of miRNAs and constructed miRNA target networks. Since a miRNA represses the expression of its target genes, the first step was to test for an inverse relationship between the expression profile of a miRNA and those of its potential targets. To do this, we regressed the expression of each target mRNA on the expression of each miRNA and selected mRNAs with significantly negative regression coefficients as candidate miRNA targets. The P-value threshold for declaring significant evidence of a miRNA target was 1.00 × 10⁻⁴. The second step was sequence analysis, which uses the sequence complementarity between a miRNA and its target site to predict potential miRNA target genes. For this step, we used experimentally verified and predicted miRNA target data from three miRNA databases: miR2Disease, TarBase and miRBase. miR2Disease (updated on Dec. 19, 2008) and TarBase (updated in June 2008) provide experimentally verified microRNA target genes. miRBase provides target genes predicted by the miRanda algorithm, and the predicted miRNA-target pairs can be downloaded directly (updated on Oct. 31, 2007).
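The two-step selection described above can be sketched as a simple filter. This is only an illustration under stated assumptions, not the authors' Matlab code: the `PairFit` record and the `select_targets` function are hypothetical names, the regression results are assumed to have been computed beforehand, and the third pair in the example is invented. The first two pairs use the β and P values reported in this article.

```python
from typing import NamedTuple


class PairFit(NamedTuple):
    """Regression result for one miRNA-gene pair (hypothetical record type)."""
    mirna: str
    gene: str
    beta: float     # slope of gene expression regressed on miRNA expression
    p_value: float  # p-value of that slope


P_THRESHOLD = 1.00e-4  # significance cutoff stated in the text


def select_targets(fits, known_pairs, p_threshold=P_THRESHOLD):
    """Keep pairs that pass both steps.

    Step 1: significantly negative regression coefficient.
    Step 2: the pair is listed as a validated or sequence-predicted target.
    """
    selected = []
    for fit in fits:
        inverse = fit.beta < 0 and fit.p_value < p_threshold
        if inverse and (fit.mirna, fit.gene) in known_pairs:
            selected.append(fit)
    return selected


fits = [
    PairFit("hsa-mir-29c", "COL4A1", -389.02, 1.35e-8),
    PairFit("hsa-miR-155", "LDOC1", -196.77, 4.00e-15),
    # Invented pair: significant but positive slope, so it is rejected.
    PairFit("hsa-miR-X", "GENE_Y", 12.5, 3.0e-6),
]
known_pairs = {
    ("hsa-mir-29c", "COL4A1"),
    ("hsa-miR-155", "LDOC1"),
    ("hsa-miR-X", "GENE_Y"),
}

targets = select_targets(fits, known_pairs)
print([(t.mirna, t.gene) for t in targets])
```

Requiring both conditions means a pair is reported only when the expression data and the database evidence agree.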
We combined the data for the 1,236 differentially expressed mRNAs and the 97 differentially expressed miRNAs in 237 tumor tissue samples and 10 normal tissue samples. We recovered two experimentally confirmed targets. It has been reported that, in nasopharyngeal carcinomas, underexpressed hsa-mir-29c (expression fold change (tumor/normal) = 0.20) targets the overexpressed gene COL4A1 (expression fold change (tumor/normal) = 5.24). In our results, down-regulated hsa-mir-29c (differential expression P-value < 5.11 × 10⁻¹²) targets the over-expressed gene COL4A1 (differential expression P-value < 3.58 × 10⁻⁶) with regression β = −389.02 and P = 1.35 × 10⁻⁸. We conclude that hsa-mir-29c is also an important miRNA in glioblastoma. The other experimentally validated target gene was LDOC1, targeted by hsa-miR-155. The known oncogenic miRNA hsa-miR-155 can regulate a set of target genes including LDOC1, a regulator of apoptosis. Our results showed that hsa-miR-155 was over-expressed (differential expression P-value < 1.40 × 10⁻¹⁰) and targets the under-expressed gene LDOC1 (differential expression P-value < 1.085 × 10⁻³¹) with regression β = −196.77 and P = 4.00 × 10⁻¹⁵. We inferred that hsa-miR-155 could induce cancer through regulation of the apoptosis gene LDOC1 in glioblastoma.
The 661 target genes were a subset of the 1,236 significantly differentially expressed genes. We examined which pathways these genes were enriched in and compared them with the previous results. Eleven pathways were significant by Fisher's exact test in DAVID, 8 of which were the same as pathways identified in the previous sections: Epithelial cell signaling in Helicobacter pylori infection, Cholera - Infection, Long-term potentiation, Calcium signaling pathway, Neurodegenerative Diseases, Long-term depression, Gap junction, and Neuroactive ligand-receptor interaction. The three newly enriched pathways were Amyotrophic lateral sclerosis (ALS), Alzheimer’s disease, and Wnt signaling pathway. These target genes were also mostly involved in signaling and neuroscience pathways.
To investigate the function of the 661 target genes, we searched TSGDB (a tumor suppressor gene database) and the DNA-Tumor Suppressor and Oncogene Database. We found eight tumor suppressor genes (APC, TP53, BIN1, BTG1, CDK2AP1, LDOC1, RASSF1, WFDC1) and three oncogenes (MCF2, MPL, THRA).
According to NCBI Entrez Gene annotation, APC encodes a tumor suppressor protein that acts as an antagonist of the Wnt signaling pathway. It is also involved in other processes including cell migration and adhesion, transcriptional activation, and apoptosis. TP53 encodes tumor protein p53, which responds to diverse cellular stresses to regulate target genes that induce cell cycle arrest, apoptosis, senescence, DNA repair, or changes in metabolism. BIN1 encodes several isoforms of a nucleocytoplasmic adaptor protein, one of which was initially identified as a MYC-interacting protein with features of a tumor suppressor. Isoforms that are expressed in the central nervous system may be involved in synaptic vesicle endocytosis and may interact with dynamin, synaptojanin, endophilin, and clathrin. LDOC1 is thought to regulate the transcriptional response mediated by nuclear factor kappa B (NF-kappaB). The gene has been proposed as a tumor suppressor gene whose protein product may have an important role in the development and/or progression of some cancers. The protein encoded by RASSF1 was found to interact with the DNA repair protein XPA. It was also shown to inhibit the accumulation of cyclin D1 and thus induce cell cycle arrest. WFDC1 maps to chromosome 16q24, an area of frequent loss of heterozygosity in many cancers. Owing to its location and a possible growth inhibitory property of its gene product, this gene is suggested to be a tumor suppressor gene. MCF2 is a member of a large family of GDP-GTP exchange factors that modulate the activity of small GTPases of the Rho family. Five-prime recombinations result in the loss of N-terminal codons, producing MCF2 variants with oncogenic potential.
In this paper, we performed a detailed analysis of the differential expression of genes and miRNAs between tumor tissues and normal brain tissues in glioblastoma. We also performed gene set enrichment analysis to find the enriched GO terms and pathways. Most of the genes were enriched in nervous-system-associated GO terms and in cell-signaling- and neuroscience-associated pathways. 22 differentially expressed miRNAs were related to glioblastoma multiforme or neuroblastoma. To study the regulation of gene expression by miRNAs, we combined the sequence-predicted miRNA targets in the miRBase database and the experimentally validated miRNA targets in the TarBase and miR2Disease databases with our predictions from the gene and miRNA expression profiles, and found 2 experimentally validated targets and 1,094 predicted targets. Further functional analysis of the target genes suggests that miRNAs most frequently target genes in cell signalling and the nervous system. However, the number of normal tissues in this study is small, and more samples are needed for further investigation.
The Database for Annotation, Visualization and Integrated Discovery (DAVID) provides a comprehensive set of functional annotation tools that help investigators understand the biological meaning behind large lists of genes [8, 9] (http://david.abcc.ncifcrf.gov/). Given a large gene list, it automatically calculates and identifies enriched biological themes, particularly GO terms and pathways; discovers enriched groups of functionally related genes; and clusters redundant annotation terms. For each GO term, a right-tailed modified Fisher's exact test was used to determine whether the number of genes with this GO term in the differentially expressed gene list is enriched compared to the number of genes with this GO term among all 19,439 genes on the HG-U133A array (the background). For each pathway, a right-tailed modified Fisher's exact test was used to determine whether the number of genes within this pathway in the differentially expressed gene list is enriched compared to the number of genes within all KEGG or Biocarta pathways. The smaller the p-value, the more strongly the GO term or pathway is enriched beyond what would be expected by chance.
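As a rough illustration of the enrichment test, the sketch below computes a plain right-tailed Fisher's exact test as a hypergeometric tail probability. It is an assumption-laden simplification: DAVID applies a modified, more conservative version of the test that is not reproduced here, and the function name `right_tail_fisher` and the toy counts are chosen only for this example.

```python
from math import comb


def right_tail_fisher(k, n, K, N):
    """Return P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the background
    K: background genes carrying the annotation (GO term or pathway)
    n: genes in the differentially expressed list
    k: list genes carrying the annotation
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("counts are inconsistent")
    upper = min(n, K)
    # Number of ways to draw n genes with exactly i annotated ones,
    # summed over all i at least as large as the observed k.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Toy example: 10 background genes, 4 annotated, a list of 3 genes, 2 annotated.
p = right_tail_fisher(k=2, n=3, K=4, N=10)
print(round(p, 4))  # 0.3333
```

Exact integer arithmetic with `math.comb` keeps the computation stable even for a background of 19,439 genes, at the cost of working with very large integers.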
We used the algorithm proposed in TAPPA (Topological Analysis of Pathway Phenotype Association) for pathway analysis. It calculates a Pathway Connectivity Index (PCI) for each pathway and then evaluates its correlation with the phenotype variation. Gene connections for 162 KEGG pathways with more than 8 genes were collected in that paper and used for PCI calculation. For pathways with no edge connections collected, the PCI degenerates into the average of the gene expression values. In total, 501 pathways from KEGG and Biocarta were assembled in our analysis. The p-value threshold for declaring significance after Bonferroni correction for multiple tests was 1 × 10⁻⁴.
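The Bonferroni thresholds can be checked with a one-line calculation. The sketch below assumes a family-wise error rate of 0.05, which the text does not state explicitly; under that assumption the 501 pathways and the 534 microRNAs on the array give thresholds that match the reported values after rounding.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold; alpha = 0.05 is an assumed family-wise rate."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


print(f"{bonferroni_threshold(501):.2e}")  # 501 pathways -> 9.98e-05
print(f"{bonferroni_threshold(534):.2e}")  # 534 microRNAs -> 9.36e-05
```

Under the same assumption, the gene-level threshold of 4.15 × 10⁻⁶ would correspond to roughly 12,000 tests.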
Differential expression of genes and microRNAs was tested with the t-test and the Mann-Whitney test. The thresholds for declaring significance after Bonferroni correction for multiple tests were 4.15 × 10⁻⁶ and 9.36 × 10⁻⁵ for genes and miRNAs, respectively. Linear regression was used to investigate the relationships between miRNA and gene expression. The linear model took its common form:
y = Xβ + ε
where y is an n-by-1 vector of observations, such as gene expression; X is an n-by-p matrix of regressors, such as miRNA expression; β is a p-by-1 vector of parameters, known as regression coefficients; and ε is an n-by-1 vector of random disturbances. A right-tailed Fisher's exact test was used to test for enriched Gene Ontology terms and pathways in the datasets. Matlab code for the t-test, Mann-Whitney test and linear regression is provided in Additional file 6.
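For a single miRNA regressor plus an intercept, the least-squares fit has a closed form. The sketch below is a minimal pure-Python version written for illustration; it is not the Matlab code from Additional file 6, and the function name `simple_ols` and the expression values are made up. It returns the slope and its t statistic; turning the t statistic into a p-value requires a Student t distribution with n − 2 degrees of freedom, which is left out here.

```python
from math import copysign, inf, sqrt


def simple_ols(x, y):
    """Fit y = b0 + b1 * x by least squares.

    Returns (b0, b1, t_stat), where t_stat is the slope divided by its
    standard error (n - 2 degrees of freedom).
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length samples with at least 3 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    se_b1 = sqrt(rss / (n - 2) / sxx)
    # A perfect fit has zero standard error; report an infinite t statistic.
    t_stat = b1 / se_b1 if se_b1 > 0 else copysign(inf, b1)
    return b0, b1, t_stat


# Made-up values: gene expression falls as miRNA expression rises.
mirna_expr = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_expr = [10.1, 7.9, 6.2, 3.8, 2.0]
intercept, slope, t_stat = simple_ols(mirna_expr, gene_expr)
print(slope < 0, t_stat < 0)
```

A negative slope with a large negative t statistic is the pattern expected for a miRNA that represses its target.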
We thank the Cancer Genome Atlas Research Network for providing data, and the members of TCGA’s External Scientific Committee and the Glioblastoma Disease Working Group (http://cancergenome.nih.gov/components/). Li Jin, Hua Dong and Hoicheong Siu are supported by grants from the National Outstanding Youth Science Foundation of China (30625016), the National Science Foundation of China (30890034), the Shanghai Commission of Science and Technology (04dz14003) and the 863 Program (2007AA02Z312). Momiao Xiong and Li Luo are supported by grants from the National Institutes of Health (NIAMS P01 AR052915-01A1, NIAMS P50 AR054144-01 CORT, HL74735, ES09912) and by a grant from the Hi-Tech Research and Development Program of China (863) (2007AA02Z300). Publication of this supplement was made possible with support from the International Society of Intelligent Biological Medicine (ISIBM).
This article has been published as part of BMC Genomics Volume 11 Supplement 3, 2010: The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2164/11?issue=S3.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.