Investigation of gene and microRNA expression in glioblastoma

Background: Glioblastoma is the most common primary brain tumor in adults. Although much research has focused on this disease, the causes and pathogenesis of glioblastoma have not been clearly identified.
Results: Comparing 243 tumor and 11 normal samples, we identified 1,236 significantly differentially expressed genes and 30 pathways enriched in this gene set. We also identified 97 differentially expressed microRNAs among 240 tumor and 10 normal samples; 22 of these have been reported to affect glioblastoma and 50 have been implicated in other cancers and brain diseases. We comprehensively regressed gene expression on microRNA expression in 237 tumor tissues and 10 normal tissues. We found two experimentally validated microRNA targets and 1,094 miRNA-target gene pairs in our datasets that were also predicted by the miRanda algorithm; 8 of the target genes were tumor suppressor genes and 3 were oncogenes. Further functional analysis of the target genes suggested that microRNAs most frequently targeted genes associated with Cell Signalling and Nervous System.
Conclusion: We investigated gene and microRNA expression in glioblastoma and performed a comprehensive functional study of differentially expressed genes and microRNAs in glioblastoma patients. These findings provide important clues for studying the carcinogenic process in glioblastoma.


Background
Glioblastoma Multiforme (GBM) is the most common and most aggressive type of primary brain tumor, accounting for 52% of all primary brain tumor cases and 20% of all intracranial tumors [1]. Primary GBMs arise de novo, without any history of a pre-existing lower-grade tumor, whereas secondary GBMs show clinical, radiologic, or histopathologic evidence of malignant progression from a pre-existing lower-grade tumor [2]. In the past two decades, the molecular mechanisms, genetics and treatment approaches of glioblastoma have been studied extensively [3]. However, the causes and pathogenesis of glioblastoma have not been clearly identified. With the continuing improvement of high-throughput genomic technologies, it is now feasible to survey human cancer genomes comprehensively. The Cancer Genome Atlas (TCGA) aims to catalogue and discover major cancer-causing genome alterations in large cohorts of human tumors through integrated multi-dimensional analyses [4]. Glioblastoma was the first cancer studied by TCGA. To identify the genetic alterations in glioblastoma, we investigated gene and microRNA expression profiles.
MicroRNAs (miRNAs) are single-stranded, non-coding RNA molecules about 22 nucleotides in length that usually repress gene expression by binding to the 3' untranslated region (3'UTR) of target mRNAs [5]. MicroRNA expression differs markedly during organ development and tissue differentiation [6]. Moreover, many microRNAs have been found to be associated with apoptosis and cancer, suggesting that they can function as oncogenes or tumor suppressors [7]. In our study, we examined the expression levels of 470 human miRNAs in glioblastoma and identified a group of microRNAs whose expression is significantly altered in this tumor. We also identified significantly altered gene expression and pathways related to glioblastoma.

Results
All data were acquired from the TCGA project [4] (http://cancergenome.nih.gov/dataportal/data/about/). Gene expression profiling was performed on the Affymetrix HT Human Genome U133 Array Plate Set at the Massachusetts Institute of Technology (MIT). Level 3 data provide gene-level expression calls for each sample after probeset-level and gene-level Robust Multi-array Average (RMA) processing (background correction and quantile normalization); we used the data as of the most recent update on Sep. 05, 2008. After averaging the expression values of duplicated samples, 243 tumor tissue samples, 10 normal tissue samples and 1 cell line sample from glioblastoma patients were used for differential expression analysis. MicroRNA expression profiling was performed on the Agilent 8 x 15K Human microRNA-specific microarray at the University of North Carolina (UNC). The level 3 data (quantile-normalized and batch-adjusted; most recent update on Nov. 10, 2008) cover 534 microRNAs (470 of them human) in 240 tumor tissue samples and 10 normal tissue samples. Because brain tissue from healthy individuals is very difficult to obtain, all control samples are adjacent normal tissues from glioblastoma patients. Our comparisons therefore focus on somatic differences associated with the disease, a common approach in many other cancer studies. We used 254 samples for gene expression and pathway analysis, 250 samples for microRNA expression analysis, and the 247 samples common to the microRNA and gene expression datasets for miRNA target analysis.
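The sample bookkeeping described above (averaging duplicated arrays of the same sample, then restricting the target analysis to samples present on both platforms) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the TCGA pipeline: the sample IDs are hypothetical, and each array is assumed to be given as a list of values in a gene (or miRNA) order shared by all arrays.

```python
from collections import defaultdict
from statistics import mean


def average_duplicates(arrays):
    """Average replicate arrays that belong to the same sample.

    arrays: list of (sample_id, values) pairs, where values is a list of
    floats in a fixed gene (or miRNA) order shared by all arrays.
    Returns a dict mapping each sample_id to its element-wise mean profile.
    """
    grouped = defaultdict(list)
    for sample_id, values in arrays:
        grouped[sample_id].append(values)
    return {
        sample_id: [mean(column) for column in zip(*replicates)]
        for sample_id, replicates in grouped.items()
    }


def common_samples(gene_samples, mirna_samples):
    """Sample IDs measured on both platforms, in sorted order so that the
    gene and miRNA matrices can be aligned column by column."""
    return sorted(set(gene_samples) & set(mirna_samples))


# Hypothetical sample IDs, for illustration only.
gene_profiles = average_duplicates([
    ("SAMPLE-A", [1.0, 2.0]),
    ("SAMPLE-A", [3.0, 4.0]),  # duplicate array of SAMPLE-A
    ("SAMPLE-B", [5.0, 6.0]),
])
print(gene_profiles["SAMPLE-A"])                                # [2.0, 3.0]
print(common_samples(gene_profiles, ["SAMPLE-B", "SAMPLE-C"]))  # ['SAMPLE-B']
```

Sorting the shared IDs gives a deterministic column order, which matters later because each miRNA profile must be paired sample-by-sample with each gene profile.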

Gene expression analysis
A total of 1,236 genes were identified as significantly differentially expressed between tumor and normal tissues (Additional file 1). To further investigate the function of these genes, we used the DAVID bioinformatics resources [8,9] and pathway analysis [10] for systematic and integrative analysis of large gene lists. Of the 1,236 differentially expressed genes, 1,221 had annotations in the DAVID Functional Annotation Tools. We carried out gene set enrichment analysis to identify the most enriched gene function annotation terms (GO terms) [11] in the list of 1,221 annotated differentially expressed genes (see Methods for details). The top ten enriched GO terms, shown in Table 1, suggest that these genes are brain-related and mainly associated with nervous system development and function. Detailed information, such as the genes sharing each GO term, is given in Additional file 1.
DAVID can also cluster functionally similar GO terms. The two most enriched GO term clusters in the differentially expressed gene list both consisted of terms related to the brain and neurons:
1) GOTERM Cellular Component, comprising five terms: neuron projection, cell projection, dendrite, cell soma and axon. Fifty-three genes belong to this cluster, including CDK5, SNCG, UCHL1 and FREQ. According to NCBI Entrez Gene annotation [12], deregulation of CDK5 causes neuronal death and neurodegenerative diseases. SNCG encodes a member of the synuclein family of proteins, which are believed to be involved in the pathogenesis of neurodegenerative diseases; mutations in this gene have also been associated with breast tumor development. UCHL1 is specifically expressed in neurons and in cells of the diffuse neuroendocrine system, and mutations in this gene may be associated with Parkinson disease. FREQ encodes a calcium-binding protein expressed predominantly in neurons; the protein is associated with secretory granules and modulates synaptic transmission and synaptic plasticity.
2) GOTERM Biological Process, comprising twenty-one terms, including synaptic transmission; transmission of nerve impulse; neurotransmitter secretion; regulated secretory pathway; generation of a signal involved in cell-cell signaling; regulation of neurotransmitter levels; neurological system process; cell-cell signaling; exocytosis; SNARE binding; and secretory pathway. A total of 336 genes belong to this cluster.
Detailed information on these two GO term clusters is given in Additional file 1.

Pathway analysis
We first used the algorithm proposed in TAPPA (Topological Analysis of Pathway Phenotype Association) [10] for pathway analysis. The results revealed 131 pathways significantly associated with glioblastoma (Additional file 2). These 131 pathways belonged to 33 functional groups, of which Cell Signaling, Neuroscience, Immunology and Expression were the most enriched. The glioma pathway was the only significant pathway in the cancer functional group (P = 5.75 × 10^-7). As in the GO term enrichment analysis, we also used the DAVID Functional Annotation Tools to identify the pathways most enriched in the list of differentially expressed genes; the 40 significant pathways are also given in Additional file 2. Here, Cell Signaling, Signal Transduction, Apoptosis and Neuroscience were the most enriched pathway groups. A total of 30 pathways were significant by both methods (Additional file 2), which also provides detailed information on the genes involved in these over-represented pathways. Long-term potentiation (a Nervous System pathway) and the calcium signaling pathway (a Signal Transduction pathway) were the most significantly enriched pathways, with P-values of 2.62 × 10^-8 and 3.26 × 10^-8, respectively. There were 11 significant Cell Signaling pathways, 4 significant Apoptosis pathways, 4 significant Signal Transduction pathways, 3 significant Immunology pathways, 3 significant Neuroscience pathways and 2 significant Nervous System pathways (some pathways belong to more than one functional group).
These results suggest that the differentially expressed genes are mostly involved in signaling, apoptosis and neuroscience pathways. Take the long-term potentiation pathway as an example; Figure 1 shows all genes in this pathway. Hippocampal long-term potentiation (LTP), a long-lasting increase in synaptic efficacy, is considered a molecular basis for learning and memory. Of the 71 genes in this pathway, 3 were significantly over-expressed (highlighted in blue) and 21 were significantly under-expressed (highlighted in red). (One box in the figure may denote several genes.)

Analysis of differential expression of microRNA
A total of 97 microRNAs were significantly differentially expressed between tumor and normal tissues (Additional file 3). To examine whether these miRNAs were associated with glioblastoma, we used miR2Disease [13] (updated Dec. 19, 2008) to validate our results.
MiR2Disease provides a comprehensive, literature-based resource on miRNA deregulation in various human diseases. According to miR2Disease, 81 of the 97 significant miRNAs have been reported to be associated with 84 diseases; among them, 72 miRNAs are associated with 59 cancers and brain diseases. Of these, 22 miRNAs have been reported to be involved in glioblastoma/glioblastoma multiforme (GBM)/neuroblastoma (NB), and the direction of deregulation (up- or down-regulation) reported in the published literature is exactly the same as in our data. Table 2 gives the P-value, expression pattern, disease and references for these 22 miRNAs. We speculate that the other 50 miRNAs, which are related to other cancers and brain diseases, may also be important for carcinogenesis in the brain; however, experimental validation is required to confirm this. Among the 97 significant miRNAs, 30 were up-regulated and 67 were down-regulated.
To further examine the function of these significant miRNAs, we needed to identify their target genes in glioblastoma. We therefore carried out a regression analysis of miRNA and gene expression.
miRNAs are thought to regulate gene expression by promoting degradation of target mRNAs or suppressing their translation, through pairing with sites in the 3'-UTR [20][21][22][23]. miRNAs carry out many of their biological functions through this regulation of gene expression. To reveal how miRNAs regulate gene expression in GBM, we identified target genes of miRNAs and constructed miRNA target networks. Since miRNAs repress the expression of their target genes, the first step was to test for an inverse relationship between the expression profile of each miRNA and those of its potential targets. To do so, we regressed the expression of each target mRNA on the expression of each miRNA and selected mRNAs with significant negative regression coefficients as candidate miRNA targets; the P-value threshold for declaring significant evidence of a miRNA target was 1.00 × 10^-4. The second step was a sequence analysis that used the complementarity between each miRNA and its target site to predict potential miRNA target genes.
Table 2 (excerpt):
miRNA | P-value | AvgTumor | AvgNorm | Exp | Disease | Reference
hsa-mir-133a | 9.15E-08 | 0.0038872 | 0.398482 | down | GBM | [16]
Exp indicates whether the expression of the microRNA is up-regulated or down-regulated, comparing the average expression value in tumor tissues (AvgTumor) with the average expression value in normal tissues (AvgNorm). GBM: glioblastoma multiforme; NB: neuroblastoma.
Among the targets predicted in miRBase, we found 1,094 matched miRNA-gene pairs involving 70 miRNAs and 661 genes (Additional file 4). Forty-four down-regulated miRNAs target 202 over-expressed genes, while 26 up-regulated miRNAs target 459 under-expressed genes. The up- and down-regulated miRNA-gene pairs are shown in Figure 2 and Figure 3.
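The two-step filter described above (a significant negative regression slope, combined with agreement with sequence-based predictions) can be sketched as follows. This is a hypothetical illustration, not the code used in this study: the function names and the synthetic data are ours, each model is assumed to have one miRNA regressor plus an intercept, sequence-based predictions are assumed to arrive as (miRNA, gene) pairs (for example exported from miRBase), and the slope P-value uses a normal approximation to the t distribution with n - 2 degrees of freedom, which should be close for roughly 247 samples but is not exact.

```python
import math
from statistics import NormalDist, fmean


def regress_gene_on_mirna(mirna, gene):
    """Least-squares fit gene = a + b * mirna over matched samples.

    Returns (slope b, two-sided P-value for b != 0). The P-value uses a
    normal approximation to the t distribution with n - 2 degrees of
    freedom (an assumption that is reasonable only for large n).
    """
    n = len(mirna)
    mx, my = fmean(mirna), fmean(gene)
    sxx = sum((x - mx) ** 2 for x in mirna)
    sxy = sum((x - mx) * (y - my) for x, y in zip(mirna, gene))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(mirna, gene))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0.0:  # perfect fit: significant unless the slope is exactly zero
        return slope, (0.0 if slope != 0.0 else 1.0)
    t = slope / se
    return slope, 2.0 * NormalDist().cdf(-abs(t))


def screen_targets(mirna_expr, gene_expr, predicted_pairs, p_cut=1e-4):
    """Keep sequence-predicted (miRNA, gene) pairs whose expression shows a
    significant negative regression slope.

    mirna_expr, gene_expr: dicts of name -> values over the SAME ordered samples.
    predicted_pairs: iterable of (miRNA, gene) from a sequence-based predictor.
    """
    hits = []
    for mir, gene in predicted_pairs:
        if mir not in mirna_expr or gene not in gene_expr:
            continue  # pair not measured in our data
        slope, p = regress_gene_on_mirna(mirna_expr[mir], gene_expr[gene])
        if slope < 0 and p < p_cut:
            hits.append((mir, gene, slope, p))
    return hits


# Synthetic data over 30 hypothetical samples.
mir = [float(i) for i in range(30)]
repressed = [10.0 - 2.0 * x + (0.1 if i % 2 else -0.1) for i, x in enumerate(mir)]
unrelated = [1.0 if i % 2 else -1.0 for i in range(30)]
pairs = [("miR-x", "GENE1"), ("miR-x", "GENE2"), ("miR-x", "GENE3")]
hits = screen_targets({"miR-x": mir}, {"GENE1": repressed, "GENE2": unrelated}, pairs)
print([gene for _, gene, _, _ in hits])  # ['GENE1']
```

Splitting the retained pairs by whether the miRNA is down- or up-regulated in tumors would then give the two groups of pairs summarized above.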
The 661 target genes are a subset of the 1,236 significantly differentially expressed genes. We examined the pathways in which these genes were enriched and compared them with the previous results. Eleven pathways were significant by the Fisher exact test in DAVID, 8 of which were also identified in the previous sections: Epithelial cell signaling in Helicobacter pylori infection, Cholera - Infection, Long-term potentiation, Calcium signaling pathway, Neurodegenerative Diseases, Long-term depression, Gap junction, and Neuroactive ligand-receptor interaction. The three newly enriched pathways were Amyotrophic lateral sclerosis (ALS), Alzheimer's disease and the Wnt signaling pathway. These target genes were also mostly involved in signaling and neuroscience pathways.
Several of the target genes are known tumor suppressor genes or oncogenes. According to NCBI Entrez Gene annotation [12], APC encodes a tumor suppressor protein that acts as an antagonist of the Wnt signaling pathway; it is also involved in other processes, including cell migration and adhesion, transcriptional activation, and apoptosis. TP53 encodes tumor protein p53, which responds to diverse cellular stresses by regulating target genes that induce cell cycle arrest, apoptosis, senescence, DNA repair, or changes in metabolism. BIN1 encodes several isoforms of a nucleocytoplasmic adaptor protein, one of which was initially identified as a MYC-interacting protein with features of a tumor suppressor; isoforms expressed in the central nervous system may be involved in synaptic vesicle endocytosis and may interact with dynamin, synaptojanin, endophilin, and clathrin. LDOC1 is thought to regulate the transcriptional response mediated by nuclear factor kappa B (NF-kappaB); it has been proposed as a tumor suppressor gene whose protein product may have an important role in the development and/or progression of some cancers. The protein encoded by RASSF1 was found to interact with the DNA repair protein XPA, and was also shown to inhibit the accumulation of cyclin D1 and thus induce cell cycle arrest. WFDC1 maps to chromosome 16q24, a region of frequent loss of heterozygosity in many cancers; owing to this location and a possible growth-inhibitory property of its gene product, it has been suggested to be a tumor suppressor gene. MCF2 is a member of a large family of GDP-GTP exchange factors that modulate the activity of small GTPases of the Rho family; five-prime recombinations result in the loss of N-terminal codons, producing MCF2 variants with oncogenic potential.
To further investigate the function of the target genes, we identified miRNA-targeted pathways with a right-tailed Fisher exact test, which tests for enrichment of each pathway in a miRNA's target gene set. A total of 83 pathways targeted by 94 miRNAs remained significant after Bonferroni correction for multiple tests (P < 1.00 × 10^-4) and are listed in Additional file 5. Many pathways were targeted by more than one miRNA; Figure 4 shows the 29 pathways targeted by more than 10 miRNAs. Long-term potentiation (a Nervous System pathway) was targeted by the most miRNAs (79), and Nitric Oxide Signalling (a Signalling pathway) by the second most (74). Thus, the differentially expressed miRNAs most frequently targeted genes in Cell Signalling and Nervous System pathways. The DNA replication pathway and the cell cycle pathway had the smallest and second-smallest average P-values (7.70 × 10^-9 and 1.17 × 10^-6, respectively); the P-values of the Long-term potentiation pathway (7.29 × 10^-6) and the Nitric Oxide Signalling pathway (9.99 × 10^-6) were also small.
Figure 4: Down-regulated miRNA and their targets. A total of 29 pathways targeted by more than 10 miRNAs are shown. The blue bars give the number of miRNAs targeting each pathway; the red bars give the negative base-10 logarithm of the average P-value, indicating the significance of enrichment of the pathway in miRNA targets.

Conclusions
In this paper, we performed a detailed analysis of differential gene and miRNA expression between tumor tissues and normal brain tissues in glioblastoma. We also performed gene set enrichment analysis to find enriched GO terms and pathways. The differentially expressed genes were mostly enriched in nervous system-associated GO terms and in Cell Signaling and Neuroscience pathways. Twenty-two differentially expressed miRNAs have been reported to be related to glioblastoma multiforme or neuroblastoma. To study the regulation of gene expression by miRNAs, we combined sequence-predicted miRNA targets from the miRBase database and experimentally validated miRNA targets from the TarBase and miR2Disease databases with our predictions from the gene and miRNA expression profiles, and found 2 experimentally validated targets and 1,094 predicted targets. Further functional analysis of the target genes suggests that miRNAs most frequently target genes in Cell Signalling and Nervous System pathways. However, the number of normal tissues in this study is small, and more samples are needed for further investigation.

DAVID bioinformatics resources
The Database for Annotation, Visualization and Integrated Discovery (DAVID) provides a comprehensive set of functional annotation tools that help investigators understand the biological meaning behind large lists of genes [8,9] (http://david.abcc.ncifcrf.gov/). Given a large gene list, it automatically identifies enriched biological themes, particularly GO terms and pathways; discovers enriched functionally related gene groups; and clusters redundant annotation terms. For each GO term, a right-tailed modified Fisher exact test was used to determine whether genes with this term are enriched in the differentially expressed gene list, compared with the number of genes with this term among all 19,439 genes on the HG-U133A array (the background). For each pathway, a right-tailed modified Fisher exact test was used in the same way, with the genes in all KEGG or Biocarta pathways as the background. The smaller the P-value, the less likely it is that the enrichment of the GO term or pathway arose by random chance.
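A right-tailed Fisher exact test of this kind is a hypergeometric tail probability: if the background contains N genes, K of which carry a given annotation, and the submitted list contains n genes, k of which carry it, the P-value is P(X >= k). The sketch below computes this unmodified test with Python's standard library. DAVID's "modified" version (the EASE score) is more conservative, removing one gene from the count of list genes in the category; it is not reproduced here, and the counts in the second example are hypothetical.

```python
from math import comb


def fisher_right_tail(k, n_list, K, N):
    """Right-tailed Fisher exact (hypergeometric) P-value, P(X >= k).

    N: genes in the background (e.g. all genes on the array)
    K: background genes carrying the annotation (GO term or pathway)
    n_list: genes in the submitted list (e.g. differentially expressed genes)
    k: list genes carrying the annotation
    """
    upper = min(n_list, K)  # X cannot exceed the list size or the category size
    tail = sum(comb(K, i) * comb(N - K, n_list - i) for i in range(k, upper + 1))
    return tail / comb(N, n_list)


# Toy check: all 3 list genes fall in a 3-gene category out of 6 background genes.
print(fisher_right_tail(k=3, n_list=3, K=3, N=6))  # 0.05

# Hypothetical counts on the scale of this study's lists (K and k are invented).
print(fisher_right_tail(k=20, n_list=1221, K=100, N=19439))
```

Exact integer arithmetic via `math.comb` avoids overflow for gene lists of this size; for many terms at once, a library hypergeometric survival function would be faster.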

Pathway-based differential expression analysis
We used the algorithm proposed in TAPPA (Topological Analysis of Pathway Phenotype Association) [10] for pathway analysis. It calculates a Pathway Connectivity Index (PCI) for each pathway and then evaluates its correlation with phenotype variation. Gene connections for 162 KEGG pathways containing more than 8 genes were collected in that paper and used for PCI calculation. For pathways with no collected edge connections, the PCI degenerates into the average of the gene expression values. In total, 501 pathways from KEGG [32] and Biocarta [33] were assembled for our analysis. The P-value threshold for declaring significance after Bonferroni correction for multiple tests was 1 × 10^-4.
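As we read the TAPPA paper [10], the PCI of a pathway in one sample is the double sum over genes i and j of sign(x_i + x_j) |x_i|^0.5 a_ij |x_j|^0.5, where x_i is the normalized expression of gene i and a_ij = 1 if genes i and j are connected (with a_ii = 1), else 0. The Python sketch below implements that formula under these assumptions; normalizing x is left to the caller and is not the exact TAPPA preprocessing, and the gene names are hypothetical. With no edges only the diagonal terms remain, so the PCI reduces to the sum of the normalized values, i.e. the average up to a constant factor, consistent with the degenerate case described above.

```python
import math


def pathway_connectivity_index(x, edges):
    """PCI of one pathway in one sample (sketch of the TAPPA formula as we read it).

    x: dict gene -> normalized expression in this sample (normalization is
       assumed to have been done already).
    edges: iterable of (gene_i, gene_j) connections in the pathway graph.
    Computes sum over i, j of sign(x_i + x_j) * |x_i|**0.5 * a_ij * |x_j|**0.5,
    with a symmetric adjacency a_ij and a_ii = 1.
    """
    adjacency = {(g, g) for g in x}  # self-connections
    for gi, gj in edges:
        if gi in x and gj in x:  # ignore genes not measured on the array
            adjacency.add((gi, gj))
            adjacency.add((gj, gi))
    pci = 0.0
    for gi, gj in adjacency:
        s = x[gi] + x[gj]
        sign = (s > 0) - (s < 0)
        # sqrt(|x_i * x_j|) equals |x_i|**0.5 * |x_j|**0.5
        pci += sign * math.sqrt(abs(x[gi] * x[gj]))
    return pci


# No edges: PCI reduces to the sum of the normalized values.
print(pathway_connectivity_index({"G1": 1.0, "G2": -2.0}, []))             # -1.0
# One edge G1-G2 adds 2 * sign(1 + 4) * sqrt(1 * 4) = 4 to the diagonal sum 5.
print(pathway_connectivity_index({"G1": 1.0, "G2": 4.0}, [("G1", "G2")]))  # 9.0
```

Computed per sample, the PCI values of a pathway form one score per patient, which can then be compared between tumor and normal samples like a single gene's expression.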

Statistical analysis
Differential expression of genes and microRNAs was tested with the t-test and the Mann-Whitney test. The thresholds for declaring significance after Bonferroni correction for multiple tests were 4.15 × 10^-6 for genes and 9.36 × 10^-5 for miRNAs. Linear regression was used to investigate the relationships between miRNA and gene expression. The linear model took its common form:
$$y = X\beta + \varepsilon$$
where y is an n-by-1 vector of observations (e.g., gene expression), X is an n-by-p matrix of regressors (e.g., miRNA expression), β is a p-by-1 vector of parameters known as regression coefficients, and ε is an n-by-1 vector of random disturbances. The right-tailed Fisher exact test was used to test for enriched Gene Ontology terms and pathways. MATLAB code for the t-test, Mann-Whitney test and linear regression is provided in Additional file 6.
Additional file 1: The list of 1,236 significantly differentially expressed genes. A total of 1,236 genes were significantly differentially expressed by both the t-test and the Mann-Whitney test after Bonferroni correction for multiple tests. The top 10 enriched GO terms and the first 2 significant GO term clusters are given in sheets 2 and 3, respectively.
Additional file 2: Lists of significant pathways. Three sheets give the list of 131 significant pathways identified by the TAPPA method, the list of 40 significant pathways identified by the Fisher exact test, and the list of 30 pathways shared by the two methods, respectively.
Additional file 3: The list of 97 significantly differentially expressed microRNAs. A total of 97 microRNAs were significantly differentially expressed by both the t-test and the Mann-Whitney test after Bonferroni correction for multiple tests.
Additional file 4: The predicted microRNA targets. The 1,094 matched miRNA-gene pairs, involving 70 microRNAs and 661 genes, supported by both our results and the miRanda algorithm.