PDbase: a database of Parkinson's Disease-related genes and genetic variation using substantia nigra ESTs
© Yang et al; licensee BioMed Central Ltd. 2009
Published: 3 December 2009
Parkinson's disease (PD) is one of the most common neurodegenerative disorders and is clinically characterized by impaired motor function. Because the etiology of PD is diverse and complex, many researchers have created PD-related research resources. However, resources for brain and PD studies are still lacking. We therefore constructed a database of PD-related genes and genetic variations using substantia nigra (SN) tissues from PD patients and normal individuals. In addition, we integrated PD-related information from several resources.
We collected 6,130 SN expressed sequence tags (ESTs) from normal brain SN tissues and from SN tissues of PD patients, using the full-length and normalized cDNA libraries constructed in our previous study. The SN ESTs were clustered into 2,951 UniGene clusters and assigned to 2,678 genes. By comparing normal and PD SN EST frequencies with a cut-off probability of differential expression above 0.9, based on the Audic and Claverie method, we identified 57 up-regulated and 48 down-regulated genes. In addition, we integrated disease-related information from public resources. To examine the characteristics of these PD-related genes, we analyzed alternative splicing events, single nucleotide polymorphism (SNP) markers located in the gene regions, repeat elements, gene regulation elements, pathways, and protein-protein interaction networks.
We constructed the PDbase database to capture PD-related genes, genetic variation, and functional elements. The database contains 2,698 PD-related genes, identified from ESTs of human normal and PD SN tissues and through the integration of several public resources. PDbase provides mitochondrial proteins, microRNA gene regulation elements, SNP markers within PD-related gene structures, repeat elements, and pathways and networks with protein-protein interaction information. The information in PDbase can aid in understanding the causes of PD. It is available at http://bioportal.kobic.re.kr/PDbase/. Supplementary data are available at http://bioportal.kobic.re.kr/PDbase/suppl.jsp
The prevalence of age-related neurodegenerative diseases is growing continuously because of the steady increase in the human life span. It affects almost half of all patients with dementia. Parkinson's disease (PD) is the second most common age-related neurodegenerative disease and results in abnormal motor function. Because of the high prevalence of PD, many researchers have tried to identify its causes. The disease is clinically characterized by impaired motor function, manifested by resting tremor, rigidity, bradykinesia, and postural instability. PD is caused by the degeneration of dopaminergic neurons in the substantia nigra (SN) pars compacta.
Although PD results from a diverse and complex combination of mitochondrial protein dysfunction, effects of genetic variation on the cell cycle, and environmental risk factors, it is now clear that genetic factors contribute to the pathogenesis of the disease [5–9]. However, the etiology of sporadic PD, which accounts for 95% of cases, is still not fully understood. To address this, several resources have been developed to support PD studies, such as MDPD and PDGene. MDPD provides a unique function for comparing mutation types among ethnic groups, based on data manually curated by biomedical researchers. PDGene, part of the Gene Prospector application, provides evidence from association studies on human genes related to Parkinson's disease and its risk factors. Although these resources integrate useful PD-related information, they focus on genetic mutations and PD association studies and are limited by their dependence on public resources. Therefore, we generated experimental resources to investigate a wide spectrum of molecular events before integrating the PD-related public resources. Because public databases for SNPs and diseases are large, complicated, and difficult to use, we also developed a pipeline system to provide disease-related genes and genetic variations.
Methods and results
Substantia nigra (SN) EST collection
We collected 6,130 substantia nigra (SN) expressed sequence tags (ESTs) from full-length cDNA libraries of normal brain SN tissues and of SN tissues from PD patients, constructed with the oligo-capping method in a previous study. These SN ESTs were deposited in the NCBI dbEST database under accession numbers DT214917 to DT221046. The full-length cDNA library was constructed using an improved capping method with the pCNS-D2 vector. A normalized cDNA library was also constructed, using a previously described method, to capture rarely expressed genes. We checked repeat elements using the RepeatMasker program http://repeatmasker.org. To obtain high-quality SN ESTs, we applied several filtering steps: 1) removing short ESTs, 2) removing ESTs contaminated by genomic DNA or E. coli sequences, and 3) removing ESTs not aligned to any UniGene cluster. We analyzed PD candidate genes using the resulting pool of 2,850 SN ESTs from PD patients and 2,883 SN ESTs from normal tissues. Annotating these 5,733 SN ESTs against UniGene clusters yielded information on 2,679 genes.
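The three filtering steps can be pictured as a single pass over the raw EST records. The sketch below is illustrative only: the `EST` record type, its `contaminated` flag, and the 100-bp `MIN_LENGTH` cut-off are hypothetical, because the text does not give the length threshold or describe how contamination was flagged.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical length cut-off; the original text does not state the value used.
MIN_LENGTH = 100


@dataclass
class EST:
    accession: str
    sequence: str
    contaminated: bool              # assumed result of a genomic DNA / E. coli screen
    unigene_cluster: Optional[str]  # None if the EST aligned to no UniGene cluster


def filter_ests(ests: List[EST], min_length: int = MIN_LENGTH) -> List[EST]:
    """Apply the three filtering steps in order and return the ESTs that survive."""
    kept = []
    for est in ests:
        if len(est.sequence) < min_length:  # step 1: drop short ESTs
            continue
        if est.contaminated:                # step 2: drop contaminated ESTs
            continue
        if est.unigene_cluster is None:     # step 3: drop ESTs outside any UniGene cluster
            continue
        kept.append(est)
    return kept
```

In the study, filtering left 2,850 PD and 2,883 normal SN ESTs, 5,733 in total.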
SN EST clustering and expression
The SN ESTs were annotated by similarity comparison against human RefSeq mRNAs and the UniGene database (build #217), based on UniGene clusters (shown in Supplementary Data, Table 1). Our SN ESTs were clustered into 2,951 UniGene clusters and assigned to 2,678 genes. Because we constructed the full-length cDNA libraries using the oligo-capping technique, these SN ESTs can be used to examine multiple transcription start sites by comparison with mRNA transcription start sites. To investigate amino acid changes, we compared the SN EST sequences with RefSeq protein sequences using BLASTX.
To study the global expression of genes possibly associated with sporadic PD, which constitutes most PD cases, we counted the SN ESTs from PD patients and from normal tissues assigned to the same gene. The frequency of each gene in each full-length cDNA library was calculated by dividing the number of ESTs assigned to that gene by the total number of clones from that library merged into UniGene build #217. Abundantly expressed genes were then selected and listed. Significant differences in gene expression between the datasets were calculated using the Audic and Claverie method. We computed the probability of differential expression between the normal full-length SN library and the PD full-length SN library with a cut-off probability of 0.9 (shown in Supplementary Data, Table 2). This comparison of normal and PD SN EST frequencies identified 57 up-regulated and 48 down-regulated genes. One of them, MBP, has previously been reported to be up-regulated in PD SN. In terms of molecular function, the up-regulated genes were associated with structural constituents of the myelin sheath, cytokine activity, transcription regulator activity, GTPase activity, calcium ion binding, or RNA binding. The down-regulated genes were associated with oxidoreductase activity, serine-type endopeptidase inhibitor activity, phosphatidylethanolamine binding, mu-type opioid receptor binding, Rho GTPase activator activity, integrin binding, monooxygenase activity, or lipid binding.
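Audic and Claverie model the count y of a gene's tags in one library, given its count x in another, as p(y | x) = (N2/N1)^y (x+y)! / (x! y! (1 + N2/N1)^(x+y+1)), where N1 and N2 are the library sizes. The Python sketch below evaluates this in log space and converts a one-sided tail into a "probability of differential expression". Treating 1 minus the one-sided tail probability as that probability is an assumption; the text does not say exactly how its 0.9 cut-off was computed.

```python
import math


def ac_prob(x: int, y: int, n1: int, n2: int) -> float:
    """Audic-Claverie p(y | x): probability of seeing y tags in library 2
    given x tags in library 1, for library sizes n1 and n2."""
    r = n2 / n1
    log_p = (
        y * math.log(r)
        + math.lgamma(x + y + 1)
        - math.lgamma(x + 1)
        - math.lgamma(y + 1)
        - (x + y + 1) * math.log1p(r)
    )
    return math.exp(log_p)


def differential_confidence(x: int, y: int, n1: int, n2: int) -> float:
    """Return 1 - (one-sided tail probability) for the observed y given x.

    Assumed convention: if y is at or above x scaled by the library-size
    ratio, use the upper tail P(Y >= y | x); otherwise use the lower tail
    P(Y <= y | x).
    """
    expected = x * n2 / n1
    if y >= expected:
        # P(Y >= y | x) = 1 - P(Y <= y - 1 | x)
        tail = 1.0 - sum(ac_prob(x, k, n1, n2) for k in range(y))
    else:
        tail = sum(ac_prob(x, k, n1, n2) for k in range(y + 1))
    return 1.0 - tail
```

For example, `differential_confidence(19, 34, 2850, 2883)` applies the test to the SPP1 counts (19 PD and 34 normal ESTs), assuming the filtered pool sizes of 2,850 PD and 2,883 normal ESTs are the appropriate library totals; a gene whose value exceeds 0.9 would pass the cut-off used here.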
Genomic mapping of expressed sequence clusters
To create consensus sequences, we used BLAT and SIM4 to map the SN ESTs, mRNAs, and UniGene EST clusters containing at least one mRNA (to exclude pseudogenes) onto the human genome. Consensus sequences were used to eliminate non-consensus features of each UniGene cluster, after filtering out EST sequencing errors and contamination by a minority of similar but paralogous sequences. EST-mRNA alignments were then generated with the SIM4 program, producing a consensus sequence that excludes minority features such as unaligned ends and inserts caused by chimeric sequences or unspliced introns. The matching genomic region was aligned with the complete set of ESTs and mRNAs for the UniGene cluster using BLAT and SIM4. The SN EST sequences were aligned to human genomic sequences with a 75% minimum score and 90% minimum identity. Where alignments had non-canonical splice sites, we confirmed the exon-intron junctions with SIM4, which aligns expressed and genomic DNA sequences efficiently and accurately while allowing for introns in the genomic sequence and a relatively small number of sequencing errors.
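The 75% minimum score and 90% minimum identity act as a filter on each EST-to-genome alignment. The sketch below uses simplified, hypothetical definitions: the score is matches minus mismatches and gap bases, divided by EST length, and identity is matches over aligned bases. BLAT and the UCSC tools compute score and percent identity with their own formulas, so this illustrates the filtering step rather than reproducing it.

```python
from dataclasses import dataclass
from typing import Dict, Iterable

MIN_SCORE = 0.75     # 75% minimum score, as stated in the text
MIN_IDENTITY = 0.90  # 90% minimum identity, as stated in the text


@dataclass
class Hit:
    """One EST-to-genome alignment (hypothetical, simplified fields)."""
    query: str
    query_size: int   # EST length
    matches: int      # aligned bases identical to the genome
    mismatches: int   # aligned bases that differ from the genome
    gap_bases: int    # EST bases inserted relative to the genome


def score_fraction(hit: Hit) -> float:
    # Simplified score: matches minus penalties, relative to EST length.
    return (hit.matches - hit.mismatches - hit.gap_bases) / hit.query_size


def identity(hit: Hit) -> float:
    aligned = hit.matches + hit.mismatches
    return hit.matches / aligned if aligned else 0.0


def passes(hit: Hit) -> bool:
    return score_fraction(hit) >= MIN_SCORE and identity(hit) >= MIN_IDENTITY


def best_hits(hits: Iterable[Hit]) -> Dict[str, Hit]:
    """Keep, per EST, the highest-scoring alignment that passes both thresholds."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        if not passes(hit):
            continue
        current = best.get(hit.query)
        if current is None or score_fraction(hit) > score_fraction(current):
            best[hit.query] = hit
    return best
```

Keeping only the best passing hit per EST is one reasonable way to resolve multiple genomic placements; the text does not say how such cases were handled.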
Alternative splicing analysis of SN ESTs
Alternative splicing was detected computationally from genomic-EST-mRNA multiple sequence alignments. Alternative splicing types were derived from the resulting isoforms, which retain all possible alternative splicing information. SN ESTs with poor coverage were filtered out to remove non-consensus splice sites and poorly covered regions. We classified alternative splicing events as alternative start, alternative end, alternative 5' exon, alternative 3' exon, exon skipping, mutually exclusive exons, or intron retention. Alternative starts and ends were identified when the first or last exon of a gene model was part of an alternative region. Alternative cassette exons were identified when a junction skipped one exon.
PDbase contains transcripts representing several alternative splicing events (shown in Supplementary Data, Table 3). SN ESTs were associated with alternative splicing events in 321 genes. To identify candidate genes with PD-specific alternative splicing patterns, we compared the alternative splicing patterns of normal SN ESTs with those of PD patient ESTs. We found thirty-five PD-specific candidate genes with alternative splicing events that were up-regulated in PD SN tissues, for example, AQP1, DCXR, DKK3, EEF1A1, GNAS, PGK1, SUCLG1, and THTPA. In the genes up-regulated in PD SN tissues, the predominant alternative splicing events are alternative transcription start or end sites. This may reflect the construction of the SN full-length cDNA libraries with the oligo-capping method, which replaces the cap structure at the 5' end of eukaryotic mRNA with an oligonucleotide.
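The cassette-exon rule (a junction that skips one exon) and intron retention can both be detected from exon coordinates alone. The sketch below is a simplified illustration, not the procedure used for PDbase: isoforms are hypothetical lists of inclusive (start, end) exon coordinates on one strand, and a junction is written as (last base of the upstream exon, first base of the downstream exon).

```python
from typing import List, Set, Tuple

Exon = Tuple[int, int]       # inclusive (start, end) genomic coordinates
Junction = Tuple[int, int]   # (last base of upstream exon, first base of downstream exon)


def junctions(isoform: List[Exon]) -> Set[Junction]:
    exons = sorted(isoform)
    return {(exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1)}


def cassette_events(isoforms: List[List[Exon]]) -> List[Tuple[Junction, Exon]]:
    """Junctions that skip exactly one exon whose inclusion junctions are also observed."""
    all_junctions = set().union(*(junctions(iso) for iso in isoforms))
    all_exons = {exon for iso in isoforms for exon in iso}
    events = []
    for donor, acceptor in sorted(all_junctions):
        for start, end in sorted(all_exons):
            inside = donor < start and end < acceptor
            included = (donor, start) in all_junctions and (end, acceptor) in all_junctions
            if inside and included:
                events.append(((donor, acceptor), (start, end)))
    return events


def intron_retention_events(isoforms: List[List[Exon]]) -> List[Tuple[Exon, Junction]]:
    """Exons that span an intron spliced out in another isoform."""
    all_junctions = set().union(*(junctions(iso) for iso in isoforms))
    all_exons = {exon for iso in isoforms for exon in iso}
    return [
        (exon, junction)
        for exon in sorted(all_exons)
        for junction in sorted(all_junctions)
        if exon[0] <= junction[0] and exon[1] >= junction[1]
    ]
```

Pooling junctions and exons across all isoforms mirrors the idea of a gene model that retains every observed splicing choice; other event types (alternative starts and ends, mutually exclusive exons) would need additional rules.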
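Selecting PD-specific splicing candidates amounts to keeping events seen in PD ESTs but not in normal ESTs, then restricting to up-regulated genes. In this hypothetical sketch each gene's events are reduced to type labels such as "exon_skipping"; real events would also carry coordinates, so this shows only the set logic of the comparison.

```python
from typing import Dict, Iterable, List, Set


def pd_specific_splicing(
    pd_events: Dict[str, Set[str]],
    normal_events: Dict[str, Set[str]],
    upregulated: Iterable[str],
) -> Dict[str, List[str]]:
    """Up-regulated genes with splicing events absent from normal SN ESTs."""
    up = set(upregulated)
    candidates = {}
    for gene, events in pd_events.items():
        # Events observed in PD ESTs but never in normal ESTs for the same gene.
        specific = set(events) - set(normal_events.get(gene, set()))
        if specific and gene in up:
            candidates[gene] = sorted(specific)
    return candidates
```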
To provide a global view of PD-related gene features, we integrated PD-related gene information with knowledge-based information. Because public databases for SNPs and diseases are large, complicated, and difficult to use, their integration is challenging. Using our pipeline system for integrating disease-related genes and genetic variation, we collected 2,701 PD-associated genes, with an average of 323 genetic variations per gene region. This integrated information is based on the HUGO Gene Nomenclature Committee (HGNC) database and UniProt, genetic variation from dbSNP (build 129), and disease information from Online Mendelian Inheritance in Man (OMIM), the Human Gene Mutation Database (HGMD), and the Genetic Association Database (GAD). We examined how the PD-related genes are distributed across domains of molecular and cellular biology based on the Gene Ontology database. In addition, we surveyed protein-protein interactions (PPIs) of the PD-related genes from the Human Protein Reference Database (HPRD). It has been reported that degeneration of the dopaminergic neurons of the SN pars compacta (SNpc) is driven by dysfunction of mitochondrial complexes through activation of mitochondria-dependent apoptotic pathways. Hence, we investigated mitochondrial proteins associated with PD from MitoDat (Mendelian Inheritance and the Mitochondrion). We found 31 mitochondrial proteins located in the inner membrane (68%), outer membrane (19%), intermembrane space (6%), or matrix (6%); the percentages give the share of the 31 proteins in each mitochondrial compartment. Examples include solute carrier family 25 members (ANT1, ANT2, ANT3), ATP synthase, H+ transporting, mitochondrial F1 complex subunits (ATP5A1, ATP5B, ATP5C1, ATP5F1), and hexokinase 2 (HK2).
We also investigated molecular and cellular signalling pathways associated with PD-related genes using BioCarta http://www.biocarta.com/genes/index.asp and the KEGG database. To examine RNA elements involved in the regulation of PD-related genes, we searched for microRNAs related to PD genes in the miRBase database, as an experimental microRNA resource, and for conserved mammalian microRNA target sites of conserved microRNA families in the 3' UTRs of RefSeq genes, as predicted by TargetScanS and provided as a UCSC Genome Browser table track. In addition, we used the UCSC table tracks for multiple transcription start sites, CpG islands, and repeat elements (downloaded March 2009).
As an example of a PD-related gene search, we present the query results for SPP1 (secreted phosphoprotein 1; also known as osteopontin, bone sialoprotein I, or early T-lymphocyte activation 1). A query for SPP1 returns comprehensive information, including the difference in frequency between the two full-length libraries from PD and normal SN tissues. PDbase contains 19 SPP1 ESTs from PD SN and 34 from normal SN. SPP1 was down-regulated in PD at a statistically significant level in more than one sample, with a probability of 0.977 < p < 0.98. A previous study of cell death activity identified this gene as strongly anti-apoptotic. In addition, the user can retrieve general gene information and genetic information, including SNP markers located in this gene region, repeat elements, and alternative splicing events from the PD and normal SN ESTs. Three microRNAs may be associated with the regulation of this gene, which has experimentally confirmed protein-protein interactions with eighteen other proteins and belongs to the regulators of bone mineralization pathway. SPP1 was identified as a PD target gene through our human SN EST analysis and verified by RT-PCR in a mouse model treated with the neurotoxin 1-methyl-4-phenyl-1,2,3,6-tetrahydropyridine (MPTP). Because they combine our SN EST data with integrated public information, PDbase query results give researchers more comprehensive information than previously published databases.
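Counting the dbSNP variants that fall within each gene region is a standard interval query. A minimal sketch, assuming 1-based inclusive gene coordinates and SNP positions on the same genome assembly; the gene and SNP records are hypothetical inputs, and the per-gene counts can then be averaged across genes.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def index_snps(snps: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group SNP positions by chromosome and sort them for binary search."""
    by_chrom: Dict[str, List[int]] = defaultdict(list)
    for chrom, pos in snps:
        by_chrom[chrom].append(pos)
    for positions in by_chrom.values():
        positions.sort()
    return dict(by_chrom)


def count_snps_in_genes(
    genes: Iterable[Tuple[str, str, int, int]],
    snp_index: Dict[str, List[int]],
) -> Dict[str, int]:
    """genes: (symbol, chrom, start, end) with 1-based inclusive coordinates."""
    counts = {}
    for symbol, chrom, start, end in genes:
        positions = snp_index.get(chrom, [])
        # Number of sorted positions p with start <= p <= end.
        counts[symbol] = bisect_right(positions, end) - bisect_left(positions, start)
    return counts
```

Sorting once per chromosome and using binary search keeps each gene lookup logarithmic in the number of SNPs, which matters at dbSNP scale.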
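TargetScanS-style predictions are built around perfect pairing to the microRNA seed (nucleotides 2 to 8). The sketch below scans an RNA 3' UTR for the three canonical site types: 8mer (match to nucleotides 2 to 8 plus an A opposite nucleotide 1), 7mer-m8 (match to nucleotides 2 to 8), and 7mer-A1 (match to nucleotides 2 to 7 plus the A). It ignores conservation and context scoring, so it is a simplified stand-in rather than TargetScanS itself.

```python
from typing import List, Tuple

_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    return "".join(_COMPLEMENT[base] for base in reversed(rna))


def find_seed_sites(mirna: str, utr: str) -> List[Tuple[int, str]]:
    """Return (0-based start, site type) for canonical seed sites in an RNA 3' UTR."""
    m8 = reverse_complement(mirna[1:8])  # pairs with miRNA nucleotides 2-8
    m7 = m8[1:]                          # pairs with miRNA nucleotides 2-7
    sites = []
    for i in range(len(utr) - 6):
        if utr[i:i + 7] == m8:
            # An A right after the match sits opposite miRNA nucleotide 1.
            followed_by_a = i + 7 < len(utr) and utr[i + 7] == "A"
            sites.append((i, "8mer" if followed_by_a else "7mer-m8"))
        if utr[i:i + 6] == m7 and utr[i + 6] == "A":
            # Skip if the preceding base extends this into an 8mer reported above.
            if not (i >= 1 and utr[i - 1:i + 6] == m8):
                sites.append((i, "7mer-A1"))
    return sites
```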
We constructed PDbase, a database of PD-related genes and genetic variation based on SN ESTs. PDbase contains 2,698 genes and their biological characteristics, obtained in two ways: 1) from 303 cDNA libraries of human normal and PD SN tissues and 2) by integrating information on disease-related genes and genetic variation. Mitochondrial DNA variants play various roles in PD, and mitochondrial dysfunction has been reported as a cause of neurodegenerative diseases. PDbase therefore also provides PD-related mitochondrial proteins, microRNAs, SNP markers within PD-related gene structures, repeat elements, and pathways and networks with protein-protein interaction information. PDbase integrates not only public resources but also previously unreported PD target genes discovered from normal and PD SN ESTs. These genes may serve as specific biomarkers for PD or other neurodegenerative diseases and support novel drug development. PDbase can also provide insight into the pathogenesis of PD and help identify molecular targets of potential therapeutic significance for neurodegeneration.
Availability and requirements
Other papers from the meeting have been published as part of BMC Bioinformatics Volume 10 Supplement 15, 2009: Eighth International Conference on Bioinformatics (InCoB2009): Bioinformatics, available online at http://www.biomedcentral.com/1471-2105/10?issue=S15.
This research was supported by a grant from KRIBB Research Initiative Program, the Korea Science and Engineering Foundation (KOSEF) grant funded by the Korea government (MEST) (No. M10869030002-08N6903-00210), and a grant of the MOST 21C Frontier R & D program in neuroscience from the Ministry of Science & Technology of Korea. We thank Maryana Bhak for editing the manuscript and Ha-Na Byun for web image design.
This article has been published as part of BMC Genomics Volume 10 Supplement 3, 2009: Eighth International Conference on Bioinformatics (InCoB2009): Computational Biology. The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2164/10?issue=S3.
- Fratiglioni L, Qiu C: Prevention of common neurodegenerative disorders in the elderly. Experimental gerontology. 2009, 44 (1-2): 46-50. 10.1016/j.exger.2008.06.006.
- Robinson PA: Protein stability and aggregation in Parkinson's disease. The Biochemical journal. 2008, 413 (1): 1-13. 10.1042/BJ20080295.
- Kim JM, Lee KH, Jeon YJ, Oh JH, Jeong SY, Song IS, Kim JM, Lee DS, Kim NS: Identification of genes related to Parkinson's disease using expressed sequence tags. DNA Res. 2006, 13 (6): 275-286. 10.1093/dnares/dsl016.
- D'Amelio M, Ragonese P, Sconzo G, Aridon P, Savettieri G: Parkinson's disease and cancer: insights for pathogenesis from epidemiology. Annals of the New York Academy of Sciences. 2009, 1155: 324-334. 10.1111/j.1749-6632.2008.03681.x.
- Kubo S, Hattori N, Mizuno Y: Recessive Parkinson's disease. Mov Disord. 2006, 21 (7): 885-893. 10.1002/mds.20841.
- Dawson TM, Dawson VL: Molecular pathways of neurodegeneration in Parkinson's disease. Science. 2003, 302 (5646): 819-822. 10.1126/science.1087753.
- Dauer W, Przedborski S: Parkinson's disease: mechanisms and models. Neuron. 2003, 39 (6): 889-909. 10.1016/S0896-6273(03)00568-3.
- Whitworth AJ, Pallanck LJ: Genetic models of Parkinson's disease: mechanisms and therapies. SEB experimental biology series. 2008, 60: 93-113.
- Moore DJ, Dawson TM: Value of genetic models in understanding the cause and mechanisms of Parkinson's disease. Current neurology and neuroscience reports. 2008, 8 (4): 288-296. 10.1007/s11910-008-0045-7.
- Tang S, Zhang Z, Kavitha G, Tan EK, Ng SK: MDPD: an integrated genetic information resource for Parkinson's disease. Nucleic acids research. 2009, 37 (Database issue): D858-862. 10.1093/nar/gkn770.
- Yu W, Wulf A, Liu T, Khoury MJ, Gwinn M: Gene Prospector: an evidence gateway for evaluating potential susceptibility genes and interacting risk factors for human diseases. BMC bioinformatics. 2008, 9: 528. 10.1186/1471-2105-9-528.
- Oh JH, Kim YS, Kim NS: An improved method for constructing a full-length enriched cDNA library using small amounts of total RNA as a starting material. Exp Mol Med. 2003, 35 (6): 586-590.
- Soares MB, Bonaldo MF, Jelene P, Su L, Lawton L, Efstratiadis A: Construction and characterization of a normalized cDNA library. Proceedings of the National Academy of Sciences of the United States of America. 1994, 91 (20): 9228-9232. 10.1073/pnas.91.20.9228.
- Pruitt KD, Maglott DR: RefSeq and LocusLink: NCBI gene-centered resources. Nucleic acids research. 2001, 29 (1): 137-140. 10.1093/nar/29.1.137.
- Suzuki Y, Sugano S: Construction of a full-length enriched and a 5'-end enriched cDNA library using the oligo-capping method. Methods Mol Biol. 2003, 221: 73-91.
- Johnson M, Zaretskaya I, Raytselis Y, Merezhuk Y, McGinnis S, Madden TL: NCBI BLAST: a better web interface. Nucleic acids research. 2008, 36 (Web Server issue): W5-9. 10.1093/nar/gkn201.
- Mandel S, Grunblatt E, Riederer P, Amariglio N, Jacob-Hirsch J, Rechavi G, Youdim MB: Gene expression profiling of sporadic Parkinson's disease substantia nigra pars compacta reveals impairment of ubiquitin-proteasome subunits, SKP1A, aldehyde dehydrogenase, and chaperone HSC-70. Annals of the New York Academy of Sciences. 2005, 1053: 356-375. 10.1196/annals.1344.031.
- Audic S, Claverie JM: The significance of digital gene expression profiles. Genome research. 1997, 7 (10): 986-995.
- Noureddine MA, Li YJ, van der Walt JM, Walters R, Jewett RM, Xu H, Wang T, Walter JW, Scott BL, Hulette C, et al: Genomic convergence to identify candidate genes for Parkinson disease: SAGE analysis of the substantia nigra. Mov Disord. 2005, 20 (10): 1299-1309. 10.1002/mds.20573.
- Kent WJ: BLAT--the BLAST-like alignment tool. Genome research. 2002, 12 (4): 656-664.
- Florea L, Hartzell G, Zhang Z, Rubin GM, Miller W: A computer program for aligning a cDNA sequence with a genomic DNA sequence. Genome research. 1998, 8 (9): 967-974.
- Heber S, Alekseyev M, Sze SH, Tang H, Pevzner PA: Splicing graphs and EST assembly problem. Bioinformatics (Oxford, England). 2002, 18 (Suppl 1): S181-188.
- Yang JO, Hwang S, Oh J, Bhak J, Sohn TK: An integrated database-pipeline system for studying single nucleotide polymorphisms and diseases. BMC bioinformatics. 2008, 9 (Suppl 12): S19. 10.1186/1471-2105-9-S12-S19.
- Eyre TA, Ducluzeau F, Sneddon TP, Povey S, Bruford EA, Lush MJ: The HUGO Gene Nomenclature Database, 2006 updates. Nucleic acids research. 2006, 34 (Database issue): D319-321. 10.1093/nar/gkj147.
- The universal protein resource (UniProt). Nucleic acids research. 2008, 36 (Database issue): D190-195.
- Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K: dbSNP: the NCBI database of genetic variation. Nucleic acids research. 2001, 29 (1): 308-311. 10.1093/nar/29.1.308.
- Hamosh A, Scott AF, Amberger J, Bocchini C, Valle D, McKusick VA: Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic acids research. 2002, 30 (1): 52-55. 10.1093/nar/30.1.52.
- Stenson PD, Ball EV, Mort M, Phillips AD, Shiel JA, Thomas NS, Abeysinghe S, Krawczak M, Cooper DN: Human Gene Mutation Database (HGMD): 2003 update. Human mutation. 2003, 21 (6): 577-581. 10.1002/humu.10212.
- Becker KG, Barnes KC, Bright TJ, Wang SA: The genetic association database. Nature genetics. 2004, 36 (5): 431-432. 10.1038/ng0504-431.
- Barrell D, Dimmer E, Huntley RP, Binns D, O'Donovan C, Apweiler R: The GOA database in 2009--an integrated Gene Ontology Annotation resource. Nucleic acids research. 2009, 37 (Database issue): D396-403. 10.1093/nar/gkn803.
- Keshava Prasad TS, Goel R, Kandasamy K, Keerthikumar S, Kumar S, Mathivanan S, Telikicherla D, Raju R, Shafreen B, Venugopal A: Human Protein Reference Database--2009 update. Nucleic acids research. 2009, 37 (Database issue): D767-772. 10.1093/nar/gkn892.
- Lemkin PF, Chipperfield M, Merril C, Zullo S: A World Wide Web (WWW) server database engine for an organelle database, MitoDat. Electrophoresis. 1996, 17 (3): 566-572. 10.1002/elps.1150170327.
- Kanehisa M, Araki M, Goto S, Hattori M, Hirakawa M, Itoh M, Katayama T, Kawashima S, Okuda S, Tokimatsu T: KEGG for linking genomes to life and the environment. Nucleic acids research. 2008, 36 (Database issue): D480-484.
- Griffiths-Jones S, Saini HK, van Dongen S, Enright AJ: miRBase: tools for microRNA genomics. Nucleic acids research. 2008, 36 (Database issue): D154-158.
- Grimson A, Farh KK, Johnston WK, Garrett-Engele P, Lim LP, Bartel DP: MicroRNA targeting specificity in mammals: determinants beyond seed pairing. Molecular cell. 2007, 27 (1): 91-105. 10.1016/j.molcel.2007.06.017.
- Stein LD, Mungall C, Shu S, Caudy M, Mangone M, Day A, Nickerson E, Stajich JE, Harris TW, Arva A, et al: The Generic Genome Browser: A Building Block for a Model Organism System Database. Genome Research. 2002, 12: 1599-1610. 10.1101/gr.403602.
- Kim JG, Park D, Kim BC, Cho SW, Kim YT, Park YJ, Cho HJ, Park H, Kim KB, Yoon KO, et al: Predicting the interactome of Xanthomonas oryzae pathovar oryzae for target selection and DB service. BMC bioinformatics. 2008, 9: 41. 10.1186/1471-2105-9-41.
- Abu A, Frydman M, Marek D, Pras E, Stolovitch C, Aviram-Goldring A, Rienstein S, Reznik-Wolf H, Pras E: Mapping of a gene causing brittle cornea syndrome in Tunisian jews to 16q24. Invest Ophthalmol Vis Sci. 2006, 47 (12): 5283-5287. 10.1167/iovs.06-0206.
- Khusnutdinova E, Gilyazova I, Ruiz-Pesini E, Derbeneva O, Khusainova R, Khidiyatova I, Magzhanov R, Wallace DC: A mitochondrial etiology of neurodegenerative diseases: evidence from Parkinson's disease. Annals of the New York Academy of Sciences. 2008, 1147: 1-20.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.