- Open Access
Categorizer: a tool to categorize genes into user-defined biological groups based on semantic similarity
© Na et al.; licensee BioMed Central. 2014
- Received: 2 June 2014
- Accepted: 4 December 2014
- Published: 11 December 2014
Commonalities between large sets of genes obtained from high-throughput experiments are often identified by searching for enrichments of genes with the same Gene Ontology (GO) annotations. The GO analysis tools used for these enrichment analyses assume that GO terms are independent and that the semantic distances between all parent–child terms are identical, neither of which holds in a biological sense. In addition, these tools output lists of GO terms that are often redundant or too specific, and therefore difficult to interpret in the context of the biological question investigated by the user. There is thus a demand for a robust and reliable method for gene categorization and enrichment analysis.
We have developed Categorizer, a tool that classifies genes into user-defined groups (categories) and calculates p-values for the enrichment of the categories. Categorizer identifies the biologically best-fit category for each gene by taking advantage of a specialized semantic similarity measure for GO terms. We demonstrate that Categorizer provides improved categorization and enrichment results of genetic modifiers of Huntington’s disease compared to a classical GO Slim-based approach or categorizations using other semantic similarity measures.
Categorizer enables more accurate categorizations of genes than currently available methods. This new tool will help experimental and computational biologists analyze genomic and proteomic data according to their specific needs in a more reliable manner.
- Gene ontology
- Enrichment analysis
- Semantic similarity
- Neurodegenerative diseases
During the last decade, high-throughput technologies have allowed scientists to collect large sets of genomic and proteomic data. These data sets are often screened for groups of genes that are over-represented or depleted compared to a reference set or to the entire genome or proteome of a specific organism. Consequently, great efforts have been made to develop computational methods that translate the rapidly growing volume of raw data into meaningful biological knowledge.
Gene Ontology (GO) is a dictionary of controlled biological vocabularies used to annotate genes at different levels of granularity. The GO dictionary can be envisioned as a graph that has, to a first approximation, the architecture of an upside-down tree in which connected nodes, i.e., related GO terms, have a parent–child relationship and all nodes can be traced back to the three root nodes (biological process, molecular function and cellular component). This well-structured knowledge has been used to identify specific biological processes or functions enriched within sets of genes. Many tools can carry out this task, including DAVID, FuncAssociate and BiNGO [2–5]. These tools output lists of all individual GO terms that are significantly enriched in the analyzed data set. However, the listed GO terms often refer to the same biological process. In addition, many GO terms are highly specific and difficult to interpret in the larger biological context under investigation. Because most scientists focus their research on a specific area, they are often less interested in the enrichment of individual GO terms in a set of genes than in the enrichment of all GO terms associated with their area of interest. To reach this goal, researchers define categories of interest and manually assign genes to one of these categories according to their GO annotations [6, 7]. This is a laborious process, and categorization results may differ from person to person.
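To make the graph picture concrete, the following minimal sketch stores a toy GO-like graph as a map from each term to its parent terms and collects every ancestor of a term by breadth-first traversal. The term names are hypothetical placeholders, not real GO identifiers. Because a term may have several parents (so the graph is a DAG rather than a strict tree), the traversal tracks visited terms so that a shared ancestor is counted once.

```python
from collections import deque

# Hypothetical toy graph: term -> list of parent terms. "root" stands in for
# one of the three GO root nodes; T4 has two parents, so this is a DAG.
PARENTS: dict[str, list[str]] = {
    "root": [],
    "T1": ["root"],
    "T2": ["root"],
    "T3": ["T1"],
    "T4": ["T2", "T3"],
}


def ancestors(term: str, parents: dict[str, list[str]]) -> set[str]:
    """Return all terms reachable from `term` by following parent links."""
    seen: set[str] = set()
    queue = deque(parents[term])
    while queue:
        parent = queue.popleft()
        if parent not in seen:  # shared ancestors are visited only once
            seen.add(parent)
            queue.extend(parents[parent])
    return seen


# Every term except the root should lead back to the root.
for t in PARENTS:
    if t != "root":
        assert "root" in ancestors(t, PARENTS)

print(sorted(ancestors("T4", PARENTS)))  # ['T1', 'T2', 'T3', 'root']
```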
Several semantic similarity measures have recently been developed to address some of these problems [10, 11]. As the name indicates, they measure how close two annotation terms are, and their calculation is based on the information content (IC), i.e., the specificity, of each annotation term. To determine the specificity of annotations, it is assumed that more frequently used terms are less specific [12–14]. Different types of semantic similarity measures have been introduced [10, 11, 15–17] and used in very diverse applications, including the clustering of microarray data, the comparison of sets of genes and proteins from different species (GOTax), the assessment of functional similarity of genes or proteins (G-SESAME) [20, 21] and the identification of new disease genes based on known disease annotations (MedSim, ACGR) [22, 23]. However, when semantic similarity measures are used to categorize genes into groups of interest, one has to consider that categorization is not about predicting how close two terms are but about assessing how well one term fits under another.
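For orientation, two classic IC-based measures from the reference list, Resnik's and Lin's, can be sketched as follows. Resnik's score is the IC of the most informative common ancestor (MICA) of two terms; Lin's score normalizes it by the IC of the two terms themselves. The toy ontology and IC values are hypothetical. The example gives two sibling terms a non-zero similarity through their shared parent, which is the kind of closeness that, as argued above, does not by itself say whether one term belongs under another.

```python
# Hypothetical toy ontology (term -> parents) and IC values.
PARENTS = {"root": [], "t1": ["root"], "t2": ["t1"], "t3": ["t1"]}
IC = {"root": 0.0, "t1": 1.0, "t2": 3.0, "t3": 2.0}


def self_and_ancestors(term: str) -> set[str]:
    """The term together with every ancestor reachable through parent links."""
    found, stack = {term}, [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def resnik(a: str, b: str) -> float:
    """IC of the most informative common ancestor (MICA) of a and b."""
    common = self_and_ancestors(a) & self_and_ancestors(b)
    return max(IC[t] for t in common)


def lin(a: str, b: str) -> float:
    """Resnik similarity normalized by the IC of both terms (range 0..1)."""
    denominator = IC[a] + IC[b]
    if denominator == 0:  # both terms are the root: undefined, 0.0 by convention here
        return 0.0
    return 2 * resnik(a, b) / denominator


print(resnik("t2", "t3"), lin("t2", "t3"))  # 1.0 0.4
```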
To meet these demands, we developed Categorizer, a tool that assigns genes to predefined biological functions or processes based on their GO annotations. Because the biological functions or processes of interest differ from field to field, Categorizer allows users to define their own categories.
Categorizer was implemented in Python, a platform-independent language, and thus runs on any operating system. For the user’s convenience, we also provide a pre-compiled version of Categorizer for the Windows operating system.
Information content (IC)
In the given example, the IC of the root term, I(G0), is zero, which means that an annotation with G0 carries no biological information. As calculating the IC score I(x) for all GO terms is computationally expensive, Categorizer comes with a file that contains pre-computed values. In addition, we provide a script with which users can pre-compute I(x) values from other annotation sets of interest, e.g., UniProtKB-GOA (no IEA), Human GOA, or customized annotation files. Synthetic IC scores for each term in our example are shown in Figure 2B.
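The defining formula for I(x) is not reproduced in this text. A common choice, consistent with the assumption that frequently used terms are less specific and with a root IC of zero, is I(x) = −log p(x), where p(x) is the fraction of annotated genes that are annotated to x or to any of its descendants. The sketch below computes that quantity for a hypothetical toy graph. Counting genes (rather than individual annotations), using the natural logarithm and assuming a single root are all assumptions; Categorizer's pre-computation script may differ in these details.

```python
import math
from collections import Counter

# Hypothetical toy DAG (term -> parents) with a single root, G0.
PARENTS = {"G0": [], "T1": ["G0"], "T2": ["G0"], "T3": ["T1", "T2"]}
# Hypothetical annotations: gene -> terms it is directly annotated with.
ANNOTATIONS = {"gene1": ["T1"], "gene2": ["T3"], "gene3": ["T2"], "gene4": ["T3"]}


def self_and_ancestors(term: str) -> set[str]:
    found, stack = {term}, [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def information_content(root: str = "G0") -> dict[str, float]:
    """I(x) = log(N_root / N_x), i.e. -log p(x), for every term with N_x > 0."""
    counts: Counter = Counter()
    for terms in ANNOTATIONS.values():
        reached: set[str] = set()
        for term in terms:  # an annotation to a term implies its ancestors
            reached |= self_and_ancestors(term)
        counts.update(reached)  # each gene counted once per term it reaches
    return {t: math.log(counts[root] / n) for t, n in counts.items()}


ic = information_content()
print(ic["G0"], round(ic["T3"], 3))  # 0.0 0.693
```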
When a specific GO term needs to be categorized, Categorizer searches for its parent terms that are assigned to a category and calculates semantic similarity scores with them, as follows. IC scores are used to calculate (α) the semantic distance of a category-assigned parent GO term from the root node, (β) the semantic distance of the GO term to be categorized and of its category-assigned parent term from their most informative child terms, and (γ) the semantic distance between the category-assigned parent term and the GO term to be categorized (Figure 2A). All three scores are then combined into a final semantic similarity score. Given two GO terms, for instance G32 (the term to be categorized) and G22 (one of G32’s parent terms) (Figure 2B), we calculate their semantic similarity as follows:
Distance of a category-assigned parent GO term from the root node (α)
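$$\alpha = d(r, p) = I(p) - I(r)$$
where r is the root term, p is the category-assigned parent term, and d(x, y) = I(y) − I(x) is the IC difference between a term x and its descendant y.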
Thus, the distance of G22 from the root term G0 is α = I(G22) − I(G0) = 12.20 − 0 = 12.20.
Distance from the most informative child terms (β)
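$$\beta = \frac{d(x_1, c_1) + d(x_2, c_2)}{2} = \frac{[I(c_1) - I(x_1)] + [I(c_2) - I(x_2)]}{2}$$
where c1 and c2 denote the most informative child terms (the descendants with the highest IC) of x1 and x2, respectively. If c1 and/or c2 do not exist, they are set to x1 and x2, respectively.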
In our example, G32 has the child term G41, and G22 has the child terms G31, G32, G41 and G43. The most informative child term of G32 is G41 and that of G22 is G43. Therefore, β = (d(G22, G43) + d(G32, G41))/2 = ((13.90 − 12.20) + (13.17 − 12.31))/2 = 1.28.
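The β term can be computed mechanically once each term's descendants are known. The sketch below uses the IC values quoted in the worked example. G31 is left out because its IC is not stated in the text (the text implies it is below I(G43), since G43 is G22's most informative child), so the descendant lists are a hypothetical reduction of Figure 2B to the terms whose IC is known.

```python
# IC values quoted in the worked example (Figure 2B). G31 is omitted because its
# IC is not given in the text; the text implies it is below I(G43).
IC = {"G22": 12.20, "G32": 12.31, "G41": 13.17, "G43": 13.90}

# Hypothetical reduction of Figure 2B: all descendants of each term, restricted
# to the terms listed in IC above.
DESCENDANTS = {"G22": ["G32", "G41", "G43"], "G32": ["G41"], "G41": [], "G43": []}


def most_informative_descendant(term: str) -> str:
    """Descendant with the highest IC, or the term itself if it has none."""
    below = DESCENDANTS[term]
    return max(below, key=lambda t: IC[t]) if below else term


def d(ancestor: str, descendant: str) -> float:
    """IC difference between a term and one of its descendants."""
    return IC[descendant] - IC[ancestor]


def beta(x1: str, x2: str) -> float:
    """Mean distance of x1 and x2 from their most informative descendants."""
    c1 = most_informative_descendant(x1)
    c2 = most_informative_descendant(x2)
    return (d(x1, c1) + d(x2, c2)) / 2


print(most_informative_descendant("G22"), round(beta("G22", "G32"), 2))  # G43 1.28
```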
Distance between a category-assigned parent GO term and the GO term to be categorized (γ)
$$\gamma = d(p, x_1) = I(x_1) - I(p)$$
where p is a parent term assigned to a category and x1 is the term whose category is to be determined.
In this example, γ = d(G22, G32) = 12.31 − 12.20 = 0.11.
Semantic similarity score
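$$S(x_1, x_2) = \frac{1}{1 + \gamma} \cdot \frac{\alpha}{\alpha + \beta}$$
where 0 ≤ S(x1, x2) ≤ 1. With the values above (α = 12.20, β = 1.28, γ = 0.11), S(G22, G32) = (1/1.11) × (12.20/13.48) ≈ 0.815.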
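Putting α, β and γ together, the sketch below reproduces the scores reported for G22 in the assignment examples (0.815 for G32, 0.475 for G41 and 0.346 for G43). It assumes S = α / ((1 + γ)(α + β)), the combination that yields these numbers from the IC values quoted in the text, and it treats G41 and G43 as having no more informative descendants, which is also what the reported scores imply. The most-informative-descendant table is read off the worked example rather than computed.

```python
# IC values from the worked example; G0 is the root.
IC = {"G0": 0.0, "G22": 12.20, "G32": 12.31, "G41": 13.17, "G43": 13.90}
ROOT = "G0"
# Most informative descendant of each term, taken from the worked example;
# a term without descendants maps to itself.
MOST_INFORMATIVE = {"G22": "G43", "G32": "G41", "G41": "G41", "G43": "G43"}


def d(ancestor: str, descendant: str) -> float:
    """IC difference between a term and one of its descendants."""
    return IC[descendant] - IC[ancestor]


def similarity(p: str, x: str) -> float:
    """S(p, x) for a category-assigned ancestor p and a term x to be categorized."""
    alpha = d(ROOT, p)                                                   # depth of p
    beta = (d(p, MOST_INFORMATIVE[p]) + d(x, MOST_INFORMATIVE[x])) / 2   # distance to leaves
    gamma = d(p, x)                                                      # distance p -> x
    return (1 / (1 + gamma)) * (alpha / (alpha + beta))


for term in ("G32", "G41", "G43"):
    print(term, round(similarity("G22", term), 3))
# G32 0.815
# G41 0.475
# G43 0.346
```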
Conventional semantic similarity measures were developed to assess how similar two GO terms are, but categorization is about assessing how well a specific term belongs to another term or to a group of other terms. Thus, we use semantic similarity in the categorization process but require that a categorized term be a child of a term in the assigned category. For instance, the two sibling terms ‘DNA-templated transcription initiation (GO:0006352)’ and ‘DNA-templated transcription elongation (GO:0006354)’ are semantically very similar. Both could be categorized under their parent term ‘RNA biosynthetic process (GO:0032774)’, because transcription initiation and elongation are both important steps in RNA biosynthesis. However, they cannot be categorized under each other, because transcription initiation and elongation are two different molecular processes. Therefore, Categorizer first determines whether a term to be categorized is a child of only one or of several category-assigned terms. If it is the child of only one term that has a category assignment, the similarity score of this parent–child pair is set to 1 and the term is assigned to the corresponding category. For a term that is a child of two or more category-assigned terms, Categorizer assesses the semantic similarity between this term and all of these category-assigned terms and then assigns it to the category with the highest semantic similarity score.
We demonstrate the procedure in the following examples. In the example shown in Figure 2B, the user assigned the term G22 to category A and the term G23 to category B. First, Categorizer automatically identifies child terms that belong to a single category only (e.g., G31 → A, G33 → B and G42 → B). For GO terms that have multiple parents, i.e., that could belong to two or more categories (G32, G41 and G43), semantic similarity scores are calculated with every ancestor term that has been assigned to a category, either by the user or in this first step. Each such GO term is then assigned to the category with the highest semantic similarity score.
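The decision rule described above can be sketched as a small function. For one term, it takes the category-assigned ancestors together with precomputed similarity scores: when all of them share one category, the term goes there with score 1; otherwise it goes to the category of the highest-scoring ancestor. The `Candidate` type and `assign_best` name are hypothetical, and the sketch follows the worked example in treating "single category" (rather than "single ancestor") as the trigger for the score-1 shortcut. The scores used below are those reported in the assignment examples that follow.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    ancestor: str   # category-assigned ancestor of the term
    category: str   # category of that ancestor
    score: float    # semantic similarity S(ancestor, term)


def assign_best(candidates: list[Candidate]) -> tuple[str, float]:
    """Return (category, score) for one GO term."""
    categories = {c.category for c in candidates}
    if not categories:
        raise ValueError("term has no category-assigned ancestor")
    if len(categories) == 1:
        # Only one category is reachable: assign directly with score 1.
        return categories.pop(), 1.0
    best = max(candidates, key=lambda c: c.score)
    return best.category, best.score


g32 = [Candidate("G22", "A", 0.815), Candidate("G23", "B", 0.078)]
g43 = [Candidate("G31", "A", 0.350), Candidate("G22", "A", 0.346),
       Candidate("G33", "B", 0.291), Candidate("G23", "B", 0.064)]
g31 = [Candidate("G22", "A", 0.0)]  # placeholder score; ignored by the shortcut

print(assign_best(g32))  # ('A', 0.815)
print(assign_best(g43))  # ('A', 0.35)
print(assign_best(g31))  # ('A', 1.0)
```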
Assignment example G32
Categorizer calculates the pairwise semantic similarities of G32 with all GO terms that belong to category A and are parents of G32: S(G22, G32). In the same way, Categorizer calculates the semantic similarities of G32 with the terms in category B: S(G23, G32). Since S(G22, G32) = 0.815 and S(G23, G32) = 0.078, a gene annotated with G32 is more likely to belong to category A.
Assignment example G41
Categorizer calculates the pairwise semantic similarities S(G22, G41) and S(G23, G41). Since S(G22, G41) = 0.475 and S(G23, G41) = 0.071, a gene annotated with G41 should belong to category A.
Assignment example G43
Categorizer calculates the pairwise semantic similarities S(G31, G43), S(G22, G43), S(G23, G43) and S(G33, G43). Since S(G31, G43) = 0.350, S(G22, G43) = 0.346, S(G33, G43) = 0.291 and S(G23, G43) = 0.064, we can infer that G43 is closer to G31 than to G33 in a biological sense, and accordingly a gene annotated with G43 should belong to category A.
A GO term can also be allowed to go into multiple categories if its semantic similarity score is above a user-defined threshold. For instance, a gene annotated with G32 can belong to category A and/or B, depending on the semantic similarities and the user-defined threshold. The default threshold in Categorizer is 0.3. This value was determined by calculating the average semantic similarity score for pairs of randomly selected GO terms that are linked directly or indirectly by a parent–child relationship. The average score was 0.10 ± 0.12, and accordingly Categorizer uses 0.3 as the default cutoff value for reliable categorization. After genes have been assigned to one or several categories, the enrichments of the categories are calculated.
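A minimal sketch of the optional multi-category mode follows, assuming that each category is represented by its best-scoring category-assigned ancestor and that a category is kept when that score is above the threshold (0.3 by default). What Categorizer does when no category passes the threshold is not described here; this sketch simply returns no category in that case. The function name is hypothetical; the G43 scores come from the assignment example above.

```python
def categories_above(candidates: list[tuple[str, float]],
                     threshold: float = 0.3) -> dict[str, float]:
    """Keep every category whose best similarity score is above `threshold`."""
    best: dict[str, float] = {}
    for category, score in candidates:
        best[category] = max(score, best.get(category, float("-inf")))
    # If no category passes, an empty dict is returned (behaviour assumed here).
    return {c: s for c, s in best.items() if s > threshold}


# (category, score) pairs for G43, taken from the assignment example.
g43 = [("A", 0.350), ("A", 0.346), ("B", 0.291), ("B", 0.064)]
print(categories_above(g43))        # {'A': 0.35}
print(categories_above(g43, 0.25))  # {'A': 0.35, 'B': 0.291}
```

Lowering the threshold lets G43 enter category B as well, which illustrates how the cutoff trades specificity for coverage.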
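$$z(c) = \frac{n(c) - \mu(c)}{\sigma(c)}$$
where n(c) is the number of input genes assigned to category c, and μ(c) and σ(c) denote the mean and standard deviation of the number of genes assigned to category c obtained from the randomization. The p-value for each category is calculated from its z-score.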
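The randomization behind μ(c) and σ(c) is not spelled out in this text. The sketch below assumes one plausible scheme: draw many random gene sets of the same size as the input from the reference set, count the genes per category in each draw, and convert the resulting z-score into a one-sided p-value using the upper tail of the standard normal distribution. The function name, its parameters and the toy data are hypothetical, and Categorizer may differ both in its sampling details and in how p-values are derived from z-scores.

```python
import math
import random
from statistics import mean, pstdev


def enrichment(input_genes, reference_genes, gene_categories, n_random=1000, seed=0):
    """Return {category: (z, p)} using random same-size draws from the reference set."""
    rng = random.Random(seed)
    reference = sorted(reference_genes)  # fixed order for reproducible sampling
    categories = {c for g in reference for c in gene_categories.get(g, ())}

    def count(genes):
        tally = dict.fromkeys(categories, 0)
        for gene in genes:
            for c in gene_categories.get(gene, ()):
                if c in tally:
                    tally[c] += 1
        return tally

    observed = count(input_genes)
    draws = [count(rng.sample(reference, len(input_genes))) for _ in range(n_random)]
    results = {}
    for c in categories:
        values = [draw[c] for draw in draws]
        mu, sigma = mean(values), pstdev(values)  # μ(c) and σ(c)
        z = (observed[c] - mu) / sigma if sigma > 0 else 0.0
        p = 0.5 * math.erfc(z / math.sqrt(2))     # one-sided upper-tail p-value
        results[c] = (z, p)
    return results


# Toy data: 1000 reference genes; g0-g99 are in category X, g100-g199 in Y.
reference = [f"g{i}" for i in range(1000)]
categories = {g: ["X"] for g in reference[:100]}
categories.update({g: ["Y"] for g in reference[100:200]})
hits = reference[:50]  # input set made only of X genes

result = enrichment(hits, reference, categories)
# X should come out strongly enriched (large z, tiny p) and Y depleted.
```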
Categories provided with Categorizer
Cell cycle, Cytoskeleton, Metabolism, Transcription, Translation, Protein folding, Proteolysis, Signaling, RNA processing, Splicing, Transmembrane transport, Intracellular localization, Protein transport, Nuclear transport, Vesicles, Golgi/ER, Mitochondria, Endo- and exo-cytosis, Lysosome, Peroxisome, Ribosomes, Phagocytosis/phagosome, Autophagy, Apoptosis, DNA repair, DNA replication, Receptors
Cytoplasm, Mitochondria, Golgi, Nucleus, Cytoskeleton, Vesicle/Lysosome, ER, Extracellular
Hydrolase, Isomerase, Ligase, Lyase, Oxidoreductase, Transferase
Genetic modifiers of Huntington’s disease
To demonstrate the functionality of Categorizer, we first analyzed the enrichment of specific categories in a set of genes that have been identified as genetic modifiers in Drosophila models of Huntington’s disease (HD). The data were compiled from NeuroGeM, a database of genetic modifiers of neurodegenerative diseases including HD, Alzheimer’s disease, Parkinson’s disease, amyotrophic lateral sclerosis and several types of spinocerebellar ataxia [28, 29]. Modifiers are genes that are capable of modulating disease phenotypes, in this case neuronal cell death caused by protein aggregation.
We categorized genetic modifiers into nine groups that are of interest to researchers studying HD: cell cycle (cell cycle, GO:0007049), cytoskeleton (cytoskeleton organization, GO:0007010), metabolism (metabolic process, GO:0008152), protein synthesis (gene expression, GO:0010467), protein folding (protein folding, GO:0006457), proteolysis (proteolysis, GO:0006508), signaling (signal transduction, GO:0007165), splicing (RNA splicing, GO:0008380), and transport (transport, GO:0006810). We loaded the Drosophila gene-to-GO annotation file (downloaded from FlyBase in March 2014) and entered the list of high-confidence genetic modifiers of HD (210 genes) obtained from NeuroGeM. As a reference set, we entered all Drosophila genes that had been tested experimentally as modifiers (7896 genes). We allowed a gene to be included in multiple categories, with the default cutoff value of 0.3.
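The nine HD categories above can be written down as a simple mapping from category name to the GO term that anchors it. How Categorizer itself stores or reads category definitions is not described here, so this dictionary layout and the reverse-lookup name are hypothetical; the GO identifiers are the ones listed in the text.

```python
import re

# The nine HD categories described above, each anchored at one GO term.
# Hypothetical layout: Categorizer's own input format is not described here.
HD_CATEGORIES: dict[str, str] = {
    "cell cycle": "GO:0007049",
    "cytoskeleton": "GO:0007010",
    "metabolism": "GO:0008152",
    "protein synthesis": "GO:0010467",
    "protein folding": "GO:0006457",
    "proteolysis": "GO:0006508",
    "signaling": "GO:0007165",
    "splicing": "GO:0008380",
    "transport": "GO:0006810",
}

GO_ID = re.compile(r"GO:\d{7}")

# Basic sanity checks: well-formed IDs, and no GO term anchors two categories.
assert all(GO_ID.fullmatch(go_id) for go_id in HD_CATEGORIES.values())
assert len(set(HD_CATEGORIES.values())) == len(HD_CATEGORIES)

# Reverse lookup from anchor GO term to category name.
CATEGORY_OF_ANCHOR = {go_id: name for name, go_id in HD_CATEGORIES.items()}
print(CATEGORY_OF_ANCHOR["GO:0006457"])  # protein folding
```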
With this information, Categorizer assigned the genetic modifiers to the defined categories. As shown in Figure 3B, the categorization results for each gene are reported in the middle of the graphical user interface (GUI), i.e., the categories to which each gene is assigned are listed together with the semantic similarity score in parentheses. On the left side of the GUI, a pie chart displays the category statistics. In this example, the metabolism category is the largest, while the protein folding category is the smallest. On the right side of the GUI, the category enrichment analysis results are shown (see Enrichment analysis in Implementation for details). Consistent with the known importance of the protein folding machinery in the pathogenesis of neurodegenerative diseases [30, 31], the protein folding category is highly enriched among genetic modifiers of HD, even though these genes account for only a small portion of the modifiers. Additionally, the cell cycle, cytoskeleton, protein synthesis and splicing categories are also enriched among the genetic modifiers of HD. This finding is consistent with recent research on neurodegeneration, and on HD in particular [32–39].
In the given example, we categorized genetic modifiers of HD into broad biological processes and calculated their enrichment. However, a user interested in signal transduction could instead define categories such as NF-kappaB cascade or TOR signaling. It is up to the user to decide how specific or broad the defined categories are.
Comparison of analysis results generated with Categorizer and classical approaches using GO Slim terms
Categorizer differs from GO Slim-based methods in that it identifies biologically relevant categories by using both the graph structure of GO and the semantic similarities between GO terms. We therefore compared the performance of Categorizer with that of the classical methods using GO Slim. First, we assessed the accuracy of category assignment by comparing the assignment results of Categorizer and the GO Slim approach for a gold standard set of genes. Second, we evaluated the quality of enrichment analyses by comparing the results of the two approaches for the 210 high-confidence genetic modifiers of HD (used in Figure 3).
Next, we compared the quality of the enrichment analyses of the two approaches by analyzing their enrichment results for the 210 genetic modifiers of HD used in Figure 3. The category statistics and enrichment results generated by Categorizer and by the GO Slim-based approach are shown in Figure 4B and C. The GO Slim approach identified the categories ‘cell cycle’ and ‘cytoskeleton’ as significantly enriched among the genetic modifiers of HD, which is consistent with the results found by Categorizer (Figure 4B). However, the three categories ‘protein folding’, ‘protein synthesis’ and ‘splicing’ were not identified as enriched by the GO Slim approach (p-value > 10⁻²). This result of the GO Slim approach is in stark contrast to the literature on modifiers of neurodegenerative diseases, including HD: genes whose products are involved in protein folding, protein synthesis and splicing have been found in most screens for modifiers of neurodegenerative diseases carried out to date [31, 40–42]. As shown in Figure 4C, Categorizer and GO Slim assigned the same number of input genes to the protein folding and splicing categories. However, in the randomized model of the reference gene set, the GO Slim approach assigned more genes to these categories than Categorizer did. Therefore, the p-values obtained by the GO Slim method were larger than those obtained by Categorizer. Interestingly, Categorizer identified protein synthesis as enriched, in contrast to the GO Slim approach, although Categorizer assigned fewer input genes to the protein synthesis category than GO Slim. The solution to this conundrum is that Categorizer assigned far fewer genes in the reference set to protein synthesis than GO Slim did. Overall, these comparisons reveal that Categorizer provides more reliable categorization and enrichment results than the conventional GO analysis method.
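The protein-synthesis conundrum can be illustrated with a closed-form approximation. When random gene sets of size n are drawn without replacement from a reference of N genes, K of which carry a category, the count in that category follows a hypergeometric distribution with mean nK/N and variance nK/N · (1 − K/N) · (N − n)/(N − 1). This is a swapped-in analytic stand-in for the randomization, not Categorizer's procedure; n = 210 and N = 7896 come from the text, while the category counts below are hypothetical.

```python
import math


def hypergeometric_z(observed: int, n: int, N: int, K: int) -> float:
    """z-score of `observed` against the hypergeometric mean and SD."""
    p = K / N
    mu = n * p
    sd = math.sqrt(n * p * (1 - p) * (N - n) / (N - 1))
    return (observed - mu) / sd


n_input, n_reference = 210, 7896  # modifiers and reference set sizes from the text
# Hypothetical counts: method 1 puts fewer input genes in the category but also
# far fewer reference genes; method 2 puts more input genes but many more reference genes.
z_method1 = hypergeometric_z(observed=10, n=n_input, N=n_reference, K=100)
z_method2 = hypergeometric_z(observed=12, n=n_input, N=n_reference, K=400)
print(f"{z_method1:.1f} {z_method2:.1f}")  # 4.6 0.4
```

Although method 1 places fewer input genes in the category, its much smaller expected count makes the category look strongly enriched, whereas method 2's larger reference count absorbs its extra hits.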
Comparison with other semantic similarity measures
Here we developed a flexible and extendable tool that can be used to find over-represented categories within sets of genes. Categorizer classifies genes into categories according to their biological meaning and assesses the enrichment of these categories. Thus, Categorizer offers a new way of performing enrichment analysis that focuses on the processes of specific interest to the user.
This research was supported by NSERC and the Chung-Ang University Research Grants in 2013. This research was also supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT & Future Planning (NRF-2014R1A1A1003444). We would like to thank Alex Cumberworth and Guillaume Lamour for their help in developing our web site.
- Gene Ontology. [http://geneontology.org]
- Huang DW, Sherman BT, Lempicki RA: Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. 2009, 37: 1-13. 10.1093/nar/gkn923.
- Huang DW, Sherman BT, Lempicki RA: Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc. 2008, 4: 44-57. 10.1038/nprot.2008.211.
- Maere S, Heymans K, Kuiper M: BiNGO: a cytoscape plugin to assess overrepresentation of gene ontology categories in biological networks. Bioinformatics. 2005, 21: 3448-3449. 10.1093/bioinformatics/bti551.
- Berriz GF, Beaver JE, Cenik C, Tasan M, Roth FP: Next generation software for functional trend analysis. Bioinformatics. 2009, 25: 3043-3044. 10.1093/bioinformatics/btp498.
- Zhang S, Binari R, Zhou R, Perrimon N: A genomewide RNA interference screen for modifiers of aggregates formation by mutant Huntingtin in Drosophila. Genetics. 2010, 184: 1165-1179. 10.1534/genetics.109.112516.
- Doumanis J, Wada K, Kino Y, Moore AW, Nukina N: RNAi screening in Drosophila cells identifies new modifiers of mutant huntingtin aggregation. PLoS ONE. 2009, 4: e7275. 10.1371/journal.pone.0007275.
- Harris MA, Clark J, Ireland A, Lomax J, Ashburner M, Foulger R, Eilbeck K, Lewis S, Marshall B, Mungall C, Richter J, Rubin GM, Blake JA, Bult C, Dolan M, Drabkin H, Eppig JT, Hill DP, Ni L, Ringwald M, Balakrishnan R, Cherry JM, Christie KR, Costanzo MC, Dwight SS, Engel S, Fisk DG, Hirschman JE, Hong EL, Nash RS, et al: The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res. 2004, 32: D258-D261. 10.1093/nar/gkh036.
- GO Slims. [http://www.geneontology.org/GO.slims.shtml]
- Pesquita C, Faria D, Falcão AO, Lord P, Couto FM: Semantic similarity in biomedical ontologies. PLoS Comput Biol. 2009, 5: e1000443. 10.1371/journal.pcbi.1000443.
- Mazandu GK, Mulder NJ: Information content-based gene ontology semantic similarity approaches: toward a unified framework theory. Biomed Res Int. 2013, 2013: Article ID 292063.
- Resnik P: Using information content to evaluate semantic similarity in a taxonomy. Proceedings of the 14th International Joint Conference on Artificial Intelligence. Volume 1. 1995, 448-453.
- Lin D: An information-theoretic definition of similarity. Proceedings of the 15th Conference on Machine Learning. 1998, 296-304.
- Jiang JJ, Conrath DW: Semantic similarity based on corpus statistics and lexical taxonomy. Proceedings of the 15th International Conference on Research in Computational Linguistics. 1997, 19-33.
- Mazandu GK, Mulder NJ: DaGO-Fun: tool for Gene Ontology-based functional analysis using term information content measures. BMC Bioinformatics. 2013, 14: 284.
- Xu Y, Guo M, Shi W, Liu X, Wang C: A novel insight into Gene Ontology semantic similarity. Genomics. 2013, 101: 368-375. 10.1016/j.ygeno.2013.04.010.
- Couto FM, Silva MJ: Disjunctive shared information between ontology concepts: application to Gene Ontology. J Biomed Sem. 2011, 2: 5. 10.1186/2041-1480-2-5.
- Speer N, Spieth C, Zell A: A memetic clustering algorithm for the functional partition of genes based on the gene ontology. Proceedings of the 2004 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB 2004). 2004, 252-259.
- Schlicker A, Rahnenführer J, Albrecht M, Lengauer T, Domingues FS: GOTax: investigating biological processes and biochemical activities along the taxonomic tree. Genome Biol. 2007, 8: R33. 10.1186/gb-2007-8-3-r33.
- Du Z, Li L, Chen C-F, Yu PS, Wang JZ: G-SESAME: web tools for GO-term-based gene similarity analysis and knowledge discovery. Nucleic Acids Res. 2009, 37: W345-W349. 10.1093/nar/gkp463.
- del Pozo A, Pazos F, Valencia A: Defining functional distances over gene ontology. BMC Bioinformatics. 2008, 9: 50. 10.1186/1471-2105-9-50.
- Schlicker A, Lengauer T, Albrecht M: Improving disease gene prioritization using the semantic similarity of Gene Ontology terms. Bioinformatics. 2010, 26: i561-i567. 10.1093/bioinformatics/btq384.
- Yilmaz S, Jonveaux P, Bicep C, Pierron L, Smaïl-Tabbone M, Devignes MD: Gene-disease relationship discovery based on model-driven data integration and database view definition. Bioinformatics. 2009, 25: 230-236. 10.1093/bioinformatics/btn612.
- UniProtKB. [http://www.uniprot.org]
- Lord PW, Stevens RD, Brass A, Goble CA: Investigating semantic similarity measures across the Gene Ontology: the relationship between sequence and annotation. Bioinformatics. 2003, 19: 1275-1283. 10.1093/bioinformatics/btg153.
- Wu X, Pang E, Lin K, Pei Z-M: Improving the measurement of semantic similarity between gene ontology terms and gene products: insights from an edge- and IC-based hybrid method. PLoS ONE. 2013, 8: e66745. 10.1371/journal.pone.0066745.
- Glass K, Girvan M: Annotation enrichment analysis: an alternative method for evaluating the functional properties of gene sets. Sci Rep. 2014, 4: 4191.
- Na D, Rouf M, O’Kane CJ, Rubinsztein DC, Gsponer J: NeuroGeM, a knowledgebase of genetic modifiers in neurodegenerative diseases. BMC Med Genomics. 2013, 6: 52. 10.1186/1755-8794-6-52.
- NeuroGeM. [http://chibi.ubc.ca/neurogem]
- Lu B, Vogel H: Drosophila models of neurodegenerative diseases. Annu Rev Pathol Mech Dis. 2009, 4: 315-342. 10.1146/annurev.pathol.3.121806.151529.
- Shorter J: Hsp104: a weapon to combat diverse neurodegenerative disorders. Neurosignals. 2008, 16: 63-74. 10.1159/000109760.
- Li S-H, Li X-J: Huntingtin–protein interactions and the pathogenesis of Huntington’s disease. Trends Genet. 2004, 20: 146-154. 10.1016/j.tig.2004.01.008.
- Nucifora FC, Sasaki M, Peters MF, Huang H, Cooper JK, Yamada M, Takahashi H, Tsuji S, Troncoso J, Dawson VL, Dawson TM, Ross CA: Interference by huntingtin and atrophin-1 with cbp-mediated transcription leading to cellular toxicity. Science. 2001, 291: 2423-2428. 10.1126/science.1056784.
- Huang CC, Faber PW, Persichetti F, Mittal V, Vonsattel J-P, MacDonald ME, Gusella JF: Amyloid formation by mutant Huntingtin: threshold, progressivity and recruitment of normal polyglutamine proteins. Somat Cell Mol Genet. 1998, 24: 217-233.
- Li SH, Cheng AL, Zhou H, Lam S, Rao M, Li H, Li XJ: Interaction of Huntington disease protein with transcriptional activator Sp1. Mol Cell Biol. 2002, 22: 1277-1287. 10.1128/MCB.22.5.1277-1287.2002.
- Takano H, Gusella JF: The predominantly HEAT-like motif structure of huntingtin and its association and coincident nuclear entry with dorsal, an NF-kB/Rel/dorsal family transcription factor. BMC Neurosci. 2002, 3: 15. 10.1186/1471-2202-3-15.
- Steffan JS, Kazantsev A, Spasic-Boskovic O, Greenwald M, Zhu YZ, Gohler H, Wanker EE, Bates GP, Housman DE, Thompson LM: The Huntington’s disease protein interacts with p53 and CREB-binding protein and represses transcription. Proc Natl Acad Sci. 2000, 97: 6763-6768. 10.1073/pnas.100110097.
- Culver BP, Savas JN, Park SK, Choi JH, Zheng S, Zeitlin SO, Yates JR, Tanese N: Proteomic analysis of wild-type and mutant Huntingtin-associated proteins in mouse brains identifies unique interactions and involvement in protein synthesis. J Biol Chem. 2012, 287: 21599-21614. 10.1074/jbc.M112.359307.
- Mills JD, Janitz M: Alternative splicing of mRNA in the molecular pathology of neurodegenerative diseases. Neurobiol Aging. 2012, 33: 1012.e11-1012.e24. 10.1016/j.neurobiolaging.2011.10.030.
- Branco J, Al-Ramahi I, Ukani L, Pérez AM, Fernandez-Funez P, Rincón-Limas D, Botas J: Comparative analysis of genetic modifiers in Drosophila points to common and distinct mechanisms of pathogenesis among polyglutamine diseases. Hum Mol Genet. 2007, 17: 376-390. 10.1093/hmg/ddm315.
- Pallos J, Bodai L, Lukacsovich T, Purcell JM, Steffan JS, Thompson LM, Marsh JL: Inhibition of specific HDACs and sirtuins suppresses pathogenesis in a Drosophila model of Huntington’s disease. Hum Mol Genet. 2008, 17: 3767-3775. 10.1093/hmg/ddn273.
- Fujikake N, Nagai Y, Popiel HA, Okamoto Y, Yamaguchi M, Toda T: Heat shock transcription factor 1-activating compounds suppress polyglutamine-induced neurodegeneration through induction of multiple molecular chaperones. J Biol Chem. 2008, 283: 26188-26197. 10.1074/jbc.M710521200.
- Mazandu GK, Mulder NJ: A topology-based metric for measuring term similarity in the gene ontology. Adv Bioinformatics. 2012, 2012: 975783.
- Wang JZ, Du Z, Payattakool R, Yu PS, Chen C-F: A new method to measure the semantic similarity of GO terms. Bioinformatics. 2007, 23: 1274-1281. 10.1093/bioinformatics/btm087.
- Zhang P, Zhang J, Sheng H, Russo JJ, Osborne B, Buetow K: Gene functional similarity search tool (GFSST). BMC Bioinformatics. 2006, 7: 135. 10.1186/1471-2105-7-135.
- Guo X, Liu R, Shriver CD, Hu H, Liebman MN: Assessing semantic similarity measures for the characterization of human regulatory pathways. Bioinformatics. 2006, 22: 967-973. 10.1093/bioinformatics/btl042.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.