In this work, we applied a unique cross-species, evidence-based strategy to prioritize genes involved in alcoholism. We started with a set of genes with prior microarray expression evidence of involvement in ethanol response, representing approximately 10% of human protein-coding genes. These genes were ranked using additional sources of evidence across multiple species, including humans, mice, C. elegans and Drosophila. We used the COGA GWAS dataset and applied permutation analysis to identify the best weighting score matrix for gene ranking. Based on these results, we selected the 47 genes with the strongest evidence for follow-up bioinformatics analysis. Functional enrichment testing of these 47 genes suggested that the ranking algorithm identifies gene sets with coherent biological functions relevant to brain responses to ethanol and to the neural adaptations that occur with alcoholism. Remarkably, genes with higher ranking scores were enriched for significant SNP associations in the COGA alcohol dependence GWAS results. These results provide initial evidence that a cross-species analysis of gene networks correlated with molecular or behavioral responses to ethanol may be a powerful strategy for identifying candidate genes that contribute to alcoholism.
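To make the idea of a permutation analysis concrete, the following is a minimal sketch of one way a ranking could be compared against chance. It is not the procedure used in this study: the function name, the toy gene identifiers, the definition of a "hit" gene, and the number of permutations are all assumptions chosen for illustration.

```python
import random


def permutation_pvalue(scores, hits, top_n, n_perm=1000, seed=0):
    """Empirical p-value that the top_n ranked genes contain at least as many
    'hit' genes as random gene sets of the same size (hypothetical helper)."""
    rng = random.Random(seed)
    genes = sorted(scores)  # fixed order so the result is reproducible
    ranked = sorted(genes, key=lambda g: scores[g], reverse=True)
    observed = sum(1 for g in ranked[:top_n] if g in hits)
    at_least = 0
    for _ in range(n_perm):
        sample = rng.sample(genes, top_n)  # random gene set of equal size
        if sum(1 for g in sample if g in hits) >= observed:
            at_least += 1
    # Add-one correction keeps the empirical p-value above zero.
    return observed, (at_least + 1) / (n_perm + 1)


# Toy data (not from the study): 100 placeholder genes with distinct scores.
toy_scores = {f"GENE_{i:03d}": 100 - i for i in range(100)}
# Assume the five highest-scoring genes carry a significant GWAS signal.
toy_hits = {f"GENE_{i:03d}" for i in range(5)}

observed, p_empirical = permutation_pvalue(toy_scores, toy_hits, top_n=5)
```

In a sketch like this, candidate weighting matrices could be compared by the empirical p-value each one yields, although the criterion actually used to choose among matrices is not specified here.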
The identification of genes mediating biological responses to ethanol, including genes that modify the risk of alcoholism, is an area of intense research interest because it may pinpoint targets for future alcoholism therapies. Recent advances in behavioral genetics and genomics have identified large numbers of genes that potentially contribute to phenotypic responses to ethanol in both humans and animal models. However, little progress has been made in narrowing or organizing these large gene lists into a tractable scheme for understanding the neurobiology and genetics of alcoholism. One approach applied to large collections of microarray data is a meta-analysis of data on rodent models of divergent ethanol drinking, collected from multiple centers and strains. However, this analysis identified 3,800 genes associated with variation in ethanol intake, making downstream hypothesis-driven studies difficult to formulate.
As discussed in the Background, we developed a gene ranking algorithm that integrates data on ethanol-related genes across species. We recognized that direct behavioral parallels in ethanol response across humans, mice, Drosophila and C. elegans were likely to be tenuous or non-existent. However, molecular commonalities underlying ethanol responses across species, if they could be identified, should provide a powerful validation mechanism for candidate genes involved in behavioral responses to ethanol, even if the particular behavioral components differ across species.
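As an illustration of how evidence from several species can be combined into a single ranking score, the sketch below sums a weight for each evidence source that supports a gene. The source names, the weights, and the gene identifiers are hypothetical placeholders, not the weighting matrix or datasets used in this study.

```python
# Hypothetical weights per evidence source (assumed values for illustration).
WEIGHTS = {
    "human_linkage": 1.0,
    "human_association": 1.0,
    "mouse_expression": 0.5,
    "mouse_qtl": 0.5,
    "fly": 0.5,
    "worm": 0.5,
}


def rank_genes(evidence, weights):
    """Score each gene as the sum of weights of its supporting sources,
    then sort by descending score (ties broken by gene name)."""
    scores = {
        gene: sum(weights[source] for source in sources)
        for gene, sources in evidence.items()
    }
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Toy evidence table (placeholder genes, not real results).
toy_evidence = {
    "GENE_A": {"human_association", "mouse_expression", "fly"},
    "GENE_B": {"mouse_expression"},
    "GENE_C": {"human_linkage", "mouse_qtl", "worm", "fly"},
}

ranking = rank_genes(toy_evidence, WEIGHTS)
```

An additive score of this kind rewards genes supported by independent lines of evidence in more than one species, which is the property the cross-species argument above relies on.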
Our ranking algorithm, while largely empirical at this stage, produced a ranked list of genes that is clearly coherent in terms of functional gene networks. A remarkably large number of genes already validated as altering behavioral responses to ethanol were found in the higher ranks. In addition, bioinformatics analysis showed several interesting biological functions that were over-represented among the ranked genes (Tables 6, 7, 8, 9), which is largely consistent with our previous analysis based on a network approach. Again, a number of individual genes from the constructed networks have strong prior validation as candidate genes that influence alcoholism traits in humans or behavioral responses to ethanol in animal models. These validated genes increase the probability that the entire gene network plays a role in ethanol responses.
Although gene targeting in animal models might ultimately be the most robust method for validating the role of individual genes in ethanol response behaviors, such studies are complex and time-consuming. As an initial approach to validating our cross-species ranking algorithm, we instead examined the association of the gene ranking score with alcoholism traits in a GWAS analysis. We found that the minimum FDR q-value decreased as the ranking score cutoff increased to 2.0. It is important to note that this effect is not due to the progressive reduction in the number of markers examined, because in this study the FDR does not depend on the number of tests performed.
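The relationship between the ranking score cutoff and the minimum q-value can be sketched as follows. This example assumes Benjamini-Hochberg q-values recomputed within each gene subset; the FDR procedure actually used, and whether q-values were recomputed per subset, are not stated here, so both are assumptions. All gene names, scores, and p-values are toy placeholders.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg q-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


def min_q_by_cutoff(snp_pvalues, gene_scores, cutoffs):
    """Minimum q-value among SNPs in genes scoring at or above each cutoff."""
    result = {}
    for cutoff in cutoffs:
        subset = [p for gene, p in snp_pvalues
                  if gene_scores.get(gene, 0.0) >= cutoff]
        result[cutoff] = min(bh_qvalues(subset)) if subset else None
    return result


# Toy inputs (not study data): per-gene ranking scores and per-SNP p-values.
toy_gene_scores = {"GENE_A": 2.5, "GENE_B": 2.0, "GENE_C": 1.0}
toy_snps = [("GENE_A", 0.01), ("GENE_A", 0.04),
            ("GENE_B", 0.03), ("GENE_C", 0.5)]

min_q = min_q_by_cutoff(toy_snps, toy_gene_scores, [1.0, 2.0, 2.5, 3.0])
```

A cutoff that leaves no SNPs returns `None` in this sketch, which mirrors the practical concern that very stringent cutoffs leave too few markers to evaluate.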
Although the results are encouraging, the limitations of the current analysis and possible improvements must be noted. When the gene rank score cutoff increased from 2.0 to 2.5, the downward trend in q-values reversed. This observation might be attributed to overly restrictive gene selection, given that the number of SNPs in genes dropped from 2,293 in 47 genes to 210 in only 6 genes. Another limitation is that the genes from the addiction/alcoholism array represent hypotheses about important genes, as selected by expert review, rather than a selection based on empirical association data. The current approach could be improved in the following ways. First, although we included seven datasets in the gene ranking, many additional datasets now exist or will be released in the near future that may be used in multi-species data integration. Second, this single GWAS dataset is likely to be underpowered, given recent evidence that many loci of small effect influence most complex human traits. However, a network or pathway approach that analyzes a set of genes together might improve power.
While there are undoubtedly numerous ways to score or weight genes, our results show that this simple empirical approach is effective and demonstrate the utility of gene ranking after cross-species data integration. We are therefore continuing to expand the number of datasets and to improve the scoring scheme through a more sophisticated optimization of weighting parameters. As more data are included, additional alcohol GWAS results become available, and more sophisticated gene ranking algorithms are developed, we expect specificity and sensitivity to improve. For example, there are many gene expression studies in rat brain from animals evaluated for alcohol-preference behavior [2, 28–31], and these will be integrated into future gene ranking. Even at this stage, our initial gene targeting experiments in animal models, using the ranked gene lists derived in this study, have already identified several novel genes that alter ethanol response behaviors in mice, Drosophila or C. elegans (unpublished data). This provides direct support for our cross-species gene ranking.