Searching joint association signals in CATIE schizophrenia genome-wide association studies through a refined integrative network approach
© Jia and Zhao; licensee BioMed Central Ltd. 2012
Published: 26 October 2012
Genome-wide association studies (GWAS) have generated a wealth of valuable genotyping data for complex diseases/traits. A large proportion of these data are embedded with many weakly associated markers that have been missed in traditional single marker analyses, but they may provide valuable insights in dissecting the genetic components of diseases. Gene set analysis (GSA) augmented by protein-protein interaction network data provides a promising way to examine GWAS data by analyzing the combined effects of multiple genes/markers, each of which may have only individually weak to moderate association effects. A critical issue in GSA of GWAS data is the definition of gene-wise P values based on multiple SNPs mapped to a gene.
In this study, we proposed an alternative restricted search approach based on our previously developed dense module search algorithm, and we demonstrated it in the CATIE GWAS dataset for schizophrenia. Specifically, we explored three ways of computing gene-wise P values and examined their effects on the resultant module genes. These methods calculate gene-wise P values based on all the SNPs, the top ranked SNPs, or the most significant SNP among all the SNPs mapped to a gene. We applied the restricted search approach and identified a module gene set for each of the gene-wise P value data sets. In our evaluation using an independent method, ALIGATOR, we showed that although each of these input datasets generated a unique set of module genes, all of them were significant in the GWAS dataset. Further functional enrichment analysis of these module genes showed that, at the pathway level, they were consistently related to neuro- and immune-related pathways. Finally, we compared our method with a previously reported method.
Our results showed that the approaches to computing gene-wise P values in GWAS data are critical in GSA. This work is useful for evaluating key factors in GSA of GWAS data.
Genome-wide association studies (GWAS) have emerged as a powerful tool to examine the genetic components of complex diseases. During the past six years, GWA studies have successfully uncovered a few thousand markers/genes that are associated with complex diseases/traits. So far, most standard GWA studies have focused on single marker based analysis and applied the genome-wide significance cutoff P value of 5 × 10⁻⁸ for detecting significant markers; however, many weakly or moderately associated markers (e.g., those whose P values are between 0.05 and 5 × 10⁻⁸) may also provide valuable insights. These markers have generally been missed in the standard analysis.
Gene set analysis (GSA) of GWAS data provides an alternative approach for assessing the joint effects of multiple genes, regardless of whether they are individually significant or not. Complex diseases are likely caused by multiple genes and markers, each of which may contribute only a weak to moderate effect. Given that these markers are biologically or functionally correlated, GSA would increase the power to detect them in a typical GWAS dataset. GSA typically uses pathways or functional categories of cellular processes to define gene sets, such as those from the KEGG database and the Gene Ontology (GO) annotations. Among the available GSA methods, representative ones include the Gene Set Enrichment Analysis (GSEA) of GWAS data, the Association LIst Go AnnoTatOR (ALIGATOR), and the Gene set Ridge regression in ASsociation Studies (GRASS). A more advanced GSA approach is to use protein-protein interaction (PPI) network data as the platform to dynamically search for "gene sets", namely network modules, and perform an enrichment test for association signals. Our dense module search of GWAS association signals in the PPI network is one of the first such methods. In addition, Rossin et al. used the PPI network to assess whether the association loci uncovered in standard GWA studies are significantly connected through PPIs. They adopted a straightforward way of defining subnetworks, i.e., given a list of loci and the genes located in them, "direct networks" and "indirect networks" are constructed based on the network data. However, the GWAS data are not effectively incorporated in the process of network building, so moderately associated genes (with P values between 5 × 10⁻⁸ and 0.05) are still missed. Therefore, more comprehensive methods are needed to integrate the GWAS data with the PPI data to help construct, prioritize, and evaluate subnetworks for complex diseases.
In most of these GSA methods, a critical issue is how to define the gene-wise P value, i.e., the P value representing the association signal at each gene region. In a typical GWAS dataset, statistics at the single nucleotide polymorphism (SNP) level are used, but biological data (pathways or PPI networks) are typically annotated to genes/proteins. Thus, there is a gap between marker-level significance and gene-wise significance. A popular method in the field is to select the most significant SNP from the multiple SNPs mapped to each gene and represent the gene by the smallest P value [2, 5, 10]. Although this method retains the association signals well and is easy to implement, using the minimum P value is subject to intrinsic biases arising from gene length, SNP density, and/or linkage disequilibrium (LD) structure. Recently, several methods have been reported that aim to compute gene-wise P values, such as GATES, which adapts the Simes' test within each gene, VEGAS, which builds on the multivariate normal distribution and takes into account pairwise LD values, and the SNP-set analysis. Incorporation of these methods into a gene set analysis of GWAS data can reduce potential biases at the gene level and improve the robustness of follow-up analyses.
In this work, we proposed a restricted search strategy to implement our previously developed dense module search (DMS) method. This new strategy could greatly reduce the computational burden. We demonstrated this method in a GWAS dataset for schizophrenia and explored three different ways to define gene-wise P values. Our results showed that the way gene-wise P values are defined could affect the network-based analysis substantially, and that caution is needed when designing such analyses and interpreting their results.
Materials and methods
We used the GWAS dataset for schizophrenia from the Clinical Antipsychotic Trial of Intervention Effectiveness (CATIE) project. The CATIE project is a multiphase randomized controlled trial initially designed to investigate and improve the use of antipsychotic medications in treating schizophrenia patients. We included samples from 738 schizophrenia patients and 733 controls, which were genotyped by Perlegen Sciences using the Affymetrix 500K and Perlegen's custom 164K chip, resulting in ~446 k genotyped SNPs. A detailed description of the samples can be found in the original CATIE GWAS report (Sullivan et al.). We accessed this dataset (Distribution 7.0) from http://www.nimhgenetics.org/ through NIMH approval. Only the Caucasian samples were used. We followed the pipeline of quality control, including the selection of samples and markers, as described in references [8, 16, 17].
Gene-wise P value
To compute gene-wise P values, given multiple SNPs mapped to each gene, an ideal algorithm should account for potential confounding factors, such as gene length, SNP density, and LD structure. We used the software tool VEGAS to compute the gene-wise P values. VEGAS combines the information of multiple SNPs by making use of simulations from the multivariate normal distribution while explicitly accounting for the LD between markers. We followed the default settings in VEGAS, which map SNPs to genes with a 50 kb extension of gene boundaries. The HapMap CEU samples (http://www.hapmap.org/, release R2) were selected to estimate the LD structure in our work, as we only included the Caucasian samples in the CATIE data.
We explored three options in VEGAS to compute gene-wise P values based on sets of SNPs: (1) using all the SNPs mapped to a gene (hereafter denoted as "VEGAS-all"); (2) using the top 10% of SNPs based on SNP-level P values ("VEGAS-top"); and (3) using the most significant SNP, i.e., the SNP with the smallest P value ("minP"). Note that the minP option is the same as the smallest P value strategy widely used in many post-GWAS analysis methods. We included it here for comparison with the other, more advanced approaches that combine multiple SNPs.
Our background PPI network data were collected from six public PPI databases: MINT, IntAct, DIP, BioGRID, HPRD, and MIPS/MPact. We downloaded these data from the Protein Interaction Network Analysis (PINA) platform (March 2010). To ensure the reliability of the PPI data, we included only interactions that have experimental evidence and in which both interactors are encoded by human genes. Self-interactions and duplicates were removed. The final network included a total of 10,377 nodes and 50,109 interactions.
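The SNP selection behind the three options can be illustrated with a minimal sketch. It only shows which SNP-level P values each option would start from; it does not reproduce the LD-aware simulation that VEGAS performs. The function names, the (position, P value) input layout, and the rounding rule for the top 10% are assumptions made for illustration.

```python
import math

def map_snps_to_gene(snps, gene_start, gene_end, flank=50_000):
    """Return P values of SNPs lying within the gene boundaries +/- flank (bp).

    `snps` is a list of (position, p_value) pairs on the gene's chromosome
    (a hypothetical input layout).
    """
    lo, hi = gene_start - flank, gene_end + flank
    return [p for pos, p in snps if lo <= pos <= hi]

def min_p(pvalues):
    """minP option: the smallest SNP-level P value mapped to the gene."""
    return min(pvalues)

def top_fraction(pvalues, fraction=0.10):
    """VEGAS-top style selection: the most significant `fraction` of SNPs.

    Rounding up and keeping at least one SNP is an assumption of this sketch.
    """
    n_keep = max(1, math.ceil(len(pvalues) * fraction))
    return sorted(pvalues)[:n_keep]

# Hypothetical SNPs around a gene spanning 100,000-150,000 bp.
snps = [(100, 0.5), (60_000, 0.01), (200_000, 0.2), (260_000, 0.001)]
mapped = map_snps_to_gene(snps, 100_000, 150_000)
print(mapped, min_p(mapped), top_fraction(mapped))
```

The VEGAS-all option would pass every mapped P value on to the gene-based test, whereas minP reduces the gene to a single value, which is why it is more exposed to gene length and SNP density.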
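The removal of self-interactions and duplicates can be sketched as follows. This is an illustration only, assuming interactions are given as pairs of gene symbols; it does not reflect the PINA file format or the evidence filtering described above.

```python
def clean_ppi(pairs):
    """Drop self-interactions and duplicate (undirected) interactions.

    `pairs` is an iterable of (gene_a, gene_b) tuples; A-B and B-A are
    treated as the same interaction.
    """
    edges = set()
    for a, b in pairs:
        if a == b:
            continue  # self-interaction
        edges.add(frozenset((a, b)))
    return edges

# Hypothetical raw interactions.
raw = [("A", "B"), ("B", "A"), ("A", "A"), ("B", "C")]
edges = clean_ppi(raw)
nodes = set().union(*edges)
print(len(nodes), "nodes,", len(edges), "interactions")
```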
Module search algorithm
We first overlaid the GWAS data onto the whole human PPI network by assigning each node a z-score: $z_i = \Phi^{-1}(1 - P_i)$, where $\Phi^{-1}$ is the inverse normal cumulative distribution function and $P_i$ is the gene-wise P value of gene i from any one of the three methods. For a module with k genes, a module score was computed according to Stouffer's Z-score method: $Z_m = \sum_{i=1}^{k} z_i / \sqrt{k}$. The detailed module construction process can be found in our previous work. Briefly, for a given "seed node", a "best module" with the maximum module score is returned in the context of the working parameters, i.e., d=2 and r>0.1. Here, d is the distance to the seed node within which nodes are considered for inclusion, and r is the cutoff value for the module score increment.
The overall network weighted by the GWAS data serves as the working background. The restricted search strategy we proposed in this study is implemented as follows. First, we ranked all the nodes in the network according to their weights $z_i$ and started with the node that had the highest score as the seed to perform a module search. Once the module was generated, all the nodes present in this module, including the seed node itself, were removed from the network, and the rest of the network constituted the new background network. A new round of module search was then started, each time with the highest scored node in the new background network, until none of the nodes in the background network could generate a module with ≥5 nodes (i.e., the minimum number of nodes we required to define a module). This restricted strategy takes into account every single node in the background network but does not require using each of them as a seed node, as was done in our previous work. Thus, this tactic could greatly reduce the computational burden and also avoid the heavy redundancy among the modules produced by our original algorithm.
To estimate the significance of the resultant modules, we calculated P values based on the module scores ($Z_m$). We adopted the method proposed by Efron to empirically estimate the null distribution, which is assumed to be a normal distribution. Specifically, we used the median-centered module scores to estimate the parameters δ (mean) and σ (standard deviation) of the empirical null distribution using the R package locfdr and computed standardized module scores by $Z_S = (Z_m' - \delta)/\sigma$, where $Z_m'$ is the median-centered module score. The final module P values were obtained using the standard calculation, $P(Z_m) = 1 - \Phi(Z_S)$, where Φ is the standard normal cumulative distribution function. The module P values were then used to test the significance of the resultant modules and to guide module selection.
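As a small worked illustration of the scoring, the sketch below assumes the node weight is the inverse normal transform of one minus the gene-wise P value and that the module score is Stouffer's Z, i.e., the sum of node z-scores divided by the square root of the module size. The function names are hypothetical, and the P values are made up for the example.

```python
from math import sqrt
from statistics import NormalDist

def gene_z(p):
    """Node weight: inverse normal CDF of (1 - P); requires 0 < P < 1."""
    return NormalDist().inv_cdf(1.0 - p)

def module_score(zs):
    """Stouffer's Z for a module: sum of node z-scores over sqrt(k)."""
    return sum(zs) / sqrt(len(zs))

# Hypothetical gene-wise P values for a four-gene module.
pvals = [0.01, 0.03, 0.20, 0.004]
zs = [gene_z(p) for p in pvals]
print([round(z, 3) for z in zs], round(module_score(zs), 3))
```

Because the sum is divided by the square root of k, adding a gene raises the module score only if its z-score is high enough relative to the current module, which is what allows several moderately associated genes to form a high-scoring module.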
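The restricted search loop can be sketched as below. This is a simplified illustration under stated assumptions, not the dmGWAS implementation: the greedy expansion adds, at each step, the highest-weighted candidate within distance d of the seed and accepts it only if the new score exceeds the old score times (1 + r); seeds that fail to yield a large enough module are skipped rather than removed. All function and variable names are hypothetical.

```python
from math import sqrt

def module_score(zs):
    """Stouffer's Z: sum of node z-scores divided by sqrt(k)."""
    return sum(zs) / sqrt(len(zs))

def neighborhood(adj, seed, d):
    """Nodes reachable from `seed` in at most d steps (seed excluded)."""
    frontier, seen = {seed}, {seed}
    for _ in range(d):
        frontier = {n for m in frontier for n in adj[m]} - seen
        seen |= frontier
    return seen - {seed}

def grow_module(adj, z, seed, d=2, r=0.1):
    """Greedy expansion from one seed (assumed expansion rule, see lead-in)."""
    module, score = [seed], z[seed]
    candidates = neighborhood(adj, seed, d)
    while candidates:
        best = max(candidates, key=lambda n: z[n])
        new_score = module_score([z[n] for n in module] + [z[best]])
        if new_score <= score * (1 + r):
            break
        module.append(best)
        candidates.remove(best)
        score = new_score
    return module, score

def restricted_search(adj, z, d=2, r=0.1, min_size=5):
    """Seed from the top-weighted remaining node; remove each reported module."""
    adj = {n: set(nb) for n, nb in adj.items()}  # working copy of the network
    modules, found = [], True
    while found:
        found = False
        for seed in sorted(adj, key=lambda n: z[n], reverse=True):
            module, score = grow_module(adj, z, seed, d, r)
            if len(module) >= min_size:
                modules.append((module, score))
                gone = set(module)
                adj = {n: nb - gone for n, nb in adj.items() if n not in gone}
                found = True
                break
    return modules

# Toy network: a triangle A-B-C and a separate edge D-E (hypothetical weights).
adj = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B"}, "D": {"E"}, "E": {"D"}}
z = {"A": 3.0, "B": 2.9, "C": 2.8, "D": 2.0, "E": 1.9}
for genes, score in restricted_search(adj, z, min_size=2):
    print(genes, round(score, 3))
```

In the toy run the minimum module size is lowered to 2 so that both components are reported; because nodes are removed after each round, no gene can appear in two modules.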
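The final step of the module P value calculation can be sketched as follows. In the study, δ and σ are estimated with the R package locfdr; the sketch simply takes them as inputs, which is an assumption made to keep the example self-contained. The function name and the example scores are hypothetical.

```python
from statistics import NormalDist, median

def module_p_values(scores, delta, sigma):
    """Median-center module scores, standardize by the empirical null
    parameters (delta, sigma), and return upper-tail normal P values."""
    med = median(scores)
    std_normal = NormalDist()
    return [1.0 - std_normal.cdf(((s - med) - delta) / sigma) for s in scores]

# Hypothetical module scores and empirical-null parameters.
scores = [4.2, 5.1, 6.8, 5.5, 4.9]
for s, p in zip(scores, module_p_values(scores, delta=0.1, sigma=0.9)):
    print(s, round(p, 4))
```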
Evaluation by ALIGATOR
We utilized the software tool ALIGATOR to evaluate the module gene sets. ALIGATOR was initially designed to prioritize Gene Ontology (GO) categories using summary GWAS data at the SNP level. Building on a resampling strategy, ALIGATOR first pools all the SNPs and their P values from the GWAS data to build the SNP collection. In each resample, the algorithm randomly selects SNPs from the collection and records the number of significant genes defined by the selected SNPs. Here, significant genes were defined as those with at least one SNP that has a P value less than a pre-defined cutoff value, e.g., 0.05. The random selection process keeps running until the number of significant genes targeted by the selected SNPs reaches the number of significant genes in the real data. After the resampled SNP set is constructed, each of the GO categories is compared with the resampled data to obtain its number of significant genes. This resampling process is repeated numerous times (e.g., 50,000), resulting in the null distribution of the number of significant genes in each GO category. Finally, an empirical P value can be computed for each GO category by comparing the number of significant genes in the category in the real data with those in the resampled sets.
In our application, instead of using GO categories, we constructed new gene sets based on the module genes. The module genes identified from each input dataset were pooled as one gene set, generating 3 module gene sets that correspond to the input datasets of VEGAS-all, VEGAS-top, and minP. As a comparison, we also included the KEGG pathways (downloaded as of March 2011). We restricted the KEGG pathway size (number of genes in a pathway) to ≥5 and ≤300. In total, there were 204 KEGG pathways plus 3 module gene sets collected for the ALIGATOR analysis. We followed the ALIGATOR default definition of significant genes, i.e., genes with at least one SNP having P<0.05, and we performed the resampling 10,000 times. Multiple testing correction by the Benjamini & Hochberg (BH) method was then conducted.
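The resampling idea can be sketched in a few lines. This is a simplified illustration of the description above, not the ALIGATOR software: SNPs are drawn in random order until the number of genes they hit reaches the observed number of significant genes, and the empirical P value is taken as the fraction of resamples in which the gene set contains at least as many hit genes as observed. The data layout, function name, SNP identifiers, and that exact P value formula are assumptions.

```python
import random

def resampling_p(snp_genes, snp_p, gene_set, cutoff=0.05,
                 n_resamples=1000, seed=0):
    """Empirical P value for one gene set by SNP resampling (sketch)."""
    rng = random.Random(seed)
    # Observed significant genes: at least one mapped SNP with P < cutoff.
    sig_genes = {g for s, genes in snp_genes.items()
                 if snp_p[s] < cutoff for g in genes}
    observed = len(sig_genes & gene_set)
    snps = list(snp_genes)
    hits = 0
    for _ in range(n_resamples):
        picked = set()
        for s in rng.sample(snps, len(snps)):  # SNPs in random order
            if len(picked) >= len(sig_genes):
                break
            picked.update(snp_genes[s])
        if len(picked & gene_set) >= observed:
            hits += 1
    return hits / n_resamples

# Hypothetical SNP-to-gene mapping and SNP-level P values.
snp_genes = {"rs1": ["G1"], "rs2": ["G2"], "rs3": ["G3"], "rs4": ["G4"]}
snp_p = {"rs1": 0.01, "rs2": 0.5, "rs3": 0.03, "rs4": 0.9}
print(resampling_p(snp_genes, snp_p, {"G1", "G3"}))
```

Sampling SNPs rather than genes is what lets this scheme account for the fact that genes with many SNPs are more likely to contain a nominally significant one.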
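For reference, the Benjamini & Hochberg adjustment applied to the gene set P values can be written compactly. The sketch below is a generic step-up implementation with a hypothetical function name and made-up P values; it is not tied to the ALIGATOR output format.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted P values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```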
Comparison with DAPPLE
We compared our method with another available network-based GSA method, the "Disease Association Protein-Protein Link Evaluator (DAPPLE)", by applying DAPPLE to the CATIE dataset. DAPPLE aims to evaluate whether genes in association loci are significantly connected by PPIs, where the association loci are typically defined by the standard single marker analysis of a GWAS dataset for a complex disease/trait. Using the genes located in these association loci, DAPPLE searches for two types of subnetworks: a direct network, in which the input genes are directly connected, and an indirect network, in which the input genes are connected through a common interactor. Both networks are then evaluated by a permutation test to assess their significance. Because the construction of the resultant subnetworks starts with the input association genes/loci, the method relies heavily on the input genes and can generate different subnetworks if the input gene/locus list is changed. However, the ways to define associated genes have not been standardized yet, although all of them adopt a pre-defined hard cutoff. For example, the widely applied cutoff of 5 × 10⁻⁸ is challenging in psychiatric diseases, because the association signals are typically weak in psychiatric GWAS datasets. Alternatively, a user-defined cutoff value, which can be less strict yet is arbitrary, could be considered.
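The two subnetwork types can be illustrated with a minimal sketch. It only mirrors the definitions given above (direct links between input genes, and pairs of input genes sharing a common interactor outside the input list); it is not the DAPPLE software and omits the permutation test. Function and gene names are hypothetical.

```python
from itertools import combinations

def direct_and_indirect(edges, input_genes):
    """Return direct links among input genes and, for each pair of input
    genes, the common interactors that are not themselves input genes."""
    genes = set(input_genes)
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    direct = {frozenset((a, b)) for a, b in edges
              if a != b and a in genes and b in genes}
    indirect = {}
    for a, b in combinations(sorted(genes), 2):
        shared = (adj.get(a, set()) & adj.get(b, set())) - genes
        if shared:
            indirect[(a, b)] = shared
    return direct, indirect

# Toy network: A, B, C are input genes; X and Y are other proteins.
edges = [("A", "B"), ("A", "X"), ("C", "X"), ("B", "Y")]
print(direct_and_indirect(edges, ["A", "B", "C"]))
```

The example makes the dependence on the input list visible: dropping C from the input removes the only indirect connection.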
We performed pathway enrichment test of the three module gene sets using the canonical KEGG pathways. The hypergeometric test was implemented with the genes in the network used as the gene universe and module genes used as the genes of interest. Multiple testing correction was performed using the Bonferroni method.
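The enrichment test can be sketched as an upper-tail hypergeometric probability followed by a Bonferroni adjustment. The sketch assumes the gene universe is the set of network genes, as stated above; the function names and the counts in the example are hypothetical.

```python
from math import comb

def hypergeom_upper_tail(universe, in_pathway, n_module, overlap):
    """P(X >= overlap) when drawing n_module genes from `universe` genes,
    of which `in_pathway` belong to the pathway."""
    total = comb(universe, n_module)
    upper = min(in_pathway, n_module)
    tail = sum(comb(in_pathway, i) * comb(universe - in_pathway, n_module - i)
               for i in range(overlap, upper + 1))
    return tail / total

def bonferroni(pvalues):
    """Multiply each P value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]

# Hypothetical counts: 10-gene universe, 4 pathway genes, 3 module genes,
# 2 of which fall in the pathway.
p = hypergeom_upper_tail(10, 4, 3, 2)
print(round(p, 4), bonferroni([p, 0.5, 0.2]))
```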
Exploration of gene-wise P values
Dense module search for schizophrenia
Summary of the module search results by the strict searching strategy
# significant modules
# module genes
To explore the results, we first compared the module genes in each data set using the weighted DMS method. As shown in Figure 1, the overlap of module genes varied greatly among the three data sets, indicating that the approach used to define gene-wise P values influenced the resultant subnetworks substantially. The VEGAS-all method generated the smallest numbers of significant modules and module genes, while the minP method generated the largest. However, the minP method is prone to potential biases, such as gene length, SNP density, and LD structure. Thus, the results in the minP set might be inflated by these biases.
Validation by ALIGATOR
Analysis results of module gene sets and KEGG pathways by ALIGATOR (P_BH < 0.2)
P_BH
VEGAS-all module genes
1.0 × 10⁻⁴
VEGAS-top module genes
1.0 × 10⁻⁴
minP module genes
1.0 × 10⁻⁴
8.0 × 10⁻⁴
Tryptophan metabolism (hsa00380)
1.7 × 10⁻³
Gap junction (hsa04540)
2.3 × 10⁻³
Cytokine-cytokine receptor interaction (hsa04060)
2.9 × 10⁻³
Intestinal immune network for IgA production (hsa04672)
5.5 × 10⁻³
Exploration of module genes
Summary of module genes having positive association results in previous studies
Module gene set
1.07 × 10⁻⁴
8.5 × 10⁻⁴
1.07 × 10⁻⁵
6.16 × 10⁻⁵
1.04 × 10⁻⁴
a, t, m
a, t, m
a, t, m
9.28 × 10⁻⁴
1.75 × 10⁻⁴
5.75 × 10⁻⁴
Comparison with the existing method (DAPPLE)
Comparative results by DAPPLE for the top 30 genes in VEGAS-all, VEGAS-top, and minP data sets
# of direct interactions
# genes to prioritize
Mean associated protein direct connectivity
Mean associated protein indirect connectivity
# module genes
Functional enrichment test of the module genes
Enriched KEGG pathways for module genes using the hypergeometric test
Pathway name (KEGG ID)
# module genes
Neurotrophin signaling pathway (hsa04722)
6.96 × 10⁻⁵
Adipocytokine signaling pathway (hsa04920)
6.34 × 10⁻⁴
Cell cycle (hsa04110)
1.56 × 10⁻³
Chronic myeloid leukemia (hsa05220)
2.28 × 10⁻³
Vasopressin-regulated water reabsorption (hsa04962)
4.10 × 10⁻³
Cell cycle (hsa04110)
8.61 × 10⁻⁸
1.15 × 10⁻⁵
Neurotrophin signaling pathway (hsa04722)
1.14 × 10⁻⁵
1.51 × 10⁻³
Endometrial cancer (hsa05213)
1.31 × 10⁻³
RIG-I-like receptor signaling pathway (hsa04622)
1.50 × 10⁻³
Chronic myeloid leukemia (hsa05220)
3.69 × 10⁻³
T cell receptor signaling pathway (hsa04660)
2.68 × 10⁻⁶
3.43 × 10⁻⁴
Neurotrophin signaling pathway (hsa04722)
2.91 × 10⁻⁶
3.69 × 10⁻⁴
Antigen processing and presentation (hsa04612)
8.58 × 10⁻⁶
1.08 × 10⁻³
Chronic myeloid leukemia (hsa05220)
1.03 × 10⁻⁵
1.28 × 10⁻³
Non-small cell lung cancer (hsa05223)
1.14 × 10⁻⁵
1.42 × 10⁻³
By taking advantage of our previously developed dense module search method, we proposed an alternative search strategy in this work and demonstrated it in the CATIE GWAS dataset, one of the major available GWAS datasets for schizophrenia. Additionally, we explored the different options to define gene-wise P values, including the VEGAS-all method, which built on all the SNPs mapped to a gene, the VEGAS-top method, which used the top 10% SNPs mapped to a gene, and the minP method, which used the most significant SNP. By applying our restricted search strategy in each of the three data sets, we showed that the VEGAS-all method generated the smallest number of module genes and was least affected by other potentially confounding effects such as gene length. The other two methods resulted in similar numbers of module genes. These results call for caution when selecting different methods to compute gene-wise P values, which may have significant influences on the resultant module genes prioritized for the disease.
The restricted search strategy is intended to reduce the overlap among modules. Assume that a local region of the background network includes five nodes, namely A, B, C, D, and E. Starting from node A, a module including A, B, C, and D would be generated, with each node added only if $Z_{m+1} > Z_m \times (1 + r)$. Starting from B, a module including B, C, D, and E would be generated. In our previous strategy for applying DMS, both modules would be reported, even though they share three of their four genes (75%). In the current strategy, to resolve the issue of overlap, we start with the node that has the highest weight, e.g., A, to search for the module. We then remove the module genes from the background network, e.g., the nodes A, B, C, and D would be removed from the network and, thus, from further analysis. In this way, the module starting from B would not be reported, as most nodes in it have already been removed from consideration. This ensures that each node in the network is included in at most one module. Both methods have their own advantages. The traditional one performs a comprehensive search and allows every node in the network to have the chance of being a seed, but its computational intensity is high and the redundancy among modules is strong. Furthermore, the correlation among modules poses challenges for the follow-up statistical tests when selecting modules. In contrast, the restricted strategy is computationally efficient because it gradually shrinks the background network, and it prevents physical overlap among modules. However, it may miss moderately significant genes that cannot be included in any module. In practice, either of the two strategies can be selected depending on the specific aims and project design.
Computation of gene-wise P values is one of the key steps in most post-GWAS analyses. Several methods and tools have been published to compute gene-wise P values. The most widely applied method in the field is to select the SNP with the smallest P value among all SNPs mapped to a gene, although this method is subject to several known biases, such as gene length, SNP density, and the local LD structure. We selected VEGAS because of its advantages, such as acceptable computation time (<12 hours for a typical GWAS dataset like ours) and no need for individual genotype data. The rationale for including two formulations in VEGAS is that using all SNPs mapped to a gene (i.e., the VEGAS-all method) is comprehensive but may dilute the signals, while using only part of the SNPs (i.e., VEGAS-top) may miss some informative SNPs but captures the most significant 10% of SNPs for the computation.
However, VEGAS computes the SNP-SNP correlation matrix based on pairwise LD values and can only deal with autosomal SNPs. SNPs located on the sex chromosomes (X and Y) cannot be handled by VEGAS, so genes on these chromosomes were removed from our network-based analysis. Although these genes accounted for only a small proportion (3.9%) of the PINA network we used, more comprehensive algorithms that are able to handle all genes in the genome are needed in future work.
The module genes we identified, in any scenario, included neuro-related and/or immune-related genes and pathways. All three sets of module genes include well-studied candidate genes for schizophrenia (e.g., DTNBP1), glutamate receptors (e.g., GRIN1), several genes located in the MHC region (e.g., HIST1H1A, HIST1H1C, HIST1H2AB, HIST1H2BB, HLA-E), and genes from the 14-3-3 protein family (e.g., YWHAQ, YWHAZ). Interestingly, all three module gene sets contain several genes in the MHC region, even though none of these genes passed the significance test for single markers at 5 × 10⁻⁸. The MHC region has been shown to harbor significant association signals in a combined analysis of three GWAS datasets for schizophrenia [11, 24]. The identification of these genes by our DMS method further supports this signal. It also suggests that network-based analysis can reveal markers that individually fail the single marker test but whose joint effects on the disease may be significant.
We proposed an efficient network-assisted framework to identify candidate genes from GWAS data based on dense module search algorithm. Augmented by functional annotation as well as a priori knowledge about schizophrenia, we explored the methods to compute gene-wise P values based on multiple SNPs mapped to a gene and assessed their effects on downstream analysis. In specific applications, caution is needed when selecting different search strategies and methods for gene-wise P values. Future work to compute gene-wise statistics for all genes in the genome will further improve such applications.
Based on “Network-assisted causal gene detection in genome-wide association studies: an improved module search algorithm”, by Peilin Jia and Zhongming Zhao which appeared in Genomic Signal Processing and Statistics (GENSIPS), 2011 IEEE International Workshop on. © 2011 IEEE .
We would like to thank Dr. Yang Liu for technical support. We would also like to thank Drs. Patrick F. Sullivan and Edwin van den Oord for valuable assistance in the CATIE GWAS data process. CATIE dataset use was granted through the National Institute of Mental Health (NIMH) Schizophrenia Genetics Initiative to Z.Z. This work was partially supported by National Institutes of Health grant R01LM011177 and 2010 NARSAD Young Investigator Award (to P. J.).
This article has been published as part of BMC Genomics Volume 13 Supplement 6, 2012: Selected articles from the IEEE International Workshop on Genomic Signal Processing and Statistics (GENSIPS) 2011. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcgenomics/supplements/13/S6.
- Hindorff LA, Sethupathy P, Junkins HA, Ramos EM, Mehta JP, Collins FS, Manolio TA: Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc Natl Acad Sci USA. 2009, 106: 9362-9367. 10.1073/pnas.0903103106.
- Wang L, Jia P, Wolfinger RD, Chen X, Zhao Z: Gene set analysis of genome-wide association studies: Methodological issues and perspectives. Genomics. 2011, 98: 1-8. 10.1016/j.ygeno.2011.04.006.
- Kanehisa M, Goto S, Furumichi M, Tanabe M, Hirakawa M: KEGG for representation and analysis of molecular networks involving diseases and drugs. Nucleic Acids Res. 2010, 38: D355-D360. 10.1093/nar/gkp896.
- Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, et al: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000, 25: 25-29. 10.1038/75556.
- Wang K, Li M, Bucan M: Pathway-Based Approaches for Analysis of Genomewide Association Studies. Am J Hum Genet. 2007, 81: 1278-1283. 10.1086/522374.
- Holmans P, Green EK, Pahwa JS, Ferreira MA, Purcell SM, Sklar P, Owen MJ, O'Donovan MC, Craddock N: Gene ontology analysis of GWA study data sets provides insights into the biology of bipolar disorder. Am J Hum Genet. 2009, 85: 13-24. 10.1016/j.ajhg.2009.05.011.
- Chen LS, Hutter CM, Potter JD, Liu Y, Prentice RL, Peters U, Hsu L: Insights into colon cancer etiology via a regularized approach to gene set analysis of GWAS data. Am J Hum Genet. 2010, 86: 860-871. 10.1016/j.ajhg.2010.04.014.
- Jia P, Zheng S, Long J, Zheng W, Zhao Z: dmGWAS: dense module searching for genome-wide association studies in protein-protein interaction networks. Bioinformatics. 2011, 27: 95-102. 10.1093/bioinformatics/btq615.
- Rossin EJ, Lage K, Raychaudhuri S, Xavier RJ, Tatar D, Benita Y, Cotsapas C, Daly MJ: Proteins encoded in genomic regions associated with immune-mediated disease physically interact and suggest underlying biology. PLoS Genet. 2011, 7: e1001273. 10.1371/journal.pgen.1001273.
- Wang K, Li M, Hakonarson H: Analysing biological pathways in genome-wide association studies. Nat Rev Genet. 2010, 11: 843-854. 10.1038/nrg2884.
- Jia P, Wang L, Fanous AH, Chen X, Kendler KS, Zhao Z: A bias-reducing pathway enrichment analysis of genome-wide association data confirmed association of the MHC region with schizophrenia. J Med Genet. 2012, 49: 96-103. 10.1136/jmedgenet-2011-100397.
- Li MX, Gui HS, Kwan JS, Sham PC: GATES: a rapid and powerful gene-based association test using extended Simes procedure. Am J Hum Genet. 2011, 88: 283-293. 10.1016/j.ajhg.2011.01.019.
- Liu JZ, McRae AF, Nyholt DR, Medland SE, Wray NR, Brown KM, Hayward NK, Montgomery GW, Visscher PM, Martin NG, Macgregor S: A versatile gene-based test for genome-wide association studies. Am J Hum Genet. 2010, 87: 139-145. 10.1016/j.ajhg.2010.06.009.
- Wu MC, Kraft P, Epstein MP, Taylor DM, Chanock SJ, Hunter DJ, Lin X: Powerful SNP-set analysis for case-control genome-wide association studies. Am J Hum Genet. 2010, 86: 929-942. 10.1016/j.ajhg.2010.05.002.
- Sullivan PF, Lin D, Tzeng JY, van den Oord E, Perkins D, Stroup TS, Wagner M, Lee S, Wright FA, Zou F, et al: Genomewide association for schizophrenia in the CATIE study: results of stage 1. Mol Psychiatry. 2008, 13: 570-584. 10.1038/mp.2008.25.
- Jia P, Wang L, Meltzer HY, Zhao Z: Common variants conferring risk of schizophrenia: a pathway analysis of GWAS data. Schizophr Res. 2010, 122: 38-42. 10.1016/j.schres.2010.07.001.
- Sun J, Jia P, Fanous AH, Webb BT, van den Oord EJ, Chen X, Bukszar J, Kendler KS, Zhao Z: A multi-dimensional evidence-based candidate gene prioritization approach for complex diseases-schizophrenia as a case. Bioinformatics. 2009, 25: 2595-2602. 10.1093/bioinformatics/btp428.
- Wu J, Vallenius T, Ovaska K, Westermarck J, Makela TP, Hautaniemi S: Integrated network analysis platform for protein-protein interactions. Nat Methods. 2009, 6: 75-77. 10.1038/nmeth.1282.
- Efron B: Correlated z-values and the accuracy of large-scale statistical estimates. J Am Stat Assoc. 2010, 105: 1042-1055. 10.1198/jasa.2010.tm09129.
- Benjamini Y, Hochberg Y: Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Roy Statist Soc Ser B. 1995, 57: 289-300.
- Shi J, Levinson DF, Duan J, Sanders AR, Zheng Y, Pe'er I, Dudbridge F, Holmans PA, Whittemore AS, Mowry BJ, et al: Common variants on chromosome 6p22.1 are associated with schizophrenia. Nature. 2009, 460: 753-757.
- Stefansson H, Ophoff RA, Steinberg S, Andreassen OA, Cichon S, Rujescu D, Werge T, Pietilainen OP, Mors O, Mortensen PB, et al: Common variants conferring risk of schizophrenia. Nature. 2009, 460: 744-747.
- Sun J, Jia P, Fanous AH, van den Oord E, Chen X, Riley BP, Amdur RL, Kendler KS, Zhao Z: Schizophrenia gene networks and pathways and their applications for novel candidate gene selection. PLoS One. 2010, 5: e11351. 10.1371/journal.pone.0011351.
- Purcell SM, Wray NR, Stone JL, Visscher PM, O'Donovan MC, Sullivan PF, Sklar P: Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature. 2009, 460: 748-752.
- Allen NC, Bagade S, McQueen MB, Ioannidis JP, Kavvoura FK, Khoury MJ, Tanzi RE, Bertram L: Systematic meta-analyses and field synopsis of genetic association studies in schizophrenia: the SzGene database. Nat Genet. 2008, 40: 827-834. 10.1038/ng.171.
- Jia P, Zhao Z: Network-assisted causal gene detection in genome-wide association studies: an improved module search algorithm. Genomic Signal Processing and Statistics (GENSIPS), 2011 IEEE International Workshop on: 4-6 December 2011. 2011, 131-134. 10.1109/GENSiPS.2011.6169462.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.