Combined genotype and haplotype tests for region-based association studies
 Sergii Zakharov^{1, 2},
 Tien Yin Wong^{3, 4},
 Tin Aung^{3, 4},
 Eranga Nishanthie Vithana^{3},
 Chiea Chuen Khor^{1, 2},
 Agus Salim^{2} and
 Anbupalam Thalamuthu^{1, 5}
DOI: 10.1186/1471-2164-14-569
© Zakharov et al.; licensee BioMed Central Ltd. 2013
Received: 9 April 2013
Accepted: 13 August 2013
Published: 21 August 2013
Abstract
Background
Although single-SNP analysis has proven to be useful in identifying many disease-associated loci, region-based analysis has several advantages. Empirically, it has been shown that region-based genotype and haplotype approaches may possess much higher power than single-SNP statistical tests. Both high-quality haplotypes and genotypes may be available for analysis given the development of next-generation sequencing technologies and haplotype assembly algorithms.
Results
As generally it is unknown whether genotypes or haplotypes are more relevant for identifying an association, we propose to use both of them with the purpose of preserving high power under both genotype and haplotype disease scenarios. We suggest two approaches for a combined association test and investigate the performance of these two approaches based on a theoretical model, population genetics simulations and analysis of a real data set.
Conclusions
Based on a theoretical model, population genetics simulations and analysis of a central corneal thickness (CCT) genome-wide association study (GWAS) data set, we have shown that a combined genotype and haplotype approach has high potential utility for applications in association studies.
Keywords
Genotype-based tests, Haplotype-based tests, Association analysis, Test statistic combination
Background
The development of genotyping and sequencing technologies has enabled scientists to investigate the impact of genomic loci on complex disorders and traits. Indeed, genome-wide association studies (GWAS) and sequencing studies have identified many common single-nucleotide polymorphisms (SNPs) (for GWAS publication list, see http://www.genome.gov/gwastudies/) and rare variations [1–4] associated with common diseases. Although single-SNP analysis has proven to be useful in discovering many disease-associated loci, this strategy may be limited due to a very stringent significance threshold and poor reproducibility [5]. Region-based association studies have the advantages of a less stringent significance level and potentially higher power if multiple associated variants are found within a region. Indeed, several empirical studies have demonstrated the superiority of genotype gene-based association analysis over the single-SNP strategy [6, 7]. Also, there is some theoretical [8, 9] and empirical evidence that haplotype-based tests may possess higher power than SNP-based tests. When intending to use haplotypes in an association study, one faces the problem of phase inference. While several statistical algorithms have been developed to infer unknown haplotypes from genotype data [10–12], improvements in sequencing technologies will enable researchers to assemble haplotypes from sequencing data with very high accuracy (for examples of existing assembly algorithms, see Bansal et al. [13], Bansal et al. [14], and Schatz et al. [15]). This opens up the opportunity to use high-quality haplotypes and genotypes in sequencing association studies.
 1. Combining haplotype- and genotype-based test statistics preserves power under both genotype and haplotype disease models;
 2. In some of the considered scenarios, the performance of the MinPval approach is comparable to that of the SumPval method;
 3. MinPval is much more robust than SumPval when one of the underlying tests has low power.
Methods
Genotype- and haplotype-based tests
Let us assume that we are interested in testing the joint association of all the variants within a genomic region with either a dichotomous phenotype or a quantitative trait. Next, assume we have chosen two statistical tests for a region-based association analysis: one genotype-based and one haplotype-based test. For haplotype-based tests, haplotypes can be inferred from genotypes [10–12] or assembled from sequencing data [13, 15, 20]. Several conventional genotype-based methods [21–23] are applicable for testing common variants, whereas for sequencing data numerous recently developed rare-variant approaches are available [24–28]. Haplotype-based methodologies have also been extensively published elsewhere [29–31], including rare haplotype tests [32–34].
The combined approaches
Let us denote the p-values from a genotype-based and a haplotype-based test as $p_1$ and $p_2$, respectively. Our first approach is SumPval [35]. Let us consider the inverse standard normal transformation of both p-values, $y_1 = \Phi^{-1}(1 - p_1)$ and $y_2 = \Phi^{-1}(1 - p_2)$, where $\Phi^{-1}$ is the inverse of the standard normal cumulative distribution function; $y_1$ and $y_2$ are distributed as standard normal random variables under the null hypothesis. Here, we assume that $(y_1, y_2)$ is bivariate normal. The SumPval test statistic is $P_{sum} = y_1 + y_2$. Under the null hypothesis, it is distributed as a normal random variable with zero mean and variance $Var(y_1 + y_2) = Var(y_1) + 2Cov(y_1, y_2) + Var(y_2) = 2 + 2\rho$, where $\rho$ is the correlation coefficient between $y_1$ and $y_2$; it need not be zero, since two statistical tests for the same genomic region may not be independent. The correlation coefficient $\rho$ may be estimated via a permutation procedure. The rejection region is large values of the test statistic, which corresponds to low values of $p_1$ and/or $p_2$. The theoretical p-value for the SumPval test is calculated as $1 - \Phi(P_{sum}, 0, 2 + 2\rho)$, where $\Phi(a, b, c)$ is the value of the normal cumulative distribution function with mean b and variance c taken at the point a.
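The SumPval calculation can be illustrated with a short Python sketch that uses only the standard library. The function name `sum_pval` is hypothetical, and the correlation between the transformed p-values is assumed to be supplied by the caller (for example, from a permutation estimate); this is an illustration of the rule as described in the text, not the authors' software.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, variance 1


def sum_pval(p1: float, p2: float, rho: float) -> float:
    """Combine two p-values with the SumPval rule (hypothetical helper).

    p1, p2 must lie strictly between 0 and 1; rho is the correlation between
    the transformed p-values under the null and must satisfy -1 < rho <= 1.
    """
    if not -1.0 < rho <= 1.0:
        raise ValueError("rho must satisfy -1 < rho <= 1")
    # Inverse normal transform. By symmetry -inv_cdf(p) equals inv_cdf(1 - p),
    # and it avoids rounding 1 - p to exactly 1 for very small p.
    y1 = -_STD_NORMAL.inv_cdf(p1)
    y2 = -_STD_NORMAL.inv_cdf(p2)
    p_sum = y1 + y2
    # Under the null, P_sum ~ N(0, 2 + 2*rho); NormalDist expects a standard deviation.
    null_dist = NormalDist(mu=0.0, sigma=(2.0 + 2.0 * rho) ** 0.5)
    # Upper-tail probability, written as cdf(-x) to limit cancellation near 1.
    return null_dist.cdf(-p_sum)
```

With perfectly correlated tests (rho = 1) and equal inputs, the combined p-value equals the common input, which serves as a quick sanity check of the variance term.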
where 0 < x < 1. Given that the rejection region is small values of the test statistic, the theoretical p-value for the MinPval test is straightforward to compute using (1).
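As an illustration, the following Python sketch computes a theoretical p-value for the minimum of two p-values under the bivariate normal assumption stated for the transformed p-values. It assumes the MinPval statistic is min(p_1, p_2) and uses the identity P(min ≤ x) = 2x − P(y_1 > h, y_2 > h) with h = Φ^{-1}(1 − x), which follows from that assumption; it is not a verbatim reproduction of equation (1). The helper names and the Simpson's-rule integration are choices made for this sketch, and accuracy degrades as the correlation approaches 1.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def _upper_joint_tail(h: float, rho: float, steps: int = 4000) -> float:
    """P(y1 > h, y2 > h) for a standard bivariate normal with correlation rho."""
    s = math.sqrt(1.0 - rho * rho)
    lower, upper = h, max(h, 0.0) + 10.0  # the density is negligible beyond `upper`
    width = (upper - lower) / steps

    def integrand(t: float) -> float:
        # Density of y1 at t times P(y2 > h | y1 = t), since y2 | y1 = t ~ N(rho*t, 1 - rho^2).
        return _STD_NORMAL.pdf(t) * _STD_NORMAL.cdf((rho * t - h) / s)

    # Composite Simpson's rule; `steps` must be even.
    total = integrand(lower) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(lower + i * width)
    return total * width / 3.0


def min_pval(p1: float, p2: float, rho: float) -> float:
    """Theoretical p-value of min(p1, p2) under the null (hypothetical helper)."""
    if not -1.0 < rho <= 1.0:
        raise ValueError("rho must satisfy -1 < rho <= 1")
    x = min(p1, p2)
    if rho == 1.0:
        return x  # both tests carry identical information
    h = -_STD_NORMAL.inv_cdf(x)  # equals Phi^{-1}(1 - x)
    # P(y1 >= h or y2 >= h) = 2x - P(y1 > h, y2 > h)
    return 2.0 * x - _upper_joint_tail(h, rho)
```

For independent tests (rho = 0) the result reduces to 1 − (1 − x)^2, and for positively correlated tests it lies between x and 2x.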
Theoretical power model
Within our theoretical framework the following model is adopted: the two test statistics $S_g$ and $S_h$ of the underlying genotype- and haplotype-based tests, respectively, are assumed to asymptotically follow a central chi-squared distribution $\chi_{1}^{2}$ with 1 degree of freedom under the null hypothesis, and noncentral chi-squared distributions $\chi_{1,a}^{2}$ and $\chi_{1,b}^{2}$ with noncentrality parameters (NCPs) a and b, respectively, under the alternative hypothesis. One example of a test which results in such null and alternative distributions is Rao’s score test on, for example, the genotype or haplotype scores described in Additional file 1. Since the two tests are applied to the same data, the chi-squared test statistics are likely to be positively correlated. The correlation between the two test statistics may vary from very low to high. For example, if within a region there are a few SNPs in very high LD, then we would expect the correlation between the tests to be high. Alternatively, we would expect the correlation to be low when variants within a region are independent. The correlation is modeled via an underlying multivariate normal distribution: to simulate the test statistics $S_g \sim \chi_{1,a}^{2}$ and $S_h \sim \chi_{1,b}^{2}$, a bivariate normal random vector $y = (y_1, y_2)$ with mean $(\sqrt{a}, \sqrt{b})$, unit variances and correlation coefficient $\rho > 0$ is generated, and the squares of the coordinates are taken as proxies for the test statistics: $S_g = y_1^2$, $S_h = y_2^2$. To estimate the power of the MinPval and SumPval tests we simulated 500,000 independent pairs $(S_g, S_h)$ under the alternative hypothesis, calculated the test statistics for the combined approaches, and noted the share of statistically significant pairs. This procedure was done for every theoretical scenario (see “Results” section).
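The simulation scheme described above can be sketched in Python with the standard library alone. This sketch departs from the text in two labeled ways: it uses far fewer replicates than 500,000, and it calibrates the combined statistics with empirical critical values from null draws rather than with theoretical p-values. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()
CHI2_1DF_CRIT_05 = 3.841458820694124  # 0.95 quantile of chi-squared with 1 df


def _z_from_score(y: float) -> float:
    """Map y (test statistic S = y**2) to Phi^{-1}(1 - p), p being the 1-df chi-squared p-value."""
    p = math.erfc(abs(y) / math.sqrt(2.0))   # P(chi2_1 > y**2)
    p = min(max(p, 1e-300), 1.0 - 1e-16)     # keep inv_cdf inside (0, 1)
    return -_STD_NORMAL.inv_cdf(p)


def _draw_pair(a: float, b: float, rho: float, rng: random.Random):
    """Bivariate normal draw with mean (sqrt(a), sqrt(b)), unit variances, correlation rho."""
    e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    y1 = math.sqrt(a) + e1
    y2 = math.sqrt(b) + rho * e1 + math.sqrt(1.0 - rho * rho) * e2
    return y1, y2


def simulate_power(a, b, rho, n_sim=20000, alpha=0.05, seed=1):
    rng = random.Random(seed)
    # Null draws (a = b = 0) give empirical critical values for the combined statistics.
    null_sum, null_max = [], []
    for _ in range(n_sim):
        z1, z2 = (_z_from_score(y) for y in _draw_pair(0.0, 0.0, rho, rng))
        null_sum.append(z1 + z2)
        null_max.append(max(z1, z2))  # a small minimum p-value means a large maximum z
    cut = min(int((1.0 - alpha) * n_sim), n_sim - 1)
    crit_sum, crit_max = sorted(null_sum)[cut], sorted(null_max)[cut]

    hits = {"genotype": 0, "haplotype": 0, "SumPval": 0, "MinPval": 0}
    for _ in range(n_sim):
        y1, y2 = _draw_pair(a, b, rho, rng)
        z1, z2 = _z_from_score(y1), _z_from_score(y2)
        hits["genotype"] += y1 * y1 > CHI2_1DF_CRIT_05
        hits["haplotype"] += y2 * y2 > CHI2_1DF_CRIT_05
        hits["SumPval"] += z1 + z2 > crit_sum
        hits["MinPval"] += max(z1, z2) > crit_max
    return {name: count / n_sim for name, count in hits.items()}
```

Calling `simulate_power(10.5, 2.0, 0.5)` returns the estimated power of the two underlying tests and of both combined rules for one scenario; looping over a grid of NCPs and correlations gives a power surface of the kind discussed in the Results.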
Population genetics simulations

for the “Rare” model: b_{l} = d_{r} = 4, l = 1, ..., c and b_{c1} = d_{c} = 0

for the “Both” model: b_{l} = d_{r} = 3, l = 1, ..., c and b_{c1} = d_{c} = 1.2

for the “Common” model: b_{l} = d_{r} = 1.5, l = 1, ..., c and b_{c1} = d_{c} = 2
The average number of variants within a region across 1000 data replicates in population genetics simulations (columns give the proportion of causal variants/haplotypes)

| Phenotype model | 50% | 20% | 10% |
|---|---|---|---|
| Haplotype common | 32.4 | 31.6 | 31.6 |
| Haplotype both | 35.5 | 33.2 | 32.6 |
| Haplotype rare | 37.2 | 34.1 | 33.0 |
| Genotype common | 33.1 | 32.4 | 32.2 |
| Genotype both | 36.3 | 33.7 | 32.9 |
| Genotype rare | 37.6 | 34.4 | 33.0 |
Real data analysis
For the purpose of demonstrating the performance of the described methodologies we conducted a gene-based analysis of the central corneal thickness (CCT) GWAS data sets described in Vithana et al. [41]. Briefly, the Singapore Indian Eye Study (SINDI), which is part of the Singapore Indian Chinese Cohort Eye Study (SICC) [42], consists of 2538 Indian subjects aged 40 and above, and the Singapore Malay Eye Study (SiMES) [43–45] is a genome-wide association study of the CCT phenotype which contains 2542 Malay subjects aged 40 and above. Both SiMES and SICC adhered to the Declaration of Helsinki. Ethics approval for both studies was obtained from the Singapore Eye Research Institute Institutional Review Board [41]. The combined data set consists of 5080 individuals genotyped at 552318 SNPs after quality control. In total, 5049 individuals were analyzed after excluding those with a missing phenotype. Also, we attempted to replicate all the genome-wide significant regions using Chinese samples from the SICC. This data set contains 2837 samples with non-missing phenotype and covariates (age and gender). SNPs were mapped to genes based on the method outlined by Zhao et al. [46]. Briefly, information on gene identifiers (IDs), names, and start and end positions on a chromosome was downloaded from the NCBI Genome database (http://www.ncbi.nlm.nih.gov/Genomes). Gene regions included 10 kb upstream and downstream. A hierarchical mapping scheme (coding > intronic > 5′UTR > 3′UTR) was used if a variant was within 10 kb of multiple genes. The remaining intergenic variants between two genes were grouped together. Haplotype inference was performed using the software Beagle [10] with a reference panel from the 1000 Genomes Project (http://www.1000genomes.org/). In our analysis we adjusted for age, gender and the first ten principal components from Eigenstrat [47].
Statistical tests for population genetics simulations and real data application
where I{A} is the indicator of an event A. If there were no common haplotypes within a region, we formed three groups of haplotypes: those with a frequency less than 0.05%, those between 0.05% and 0.1%, and those with a frequency greater than 0.1%. For both tests we used the R (http://www.r-project.org/) package SKAT (http://www.hsph.harvard.edu/~xlin/software.html). For population genetics simulations, p-values for all the tests were estimated using 1000 permutations. In the real data analysis we used theoretical p-values for the underlying tests, as we believe that they reasonably approximate empirical p-values given the large sample size and the normally distributed quantitative trait [41]. Then we tested the assumption of bivariate normality by applying the Shapiro-Wilk test (R package “mvnormtest”, http://cran.r-project.org/web/packages/mvnormtest/). If the normality test was not significant at the genome-wide level, we used theoretical p-values for both SumPval and MinPval; otherwise we used permutations. The permutation procedure and the estimation of the correlation coefficient are described in the next section.
Permutation procedure and estimation of correlation coefficient
To calculate theoretical p-values for the proposed methods we estimated the correlation coefficient $\rho$ using 500 permutations. The difficulty in applying permutations lies in the fact that the permutation procedure should preserve the relationship between all the covariates, and also between phenotype and covariates, but disrupt the relationship between phenotype and genotype. Several techniques have been developed for conducting permutation tests of partial coefficients in a multiple regression model [49–51]. Among them, the permutation of residuals under the reduced model [49] was shown to preserve the correct type-1 error for the t-test [52] and was previously applied to microarray data analysis [53]. As the SKAT test can be obtained from a semiparametric regression model [40], let us consider the following genotype and haplotype regression models: $Y = a_1 + f_1(P) + Cc + \epsilon$ and $Y = a_2 + f_2(R) + Cc + \epsilon$, where P is the n × L collapsed genotype matrix, n is the sample size, L is the number of common SNPs within a region plus one for the collapsed rare-variant superlocus, Y is the n × 1 vector of the quantitative phenotype (CCT), C is the n × 12 matrix of covariates, which include age, gender and the first ten genotype principal components obtained from Eigenstrat [47], R is the haplotype regression matrix, and $f_1$ and $f_2$ are unknown functions. To obtain the permutation values for the test statistics, the reduced model $Y = a + Cc + \epsilon$ is fitted, and $\hat{a}$, $\hat{c}$, $\hat{\epsilon}$ are the estimated constant coefficient, regression coefficients and residuals, respectively. Next, the residuals $\hat{\epsilon}$ are permuted to obtain $\epsilon^{*}$, and $Y^{*} = \hat{a} + C\hat{c} + \epsilon^{*}$. The permuted statistic values for both genotype and haplotype SKAT tests are calculated as the respective SKAT statistics from the semiparametric models $Y^{*} = a_1 + f_1(P) + Cc + \epsilon$ and $Y^{*} = a_2 + f_2(R) + Cc + \epsilon$.
Each p-value obtained from permutations was transformed using the inverse standard normal transformation, and the value of $\rho$ was estimated by the Pearson correlation coefficient of the transformed values.
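The permutation of residuals under the reduced model, followed by the correlation estimate, can be sketched as below. This is a simplified illustration under loud assumptions: a single covariate stands in for the twelve used in the analysis, and a 1-df score-type statistic (n times the squared partial correlation with a single genotype or haplotype score) stands in for the SKAT statistics, which are not reimplemented here. All names are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean

_STD_NORMAL = NormalDist()


def _ols_fit(y, c):
    """Fit the reduced model y = a + c*gamma + e; return (residuals, fitted values)."""
    c_bar, y_bar = fmean(c), fmean(y)
    slope = (sum((ci - c_bar) * (yi - y_bar) for ci, yi in zip(c, y))
             / sum((ci - c_bar) ** 2 for ci in c))
    fitted = [y_bar + slope * (ci - c_bar) for ci in c]
    return [yi - fi for yi, fi in zip(y, fitted)], fitted


def _pearson(u, v):
    u_bar, v_bar = fmean(u), fmean(v)
    num = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
    den = math.sqrt(sum((ui - u_bar) ** 2 for ui in u) * sum((vi - v_bar) ** 2 for vi in v))
    return num / den


def _z_score(y_star, score, c):
    """Stand-in region test (NOT SKAT): n * r^2 on covariate-adjusted values, mapped to Phi^{-1}(1 - p)."""
    r = _pearson(_ols_fit(y_star, c)[0], _ols_fit(score, c)[0])
    stat = len(y_star) * r * r
    p = math.erfc(math.sqrt(stat / 2.0))       # 1-df chi-squared p-value
    p = min(max(p, 1e-300), 1.0 - 1e-16)       # keep inv_cdf inside (0, 1)
    return -_STD_NORMAL.inv_cdf(p)


def estimate_rho(y, c, geno_score, hap_score, n_perm=500, seed=1):
    """Estimate the null correlation between the transformed p-values of the two tests."""
    rng = random.Random(seed)
    resid, fitted = _ols_fit(y, c)
    z_g, z_h = [], []
    for _ in range(n_perm):
        shuffled = resid[:]
        rng.shuffle(shuffled)                   # permute residuals of the reduced model
        y_star = [f + e for f, e in zip(fitted, shuffled)]  # Y* = fitted + permuted residuals
        z_g.append(_z_score(y_star, geno_score, c))
        z_h.append(_z_score(y_star, hap_score, c))
    return _pearson(z_g, z_h)
```

Because the same permuted phenotype is passed to both stand-in tests within each permutation, the dependence between the two tests is preserved while the link between phenotype and genotype is broken.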
Results
Theoretical power results
In Panel 3, where the difference in power between the combined approaches and the less powerful underlying test is shown, it can be seen that both MinPval and SumPval are consistently better than the less powerful test. This suggests that a combination of statistical tests may prove beneficial when the underlying disease model is unknown. To investigate the impact of a change in NCP b on the performance of the proposed approaches we fixed NCP a at 10.5 (corresponding to approximately 90% power of a chi-squared test with a $\chi_{1,a}^{2}$ distribution under the alternative hypothesis at a type-1 error of 0.05). Panel 4 of Figures 1 and 2 depicts the power of MinPval and SumPval as a function of the correlation and of the “fraction of NCP”, the ratio of b to 10.5. As can be seen in Panel 4, the MinPval test achieved higher power than SumPval in the majority of scenarios. It is notable that SumPval lost much power when the value of b was low. Hence, the MinPval approach is more robust with respect to underperformance of one of the underlying tests.
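The correspondence between an NCP of 10.5 and roughly 90% power can be checked by hand. The calculation below assumes, as in the theoretical model, that the test statistic is the square of a unit-variance normal variable Z + √a with Z standard normal, and uses 3.84 = 1.96² as the 0.95 quantile of the central chi-squared distribution with 1 degree of freedom; the numerical values are rounded.

```latex
\begin{aligned}
\Pr\left(\chi^{2}_{1,a} > 3.84\right)
  &= \Pr\left(\left|Z + \sqrt{a}\right| > 1.96\right) \\
  &= 1 - \Phi\left(1.96 - \sqrt{10.5}\right) + \Phi\left(-1.96 - \sqrt{10.5}\right) \\
  &\approx 1 - \Phi(-1.28) + \Phi(-5.20) \\
  &\approx 0.90
\end{aligned}
```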
Population genetics simulation results
Panels 1–3 of Figure 3 depict the results of the population genetics simulation analysis for all the phenotype models with 50%, 20% and 10% of rare causal variants/haplotypes, respectively, at a fixed 5% type-1 error. For all the tests 1000 permutations were performed to estimate p-values. Haplotypes were assumed to be known without ambiguity. Under the genotype-based disease scenarios, genotype SKAT is expected to be more powerful than haplotype SKAT, and vice versa under the haplotype-based scenarios. However, genotype SKAT was less powerful for many genotype-based phenotype models. A possible explanation of this observation is that when rare variants are strongly associated with phenotype, for some statistical tests pooling of rare haplotypes may be a better strategy than pooling of rare variants. Also, it should be noted that as the percentage of causal rare variants/haplotypes decreases, the power for the “Rare” and “Both” phenotype models decreases substantially, since for these models rare variants/haplotypes are the major carriers of the association signal. For the “Common” phenotype model one common variant/haplotype has a significant impact on phenotype; so, the decrease in power with a lower proportion of causal rare variants/haplotypes is not as large as for the other phenotype models.
As can be seen from Panels 1–3 of Figure 3, for all the phenotype disease models, when both underlying tests were almost equally powerful (e.g. Panel 1, haplotype disease scenario “Common” model, and genotype disease scenario “Both” and “Common” models), the power of both MinPval and SumPval was on the same level as or even higher than that of the underlying tests. However, when genotype-based SKAT significantly underperformed haplotype-based SKAT (e.g. Panel 1, haplotype disease scenario “Rare” and “Both” models), the MinPval approach showed slightly lower power than the more powerful underlying test and greater power than the SumPval approach. The maximum power loss of SumPval and MinPval compared with the more powerful underlying test across all phenotype models was 6.3% and 3.8%, respectively (haplotype disease scenario “Both” model). These results are consistent with those obtained from the theoretical power considerations, and illustrate the potential of the proposed methods in their application to real association studies.
To examine the effect of phasing on our results we repeated the analysis using the most probable haplotypes inferred by Beagle [10]. The reference panel consisted of 1094 simulated individuals to mimic the size of the publicly available reference panel from the 1000 Genomes Project (http://www.1000genomes.org). The results of this analysis were very similar to those described above (data not shown). In addition, we applied the proposed methods with a different pair of underlying tests. The results are similar to those described above. For more details, see Additional files 1 and 2.
Application to central corneal thickness GWAS data set
The results of the combined SiMES and SINDI data analysis and the single-SNP p-values from the original article

| | COL8A2 | ZNF469-LOC100128913 | RXRA-COL5A1 | COL8A2-TRAPPC3 | C7orf42 |
|---|---|---|---|---|---|
| Chromosome | 1 | 16 | 9 | 1 | 7 |
| Number of SNPs | 4 | 27 | 73 | 3 | 6 |
| Genotype SKAT | 3.68E-13 | 2.13E-15 | 4.06E-12 | 2.63E-08 | 2.55E-07 |
| Haplotype SKAT | 5.58E-10 | 0.149394 | 0.79 | 2.78E-05 | 0.005 |
| MinPval | 3.68E-13 | 4.22E-15 | 8.11E-12 | 4E-08 | 4.96E-07 |
| SumPval | 1.77E-11 | 9.44E-10 | 7.90E-06 | 4E-08 | 2.60E-05 |
| Single-SNP analysis* | rs96067: 5.4E-13 | rs9938149: 1.63E-16; rs12447690: 1.92E-14 | rs1536478: 3.5E-9 | | |
Replication results on Chinese samples from the Singapore Indian Chinese Cohort Eye Study

| | COL8A2 | ZNF469-LOC100128913 | RXRA-COL5A1 | COL8A2-TRAPPC3 | C7orf42 |
|---|---|---|---|---|---|
| Genotype SKAT | 0.019 | 0.117 | 0.001 | 1 | 0.014 |
| Haplotype SKAT | 0.599 | 0.479 | 0.1 | 1 | 0.27 |
| MinPval | 0.037 | 0.223 | 0.002 | 0.989 | 0.028 |
| SumPval | 0.089 | 0.186 | 0.001 | 0.788 | 0.027 |
| Single-SNP analysis* | rs96067: 0.036 | rs9938149: 0.4; rs12447690: 0.03 | rs1536478: 0.016 | | |
In addition to the gene-based analysis, we tried to replicate the four genome-wide significant SNPs found by Vithana et al. [41] in our Chinese samples using single-SNP analysis. Having tested the association of these SNPs with the CCT trait using a trend test within a linear additive model adjusting for age, gender and the first ten principal components, we found that none of the SNPs was significant at the corrected type-1 error rate of 0.0125 = 0.05/4. This result suggests that gene-based replication may be a more powerful strategy than single-SNP replication.
In addition to the main genome-wide analysis of the SiMES + SINDI data set, we applied the proposed methods with a different pair of underlying tests to the three regions reported by Vithana et al. [41]. Both MinPval and SumPval identified the three regions at the genome-wide significance level. This result suggests that our combined approaches work as well with other underlying tests (for more details, see Additional file 1).
Discussion
When the underlying disease model is unknown, combining statistical tests tailored for different disease scenarios may be a much better strategy than applying a statistical test designed for one specific disease model. In this article we have described two approaches for combining genotype- and haplotype-based statistical tests. The results of theoretical power considerations, population genetics simulations and real data analysis showed strong performance of the MinPval approach for different disease scenarios, whereas the SumPval method was shown to perform poorly when one of the underlying tests had low power. Our analysis of SiMES + SINDI identified the three regions found by Vithana et al. [41], and additionally, the C7orf42 gene. The replication analysis confirmed an association of the RXRA-COL5A1 region, which is consistent with the results of Cornes et al. [54], and showed a moderate p-value for the C7orf42 gene. The analysis of real data highlighted the applicability of our combined approaches to real association studies.
In our simulations Haplotype SKAT was the most powerful test in many cases, but in the real data analysis it performed the worst. It is not known beforehand whether a genotype-based or a haplotype-based test would perform better; hence, our proposal to apply a combined approach is a robust choice. Indeed, MinPval did well in both simulations and real data. This emphasizes the major point of the combined strategy: MinPval may have slightly lower power when a disease model fits Haplotype SKAT and higher power when the disease model is closer to the second underlying test. One possible reason for the apparent inconsistency of Haplotype SKAT performance may be that for the “Rare” and “Both” simulation models we assumed that rare variants bear the major association signal, whereas in the real data only common SNPs were present. However, Haplotype SKAT performed well even for the “Common” model when a common SNP was causal. We suppose that for this scenario genotype association translated into an association of haplotypes with the phenotype, which is possible if common SNPs within a region are in high LD with each other. On the other hand, if a causal common SNP within a region is in low LD with the other common SNPs within the region, then under a genotype-based disease scenario a haplotype-based test may have much lower power than a genotype-based test, which is what we observed in the results of the real data analysis.
The methods proposed in this study may be easily generalized to multiple statistical tests: instead of two underlying tests it is possible to apply more tests and combine all of them via the described methodology. In this case the arguments for theoretical p-value calculation for the proposed approaches can be extended in a straightforward manner.
Recently Derkach et al. [56] investigated the performance of combined approaches, namely, the minimum of p-values and the Fisher p-value combination, for rare-variant association scenarios. Although the approaches we propose are similar, our major idea is different. We combine two test statistics for the purpose of widening the set of alternatives for which our test is powerful; thus, we choose underlying tests designed for very different phenotype models, whereas Derkach et al. [56] used linear and quadratic tests which are likely to be both powerful under many models. As a result, our conclusions are different from those of Derkach et al. [56]. For example, the authors stated that “hybrid test statistics provide much needed robustness in terms of power for association tests”, whereas we observed that only the minimum p-value approach really preserves power when one of the underlying tests underperforms. Secondly, the authors found that in many cases the Fisher method outperforms both of the underlying tests and the minimum p-value approach. However, from our work it is clear that SumPval (which is similar to the Fisher p-value combination) outperforms all three tests only when both of the underlying tests have comparable power, which is unlikely if the two underlying tests are deliberately chosen to fit very different phenotype models.
One of the limitations of the proposed approaches is the need to use permutations. For theoretical p-value calculation both SumPval and MinPval require a correlation coefficient to be estimated via permutations. Moreover, permutations need to be applied when asymptotic distributions of the underlying test statistics are unknown or inadequate to describe the empirical distributions.
The described methodologies may be extended to preserve power under other disease models. For example, the combination of rare-variant and common-variant statistical tests applied to a sequenced region may preserve high power when either only rare or only common variants are associated with a phenotype. However, it is not known how the combined approaches will perform if both common and rare variants are associated with the phenotype.
Conclusions
In this study we have investigated the performance of combined haplotype- and genotype-based tests for the purpose of preserving high power under both genotype and haplotype disease scenarios. Based on theoretical power calculations, population genetics simulations and analysis of the real data set we have illustrated the high performance and potential utility of combined approaches for association studies.
References
 Green EK, Grozeva D, Sims R, Raybould R, Forty L, GordonSmith K, Russell E, St. Clair D, Young AH, Ferrier IN: DISC1 exon 11 rare variants found more commonly in schizoaffective spectrum cases than controls. Am J Med Genet B Neuropsychiatr Genet. 2011, 156 (4): 490492. 10.1002/ajmg.b.31187.View Article
 Norton N, Li D, Rieder Mark J, Siegfried Jill D, Rampersaud E, Züchner S, Mangos S, GonzalezQuintana J, Wang L, McGee S: Genomewide studies of copy number variation and exome sequencing identify rare variants in BAG3 as a cause of dilated cardiomyopathy. The American Journal of Human Genetics. 2011, 88 (3): 273282. 10.1016/j.ajhg.2011.01.016.View ArticlePubMed
 Ramagopalan SV, Dyment DA, Cader MZ, Morrison KM, Disanto G, Morahan JM, BerlangaTaylor AJ, Handel A, De Luca GC, Sadovnick AD: Rare variants in the CYP27B1 gene are associated with multiple sclerosis. Ann Neurol. 2011, 70 (6): 881886. 10.1002/ana.22678.View ArticlePubMed
 Xie P, Kranzler HR, Krauthammer M, Cosgrove KP, Oslin D, Anton RF, Farrer LA, Picciotto MR, Krystal JH, Zhao H: Rare nonsynonymous variants in alpha4 Nicotinic Acetylcholine receptor gene protect against nicotine dependence. Biol Psychiatry. 2011, 70 (6): 528536. 10.1016/j.biopsych.2011.04.017.PubMed CentralView ArticlePubMed
 Wu MC, Kraft P, Epstein MP, Taylor DM, Chanock SJ, Hunter DJ, Lin X: Powerful SNPSet analysis for case–control genomewide association studies. Am J Hum Genet. 2010, 86 (6): 929942. 10.1016/j.ajhg.2010.05.002.PubMed CentralView ArticlePubMed
 Caporaso N, Gu F, Chatterjee N, ShengChih J, Yu K, Yeager M, Chen C, Jacobs K, Wheeler W, Landi MT: Genomewide and candidate gene association study of cigarette smoking behaviors. PLoS ONE. 2009, 4 (2): e465310.1371/journal.pone.0004653.PubMed CentralView ArticlePubMed
 Hong MG, Reynolds CA, Feldman AL, Kallin M, Lambert JC, Amouyel P, Ingelsson E, Pedersen NL, Prince JA: Genomewide and genebased association implicates FRMD6 in alzheimer disease. Hum Mutat. 2012, 33 (3): 521529. 10.1002/humu.22009.PubMed CentralView ArticlePubMed
 Akey J, Jin L, Xiong M: Haplotypes vs single marker linkage disequilibrium tests: what do we gain?. Eur J Hum Genet. 2001, 9: 291300. 10.1038/sj.ejhg.5200619.View ArticlePubMed
 Schaid DJ: Power and sample size for testing associations of haplotypes with complex traits. Ann Hum Genet. 2006, 70 (1): 116130. 10.1111/j.15298817.2005.00215.x.View ArticlePubMed
 Browning SR, Browning BL: Rapid and accurate Haplotype phasing and missingdata inference for wholegenome association studies by use of localized Haplotype clustering. The American Journal of Human Genetics. 2007, 81 (5): 10841097. 10.1086/521987.View ArticlePubMed
 Howie BN, Donnelly P, Marchini J: A flexible and accurate genotype imputation method for the next generation of genomewide association studies. PLoS Genet. 2009, 5 (6): e100052910.1371/journal.pgen.1000529.PubMed CentralView ArticlePubMed
 Stephens M, Smith NJ, Donnelly P: A New statistical method for haplotype reconstruction from population data. Am J Hum Genet. 2001, 68 (4): 978989. 10.1086/319501.PubMed CentralView ArticlePubMed
 Bansal V, Halpern AL, Axelrod N, Bafna V: An MCMC algorithm for haplotype assembly from wholegenome sequence data. Genome Res. 2008, 18 (8): 13361346. 10.1101/gr.077065.108.PubMed CentralView ArticlePubMed
 Bansal V, Libiger O, Torkamani A, Shork JN: Statistical analysis strategies for association studies involving rare variants. Nat Rev Genet. 2011, 11: 773785.View Article
 Schatz MC, Delcher AL, Salzberg SL: Assembly of large genomes using secondgeneration sequencing. Genome Res. 2010, 20 (9): 11651173. 10.1101/gr.101360.109.PubMed CentralView ArticlePubMed
 Begnini A, Tessari G, Turco A, Malerba G, Naldi L, Gotti E, Boschiero L, Forni A, Rugiu C, Piaserico S: PTCH1 gene haplotype association with basal cell carcinoma after transplantation. Brit J Dermatol. 2010, 163 (2): 364370. 10.1111/j.13652133.2010.09776.x.View Article
 DIEUDE P, DAWIDOWICZ K, GUEDJ M, LEGRAIN Y, WIPFF J, HACHULLA E, DIOT E, SIBILIA J, MOUTHON L, CABANE J: PhenotypeHaplotype Correlation of IRF5 in Systemic Sclerosis: role of 2 Haplotypes in Disease Severity. J Rheumatol. 2010, 37 (5): 987992. 10.3899/jrheum.091163.View ArticlePubMed
 Lambert JC, GrenierBoley B, Harold D, Zelenika D, Chouraki V, Kamatani Y, Sleegers K, Ikram MA, Hiltunen M, Reitz C: Genomewide haplotype association study identifies the FRMD4A gene as a risk locus for Alzheimer's disease. Mol Psychiatry. 2013, 18 (4): 461470. 10.1038/mp.2012.14.PubMed CentralView ArticlePubMed
 Tregouet DA, Konig IR, Erdmann J, Munteanu A, Braund PS, Hall AS, Groszhennig A, LinselNitschke P, Perret C, DeSuremain M: Genomewide haplotype association study identifies the SLC22A3LPAL2LPA gene cluster as a risk locus for coronary artery disease. Nat Genet. 2009, 41 (3): 283285. 10.1038/ng.314.View ArticlePubMed
 Bansal V, Bafna V: HapCUT: an efficient and accurate algorithm for the haplotype assembly problem. Bioinformatics. 2008, 24 (16): i153-i159. 10.1093/bioinformatics/btn298.
 Gauderman WJ, Murcray C, Gilliland F, Conti DV: Testing association between disease and multiple SNPs in a candidate gene. Genet Epidemiol. 2007, 31 (5): 383-395. 10.1002/gepi.20219.
 Li M, Fu W, Lu Q: An aggregating U-Test for a genetic association study of quantitative traits. BMC Proceedings. 2011, 5 (Suppl 9): S23. 10.1186/1753-6561-5-S9-S23.
 Li YM, Xiang Y, Sun ZQ: An entropy-based measure for QTL mapping using extreme samples of population. Hum Hered. 2008, 65 (3): 121-128. 10.1159/000109729.
 Bansal V, Libiger O, Torkamani A, Schork N: Statistical analysis strategies for association studies involving rare variants. Nature Reviews Genetics. 2010, 11: 773-785.
 Ionita-Laza I, Buxbaum JD, Laird NM, Lange C: A new testing strategy to identify rare variants with either risk or protective effect on disease. PLoS Genet. 2011, 7 (2): e1001289. 10.1371/journal.pgen.1001289.
 Madsen BE, Browning SR: A groupwise association test for rare mutations using a weighted sum statistic. PLoS Genet. 2009, 5 (2): e1000384. 10.1371/journal.pgen.1000384.
 Neale BM, Rivas MA, Voight BF, Altshuler D, Devlin B, Orho-Melander M, Kathiresan S, Purcell SM, Roeder K, Daly MJ: Testing for an unusual distribution of rare variants. PLoS Genet. 2011, 7 (3): e1001322. 10.1371/journal.pgen.1001322.
 Price AL, Kryukov GV, de Bakker PIW, Purcell SM, Staples J, Wei LJ, Sunyaev SR: Pooled association tests for rare variants in exon-resequencing studies. Am J Hum Genet. 2010, 86 (6): 832-838. 10.1016/j.ajhg.2010.04.005.
 Jin L, Zhu W, Guo J: Genome-wide association studies using haplotype clustering with a new haplotype similarity. Genet Epidemiol. 2010, 34 (6): 633-641. 10.1002/gepi.20521.
 Sha Q, Dong J, Jiang R, Zhang S: Tests of association between quantitative traits and haplotypes in a reduced-dimensional space. Ann Hum Genet. 2005, 69 (6): 715-732. 10.1111/j.1529-8817.2005.00216.x.
 Zaykin DV, Westfall PH, Young SS, Karnoub MA, Wagner MJ, Ehm MG: Testing association of statistically inferred haplotypes with discrete and continuous traits in samples of unrelated individuals. Hum Hered. 2002, 53 (2): 79-91. 10.1159/000057986.
 Guo W, Lin S: Generalized linear modeling with regularization for detecting common disease rare haplotype association. Genet Epidemiol. 2009, 33 (4): 308-316. 10.1002/gepi.20382.
 Li Y, Byrnes AE, Li M: To identify associations with rare variants, just WHaIT: weighted haplotype and imputation-based tests. Am J Hum Genet. 2010, 87 (5): 728-735. 10.1016/j.ajhg.2010.10.014.
 Zhu X, Feng T, Li Y, Lu Q, Elston RC: Detecting rare variants for complex traits using family and unrelated data. Genet Epidemiol. 2010, 34 (2): 171-187. 10.1002/gepi.20449.
 Stouffer SA, Suchman EA, Devinney LC, Star SA, Williams RMJ: Adjustment during army life. The American soldier. 1949, Princeton, NJ: Princeton Univ, 1
 Tippett LHC: The methods of statistics. 1931, London: Williams and Norgate
 King CR, Rathouz PJ, Nicolae DL: An evolutionary framework for association testing in resequencing studies. PLoS Genet. 2010, 6 (11): e1001202. 10.1371/journal.pgen.1001202.
 Boyko AR, Williamson SH, Indap AR, Degenhardt JD, Hernandez RD, Lohmueller KE, Adams MD, Schmidt S, Sninsky JJ, Sunyaev SR: Assessing the evolutionary impact of amino acid mutations in the human genome. PLoS Genet. 2008, 4 (5): e1000083. 10.1371/journal.pgen.1000083.
 Gutenkunst RN, Hernandez RD, Williamson SH, Bustamante CD: Inferring the joint demographic history of multiple populations from multidimensional SNP frequency data. PLoS Genet. 2009, 5 (10): e1000695. 10.1371/journal.pgen.1000695.
 Wu MC, Lee S, Cai T, Li Y, Boehnke M, Lin X: Rare-variant association testing for sequencing data with the sequence kernel association test. Am J Hum Genet. 2011, 89 (1): 82-93. 10.1016/j.ajhg.2011.05.029.
 Vithana EN, Aung T, Khor CC, Cornes BK, Tay WT, Sim X, Lavanya R, Wu R, Zheng Y, Hibberd ML: Collagen-related genes influence the glaucoma risk factor, central corneal thickness. Hum Mol Genet. 2011, 20 (4): 649-658. 10.1093/hmg/ddq511.
 Lavanya R, Jeganathan VSE, Zheng Y, Raju P, Cheung N, Tai ES, Wang JJ, Lamoureux E, Mitchell P, Young TL: Methodology of the Singapore Indian Chinese cohort (SICC) Eye study: quantifying ethnic variations in the epidemiology of eye diseases in Asians. Ophthalmic Epidemiol. 2009, 16 (6): 325-336. 10.3109/09286580903144738.
 Foong AWP, Saw SM, Loo JL, Shen S, Loon SC, Rosman M, Aung T, Tan DTH, Tai ES, Wong TY: Rationale and methodology for a population-based study of eye diseases in Malay people: the Singapore Malay Eye study (SiMES). Ophthalmic Epidemiol. 2007, 14 (1): 25-35. 10.1080/09286580600878844.
 Su DHW, Wong TY, Foster PJ, Tay WT, Saw SM, Aung T: Central corneal thickness and its associations with ocular and systemic factors: the Singapore Malay Eye study. Am J Ophthalmol. 2009, 147 (4): 709-716.e701. 10.1016/j.ajo.2008.10.013.
 Wong TCE: Prevalence and causes of low vision and blindness in an urban Malay population: the Singapore Malay Eye study. Arch Ophthalmol. 2008, 126 (8): 1091-1099. 10.1001/archopht.126.8.1091.
 Zhao J, Gupta S, Seielstad M, Liu J, Thalamuthu A: Pathway-based analysis using reduced gene subsets in genome-wide association studies. BMC Bioinformatics. 2011, 12: 17. 10.1186/1471-2105-12-17.
 Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D: Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet. 2006, 38 (8): 904-909. 10.1038/ng1847.
 Thalamuthu A, Zhao J, Keong G, Kondragunta V, Mukhopadhyay I: Association tests for rare and common variants based on genotypic and phenotypic measures of similarity between individuals. BMC Proceedings. 2011, 5 (Suppl 9): S89. 10.1186/1753-6561-5-S9-S89.
 Freedman D, Lane D: A nonstochastic interpretation of reported significance levels. Journal of Business and Economic Statistics. 1983, 1: 292-298.
 Kennedy PE: Randomization tests in econometrics. Journal of Business and Economic Statistics. 1995, 13: 85-94.
 Ter Braak CJF: Permutation versus bootstrap significance tests in multiple regression and ANOVA. Bootstrapping and related techniques. Edited by: Jockel KH, Rothe G, Sendler W. 1992, Berlin: Springer
 Anderson MJ, Legendre P: An empirical comparison of permutation methods for tests of partial regression coefficients in a linear model. J Stat Comput Sim. 1999, 62 (3): 271-303. 10.1080/00949659908811936.
 Wagner BD, Zerbe GO, Mexal S, Leonard SS: Permutation-based adjustments for the significance of partial regression coefficients in microarray data analysis. Genet Epidemiol. 2008, 32 (1): 1-8. 10.1002/gepi.20255.
 Cornes BK, Khor CC, Nongpiur ME, Xu L, Tay WT, Zheng Y, Lavanya R, Li Y, Wu R, Sim X: Identification of four novel variants that influence central corneal thickness in multi-ethnic Asian populations. Hum Mol Genet. 2012, 21 (2): 437-445. 10.1093/hmg/ddr463.
 Zhang H, Xu L, Chen C, Jonas J: Central corneal thickness in adult Chinese. Association with ocular and general parameters. The Beijing Eye Study. Graefe’s Archive for Clinical and Experimental Ophthalmology. 2008, 246 (4): 587-592. 10.1007/s00417-007-0760-9.
 Derkach A, Lawless JF, Sun L: Robust and powerful tests for rare variants using Fisher’s method to combine evidence of association from two or more complementary tests. Genet Epidemiol. 2013, 37 (1): 110-121. 10.1002/gepi.21689.
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.