When the underlying disease model is unknown, combining statistical tests tailored to different disease scenarios may be a much better strategy than applying a single test designed for one specific disease model. In this article we have described two approaches for combining genotype- and haplotype-based statistical tests. Theoretical power considerations, population genetics simulations and real data analysis showed strong performance of the MinP-val approach across different disease scenarios, whereas the SumP-val method performed poorly when one of the underlying tests had low power. Our analysis of SiMES + SINDI identified the three regions found by Vithana et al. [41] and, additionally, the C7orf42 gene. The replication analysis confirmed an association of the RXRA-COL5A1 region, which is consistent with the results of Cornes et al. [54], and showed a moderate p-value for the C7orf42 gene. The analysis of real data highlighted the applicability of our combined approaches to real association studies.

In our simulations Haplotype SKAT was the most powerful test in many cases, but in the real data analysis it performed the worst. It is not known beforehand whether a genotype- or a haplotype-based test will perform better; hence, applying a combined approach is a robust choice. Indeed, MinP-val did well in both the simulations and the real data. This emphasizes the major point of the combined strategy: MinP-val may have slightly lower power than Haplotype SKAT when the disease model fits Haplotype SKAT, but higher power when the disease model is closer to the second underlying test. One possible reason for the apparent inconsistency in Haplotype SKAT performance is that for the “Rare” and “Both” simulation models we assumed that rare variants bear the major association signal, whereas in the real data only common SNPs were present. However, Haplotype SKAT performed well even for the “Common” model, in which a common SNP was causal. We suppose that in this scenario the genotype association translated into an association of haplotypes with the phenotype, which is possible if common SNPs within a region are in high LD with each other. On the other hand, if a causal common SNP is in low LD with the other common SNPs within the region, then under a genotype-based disease scenario a haplotype-based test may have much lower power than a genotype-based test, which is what we observed in the real data analysis.

The methods proposed in this study may be easily generalized to multiple statistical tests: instead of two underlying tests, it is possible to apply more tests and combine all of them via the described methodology. In this case the arguments for theoretical p-value calculation for the proposed approaches can be extended in a straightforward manner.
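As an illustration of how a minimum-p-value combination might extend to k underlying tests, the following is a minimal sketch that calibrates the combined statistic purely by permutation rather than by the theoretical calculation. It is not the implementation used in this study: the function names `upper_tail_fraction` and `min_p_combined` are hypothetical, and the sketch assumes that each test statistic is larger when evidence for association is stronger and that all k statistics were computed on the same set of phenotype permutations.

```python
from bisect import bisect_left


def upper_tail_fraction(sorted_null, x):
    """Fraction of null (permutation) statistics that are >= x."""
    n = len(sorted_null)
    return (n - bisect_left(sorted_null, x)) / n


def min_p_combined(observed, permuted):
    """Permutation-calibrated minimum-p combination of k tests (sketch).

    observed: list of k statistics computed on the real phenotype.
    permuted: list of B rows, each holding the same k statistics
              computed on one permuted phenotype.
    Assumes larger statistics mean stronger evidence for association.
    """
    k = len(observed)
    n_perm = len(permuted)
    # Sorted null distribution of each underlying test.
    null_cols = [sorted(row[j] for row in permuted) for j in range(k)]

    def min_p(stats):
        # Smallest per-test empirical p-value for one vector of statistics.
        return min(upper_tail_fraction(null_cols[j], stats[j]) for j in range(k))

    obs_min_p = min_p(observed)
    # The same combination applied to every permutation keeps the
    # dependence between the underlying tests intact.
    null_min_p = [min_p(row) for row in permuted]
    hits = sum(1 for m in null_min_p if m <= obs_min_p)
    return (1 + hits) / (n_perm + 1)
```

Because every permutation row carries all k statistics, the dependence between the tests is handled implicitly; replacing `min` with a sum-type combination would give an analogous sketch for a SumP-val-like statistic.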

Recently Derkach et al. [56] investigated the performance of combined approaches, namely the minimum of p-values and the Fisher p-value combination, for rare-variant association scenarios. Although the approaches we propose are similar, our major idea is different. We combine two test statistics in order to widen the set of alternatives for which our test is powerful; thus, we choose underlying tests designed for very different phenotype models, whereas Derkach et al. [56] used linear and quadratic tests, which are both likely to be powerful under many models. As a result, our conclusions differ from those of Derkach et al. [56]. First, the authors stated that “hybrid test statistics provide much needed robustness in terms of power for association tests”, whereas we observed that only the minimum p-value approach really preserves power when one of the underlying tests underperforms. Second, the authors found that in many cases the Fisher method outperforms both of the underlying tests and the minimum p-value approach. However, our work shows that SumP-val (which is similar to the Fisher p-value combination) outperforms the other three tests only when both underlying tests have comparable power, which is unlikely if the two underlying tests are deliberately chosen to fit very different phenotype models.

One of the limitations of the proposed approaches is the need to use permutations. For theoretical p-value calculation both SumP-val and MinP-val require a correlation coefficient to be estimated via permutations. Moreover, permutations need to be applied when asymptotic distributions of the underlying test statistics are unknown or inadequate to describe the empirical distributions.
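To make the permutation step concrete, the following minimal sketch estimates a correlation coefficient between two underlying tests by recomputing both on the same permuted phenotypes. It rests on assumptions not stated in this section: that the quantity of interest is the Pearson correlation between the two test statistics under the null, and that each test can be wrapped as a function of the permuted phenotype alone. The names `pearson`, `permutation_correlation`, `stat_a` and `stat_b` are hypothetical placeholders, not part of any existing package.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def permutation_correlation(phenotype, stat_a, stat_b, n_perm=1000, seed=0):
    """Estimate the null correlation between two test statistics (sketch).

    phenotype: sequence of phenotype values, one per individual.
    stat_a, stat_b: callables mapping a (permuted) phenotype list to a
        statistic; genotype or haplotype data are assumed to be captured
        inside each callable.
    """
    rng = random.Random(seed)
    shuffled = list(phenotype)
    a_vals, b_vals = [], []
    for _ in range(n_perm):
        # Shuffling the phenotype breaks any genotype-phenotype association
        # while leaving the genetic data, and hence LD, unchanged.
        rng.shuffle(shuffled)
        a_vals.append(stat_a(shuffled))
        b_vals.append(stat_b(shuffled))
    return pearson(a_vals, b_vals)
```

Both statistics are evaluated on the same shuffled phenotype in every iteration; computing them on separate permutations would destroy the dependence that the correlation is meant to capture.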

The described methodologies may be extended to preserve power under other disease models. For example, a combination of rare-variant and common-variant statistical tests applied to a sequenced region may preserve high power when either only rare or only common variants are associated with the phenotype. However, it is not known how the combined approaches will perform if both common and rare variants are associated with the phenotype.