# Comparison of similarity-based tests and pooling strategies for rare variants

Sergii Zakharov^{1,2}, Agus Salim^{2} and Anbupalam Thalamuthu^{1}

**14**:50

https://doi.org/10.1186/1471-2164-14-50

© Zakharov et al.; licensee BioMed Central Ltd. 2013

**Received:** 9 August 2012

**Accepted:** 17 January 2013

**Published:** 24 January 2013

## Abstract

### Background

As several rare genomic variants have been shown to affect common phenotypes, rare variants association analysis has received considerable attention. Several efficient association tests using genotype and phenotype similarity measures have been proposed in the literature. The major advantages of similarity-based tests are their ability to accommodate multiple types of DNA variations within one association test, and to account for the possible interaction within a region. However, not much work has been done to compare the performance of similarity-based tests on rare variants association scenarios, especially when applied with different rare variants pooling strategies.

### Results

Using population genetics simulations and analysis of a publicly available sequencing data set, we compared the performance of four similarity-based tests and two rare variants pooling strategies. We showed that the weighting approach outperforms collapsing when rare variants have a strong effect and common variants have a moderate effect, whereas collapsing of rare variants is preferable when common variants have a strong effect. We also demonstrated that the difference in statistical power between the two pooling strategies may be substantial. The results also highlighted the consistently high power of two similarity-based approaches when applied with an appropriate pooling strategy.

### Conclusions

Population genetics simulations and sequencing data set analysis showed high power of two similarity-based tests and a substantial difference in power between the two pooling strategies.


## Background

Although genome-wide association studies (GWAS) have identified many common single nucleotide polymorphisms (SNPs) associated with common diseases (http://www.genome.gov/gwastudies/), these common variants explain only a small fraction of the phenotypic variance attributable to genetic factors[1, 2]. Recently, the scientific community has devoted a lot of attention to the analysis of rare variants, with the hope of finding the missing heritability. Indeed, there is growing evidence that rare variants are associated with some complex traits[3–6]. Therefore, research in the area of rare variants has a high potential to discover unknown associations of genomic regions with complex phenotypes. Numerous methodologies have been developed to test association of multiple rare variants within a region with a phenotype[7–11].

Measures of genotype similarity have been the basis of many proposed statistical tests. The idea of similarity-based tests is to consider the relationship between genotypic and phenotypic similarities (similarity here roughly refers to a measure of closeness of two genotypes or phenotypes). Similarity-based tests are motivated by the fact that haplotypes carrying the same causal mutation are more related compared with those without causal mutations; so, case haplotypes are expected to share longer stretches of DNA identical by descent[12]. One of the major advantages of similarity-based tests is the ability to accommodate multiple types of DNA variations (SNPs, insertions and deletions, CNV) observed within a region, given flexibility in the choice of similarity measures between two sequences[13]. Another issue that similarity-based tests address is the possible interaction of different variants within a region, which is potentially accounted for by considering multi-site similarity measures[14]. For unrelated individuals, similarity measures have been incorporated within a framework of single SNP analysis of variance[15], multiple regression[16], U-statistic[17] and distance-based regression[14]. Methods based on genotype similarity include the following: sequence kernel association test (SKAT)[11]; kernel-based association test (KBAT)[18], multivariate distance matrix regression test (MDMR)[19]; and aggregate U-test[20]. However, so far, no attempts have been made to evaluate the performance of similarity-based tests on rare variants association scenarios when common variants are included into or excluded from the analysis. Even though many non-causal common SNPs are removed by considering only rare variants, it is unclear if consideration of fewer variants would be sufficient to compensate for the loss of association signal from common SNPs. 
Also, when both rare and common variants are included into the analysis, it is of interest to evaluate the change in the performance of the tests when rare variants have lower effect sizes than common SNPs. Additionally, statistical tests may utilize different pooling strategies for rare variants, e.g. weighting or collapsing. Given the choice, it is unclear which pooling strategy is the best to be applied with similarity-based tests.

In this article, we compared the performance of the four similarity-based tests (SKAT, KBAT, MDMR and a modified U-test proposed by Schaid et al.[17]) applied with two popular rare variants pooling strategies (weighting and collapsing). The comparison was performed based on population-genetics simulations under four different disease models and the GAW17 sequencing data set. The results highlighted that, under the presence of strong rare variants association signal and moderate association of common variants, weighting may be a much better strategy than collapsing, whereas collapsing tends to outperform weighting when common variants possess a strong effect. Moreover, we discovered that the magnitude of the difference in power among similarity-based methods, when applied with weighting and collapsing strategies, may be very high, sometimes over 50%. Under the strong effect size of rare variants when common variants were excluded from the analysis, we observed better performance of collapsing strategy and lower power of weighting pooling strategy. Also, when the appropriate pooling strategy is applied, both SKAT and KBAT showed consistently high power among all the four similarity-based tests compared here.

## Results

### Population genetic simulations

For each test, 1000 permutations were performed to assess the significance of association. To make sure the empirical type-1 error is controlled, we ran the analysis of simulated data under the null model. As can be seen from Additional file 1: Table S1, the type-1 error was well controlled by using the permutation procedure to estimate the significance level. For the “Risk Rare” scenario with the weighting pooling strategy and for the “Risk Common” scenario, the estimates of type-1 error are below 0.05, which suggests that in these cases the methods are slightly conservative. The two-sided 99% confidence interval for the type-1 error estimate is approximately 0.033–0.067. This follows from the normal approximation: under no inflation of type-1 error, the estimated type-1 error rate is the observed proportion of successes of a binomial random variable with success probability 0.05 and sample size 1000, the number of data replicates. As can be seen, the empirical type-1 error estimates for population genetics simulations were within the 99% confidence interval.
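The normal-approximation interval quoted above can be reproduced with a few lines of Python. This is a minimal sketch rather than the authors' code; the function name `normal_approx_interval` is illustrative, and the only inputs are the nominal level 0.05 and the number of replicates stated in the text.

```python
import math
from statistics import NormalDist

def normal_approx_interval(p0, n_replicates, confidence=0.99):
    """Two-sided interval p0 +/- z * sqrt(p0 * (1 - p0) / n) for an estimated rate."""
    # z is the upper quantile of the standard normal for the two-sided level.
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    se = math.sqrt(p0 * (1.0 - p0) / n_replicates)
    return p0 - z * se, p0 + z * se

# 1000 data replicates in the population genetics simulations.
low_sim, high_sim = normal_approx_interval(0.05, 1000)
# 200 phenotype replicates in the GAW17 data set.
low_gaw, high_gaw = normal_approx_interval(0.05, 200)

print(f"simulations: {low_sim:.3f} to {high_sim:.3f}")
print(f"GAW17:       {low_gaw:.3f} to {high_gaw:.3f}")
```

The interval narrows with the square root of the number of replicates, which is why the GAW17 interval (200 replicates) is roughly twice as wide as the one for the simulations (1000 replicates).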

**The maximum absolute difference in power (over the type-1 error rate) between weighting and collapsing pooling strategies for different tests and phenotype scenarios in population genetics simulations**

| Scenario/Test | MDMR | SKAT | KBAT | U-Test |
|---|---|---|---|---|
|  | 0.466 | 0.472 | 0.157 | 0.511 |
|  | 0.395 | 0.29 | 0.094 | 0.379 |
|  | 0.551 | 0.479 | 0.388 | 0.235 |
|  | 0.18 | 0.516 | 0.393 | 0.148 |

### GAW17 data set

The GAW17 data set is a large-scale exome sequencing data set with genotypes from the 1000 Genomes Project (http://www.1000genomes.org). The data set consists of 697 unrelated individuals from six populations (Centre d'Etude du Polymorphisme Humain (CEPH) samples, Tuscan, Chinese, Japanese, Yoruba and Luhya). The complex phenotype model incorporates environmental covariates (age, sex and smoking status) and both common and rare causal SNPs from genes in particular pathways. In total, 200 replicates of several quantitative traits and of case–control status were simulated under the phenotype model. A more detailed description of the simulations can be found in Almasy et al.[21].

The traits analyzed were two quantitative traits, $Q_1$ and $Q_2$, and a dichotomous trait, $D$. Adjustment for covariates was done in a similar way as in Jiang et al.[22]. Let $G$ be the genotype matrix, $Q_j$, $j = 1, 2$, the vectors of the two quantitative traits, $E_i$, $i = 1, 2, 3$, the vectors of covariates (age, sex and smoking status, respectively), and $R$ the matrix of ten principal components of the genotype matrix obtained using the software EIGENSTRAT[23]. The corrected genotype, phenotypes and covariates are $\breve{G} = G - RR^{T}G$, $\tilde{Q}_j = Q_j - RR^{T}Q_j$, $j = 1, 2$, $\tilde{D} = D - RR^{T}D$ and $\tilde{E}_i = E_i - RR^{T}E_i$, $i = 1, 2, 3$. Next, covariates are regressed out of adjusted phenotypes using the regression models:

The residuals from the regression models (1) were dichotomized (the upper 30% of the observed distribution were declared cases, while the others were controls) and tested for association with the adjusted genotype $\breve{G}$ of the causal genes. The type-1 error was set at 0.05, and 1000 permutations were performed for each of the 200 phenotype replicates to assess the power. To assess the empirical type-1 error rate for all the statistical tests, we ran the analysis with randomly permuted adjusted phenotypes obtained from the regressions (1). The resulting type-1 error rates are presented in Additional files 3 and 4. The two-sided 99% confidence interval for the type-1 error estimate is approximately 0.01–0.09. This follows from the normal approximation: under no inflation of type-1 error, the estimated type-1 error rate is the observed proportion of successes of a binomial random variable with success probability 0.05 and sample size 200, the number of phenotype replicates. As can be seen, the empirical type-1 error for GAW17 data was within the 99% confidence interval.
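Two of the preprocessing steps described above, removing the principal-component projection $RR^{T}x$ from a vector $x$ and dichotomizing residuals at the upper 30%, can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the principal-component vectors are assumed to be orthonormal, the number of cases is assumed to be the rounded 30% of the sample, and ties are broken by sort order. The covariate regression itself is left out of the sketch.

```python
def project_out(pcs, x):
    """Return x - R R^T x, where each list in `pcs` is one column of R.

    Assumption: the columns of R are orthonormal principal-component vectors.
    """
    adjusted = list(x)
    for r in pcs:
        coef = sum(r_i * x_i for r_i, x_i in zip(r, x))  # r^T x
        adjusted = [a - coef * r_i for a, r_i in zip(adjusted, r)]
    return adjusted

def dichotomize_upper(residuals, case_fraction=0.3):
    """Label the largest `case_fraction` of residuals as cases (1), the rest as controls (0)."""
    n_cases = int(round(case_fraction * len(residuals)))
    # Indices sorted from the largest residual to the smallest.
    order = sorted(range(len(residuals)), key=lambda i: residuals[i], reverse=True)
    case_idx = set(order[:n_cases])
    return [1 if i in case_idx else 0 for i in range(len(residuals))]

# Toy usage: one principal component along the first coordinate.
adjusted = project_out([[1.0, 0.0, 0.0]], [3.0, 4.0, 5.0])
status = dichotomize_upper([0.1, 0.5, 0.9, 0.3, 0.7, 0.2, 0.8, 0.4, 0.6, 1.0])
```

The same `project_out` call would be applied column by column to the genotype matrix and to each phenotype and covariate vector.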

**The maximum absolute difference in power (over the respective causal genes) between weighting and collapsing pooling strategies for different tests and phenotypes in GAW17 data set**

| Scenario/Test | MDMR | SKAT | KBAT | U-Test |
|---|---|---|---|---|
|  | 0.84 (KDR) | 0.45 (ARNT) | 0.22 (ARNT) | 0.145 (HIF3A) |
|  | 0.605 (VNN1) | 0.5 (VNN1) | 0.42 (VNN1) | 0.535 (VNN1) |
|  | 0.77 (FLT1) | 0.42 (PRKCA) | 0.43 (PRKCA) | 0.535 (FLT1) |

## Discussion

In this article, we compared the performance of the four similarity-based tests together with two rare variants pooling strategies using population genetics simulations and the GAW17 data set. The results suggest that weighting may be a much better strategy than collapsing when rare variants have a strong effect and common variants have a moderate or low effect. Collapsing, in turn, showed much better performance when common variants had a strong effect. The absolute difference in the power of a statistical test between the collapsing and weighting pooling strategies may be substantial. From our study, it follows that if researchers are inclined to believe in the association of rare variants within a region, weighted pooling should be applied with similarity-based tests, whereas collapsing is more appropriate if common variants are suspected to be associated with the phenotype. Additionally, when rare variants had a strong effect in one direction and common variants were excluded from the analysis, collapsing performed as well as or better than weighting. Finally, both SKAT and KBAT had consistently high power compared with the other similarity-based tests considered when applied with the appropriate pooling strategy.

Recently, Basu and Pan[24] compared the performance of multiple statistical tests to identify an association with rare variants. The authors included SKAT with unweighted linear and quadratic kernels as one of the testing strategies. Based on the results, Basu and Pan[24] concluded that SKAT was powerful compared with other methods when only rare variants were considered. However, the authors found that the method lost its high power when neutral common variants were added. Our results suggest that using weighted kernels in SKAT may preserve high power to identify an association with rare variants even if multiple neutral common variants are included into the analysis. However, since we compared the performance of similarity-based tests, additional investigation is required to compare weighted similarity-based tests with other statistical strategies, including those considered in Basu and Pan[24].

From our results, the MDMR test does not seem to perform well when applied with the weighting pooling strategy. To have a more detailed picture, we applied the weighted MDMR test to the “Risk Rare” data sets with modified weights $w_l^{p}$, $l = 1, \ldots, L$, where the power value $p$ varied from 0 to 1. So, $p = 1$ corresponded to the beta weights applied in our study, whereas $p = 0$ corresponded to the analysis without weights. Additional file 7 shows the power surface as a function of the significance level and the value of $p$. It is clear that for all significance levels, the power of MDMR monotonically decreased with higher values of $p$, which corresponded to higher relative weights for rare variants. In Additional file 1, we proved that when the number of cases equals the number of controls (as in our simulations), the SKAT and MDMR test statistics are equivalent to the sum of dissimilarities and the sum of squared dissimilarities, respectively, over all case–control pairs. When the weighting pooling strategy is applied, dissimilarity tends to be relatively large for pairs of individuals whose genotypes differ in multiple rare minor alleles. Squaring the dissimilarity measure puts much more emphasis on pairs with a larger dissimilarity. Thus, the magnitude of the MDMR test statistic may be completely defined by the number of case–control pairs whose genotypes differ by at least two rare minor alleles. We suppose that pairs with a difference of one rare allele may not have sufficient dissimilarity to significantly influence the MDMR test statistic, which leads to a loss of power. To illustrate our reasoning, consider two rare variants with only eight observed minor alleles each across 500 cases and 500 controls. To simplify the description, assume that each individual has either zero or one copy of a minor allele across the two variants. Also, we will use the equivalence of the MDMR test statistic to the sum of squared case–control dissimilarities. Consider the following configurations: under the null hypothesis, cases and controls each carry four minor alleles of each variant; under the alternative hypothesis, cases carry all the minor alleles. 
Under the alternative hypothesis, we have zero case–control pairs with a difference of two alleles across genotype, whereas under the null hypothesis, we have 32. However, under the alternative hypothesis, there are 16 × 500 case–control pairs with a difference of one minor allele, whereas under the null hypothesis, there are only 16 × (500–8). Now it becomes clear that if the dissimilarity of pairs of individuals with a difference of two alleles is large enough relative to the dissimilarity of pairs of individuals with a difference of only one allele, the MDMR test statistic may become lower compared to the null test statistic. The consideration above explains the low performance of MDMR with weighted similarity and the fact that for the “Risk Rare” scenario, the power of MDMR test was below type-1 error rate.
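The pair counts in this illustration can be checked by brute-force enumeration. The sketch below is not part of the original analysis; the helper names are illustrative, and genotypes are coded as (variant 1, variant 2) minor allele counts, with every carrier holding exactly one minor allele at one of the two variants, as assumed in the text.

```python
from itertools import product

def make_group(n_total, n_carriers_v1, n_carriers_v2):
    """Genotypes (variant 1, variant 2); each carrier has one minor allele at one variant."""
    n_non = n_total - n_carriers_v1 - n_carriers_v2
    return [(1, 0)] * n_carriers_v1 + [(0, 1)] * n_carriers_v2 + [(0, 0)] * n_non

def count_case_control_pairs(cases, controls):
    """Count case-control pairs by the number of minor alleles in which they differ."""
    counts = {}
    for g_case, g_ctrl in product(cases, controls):
        diff = sum(abs(a - b) for a, b in zip(g_case, g_ctrl))
        counts[diff] = counts.get(diff, 0) + 1
    return counts

# Null: four minor alleles of each variant in cases and in controls.
null_counts = count_case_control_pairs(make_group(500, 4, 4), make_group(500, 4, 4))
# Alternative: all sixteen minor alleles are carried by cases.
alt_counts = count_case_control_pairs(make_group(500, 8, 8), make_group(500, 0, 0))

print("null:", null_counts.get(2, 0), "pairs differ by two alleles,",
      null_counts.get(1, 0), "by one")
print("alt: ", alt_counts.get(2, 0), "pairs differ by two alleles,",
      alt_counts.get(1, 0), "by one")
```

A statistic that sums squared dissimilarities weights the two-allele pairs, which occur only under the null configuration, far more heavily than the one-allele pairs, which is the mechanism the paragraph describes.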

One limitation of the current study is that the minimum significance level in population genetics simulations was 0.001. For genome-wide significance, the number of permutations needed to reliably estimate the significance is very large. This makes the comparison of the similarity-based tests at the genome-wide level prohibitive. In real GWAS, only a few highly significant genes will require a very large number of permutations to estimate *p*-values, as many genes with low or no association signal can be dropped after a few thousand permutations. For highly significant genes, the permutation procedure can be split into several parts and performed in parallel on a cluster.

## Conclusions

The performance of similarity-based tests applied with two rare variants pooling strategies was investigated. Population genetics simulations and sequencing data set analysis showed consistently high power of two similarity-based tests and a substantial difference in performance of the two rare variants pooling strategies.

## Methods

### Similarity-based tests

Consider $N$ individuals ($N^{A}$ cases and $N^{U}$ controls), with $L$ SNPs (both common and rare) called within a genomic region. Let us denote the genotype matrix $G = \{g_{nl}, n = 1,\ldots,N, l = 1,\ldots,L\}$ coded as minor allele counts, and the phenotype vector $Y = \{y_n, n = 1,\ldots,N\}$ with the elements valued 1 for cases and −1 for controls (except when otherwise specified). The $N \times N$ similarity matrix is defined as $K = \{s(g_n, g_m)\}_{n,m=1}^{N}$, where $g_n = \{g_{n1},\ldots,g_{nL}\}$ is the multi-site genotype vector of the $n$th individual, and $s(x, y)$ is a similarity function. A variety of similarity functions have been published in the statistical genetics literature (for examples, see Wu et al.[11], Wessel and Schork[14], and Mukhopadhyay et al.[25]). However, it is desirable for the similarity matrix $K$ to be symmetric positive semi-definite as this is “the key to its use in many statistical analyses”[26]. Thus, we consider only those similarity measures that result in a positive definite similarity matrix. Examples of such similarity measures are the weighted linear kernel $s(g_n, g_m) = \sum_{l=1}^{L} w_l g_{nl} g_{ml}$ for some fixed weights $w_l, l = 1,\ldots,L$, the weighted quadratic kernel $s(g_n, g_m) = (1 + \sum_{l=1}^{L} w_l g_{nl} g_{ml})^{2}$, and the weighted IBS kernel $s(g_n, g_m) = \sum_{l=1}^{L} w_l (2 - |g_{nl} - g_{ml}|)$. For our analysis, a popular exponential similarity measure[27] was used:

The choice of similarity was motivated by the need to analyze quantitative genotype obtained as a result of population stratification adjustment (see Results section). As the exponential similarity is a function of the Euclidean distance between two multi-site genotypes, we consider this similarity to be more appropriate compared with, for example, another popular similarity measure, identity-by-state[17], which was designed to be applied to genotype codes.
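For orientation, the three example kernels written out in this subsection (weighted linear, weighted quadratic and weighted IBS) can be sketched directly from their formulas. The function names are illustrative, and genotypes are plain lists of minor allele counts; the sketch covers only these example kernels, not the exponential similarity used in the analysis.

```python
def weighted_linear(g_n, g_m, w):
    """s(g_n, g_m) = sum_l w_l * g_nl * g_ml."""
    return sum(w_l * a * b for w_l, a, b in zip(w, g_n, g_m))

def weighted_quadratic(g_n, g_m, w):
    """s(g_n, g_m) = (1 + sum_l w_l * g_nl * g_ml)^2."""
    return (1 + weighted_linear(g_n, g_m, w)) ** 2

def weighted_ibs(g_n, g_m, w):
    """s(g_n, g_m) = sum_l w_l * (2 - |g_nl - g_ml|)."""
    return sum(w_l * (2 - abs(a - b)) for w_l, a, b in zip(w, g_n, g_m))

def similarity_matrix(genotypes, w, kernel):
    """N x N matrix K = {s(g_n, g_m)} for a list of multi-site genotype vectors."""
    n = len(genotypes)
    return [[kernel(genotypes[i], genotypes[j], w) for j in range(n)] for i in range(n)]

# Toy usage: three individuals, three SNPs, unit weights.
genotypes = [[0, 1, 2], [1, 1, 0], [0, 0, 0]]
K_ibs = similarity_matrix(genotypes, [1.0, 1.0, 1.0], weighted_ibs)
```

Passing the kernel as a function keeps the matrix construction independent of the similarity measure, so any other symmetric similarity can be substituted.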

### Weighting and collapsing

Denote the variant weights by $w = \{w_l, l = 1,\ldots,L\}$. In general, they may be derived from observed minor allele frequency (MAF) or prior information. Here, we adopted the weights proposed by Wu et al.[11]: $w_l = \mathrm{Beta}(\mathit{maf}_l; 1, 25)^{2}$, where $\mathit{maf}_l$ is the MAF of the $l$th SNP and $\mathrm{Beta}(a; b, c)$ is the beta density function with parameters $b$ and $c$ evaluated at point $a$. The weight function monotonically increases as MAF decreases, while, as noted by the authors, “putting decent nonzero weights for variants with MAF 1%–5%”. As noted by Wu et al.[11], setting $0 \le b \le 1$ and $c \ge 1$ allows for an increase in the weight of rare variants and a decrease in the weight of common variants. Thus, any values of the parameters $b$ and $c$ from the specified range are acceptable. For the three tests (SKAT, MDMR and U-test), the weights are incorporated via the calculation of the similarity matrix. Specifically, the weight-incorporating similarity function $s_w$ for the similarity matrix $K_w$ is as follows:

For the KBAT test statistic, the weights were incorporated differently (for details, see the description below) as the test does not use the multi-site genotype similarity.
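The beta weights $w_l = \mathrm{Beta}(\mathit{maf}_l; 1, 25)^{2}$ described above can be computed with the standard library alone. The following is a minimal sketch; the function names are illustrative, and the density is evaluated through log-gamma functions, which assumes $b > 0$, $c > 0$ and $0 < \mathit{maf} < 1$.

```python
import math

def beta_density(x, b, c):
    """Density of the Beta(b, c) distribution at x, for 0 < x < 1 and b, c > 0."""
    log_norm = math.lgamma(b + c) - math.lgamma(b) - math.lgamma(c)
    return math.exp(log_norm + (b - 1) * math.log(x) + (c - 1) * math.log(1 - x))

def beta_weight(maf, b=1.0, c=25.0):
    """Weight w_l = Beta(maf_l; b, c)^2 with the default parameters b = 1, c = 25."""
    return beta_density(maf, b, c) ** 2

# Weights for a rare, a low-frequency and a common variant.
for maf in (0.01, 0.05, 0.30):
    print(f"MAF {maf:.2f}: weight {beta_weight(maf):.4g}")
```

With $b = 1$ the density reduces to $25(1 - \mathit{maf})^{24}$, so the weight falls steadily as the MAF grows and is close to zero for common variants.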

The collapsing strategy pools the rare variants into a single collapsed genotype $g_{n(L+1)}, n = 1,\ldots,N$, as follows:

In general, this type of collapsing preserves more information than the indicator of at least one rare variant being present suggested by Li and Leal[28]. The collapsed genotype is treated as a new SNP (super-locus) $g_{n(L+1)}, n = 1,\ldots,N$, and a similarity matrix is constructed using common variants and the super-locus.

### Multivariate distance matrix regression (MDMR)

Denote the $N \times N$ identity matrix as $I_N$ and a vector of ones of size $N$ as $1_N$. Following Wessel and Schork[14], the test statistic is calculated according to the algorithm:

1. Phenotype projection matrix $H = Y(Y^{T}Y)^{-1}Y^{T}$, where the superscript $T$ denotes transposition.
2. Dissimilarity matrix $D = \{d_{ij}\}_{i,j=1}^{N} = 1_N 1_N^{T} - K$, where $K$ is a similarity matrix defined above.
3. Gower’s centered matrix $G = (I_N - 1_N 1_N^{T}/N) A (I_N - 1_N 1_N^{T}/N)$, where $A = \{-d_{ij}^{2}/2\}_{i,j=1}^{N}$.
4. The test statistic $MDMR = tr(HGH)/tr((I_N - H)G(I_N - H))$, where $tr$ is the matrix trace.

Large values of the test statistic indicate a deviation from the null hypothesis of no association of a genotype with a phenotype.
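The MDMR steps listed above can be sketched in plain Python. The sketch avoids forming $H$ explicitly: because $H = YY^{T}/(Y^{T}Y)$ is symmetric and idempotent, $tr(HGH) = Y^{T}GY/(Y^{T}Y)$ and $tr((I_N - H)G(I_N - H)) = tr(G) - tr(HGH)$. The function name and the toy similarity matrix are illustrative, and $K$ is assumed to hold similarities scaled so that $1 - K_{ij}$ is a dissimilarity.

```python
def mdmr_statistic(K, y):
    """MDMR = tr(HGH) / tr((I - H) G (I - H)) for similarity matrix K and phenotype y."""
    n = len(y)
    # A = {-d_ij^2 / 2} with dissimilarity d_ij = 1 - K_ij.
    A = [[-((1.0 - K[i][j]) ** 2) / 2.0 for j in range(n)] for i in range(n)]
    # Gower centering (I - 11^T/N) A (I - 11^T/N), written element-wise.
    row_mean = [sum(row) / n for row in A]
    col_mean = [sum(A[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_mean) / n
    G = [[A[i][j] - row_mean[i] - col_mean[j] + grand_mean for j in range(n)]
         for i in range(n)]
    # tr(HGH) = y^T G y / y^T y, since H = y y^T / (y^T y) is idempotent.
    y_G_y = sum(y[i] * G[i][j] * y[j] for i in range(n) for j in range(n))
    explained = y_G_y / sum(v * v for v in y)
    # tr((I - H) G (I - H)) = tr(G) - tr(HGH).
    residual = sum(G[i][i] for i in range(n)) - explained
    return explained / residual

# Toy usage: individuals 1-2 are similar to each other, as are individuals 3-4.
K_toy = [[1.0, 0.8, 0.2, 0.1],
         [0.8, 1.0, 0.3, 0.2],
         [0.2, 0.3, 1.0, 0.7],
         [0.1, 0.2, 0.7, 1.0]]
stat_aligned = mdmr_statistic(K_toy, [1, 1, -1, -1])
stat_mixed = mdmr_statistic(K_toy, [1, -1, 1, -1])
```

In the toy example the statistic is much larger when the case and control labels follow the two similarity clusters than when they cut across them, which is the behaviour a permutation test relies on.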

### Sequence kernel association test (SKAT)

For this test, the phenotype vector $Y = \{y_n, n = 1,\ldots,N\}$ is coded as 1 for cases and 0 for controls. The mean phenotype vector is defined as $\overline{Y} = N^{A}1_N/N$. Following Wu et al.[11], the test statistic is $T = (Y - \overline{Y})^{T}K(Y - \overline{Y})/2$. Under the null hypothesis, the SKAT test statistic is asymptotically distributed as a weighted sum of chi-squared random variables with one degree of freedom. Thus, the significance level can be assessed theoretically. Permutations can also be used to estimate the *p*-value empirically.
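The statistic $T$ and an empirical permutation *p*-value can be sketched as below. This is an illustration and not the SKAT software: the function names are hypothetical, and the estimator `(hits + 1) / (n_perm + 1)` is a common convention assumed here, since the text does not state which estimator was used.

```python
import random

def skat_statistic(K, y):
    """T = (Y - Ybar)^T K (Y - Ybar) / 2, with y coded 1 for cases and 0 for controls."""
    n = len(y)
    y_bar = sum(y) / n  # N^A / N, the same value for every individual
    r = [v - y_bar for v in y]
    return sum(r[i] * K[i][j] * r[j] for i in range(n) for j in range(n)) / 2.0

def permutation_p_value(K, y, n_perm=1000, seed=0):
    """Share of label permutations whose statistic is at least the observed one."""
    rng = random.Random(seed)
    observed = skat_statistic(K, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # permute case/control labels, keep genotypes fixed
        if skat_statistic(K, y_perm) >= observed:
            hits += 1
    # Assumed convention: count the observed labelling as one permutation.
    return (hits + 1) / (n_perm + 1)

# Toy usage with a hand-written 4 x 4 similarity matrix.
K_toy = [[1.0, 0.8, 0.2, 0.1],
         [0.8, 1.0, 0.3, 0.2],
         [0.2, 0.3, 1.0, 0.7],
         [0.1, 0.2, 0.7, 1.0]]
t_obs = skat_statistic(K_toy, [1, 1, 0, 0])
p_emp = permutation_p_value(K_toy, [1, 1, 0, 0], n_perm=200)
```

Permuting only the phenotype labels preserves the correlation structure among variants, which is why the same similarity matrix can be reused across permutations.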

### U-test

The average similarity score for pairs of cases, $U_1$, and for pairs of controls, $U_0$, is defined as follows:

where $K_{nm}, n, m = 1,\ldots,N$ are the elements of the similarity matrix $K$ ($K = \{K_{nm}\}_{n,m=1}^{N}$). The U-test statistic is defined as $U = (U_1 - U_0)^{2}$. Note that Schaid et al.[17] considered the weighted sum of the single SNP U-test statistics, where weights were derived from the asymptotic variance-covariance matrix of the U statistics vector. However, for the purpose of comparing the weighting and collapsing rare variants pooling methods, the statistic was modified as described above. The test statistic $U$ is similar to the single SNP U-test statistic proposed by Schaid et al.[17], but it incorporates the similarity information across multiple variants within a region. Permutations need to be applied to assess the *p*-value.
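A minimal sketch of the modified U-test statistic $U = (U_1 - U_0)^{2}$ follows. The averaging is an assumption: $U_1$ and $U_0$ are taken here as the mean of $K_{nm}$ over distinct pairs $n < m$ within cases and within controls, which matches the description of average similarity scores for pairs but may differ in detail from the definition used in the article. The function names are illustrative.

```python
def average_pair_similarity(K, idx):
    """Mean of K[i][j] over distinct pairs i < j taken from the index list `idx`."""
    pairs = [(i, j) for pos, i in enumerate(idx) for j in idx[pos + 1:]]
    return sum(K[i][j] for i, j in pairs) / len(pairs)

def u_test_statistic(K, y):
    """U = (U_1 - U_0)^2, with cases coded 1 and every other code treated as control."""
    cases = [i for i, v in enumerate(y) if v == 1]
    controls = [i for i, v in enumerate(y) if v != 1]
    u1 = average_pair_similarity(K, cases)     # needs at least two cases
    u0 = average_pair_similarity(K, controls)  # needs at least two controls
    return (u1 - u0) ** 2

# Toy usage with a hand-written 4 x 4 similarity matrix.
K_toy = [[1.0, 0.8, 0.2, 0.1],
         [0.8, 1.0, 0.3, 0.2],
         [0.2, 0.3, 1.0, 0.7],
         [0.1, 0.2, 0.7, 1.0]]
u_stat = u_test_statistic(K_toy, [1, 1, -1, -1])
```

Because the statistic depends only on within-group averages, its null distribution is obtained by recomputing it after permuting the case and control labels.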

### Kernel-based association test (KBAT)

Denote $K_l = \{(K_l)_{nm}\}_{n,m=1}^{N}$ as the single-SNP similarity matrix for the $l$th variant. Similar to the notations of the U-test subsection, $U_{l1}$ and $U_{l0}$ are the average similarity scores for pairs of cases and controls, respectively, calculated from $K_l$, and let $U_l = (U_{l1} + U_{l0})/2$. Following Mukhopadhyay et al.[15], consider the within-group and between-group sum of squares:

where the two groups are case-case and control-control pairs. The test statistic is $KBAT = \sum_{l=1}^{L} BSS_l / \sum_{l=1}^{L} WSS_l$. Since the test does not utilize the multi-site similarity matrix, but only single SNP matrices $K_l$, the weighted test statistic $KBAT_W = \sum_{l=1}^{L} w_l BSS_l / \sum_{l=1}^{L} w_l WSS_l$ is used here. A large value of the $KBAT$ statistic indicates a deviation from the null hypothesis. Permutations are used to assess the significance.

### Population genetics simulations

Population genetics simulations were performed based on the code provided by King et al.[29] with demographic parameters from Boyko et al.[30]. A total of 1000 data replicates were generated for each of the four phenotype models: “Risk Rare”, “Risk Both”, “Risk Common” and “Mixed Rare”. For a detailed description of the simulations, see Additional file 1.

## Declarations

### Acknowledgements

We would like to thank the workshop organizers of GAW17 for their permission to use their data in our research. The preparation of the Genetic Analysis Workshop 17 Simulated Exome Data Set was supported by a GAW grant, R01 GM031575, and in part by NIH R01 MH059490, and used sequencing data from the 1000 Genomes Project (http://www.1000genomes.org).

Funding: This work was supported by the Agency for Science, Technology and Research (A*STAR; Singapore). The first author is a recipient of the Singapore International Graduate Award.


## References

1. Hirschhorn JN, Gajdos ZKZ: Genome-wide association studies: results from the first few years and potential implications for clinical medicine. Annu Rev Med. 2011, 62 (1): 11-24. 10.1146/annurev.med.091708.162036.
2. Maher B: Personal genomes: the case of the missing heritability. Nature. 2008, 456: 18-21.
3. Green EK, Grozeva D, Sims R, Raybould R, Forty L, Gordon-Smith K, Russell E, St Clair D, Young AH, Ferrier IN: DISC1 exon 11 rare variants found more commonly in schizoaffective spectrum cases than controls. Am J Med Genet B Neuropsychiatr Genet. 2011, 156 (4): 490-492. 10.1002/ajmg.b.31187.
4. Norton N, Li D, Rieder MJ, Siegfried JD, Rampersaud E, Züchner S, Mangos S, Gonzalez-Quintana J, Wang L, McGee S: Genome-wide studies of copy number variation and exome sequencing identify rare variants in BAG3 as a cause of dilated cardiomyopathy. Am J Hum Genet. 2011, 88 (3): 273-282. 10.1016/j.ajhg.2011.01.016.
5. Ramagopalan SV, Dyment DA, Cader MZ, Morrison KM, Disanto G, Morahan JM, Berlanga-Taylor AJ, Handel A, De Luca GC, Sadovnick AD: Rare variants in the CYP27B1 gene are associated with multiple sclerosis. Ann Neurol. 2011, 70 (6): 881-886. 10.1002/ana.22678.
6. Xie P, Kranzler HR, Krauthammer M, Cosgrove KP, Oslin D, Anton RF, Farrer LA, Picciotto MR, Krystal JH, Zhao H: Rare nonsynonymous variants in alpha-4 nicotinic acetylcholine receptor gene protect against nicotine dependence. Biol Psychiatry. 2011, 70 (6): 528-536. 10.1016/j.biopsych.2011.04.017.
7. Bansal V, Libiger O, Torkamani A, Schork NJ: Statistical analysis strategies for association studies involving rare variants. Nat Rev Genet. 2011, 11: 773-785.
8. Li Y, Byrnes AE, Li M: To identify associations with rare variants, just WHaIT: weighted haplotype and imputation-based tests. Am J Hum Genet. 2010, 87 (5): 728-735. 10.1016/j.ajhg.2010.10.014.
9. Madsen BE, Browning SR: A groupwise association test for rare mutations using a weighted sum statistic. PLoS Genet. 2009, 5 (2): e1000384. 10.1371/journal.pgen.1000384.
10. Neale BM, Rivas MA, Voight BF, Altshuler D, Devlin B, Orho-Melander M, Kathiresan S, Purcell SM, Roeder K, Daly MJ: Testing for an unusual distribution of rare variants. PLoS Genet. 2011, 7 (3): e1001322. 10.1371/journal.pgen.1001322.
11. Wu MC, Lee S, Cai T, Li Y, Boehnke M, Lin X: Rare-variant association testing for sequencing data with the sequence kernel association test. Am J Hum Genet. 2011, 89 (1): 82-93. 10.1016/j.ajhg.2011.05.029.
12. Beckmann L, Thomas DC, Fischer C, Chang-Claude J: Haplotype sharing analysis using Mantel statistics. Hum Hered. 2005, 59 (2): 67-78. 10.1159/000085221.
13. Schork NJ, Wessel J, Malo N: DNA sequence-based phenotypic association analysis. Advances in Genetics, Volume 60. Edited by: Rao DC, Gu CC. 2008, Academic Press, 195-217.
14. Wessel J, Schork NJ: Generalized genomic distance based regression methodology for multilocus association analysis. Am J Hum Genet. 2006, 79 (5): 792-806. 10.1086/508346.
15. Mukhopadhyay I, Feingold E, Weeks DE, Thalamuthu A: Association tests using kernel-based measures of multi-locus genotype similarity between individuals. Genet Epidemiol. 2010, 34 (3): 213-221.
16. Kwee LC, Liu D, Lin X, Ghosh D, Epstein MP: A powerful and flexible multilocus association test for quantitative traits. Am J Hum Genet. 2008, 82 (2): 386-397. 10.1016/j.ajhg.2007.10.010.
17. Schaid DJ, McDonnell SK, Hebbring SJ, Cunningham JM, Thibodeau SN: Nonparametric tests of association of multiple genes with human disease. Am J Hum Genet. 2005, 76 (5): 780-793. 10.1086/429838.
18. Thalamuthu A, Zhao J, Keong G, Kondragunta V, Mukhopadhyay I: Association tests for rare and common variants based on genotypic and phenotypic measures of similarity between individuals. BMC Proc. 2011, 5 (Suppl 9): S89. 10.1186/1753-6561-5-S9-S89.
19. Chung D, Zhang Q, Kraja A, Borecki I, Province M: Distance-based phenotypic association analysis of DNA sequence data. BMC Proc. 2011, 5 (Suppl 9): S54. 10.1186/1753-6561-5-S9-S54.
20. Li M, Fu W, Lu Q: An aggregating U-Test for a genetic association study of quantitative traits. BMC Proc. 2011, 5 (Suppl 9): S23. 10.1186/1753-6561-5-S9-S23.
21. Almasy L, Dyer T, Peralta J, Kent J, Charlesworth J, Curran J, Blangero J: Genetic analysis workshop 17 mini-exome simulation. BMC Proc. 2011, 5 (Suppl 9): S2. 10.1186/1753-6561-5-S9-S2.
22. Jiang R, Dong J: Detecting rare functional variants using a wavelet-based test on quantitative and qualitative traits. BMC Proc. 2011, 5 (Suppl 9): S70. 10.1186/1753-6561-5-S9-S70.
23. Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D: Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet. 2006, 38 (8): 904-909. 10.1038/ng1847.
24. Basu S, Pan W: Comparison of statistical tests for disease association with rare variants. Genet Epidemiol. 2011, 35 (7): 606-619. 10.1002/gepi.20609.
25. Mukhopadhyay I, Feingold E, Weeks DE, Thalamuthu A: Association tests using kernel-based measures of multi-locus genotype similarity between individuals. Genet Epidemiol. 2009, 34 (3): 213-221.
26. Schaid DJ: Genomic similarity and kernel methods I: advancements by building on mathematical and statistical foundations. Hum Hered. 2010, 70 (2): 109-131. 10.1159/000312641.
27. Wu MC, Kraft P, Epstein MP, Taylor DM, Chanock SJ, Hunter DJ, Lin X: Powerful SNP-set analysis for case–control genome-wide association studies. Am J Hum Genet. 2010, 86 (6): 929-942. 10.1016/j.ajhg.2010.05.002.
28. Li B, Leal SM: Methods for detecting associations with rare variants for common diseases: application to analysis of sequence data. Am J Hum Genet. 2008, 83 (3): 311-321. 10.1016/j.ajhg.2008.06.024.
29. King CR, Rathouz PJ, Nicolae DL: An evolutionary framework for association testing in resequencing studies. PLoS Genet. 2010, 6 (11): e1001202. 10.1371/journal.pgen.1001202.
30. Boyko AR, Williamson SH, Indap AR, Degenhardt JD, Hernandez RD, Lohmueller KE, Adams MD, Schmidt S, Sninsky JJ, Sunyaev SR: Assessing the evolutionary impact of amino acid mutations in the human genome. PLoS Genet. 2008, 4 (5): e1000083. 10.1371/journal.pgen.1000083.

## Copyright

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.