Since the first successful genome-wide association studies (GWAS) in 2005, over 600 GWAS have been reported [1], due in large part to rapid advances in genotyping technology and standardized guidelines for reporting statistical evidence. The multitude of comparisons made in a GWAS can produce both false positive results (Type I errors) and, if the correction for multiple comparisons is overly conservative or power is inadequate, false negative results (Type II errors).

The probability of a Type I error (incorrectly ascribing scientific significance to a statistical test) is generally controlled by setting the significance level, α, for a test, but the probability of making at least one Type I error in a study,

P(at least one Type I error) = 1 - (1 - α)^{n},

is a function of n, the number of independent comparisons made, as well as α. The direct application to a GWAS is that, with a significance level typical of small studies and candidate gene studies (e.g. α = 0.05, α = 0.01, α = 0.001), the probability of not committing a GWAS-wide Type I error is very small.
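How quickly the study-wide error probability grows with the number of tests can be seen in a short calculation. The sketch below assumes independent comparisons and evaluates 1 - (1 - α)^{n}; the function name and the grid of n values are illustrative choices, not taken from the cited work.

```python
import math


def familywise_error_rate(alpha: float, n: int) -> float:
    """Probability of at least one Type I error across n independent tests.

    Evaluates 1 - (1 - alpha)^n via log1p/expm1 so that very small alpha
    or very large n do not lose precision.
    """
    return -math.expm1(n * math.log1p(-alpha))


# Illustrative grid: candidate-gene-style significance levels,
# from a single test up to a million independent tests.
for alpha in (0.05, 0.01, 0.001):
    for n in (1, 100, 10_000, 1_000_000):
        fwer = familywise_error_rate(alpha, n)
        print(f"alpha={alpha:<6} n={n:<9} P(>=1 Type I error)={fwer:.6f}")
```

With n = 1 the function returns α itself; as n approaches the scale of a GWAS, the value approaches 1 for every α in the grid.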

The standard for evidence of significance in GWAS to securely identify a genotype-phenotype association in European Americans is generally considered to be p < 5 × 10^{-8} or p < 1 × 10^{-8}, for α = 0.05 and 0.01, respectively [2–5]. This standard is based on a Bonferroni correction for an assumed million independent variants in the human genome. As a consequence, the avoidance of Type I errors may inflate Type II errors. This is especially true for analyses with low power, such as those involving rare diseases where patient numbers are limited, low frequency alleles, or genetic factors with small effect sizes. This conundrum can be resolved with extremely large study sizes, but in practice this is not always cost efficient or practical. These issues should be major considerations both for designing GWAS and interpreting GWAS results.
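The arithmetic behind these thresholds is a single division. The following sketch assumes the one million independent variants stated above; the function and constant names are illustrative.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the study-wide level at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be a positive integer")
    return alpha / n_tests


# Number of independent variants assumed by the conventional GWAS standard.
ASSUMED_INDEPENDENT_VARIANTS = 1_000_000

for alpha in (0.05, 0.01):
    threshold = bonferroni_threshold(alpha, ASSUMED_INDEPENDENT_VARIANTS)
    print(f"GWAS-wide alpha={alpha}: per-SNP threshold p < {threshold:.0e}")
```

Passing a different count of independent comparisons in place of the assumed million shows how sensitive the threshold is to that assumption.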

Several methods are commonly used to control the GWAS-wide Type I error rate: p-value adjustments have long been used when making multiple comparisons [6]; the use of q-values, a measure of the false discovery rate, has been proposed as a way to indirectly measure and control the Type I error rate [7]; a two-stage analysis of the data can be used not only to decrease the Type I error rate [8], but also to decrease the genotyping costs incurred [9]; and genotype imputation can result in a net increase in statistical power [10, 11].
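To give a concrete sense of a false-discovery-rate adjustment, the sketch below implements the Benjamini-Hochberg step-up adjustment. This is a related and widely used FDR procedure, swapped in here for simplicity; it is not the q-value estimator of [7]. The p-values are hypothetical and serve only to exercise the function.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values, not taken from any study.
example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
for raw, adj in zip(example, benjamini_hochberg(example)):
    print(f"raw p={raw:<6} BH-adjusted p={adj:.4f}")
```

Tests whose adjusted value falls below a chosen FDR level would be declared discoveries at that level.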

A Bonferroni adjustment fits our problem particularly well because many comparisons are made and a GWAS is considered agnostic, with no prior hypotheses [12]. Several studies have estimated the number of statistical comparisons made in a GWAS [2–5], but the universal application of a one-size-fits-all significance level to GWAS is inappropriate. Power to detect associations is determined, in large part, by allele frequencies and their effect sizes; since these variables are constants, only sample size can be adjusted. As the sample size increases, the power to detect low frequency and/or small effect size genetic variants also increases. Newer SNP arrays, designed to more fully capture the range of SNPs in diverse human populations and to include rare SNPs hypothesized to be more likely to have larger effect sizes, will increase the number of independent statistical comparisons [4]. Additionally, the dependent nature of genetic data, where SNPs in linkage disequilibrium (LD) are correlated to some degree, may lead to over-correction when using Bonferroni adjustments. A Bonferroni adjustment that counts every SNP as a separate comparison is closest to exact when all comparisons are independent. Neighboring SNPs on a chromosome tend to be inherited together in blocks and are not independent [3], making a strict Bonferroni adjustment overly conservative.

One relevant question is then not how many SNPs are being tested, but how many independent statistical comparisons are being made. In the context of a principal components analysis (PCA) of the genotype data, the number of independent comparisons can be defined as the number of principal components accounting for a large portion (99.5% has been suggested) of the variance in the data [13]. The set of informative SNPs represented by these components could be used to infer the remainder of the data set with a high degree of fidelity, and can be used to make a Bonferroni adjustment with the desired GWAS-wide significance level:

α_{adjusted} = α_{GWAS} / M_{eff},

where M_{eff} denotes the number of informative SNPs and α_{GWAS} the desired GWAS-wide significance level.

What is not clear, however, is which SNPs fall into the informative set, so all SNPs are tested. The assumption is then made that the test statistics are distributed similarly to the test statistics from an analysis including only the informative SNPs. Based on the simulations done by Gao et al., this seems to be a reasonable assumption [13].
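The counting rule described above can be illustrated on a toy example. The sketch below is not the simpleℳ implementation itself; it assumes the eigenvalues of the SNP correlation matrix are already available (in practice they could be obtained with a routine such as `numpy.linalg.eigvalsh`) and counts the leading components needed to reach 99.5% of the variance. The helper `equicorrelated_block` and the toy LD structure are hypothetical; they rely on the fact that a k × k correlation matrix with constant off-diagonal correlation r has eigenvalues 1 + (k - 1)r once and 1 - r repeated k - 1 times.

```python
def effective_tests(eigenvalues, variance_fraction=0.995):
    """Smallest number of leading principal components whose eigenvalues
    sum to at least `variance_fraction` of the total variance."""
    ordered = sorted(eigenvalues, reverse=True)
    target = variance_fraction * sum(ordered)
    running = 0.0
    for count, value in enumerate(ordered, start=1):
        running += value
        if running >= target:
            return count
    return len(ordered)


def equicorrelated_block(k, r):
    """Eigenvalues of a k x k correlation matrix with constant correlation r."""
    return [1 + (k - 1) * r] + [1 - r] * (k - 1)


# Hypothetical toy region: three LD blocks of four SNPs each. Blocks are
# mutually independent, so the eigenvalues of the full (block-diagonal)
# correlation matrix are the union of the per-block eigenvalues.
alpha_gwas = 0.05
for r in (0.0, 1.0):
    eigenvalues = equicorrelated_block(4, r) * 3
    m_eff = effective_tests(eigenvalues)
    print(f"r={r}: 12 SNPs tested, {m_eff} effective tests, "
          f"adjusted threshold {alpha_gwas / m_eff:.4g}")
```

With no LD (r = 0) all 12 SNPs count as separate comparisons, whereas with perfect LD within blocks (r = 1) only one comparison per block remains, so the adjusted threshold is four times less stringent.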

Another relevant question is how to adjust the p-values directly, rather than relying on a significance threshold [14]. These corrected p-values, measuring significance on the genome-wide scale, have the added benefit of easier interpretation. For example, comparing two uncorrected p-values, 6.8 × 10^{-8} and 4.1 × 10^{-10}, becomes much more tractable after a genome-wide correction, resulting in corrected p-values of 0.0291 and 0.0004, respectively.
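As a simple illustration of what a directly corrected p-value looks like, the sketch below applies Bonferroni-style and Šidák-style corrections for an effective number of independent tests. The value of `m_eff` is hypothetical, so the output is not expected to reproduce the corrected p-values quoted above, which come from a genome-wide correction method rather than from this formula.

```python
import math


def bonferroni_adjust(p: float, m_eff: int) -> float:
    """Bonferroni-style corrected p-value, capped at 1."""
    return min(1.0, p * m_eff)


def sidak_adjust(p: float, m_eff: int) -> float:
    """Sidak-style corrected p-value, 1 - (1 - p)^m_eff, assuming
    m_eff independent tests; log1p/expm1 keep precision for tiny p."""
    return -math.expm1(m_eff * math.log1p(-p))


M_EFF = 500_000  # hypothetical effective number of independent tests

for p in (6.8e-8, 4.1e-10):
    print(f"raw p={p:.1e}  Bonferroni={bonferroni_adjust(p, M_EFF):.6f}  "
          f"Sidak={sidak_adjust(p, M_EFF):.6f}")
```

Either corrected value can be compared directly with a conventional level such as 0.05, which is the interpretive convenience the paragraph above describes.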

There have been a number of studies attempting to provide an accurate picture of how SNPs, and/or statistical tests of SNPs, are correlated in genome-wide studies. These fall into three general categories: variations and alternatives to permutation testing [14, 15], principal components analysis [13, 16–18], and analysis of the underlying LD structure in the genome [19–21].

We have recently genotyped 1514 European Americans for 700,078 SNPs using the Affymetrix 6.0 platform in a GWAS to search for AIDS restriction genes. Here we compare traditional Bonferroni significance thresholds with methods from each of these statistical correction strategies to identify an appropriate measure of significance in our GWAS: 1) PRESTO, an optimized permutation algorithm [15] verified by PERMORY [22]; 2) the Sliding-window method for Locally Inter-correlated markers with asymptotic Distribution Errors corrected (SLIDE) program, an alternative to permutation testing, developed to correct p-values in a GWAS using a multivariate normal distribution-based correction [14, 23]; 3) the simpleℳ method, specifically developed to calculate the number of informative SNPs being tested in a GWAS using a principal components analysis [13]; 4) the number of LD blocks found by the Gabriel, Solid Spine of LD, and 4-Gamete algorithms, as implemented in Haploview [24].

Our aim is to identify the most appropriate method for obtaining accurate GWAS-wide significance thresholds and/or corrected p-values among 700,000 linked SNPs, the best method being one that results in an accurate estimate of the number of comparisons and has reasonable computational requirements.