Establishing an adjusted p-value threshold to control the family-wide type 1 error in genome wide association studies

          Priya Duggal (1), Elizabeth M Gillanders (1), Taura N Holmes (1) and Joan E Bailey-Wilson (1)

          BMC Genomics 2008, 9:516

          DOI: 10.1186/1471-2164-9-516

          Received: 14 May 2008

          Accepted: 31 October 2008

          Published: 31 October 2008

          Abstract

          Background

          By assaying hundreds of thousands of single nucleotide polymorphisms, genome wide association studies (GWAS) allow for a powerful, unbiased review of the entire genome to localize common genetic variants that influence health and disease. Although it is widely recognized that some correction for multiple testing is necessary, in order to control the family-wide Type 1 Error in genetic association studies, it is not clear which method to utilize. One simple approach is to perform a Bonferroni correction using all n single nucleotide polymorphisms (SNPs) across the genome; however this approach is highly conservative and would "overcorrect" for SNPs that are not truly independent. Many SNPs fall within regions of strong linkage disequilibrium (LD) ("blocks") and should not be considered "independent".

          Results

          We proposed to approximate the number of "independent" SNPs by counting 1 SNP per LD block, plus all SNPs outside of blocks (interblock SNPs). We examined the effective number of independent SNPs for genome wide association study (GWAS) panels. In the CEPH Utah (CEU) population, by considering the interdependence of SNPs, we could reduce the total number of effective tests within the Affymetrix and Illumina SNP panels from 500,000 and 317,000 to 67,000 and 82,000 "independent" SNPs, respectively. For the Affymetrix 500 K and Illumina 317 K GWAS SNP panels we recommend using 10⁻⁵, 10⁻⁷ and 10⁻⁸, and for the Phase II HapMap CEPH Utah and Yoruba populations we recommend using 10⁻⁶, 10⁻⁷ and 10⁻⁹, as "suggestive", "significant" and "highly significant" p-value thresholds to properly control the family-wide Type 1 error.

          Conclusion

          By approximating the effective number of independent SNPs across the genome we are able to 'correct' for a more accurate number of tests and therefore develop 'LD adjusted' Bonferroni corrected p-value thresholds that account for the interdependence of SNPs on well-utilized commercially available SNP "chips". These thresholds will serve as guides to researchers trying to decide which regions of the genome should be studied further.

          Background

          Since first proposed in 1996 by Risch and Merikangas [1], it has increasingly been accepted that association studies are powerful to detect modest effects of common alleles involved in complex trait susceptibility. Until recently, genotype-phenotype tests of association have been limited to candidate genes. Recent advances in molecular technologies and the availability of the human genome sequence have revolutionized researchers' ability to catalogue human genetic variation. In addition, the International HapMap project has provided researchers with invaluable information regarding the linkage disequilibrium (LD) structure within the genome [2, 3]. These advances have made genome wide association studies (GWAS) to identify common variants a reality. However many issues regarding the design, analysis and interpretation of results remain to be investigated.

          In particular, interpretation of results is not trivial in light of the scale of multiple testing proposed. Testing such a large number of SNPs will require a balance between power and the chance of making false discoveries. Many methods have been proposed to address the multiple testing issue. These include the false discovery rate (FDR), permutation testing, Bayes factors (BF) and the Bonferroni correction. The FDR controls the expected proportion of false positives among all rejections, providing a less stringent control of the Type I error [4]. The application of the FDR method specifically in the context of genome wide studies has been proposed [4–6]. Permutation testing, in which the datasets are permuted thousands of times to achieve genomewide significance, is another method that has been used in candidate gene studies and now genome wide association studies [7, 8]. Although empirical p-values have a theoretical advantage, they may be computationally infeasible with large datasets. Another proposed method is the use of Bayes factors instead of frequentist p-values, which need to be interpreted with the power of the study. BF also requires an assumption about the effect size, but its major advantage is that it can be compared across studies [9]. A simple method to control the family-wise error rate is the Bonferroni correction, which adjusts the Type 1 error (α) by the total number of tests (α/n). The Bonferroni correction can use the actual number of tests performed (i.e. SNPs genotyped) or a theoretical value based on the total number of tests possible (i.e. all SNPs). One critical, but often overlooked, assumption of the Bonferroni correction method is that all the tests are independent [10].

          Biologically, we know that SNPs in close proximity are not independent, and therefore we are "overcorrecting" when we use the traditional Bonferroni method to adjust significance thresholds for multiple testing in GWAS [11]. We propose Bonferroni corrected p-value thresholds that account for the interdependence of SNPs on commonly used commercially available SNP "chips" (Illumina 317 K and Affymetrix 500 K) and in the HapMap panels. This method is an extension of the Bonferroni correction that accounts for the underlying linkage disequilibrium or dependence in dense SNP panels. These thresholds will be invaluable to researchers, as they can be used as a guide to identifying regions of interest or significance in genome wide association studies that should be studied further.
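
          The "overcorrection" described above can be illustrated with an idealized case. The sketch below assumes, purely for illustration, that SNPs fall into blocks of perfectly correlated markers, so that only one test per block is truly independent; the function name and the block size of 5 are hypothetical and do not come from the study.

```python
def achieved_fwer(alpha, n_tests, block_size):
    """Family-wise error rate actually achieved when each of n_tests is run
    at alpha / n_tests, but the tests come in blocks of `block_size`
    perfectly correlated copies (an idealized assumption)."""
    per_test_threshold = alpha / n_tests
    n_independent = n_tests // block_size
    # Probability that at least one of the independent tests is a false positive.
    return 1.0 - (1.0 - per_test_threshold) ** n_independent


# 500,000 SNPs corrected as if all were independent.
fwer_independent = achieved_fwer(0.05, 500_000, block_size=1)
# Same correction, but only one in five tests is truly independent.
fwer_blocked = achieved_fwer(0.05, 500_000, block_size=5)

print(f"independent SNPs: {fwer_independent:.4f}")
print(f"blocks of 5:      {fwer_blocked:.4f}")
```

          Under this simplification the error rate actually achieved falls well below the nominal 0.05 once the tests are correlated, which is the sense in which a correction based on all n SNPs is too strict.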

          Methods

          In order to estimate the effective number of "independent" SNPs in 3 autosomal marker panels (HapMap, Illumina 317 K and Affymetrix 500 K), we downloaded genotype data from release 22 of the International HapMap project. We used the non-redundant CEU and YRI data mapped against the "rs strand" of build 36 of the human genome. For the Illumina and Affymetrix marker sets we used a Perl script to generate chromosome-specific files containing only the subset of markers included in the Illumina 317 K or Affymetrix 500 K panels, using CEU data. Then, for each chromosome, we used a Perl script to generate smaller, more manageable files, each containing genotype data for approximately 2500 SNPs. We used Haploview version 4.0 to evaluate blocks of linkage disequilibrium (LD) using the "Solid Spine of LD" algorithm with a minimum D' value of 0.8. The Solid Spine of LD method internal to Haploview defines a block when the first and last markers are in strong LD with all intermediate markers. We also evaluated chromosome 1 for the CEU HapMap data using the "Solid Spine of LD" algorithm with the minimum D' value set to 0.7 and to 0.9, to determine whether this value altered the thresholds. In addition, we evaluated chromosome 1 for the CEU HapMap data using the Gabriel and 4-gamete block-defining methods. For all analyses we ignored pairwise comparisons of markers >500 kb apart and excluded individuals with >50% missing genotypes. We also excluded markers with a minor allele frequency less than 0.01, a Hardy-Weinberg equilibrium p-value less than 0.001 or a genotype call rate less than 75%. We then summarized across the genome, for each panel: the total number of SNPs, the total number of blocks, the total number of SNPs not in a block (interblock SNPs) and the total number of blocks + interblock SNPs. Our programs are available upon request so that thresholds can be established per population.
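
          The marker filters and the final tally described above can be sketched as follows. This is not the authors' Perl code or Haploview's output format; the function names and the (first, last) index representation of blocks are assumptions made for illustration, while the cut-offs are those given in the text.

```python
def passes_marker_qc(maf, hwe_p, call_rate):
    """Apply the marker exclusion criteria stated in the Methods:
    MAF < 0.01, Hardy-Weinberg p < 0.001 or call rate < 75% are dropped."""
    return maf >= 0.01 and hwe_p >= 0.001 and call_rate >= 0.75


def effective_tests(n_snps, blocks):
    """Count one test per LD block plus one per SNP outside every block.

    n_snps: number of SNPs that passed QC, indexed 0..n_snps-1.
    blocks: list of (first_index, last_index) pairs, inclusive and
            non-overlapping (hypothetical representation of block calls).
    Returns (number of blocks, interblock SNPs, blocks + interblock SNPs).
    """
    snps_in_blocks = sum(last - first + 1 for first, last in blocks)
    interblock = n_snps - snps_in_blocks
    return len(blocks), interblock, len(blocks) + interblock


# Toy example: 10 SNPs with two blocks covering SNPs 0-3 and 6-8.
n_blocks, n_interblock, n_effective = effective_tests(10, [(0, 3), (6, 8)])
print(n_blocks, n_interblock, n_effective)
```

          In the toy example, seven of the ten SNPs sit inside the two blocks, so the panel is counted as two blocks plus three interblock SNPs.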

          Results and discussion

          We established three thresholds that correspond to 1) suggestive association, in which we expect 1 false positive association per GWAS; 2) significant association, in which we expect 0.05 false positive associations per GWAS (i.e. one in every 20 studies); and 3) highly significant association, in which we expect 0.001 false positive associations per GWAS (i.e. one in every 1,000 studies). In the CEPH Utah (CEU) population, by considering the interdependence of SNPs, we reduced the total number of effective tests within the Affymetrix and Illumina SNP panels from 500,000 and 317,000 to 67,000 and 82,000 "independent" SNPs, respectively (Tables 1 and 2). This results in p-value thresholds of ≈10⁻⁵, 10⁻⁷ and 10⁻⁸ for both the Affymetrix and Illumina SNP panels (Table 3), compared to ≈10⁻⁶, 10⁻⁷ and 10⁻⁹ if we do not correct for the lack of independence among SNPs. For researchers using these fixed genome-wide SNP panels, this provides valuable thresholds to interpret association results and to identify SNPs that may be important for replication.
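
          The three thresholds follow from dividing the expected number of false positives per GWAS (1, 0.05 and 0.001) by the effective number of tests. A minimal sketch, using the totals from Tables 1 and 2; the function and variable names are ours:

```python
def bonferroni_thresholds(n_effective, error_rates=(1.0, 0.05, 0.001)):
    """Return the suggestive, significant and highly significant p-value
    thresholds for a given effective number of independent tests."""
    return tuple(rate / n_effective for rate in error_rates)


# Effective numbers of tests (blocks + interblock SNPs) from Tables 1 and 2.
panels = {
    "Affymetrix 500 K (CEU)": 66_923,
    "Illumina 317 K (CEU)": 82_494,
}

for name, n_effective in panels.items():
    suggestive, significant, highly = bonferroni_thresholds(n_effective)
    print(f"{name}: {suggestive:.2e}  {significant:.2e}  {highly:.2e}")
```

          The values printed this way can be compared directly with the corresponding rows of Table 3.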

          Table 1

          Affymetrix 500 K using CEU HapMap Samples

          Affymetrix 500,000 SNP Panel (CEU)

          Chromosome  Total SNPs  Total blocks  Interblock SNPs  Blocks + interblock SNPs
          1           31876       4447          833              5280
          2           33610       4626          787              5413
          3           27588       3903          723              4626
          4           25811       3514          689              4203
          5           26548       3601          646              4247
          6           26550       3487          604              4091
          7           21544       3061          618              3679
          8           22550       3053          563              3616
          9           19086       2664          541              3205
          10          23531       3046          510              3556
          11          21477       2761          528              3289
          12          20549       2821          499              3320
          13          15700       2116          392              2508
          14          12839       1820          371              2191
          15          11560       1857          396              2253
          16          12339       1944          454              2398
          17          8473        1385          344              1729
          18          11966       1748          374              2122
          19          5177        954           305              1259
          20          10292       1519          331              1850
          21          5873        843           204              1047
          22          5053        828           213              1041
          Total       399,992     55,998        10,925           66,923

          Table 2

          Illumina 317 K SNPs using CEU HapMap Samples

          Illumina 317,000 SNP Panel (CEU)

          Chromosome  Total SNPs  Total blocks  Interblock SNPs  Blocks + interblock SNPs
          1           23055       4959          1336             6295
          2           25103       5258          1348             6606
          3           21332       4505          1268             5773
          4           18923       3979          1055             5034
          5           19062       3966          979              4945
          6           20524       4044          950              4994
          7           16493       3472          977              4449
          8           18053       3658          940              4598
          9           15691       3305          936              4241
          10          15423       3263          899              4162
          11          14498       3037          827              3864
          12          14844       3097          918              4015
          13          11411       2373          620              2993
          14          9767        2086          592              2678
          15          8817        1942          631              2573
          16          8924        2078          705              2783
          17          8279        1859          603              2462
          18          10390       2183          678              2861
          19          5833        1408          545              1953
          20          7758        1736          496              2232
          21          5430        1130          318              1448
          22          5398        1156          379              1535
          Total       305,008     64,494        18,000           82,494

          Table 3

          Thresholds for Genome Wide Association Using CEU and YRI Population Samples

          Panel                              Suggestive p values (1)  Significant p values (0.05)  Highly significant p values (0.001)
          Affymetrix CEU 500 K (n = 66,923)  1.49 × 10⁻⁵              7.47 × 10⁻⁷                  1.49 × 10⁻⁸
          Illumina 317 K (n = 82,494)        1.21 × 10⁻⁵              6.06 × 10⁻⁷                  1.21 × 10⁻⁸
          HapMap YRI (n = 289,175)           3.45 × 10⁻⁶              1.73 × 10⁻⁷                  3.45 × 10⁻⁹
          HapMap CEU (n = 164,296)           6.09 × 10⁻⁶              3.04 × 10⁻⁷                  6.09 × 10⁻⁹
            HapMap CEU (D' > 0.7)*           8.37 × 10⁻⁶              4.19 × 10⁻⁷                  8.37 × 10⁻⁹
            HapMap CEU (D' > 0.9)*           4.38 × 10⁻⁶              2.19 × 10⁻⁷                  4.38 × 10⁻⁹

          *Extrapolated from chromosome 1 data. The values in parentheses in the header line indicate the family-wide error rate that corresponds to the Bonferroni-corrected significance thresholds given in the columns below.

          In addition to the established SNP panels, we evaluated the number of "independent" tests within the publicly available Phase II HapMap data for both the CEPH from Utah (CEU) and Yoruba (YRI) populations. Since our proposed thresholds are LD block dependent, they are population specific; the total number of "independent" SNPs may vary across populations and should therefore be considered separately. The publicly available data include 2.4 million (CEU) and 2.7 million (YRI) SNPs across the genome. We reduced the total number of tests to 164,000 SNPs and 289,000 SNPs for the CEU and YRI, respectively (Tables 4 and 5). This results in p-value thresholds of ≈10⁻⁶, 10⁻⁷ and 10⁻⁹ for both the CEU and YRI populations (Table 3), compared to ≈10⁻⁷, 10⁻⁸ and 10⁻¹⁰ if we do not correct for the lack of independence among SNPs. The total number of "independent" SNPs for the YRI population is nearly double that for the CEU; however, this does not change the exponent of the p-value. As expected, as the density of SNPs increases, the average number of SNPs within a block also increases. Therefore, it is likely that the additional Affymetrix and Illumina SNP panels (1 million and 650,000 SNPs) will not greatly increase the number of independent SNPs but will increase the number of SNPs within a block. However, the highly dense HapMap data (Tables 4 and 5) provide thresholds that can be used for denser platforms (e.g. 1 million SNPs) or for studies that utilize statistical methods to impute the 2.5 million+ HapMap SNPs.

          Table 4

          HapMap SNPs using CEU HapMap Samples

          CEPH Utah HapMap Samples

          Chromosome  Total SNPs  Total blocks  Interblock SNPs  Blocks + interblock SNPs
          1           184403      10740         1815             12555
          2           211913      11219         1510             12729
          3           166801      9431          1426             10857
          4           155953      10204         1745             10363
          5           161666      8725          1238             9963
          6           174458      8677          1743             10420
          7           137148      8050          1140             9190
          8           141925      7707          1076             8783
          9           116824      7092          1105             8197
          10          132087      7428          1250             8607
          11          124354      6821          1037             7858
          12          118973      6959          991              7950
          13          99669       5290          793              6083
          14          80500       4690          893              5583
          15          69104       4690          814              5504
          16          68205       5212          817              6029
          17          56026       4127          715              4842
          18          73392       4486          742              5228
          19          35412       3109          570              3679
          20          60421       3896          606              4502
          21          32740       2141          380              2521
          22          33369       2491          421              2853
          Total       2,435,343   143,185       22,827           164,296

          Table 5

          HapMap SNPs using YRI HapMap Samples

          Yoruba HapMap Samples

          Chromosome  Total SNPs  Total blocks  Interblock SNPs  Blocks + interblock SNPs
          1           209439      17517         4169             21686
          2           238828      19081         5688             24769
          3           184337      15409         3635             19044
          4           174670      14673         2754             17427
          5           176975      14478         3063             17541
          6           187787      14073         3127             17200
          7           149764      12884         2451             15335
          8           158800      13069         2465             15534
          9           128582      11602         3185             14787
          10          147710      12065         3778             15843
          11          136474      11261         2793             14054
          12          130298      11142         2383             13525
          13          112162      8767          1470             10237
          14          88022       7549          1240             8789
          15          77885       7979          1657             9636
          16          78364       8334          1810             10144
          17          62720       6622          1754             8376
          18          87027       7466          5294             12760
          19          39729       4514          1037             5551
          20          68828       6397          1344             7741
          21          37450       3717          744              4461
          22          36468       3945          790              4735
          Total       2,712,319   232,544       56,631           289,175

          We also varied the minimum D' value used to define the blocks, using 0.7 and 0.9 in place of 0.8, for chromosome 1 in the HapMap CEU population to determine whether block definition had a large impact on our results. Using a D' value of 0.7 results in 2,039 fewer "independent" SNPs on chromosome 1, which extrapolates to 44,000 fewer "independent" SNPs across the genome. Using a more stringent value of D' = 0.9 results in 2,906 more "independent" SNPs on chromosome 1, which extrapolates to 63,932 more "independent" SNPs across the genome. Although the extrapolated total number of "independent" SNPs across the genome then ranges from 120,000 to 228,000, this does not alter the exponent of the p-value or substantially affect the thresholds (Table 3).
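
          The text does not state how the chromosome 1 differences were scaled to the whole genome. The reported figures are consistent with multiplying the chromosome 1 difference by the 22 autosomes (2,906 × 22 = 63,932), so the sketch below adopts that reading as an assumption; the function and constant names are ours.

```python
AUTOSOMES = 22
GENOME_WIDE_BASELINE = 164_296  # CEU HapMap, minimum D' of 0.8 (Table 4 total)


def extrapolate_effective_tests(chr1_difference,
                                baseline=GENOME_WIDE_BASELINE,
                                n_chromosomes=AUTOSOMES):
    """Scale a chromosome 1 difference to the genome by the number of
    autosomes (assumed scaling, inferred from the reported figures)."""
    return baseline + chr1_difference * n_chromosomes


n_d07 = extrapolate_effective_tests(-2_039)  # looser block definition
n_d09 = extrapolate_effective_tests(+2_906)  # stricter block definition

for label, n_effective in (("D' > 0.7", n_d07), ("D' > 0.9", n_d09)):
    print(f"{label}: n = {n_effective:,}, "
          f"suggestive threshold = {1 / n_effective:.2e}")
```

          Under this assumption the extrapolated totals are about 120,000 and 228,000, and the resulting thresholds agree with the starred rows of Table 3.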

          We also defined blocks using two additional block definitions: the Gabriel method and the 4-gamete rule. The Gabriel method creates blocks using stringent criteria of LD with a D' upper bound >0.98 and a lower bound >0.70 [12]. This creates smaller blocks with fewer SNPs within a block. The 4-gamete rule of Wang, based on Hudson and Kaplan, determines blocks based on presumed recombination [13, 14]. Using pairwise sets of SNPs, it determines the frequency of observing all 4 possible 2-SNP haplotypes. If all 4 haplotypes are observed, this method assumes recombination has occurred. Table 6 shows the results of different block definitions for chromosome 1 for the CEU HapMap samples. The Gabriel method results in a similar number of blocks, but the number of SNPs per block is reduced, resulting in more SNPs outside of blocks that are still in LD but do not meet the stringent criteria of a "block". The 4-gamete rule results in more blocks with fewer SNPs per block, and more SNPs outside of blocks that represent potential recombination events. We believe the solid spine of LD is the best method to capture the underlying LD and biological dependence of SNPs, and therefore we base our thresholds on this method.

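          The pairwise check at the heart of the 4-gamete rule can be sketched as below. This is an illustration of the idea as described in the text, not Haploview's implementation; the function name, the 0/1 allele coding and the optional `min_freq` cut-off are assumptions.

```python
from collections import Counter


def all_four_gametes(hap_a, hap_b, min_freq=0.0):
    """Return True if all four two-SNP haplotypes (00, 01, 10, 11) are
    observed with frequency greater than `min_freq`.

    hap_a, hap_b: equal-length sequences of 0/1 alleles for two SNPs,
    one entry per phased chromosome.
    """
    if len(hap_a) != len(hap_b) or len(hap_a) == 0:
        raise ValueError("need two non-empty haplotype vectors of equal length")
    counts = Counter(zip(hap_a, hap_b))
    total = len(hap_a)
    gametes = [(0, 0), (0, 1), (1, 0), (1, 1)]
    return all(counts[g] / total > min_freq for g in gametes)


# All four haplotypes present: recombination is presumed between the SNPs.
print(all_four_gametes([0, 0, 1, 1], [0, 1, 0, 1]))
# Only 00 and 11 present: no evidence of recombination under this rule.
print(all_four_gametes([0, 0, 1, 1], [0, 0, 1, 1]))
```

          A frequency cut-off is included because rare haplotypes can arise from genotyping or phasing error; the default of 0.0 simply asks whether each haplotype was seen at all.
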
          Table 6

          Altering Block Definitions for Chromosome 1

          Block definition  Total blocks  Interblock SNPs  Blocks + interblock SNPs  Mean SNPs per block  Mean D' per block
          Solid Spine LD    10740         1815             12555                     18.4                 0.804
          Gabriel           10115         38037            48152                     15.7                 0.805
          4-Gamete Rule     18967         9084             28051                     9.5                  0.841


          The method we detail is an extension to the original Bonferroni correction which is widely utilized; however, we have reduced the total number of SNPs to reflect the number of "independent SNPs" since independence is an assumption of the Bonferroni correction. Therefore, our thresholds are based on the original Bonferroni calculation of 1/Total # of SNPs, 0.05/Total # of SNPs and 0.001/Total # of SNPs where the number of SNPs that we use is now a better estimate of the number of independent tests being performed. Therefore, our proposed method allows a Bonferroni correction that has less violation of the assumption of independence.

          We have empirically defined thresholds for genome wide association studies to control the family-wise error rate while accounting for the interdependence of SNPs in linkage disequilibrium. The use of actual data provides us an opportunity to unequivocally characterize the underlying linkage disequilibrium structure in these two populations. We considered the use of simulations as has been done for single chromosomes by assigning haplotypes based on frequencies from inferred haplotypes of founders for a set number of replicates [11]. But the reality is that simulation programs have thus far been unable to recreate the complexity of the underlying LD structure of the human genome. While we could use real 500 K genotype data and simulate unassociated traits, we would need to obtain many real 500 K GWAS data sets and then simulate many replicates of unassociated traits in each of them to adequately examine Type I error. Currently, this is a daunting task since the process just for obtaining the data from public databases is quite lengthy and the analysis time required to perform hundreds of GWAS analyses would be prohibitive.

          By identifying the "independent" SNPs, we have significantly reduced the total number of SNPs to be used for Bonferroni correction in the set of SNP panels (Affymetrix and Illumina) and in HapMap. These "independent" SNPs provide us with a more accurate number of SNPs to include when adjusting for multiple testing using the Bonferroni correction. In addition, these p-values can assist in determining power for GWAS prior to genotyping, so that only studies which can attain suggestive or significant association are pursued. We acknowledge that although we reduce the number of independent SNPs, the corresponding p-value cutoffs are still very low because we are analyzing more than 2 million SNPs without a specific biological hypothesis, and stringency is still important. We need to balance identifying a true association while limiting Type 1 error.

          We evaluated the effects of the new thresholds on power using the Genetic Power Calculator [15] to determine the sample sizes we would need using a significance level based on all HapMap SNPs versus only the independent SNPs and blocks, as we recommend here. Table 7 provides sample sizes using the 'LD adjusted' Bonferroni correction that we suggest here and the unadjusted Bonferroni correction in both CEU and YRI HapMap samples. Using the unadjusted Bonferroni correction would require an increase in sample size of 358–890 cases, depending on the genotype relative risk and population. This increased burden of sample recruitment, collection and genotyping to adjust for "all" SNPs needs to be considered carefully, especially since many of the SNPs will be in strong LD and will not contribute additional information.

          Table 7

          Examples of sample sizes required to have 80% power to attain significant association (family-wide error of 0.05) when using 'LD-adjusted' and unadjusted Bonferroni-corrected significance thresholds in CEU and YRI under different genetic models

          P-value       Population              Genotype relative risk Aa/AA  Sample size
          3.04 × 10⁻⁷   CEU HapMap LD adjusted  1.4                           5270 (-890)
          2.08 × 10⁻⁸   CEU HapMap              1.4                           6160
          3.04 × 10⁻⁷   CEU HapMap LD adjusted  1.6                           2550 (-431)
          2.08 × 10⁻⁸   CEU HapMap              1.6                           2981
          1.73 × 10⁻⁷   YRI HapMap LD adjusted  1.4                           5457 (-742)
          1.85 × 10⁻⁸   YRI HapMap              1.4                           6199
          1.73 × 10⁻⁷   YRI HapMap LD adjusted  1.6                           2641 (-358)
          1.85 × 10⁻⁸   YRI HapMap              1.6                           2999

          Sample size is calculated with a high risk allele frequency of 10%, disease prevalence of 20%, and power of 0.80, with a difference in allele frequency between the causal marker and the genotyped marker of 10% (D' = 1.0). Sample size indicates the number of cases required (an equal number of controls is also required). The number in parentheses for sample size indicates the difference between the sample size required when using the LD adjusted Bonferroni correction versus using the unadjusted Bonferroni correction (which corrects for 2.4 million CEU HapMap SNPs and 2.7 million YRI HapMap SNPs).
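
          How strongly the required sample size depends on the threshold can be approximated without the Genetic Power Calculator. The sketch below is not that tool's method; it assumes a two-sided test and the usual normal approximation, under which the required sample size scales with the square of the sum of the critical value and the power quantile. The function name is ours.

```python
from statistics import NormalDist


def sample_size_ratio(alpha_unadjusted, alpha_adjusted, power=0.80):
    """Approximate ratio of sample sizes needed at two significance
    thresholds for the same effect size (two-sided normal approximation)."""
    std_normal = NormalDist()
    z_power = std_normal.inv_cdf(power)
    # Use the lower tail and negate to avoid precision loss near 1.
    z_unadjusted = -std_normal.inv_cdf(alpha_unadjusted / 2)
    z_adjusted = -std_normal.inv_cdf(alpha_adjusted / 2)
    return ((z_unadjusted + z_power) / (z_adjusted + z_power)) ** 2


# Thresholds from Table 7 (family-wide error of 0.05).
ceu_ratio = sample_size_ratio(2.08e-8, 3.04e-7)
yri_ratio = sample_size_ratio(1.85e-8, 1.73e-7)
print(f"CEU: {ceu_ratio:.2f}, YRI: {yri_ratio:.2f}")
```

          For comparison, the sample sizes in Table 7 give ratios of 6160/5270 ≈ 1.17 for CEU and 6199/5457 ≈ 1.14 for YRI at a genotype relative risk of 1.4, and the approximation should come out close to these values.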

          Conclusion

          The emerging trend towards genome wide association studies and large scale SNP genotyping warrants universal thresholds of significance, similar to those established by Lander and Kruglyak for LOD score genetic linkage analyses [16]. The dilemma facing many researchers is which regions to follow up with dense SNPs or sequencing. To date, the most utilized threshold has been the arbitrary value set by the Wellcome Trust Case Control Consortium of 5 × 10⁻⁷ [17]. Interestingly, our Bonferroni LD-adjusted values are similar to these two thresholds (nominal p-value = 3.04 × 10⁻⁷ for CEU), but we also provide thresholds for suggestive and highly significant association. We believe the suggestive association threshold should be used to identify SNPs for consideration in follow-up studies, and regions reaching the significant and highly significant thresholds should be considered more likely to harbor a true association. Of course, these thresholds are only guidelines that account for the interdependency of SNPs, and investigators should carefully consider any regions with strong candidate genes or biologic plausibility even if they do not meet these thresholds. We also agree with the NHGRI/NCI working group on Replication in Association Studies that all statistically significant regions should be replicated using additional populations with adequate sample size to confirm any GWAS finding [18]. These thresholds should assist in replicating regions of true association.

          Declarations

          Acknowledgements

          This work was supported by the Intramural Program at the National Human Genome Research Institute, National Institutes of Health.

          We would like to acknowledge the programming support of NHGRI's Bioinformatics and Scientific Programming Core. Specifically we would like to recognize Suiyuan Zhang.

          Authors’ Affiliations

          (1) Statistical Genetics Section, Inherited Disease Research Branch, National Human Genome Research Institute, National Institutes of Health

          References

          1. Risch N, Merikangas K: The future of genetic studies of complex human diseases. Science 1996, 273:1516–1517.
          2. A haplotype map of the human genome. Nature 2005, 437:1299–1320.
          3. Frazer KA, Ballinger DG, Cox DR, Hinds DA, Stuve LL, Gibbs RA, Belmont JW, Boudreau A, Hardenbol P, Leal SM, Pasternak S, Wheeler DA, Willis TD, Yu F, Yang H, Zeng C, Gao Y, Hu H, Hu W, Li C, Lin W, Liu S, Pan H, Tang X, Wang J, Wang W, Yu J, Zhang B, Zhang Q, Zhao H, Zhao H, Zhou J, Gabriel SB, Barry R, Blumenstiel B, Camargo A, Defelice M, Faggart M, Goyette M, Gupta S, Moore J, Nguyen H, Onofrio RC, Parkin M, Roy J, Stahl E, Winchester E, Ziaugra L, Altshuler D, Shen Y, Yao Z, Huang W, Chu X, He Y, Jin L, Liu Y, Shen Y, Sun W, Wang H, Wang Y, Wang Y, Xiong X, Xu L, Waye MM, Tsui SK, Xue H, Wong JT, Galver LM, Fan JB, Gunderson K, Murray SS, Oliphant AR, Chee MS, Montpetit A, Chagnon F, Ferretti V, Leboeuf M, Olivier JF, Phillips MS, Roumy S, Sallee C, Verner A, Hudson TJ, Kwok PY, Cai D, Koboldt DC, Miller RD, Pawlikowska L, Taillon-Miller P, Xiao M, Tsui LC, Mak W, Song YQ, Tam PK, Nakamura Y, Kawaguchi T, Kitamoto T, Morizono T, Nagashima A, Ohnishi Y, Sekine A, Tanaka T, Tsunoda T, Deloukas P, Bird CP, Delgado M, Dermitzakis ET, Gwilliam R, Hunt S, Morrison J, Powell D, Stranger BE, Whittaker P, Bentley DR, Daly MJ, de Bakker PI, Barrett J, Chretien YR, Maller J, McCarroll S, Patterson N, Pe'er I, Price A, Purcell S, Richter DJ, Sabeti P, Saxena R, Schaffner SF, Sham PC, Varilly P, Altshuler D, Stein LD, Krishnan L, Smith AV, Tello-Ruiz MK, Thorisson GA, Chakravarti A, Chen PE, Cutler DJ, Kashuk CS, Lin S, Abecasis GR, Guan W, Li Y, Munro HM, Qin ZS, Thomas DJ, McVean G, Auton A, Bottolo L, Cardin N, Eyheramendy S, Freeman C, Marchini J, Myers S, Spencer C, Stephens M, Donnelly P, Cardon LR, Clarke G, Evans DM, Morris AP, Weir BS, Tsunoda T, Mullikin JC, Sherry ST, Feolo M, Skol A, Zhang H, Zeng C, Zhao H, Matsuda I, Fukushima Y, Macer DR, Suda E, Rotimi CN, Adebamowo CA, Ajayi I, Aniagwu T, Marshall PA, Nkwodimmah C, Royal CD, Leppert MF, Dixon M, Peiffer A, Qiu R, Kent A, Kato K, Niikawa N, Adewole IF, Knoppers BM, Foster MW, Clayton EW, Watkin J, Gibbs RA, Belmont JW, Muzny D, Nazareth L, Sodergren E, Weinstock GM, Wheeler DA, Yakub I, Gabriel SB, Onofrio RC, Richter DJ, Ziaugra L, Birren BW, Daly MJ, Altshuler D, Wilson RK, Fulton LL, Rogers J, Burton J, Carter NP, Clee CM, Griffiths M, Jones MC, McLay K, Plumb RW, Ross MT, Sims SK, Willey DL, Chen Z, Han H, Kang L, Godbout M, Wallenburg JC, L'Archeveque P, Bellemare G, Saeki K, Wang H, An D, Fu H, Li Q, Wang Z, Wang R, Holden AL, Brooks LD, McEwen JE, Guyer MS, Wang VO, Peterson JL, Shi M, Spiegel J, Sung LM, Zacharia LF, Collins FS, Kennedy K, Jamieson R, Stewart J: A second generation human haplotype map of over 3.1 million SNPs. Nature 2007, 449:851–861.
          4. Benjamini Y, Yekutieli D: Quantitative trait loci analysis using the false discovery rate. Genetics 2005, 171:783–790.
          5. Benjamini Y, Hochberg Y: Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Series B 1995, 57:289–300.
          6. Storey JD, Tibshirani R: Statistical significance for genomewide studies. Proc Natl Acad Sci USA 2003, 100:9440–9445.
          7. Dudbridge F: A note on permutation tests in multistage association scans. Am J Hum Genet 2006, 78:1094–1095.
          8. Tenesa A, Farrington SM, Prendergast JG, Porteous ME, Walker M, Haq N, Barnetson RA, Theodoratou E, Cetnarskyj R, Cartwright N, Semple C, Clark AJ, Reid FJ, Smith LA, Kavoussanakis K, Koessler T, Pharoah PD, Buch S, Schafmayer C, Tepel J, Schreiber S, Volzke H, Schmidt CO, Hampe J, Chang-Claude J, Hoffmeister M, Brenner H, Wilkening S, Canzian F, Capella G, Moreno V, Deary IJ, Starr JM, Tomlinson IP, Kemp Z, Howarth K, Carvajal-Carmona L, Webb E, Broderick P, Vijayakrishnan J, Houlston RS, Rennert G, Ballinger D, Rozek L, Gruber SB, Matsuda K, Kidokoro T, Nakamura Y, Zanke BW, Greenwood CM, Rangrej J, Kustra R, Montpetit A, Hudson TJ, Gallinger S, Campbell H, Dunlop MG: Genome-wide association scan identifies a colorectal cancer susceptibility locus on 11q23 and replicates risk loci at 8q24 and 18q21. Nat Genet 2008, 40:631–637.
          9. Marchini J, Howie B, Myers S, McVean G, Donnelly P: A new multipoint method for genome-wide association studies by imputation of genotypes. Nat Genet 2007, 39:906–913.
          10. Sidak Z: Rectangular confidence regions for the means of multivariate normal distributions. J Am Stat Assoc 1967, 62:626–633.
          11. Nicodemus KK, Liu W, Chase GA, Tsai YY, Fallin MD: Comparison of type I error for multiple test corrections in large single-nucleotide polymorphism studies using principal components versus haplotype blocking algorithms. BMC Genet 2005, 6(Suppl 1):S78.
          12. Gabriel SB, Schaffner SF, Nguyen H, Moore JM, Roy J, Blumenstiel B, Higgins J, Defelice M, Lochner A, Faggart M, Liu-Cordero SN, Rotimi C, Adeyemo A, Cooper R, Ward R, Lander ES, Daly MJ, Altshuler D: The structure of haplotype blocks in the human genome. Science 2002, 296:2225–2229.
          13. Hudson RR, Kaplan NL: Statistical properties of the number of recombination events in the history of a sample of DNA sequences. Genetics 1985, 111:147–164.
          14. Wang N, Akey JM, Zhang K, Chakraborty R, Jin L: Distribution of recombination crossovers and the origin of haplotype blocks: the interplay of population history, recombination, and mutation. Am J Hum Genet 2002, 71:1227–1234.
          15. Purcell S, Cherny SS, Sham PC: Genetic Power Calculator: design of linkage and association genetic mapping studies of complex traits. Bioinformatics 2003, 19:149–150.
          16. Lander E, Kruglyak L: Genetic dissection of complex traits: guidelines for interpreting and reporting linkage results. Nat Genet 1995, 11:241–247.
          17. Wellcome Trust Case Control Consortium: Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 2007, 447:661–678.
          18. Chanock SJ, Manolio T, Boehnke M, Boerwinkle E, Hunter DJ, Thomas G, Hirschhorn JN, Abecasis G, Altshuler D, Bailey-Wilson JE, Brooks LD, Cardon LR, Daly M, Donnelly P, Fraumeni JF Jr, Freimer NB, Gerhard DS, Gunter C, Guttmacher AE, Guyer MS, Harris EL, Hoh J, Hoover R, Kong CA, Merikangas KR, Morton CC, Palmer LJ, Phimister EG, Rice JP, Roberts J, Rotimi C, Tucker MA, Vogan KJ, Wacholder S, Wijsman EM, Winn DM, Collins FS: Replicating genotype-phenotype associations. Nature 2007, 447:655–660.

          Copyright

          © Duggal et al. 2008

          This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
