Missing call bias in high-throughput genotyping

Background The advent of high-throughput and cost-effective genotyping platforms made genome-wide association (GWA) studies a reality. While the primary focus has been invested upon the improvement of reducing genotyping error, the problems associated with missing calls are largely overlooked. Results To probe into the effect of missing calls on GWAs, we demonstrated experimentally the prevalence and severity of the problem of missing call bias (MCB) in four genotyping technologies (Affymetrix 500 K SNP array, SNPstream, TaqMan, and Illumina Beadlab). Subsequently, we showed theoretically that MCB leads to biased conclusions in the subsequent analyses, including estimation of allele/genotype frequencies, the measurement of HWE and association tests under various modes of inheritance relationships. We showed that MCB usually leads to power loss in association tests, and such power change is greater than what could be achieved by equivalent reduction of sample size unbiasedly. We also compared the bias in allele frequency estimation and in association tests introduced by MCB with those by genotyping errors. Our results illustrated that in most cases, the bias can be greatly reduced by increasing the call-rate at the cost of genotyping error rate. Conclusion The commonly used 'no-call' procedure for the observations of borderline quality should be modified. If the objective is to minimize the bias, the cut-off for call-rate and that for genotyping error rate should be properly coupled in GWA. We suggested that the ongoing QC cut-off for call-rate should be increased, while the cut-off for genotyping error rate can be reduced properly.


Background
Driven by the common disease-common variant (CDCV) hypothesis [1], genome-wide association (GWA) studies have demonstrated its power in the identification of genetic variants underlying the diseases [2][3][4][5]. The completion of the human genome sequence [6,7] and the International HapMap Project [8][9][10] as well as the advent of highly efficient and affordable genotyping technologies made GWA within reach. The Phase II HapMap contains more than 4.3 million common SNPs and the coverage is estimated to capture 94% of common variation in CEU and CHB+JPT and 81% in YRI with r 2 ≥ 0.8 [9]. Several high-throughput and cost-effective technologies for genotyping that are currently being used, they are TaqMan assay [11] and GeneChip array [12] (based on hybridization with allele-specific probes), SNPstream system [13] and GoldenGate assay [14] (based on single nucleotide primer extension), Invader assay [15] (based on enzymatic cleavage), and SNiPer [16] (based on Oligonucleotide ligation). Though different reaction mechanisms are employed in different methods, fluorescence detection is widely employed in the process of the specific allele detection.
To deal with abundant genotype data produced by various genotyping platforms, quality control (QC) to ensure the accuracy of allele call becomes a critical issue. When genotyping errors occur, its effects on linkage analysis [17][18][19], LD measures [20,21], tagging SNP selection [22] and the subsequent association tests [22][23][24] have been widely and carefully investigated. Various strategies of detecting genotyping errors or removing its effects on analyses, especially on linkage analysis have been proposed [25][26][27][28]. In addition to genotyping errors, missing calls seem to be abundant in high-throughput genotyping. For example, in the data of the Phase I HapMap, less than 20% data that failed to pass QC was due to genotyping error (>1 duplicate inconsistent or >1 medelian error), while more than 65% of the markers show missing data in over 20% individuals [9]. The presence of missing calls was even more prominent in the Phase II data of HapMap [8]. However, the effect of missing call on the subsequent analyses has been largely ignored.
Strong emphasis on the accuracy of allele calls and technical success to achieve that has made the effect of missing call largely overlooked. It was suggested that 'no-call' procedure should be taken where observations of borderline quality be removed from allele calls in order to keep the genotyping error rate as low as possible [29]. This 'no-call' principle becomes a common practice in genotyping procedures. However, it should be noted that the validity of the 'no-call' principle relies on an implicit assumption that genotyping frequencies in no-call individuals are equal to those in the population. Under this hypothesis, missing data from the no-call procedure simply leads to a power loss due to a decreased sample size, and does not affect the estimation of allele frequencies at all. In this report, we started with a close examination of the validity of the 'no-call' principle by regenotyping those individuals whose genotypes that cannot be unequivocally determined and therefore would have been otherwise discarded. The objectives of this report are (1) to demonstrate experimentally how widely and seriously the problem of missing call bias (MCB) exists, (2) to investigate theoretically the effects of MCB on the subsequent analyses, especially on association studies, and (3) to provide suggestion on dealing with observations of borderline quality and re-evaluate the current QC standards, through comparing the effects of MCB and genotyping errors on allele frequency estimation and association studies.

Results
There are two major causes for missing calls. One is due to poor quality of DNA samples, which often fails to be amplified and to generate strong enough intensity of fluorescence signals over the background. The other arises when an observation, i.e., a read out of fluorescence signals, cannot be assigned unequivocally to any of the clusters of genotype, therefore, is subject to 'no-call' procedure. In this report, we mainly focus on the missing calls due to the failure of being assigned to any clusters of genotype.

Nature of no-calls: results of sequencing
To evaluate the nature of no-calls in reality, four different widely-used high-throughput genotyping platforms were included in this study, and they are GenomeLab™ SNPstream Genotyping System (Beckman Coulter, Los Angeles), BeadLab SNP Genotyping System (Illumina, San Diego), TaqMan ® SNP Genotyping Assays (ABI, Foster City) and GeneChip ® Human Mapping 500 K Array Set (Affymetrix, Santa Clara). Eight SNPs were selected and subjected to regenotyping of equivocal observations (nocalls) through sequencing. The criteria for the selection of SNPs and samples for sequencing were presented in Methods.
The genotype distribution of the observed data at each locus which were produced by the respective genotyping technology was compared with that of no-calls which were obtained by sequencing (Table 1). Statistically significant differences were observed in SNPstream, Illumina and GeneChip 500 K, indicating that the MCB indeed exists in widely-used genotyping technologies and it would lead to a biased estimation of allele/genotype frequencies. The genotype-specific call-rates c i (i = AA, Aa, aa) were calculated, most of which were above 0.95, but it could be as low as 0.75 for GeneChip 500 K.
In the subsequent sections, in order to explore the effects introduced by MCB, we proposed a model to investigate the nature of no-calls. Typically an equivocal observation occurs as follows (Fig. 1). For the data points that lie between the cluster of homozygotes of minor alleles (AA) and the cluster of heterozygotes (Aa), the real genotypes could be either homozygotes of minor alleles (Scenario I) or heterozygotes (Scenario II). For those that lie between the cluster of homozygotes of major alleles (aa) and the cluster of heterozygotes (Aa), the real genotypes could be either heterozygotes (Scenario III) or homozygotes of major alleles (Scenario IV). When the observations that cannot be called unequivocally are discarded, Scenario II is equivalent to Scenario III. To facilitate discussion, we assumed no-calls only happen in a specific genotype with the genotype-specific call-rate c (0 ≤ c ≤ 1).

Effect of MCB on type-I error rate for HWE
Hardy-Weinberg Equilibrium has been repeatedly recommended as a measure for QC in the context of genetic association studies [30]. In the following, we will show that MCB is one of causes for the departure from HWE.
The type-I error rate for departure from HWE disturbed by MCB is inflated with the increasing of c (Fig. 2). When MCB happens in homozygotes (Scenario I and Scenario IV), it leads to a departure from HWE because of excessive heterozygotes. The inflation is similar for AA and aa for a given c. When MCB happens in heterozygotes (Scenario II  & III), excessive homozygotes result in the departure from HWE. The inflation is more pronounced for MCB in het-erozygotes than that in homozygotes. For example, the type-I error rate is 0.055 ~0.228 in Scenario I and Scenario IV, while it can be 0.076 ~0.654 in Scenario II & III under different MAFs in the presence of MCB (c = 0.80) for a population (N = 500) under HWE in the significant level of 0.05. However, HWE still holds, as expected, when missing equivalently but unbiasedly across different genotypes (Unbiased Missing, UBM).

Effect of MCB on allele frequency estimation
The accuracy of allele/genotype frequency estimation is of special importance since many analyses such as association studies, inference of haplotype, and inference of population structure rely on it. UBM does not affect allele/ genotype frequency estimation although it reduces the sample size. However, MCB does. When missing bias in Scenario I and Scenario II & III, MAF is underestimated; and when in Scenario IV, MAF is overestimated and this change is larger than the former two (Fig. 2). It should be

Effect of MCB on association studies
The development of high-throughput genotyping technologies make association study widely conducted for identification of disease loci underlying complex traits. We now examine the effect of MCB on association using various disease models and statistical tests.
Power issue is of special importance in association studies [31,32]. To investigate the effect of MCB on the power of association studies, MCB was introduced into disease models (see Methods). Here, the sample size is 500 for both case and control groups, and MCB are identical for both groups.
It has been commonly assumed that missing calls would lead to power loss due to decreased sample size, which only holds in absence of MCB. In the presence of MCB, the power can be affected by both the sample size and the biased estimation of allele and genotype frequencies (see Fig. 2, Additional file 1 &2). The change of power by MCB is usually larger than that by UBM. For example, the power loss by UBM is all less than 5% for the locus with MAF = 0.25 under various disease models (power ≈ 80% in genotypic χ 2 test) when c = 0.80, while the change can be around and even more than 10% when disturbed by MCB (Table 2).
For the χ2 test based on genotype frequencies, MCB always leads to power loss in all scenarios under different disease models compared with the null (in the absence of missing) (see Fig. 2 and Additional file 2). But for the χ2 test based on allele frequencies, it even can gain the power in some scenarios, because of the biased estimation of allele frequency (see Fig. 2 and Additional file 1). Genotypic χ2 test seems to be more robust to the changes of power in association studies than allelic χ2 test in the presence of MCB (see Fig. 2, Additional file 1 &2, and Table 2).  2).
In addition, though the minor allele of A in the current settings of disease models is susceptible to the disease (in overdominant disease model, Aa is susceptible to disease), the conclusions drawn above can also be extended when A is a protective one (data not shown). Moreover, in the disease model (h AA = 0.01, h Aa = 0.01, and h aa = 0.01), the type-I error rate in MCB remains to be 0.05, indicating that MCB does not inflate the false positive rate for the association studies, under the assumption that the extent of missing is identical in both case and control.

Tradeoff between MCB and genotyping errors
In the previous section, we showed that MCB is common in the current genotyping technologies, and it could affect the subsequent analyses seriously and lead to false conclusions. The key issue is how to deal with those equivocal observations which apparently are responsible for MCB. Two alternative options are available. The first option is to discard the observations of borderline quality using the 'no-call' procedure which may lead to MCB. The second option is to assign these observations to one of the geno- The sketch of genotyping calling Figure 1 The sketch of genotyping calling. The points shown in 'x' represent no-calls due to the failure of being assigned to any clusters of genotype unequivocally. When the data points lie between the cluster of homozygotes of minor alleles (AA) and that of heterozygotes (Aa), the real calls could be AA or Aa, corresponding to Scenario I and Scenario II. When the data points lie between the cluster of heterozygotes (Aa) and that of homozygotes of major alleles (aa), the real calls could be Aa or aa, corresponding to Scenario III and Scenario IV.
Effects of MCB on HWE, MAF estimation and association studies under multiplicative disease model types at the cost of increasing genotyping errors. Here, we compare the overall outcome (allele frequency estimation and power of association studies) of these two options and try to offer guidelines for different scenarios to minimize the biases caused by the equivocal observations. In addition, we evaluate the overall call-rate and genotyping error rate in the two options respectively and intend to reexamine the current QC standards.
To facilitate the presentation, the genotype-specific callrate c was set to 0.80, a moderate MCB as shown previously. For the second option, we assumed all of the equivocal observations are called and the proportion of accurate calls among these equivocal ones is denoted by Both MCB and genotyping errors will possibly lead to inaccurate estimation of allele/genotype frequencies and in turn distorted association. When 'no-call' procedure is applied to those observations of borderline quality, we showed earlier that the biased estimation is dictated only by MAF and c. When the equivocal observations are called, the bias in allele frequency estimation depends on conf in addition to MAF and c. In particular, the bias in allele frequency estimation reflected by the changes of MAF estimation increases with the decreasing conf (see Additional file 3). Fixed MAF and c, the biased estimations caused by MCB and by genotyping errors are comparable. The bias introduced by MCB is certain, whereas the bias caused by genotyping errors changes with the conf monotonically. Interestingly, when the conf is large enough (indicated by the solid line in Fig. 3B), the biased estimation of MAF caused by genotyping errors are smaller than that caused by MCB. Therefore, it would be more beneficial to call the equivocal observations in this case (grey area above the line in Fig. 3B). However, when the conf is below the solid line indicated in Fig. 3B, the biased estimation of MAF caused by genotyping errors is more and 'no-call' procedure is recommended (area below the line in Fig. 3B). It should be noted that the bias in estimation of allele frequency in Scenario I by MCB is the greatest, more so than in genotyping errors, even with the highest error rate (conf = 0). Therefore, it is suggested that 'no-call' principle should not be taken in Scenario I if the objective is to minimize the biased estimation of MAF (Fig. 3).
In the following section, we explore the performance of association studies affected by MCB and by genotyping errors. Here, MCB (c = 0.80) and genotyping errors (c = 0.80, 0 ≤ conf ≤ 1), are assumed to be identical for case and control groups. MCB and genotyping errors were introduced to the disease models with various modes of inheritance relationship ( Table 2) respectively according to Methods. The power affected by MCB was discussed previously. The power affected by genotyping errors were shown in Additional file 3. For χ 2 test based on genotype frequencies, genotyping errors may have no effect on asso- Though the power affected by genotyping errors is complicated in different scenarios and disease models, the power either does not change or changes monotonically with the conf given the scenarios and disease models (see Additional file 3). Therefore, similar to the previous study on allele frequency estimation, a threshold of conf (indicated by the solid lines in Fig. 4B, and Additional file 4B &5B) is expected as well. If the conf is below the threshold, the power loss caused by genotyping errors is larger than that by MCB; therefore, 'no-call' procedure should be taken (area below the line in Fig. 4B, and Additional file 4B &5B). Otherwise, it is better to call the observations of borderline quality at the cost of genotyping errors (grey area above the line in Fig. 4B, and Additional file 4B &5B). For instance, for a locus with MAF = 0.35, when 'no-call' procedure is taken for the equivocal observations happened in Scenario I (c = 0.80), the overall call-rate can still achieve at 97.6%. The power is 87.7% in MCB compared with 92.1% of the null in the multiplicative disease model for allelic χ 2 test. However, if these equivocal observations are called even though they are completely misclassified (conf = 0), the power with genotyping errors can be 88.1% at least. It indicates that in order to reduce the power loss caused by the equivocal observations, it would be more beneficial to call the equivocal observations with a geno-typing error rate 2.5% than 'no-call' with an overall callrate 97.5% (Fig. 4).
As shown above, the commonly-used 'no-call' principal for the observations of borderline quality is not always the best choice. By weighing the influences on the performance of association study and allele frequency estimation, it is therefore preferable to force the calling of, even though they can be erroneous, these equivocal observations when they lie between the cluster of homozygotes of minor alleles and that of heterozygotes (i.e. in Scenario I & Scenario II). When the equivocal observations lie between the cluster of homozygotes of major alleles and the cluster of heterozygotes (i.e., in Scenario III & Scenario IV), the loss of power introduced by MCB is more pronounced than that by genotyping error when these equivocal observations can be accurately called; but when the calling accuracy cannot be granted (the conf is small), the power loss is affected more by the genotyping error and it may be better to invoke 'no-call' procedure. In addition, with different disease models, different decisions for dealing with these equivocal observations may be made. A program called QC-Tradeoff is available online to suggest whether 'no-call' procedure could be conducted http:// humpopgenfudan.cn/en/resource/download.html to minimize the biases caused by the equivocal observations.
In the above analyses, for the models of genotyping errors, we assumed all of equivocal observations were called to facilitate the discussion. Here, we extended a general  Table 2. Fig. 5 and Additional file 6 illustrated the power of association tests in the presence of MCB and genotyping errors, and the corresponding overall call-rate and genotyping error rate. An interesting finding is that the influences of the power caused by the equivocal observations always change monotonically with α from 0 (the model of MCB) to 1(the model of genotyping error). It indicates in order to minimize the biases caused by the equivocal observations on association studies, the validate procedure is either no-call resulted in MCB or call all of the equivocal observations with genotyping errors, which we had discussed above.

Effects of MCB and genotyping errors on MAF estimation
Given the knowledge of relationship of bias in allele/genotype frequency estimation and in association study with the magnitude of MCB and genotyping errors, it is therefore likely to develop a strategy to minimize the bias by Figure 4 Effects of MCB and genotyping errors on association studies under multiplicative disease model. A) illustrates the overall call-rate for the loci with different MAFs in the presence of MCB (c = 0.8). B) illustrates the threshold of conf by a solid line. If the equivocal observations can be called accurately in a confidence above the conf threshold, it prefers to call those equivocal ones at the cost of genotyping errors to minimize the power loss (grey area above the line). Otherwise, 'no-call' procedure is beneficial, which results in MCB (area below the line). C) illustrates the genotyping error rate, when the equivocal observations are called in the conf threshold mentioned above. The figures correspond to Scenario I, Scenario II, Scenario III and Scenario IV from the left to right.

Effects of MCB and genotyping errors on association studies under multiplicative disease model
choosing proper cut-offs for call-rate and genotyping error rate (see Discussion).

Disccusion
The advent of high-throughput genotyping technologies led to an exciting era of genome-wide associations. Genotype data with good quality are imperative in ensuring the creditability of a study. Missing calls in high-throughput genotyping has long been ignored in genetic studies. In this study, we demonstrated experimentally the prevalence and severity of the problem of missing calls, especially MCB, in the current genotyping technologies.
We also showed theoretically how MCB could lead to biased conclusions in the subsequent analyses including estimation of allele/genotype frequencies and association tests. MCB leads to power loss in most cases, and such loss may lead to false negative conclusions. Compared with allelic χ 2 test, genotypic χ 2 test is more robust to MCB. Various modes of inheritance relationship (dominant, recessive, overdominant, additive and multiplicative) were considered in our study. We also showed that when missing bias happens in the genotype whose contribution to the disease differs most, regardless whether it is susceptible or protective to the disease, it affects the power of association studies most.
In this study, we investigated the bias of association in the presence of both MCB and genotyping errors, and demonstrated that they contributed to the bias differently. This result is of special importance in determining the cut-offs used for QC in the current practice of GWA. The question is whether the current QC standards are optimal. If the objective is to minimize the bias in allele/genotype frequency estimation and in association tests, the cut-off for call-rate and that for genotyping error rate should be properly coupled in GWA. This leads to a re-examination of the existing QC standards for both call rate and genotyping error rate that are widely used in various association studies.
A commonly used QC standard for call rate is 80% and 95% or above for the first screening and fine mapping, respectively, in GWAs (e.g. Easton et al. [3]; Hunter et al. [4]). Although we demonstrated that the bias in allele frequency estimation and in association study discussed in Results (Fig. 3 &4, and Additional file 4 &5) is not negligible when 'no-call' procedure is applied, their call-rates are all above 80% and even can be above 95%. It suggests that the existing cut-offs are not sufficiently stringent to filter out the loci which may suffer from MCB.
A genotyping error rate < 1% is considered acceptable [3][4][5]33]. This is an extremely stringent cut-off in the presence of equivocal observations, given that such a stringent cutoff would force to invoke 'no-call' principal whereas would not lead to a reduction of bias. Our results indicated that in most cases, the bias can be greatly reduced by increasing the call-rate at the cost of genotyping error rate, i.e., < 5% (Fig. 3 &4, and Additional file 4 &5). Therefore, we suggested that the ongoing QC cut-off for call-rate should be increased, while the cut-off for genotyping error rate can be reduced properly.
A program called QC-Tradeoff is available online to provide a conf threshold. If the threshold is high, it is conservative to take 'no-call' procedure to reduce the power loss in association studies introduced by the equivocal observations; otherwise, the equivocal ones could be called even though genotyping errors may be occurred. Moreover, we showed that the missing calls can usually be reduced sufficiently but with certain accuracy using the current technologies, indicating that the value of conf in reality is usually high enough. Through adjusting relevant parameters which are implemented in the calling software provided by genotyping platforms (such as Illumina, Taq-Man and GeneChip 500 K), it allows either higher call rates or greater genotyping rate. For example, we adjusted quality value of TaqMan from 0.95 (default) to 0.80 to illustrate the change of calling in the two loci rs10109984 and rs11226. After a change of the quality value, the overall call-rate increased substantially. In particular, when the quality value is 0.95, 30 and 29 calls were not called for the loci rs10109984 and rs11226, respectively. When 0.8 was chosen as the quality value, only 5 were not called at rs10109984 and 7 at rs11226. Subsequently, the number of genotyping discordance between the genotyping results and sequencing results was 8 (corresponding to conf = 0.73) and 0 (corresponding to conf = 1.0) for the locus rs10109984 and rs11226, respectively. Furthermore, our results also suggested that MCB does not inflate type-I error rate for association studies. But it should be noted that the conclusion made here is under the assumption that the extent of missing is same to case and control. Sometimes differential bias between case and control is unavoidable, i.e., for the different sourcing of samples. In this case, effects of MCB and genotyping errors could be more complicated. Clayton et al. [34] showed case-control differential bias and calling inaccuracies can lead to differential misclassification, and consequently, to increase false-positive rates. Plagnol et al. [35] from the same lab found case-control bias associated with missing data can increase the false-positive rate as well and recommended to use 'fuzzy' calls to deal with uncertain genotypes that would otherwise be labeled as missing.

Conclusion
Missing calls in high-throughput genotyping has long been ignored in genetic studies. However, it had been illustrated that the problem of missing call bias does exist widely and sometimes seriously in prevalent highthroughput genotyping technologies. Missing call bias could lead to biased conclusions in subsequent analyses, including allele/genotype frequency estimation and association studies. The commonly used 'no-call' procedure does not always a best option for observations of borderline quality. Our results indicated that in most cases, the biased conclusion can be greatly reduced by increasing the call-rate at the cost of genotyping error rate. Therefore, the existing QC standards should be modified that the cut-off for call-rate and that for genotyping error rate should be properly coupled in GWA. A program called QC-Tradeoff is available online to suggest call or no-call the equivocal observations according to the case the user faced to minimize the power influences in association studies, and illustrate the acceptable QC standard in the corresponding case.

Regenotyping for no-calls
Two SNPs were selected for each platform (GeneChip 500 K, SNPstream, Illumina and TaqMan) from a large number of loci which were genotyped by the respective technology in our laboratory. The SNPs were selected using the following criteria: 1) the genotype calls were made using the software provided by the venders of the respective technology; 2) the call-rate at each locus is around or above the average call-rate from the same platform; 3) the minor allele frequency (MAF) of the observed data is above 0.15.
As for SNPstream, Illumina and TaqMan, their calling algorithms are based on various methods, including GetGenos/ QCReview program for SNPstream, GeneCall software for Illumina and SDS software for TaqMan. But generally, a baseline is set to distinguish the background signals and the informative ones, and then clustering and calling procedures conduct in the informative ones. The samples that show signals above the baseline but cannot be called unequivocally were selected for further analysis in the respective genotyping platforms. As for GeneChip 500 K, genotype data was called by the software GTYPE. It introduces a dynamic model-based algorithm [36] to suit its properties that many different SNPs are to be examined in a few individuals. This algorithm is different from others, so instead, we collected all the missing samples for sequencing.
Overall, rs6743724 (call-rate: 96.5%) and rs699512 (98.6%) for SNPstream, rs2277632 (98.9%) and rs1457043 (98.9%) for Illumina, rs10109984 (95.8%) and rs11226 (96.6%) for TaqMan, rs1192885 (94.2%) and rs6855202 (95.1%) for GeneChip 500 K were selected. The genotypes for the missing data were generated by sequencing the DNA segment containing the polymorphism locus. In the 2 × 3 contingency table of genotype in the 'observed' data produced by the genotyping platforms (Obs.) and the 'missing' data produced by sequencing (Seq.), Fisher Exact Test was used to examine whether there is difference in the genotypic distribution between them (Missing Call Bias, MCB). The genotypespecific call-rate was calculated as well, for AA, Aa and aa, respectively.

Models for MCB and genotyping errors
In the presence of no-calls, let denote the frequency of the genotypes G i (i = AA, Aa, aa) in the population and On the other hand, an equivocal data point can be assigned to a genotype, which could lead to a genotyping error. Assume all the equivocal data points are called and let conf denote the proportion of genotypes that could be accurately called among these equivocal ones. The genotyping error rate is (1 -conf)(1 -c)p G , where G denotes the genotype of equivocal data points.
The observed genotype frequencies in the models were listed in Table 3.

Effects on allele frequency estimation and type-I error rate for HWE
Let π denote the frequency for minor allele A in a population, where 0 ≤ π ≤ 0.5. Suppose HWE holds in the population, the genotype frequencies are p AA = π 2 , p Aa = 2π(1π) and p aa = (1 -π) 2 , respectively.
However, in the presence of MCB or genotyping errors, the estimation of π will be affected. Here, we present the results by the difference (π obs -π) to reflect the changes in the allele frequency estimation, where π obs is the estimated frequency of minor allele A in the existence of MCB or genotyping errors according to (1) power in association studies. The genotype frequencies in case and control can be calculated respectively according to Table 3 and formula (1)~(3). The power was calculated using a non-central χ 2 distribution following Gordon et al. [23]. The power calculation was conducted in the package of R. In order to the comparison of power affected by MCB and that by genotyping errors with different levels, we tend to find a conf threshold, where if the equivocal observations are called above the conf threshold, the power loss caused by genotyping errors are smaller than that by MCB, and vice versa.
Here, the same extent of MCB, UBM or genotyping errors was assumed for both case and control. Various modes of inheritance relationships were considered, including dominant, recessive, overdominant, additive and multiplicative relationship ( Table 2). The power in these disease models can be 80% in the significant of 0.05 using genotypic χ 2 test, when MAF is 0.25 and the sample size of case and control is 500 respectively.

p p aa
Additional File 3 Figure S3.