Quantify single nucleotide polymorphism (SNP) ratio in pooled DNA based on normalized fluorescence real-time PCR
© Yu et al; licensee BioMed Central Ltd. 2006
Received: 27 October 2005
Accepted: 09 June 2006
Published: 09 June 2006
Conventional real-time PCR quantification of the allele ratio in pooled DNA depends mainly on the determination of PCR amplification efficiency and on the Ct value, defined as the PCR cycle number at which the fluorescence emission exceeds a fixed threshold. Because the calculation is exponential, slight errors are multiplied and the variation in the results becomes too large. We have developed a new PCR data point analysis strategy for allele ratio quantification based on the normalized fluorescence ratio.
In our method, the initial reaction background fluorescence was determined by fitting the raw fluorescence data to a four-parameter sigmoid function. The respective background fluorescence was then subtracted from each fluorescence data point, and each background-subtracted data point was divided by that same background fluorescence to obtain the normalized fluorescence. By relating the normalized fluorescence ratio to the known, premixed ratio of the two alleles in standard samples, a standard linear regression equation was generated, from which the allele ratios of unknown specimens were calculated using the measured normalized fluorescence ratio. In this article, we compare the results of the proposed method with those of the baseline-subtracted fluorescence ratio method and the conventional Ct method.
The results demonstrate that the proposed method can improve the reliability, precision, and repeatability of allele ratio quantification. It also has the potential for fully automatic allele ratio quantification.
Single nucleotide polymorphisms (SNPs), the most common source of polymorphism in the human genome, have wide applications in human genetics, pharmacogenomics, analysis of clinical samples, and identification of human susceptibility genes involved in complex diseases [1–4]. Pooling equal amounts of DNA from all individual samples and typing one SNP marker at a time can save valuable template DNA and has been used successfully with microsatellite markers and SNPs [6, 7]. A method for quantifying SNPs in pooled and mixed DNA samples with high precision and efficiency is therefore required in human genetics. Another demand for quantifying SNPs in pooled DNA samples arises in the study of the population dynamics of pathogens, where quantifying SNP frequencies may not only help to monitor the pathogen dynamics associated with treatment but could also improve therapeutic decision making. Several genotyping methods have been found suitable for measuring SNP allele frequencies or wild-type/mutant allele ratios in DNA pools, including SSCP or dHPLC, TAQMAN™ (Applied Biosystems), oligo-ligation assay, Invader assay™ (Third Wave Technologies Inc.), allele-specific amplification with real-time PCR, and Pyrosequencing™ (Pyrosequencing). The real-time PCR platform has a promising future for high-throughput, sensitive and accurate estimation of SNP allele frequencies in DNA pools. It uses PCR primers together with LNA- or MGB-modified dual-labeled specific probes spanning the SNP of interest. In conventional real-time PCR, the threshold cycle (Ct) or crossing point (CP) values from pools are used to calculate allele ratios as 2^(-ΔCt) [11, 14] or, taking PCR amplification efficiency into account, as E^(-ΔCt). These methods assume that the two fluorophore channels have comparable real-time PCR kinetics.
This is, however, a simplification, since it has been demonstrated that binding efficiencies differ between the probes and that one fluorophore channel may be amplified preferentially over the other. In addition, even a small standard deviation of ΔCt is amplified exponentially into a large variation in the result, so these methods cannot give satisfactory precision. To avoid the Ct value, Oliver et al. (2000) introduced allele ratio quantification based on the ratio of the fluorescence signals produced by hydrolyzed allele-specific probes. However, their method considered only the end-point fluorescence, by which time the PCR reaction has usually entered the plateau phase, so a reliable quantification result is arguably not obtained.
In this article, we compare the normalized fluorescence data points of the two fluorescence channels during the early exponential phase of PCR and describe a rational real-time data point analysis strategy for quantifying allele ratios in DNA pools. By using the novel fluorescence normalization method and basing the fluorescence ratio analysis on a larger number of selected data points, the new analysis method showed a significant improvement in precision and repeatability over the conventional method, although the proposed strategy does not claim to replace the conventional method completely.
Principle of simulation
In an optimized Taqman real-time PCR, the measured incremental signal $\Delta R_n$ at any cycle is directly proportional to the amount of amplicon produced, as described:
$$[\text{amplicon}]_{\text{synthesized}} = \Delta R_n / \Delta\varnothing \tag{1}$$
Here $\Delta\varnothing$ is the difference between the specific fluorescence of the free fluorophore and the specific fluorescence of the probe-bound fluorophore, and $\Delta R_n$ is the fluorescence increment.
The basic equation for PCR amplification during the exponential phase is:
$$N_n = N_0 \cdot E^n \tag{2}$$
where $N_0$ is the initial number of DNA molecules, $E$ is the amplification efficiency and $n$ is the number of cycles. For SNP allele ratio quantification, the amounts of product for amplicon allele A and amplicon allele B are therefore:
$$[\text{amplicon}]_{A,n} = A_0 \cdot E_A^n \tag{3}$$
$$[\text{amplicon}]_{B,n} = B_0 \cdot E_B^n \tag{4}$$
The initial ratio of alleles A and B then follows as:
$$\frac{A_0}{B_0} = \frac{[\text{amplicon}]_{A,n}}{[\text{amplicon}]_{B,n}} \times \frac{E_B^n}{E_A^n} \tag{5}$$
If $Ri_{A/B}$ denotes $A_0/B_0$, the initial ratio of allele A to allele B, equation 5 can be transformed by inserting equation 1:
$$\frac{\Delta R_{nA}}{\Delta R_{nB}} = Ri_{A/B} \times \frac{\Delta\varnothing_A}{\Delta\varnothing_B} \times \frac{E_A^n}{E_B^n} = Ri_{A/B} \times (K_1 \times K_2) \tag{6}$$
Here $K_1 = \Delta\varnothing_A/\Delta\varnothing_B$ represents the difference in probe fluorescence and binding efficiency between the two probes, and $K_2 = E_A^n/E_B^n$ represents the difference in amplification efficiency between the two alleles.
When the fluorescence values ΔRnA and ΔRnB are analyzed within a selected segment of the PCR reaction in a single reaction tube, K1, which represents the difference in probe binding efficiencies, is a constant. The initial allele ratio RiA/B is constant for one reaction tube, and K2 is approximately constant over only a few chosen cycles. ΔRnA is therefore predicted to have a linear relationship with ΔRnB within the selected fluorescence comparison segment in the reaction tube.
From another point of view, if we define $\Delta R_{nA}/\Delta R_{nB}$ as $K_f$, equation 6 can also be written as:
$$K_f / Ri_{A/B} = K_1 \times K_2 \tag{7}$$
Assuming that the PCR efficiency difference K2 remains constant for all samples over the few cycles of the selected fluorescence comparison segment, and given that K1 is a constant, the allele ratio RiA/B is expected to have a linear relationship with the corresponding value of Kf across all samples. This relationship is the basis of quantitative analysis with the proposed method.
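The relationship in equations 6 and 7 can be checked numerically. The following is a minimal Python sketch, not the authors' procedure: the efficiencies, the probe terms and the cycle number are made-up illustrative values, and `delta_rn` and `fluorescence_ratio` are hypothetical helper names. It simulates ideal exponential amplification of both alleles and prints Kf/Ri for a series of pools, which should be the same constant K1 × K2 for every pool.

```python
# Sketch under assumed, illustrative parameters (not values from the study).

def delta_rn(n0, efficiency, cycle, delta_phi):
    """Fluorescence increment from eq. 1 and 2: dRn = [amplicon] * delta_phi."""
    return n0 * efficiency ** cycle * delta_phi

def fluorescence_ratio(a0, b0, cycle, e_a=1.95, e_b=1.90,
                       phi_a=1.2, phi_b=1.0):
    """Kf = dRn_A / dRn_B at one cycle (eq. 6)."""
    return (delta_rn(a0, e_a, cycle, phi_a) /
            delta_rn(b0, e_b, cycle, phi_b))

cycle = 20                       # one cycle inside the chosen window
k1 = 1.2 / 1.0                   # probe term, delta_phi_A / delta_phi_B
k2 = (1.95 / 1.90) ** cycle      # efficiency term, E_A^n / E_B^n

for pct_a in range(10, 100, 10):             # 10:90 ... 90:10 pools
    ri = pct_a / (100 - pct_a)               # initial allele ratio A0/B0
    kf = fluorescence_ratio(pct_a, 100 - pct_a, cycle)
    # eq. 7: Kf / Ri should equal the constant K1 * K2 for every pool
    print(f"Ri={ri:6.3f}  Kf={kf:8.4f}  Kf/Ri={kf / ri:.4f}")
```

Because K2 = (E_A/E_B)^n changes with n whenever the two efficiencies differ, the constant holds only at a fixed cycle or, approximately, over a narrow window of cycles, which is why the text restricts the comparison to a few cycles.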
Normalized fluorescence data from background
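The symbol definitions that follow refer to a four-parameter sigmoid model of the amplification curve whose equation is not reproduced in this text (it appears to be the "equation 8" cited later). Assuming the standard four-parameter sigmoid used in the cited kinetic PCR analyses [17, 18], which matches those definitions, it would read:

```latex
f(R_x) = y_0 + \frac{a}{1 + e^{-(x - x_0)/b}}
```

With this form, the curve tends to $y_0$ in the early cycles and to $y_0 + a$ at the plateau, and its inflection point lies at $x = x_0$.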
In the four-parameter sigmoid function fitted to the raw fluorescence data, x is the cycle number, f(Rx) is the fluorescence at cycle x, y0 is the background fluorescence, a is the difference between the maximal fluorescence and the background fluorescence, x0 is the position of the first-derivative maximum of the function (the inflection point of the curve), and b describes the slope of the curve.
With a' = a/y0, the normalized fluorescence can be expressed as:
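The expression announced here is missing from this text. As a hedged reconstruction, assuming the raw curve follows the standard four-parameter sigmoid $f(R_x) = y_0 + a/(1 + e^{-(x - x_0)/b})$, subtracting the background $y_0$ and then dividing by it gives:

```latex
\frac{f(R_x) - y_0}{y_0} \;=\; \frac{f(R_x)}{y_0} - 1 \;=\; \frac{a'}{1 + e^{-(x - x_0)/b}},
\qquad a' = \frac{a}{y_0}
```

Under this assumption the normalized curve keeps the same $x_0$ and $b$ as the raw curve; only the amplitude is rescaled and the offset removed.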
Determination of PCR exponential phase for allele ratio quantification
Data points from as many exponential-phase cycles as possible were selected for data analysis. The start of the exponential phase was determined as 80% of the second derivative maximum (SDM) of sigmoidal equation 8 after fitting it to the raw fluorescence data using SigmaPlot (version 8.0, SPSS) [17, 18], or was chosen visually from the logarithmic amplification plot (log fluorescence versus cycle number). Because x0 still lies in the middle of the log-linear amplification phase, raw fluorescence data points from the determined start cycle of the exponential phase up to cycle x0 were chosen for further analysis. Because each reaction well yields data for two fluorophores (FAM and VIC), the cycles common to all sample replicates and to both probes in the real-time Taqman PCR assays were included.
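As an illustration of the window selection described above, the sketch below locates the second derivative maximum of a fitted four-parameter sigmoid and lists the cycles kept for analysis. It assumes the standard sigmoid form f(x) = y0 + a/(1 + exp(-(x - x0)/b)), for which the SDM lies at x0 - b·ln(2 + √3). It also assumes that "80% of the SDM" means 0.8 times the SDM cycle number, which is only one possible reading of the text. The function names and the fitted values are hypothetical.

```python
import math

def sdm_cycle(x0, b):
    """Cycle at which the second derivative of the sigmoid peaks.

    For f(x) = y0 + a / (1 + exp(-(x - x0) / b)) the third derivative
    vanishes at x0 - b*ln(2 + sqrt(3)), which is the SDM.
    """
    return x0 - b * math.log(2 + math.sqrt(3))

def exponential_window(x0, b, fraction=0.8):
    """Whole cycles from the assumed start (fraction * SDM cycle) up to x0."""
    start = math.ceil(fraction * sdm_cycle(x0, b))
    end = math.floor(x0)
    return list(range(start, end + 1))

def common_cycles(windows):
    """Cycles shared by every window (all replicates, FAM and VIC)."""
    shared = set(windows[0])
    for w in windows[1:]:
        shared &= set(w)
    return sorted(shared)

# Hypothetical fitted (x0, b) values for the FAM and VIC curves of one well.
fam = exponential_window(x0=25.0, b=1.5)
vic = exponential_window(x0=26.4, b=1.6)
print(fam, vic, common_cycles([fam, vic]))
```

Taking the intersection of the windows mirrors the requirement that only cycles overlapping across replicates and across both dyes are used.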
Ratio of the two fluorescence signals during the selected exponential phase
[Table: An example of four-parameter sigmoidal equation modeling and the result of fluorescence normalization. Columns: D2 sample FAM (raw Rx); D2 normalized Rx (raw Rx / raw y0 - 1); sigmoid function fitted to raw fluorescence; sigmoid function fitted to normalized fluorescence.]
Fluorescence ratio Kf determination by linear regression
[Table: Summary of the fluorescence ratio based method and the Ct based method for quantification of predefined standards. Surviving column headers: group no. in predefined standards; predefined allele ratio YMDD:; baseline subtracted fluorescence ratio; three pairs of mean (n = 4) and S.D. (n = 4).]
Standard curve generation
Linear regression analysis was also applied to the baseline-subtracted fluorescence ratio versus the predefined allele ratio. An excellent linear fit was obtained, as shown in Fig. 3B.
Traditionally, in the comparative ΔCt method, each allele frequency measure is based on the equation allele ratio = 2^(-ΔCt). As shown in Supplementary Material 7, 8, 9 [see Additional file 7, 8, 9], the averaged 2^(-ΔCt) measurements of replicates were analyzed by linear regression against the mixed allele ratio. Ct values were determined by the second derivative maximum method. Run 3, but not run 1 or run 2, showed good linearity in the ΔCt method (Fig 3C). These results suggest that reliable results cannot be derived from the standard curves of run 1 and run 2 in the ΔCt method. In contrast, the good linearity of all standard curves in the two fluorescence ratio methods suggests that reliable results can be obtained from the standard curves of these two methods.
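A minimal sketch of how such a standard curve could be built and then used to read off an unknown sample, written in plain Python rather than SigmaPlot. The Kf values below are invented for illustration (they are not data from this study), and `fit_line` and `allele_ratio_from_kf` are hypothetical helper names. The regression is ordinary least squares of the measured ratio on the predefined allele ratio.

```python
def fit_line(xs, ys):
    """Ordinary least squares: returns slope, intercept and r^2."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = sxy ** 2 / (sxx * syy)
    return slope, intercept, r2

def allele_ratio_from_kf(kf, slope, intercept):
    """Invert the standard curve kf = slope * ratio + intercept."""
    return (kf - intercept) / slope

# Predefined allele ratios of the standards (10:90 ... 90:10).
ratios = [p / (100 - p) for p in range(10, 100, 10)]
# Invented Kf readings: an exact line with slope 0.3 and intercept 0.02.
kfs = [0.3 * r + 0.02 for r in ratios]

slope, intercept, r2 = fit_line(ratios, kfs)
unknown = allele_ratio_from_kf(0.62, slope, intercept)
print(f"slope={slope:.4f} intercept={intercept:.4f} r2={r2:.4f}")
print(f"estimated allele ratio for Kf=0.62: {unknown:.3f}")
```

With real replicate data the r^2 value returned by `fit_line` would be the quantity compared between methods and runs.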
Intra- and inter-run variation in allele ratio quantification of the proposed method
[Table: Intra-run and inter-run variations of the three methods for three different runs. Surviving column headers: baseline subtracted fluorescence ratio method; SDM method (2^(-ΔCt) method).]
Because of its conceptual simplicity and ease of use, real-time PCR is widely used to detect and quantify DNA and cDNA in diverse applications such as diagnostics and genetics. The quantification of single nucleotide polymorphism (SNP) allele frequencies in pooled DNA samples using real-time PCR is a promising approach for large-scale diagnostics and genotyping. Currently, real-time PCR determination of the allele ratio in pooled DNA depends mainly on the Ct value. However, inconsistency in PCR conditions, position effects, differences in fluorescence background and other factors can all lead to variability in the Ct value, and even a small standard deviation of Ct values is amplified exponentially. For example, a Ct difference of one cycle represents a two-fold difference. As a result, the CV values for quantification appear too large, both between replicates and between runs over time.
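The exponential amplification of Ct error mentioned above can be made concrete with a short calculation. This sketch assumes perfect doubling (E = 2) and an illustrative ΔCt uncertainty of ±0.3 cycles; both numbers are assumptions chosen for the example, not measurements from this study, and `ratio_from_dct` is a hypothetical helper name.

```python
def ratio_from_dct(dct, efficiency=2.0):
    """Allele ratio estimated as E^-dCt (2^-dCt when E = 2)."""
    return efficiency ** (-dct)

true_dct = -1.0      # with E = 2 this corresponds to a 2:1 allele ratio
error = 0.3          # assumed uncertainty in dCt, in cycles

low = ratio_from_dct(true_dct + error)
mid = ratio_from_dct(true_dct)
high = ratio_from_dct(true_dct - error)

# The dCt error enters the ratio as a multiplicative factor of 2^(+/-0.3),
# not as an additive offset.
print(f"ratio: {low:.3f} .. {mid:.3f} .. {high:.3f}")
print(f"relative spread: {high / mid - 1:+.1%} / {low / mid - 1:+.1%}")
```

Because the factor is multiplicative, the absolute error in the estimated ratio grows with the ratio itself, so the same Ct uncertainty is more damaging for pools with extreme allele ratios.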
To avoid using Ct values in allele frequency quantification, we present a novel rational data processing method based on the normalized fluorescence ratio Kf of the real-time PCR exponential phase. The Kf method extends the work of Dwight H. Oliver and James R. Eshleman. The logic of the proposed method derives from the proportionality between fluorescence and the amount of amplicon produced.
Correction of the vertical background shift of the amplification curve is a prerequisite for correct quantification. Background correction has been reported to improve precision. However, this data analysis does not fully resolve the non-PCR-related fluorescence fluctuations that occur from well to well (Fig 1B) and between runs over time. For this reason, a passive reference dye (such as ROX) has been applied in the experimental setup to deal with this problem. In the data processing step after PCR, we investigated whether data point analysis could decrease the differences among replicates that are due to non-PCR-related fluorescence fluctuations. Matthew P (2004) and Liu W et al. (2002) pointed out that the background fluorescence can be obtained by the four-parameter sigmoid function fit-point method. Using the background fluorescence determined in this way, we processed the raw fluorescence data to obtain the normalized fluorescence by first subtracting the individual background fluorescence from each data point and then dividing each subtracted value by that background fluorescence. All fluorescence curves are thus expressed as the fluorescence increment relative to the individual initial background fluorescence, similar to a PCR curve obtained when a passive reference dye is added to the reaction, if the initial background is considered comparable to a passive reference dye. This is not too aggressive an assumption, given that in the early PCR cycles there is no PCR-related fluorescence, only non-PCR fluorescence differences due to the instrument, reaction mixes or tubes. The applicability of the proposed method is supported by its ability to improve the homogeneity of the amplification curves of intra-run replicates. This analysis step can therefore remove well-to-well variation to some degree better than the baseline-subtracted method.
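The normalization step described above amounts to a one-line transformation per data point. The sketch below is illustrative only: the two replicate wells are synthetic four-parameter sigmoid curves, and the second is assumed to differ from the first by a purely multiplicative, non-PCR factor (so that signal and background scale together), which is the situation in which dividing by the background is expected to help. `sigmoid` and `normalize` are hypothetical helper names and all parameter values are invented.

```python
import math

def sigmoid(x, y0, a, x0, b):
    """Four-parameter sigmoid model of raw fluorescence at cycle x."""
    return y0 + a / (1 + math.exp(-(x - x0) / b))

def normalize(raw, y0):
    """(raw - y0) / y0: fluorescence increment relative to background."""
    return [(r - y0) / y0 for r in raw]

cycles = range(1, 41)
# Well 1, and a replicate whose readings are all 1.4x brighter (assumed
# well-to-well optical difference unrelated to amplification).
well1 = [sigmoid(x, y0=180.0, a=900.0, x0=25.0, b=1.5) for x in cycles]
well2 = [1.4 * v for v in well1]

baseline_sub1 = [v - 180.0 for v in well1]
baseline_sub2 = [v - 1.4 * 180.0 for v in well2]
norm1 = normalize(well1, 180.0)
norm2 = normalize(well2, 1.4 * 180.0)

# Baseline subtraction alone leaves the 1.4x difference in place;
# dividing by the background removes it.
diff_baseline = max(abs(p - q) for p, q in zip(baseline_sub1, baseline_sub2))
diff_normalized = max(abs(p - q) for p, q in zip(norm1, norm2))
print(diff_baseline, diff_normalized)
```

In practice y0 would come from the sigmoid fit of each well rather than being known in advance, and real well-to-well differences need not be purely multiplicative, so the collapse of replicate curves would be partial rather than exact.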
Dwight H. Oliver's early, novel work quantified the allele ratio by the end-point fluorescence ratio instead of the Ct value. However, amplification data points can easily be influenced by cycle-to-cycle signal noise, and the end-point fluorescence cannot serve as an ideal measure for real-time PCR quantification. The Kf method, which uses as many fluorescence data points as possible from the early exponential phase, could therefore potentially give a more reliable measurement of the allele ratio than a method in which only one data point, such as the end-point fluorescence, is analyzed.
Equation 6 shows that the fluorescence ratio between FAM and VIC at each cycle is not only proportional to the initial allele ratio but also depends on the probe binding efficiency (K1) and the allele amplification efficiency (K2). In the earlier work of Stahlberg et al., which aimed at quantifying allele ratios, the efficiency ratio Xer was determined from an intrinsic calibration curve obtained by diluting the test sample, and the relative sensitivity KRS, which reflects the difference in the probes' fluorescence and binding efficiencies, was further determined from measurements on negative samples, assuming a 60:40 expression ratio. However, even a slight inaccuracy in PCR amplification efficiency, which is frequently encountered in practice because all current methods only estimate amplification efficiencies, may distort the results significantly owing to the exponential nature of equation 6. Rather than calculating K1 (KRS) and K2 (Xer) separately, the proposed method determines the combined effect of these two factors from the slope of the standard curve. The slopes of the standard curves in runs performed over time (0.3160, 0.3201, and 0.2929) show that these combined effects are relatively constant among different runs in the Kf data point analysis method proposed here.
The major improvement in the precision of allele ratio determination with the Kf method was evident from the fact that both intra-run and inter-run CV values were always significantly lower with the proposed Kf method than with the baseline-subtracted fluorescence ratio method and the SDM Ct method. Our results also suggest that the range of precise allele ratio quantification was wider in both the baseline-subtracted fluorescence ratio method and the normalized fluorescence ratio method (Kf method) than in the SDM method. In addition, two of the three standard curves in the SDM Ct method (r2 = 0.7966, r2 = 0.8181, r2 = 0.9961) did not show a good linear relationship, so the allele ratio cannot be quantified reliably from these two standard curves, whereas Kf had an exceptionally good linear relationship with the allele ratio (r2 > 0.99) in all three runs. In the SDM method, the larger the difference between the two alleles in the ratio, the further the calculated result deviated from the linear trend. This can be explained by slight errors in Ct determination being multiplied (E^(-ΔCt)) to such an extent that the deviation for samples with extreme ratios (9:1 or 1:9) was larger in the SDM method, even assuming that accurate amplification efficiency values were obtained.
Non-PCR-related fluorescence may influence the fluorescence reading of each data point and hence the fluorescence ratio determined for each reaction. This hypothesis was supported by the analysis of the initial fluorescence background. The average FAM fluorescence background of run 3 (371) was higher than that of the other two runs (187 and 163), whereas the VIC fluorescence backgrounds (101, 92, 134) did not vary greatly among the three runs. Consequently, in the baseline-subtracted fluorescence ratio method, the non-PCR-related fluorescence ratio of run 3 (2.76) was larger than that of the other two runs (1.85, 1.77), which partly explains why the slope of the standard curve for run 3 (0.9199) was larger than those of the other two runs (0.5548, 0.5312). By normalizing each data point to the initial fluorescence background, as proposed in the fluorescence normalization (Kf) method, the effect of the initial fluorescence background was largely reduced. In a parallel experiment, the method was also used to analyze YMDD/YVDD mixed-allele samples with the workflow proposed above (data not shown). The results from both YMDD/YIDD and YMDD/YVDD showed that the standard curves were more homogeneous in the Kf method than in the baseline-subtracted fluorescence ratio method.
Although our analysis method was tested for quantifying the single nucleotide polymorphism (SNP) ratio in pooled DNA, we anticipate that it could also be used to determine the relative expression of two genes in one sample. Stahlberg et al. (2003) reported an application of a real-time PCR method in the latter area; that method, however, only tells whether or not the expression ratio of two genes is ~60:40. The results shown here suggest that the proposed method has the potential for wider use in determining the relative expression of two genes.
In conclusion, provided that raw fluorescence data for independent fluorophore channels can be obtained, the proposed Kf method quantifies the allele ratio more precisely and gives repeatable results between different PCR runs. It also has the advantage of a wider range of precise quantification. An additional benefit is that the Kf method, which uses no manually set parameters in the analysis process, is adaptable to fully automatic allele ratio quantification.
Using the primer and probe design software Primer Express™, we designed PCR primers and MGB probes for amplification of a hepatitis B virus (HBV) fragment to detect the rtM204I mutation in the reverse transcriptase (rt) region of the HBV polymerase gene. In brief, PCR amplification reactions contained 400 nmol forward primer (5'-GTAGGGCTTTCCCCCACTG-3') and reverse primer (5'-AGM GGT AAA AAG GGA YTC AMG ATG-3'), 200 nmol wild-type HBV-specific probe (5'-FAM-ATCATCCATATAACTGAAA-MGB-3'), 200 nmol mutant HBV-specific probe (5'-VIC-CACATCATCAATATAAC-MGB-3'), 200 μM each of dATP, dCTP and dGTP, 400 μM dUTP, 2 U Amplitaq Gold DNA polymerase, 0.2 U AmpErase uracil N-glycosylase (UNG), and self-mixed Taqman buffer in a total volume of 50 μL. After a decontamination step at 37°C for 5 min and inactivation of the UNG enzyme at 95°C for 10 min, a two-step protocol was followed for 40 cycles: 95°C for 20 s and 60°C for 40 s. Real-time PCR was performed on a Biorad iCycler. Raw fluorescence readings for each cycle were exported to an Excel workbook for further analysis.
Preparation of DNA pools with different allele ratios
The concentration of the DNAs used to construct the pools was measured using the Roche Diagnostics HBV Monitor assay. All samples were measured in duplicate. Range pools were constructed by mixing appropriate volumes of homozygote DNA. The allele proportions ranged from 10%:90% to 90%:10% in 10% increments.
Fluorescence data from real-time PCR were exported to an MS Excel spreadsheet. Raw fluorescence data and normalized fluorescence data versus cycle number were fitted with the four-parameter sigmoid function (equation 11) using the nonlinear regression function of SigmaPlot. FAM fluorescence versus VIC fluorescence was fitted with the linear regression function of SigmaPlot. The slopes of the regression lines were used as the normalized fluorescence ratio for quantification. Standard curves were constructed from samples containing known amounts of the given targets.
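The regression of FAM against VIC fluorescence can be sketched as follows. This is a plain least-squares illustration in Python, not the SigmaPlot procedure used in the study; the normalized readings are invented, `slope_and_intercept` is a hypothetical helper name, and regressing FAM on VIC with a free intercept is an assumption about how the fit was set up.

```python
def slope_and_intercept(xs, ys):
    """Least-squares line ys = slope * xs + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

# Invented normalized fluorescence of one well over the selected
# exponential-phase cycles (VIC on x, FAM on y).
vic = [0.010, 0.019, 0.037, 0.071, 0.135, 0.250]
fam = [0.006, 0.0114, 0.0222, 0.0426, 0.081, 0.150]

kf, offset = slope_and_intercept(vic, fam)
print(f"Kf (slope of FAM vs VIC) = {kf:.4f}, intercept = {offset:.5f}")
```

Using the slope over several cycles, rather than the ratio at a single cycle, averages out cycle-to-cycle noise in the individual readings.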
The authors would like to thank Prof. Zhi Geng from Peking University for his helpful discussion of the statistical data analysis.
1. Kruglyak L: The use of a genetic map of biallelic markers in linkage studies. Nat Genet. 1997, 17 (1): 21-24. 10.1038/ng0997-21.
2. Gu Z, Hillier L, Kwok PY: Single nucleotide polymorphism hunting in cyberspace. Hum Mutat. 1998, 12 (4): 221-225. 10.1002/(SICI)1098-1004(1998)12:4<221::AID-HUMU1>3.0.CO;2-I.
3. Risch N, Merikangas K: The future of genetic studies of complex human diseases. Science. 1996, 273 (5281): 1516-1517.
4. Lok AS, Heathcote EJ, Hoofnagle JH: Management of hepatitis B: 2000--summary of a workshop. Gastroenterology. 2001, 120 (7): 1828-1853.
5. Barcellos LF, Klitz W, Field LL, Tobias R, Bowcock AM, Wilson R, Nelson MP, Nagatomi J, Thomson G: Association mapping of disease loci, by use of a pooled DNA genomic screen. Am J Hum Genet. 1997, 61 (3): 734-747.
6. Arnheim N, Strange C, Erlich H: Use of pooled DNA samples to detect linkage disequilibrium of polymorphic restriction fragments and human disease: studies of the HLA class II loci. Proc Natl Acad Sci U S A. 1985, 82 (20): 6970-6974. 10.1073/pnas.82.20.6970.
7. Shaw SH, Carrasquillo MM, Kashuk C, Puffenberger EG, Chakravarti A: Allele frequency distributions in pooled DNA samples: applications to mapping complex disease genes. Genome Res. 1998, 8 (2): 111-123.
8. Sham P, Bader JS, Craig I, O'Donovan M, Owen M: DNA Pooling: a tool for large-scale association studies. Nat Rev Genet. 2002, 3 (11): 862-871. 10.1038/nrg930.
9. Allen MI, Deslauriers M, Andrews CW, Tipples GA, Walters KA, Tyrrell DL, Brown N, Condreay LD: Identification and characterization of mutations in hepatitis B virus resistant to lamivudine. Lamivudine Clinical Investigation Group. Hepatology. 1998, 27 (6): 1670-1677. 10.1002/hep.510270628.
10. Breen G, Harold D, Ralston S, Shaw D, St Clair D: Determining SNP allele frequencies in DNA pools. Biotechniques. 2000, 28 (3): 464-6, 468, 470.
11. Germer S, Holland MJ, Higuchi R: High-throughput SNP allele-frequency determination in pooled DNA samples by kinetic PCR. Genome Res. 2000, 10 (2): 258-266. 10.1101/gr.10.2.258.
12. Shifman S, Pisante-Shalom A, Yakir B, Darvasi A: Quantitative technologies for allele frequency estimation of SNPs in DNA pools. Mol Cell Probes. 2002, 16 (6): 429-434. 10.1006/mcpr.2002.0440.
13. Johnson MP, Haupt LM, Griffiths LR: Locked nucleic acid (LNA) single nucleotide polymorphism (SNP) genotype analysis and validation using real-time PCR. Nucleic Acids Res. 2004, 32 (6): e55. 10.1093/nar/gnh046.
14. Chen J, Germer S, Higuchi R, Berkowitz G, Godbold J, Wetmur JG: Kinetic polymerase chain reaction on pooled DNA: a high-throughput, high-efficiency alternative in genetic epidemiological studies. Cancer Epidemiol Biomarkers Prev. 2002, 11 (1): 131-136.
15. Stahlberg A, Aman P, Ridell B, Mostad P, Kubista M: Quantitative real-time PCR method for detection of B-lymphocyte monoclonality by comparison of kappa and lambda immunoglobulin light chain expression. Clin Chem. 2003, 49 (1): 51-59. 10.1373/49.1.51.
16. Swillens S, Goffard JC, Marechal Y, de Kerchove d'Exaerde A, El Housni H: Instant evaluation of the absolute initial number of cDNA copies from a single real-time PCR curve. Nucleic Acids Res. 2004, 32 (6): e56. 10.1093/nar/gnh053.
17. Rutledge RG, Cote C: Mathematics of quantitative kinetic PCR and the application of standard curves. Nucleic Acids Res. 2003, 31 (16): e93. 10.1093/nar/gng093.
18. Liu W, Saint DA: Validation of a quantitative method for real time PCR kinetics. Biochem Biophys Res Commun. 2002, 294 (2): 347-353. 10.1016/S0006-291X(02)00478-3.
19. Tichopad A, Dzidic A, Pfaffl MW: Improving quantitative real-time RT-PCR reproducibility by boosting primer-linked amplification efficiency. Biotechnology Letters. 2002, 24 (24): 2053-2056. 10.1023/A:1021319421153.
20. Bubner B, Gase K, Baldwin IT: Two-fold differences are the detection limit for determining transgene copy numbers in plants by real-time PCR. BMC Biotechnol. 2004, 4: 14. 10.1186/1472-6750-4-14.
21. Oliver DH, Thompson RE, Griffin CA, Eshleman JR: Use of single nucleotide polymorphisms (SNP) and real-time polymerase chain reaction for bone marrow engraftment analysis. J Mol Diagn. 2000, 2 (4): 202-208.
22. Le Hellard S, Ballereau SJ, Visscher PM, Torrance HS, Pinson J, Morris SW, Thomson ML, Semple CA, Muir WJ, Blackwood DH, Porteous DJ, Evans KL: SNP genotyping on pooled DNAs: comparison of genotyping technologies and a semi automated method for data storage and analysis. Nucleic Acids Res. 2002, 30 (15): e74. 10.1093/nar/gnf070.
23. Wilhelm J, Pingoud A, Hahn M: Validation of an algorithm for automatic quantification of nucleic acid copy numbers by real-time polymerase chain reaction. Anal Biochem. 2003, 317 (2): 218-225. 10.1016/S0003-2697(03)00167-2.
24. Peirson SN, Butler JN, Foster RG: Experimental validation of novel and conventional approaches to quantitative real-time PCR data analysis. Nucleic Acids Res. 2003, 31 (14): e73. 10.1093/nar/gng073.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.