Evaluation validation of a qPCR curve analysis method and conventional approaches

Zhang, Yashu; Li, Hongping; Shang, Shucheng; Meng, Shuoyu; Lin, Ting; Zhang, Yanhui; Liu, Haixing

doi:10.1186/s12864-021-07986-4

Volume 22 Supplement 5

Selected articles from the 19th Asia Pacific Bioinformatics Conference (APBC 2021): genomics

Research
Open access
Published: 16 November 2021


Yashu Zhang ORCID: orcid.org/0000-0002-7448-8809¹,
Hongping Li¹,
Shucheng Shang¹,
Shuoyu Meng¹,
Ting Lin²,
Yanhui Zhang² &
…
Haixing Liu³

BMC Genomics volume 22, Article number: 680 (2021)


Abstract

Background

Reverse transcription quantitative polymerase chain reaction (RT-qPCR) is a sensitive and reliable method for mRNA quantification and for rapid analysis of gene expression from a large number of starting templates. Quantification relies on identifying the start of the exponential phase in real-time PCR kinetics: the quantitative cycle reflects the initial target quantity, and the efficiency of the PCR reaction is the fold increase of product per cycle.

Results

We used a large clinical biomarker dataset and a 94-replicates-4-dilutions set, both published previously, as research tools, and propose a new qPCR curve analysis method, C_qMAN, which determines the position of the quantitative cycle and the efficiency of the PCR reaction and applies them in the calculation of the initial target quantity. To verify algorithm performance, 20 genes from the biomarker dataset and part of the concentration-gradient data of the MYCN gene from the 94-replicates-4-dilutions set were used to compare our method with various publicly available methods, and a suitable evaluation index system was established.

Conclusions

The results show that the C_qMAN method is comparable to other methods and is a feasible method, which has been applied in our self-developed qPCR data processing and analysis software, providing a simple tool for qPCR analysis.

Background

The working principle of qPCR is to add a fluorophore to the reaction system and to use the accumulation of the fluorescence signal to monitor the whole qPCR process [1]. The accumulated amount of fluorescently labeled DNA product is recorded as amplification data (expressed as amplification curves), which can be used to determine the initial target quantity (called N₀ at the concentration level and F₀ at the fluorescence level). An amplification reaction is generally displayed as an amplification curve, in which the y-axis represents the accumulated fluorescence signal and the x-axis represents the number of cycles. At the beginning of the process, the product fluorescence cannot rise above the background and the curve is almost a straight line; as the reaction progresses, the fluorescence accumulates until the reagents are consumed and the fluorescence ceases to increase [2, 3]. The reason is that the initial product quantity is very small, so only a weak fluorescence signal is present during the baseline phase. The exponential increase of the product starts in cycle 1, but it becomes visible only when the associated fluorescence can be observed above the baseline noise. During the transitional phase, products continue to accumulate, but reagents become limiting and the reaction efficiency begins to fall. When product is no longer produced, the reaction reaches the plateau phase [4]. The amplification curve can therefore be divided into a baseline phase, an exponential phase, a transitional phase, and a plateau phase according to the quantitative relationship between the accumulated fluorescence signal and the cycle number (Fig. 1A).

In Fig. 1A, the initial fluorescence of the reaction is at the background level with high noise and almost no fluorescence signal can be detected; the product fluorescence then rises above the background in the exponential phase within a few cycles and begins to saturate as it approaches the final plateau phase. Figure 1B shows the locations of the relevant parameters determined by the C_qMAN method.

Among the relevant parameters of the amplification curve, the quantitative threshold (called F_q in most methods) marks a detectable fluorescence signal produced by the accumulation of sufficient amplification product, and it is generally set in the exponential phase. The cycle on the x-axis at which the curve reaches this threshold is called C_q in most methods and C_qMAN in our method.

The amplification efficiency (E) is another important parameter in qPCR data analysis. Under ideal conditions, the number of DNA sequences doubles in each cycle, so that E is 2 and the efficiency expressed as a percentage, (E − 1) × 100%, is 100% [5]. However, due to factors such as reaction inhibitors and differences in enzymes, primers, and probes, PCR efficiency rarely reaches 100%, so E takes a value between 1 and 2 [6]. Previously published studies have suggested that PCR efficiencies mostly range between 65 and 90% [7].

After the quantitative cycle and the quantitative threshold have been determined and the amplification efficiency has been estimated, the kinetics of the qPCR exponential phase are described by eq. (1), from which the initial target quantity of the reaction can be derived.

$$ {N}_n={N}_0\times {E}^n $$

(1)

in which N₀ and N_n are the initial target amount of DNA and the target amount after n cycles, respectively. F_n, the fluorescence signal after n cycles, and F₀, the fluorescence signal that represents the starting amount of target DNA, are the counterparts of N_n and N₀ at the fluorescence level [2]. Therefore, eq. (1) can be rewritten as eq. (2)

$$ {F}_n={F}_0\times {E}^n $$

(2)

With the relevant parameters estimated by the curve analysis method, eq. (2) can be expressed as eq. (3)

$$ {F}_q={F}_0\times {E}^{C_q} $$

(3)

from which the observed initial target quantity (F₀) is calculated.
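As a minimal sketch of eqs. (2) and (3), the snippet below rearranges eq. (3) to F₀ = F_q / E^C_q. The function names and the numerical values are illustrative assumptions and are not taken from the study.

```python
def fluorescence_after(f_0: float, efficiency: float, cycles: float) -> float:
    """Eq. (2): fluorescence after a number of cycles of exponential growth."""
    return f_0 * efficiency ** cycles


def initial_quantity(f_q: float, efficiency: float, c_q: float) -> float:
    """Eq. (3) solved for F0: the threshold fluorescence scaled back by E**Cq."""
    return f_q / efficiency ** c_q


# Hypothetical reaction: a threshold of 0.2 fluorescence units reached at cycle 20.
f_q, efficiency, c_q = 0.2, 1.9, 20.0
f_0 = initial_quantity(f_q, efficiency, c_q)
print(f"F0 = {f_0:.3e}")

# Reaching the same threshold one cycle earlier implies E times more template.
earlier = initial_quantity(f_q, efficiency, c_q - 1)
print(f"ratio = {earlier / f_0:.3f}")
```

Because F₀ depends on E raised to the power C_q, a small error in the estimated efficiency is amplified over many cycles, which is why the efficiency estimate matters so much in the comparisons that follow.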

In the past two decades, the rapid development of qPCR technology has led to multiple protocols, reagents, analytical methods and reporting formats. The original standard-C_q method [8, 9] fits a standard curve from replicate measurements of samples of known concentration and estimates the concentration of unknown samples from that curve. This approach assumes that all samples have the same efficiency and is only valid if thresholds are set in the exponential phase of the PCR reaction; some authors have reported that this assumption may be questionable [10]. Later, Liu and Saint [11] proposed an approach in which the efficiency is obtained by fitting the PCR amplification curve with a sigmoid function, without preparing standard samples. Since then, methods that estimate the initial target quantity by calculating the reaction efficiency from the kinetics of a single PCR reaction have been widely used and refined. These methods differ in how they determine the fluorescence baseline, the exponential phase, F_q, C_q, and E to estimate initial target quantities. Some methods treat the baseline as a constant, taken as the observed minimum fluorescence, the mean of the three lowest observations, or the mean over a fixed set of cycles [8, 12, 13, 14]; the baseline may also be determined as the mean over a dynamically determined set of baseline cycles [15, 16] or by a taking-difference linear regression method [17]. However, the true value of the background fluorescence is unknown, and errors in baseline estimation can lead to significant distortion of the results [8, 18, 19]. Differences in how the exponential phase is defined can also easily lead to different results [20]. Commonly, the exponential phase is estimated with a residual-based algorithm that takes the maximum of the second derivative as its end point [11, 21], or three cycles are selected around the midpoint of the fluorescence signal [22]. Efficiency is estimated by fitting the entire exponential phase [21, 23], by calculating the slope of a linear regression through the points within a defined range [22, 24], or by taking the ratio of the threshold fluorescence to the fluorescence value of the previous cycle [25]. F_q is generally defined in the exponential phase and C_q is then determined from it, but some methods do not involve F_q and C_q [15, 26, 27]. The definition of all parameters may also be combined with fitting of the amplification curve [23, 25, 26].
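The standard-curve idea described above can be sketched as follows. From eq. (3), C_q is linear in log10 of the input quantity with slope −1/log10(E), so a least-squares line through the standards yields both an efficiency and a way to read off unknown samples. The C_q values below are hypothetical, and the helper names are assumptions made for illustration only.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept of ys on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    return slope, mean_y - slope * mean_x


# Hypothetical standards: known copy numbers and their measured Cq values.
standard_copies = [150000, 15000, 1500, 150, 15]
standard_cq = [15.0, 18.4, 21.8, 25.2, 28.6]

slope, intercept = linear_fit([math.log10(c) for c in standard_copies], standard_cq)

# From eq. (3), Cq is linear in log10(N0) with slope -1 / log10(E).
efficiency = 10 ** (-1 / slope)


def copies_from_cq(c_q):
    """Read an unknown sample's copy number off the fitted standard curve."""
    return 10 ** ((c_q - intercept) / slope)


print(f"slope = {slope:.2f}, E = {efficiency:.3f}")
print(f"Cq 20.0 -> {copies_from_cq(20.0):.0f} copies")
```

The single efficiency obtained this way is shared by all samples, which is exactly the assumption that single-reaction methods try to avoid.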

To provide a reference for further development and evaluation of qPCR curve analysis methods and to promote research on quantitative fluorescence PCR in gene expression, this paper evaluates the new curve analysis method and other methods on the biomarker dataset and the 94-replicates-4-dilutions set in terms of expression level and statistical significance. The goal is to compare our new method with other methods and, at the same time, to provide users with an alternative curve analysis scheme. To evaluate the new method, several performance indicators are proposed.

Methods

qPCR dataset

Biomarker dataset

The data come from a previously published study [28] that developed and validated a 59-mRNA gene expression profile to improve prognosis in children with neuroblastoma. This dataset measured 59 biomarkers and 5 reference genes in a sample maximization experimental design, using the LightCycler480 SYBR Green Master (Roche) in a 384-well plate with 8 μl reactions. These genes have been reported in at least two independent studies as prognostic genes for neuroblastoma. Three hundred sixty-six cDNA samples from primary tumor biopsies, a 5-point 10-fold serial dilution series based on external oligonucleotide standards (from 150,000 to 15 copies, n = 3), and a no template control (NTC, n = 3) are included in each plate [28, 29]. This dataset will be referred to as ‘biomarker dataset’ in this study. Since none of the 63 genes in this dataset showed obvious specificity, 20 of them (AHCY, AKR1C1, ARHGEF7, BIRC5, CAMTA1, CAMTA2, CD44, CDCA5, CDH5, CDKN3, CLSTN1, CPSG3, DDC, ECEL1, ELAVL4, EPB41L3, EPHA5, EPN2, FYN, HIVEP2) were randomly selected, and the 300 (5 × 3 × 20) amplification curves of these 20 genes at concentrations of 150,000, 15,000, 1500, 150, and 15 copies (3 replicate experiments per group) were used for subsequent analysis.

94-replicates-4-dilutions set

This dataset consists of a dilution series of four 10-fold serial dilution points from 15,000 to 15 molecules, prepared with 10 ng/μl yeast tRNA (Roche) as a carrier; NTC samples of the same dilution were also created. qPCR was performed on a CFX 384 instrument (Bio-Rad) using a 96-well pipetting robot (Tecan Freedom Evo 150). Amplification reactions were performed in 8 μl samples containing 0.4 μl forward and 0.4 μl reverse primer (5 μM each), 0.2 μl nuclease-free water, 4 μl iQ SYBR Green Supermix (Bio-Rad) and 3 μl of standard oligonucleotide. In a 384-well plate (Hard-Shell 384-well microplate sealed with a Microseal B clear adhesive seal (Bio-Rad)), a total of 94 replicate reactions were distributed for each of the 4 dilution points. In addition, the NTC reaction was repeated 8 times [28]. This dataset will be referred to as ‘94-replicates-4-dilutions set’. From this set, 44 (4 × 11) amplification curves of the MYCN gene at dilutions of 15, 150, 1500, and 15,000 molecules (11 replicate experiments per group) were used for subsequent analysis.

qPCR curve analysis method

Previously published curve analysis methods

We provide general descriptions of the 7 previously published methods. In this study, these methods are referred to by their preferred abbreviations: LinRegPCR, DART, FPLM, FPK-PCR, 5PSM, PCR-Miner and Cy0. The LinRegPCR program [16] starts with the import of raw fluorescence data. A constant baseline fluorescence is determined per reaction with an iterative algorithm that aims at the longest set of data points on a straight line going down from the second derivative maximum cycle. After subtraction of the baseline fluorescence, LinRegPCR sets a window-of-linearity (W-o-L) that includes 4 points in the exponential phase of each sample and calculates the individual PCR efficiency from the slope of the regression line through these points. For each amplicon group, a quantification threshold F_q is set at 1 cycle below the top border of the W-o-L, and the C_q is determined for each reaction. DART [22] constructs a model based on the maximum fluorescence value (R_max) and the baseline fluorescence noise (R_noise) to determine a central point M and fits the cycles within a 10-fold range around M to estimate E; F_q is set at 10 times the standard deviation of the fluorescence in cycles 1–10, and C_q is obtained from it. FPLM [21] uses a four-parameter logistic model to fit the fluorescence curve and estimate the exponential phase, and determines F_q and C_q in the same way as DART. FPK-PCR [26] uses a bilinear model and a six-parameter logistic model to estimate E and the initial target quantity without determining a fluorescence threshold. 5PSM [25] fits the curve with a five-parameter model, takes the ratio of the fluorescence value at the second derivative maximum (SDM) to the fluorescence value of the previous cycle as the amplification efficiency, and uses the SDM cycle as the C_q. PCR-Miner [30] fits a four-parameter logistic model to the raw fluorescence data as a function of PCR cycle to identify the exponential phase of the reaction. The method uses the first positive second derivative maximum of the logistic model to calculate the dynamic fluorescence threshold and the corresponding C_q. A three-parameter simple exponent model is fitted to this exponential phase using an iterative non-linear regression algorithm to compute the individual efficiency. Cy0 [31] fits the Richards equation to the fluorescence values by nonlinear regression and obtains Cy0 as the intersection of the abscissa axis with the tangent to the curve at its inflection point. The efficiency is estimated from the parameters of the fitted equation, and the initial target quantity is then obtained.
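Several of the methods above derive the efficiency from a log-linear regression over a short window of the exponential phase. The sketch below illustrates only that regression step on baseline-corrected data. It is not the LinRegPCR implementation: the window position is passed in by hand rather than searched for, and the function name and synthetic curve are assumptions.

```python
import math


def efficiency_from_window(fluorescence, start, length=4):
    """Estimate E as 10**slope of log10(fluorescence) against cycle number.

    `fluorescence` holds baseline-corrected values, one per cycle; `start`
    is the index of the first point of the window.
    """
    cycles = list(range(start, start + length))
    logs = [math.log10(fluorescence[c]) for c in cycles]
    mean_x = sum(cycles) / length
    mean_y = sum(logs) / length
    s_xx = sum((x - mean_x) ** 2 for x in cycles)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(cycles, logs))
    return 10 ** (s_xy / s_xx)


# Synthetic, noise-free exponential phase with a known efficiency of 1.85.
curve = [1e-6 * 1.85 ** n for n in range(30)]
print(round(efficiency_from_window(curve, start=15), 3))
```

On real data the result depends strongly on the baseline subtracted beforehand, which is why the methods differ so much in how they estimate it.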

C_qMAN method

C_qMAN (C_q Management And Analysis System) is an adaptive analysis system that builds on the approaches and experience of previous methods and provides a robust, objective, and noise-resistant method for quantification of qPCR results. Since research has shown that smoothing at best gives a false impression of accuracy and usually also biases the results [32], the improved adaptive Savitzky-Golay filter in the C_qMAN system is used only for visual display of the data. The detailed process is shown in Additional file 1. The C_qMAN method has been implemented in the system. We provide the URL of the system (http://122.193.29.190:9913/xMAN/en-us/index), and readers can reproduce our experimental results in combination with Additional files 1 and 2.

The C_qMAN method relies on the modified Gompertz model (eq. (4)), which is fitted to the raw fluorescence data by a non-linear fitting routine, the Levenberg-Marquardt algorithm, that minimizes the residual sum-of-squares to obtain the baseline fluorescence (y₀) and the maximum fluorescence (y_max). In eq. (4), exp is the exponential function with the natural logarithm base, x is the actual cycle number, and b and x₀ determine the shape of the curve; in eq. (5), Ln is the natural logarithm.

$$ y={y}_0+\left({y}_{max}-{y}_0\right)\exp \left(-\exp \left(-\frac{x-{x}_0}{b}\right)\right) $$

(4)

$$ {x}_{SDM}={x}_0+b\, Ln\left(\frac{3-\sqrt{5}}{2}\right) $$

(5)

The maximum of the second derivative of the fitted Gompertz curve is used to estimate the end of the exponential phase (eq. (5)). x_SDM is the cycle at the second derivative maximum (SDM), which is applied as the end point of the exponential phase, and F_SDM is the fluorescence value of eq. (4) at this cycle. The midpoint between y₀ and F_SDM is taken as F_q (eq. (6)), and this value is then substituted into eq. (4) to obtain the quantitative cycle (C_qMAN) (see Fig. 1B).

$$ {F}_q=\left({y}_0+{F}_{SDM}\right)/2 $$

(6)
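The steps from the fitted curve to the quantitative cycle can be sketched as below, assuming the modified Gompertz form y = y₀ + (y_max − y₀)·exp(−exp(−(x − x₀)/b)), for which the second derivative reaches its maximum at x₀ + b·ln((3 − √5)/2). The fitting step itself is omitted: the parameter values are hypothetical stand-ins for the output of a non-linear fit, and the function names are illustrative rather than part of the C_qMAN system.

```python
import math

# Offset of the second-derivative maximum from x0, in units of b (about -0.962).
SDM_OFFSET = math.log((3 - math.sqrt(5)) / 2)


def gompertz(x, y0, y_max, x0, b):
    """Modified Gompertz curve: y0 + (y_max - y0) * exp(-exp(-(x - x0) / b))."""
    return y0 + (y_max - y0) * math.exp(-math.exp(-(x - x0) / b))


def inverse_gompertz(y, y0, y_max, x0, b):
    """Cycle at which the curve reaches fluorescence y (y0 < y < y_max)."""
    fraction = (y - y0) / (y_max - y0)
    return x0 - b * math.log(-math.log(fraction))


def cqman_parameters(y0, y_max, x0, b):
    """Return (x_SDM, F_SDM, F_q, C_q) for already fitted curve parameters."""
    x_sdm = x0 + b * SDM_OFFSET
    f_sdm = gompertz(x_sdm, y0, y_max, x0, b)
    f_q = (y0 + f_sdm) / 2  # eq. (6): midpoint between baseline and F_SDM
    c_q = inverse_gompertz(f_q, y0, y_max, x0, b)
    return x_sdm, f_sdm, f_q, c_q


# Hypothetical fitted parameters for one reaction.
x_sdm, f_sdm, f_q, c_q = cqman_parameters(y0=0.1, y_max=10.0, x0=25.0, b=2.0)
print(f"x_SDM={x_sdm:.2f}  F_SDM={f_sdm:.3f}  F_q={f_q:.3f}  C_q={c_q:.2f}")
```

Because F_q lies below F_SDM on a monotonically rising curve, the resulting quantitative cycle always falls before the SDM cycle, without any separate search for the start of the exponential phase.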

For efficiency estimation, a three-parameter simple exponent model (eq. (7)) is fitted to the exponential phase (from C_qMAN to x_SDM) using a non-linear regression algorithm to estimate the individual efficiency of each reaction. The individual efficiencies of all reactions of the same gene are averaged, and the observed target quantity (F₀) is then calculated by eq. (8).

$$ {F}_n={F}_q+\alpha \times {E}^n $$

(7)

$$ {F}_0=1/{E_{mean}}^{C_{qMAN}} $$

(8)
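Eq. (8) can be illustrated with a short sketch that averages the per-reaction efficiencies of one gene and applies the mean to every reaction. The efficiencies and C_qMAN values are hypothetical, and the function name is an assumption.

```python
from statistics import mean


def observed_f0(c_q_values, efficiencies):
    """Eq. (8): F0 = 1 / E_mean**Cq, with E averaged over one gene's reactions."""
    e_mean = mean(efficiencies)
    return [1.0 / e_mean ** c_q for c_q in c_q_values]


# Hypothetical triplicate of one gene: per-reaction efficiencies and C_qMAN values.
efficiencies = [1.82, 1.86, 1.84]
c_q_values = [21.3, 21.1, 21.2]
for c_q, f_0 in zip(c_q_values, observed_f0(c_q_values, efficiencies)):
    print(f"C_q = {c_q:.1f} -> F0 = {f_0:.3e}")
```

Using one mean efficiency per gene means that differences in F₀ between reactions of that gene come only from their C_qMAN values, not from noise in the individual efficiency estimates.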

The logistic models used in Cy0 and PCR-Miner are generally susceptible to the number of data points in the plateau phase, which results in inaccurate fitting [33]. 5PSM adds a parameter to the logistic model to maintain the symmetry of the s-shaped curve structure, which affects the calculation of parameters such as the maximum of the second derivative and results in larger errors. The Gompertz model in C_qMAN is not easily affected by the data in the plateau phase and fits well in all phases. The method also ensures that the C_qMAN value lies within the exponential phase (within the 2–3 cycles before the SDM cycle) without having to determine the starting point of that phase. It does not rely on baseline estimation from the noisier phase of the fluorescence signal, and it avoids the deviation caused by the assumption in the DART and FPLM methods that a constant baseline can be determined from the baseline phase. By using nonlinear regression to estimate the efficiency of each reaction and averaging over all reactions of each gene, the C_qMAN method averages out the noise between the amplification reactions of each gene, which resists noise more effectively and reduces the estimation error [31]. A shortcoming is that the method is prone to error under the influence of dynamic outliers (inhibition), an aspect in which FPK-PCR performs better.

Results

Performance indicators

To remove the difference in measurement scale between analytical methods that report concentration levels and those that report fluorescence levels [34], we divided all concentrations by the highest concentration and all observed target quantities (F₀) by the mean F₀ observed at the highest concentration, so that both the highest concentration and the mean F₀ at that concentration equal 1. This process is called normalization. The datasets were then used to establish 6 performance indicators that measure, from different angles, how well the observed initial target quantity (F₀) calculated by an algorithm agrees with the true value. Among them, bias and relative error compare the observed initial target quantity with the true value, while coefficient of variation and precision compare the observed initial target quantities (F₀) within the same group. The smaller the difference, the more reliable the method. The performance indicators are as follows.

(1) Bias. The ratio between the mean observed initial target quantities F₀ at the highest and lowest concentrations is calculated. In the biomarker dataset, the expected value of this ratio is 10,000 (the ratio of the concentrations 150,000 and 15), and in the 94-replicates-4-dilutions set, the expected value is 0.001 (the ratio of the dilutions 15 and 15,000); any deviation from 10,000 or 0.001 is expressed as a bias. Bias is also assessed after log transformation (base 10) of the true value and the initial target quantity F₀: after normalization, a linear regression of log (F₀) on log (NC) (NC, normalized concentration) has a slope of 1 for an unbiased method, and any deviation of the slope from 1 is likewise expressed as a bias.

(2) Relative error (RE).

$$ RE=\frac{F_0- NC}{NC} $$

(9)

RE is the deviation after F₀ and NC are normalized to the same measurement scale.

(3) Coefficient of variation (CV).

$$ CV=\frac{SD_{\boldsymbol{group}}}{\mu_{\boldsymbol{group}}}\times 100\% $$

(10)

CV is the ratio of the standard deviation (SD) to the mean (μ) of the observed initial target quantities (F₀) within the same group (replicate experiments).

(4) Precision. Precision represents the within-triplicate variance of the observed initial target quantity (F₀) in the same group.

(5) Resolution. A linear regression analysis of log (true) on log(F₀) was performed and the 95% CI around the regression line was constructed. The width of this interval was converted into a fold deviation from the regression line and the geometric mean for the 5 groups was calculated as a measure of resolution.
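The normalization step and the first four indicators can be sketched as follows. This is an illustration under stated assumptions, not the template used in the study: the F₀ values are hypothetical, the sample (n − 1) standard deviation and variance are assumed, precision is taken as the variance of the normalized F₀ within a triplicate, and resolution is left out because it requires confidence intervals around the regression line.

```python
import math
from statistics import mean, stdev, variance


def normalise(f0_by_conc):
    """Scale so the top concentration and its mean F0 both equal 1."""
    top = max(f0_by_conc)
    scale = mean(f0_by_conc[top])
    return {conc / top: [f / scale for f in reps] for conc, reps in f0_by_conc.items()}


def bias_ratio(groups):
    """Mean F0 at the highest concentration over mean F0 at the lowest."""
    return mean(groups[max(groups)]) / mean(groups[min(groups)])


def bias_slope(groups):
    """Slope of log10(F0) regressed on log10(NC); 1 means unbiased."""
    xs, ys = [], []
    for nc, reps in groups.items():
        for f in reps:
            xs.append(math.log10(nc))
            ys.append(math.log10(f))
    mean_x, mean_y = mean(xs), mean(ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return s_xy / sum((x - mean_x) ** 2 for x in xs)


def relative_error(f_0, nc):
    """Eq. (9)."""
    return (f_0 - nc) / nc


def cv_percent(reps):
    """Eq. (10), using the sample standard deviation."""
    return stdev(reps) / mean(reps) * 100


# Hypothetical F0 triplicates keyed by input copy number.
raw = {
    150000: [2.9e-3, 3.1e-3, 3.0e-3],
    15000: [3.2e-4, 2.9e-4, 3.0e-4],
    1500: [2.8e-5, 3.3e-5, 3.1e-5],
    150: [3.4e-6, 2.7e-6, 3.0e-6],
    15: [2.5e-7, 3.6e-7, 3.2e-7],
}
groups = normalise(raw)
print("bias ratio:", round(bias_ratio(groups)))
print("bias slope:", round(bias_slope(groups), 3))
for nc, reps in groups.items():
    # Columns: NC, relative error of the group mean, CV in percent, variance.
    print(nc, round(relative_error(mean(reps), nc), 3),
          round(cv_percent(reps), 1), variance(reps))
```

Normalizing first is what allows methods that report copy numbers and methods that report fluorescence units to be scored with the same functions.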

Indicator evaluation

The original amplification data of the two datasets used in this study were obtained from reference [28] and are provided in the supplementary information after conversion into the format readable by the C_qMAN system. We imported the data of these two datasets into the C_qMAN system to obtain the F₀, C_q, and E calculated by C_qMAN, and integrated the results with the three parameter values of the other 7 methods provided in reference [28] (see biomarker_performance_indicators and 94_replicates_4_dilutions_set_results). In bias_and_deviat_from_regres of biomarker_performance_indicators, the process by which 4 performance indicators are calculated for C_qMAN is shown; the calculation is the same for the other 7 methods and is therefore not provided for them. The calculated performance indicators of the 8 methods are provided in biomarker_analysis_dilutoin_series and 94_replicates_4_dilutions_set_results.

With the exception of the efficiency results of the other 7 methods (see Fig. 3), which were taken directly from the data provided in reference [28], all performance indicator results are reanalysis results.

Biomarker dataset analysis

The performance indicator values determined from the concentration series included in the measurement of the 20 genes are summarized in box-and-whisker plots. The boxes range from the 25th to the 75th percentile and are divided by the median; the whiskers are set at the 5th and 95th percentile. (A) Bias at the slope level, based on the degree of deviation from 1. (B) Relative error, showing the difference between the observed initial target quantity and the true value. (C) Coefficient of variation, an indicator that is independent of the measurement scales and dimensions of fluorescence levels and concentration levels. (D) Precision, determined as the within-triplicate variance, which should have the same, low, value in all methods. (E) Resolution, defined as the fold-change that would result in the detection of a difference at a 5% significance level.

The mean value of the efficiencies of each gene per method.

(1) Bias. We expect the ratio between the mean observed initial target quantities at the highest and lowest concentrations to be 10,000 or 0.001 in the two datasets, respectively. After normalization, the slope of the linear regression of log (F₀) on log (NC) is 1 for an unbiased method. Cy0 has an advantage in the bias indicator because the method calculates the efficiency value from the slope of the relationship between Cy0 and log (input), and then uses this efficiency value and the Cy0 value to calculate F₀. The absence of bias in Cy0 is therefore the result of circular reasoning, although this also makes its observed initial target quantity F₀ more accurate. The other methods are positively or negatively biased, and their observed values deviate significantly from the true values (Fig. 2A). Among them, C_qMAN performs better in bias, with an average deviation of 2469.0003 (for 10,000) and 0.0182 (for 0.001).

(2) Relative error. Relative error is conventionally used to compare a measured value with the true value and to express the degree of confidence in a measurement; it reflects measurement accuracy more intuitively than absolute error. Here we use it as one of the indicators of the difference between the observed initial target quantity F₀ and the true value, which reflects the credibility of the algorithm. Cy0 performed best, with an average relative error of 0.1050. PCR-Miner ranked second with 0.2287, followed by C_qMAN with 0.2416, and 5PSM was the highest at 0.6939 (Table 1 and Fig. 2B).

Table 1 Analysis of the average of 20 genes in 4 indicators per method


(3) Coefficient of variation. The coefficient of variation reflects the degree of dispersion of the data and, at the same time, is not affected by large differences in measurement scale or data size. We use the coefficient of variation to calculate the dispersion of the observed initial target quantities of the three replicates at each concentration, and average the coefficients of variation of the five groups. The smaller the coefficient of variation, the lower the degree of dispersion. The results show that C_qMAN performed best at 7.20%; Cy0, LinRegPCR, and PCR-Miner were also stable at about 9.60%, while the coefficient of variation of FPK-PCR was as high as 25.12% (Table 1 and Fig. 2C).

(4) Precision. Each of the five concentrations was measured three times and the fluorescence data were analyzed. The variance within each set of 3 measurements should be small, reflecting only random variation in laboratory procedures and fluorescence measurements, and this variation should be the same for every method. The resulting within-triplicate variance can therefore be considered a measure of the precision of the analytical method. C_qMAN, 5PSM, Cy0, and LinRegPCR have lower variability (Fig. 2D).

(5) Resolution. Data points outside the 95% CI of the regression line fitted to the concentration series are judged to be significantly different from the true value, and the width of this interval is expressed as the resolution. LinRegPCR has the lowest resolution value; lower is better. Cy0, PCR-Miner and C_qMAN also perform well (Fig. 2E). With these 4 methods, an observed 2-fold difference is significant for approximately 85% of genes. For 5PSM, DART, and FPLM, the resolution lies between a 2-fold and a 3-fold difference. For FPK-PCR, the resolution exceeds a 5-fold difference for 40% of genes.

(6) Efficiency. The spread of efficiency values for each method is the sum of the differences in efficiency between genes and the differences between estimation methods, so it cannot by itself explain the differences between the methods. Apart from DART and FPLM, which share a method of determining E, the methods yield different median values of E. FPK-PCR and PCR-Miner have a large number of efficiency values above 2, which is obviously too high, whereas the median values of C_qMAN, Cy0, LinRegPCR, and 5PSM lie between 1.7 and 1.9. We also calculated the standard deviation of the amplification efficiency over the 20 genes; the E values calculated by LinRegPCR, DART, and FPLM are relatively stable (Fig. 3).

94-replicates-4-dilutions set analysis

The highest dilution concentration is set to 1, and the y-axis shows log (dilution) (base 10).

(1) Target quantity. For the dilutions of 15,000, 1500, 150, and 15 molecules, the log (F₀) (base 10) of the observed target quantity should be as close as possible to the expected values 0, −1, −2, and −3, respectively (Fig. 4). The systematic negative or positive deviation of each analysis method is shown by the deviation of the average F₀ from the expected value (Fig. 4: horizontal line). C_qMAN, Cy0, PCR-Miner and LinRegPCR have the least bias. DART and FPLM show a higher bias, 5PSM displays a strong overestimation, and FPK-PCR shows a strong underestimation of F₀ values.

(2) Bias, RE, CV, precision, resolution and E. C_qMAN and Cy0 keep a lower variance in bias. C_qMAN performs best in RE, CV and precision. The CV and precision values of C_qMAN, Cy0, LinRegPCR and PCR-Miner do not differ much. LinRegPCR has the lowest resolution, and the average resolution of Cy0, PCR-Miner and C_qMAN is around 2-fold. Table 2 illustrates the differences between the 8 methods in the 6 indicators and provides the average PCR efficiency of these methods. The efficiency of Cy0 was not provided in the previously published data analysis.
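The expected log values for the dilution series follow directly from setting the highest dilution to 1. The sketch below computes them and the deviation of a group's mean log10(F₀) from its expected value; the replicate values are hypothetical and the function name is an assumption.

```python
import math
from statistics import mean

dilutions = [15000, 1500, 150, 15]
top = max(dilutions)

# Expected log10 value per dilution once the highest dilution is set to 1.
expected = {d: math.log10(d / top) for d in dilutions}


def log_deviation(f0_replicates, dilution):
    """Mean log10(F0) minus the expected value; the sign gives the direction."""
    return mean([math.log10(f) for f in f0_replicates]) - expected[dilution]


print(expected)
# Hypothetical normalized F0 replicates at the 150-molecule dilution.
print(round(log_deviation([0.012, 0.011, 0.013], 150), 3))
```

A positive deviation corresponds to overestimation of F₀ and a negative one to underestimation.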

Table 2 Analysis of MYCN gene in 4 indicators and the mean of PCR efficiency per method


Discussions

For each evaluation index of the concentration series analysis of each gene, a rank synthesis method was used, and the Friedman test determined that these methods were not significantly different and were comparable. Table 3 shows the results of each gene and method. A lower average rank indicates that the initial target quantity estimated by the method is closer to the true value in terms of the performance indicators we selected.

Table 3 Analysis of performance parameters per method in biomarker dataset (left) and 94-replicates-4-dilutions set (right). For each method, the mean rank is given for each of the performance indicators bias, RE, CV, precision and resolution. The methods are sorted based on the average of these ranks


In the average-rank ordering over the 20 genes in the biomarker dataset, C_qMAN and Cy0 share the lowest average rank of 2.08. The average ranks of 5PSM, DART, FPLM, and FPK-PCR are all above 6, indicating lower overall performance in F₀ estimation (Table 3). For the 94-replicates-4-dilutions set, the average rank of C_qMAN is 1.58 and that of Cy0 is 1.92; LinRegPCR and PCR-Miner also perform well, while the average ranks of 5PSM, DART, FPLM, and FPK-PCR are much higher.
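The rank synthesis and the Friedman statistic can be sketched in a few lines. The scores below are hypothetical (lower is better in every row), the method names are placeholders, and the statistic is the textbook Friedman chi-square without tie correction; how the study grouped its rows is not specified here, so treating each indicator as one block is an assumption.

```python
def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared
        start = end + 1
    return ranks


def rank_table(scores):
    """Rank the methods within every row; `scores` maps row -> {method: value}."""
    return {row: dict(zip(vals, average_ranks(list(vals.values()))))
            for row, vals in scores.items()}


def mean_ranks(scores):
    """Average rank of each method over all rows."""
    ranked = rank_table(scores)
    methods = list(next(iter(scores.values())))
    return {m: sum(r[m] for r in ranked.values()) / len(ranked) for m in methods}


def friedman_statistic(scores):
    """Friedman chi-square (no tie correction), treating rows as blocks."""
    ranked = rank_table(scores)
    methods = list(next(iter(scores.values())))
    n, k = len(ranked), len(methods)
    rank_sums = [sum(r[m] for r in ranked.values()) for m in methods]
    return 12 / (n * k * (k + 1)) * sum(s ** 2 for s in rank_sums) - 3 * n * (k + 1)


# Hypothetical indicator values for three placeholder methods.
scores = {
    "bias": {"method_a": 0.05, "method_b": 0.20, "method_c": 0.12},
    "RE": {"method_a": 0.24, "method_b": 0.10, "method_c": 0.69},
    "CV": {"method_a": 7.2, "method_b": 9.6, "method_c": 25.1},
    "precision": {"method_a": 0.010, "method_b": 0.012, "method_c": 0.030},
}
print(mean_ranks(scores))
print(round(friedman_statistic(scores), 3))
```

The statistic is compared against a chi-square distribution with k − 1 degrees of freedom, where k is the number of methods; a small value means the rank differences between methods are not significant.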

Conclusions

Based on PCR kinetics and exponential model simulations, this study combines the real-time quantitative PCR curve analysis methods proposed by previous authors and proposes a reliable method for quantifying gene expression levels, C_qMAN. To demonstrate the reliability of the method, two datasets obtained with different instruments and different PCR mixtures, each with a testable hypothesis, were used to evaluate the performance of multiple qPCR curve analysis methods. The data of the other 7 methods used in the performance analysis were taken from previously published research by Ruijter et al. in 2013 [28]. Since the supplemental information of that research provides an Excel template for calculating bias and precision, we could directly import the results obtained by the C_qMAN system for the two datasets into the template to obtain the values of these two indicators. Relative error and coefficient of variation are the two statistical indicators proposed by the authors of this study for evaluation and analysis. Because of the differences in indicator settings and in dataset selection, our analysis results differ from those previously published by Ruijter et al.

The limitation of this study is that two datasets allow only a limited evaluation of the general applicability of the C_qMAN method, so future research should include more instances and more verification indicators to better verify the robustness and representativeness of the method. Nevertheless, the analysis templates, datasets, and analysis results (see supporting information) of this research should help further evaluation studies and make their results comparable with ours.

The aim of this study is not to promote a particular curve analysis method with the best overall performance, because the choice of methods by the experimenters may depend on the different research goals of experimental instruments, reagents, protocols, etc. It is our intention to help users choose the ideal method for their own studies and developers to modify and improve their methods [35].

Availability of data and materials

http://122.193.29.190:9913/xMAN/en-us/index

Abbreviations

RT-qPCR: Reverse transcription quantitative polymerase chain reaction
C_qMAN: C_q management and analysis system
RE: Relative error
CV: Coefficient of variation
E: Efficiency

References

1. Tichopad A, Dilger M, Schwarz G, Pfaffl MW. Standardized determination of real-time PCR efficiency from a single reaction set-up. Nucleic Acids Res. 2003;31:e122.
2. Higuchi R, Fockler C, Dollinger G, Watson R. Kinetic PCR analysis: realtime monitoring of DNA amplification reactions. Nat Biotechnol. 1993;11(9):1026–30. https://doi.org/10.1038/nbt0993-1026.
3. Tellinghuisen J, Spiess AN. Comparing real-time quantitative polymerase chain reaction analysis methods for precision, linearity, and accuracy of estimating amplification efficiency. Anal Biochem. 2014;449:76–82. https://doi.org/10.1016/j.ab.2013.12.020.
4. Heather D, VanGuilder K, Vrana E, Willard MF. Twenty-five years of quantitative PCR for gene expression analysis. Biotechniques. 2008;44(5):619–29. https://doi.org/10.2144/000112776.
5. Rao X, Lai D, Huang X. A new method for quantitative real-time polymerase chain reaction data analysis. J Comput Biol. 2013;20:703–11.
6. Liu W, Saint DA. A new quantitative method of real time reverse transcription polymerase chain reaction assay based on simulation of polymerase chain reaction kinetics. Anal Biochem. 2002;302(1):52–9. https://doi.org/10.1006/abio.2001.5530.
7. Kamphuis W, et al. Prostanoid receptor gene expression profile in human trabecular meshwork: a quantitative real-time PCR approach. Invest Ophthalmol Vis Sci. 2001;42:3209–15.
8. Larionov A, Krause A, Miller A. Standard curve based method for relative real time PCR data processing. BMC Bioinformatics. 2005;6:e62.
9. Javad MA. Evaluation of real-time PCR efficiency by the use of two strategies: standard curve and linear regression. Jundishapur Sci Med J. 2012;11(1):85–95.
10. Raeymaekers L. A commentary on the practical applications of competitive PCR. Genome Res. 1995;5(1):91–4. https://doi.org/10.1101/gr.5.1.91.
11. Liu W, Saint DA. Validation of a quantitative method for real time PCR kinetics. Biochem Biophys Res Commun. 2002;294(2):347–53. https://doi.org/10.1016/S0006-291X(02)00478-3.
12. Wilhelm J, Pingoud A. Real-time polymerase chain reaction. Chem Biochem. 2003;4(11):1120–8. https://doi.org/10.1002/cbic.200300662.
13. Frank DN. BARCRAWL and BARTAB: software tools for the design and implementation of barcoded primers for highly multiplexed DNA sequencing. BMC Bioinformatics. 2009;10:e362.
14. Dello RC, et al. Novel sensitive, specific and rapid pharmacogenomic test for the prediction of abacavir hypersensitivity reaction: HLA-B*57:01 detection by real-time PCR. Pharmacogenomics. 2010;12:567–76.
15. Boggy GJ, Woolf PJ. A mechanistic model of PCR for accurate quantification of quantitative PCR data. PLoS One. 2010;5:e12355.
16. Ruijter JM, et al. Amplification efficiency: linking baseline and bias in the analysis of quantitative PCR data. Nucleic Acids Res. 2009;37:e45.
17. Chen P. Comparison of different methods for quantitative real-time polymerase chain reaction data analysis. Dissertations & Theses (2014).
18. Rutledge RG, Stewart D. A kinetic-based sigmoidal model for the polymerase chain reaction and its application to high-capacity absolute quantitative real-time PCR. BMC Biotechnol. 2008;8:47.
19. Rebrikov DV, Trofimov D. Real-time PCR: a review of approaches to data analysis. Appl Biochem Microbiol. 2006;42(5):455–63. https://doi.org/10.1134/S0003683806050024.
20. Cikos S, Bukovska A, Koppel J. Relative quantification of mRNA: comparison of methods currently used for real-time PCR data analysis. BMC Mol Biol. 2007;8:e113.
21. Heid CA, Stevens J, Livak KJ, Williams PM. Real time quantitative PCR. Genome Res. 1996;6(10):986–94. https://doi.org/10.1101/gr.6.10.986.
22. Peirson SN, Butler JN, Foster RG. Experimental validation of novel and conventional approaches to quantitative real-time PCR data analysis. Nucleic Acids Res. 2003;31:e45.
23. Spiess AN, Feig C, Ritz C. Highly accurate sigmoidal fitting of real-time PCR data by introducing a parameter for asymmetry. BMC Bioinformatics. 2008;9:e211.
24. Ramakers C, Ruijter JM, Deprez RH, et al. Assumption-free analysis of quantitative real-time polymerase chain reaction (PCR) data. Neurosci Lett. 2003;1:e62.
25. Ritz C, Spiess AN. qPCR: an R package for sigmoidal model selection in quantitative real-time polymerase chain reaction analysis. Bioinformatics. 2008;24(13):1549–51. https://doi.org/10.1093/bioinformatics/btn227.
26. Guescini M, Sisti D, Rocchi MB, et al. A new real-time PCR method to overcome significant quantitative inaccuracy due to slight amplification inhibition. BMC Bioinformatics. 2008;9:e326.
27. Rutledge RG, Stewart D. Assessing the performance capabilities of LRE-based assays for absolute quantitative real-time PCR. PLoS One. 2010;5:e9731.
28. Ruijter JM, Pfaffl MW, Zhao S, Spiess AN, Boggy G, Blom J, et al. Evaluation of qPCR curve analysis methods for reliable biomarker discovery: Bias, resolution, precision, and implications. Methods. 2013;59(1):32–46. https://doi.org/10.1016/j.ymeth.2012.08.011.
29. Lievens A, Van AS, Van den Bulcke M, Goetghebeur E. Enhanced analysis of real-time PCR data by using a variable efficiency model: FPK-PCR. Nucleic Acids Res. 2012;40:e10.
30. Zhao S, Fernald RD. Comprehensive algorithm for quantitative real-time polymerase chain reaction. J Comput Biol. 2005;12:1047–64.
31. Vermeulen J, de Preter K, Naranjo A, Vercruysse L, van Roy N, Hellemans J, et al. Predicting outcomes for children with neuroblastoma using a multigene-expression signature: a retrospective SIOPEN/COG/GPOH study. Lancet Oncol. 2009;10(7):663–71. https://doi.org/10.1016/S1470-2045(09)70154-8.
32. Spiess AN, Deutschmann C, Burdukiewicz M, Himmelreich R, Klat K, Schierack P, et al. Impact of smoothing on parameter estimation in quantitative DNA amplification experiments. Clin Chem. 2015;61(2):379–88. https://doi.org/10.1373/clinchem.2014.230656.
33. Rutledge RG. Sigmoidal curve-fitting redefines quantitative real-time PCR with the prospective of developing automated high-throughput applications. Nucleic Acids Res. 2004;32:e178.
34. Vynck M, Thas O. Reducing bias in digital PCR quantification experiments: the importance of appropriate modelling of volume variability. Anal Chem. 2018;90(11):6540–7. https://doi.org/10.1021/acs.analchem.8b00115.
35. Zhang Y, et al. Evaluation validation of a qPCR curve analysis method and conventional approaches. https://doi.org/10.1101/2020.06.18.158873.

Acknowledgments

The authors wish to thank Professor Ting Lin’s group for their kind assistance in providing technical support.

About this supplement

This article has been published as part of BMC Genomics Volume 22 Supplement 5 2021: Selected articles from the 19th Asia Pacific Bioinformatics Conference (APBC 2021): genomics. The full contents of the supplement are available at https://bmcgenomics.biomedcentral.com/articles/supplements/volume-22-supplement-5.

Funding

National Natural Science Foundation of China (U1806205). The funders did not play any role in the design of the study, the collection, analysis, and interpretation of data, or in the writing of the manuscript.

Author information

Authors and Affiliations

Department of Information Science and Engineering, Ocean University of China, Qingdao, China
Yashu Zhang, Hongping Li, Shucheng Shang & Shuoyu Meng
Apexbio Biotechnology (Suzhou) Co., Ltd, Suzhou, China
Ting Lin & Yanhui Zhang
First Institute of Oceanography, Ministry of Natural Resources, Qingdao, China
Haixing Liu

Contributions

YZ was responsible for making charts, data analysis and writing the full text. HL provided the research idea and direction of the paper, and provided guidance for the revision of the paper. SS and SM were responsible for the development of C_qMAN software to facilitate the processing and analysis of the data in the paper. TL, YZ and LX provided guidance on datasets and software development. All authors have read and approved the final manuscript.

Authors’ information

Yashu Zhang received her B.S. degree from Qingdao University of Technology, Qingdao, China, in 2018. She is pursuing an M.S. degree in geographic information system at Ocean University of China. Her research interests include bioinformatics and parallel computing.

Hongping Li received his B.S. and M.S. degrees from Tianjin University, Tianjin, China, in 1984 and 1988, respectively. In 2003, he received his Ph.D. degree in computer science from the University of Oklahoma, Norman, OK, USA. He served as a lecturer at Tsinghua University, Beijing, China, from 1991 to 1997. In 2004, he joined the faculty of Ocean University of China, Qingdao, China, as a professor in the Department of Marine Technology. His research interests include bioinformatics and parallel computing.

Shucheng Shang received his B.S. degree from Taiyuan University of Technology, Taiyuan, China, in 2017. He is pursuing an M.S. degree in geographic information system at Ocean University of China. His research interests include bioinformatics and parallel computing.

Shuoyu Meng received her B.S. degree from Changan University, Xian, China, in 2017. She is pursuing an M.S. degree in geographic information system at Ocean University of China. Her research interests include bioinformatics and parallel computing.

Ting Lin is a researcher at Apexbio Biotechnology (Suzhou) Co., Ltd., Suzhou, China. His research interests include bioinformatics and parallel computing.

Yanhui Zhang is a researcher at Apexbio Biotechnology (Suzhou) Co., Ltd., Suzhou, China. Her research interests include bioinformatics and parallel computing.

Haixing Liu graduated from the Mathematics Department of Peking University in 1983. In 1999, he transferred from Qingdao University of Science and Technology to the First Institute of Oceanography of the State Oceanic Administration. He is currently the director of the Ocean Information and Computing Center and the director of the High Performance Computing Center. He is mainly engaged in the research and development of marine environmental information systems.

Corresponding author

Correspondence to Hongping Li.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that Apexbio Biotechnology Co., Ltd. is currently applying for patents relating to the content of the manuscript.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1.

Additional file 2.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Zhang, Y., Li, H., Shang, S. et al. Evaluation validation of a qPCR curve analysis method and conventional approaches. BMC Genomics 22 (Suppl 5), 680 (2021). https://doi.org/10.1186/s12864-021-07986-4

Received: 18 August 2021
Accepted: 07 September 2021
Published: 16 November 2021
DOI: https://doi.org/10.1186/s12864-021-07986-4
