
Table 1 Summary of studies comparing normalization methods for the DEG analysis

From: Choice of library size normalization and statistical methods for differential gene expression analysis in balanced two-group comparisons for RNA-seq studies

References

Normalization methods

Software packages/pipelines

Replicates per condition (n)

Conclusions

Bullard et al. 2010 [17]

POLR2A, Q, TC, UQ

Genominator

2, 4

POLR2A and UQ normalization with the LRT/exact test significantly reduced the bias of DE estimates relative to qRT-PCR.

Kvam et al. 2012 [33]

DESeq, TMM, UQ

DESeq, edgeR, baySeq, TSPM

2, 4, 5

baySeq with UQ normalization performed best, with the highest sensitivity and low false-positive rates, but all methods had an inflated true FDR (> 0.1).

Rapaport et al. 2013 [34]

DESeq, TMM, UQ, RPKM, FPKM, Q, voom

Cuffdiff, DESeq, edgeR, baySeq, PoissonSeq, voom-limma

2, 3

No single method was favorable in all comparisons; baySeq with UQ normalization was least correlated with qRT-PCR, and Cuffdiff produced an inflated number of false-positive predictions.

Li et al. 2015 [35]

DESeq, Med, Q, RPKM, RC, TMM, UQ, ERPKM

DESeq, edgeR, Cufflinks-CuffDiff, RSEM, Sailfish

2, 4

RC or RPKM appeared adequate, and the results from Sailfish and RSEM with RC or RPKM were inconsistent, leading to the conclusion that normalization methods are not necessary for all sequencing data.

Dillies et al. 2013 [23]

DESeq, Med, Q, RPKM, TC, TMM, UQ

DESeq, edgeR, Cufflinks-CuffDiff

2, 3

The DESeq exact test combined with DESeq or TMM normalization performed best, controlling the FDR below 0.05 for high-count genes; RPKM, TC and Q should be abandoned in DE analysis.

Soneson et al. 2013 [36]

DESeq, TMM, UQ, RPKM, FPKM, voom, vst

DESeq, edgeR, EBSeq, baySeq, NBPSeq, NOIseq, SAMseq, ShrinkSeq, TSPM, limma

2, 5, 10, 11

DESeq had poor FDR control with 2 samples, and good FDR control but low TPR with larger sample sizes. edgeR had poor FDR control but high TPR. Voom/vst-limma had good FDR control but low power for small sample sizes.

Seyednasrollah et al. 2013 [37]

DESeq, TMM, UQ, RPKM, FPKM, voom

DESeq, edgeR, baySeq, NOIseq, SAMseq, limma, CuffDiff2, EBSeq

2:6, 8, 10, 12, 16, 20, 24, 28

DESeq and limma were safe, relatively conservative choices, while edgeR and EBSeq were too liberal. DESeq and edgeR were the best tools.

Zhang et al. 2014 [38]

DESeq, TMM, FPKM

DESeq, edgeR, Cufflinks-CuffDiff

1:6, 8, 14, 20

TMM performed best in terms of sensitivity, and DESeq was best at controlling false positives. Neither was sensitive to read depth.

Lin et al. 2016 [39]

DESeq, Med, Q, RPKM, TC, TMM, UQ

DESeq, edgeR and SAS

2, 3, 5

DESeq and TMM normalization were recommended over the other methods.

Tang et al. 2015 [40]

RLE, TMM, UQ, RPKM, FPKM, Q, voom

DESeq, DESeq2, edgeR, EBSeq, baySeq, SAMseq, PoissonSeq, voom-limma, TCC

1, 3, 6, 9

In multi-group comparisons, the proposed pipeline using edgeR internally was recommended for count data with replicates, while the same pipeline using DESeq2 was recommended for data without replicates.

Germain et al. 2016 [41]

RLE, TMM, voom, TPM

Cufflinks-CuffDiff, DESeq2, edgeR, voom-limma

3, 5

In benchmarked differential expression analyses, voom and edgeR generally showed the most stable performance and outperformed the other methods in most assays with 3 and 5 replicates. However, voom underperformed significantly in transcript-level simulations, and edgeR showed suboptimal results on the SEQC dataset.

Maza 2016 [42]

TMM, RLE, MRN

DESeq2, edgeR

1

The three methods gave the same results for a simple two-condition comparison without replicates.

Costa-Silva et al. 2017 [43]

TMM, RLE, UQ, voom

limma-voom, NOIseq, DESeq2, SAMseq, EBSeq, sleuth, baySeq, edgeR

1:8

Limma-voom, NOIseq and DESeq2 gave more consistent results for DEG identification.

Spies et al. 2019 [44]

vst, Med, RLE, TMM

DyNB, EBSeq-HMM, FunPat, ImpulseDE2, lmms, Next maSigPro, nsgp, splineTC, timeSeq, edgeR, DESeq2

2, 3, 5

With pairwise comparisons, DESeq2 and edgeR outperformed the time-course (TC) tools, except ImpulseDE2, on short time courses (< 8 time points) because those tools had high false-positive rates; however, DESeq2 and edgeR were less efficient than splineTC and maSigPro on longer time series.
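Several of the scaling-based normalization methods abbreviated in the table (TC, UQ, Med, DESeq/RLE, RPKM) can be illustrated with a short sketch. The code below is a minimal, stdlib-only Python illustration, not the implementation used by any package listed above; all function names are hypothetical, and the conventions chosen (ignoring zero counts for UQ and Med, dividing TC/UQ/Med factors by their mean so they average to 1) are assumptions, since packages differ in these details. TMM and quantile (Q) normalization are omitted for brevity.

```python
import math
import statistics

# Hypothetical helpers for illustration. counts[i][j] is the raw read count of
# gene i in sample j (genes as rows, samples as columns).


def library_sizes(counts):
    """Total count per sample (column sums)."""
    return [sum(col) for col in zip(*counts)]


def _center(stats):
    """Divide per-sample statistics by their mean so factors average to 1 (assumed convention)."""
    mean = sum(stats) / len(stats)
    return [s / mean for s in stats]


def tc_factors(counts):
    """TC: scale each sample by its total count."""
    return _center(library_sizes(counts))


def uq_factors(counts):
    """UQ: scale by the 75th percentile of each sample's nonzero counts."""
    stats = []
    for col in zip(*counts):
        nonzero = [c for c in col if c > 0]
        # 'inclusive' interpolation corresponds to R's default quantile type 7
        stats.append(statistics.quantiles(nonzero, n=4, method="inclusive")[2])
    return _center(stats)


def med_factors(counts):
    """Med: scale by the median of each sample's nonzero counts."""
    return _center([statistics.median([c for c in col if c > 0]) for col in zip(*counts)])


def rle_factors(counts):
    """DESeq/RLE median-of-ratios size factors (not re-centered)."""
    # Only genes with a positive count in every sample have a finite geometric mean.
    usable = [row for row in counts if all(c > 0 for c in row)]
    log_geo_means = [sum(math.log(c) for c in row) / len(row) for row in usable]
    factors = []
    for j in range(len(counts[0])):
        # Median over genes of log(count / geometric mean), then back-transform.
        log_ratios = [math.log(row[j]) - lg for row, lg in zip(usable, log_geo_means)]
        factors.append(math.exp(statistics.median(log_ratios)))
    return factors


def rpkm(counts, lengths_bp):
    """RPKM: reads per kilobase of gene length per million mapped reads."""
    libs = library_sizes(counts)
    return [
        [c * 1e9 / (length * lib) for c, lib in zip(row, libs)]
        for row, length in zip(counts, lengths_bp)
    ]


def normalize(counts, factors):
    """Divide each sample's counts by its scaling factor."""
    return [[c / f for c, f in zip(row, factors)] for row in counts]


if __name__ == "__main__":
    toy = [[10, 20], [20, 40], [30, 60], [0, 5]]  # made-up counts: 4 genes x 2 samples
    for name, fn in [("TC", tc_factors), ("UQ", uq_factors),
                     ("Med", med_factors), ("RLE", rle_factors)]:
        print(name, fn(toy))
```

In the made-up matrix, sample 2 has exactly twice the counts of sample 1 for the three genes detected in both samples, so the RLE factors differ by a factor of 2, whereas the TC, UQ and Med factors differ by about 2.08, 1.8 and 1.5 because the gene detected only in sample 2 shifts those statistics. This shows how the choice of scaling statistic alone can change the factors on identical data.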