Choice of library size normalization and statistical methods for differential gene expression analysis in balanced two-group comparisons for RNA-seq studies

Li, Xiaohong; Cooper, Nigel G. F.; O’Toole, Timothy E.; Rouchka, Eric C.

doi:10.1186/s12864-020-6502-7

Table 1 Summary of studies comparing normalization methods for the DEG analysis


References	Normalization methods	Software packages/pipelines	Replicates per condition (n)	Conclusions
Bullard et al. 2010 [17]	POLR2A, Q, TC, UQ	Genominator	2, 4	POLR2A and UQ with LRT/Exact test significantly reduced the bias of DE relative to qRT-PCR
Kvam et al. 2012 [33]	DESeq, TMM, UQ	DESeq, edgeR, baySeq, TSPM	2, 4, 5	baySeq with UQ normalization performed best, with the highest sensitivity and low rates of false positives, but all methods had an inflated true FDR (> 0.1).
Rapaport et al. 2013 [34]	DESeq, TMM, UQ, RPKM, FPKM, Q, voom	Cuffdiff, DESeq, edgeR, baySeq, PoissonSeq, voom-limma	2, 3	No single method emerged as favorable in all comparisons, but baySeq with UQ normalization was least correlated with qRT-PCR, and Cuffdiff had an inflated number of false positive predictions.
Li et al. 2015 [35]	DESeq, Med, Q, RPKM, RC, TMM, UQ, ERPKM	DESeq, edgeR, Cufflinks-CuffDiff, RSEM, Sailfish	2, 4	RC or RPKM seems to be adequate, and the results from Sailfish and RSEM with RC or RPKM are inconsistent, leading to the conclusion that normalization methods are not necessary for all sequence data.
Dillies et al. 2013 [23]	DESeq, Med, Q, RPKM, TC, TMM, UQ	DESeq, edgeR, Cufflinks-CuffDiff	2, 3	Exact test from DESeq combined with DESeq/TMM normalization performed best in terms of controlling the FDR below 0.05 for high-count genes; RPKM, TC and Q should be abandoned in DE gene analysis.
Soneson et al. 2013 [36]	DESeq, TMM, UQ, RPKM, FPKM, voom, vst	DESeq, edgeR, EBSeq, baySeq, NBPSeq, NOIseq, SAMseq, ShrinkSeq, TSPM, limma	2, 5, 10, 11	DESeq had poor FDR control with 2 samples, and good FDR control but low TPR for larger sample sizes. edgeR had poor FDR control with high TPR. voom/vst-limma had good FDR control, but low power for small sample sizes.
Seyednasrollah et al. 2013 [37]	DESeq, TMM, UQ, RPKM, FPKM, voom	DESeq, edgeR, baySeq, NOIseq, SAMseq, limma, CuffDiff2, EBSeq	2:6, 8, 10, 12, 16, 20, 24, 28	DESeq and limma were the safe choice and relatively conservative, while edgeR and EBSeq were too liberal. DESeq and edgeR were the best tools.
Zhang et al. 2014 [38]	DESeq, TMM, FPKM	DESeq, edgeR, Cufflinks-CuffDiff	1:6, 8, 14, 20	TMM performed best in terms of sensitivity and DESeq was the best at controlling false positives. Neither was sensitive to read depth.
Lin et al. 2016 [39]	DESeq, Med, Q, RPKM, TC, TMM, UQ	DESeq, edgeR and SAS	2, 3, 5	DESeq and TMM normalization methods were recommended compared to the other methods.
Tang et al. 2015 [40]	RLE, TMM, UQ, RPKM, FPKM, Q, voom	DESeq, DESeq2, edgeR, EBSeq, baySeq, SAMseq, PoissonSeq, voom-limma, TCC	1, 3, 6, 9	In multi-group comparisons, the proposed pipeline internally using edgeR was recommended for count data with replicates, while the same pipeline with DESeq2 was recommended for data without replicates.
Germain et al. 2016 [41]	RLE, TMM, voom, TPM	Cufflinks-CuffDiff, DESeq2, edgeR, voom-limma	3, 5	In the benchmarked differential expression analyses, voom and edgeR generally showed the most stable performance and were superior to other methods in most assays with 3 and 5 replicates. However, voom significantly underperformed in the transcript-level simulation, and edgeR showed suboptimal results on the SEQC dataset.
Maza 2016 [42]	TMM, RLE, MRN	DESeq2, edgeR	1	The three methods gave the same results for a simple two-condition comparison without replicates.
Costa-Silva et al. 2017 [43]	TMM, RLE, UQ, voom	limma-voom, NOIseq, DESeq2, SAMseq, EBSeq, sleuth, baySeq, edgeR	1:8	Limma-voom, NOIseq and DESeq2 gave more consistent results for DEG identification.
Spies et al. 2019 [44]	vst, Med, RLE, TMM	DyNB, EBSeq-HMM, FunPat, ImpulseDE2, lmms, next maSigPro, nsgp, splineTC, timeSeq, edgeR, DESeq2	2, 3, 5	DESeq2 and edgeR with pairwise comparisons outperformed the time-course (TC) tools on short time courses (< 8 time points), because the TC tools, except ImpulseDE2, had high false positive rates; however, they were less efficient on longer time series than splineTC and maSigPro.
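
Several of the normalization methods abbreviated in the table are simple per-sample scaling schemes. The sketch below illustrates three of them, under the assumption that TC, UQ and Med stand for total count, upper quartile and median normalization respectively. The function names, the samples-by-genes list layout, and the choice to drop zero counts within each sample before taking the quartile or median are illustrative assumptions, not the implementation used by any of the cited packages.

```python
from statistics import median, quantiles


def scaling_factors(counts, method="TC"):
    """Return one scaling factor per sample.

    counts: list of samples, each a list of raw gene counts
            (hypothetical layout chosen for this sketch).
    method: "TC" (total count), "UQ" (upper quartile) or "Med" (median).
    """
    stats = []
    for sample in counts:
        # Assumption: zero counts are ignored for the UQ and Med statistics.
        nonzero = [c for c in sample if c > 0]
        if method == "TC":
            stats.append(sum(sample))
        elif method == "UQ":
            # Third cut point of the quartiles, i.e. the 75th percentile.
            stats.append(quantiles(nonzero, n=4, method="inclusive")[2])
        elif method == "Med":
            stats.append(median(nonzero))
        else:
            raise ValueError(f"unknown method: {method}")
    # Rescale so that the factors average to 1 across samples.
    mean_stat = sum(stats) / len(stats)
    return [s / mean_stat for s in stats]


def normalize(counts, method="TC"):
    """Divide each sample's counts by its scaling factor."""
    factors = scaling_factors(counts, method)
    return [[c / f for c in sample] for sample, f in zip(counts, factors)]


if __name__ == "__main__":
    # Toy data: the second sample was sequenced at twice the depth.
    toy = [[10, 20, 30, 40, 0], [20, 40, 60, 80, 0]]
    for m in ("TC", "UQ", "Med"):
        print(m, scaling_factors(toy, m), normalize(toy, m))
```

On toy data where one sample is a uniformly deeper copy of the other, all three schemes bring the two samples onto the same scale. They start to differ when a few highly expressed genes dominate the total count, which is the kind of situation the comparisons in the table examine.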


ISSN: 1471-2164


BMC Genomics
