Experimental validation of methods for differential gene expression analysis and sample pooling in RNA-seq
© Rajkumar et al. 2015
Received: 1 September 2014
Accepted: 10 July 2015
Published: 25 July 2015
Massively parallel cDNA sequencing (RNA-seq) experiments are gradually superseding microarrays in quantitative gene expression profiling. However, many biologists are uncertain about the choice of differentially expressed gene (DEG) analysis methods and the validity of cost-saving sample pooling strategies for their RNA-seq experiments. Hence, we performed experimental validation of DEGs identified by Cuffdiff2, edgeR, DESeq2 and Two-stage Poisson Model (TSPM) in a RNA-seq experiment involving mice amygdalae micro-punches, using high-throughput qPCR on independent biological replicate samples. Moreover, we sequenced RNA-pools and compared their results with sequencing corresponding individual RNA samples.
False-positivity rate of Cuffdiff2 and false-negativity rates of DESeq2 and TSPM were high. Among the four investigated DEG analysis methods, sensitivity and specificity of edgeR was relatively high. We documented the pooling bias and that the DEGs identified in pooled samples suffered low positive predictive values.
Our results highlighted the need for combined use of more sensitive DEG analysis methods and high-throughput validation of identified DEGs in future RNA-seq experiments. They indicated limited utility of sample pooling strategies for RNA-seq in similar setups and supported increasing the number of biological replicate samples.
Massively parallel cDNA sequencing (RNA-seq) is gradually superseding microarrays in quantitative gene expression profiling . Apart from its ability to detect novel transcripts, splicing events, and sequence variations, RNA-seq offers unparalleled precise detection of gene expression over a wide dynamic range . Due to declining costs of sequencing, further increase in the use of RNA-seq is expected. However, several methodological  and statistical  issues in the design and analyses of RNA-seq experiments remain unresolved. Biologists, who plan RNA-seq experiments, often pose questions on sample size requirements, cost-effective strategies for sample pooling, and on the choice of data analysis software. Current literature on this topic [4–10] do not provide unequivocal answers to these important questions .
Diverse methods are available to align RNA-seq reads , summarize read counts, assemble transcripts , and to detect differential expression between biological conditions . Differentially expressed gene (DEG) analysis methods differ on their normalisation procedures, detection of differential isoform expression, statistical modelling, variance estimation, and corrections for multiple testing . This research focussed on four commonly used DEG analysis methods, Cuffdiff2 , edgeR , DESeq2 , and Two-stage Poisson Model (TSPM) . It minimised the variations, secondary to read alignment and counting, by employing the same spliced aligner (Tophat2)  and counting algorithm (HTSeq 0.5.4)  for all methods, except for Cuffdiff2 that requires a unique quantification method, Cufflinks . We analysed data, derived from a RNA-seq experiment including RNA samples, extracted from amygdalae micro-punches of a genetically modified mouse strain (Brd1 +/−) and of their wild-type (WT) littermates (8 biological replicates/group), with Cuffdiff2, edgeR, DESeq2 and TSPM. Validation using independent biological replicates is preferred over in silico analyses, using online databases or simulated datasets, as well as technical validation, using the same RNA samples, to confirm true-positive DEGs between two or more biological conditions [19, 20]. Hence, we validated the differential expression of 115 genes, randomly selected from the list of DEGs that were identified by the four methods, using independent biological replicates and high-throughput quantitative reverse-transcription PCR (qPCR).
Pooling biological replicate RNA samples, such as those derived from a number of experimentally similar animals, may retain the biological information, while reducing the cost of sequencing. Validity and utility of sample pooling for gene expression analyses using microarrays have been evaluated extensively [21–23]. Biological averaging hypothesis suggests reduced biological variability and increased power to detect DEGs , but a pooling bias, that is the difference between the value measured in the pool and the mean of the values measured in the corresponding individual replicates, can occur . Although several RNA-seq experiments, based on pooled samples of RNA, have identified DEGs , the validity of pooling for RNA-seq experiments to detect DEGs has not been systematically evaluated so far. Hence, we evaluated the validity of two pooling strategies (3 or 8 biological replicates/pool; two pools/group) against the reference standard of sequencing the corresponding individual samples (3 or 8 samples/group) to detect DEGs.
Validity of DEG analysis methods
Validation of four differential gene expression analysis methods for RNA-Seq
Total number of identified DEGsb
Number for DEGs selected for qPCR validation
Sensitivity (True positivity rate) (%)
Specificity (True negativity rate) (%)
False positivity rate (%)
False negativity rate (%)
Positive predictive value (%)
Negative predictive value (%)
Positive likelihood ratio
Negative Likelihood ratio
Overall agreement (%)
Validity of RNA pooling for DEG analyses
Validation of two pooling strategies for RNA-Seq
Pooling 3 samples
Pooling 8 samples
Total number of identified DEGsb
Sensitivity (True positivity rate) (%)
Specificity (True negativity rate) (%)
False positivity rate (%)
False negativity rate (%)
Positive predictive value (%)
Negative predictive value (%)
Agreement between identified DEGsc
Correlation between reported LFCd
Root-mean-square deviation of LFCe
Our findings revealed that false positivity rate of Cuffdiff2 and false negativity rates of DESeq2 and TSPM were high. Contrary to previous studies that supported the validity of RNA sample pooling for microarray based analyses of gene expression [21, 26], we documented the pooling bias in estimating differential gene expression, and high false positivity rate to detect DEGs for RNA-seq experiments employing pooling of low amount RNA samples from brain micro-punches. Our results corroborated previous studies, which indicated low sensitivity of DESeq [5, 27], high false positivity of Cuffdiff , and high sensitivity of edgeR . False positivity and false negativity rates of TSPM have been reported to be dependent on the number of replicates [5, 6, 15]. This study did not evaluate the issues concerning read alignment , read counting , transcript assembly , and many novel DEG analysis methods [5, 28]. It included RNA samples, expected to be highly variable on their gene expression profiles, since amygdalae comprise multiple functionally distinct nuclei , and micro-punching of such regions in mouse brain is inherently imprecise.
Differences between the DEG analysis methods begin with their normalisation procedures . edgeR uses a model, which incorporates normalisation factors as offsets that are estimated by trimmed mean of M values for each contig . DESeq2 employs a relative log expression method . Normalisation procedures for Cuffdiff2 consider total number of reads, gene length, variability within and between the conditions, and differential isoform expression [12, 18]. TSPM can accommodate various normalisation procedures, but works without normalisation by default . It assumes Poisson distribution for the genes that are not over-dispersed. edgeR and DESeq2 model negative binomial distribution, while Cuffdiff2 follow beta negative binomial model to accommodate ambiguously mapped reads . Principal source of variability between these methods is their dispersion estimation procedures . DESeq2 is stringent to detect outliers and excludes genes with extreme read counts by default . It considers the maximum a posteriori dispersion estimates, while edgeR moderates its dispersion estimates by their dispersion-mean relationship . Cuffdiff2 includes covariances between different isoforms . TSPM differs by its per-gene dispersion estimation without considering the information across genes . Several unique correction procedures, such as multi-read correction, bias correction and effective length correction, are incorporated only in Cufflinks2 and Cuffdiff2 . edgeR, DESeq2 and Cuffdiff2 calculate their p values by the generalized linear model (GLM) likelihood ratio test , GLM Wald test  and t-test , respectively. TSPM employs quasi or standard likelihood ratio tests, based on whether a gene is over-dispersed or not. Cuffdiff is more likely to estimate false positive statistically significant p values, when the gene expression is detected in only one group .
Our results favour the use of edgeR, among the four investigated methods, and discourage using RNA pooling in future RNA-seq experiments. Pooled samples do not represent the population variations in gene expression levels, and they cannot estimate within population variation . Within-group variances of the pooled samples are less than true within-group variances of the individual samples. This leads to erroneously long DEG lists with low positive predictive values that limit practical use. If researchers plan RNA pooling because of saving costs or of limited starting material, stringent false discovery corrections and high-throughput validation of as many identified DEGs as possible should be considered. If the validation targets are chosen by random sampling from the list of identified DEGs, false discovery rates can be estimated cost-effectively . An increase in the number of biological replicates, added into each pool, may help to minimise the pooling bias in estimating differential gene expression. Increasing the number of replicates is more effective to improve the power to detect DEGs than increasing sequencing depth above 10 million reads per sample [4, 35]. Limiting sequencing depth to 10 million reads per sample can reduce the costs and can help the biologists to sequence more replicates. Heterogeneity of biological variance among RNA samples may be larger than the dispersion, estimated by edgeR , and most contemporary RNA-seq experiments have been estimated to be under-powered by their design . Hence, reducing the number of replicates by pooling will decrease the power and the ability to estimate within population variation further, and will increase pooling bias as well as false discovery rates (FDR).
Although edgeR was the most sensitive among these four methods, it did not detect differential expression of Brd1 that was genetically modified in these mice. Employing two or more DEG analysis methods in parallel can enhance the overall sensitivity to detect true-positive DEGs , but consequent rise in the FDR will further increase the need for high-throughput validation of identified DEGs. In a hypothetical RNA-seq experiment with 10000 expressed genes, 100 DEGs, 5 % FDR 80 % power, minimum average read count of 1, and maximum dispersion of 0.5, 99 replicates need to be sequenced to detect a DEG with two-fold differential expression . Sample size requirement will be more, if a DEG has low expression, less differential expression, and high dispersion. Until such large RNA-seq experiments become a reality, we cannot overemphasise the need for combined use of more sensitive DEG analysis methods and of high-throughput validation of identified DEGs.
Among the four investigated methods for RNA-seq differential expression analyses using brain micropunches, edgeR detected more true-positive DEGs with high specificity. Moreover, we demonstrated limited utility of sample pooling strategies for RNA-seq in our setup. Pooled samples identified DEGs with high false positivity rates and low positive predictive values. On the basis of our results, we conclude that combined use of more sensitive DEG analysis methods and high-throughput validation of identified DEGs is desired for future RNA-seq experiments.
RNA samples were obtained from female mice, heterozygous for a targeted deletion of exon 3–5 of the Brd1 gene (Brd1 +/−) on a congenic C57BL/6NTac background, and from their WT littermates. 8–10 weeks old Brd1 +/− (n = 8) and WT mice (n = 8) were sacrificed. Their brains were snap frozen in 2-methylbutane and sectioned (1 mm) coronally using a slicer matrix (Zivic Instruments, Pittsburgh, USA) at −20 °C. Amygdalae were identified  and punched by a needle (1 mm diameter). Total RNA was extracted using Maxwell-16 system and LEV simplyRNA Tissue Kit (Promega, Madison, USA). Quantity of RNA was measured by NanoDrop 1000 version 3.7.1 (Thermo Fisher Scientific, Waltham, MA, USA) and their quality was assessed using Agilent 2100 Bioanalyzer (Agilent technologies, SantaClara, USA). All animal procedures were approved by the Danish National Committee for Ethics in Animal Experimentation.
16 individual RNA samples (264 ng RNA/sample; 8/group, mean RNA Integrity Number (RIN) 7.53 (SD 0.31) ), 4 pools (2/group) that combined three individual samples (88 ng RNA/sample; 264 ng/pool) and 4 pools (2/group) that combined eight individual samples (33 ng RNA/sample; 264 ng/pool) were included. Each pool of the two pools/group was prepared by pooling equal amounts of the same three or eight RNA samples [Additional file 10]. cDNA was synthesised from all 24 RNA samples using random hexamer primers and libraries were prepared by TruSeq RNA sample preparation kit (Illumina, San Diego, USA). RNA-seq (50 bp; single-end; minimum 10 million clean reads/sample) was performed using Illumina HiSeq2000 (Illumina, San Diego, USA).
Reads that passed quality control (more than 90 % bases had less than 1 % sequencing error; no ambiguous bases) were aligned to the mouse genome (Mus_musculus.GRCm.38.72) with corresponding gene model annotation (Mus_musculus.GRCm38.72.gtf) by TopHat 2.0.6 . Overall read alignment rates were above 90 % for all libraries. Aligned reads were counted by HTSeq 0.5.4 with “intersection-nonempty” overlap resolution mode . DEG analyses with edgeR 3.2.4 [13, 27], Cuffdiff 2.1.1 [12, 18], DESeq2 1.0.19 [27, 31] and TSPM  followed previously published protocols using default parameters (unless stated differently). edgeR employed generalized linear models with tag-wise dispersion. As Cuffdiff2 do not work with count matrix, Tophat2 aligned reads were assembled into transcripts using Cufflinks 2.1.1  with quartile normalisation, bias correction, multi-read correction, and with reference gene model annotation (Mus_musculus.GRCm.38.72.gtf). After combining all transcripts by Cuffmerge 2.1.1  with reference gene model annotation (Mus_musculus.GRCm.38.72.gtf), Cuffdiff2 identified DEGs with geometric library normalisation and per-condition dispersion estimation. Adjusted p values were calculated by Benjamini-Hochberg false discovery correction (5 %) for all methods. Genes with adjusted p values less than 0.05 were considered as DEGs. Codes for the analysis methods were provided in the Additional file 11.
Validation of DEGs by qPCR
16 RNA samples were obtained by the procedures, described above, from another batch of female Brd1 +/− and WT mice (8/group). 180 ng total RNA was reverse transcribed by iScript Select cDNA Synthesis Kit (Bio-Rad, Hercules, USA). All eight DEGs, detected by DESeq2 and TSPM, were selected for validation. 107 more genes were randomly selected for validation from the list of remaining 191 DEGs, detected by Cuffdiff2 and edgeR. After 10–20 cycles of specific target amplification with PreAmp master mix (Fluidigm, San Francisco, USA), high-throughput qPCR was performed on the BioMark HD (Fluidigm, San Francisco, USA), using 48.48 dynamic arrays (Fluidigm, San Francisco, USA) and SsoFast EvaGreen Low ROX kit (Bio-Rad, Hercules, USA) [Additional files 12 & 13]. A DEG, detected by the RNA-seq DEG analysis methods, was considered as a true-positive DEG, if it satisfied the following criteria, (i) Both RNA-seq and qPCR showed same direction (upregulation or down-regulation) of differential expression, (ii) Differential expression fold change, estimated by qPCR, was either above 1.25 or below 0.80 (LFC cut-off was ±0.3219) . Spearman correlation coefficients, root-mean-square deviations and kappa statistics were calculated using STATA 13.1 (StataCorp LP, Texas, USA).
Availability of supporting data
Digital expression count matrix of our RNA-seq data is available as the Additional date file 1 with the online version of this paper.
APR, PQ, FL, MN, OM, ADB, and JHC were funded by the Aarhus University, Aarhus, Denmark, and by the Initiative for Integrative Psychiatric Research (iPSYCH), Denmark. RL was funded by Baker IDI heart and diabetes Institute, Victoria, Australia. JJ and QL were employed by Beijing Genomics Institute, Shenzhen, China. Funding bodies did not play any role in design, in the collection, analysis, and interpretation of data; in the writing of the manuscript; and in the decision to submit the manuscript for publication. The authors would like to acknowledge molecular biologist, Jakob Hansen, Department of Forensic Medicine, Aarhus University for his advice and help in optimizing RNA extraction from micro-punches.
- Korf I. Genomics: the state of the art in RNA-seq analysis. Nat Methods. 2013;10:1165–6.PubMedView ArticleGoogle Scholar
- Wang Z, Gerstein M, Snyder M. RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet. 2009;10:57–63.PubMed CentralPubMedView ArticleGoogle Scholar
- Fang Z, Cui X. Design and validation issues in RNA-seq experiments. Brief Bioinform. 2011;12:280–7.PubMedView ArticleGoogle Scholar
- Rapaport F, Khanin R, Liang Y, Pirun M, Krek A, Zumbo P, et al. Comprehensive evaluation of differential gene expression analysis methods for RNA-seq data. Genome Biol. 2013;14:R95.PubMed CentralPubMedView ArticleGoogle Scholar
- Soneson C, Delorenzi M. A comparison of methods for differential expression analysis of RNA-seq data. BMC Bioinformatics. 2013;14:91.PubMed CentralPubMedView ArticleGoogle Scholar
- Kvam VM, Liu P, Si Y. A comparison of statistical methods for detecting differentially expressed genes from RNA-seq data. Am J Bot. 2012;99:248–56.PubMedView ArticleGoogle Scholar
- Engstrom PG, Steijger T, Sipos B, Grant GR, Kahles A, Alioto T, et al. Systematic evaluation of spliced alignment programs for RNA-seq data. Nat Methods. 2013;10:1185–91.PubMed CentralPubMedView ArticleGoogle Scholar
- Steijger T, Abril JF, Engstrom PG, Kokocinski F, Akerman M, Alioto T, et al. Assessment of transcript reconstruction methods for RNA-seq. Nat Methods. 2013;10:1177–84.PubMedView ArticleGoogle Scholar
- Dillies MA, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, et al. A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform. 2013;14:671–83.PubMedView ArticleGoogle Scholar
- Li CI, Su PF, Shyr Y. Sample size calculation based on exact test for assessing differential expression analysis in RNA-seq data. BMC Bioinformatics. 2013;14:357.PubMed CentralPubMedView ArticleGoogle Scholar
- Garber M, Grabherr MG, Guttman M, Trapnell C. Computational methods for transcriptome annotation and quantification using RNA-seq. Nat Methods. 2011;8:469–77.PubMedView ArticleGoogle Scholar
- Trapnell C, Hendrickson DG, Sauvageau M, Goff L, Rinn JL, Pachter L. Differential analysis of gene regulation at transcript resolution with RNA-seq. Nat Biotechnol. 2013;31:46–53.PubMedView ArticleGoogle Scholar
- Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26:139–40.PubMed CentralPubMedView ArticleGoogle Scholar
- Anders S, Huber W. Differential expression analysis for sequence count data. Genome Biol. 2010;11:R106.PubMed CentralPubMedView ArticleGoogle Scholar
- Auer PL, Doerge RW. A Two-stage Poisson model for testing RNA-Seq data. Stat Appl Genet Mol Biol. 2011;10:1–26.Google Scholar
- Kim D, Pertea G, Trapnell C, Pimentel H, Kelley R, Salzberg SL. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 2013;14:R36.PubMed CentralPubMedView ArticleGoogle Scholar
- Counting reads in features with htseq-count [http://www-huber.embl.de/users/anders/HTSeq/doc/count.html#count].
- Trapnell C, Roberts A, Goff L, Pertea G, Kim D, Kelley DR, et al. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat Protoc. 2012;7:562–78.PubMed CentralPubMedView ArticleGoogle Scholar
- Chuaqui RF, Bonner RF, Best CJ, Gillespie JW, Flaig MJ, Hewitt SM, et al. Post-analysis follow-up and validation of microarray experiments. Nat Genet. 2002;32(Suppl):509–14.PubMedView ArticleGoogle Scholar
- Abruzzo LV, Wang J, Kapoor M, Medeiros LJ, Keating MJ, Edward Highsmith W, et al. Biological validation of differentially expressed genes in chronic lymphocytic leukemia identified by applying multiple statistical methods to oligonucleotide microarrays. J Mol Diagn. 2005;7:337–45.PubMed CentralPubMedView ArticleGoogle Scholar
- Kendziorski C, Irizarry RA, Chen KS, Haag JD, Gould MN. On the utility of pooling biological samples in microarray experiments. Proc Natl Acad Sci U S A. 2005;102:4252–7.PubMed CentralPubMedView ArticleGoogle Scholar
- Peng X, Wood CL, Blalock EM, Chen KC, Landfield PW, Stromberg AJ. Statistical implications of pooling RNA samples for microarray experiments. BMC Bioinformatics. 2003;4:26.PubMed CentralPubMedView ArticleGoogle Scholar
- Kendziorski CM, Zhang Y, Lan H, Attie AD. The efficiency of pooling mRNA in microarray experiments. Biostatistics. 2003;4:465–77.PubMedView ArticleGoogle Scholar
- Mary-Huard T, Daudin JJ, Baccini M, Biggeri A, Bar-Hen A. Biases induced by pooling samples in microarray experiments. Bioinformatics. 2007;23:i313–318.PubMedView ArticleGoogle Scholar
- Xu J, Sun J, Chen J, Wang L, Li A, Helm M, et al. RNA-Seq analysis implicates dysregulation of the immune system in schizophrenia. BMC Genomics. 2012;13 Suppl 8:S2.PubMed CentralPubMedView ArticleGoogle Scholar
- Ilmjarv S, Hundahl CA, Reimets R, Niitsoo M, Kolde R, Vilo J, et al. Estimating differential expression from multiple indicators. Nucleic Acids Res. 2014;42(8):e72.PubMed CentralPubMedView ArticleGoogle Scholar
- Anders S, McCarthy DJ, Chen Y, Okoniewski M, Smyth GK, Huber W, et al. Count-based differential expression analysis of RNA sequencing data using R and Bioconductor. Nat Protoc. 2013;8:1765–86.PubMedView ArticleGoogle Scholar
- Fang Z, Martin J, Wang Z. Statistical methods for identifying differentially expressed genes in RNA-Seq experiments. Cell Biosci. 2012;2:26.PubMed CentralPubMedView ArticleGoogle Scholar
- Kasukawa T, Masumoto KH, Nikaido I, Nagano M, Uno KD, Tsujino K, et al. Quantitative expression profile of distinct functional regions in the adult mouse brain. PLoS One. 2011;6, e23228.PubMed CentralPubMedView ArticleGoogle Scholar
- Robinson MD, Oshlack A. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol. 2010;11:R25.PubMed CentralPubMedView ArticleGoogle Scholar
- Differential analysis of count data - the DESeq2 package [http://www.bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.pdf]
- EdgeR: differential expression analysis of digital gene expression data. User's Guide. [http://www.bioconductor.org/packages/release/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf]
- Roberts A, Trapnell C, Donaghey J, Rinn JL, Pachter L. Improving RNA-Seq expression estimates by correcting for fragment bias. Genome Biol. 2011;12:R22.PubMed CentralPubMedView ArticleGoogle Scholar
- Leek JT, Taub MA, Rasgon JL. A statistical approach to selecting and confirming validation targets in -omics experiments. BMC Bioinformatics. 2012;13:150.PubMed CentralPubMedView ArticleGoogle Scholar
- Liu Y, Zhou J, White KP. RNA-seq differential expression studies: more sequence or more replication? Bioinformatics 2013: doi:10.1093/bioinformatics/btt1688.
- Wu H, Wang C, Wu Z. A new shrinkage estimator for dispersion improves differential expression detection in RNA-seq data. Biostatistics. 2013;14:232–43.PubMed CentralPubMedView ArticleGoogle Scholar
- Bast Jr RC. Molecular approaches to personalizing management of ovarian cancer. Ann Oncol. 2011;22:viii5–viii15.PubMed CentralPubMedView ArticleGoogle Scholar
- Paxinos G, Franklin KBJ. The mouse brain in stereotaxic coordinates. 2nd ed. San Diego, CA: Academic; 2001.Google Scholar
- Fleige S, Pfaffl MW. RNA integrity and the effect on the real-time qRT-PCR performance. Mol Aspects Med. 2006;27:126–39.PubMedView ArticleGoogle Scholar
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.