Systematic comparison of high-throughput single-cell RNA-seq methods for immune cell profiling

Background Elucidation of immune populations with single-cell RNA-seq has greatly benefited the field of immunology by deepening the characterization of immune heterogeneity and leading to the discovery of new subtypes. However, single-cell methods inherently suffer from limitations in the recovery of complete transcriptomes due to the prevalence of cellular and transcriptional dropout events. This issue is often compounded by limited sample availability and limited prior knowledge of heterogeneity, which can confound data interpretation. Results Here, we systematically benchmarked seven high-throughput single-cell RNA-seq methods. We prepared 21 libraries under identical conditions of a defined mixture of two human and two murine lymphocyte cell lines, simulating heterogeneity across immune-cell types and cell sizes. We evaluated methods by their cell recovery rate, library efficiency, sensitivity, and ability to recover expression signatures for each cell type. We observed higher mRNA detection sensitivity with the 10x Genomics 5′ v1 and 3′ v3 methods. We demonstrate that these methods have fewer dropout events, which facilitates the identification of differentially-expressed genes and improves the concordance of single-cell profiles to immune bulk RNA-seq signatures. Conclusion Overall, our characterization of immune cell mixtures provides useful metrics, which can guide selection of a high-throughput single-cell RNA-seq method for profiling more complex immune-cell heterogeneity usually found in vivo. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-020-07358-4.


Background
Understanding the cellular diversity underlying immune responses is an important component of immunological research. Although techniques such as FACS and mass cytometry [1] are useful for studying cellular diversity according to well-characterized cell-surface-protein markers, the advent of single-cell RNA sequencing (RNA-seq) has expanded the power to characterize individual immune cells from a defined set of cell-surface markers to the entire transcriptome for last few years. These single-cell technologies have enabled immunologists to characterize inflammation [2] and immune responses to cancer [3][4][5][6][7], uncovering previously uncharacterized cellular diversity and cell-type specific transcriptional responses. As recent advances have increased cell throughput and lowered per-cell costs, the number of high-throughput single-cell RNA-seq techniques that can process more than a thousand cells per experiment has increased.
Several key factors, such as variable capture and amplification efficiencies during library preparation, impact the ability of single-cell RNA-seq techniques to accurately and comprehensively characterize immune-cell diversity. Mixtures of different cell sizes are particularly complex as small cells contain low total number of transcripts and therefore, are difficult to distinguish from ambient noise. The relatively small size and low mRNA content of immune cells may impact the performance of single-cell RNA-seq methods differently than was previously described using larger cells [8][9][10][11][12][13]. Immune cells constitute a broad range of cell types across various lineages, activation states, and cell sizes. Efficient recovery across these diverse cell types impacts the fidelity of cell-composition analyses. Methods that recover a larger fraction of cells in a cost-efficient manner benefit studies that sample tissues containing few immune cells. Also, increased sensitivity in detecting individual mRNA transcripts results in more comprehensive cellular profiles, which greatly advances the characterization of immune sub-types. A more complete picture of cellular transcriptional activity facilitates the identification of differentially-expressed (DE) marker genes and positively impacts the mapping of cells against reference immune cell signatures.
Previous benchmarking studies using somatic cell lines or peripheral blood mononuclear cells (PBMCs) reported that high-throughput single-cell RNA-seq methods generally enabled broader sampling of diverse populations at a lower per-cell cost. However, larger sample sizes come at the expense of lower mRNA detection sensitivity [8][9][10][11][12][13]. In this work, we extend previous findings with a focus on the application of high-throughput methods to immune-cell profiling. By using a defined mixture of four lymphocyte cell lines, we assess the performance of seven high-throughput methods using four commercially-available systems to address common concerns in immune-cell profiling. First, we examine library efficiency in terms of cell recovery and cellassignable reads. Next, we assess mRNA detection sensitivity and the correlation of cellular profiles to immune cell signatures from bulk RNA-seq. Finally, we compare results across the lymphocyte cell lines and explore in-vivo variation of mRNA detection across peripheral blood mononuclear cells (PBMCs) in consideration of varying cell sizes and cellular mRNA contents. This study serves as useful guidelines for the selection of a suitable single-cell RNAseq method to study immune cells.

Design of single-cell RNA-seq benchmarking experiments
We benchmarked four commercially-available highthroughput single-cell systems: the Chromium [14] (10x Genomics), the ddSEQ (Illumina and Bio-Rad), the scRNA-Seq System running Drop-seq (Dolomite Bio) [15], and the ICELL8 cx (Takara Bio) [16] (Fig. 1). We tested three methods available for the Chromium (3′ v3, 3′ v2 and 5′ v1) as well as two methods for the ICELL8 (the official 3′ DE protocol and an alternate 3′ DE-UMI protocol). All methods tested perform mRNA end counting by tagging mRNA sequences with a barcode containing a cell identifier (CID) and a unique molecular identifier (UMI) with lengths that vary by method (Supplement Table 1).
All techniques, apart from ddSEQ, amplify full-length cDNA (Supplement Table 1) using a modified Smart-seq protocol [17,18], which incorporates a 5′ PCR handle by employing a reverse transcriptase's ability to switch templates at the end of a transcript. Full-length cDNA can be amplified with primers in the 5′ template-switch and 3′ poly-T oligonucleotides. Barcoded cDNA ends are further amplified after direct ligation or tagmentation to incorporate Illumina sequencing adapters. ddSEQ contains a single amplification step during adapter incorporation after second strand synthesis without amplification of full-length cDNA. Amplification bias introduced in the multiple rounds of PCR in these protocols, is mitigated by the incorporation of UMIs [19]. However, UMI counts are unreliable in the ICELL8 3′ DE protocol because cDNA is amplified in the presence of barcoding primers, potentially inflating UMI counts. The alternative ICELL8 3′ DE-UMI protocol is more robust for UMI counting since reverse transcription and cDNA amplification are uncoupled by an exonuclease digestion of barcoding primers.
We used a 1:1:1:1 mixture of four lymphocyte cell lines from two species ( Fig. 1; Supplement Table 2): EL4 (mouse CD4+ T cells), IVA12 (mouse B cells), Jurkat (human CD4+ T cells), and TALL-104 (human CD8+ T cells). These cells also vary in morphology: TALL-104 cells (~5 μm diameter) are considerably smaller than the other cell types (~10 μm diameter). These cell lines are expected to have distinct expression profiles enabling the classification of each cell type. Usage of cells from two species allowed us to clearly identify cross-species doublet contamination to calculate capture rates of cell multiplets. To mirror typical single-cell sequencing runs and to ensure a comparison independent of sequencing limitations, we normalized the read depth of our libraries to~50,000 reads per cell ( Fig. 1; Supplement Figs. 1 and 2). Cells were identified and classified by correlating single-cell expression profiles to bulk RNA-seq.

Evaluation of cell capture and library efficiency
One important consideration for single-cell RNA-seq is the capture rate, or the fraction of cells recovered in the data relative to input. This is especially critical when working with precious samples with few cells. To identify recovered cells, we used the curve of the logtotal count against the log-rank of each CID, which is equivalent to the transposed log-log empirical cumulative density plot of the total counts of each CID. The knee and inflection points in this curve typically define the transition between the cell-containing component and the ambient RNA component of the total count distribution. Here, we defined a recovered cell as a CID located above the inflection point (Supplement Fig. 2a). In our tests, we found that capture rates were slightly lower than, but tracked with theoretical rates (Fig. 2a; Table 1). As expected, we observed the highest rates with 10x Genomics methods, ranging from~30 to~80%, while ddSEQ and Drop-seq methods recovered < 2% of cells.
In addition to the capture rate, we also quantified events capturing multiple cells in a single partition. This technical artifact impairs downstream data analysis, as artificial mixtures of transcriptomes may be interpreted wrongly as single cells. The extent of this issue is influenced by the quality of the single-cell suspension, cell health, and cell loading concentration. By counting CIDs with a significant fraction of both human and mouse transcripts, for all methods, we observed multiplet rates around the 5% we had targeted with our cell-loading concentrations (Table 1; Supplement Fig. 3a). Another significant factor in efficiency is the fraction of reads that can be assigned to individual cells. Increased background noise in sequencing libraries results in wasted reads and unnecessarily increased sequencing costs. We observed the highest fraction of cellassociated reads for our ICELL8 experiments (> 90%), intermediate rates for 10x experiments (~50-75%) and the lowest rates for ddSEQ and Drop-seq (< 25%) (  Tables 3 and 4). We also examined the genomic locations of aligned reads. About 75% of aligned bases of each library were mapped to exons and UTRs. Notably, the intergenic fraction was lowest in 10x samples, suggesting lower genomic contamination in these methods. (Supplement Fig. 3b). The ddSEQ method exhibited the greatest UTR bias. This is likely due to the longest read-length (150 bases) for ddSEQ of each tested technology.
10x 5′ v1 and 3′ v3 methods demonstrate the highest mRNA detection sensitivity Because immune cells tend to have low levels of mRNA, the mRNA detection sensitivity, or the fraction of a cell's transcriptome detectable, critically impacts downstream analyses. Single-cell RNA-seq methods are inherently prone to dropouts due to inefficiencies during library Fig. 1 Overview of high-throughput single-cell benchmarking experiments. Experiments were performed using four immune cell lines to benchmark cell recovery, transcript detection sensitivity, concordance to bulk RNA-seq and differentially-expressed gene identification preparation resulting in false-negative gene-expression signals [15]. Although we performed library normalization to obtain a consistent read depth across all cells, we found that read distributions of individual cell types varied. Since EL4 cells demonstrated the highest consistency between read distributions across experiments (Supplement Fig. 1c), we focused our initial analysis on EL4 cells to minimize batch effects due to differential sequencing depths. We observed the highest detection of both transcripts and genes with at least one read count using 10x Genomics methods, with the highest levels seen in the 3′ v3 experiments (median 28,006 UMIs/4776 genes across all samples) followed by the 5′ v1 and 3′ v2 kits (25,988 UMIs/4470 genes and 21,570 UMIs/3882 genes, respectively) ( Fig. 3a, b; Supplement Table 4). ddSEQ and Drop-seq experiments demonstrated similar detection rates (10,466 UMIs/3644 genes and 8,791 UMIs/3255 genes, respectively). UMI counts generated by the ICELL8 3′ DE method are unreliable due to residual barcoding primers during cDNA amplification, so we focused on gene detection sensitivity instead. We observed a significant drop in gene detection between the 3′ DE and 3′ DE-UMI methods (2849 and 1288 genes, respectively) and a low number of UMIs counted in the 3′ DE-UMI method ((2792 UMIs). This suggests that many transcripts are lost in the additional primer digestion and cleanup steps. Crosscontamination due to ambient RNA minimally impacted these UMI detection rates with average estimates of contamination calculated with DecontX [20] falling under 1% for UMI-based methods (Supplement Table 4). For the other three cell types, rankings of methods by absolute UMI-and gene-count distributions slightly differed from EL4 cells, likely due to greater variation in read depth across samples for these cell types (Supplement Figs. 1c and 4a).  To account for varying read distributions across the four cell types, we compared the number of detected UMIs and genes relative to the total number of reads per cell. For EL4, IVA12 and Jurkat cells, we observed a similar trend across methods with regards to efficiency of transcript and gene detection (Fig. 3c, d). Again, 10x 3′ v3 (mean ± sd reads/UMI = 2.07 ± 0.52, reads/gene = 9.04 ± 2.65) and 5′ v1 chemistries (mean ± sd reads/ UMI = 1.98 ± 0.19, reads/gene = 9.51 ± 2.68) were the most efficient, requiring fewer reads to detect a single UMI or gene. These methods are followed by 10x 3′ v2 (reads/UMI = 2.35 ± 0.33, reads/gene = 11.17 ± 3.03), ddSEQ (reads/UMI = 5.25 ± 1.14, reads/gene = 13.42 ± 3.89), Drop-seq (reads/UMI = 6.40 ± 1.42, reads/gene = 15.97 ± 5.62) and ICELL8 methods (3′ DE: reads/gene = 29.68 ± 41.48, 3' DE-UMI: reads/UMI = 21.77 ± 5.50, reads/gene = 47.5 ± 17.91). This trend is largely mirrored in TALL-104 cells, albeit less distinct due to the low read depth obtained for those cells (Fig. 3c, d; Supplement Fig. 1c).
We further examined the number of genes with at least one sequenced read in pseudo-bulk populations. For this purpose, cells form each cell type were pooled and gene-expression measurements were merged. We observed similar trends with higher numbers of detected genes with the 10x 3′ v3, and 5′ v1 method for EL4, Fig. 3 Transcript detection sensitivity: a Distributions of unique molecular identifiers (UMIs) and b genes detected in EL4 cells by sample are plotted. c Numbers of UMIs or d genes detected versus numbers of reads per cell for each cell type are plotted. e Accumulated average numbers of genes detected from aggregated data of subsamples up to 50 cells are plotted. f Dropout modeling (dropout rate versus FPKM of bulk sequencing) for EL4 cells by method are shown. A left-shifted curve indicates higher sensitivity, that is, fewer dropouts at lower expression levels. Sensitivity of methods for EL4 cells ranked in the following order: 10x 3′ v3 > 10x 5′ v1 > 10x 3′ v2 > ddSEQ > Drop-seq > ICELL8 3′ DE > ICELL8 3′ DE-UMI. Cells with high mitochondrial expression rates were excluded from this calculation IVA12 and Jurkat cells (Fig. 3e). Although the ICELL8 3′ DE method had a low per-cell gene detection rate, when pooling more than 30 cells this method exhibited comparable levels of gene detection to 10x 3′ v2, ddSEQ and Drop-seq methods. This is likely due to the high false-negative rate of genes with overall low expression levels in the ICELL8 3′ DE method. The cumulative number of genes for TALL-104 cells was lower than the other cell types and the relative detection rates across methods did match trends seen in other cell types, possibly due to the low read depth and cell recovery for this cell type.
We also examined the ability of each method to detect genes at various expression levels by calculating the dropout rate, the conditional probability that a gene is not detected in a given cell. The dropout rate was modeled as a function of the expression level in bulk RNAseq (FPKM) for each cell type. We used a nonlinear least square fit of the data that accounted for the activity of reverse transcriptase described by Michaelis-Menten kinetics [21][22][23]. Here, higher gene detection sensitivity as a function of fewer dropouts at lower expression levels, was indicated by left-shifted curves and lower Gene Detection 50 (GD 50 ) value, the point at which this curve reached a detection probability of 0.5. The GD 50 metric represented the expression level of a gene we would expect to be detected in half of the cells, and could help guide expectations of detection rates for genes of interest based on their expression in bulk RNAseq. For EL4 cells, 10x Genomics methods were the most sensitive with 10x 3′ v3 having the lowest GD 50 at 13.6 FPKM, followed by the 5′ v1 and 3′ v2 chemistries  Table 1). We observed similar trends across methods with the other three cell types, which had greater variance in read depth and transcript detection (Supplement Figs. 4b-d).

mRNA detection affects the fidelity of single-cell and pseudo-bulk transcriptomes
We next investigated how well single-cell expression recapitulates immune signatures from bulk RNA-seq. For this purpose, we correlated expression of a set of marker genes (defined using bulk RNA-seq data; see Methods) between bulk RNA-seq and single cells. In general, cells with more genes detected had a better concordance to bulk RNA-seq immune signatures (Supplement Fig. 5). We observed higher Pearson correlation coefficients for 10x 3′ v3, 5′ v1 and ddSEQ methods against EL4, IVA12 and Jurkat bulk RNA-seq expression signatures (Fig. 4a). ICELL8 3′ methods, with generally fewer genes detected, demonstrated the lowest correlation values. Overall, poorer correlation to TALL-104 bulk RNA-seq was in line with fewer transcripts and genes detected for this cell type in the single-cell data.
We further examined the correlation between pooled single-cell RNA-seq pseudo-bulk transcriptomes and bulk RNA-seq data using all genes. Averaging geneexpression profiles across single cells is commonly performed to compare data across experiments and is thought to resemble bulk data. For EL4, IVA12 and Jurkat, most methods began to plateau around a correlation value of r = 0.9 with a pool of 10-20 cells (Fig. 4b). The maximum correlation values were lower for ICELL8 3′ DE (r = 0.90 and 3′ DE-UMI methods (r = 0.81-0.90) compared to other methods (r=0.92-0.95), and correlation was generally lower for TALL-104 cells in all methods, suggesting that lower mRNA detection sensitivity not only affects data fidelity at a per-cell level but also impacts aggregated single-cell data. Although samples were prepared under identical conditions, we cannot rule out any effects of biological differences between samples. However, it is likely that higher variance in the detection of lowly expressed transcripts drives much of the difference in expression observed in single-cell and bulk RNA-seq, and aggregation across individual cells may not increase the correlation of expression for these lowly-expressed genes. Notably, our data indicates that detection sensitivity is not necessarily improved by pooling across single cells and results from such analyses should be interpreted cautiously.

Higher mRNA detection sensitivity improves identification of differentially-expressed genes
To assess the performance of differential expression analysis for each method, we focused on the two mouse cell types (EL4 and IVA12) because these cells had more similar sequencing depths compared to the two human cell types. We used the hurdle model proposed by Finak et al. [24] to identify differentially-expressed (DE) genes with an FDR < 10 − 4 (Fig. 5a). For each DE analysis we sampled 199 cells, the lowest number of recovered cells by any method. Gene expression data was normalized by each cell's library size (see Methods), which correlated highly to scaling factors derived by deconvolution from cell pools (mean +/− sd r =0.99 +/− 0.016) (Supplement Table 4) [25]. Over 3000 DE genes were identified in 10x Genomics methods, the highest among the methods tested, followed by Drop-seq (avg~2700 genes) and ddSEQ (avg~2800 genes), while the two ICELL8 methods had the fewest numbers of DE genes (avg1 800 and~1000 genes) ( Fig. 5b; Table 1). We observed similar trends with two alternative commonly-used tests for differential expression, a Mann-Whitney-Wilcoxon test [26] and a likelihood ratio test with an negative binomial generalized linear model [26,27] (Supplement Fig. 6a). Performing DE analysis using all the cells obtained in each method increased the number of genes passing the significance threshold due to the increased statistical power (Supplement Fig. 6b). When we considered the 5,868 genes that had more than a 1.5-fold difference in bulk RNA-seq data as a proxy for groundtruth expression differences, the trend remained the same ( Fig. 5b; Supplement Figs. 6a, 6b; Table 1). To further evaluate the effectiveness of calling DE genes in terms of quantity and quality, we assessed recall and precision of each technology. Recall was calculated as  for mouse genes in EL4 vs IVA12 cells. Shown is a representative subsample of mouse cells (n=199) using the 10x 3′ v2 method demonstrating the criteria for declaring DE genes (FDR < 10 − 4 ); DE genes are highlighted in red. b Number of significant DE genes calculated using MAST between EL4 and IVA12 cells by method. Error bars represent the 95% confidence interval. The total number of significant DE genes are plotted in red, the number of DE genes with > 1.5-fold difference in expression in bulk RNA-seq (5868 genes) are plotted in cyan. c Median bulk RNA-seq expression (FPKM) of all significant DE genes (red) or DE genes with > 1.5-fold difference (cyan). Error bars represent 95% confidence interval the fraction of DE genes in bulk RNA-seq data that was identified as differentially expressed in the single-cell data. Precision was defined by the fraction of DE genes from single-cell data that were also differentially expressed in bulk RNA-seq data. We observed the highest recall rates for 10x Genomics methods (0.595 -10x 5′ v1; 0.577 -10x 3′ v3) ( Table 1) indicating lower type II error rates for these methods with higher mRNA detection sensitivity. Precision was reasonably high across all methods (0.728-0.887) ( Table 1), implying a moderately low number of falsely called DE genes overall.
In general, we observed that fold changes in single-cell data correlated well with gene expression differences in bulk RNA-seq data, especially for genes with higher expression levels (Supplement Fig. 6c). In contrast, genes with low expression correlated poorly with smaller fold changes observed in the single-cell data, consistent with higher dropout probabilities for lowly-expressed transcripts. Also, the distribution of FPKM values was generally higher for DE genes from single-cell data compared to genes with at least 1.5-fold changes in bulk RNA-seq (Supplement Figs. 6d, e), indicating that all methods exhibit a considerable type II error rate with respect to lowly-expressed genes. Furthermore, we found the lowest median FPKM in bulk RNA-seq for DE genes from the methods with the highest detection sensitivity, 10x 3′ v3 (median = 3.43 FPKM) and 10x 5′ v1 (median = 3.45 FPKM), and the highest median FPKM for the ICELL8 3′ DE-UMI method (median = 4.91 FPKM), which had the lowest transcript detection sensitivity ( Fig. 5c; Supplement Figs. 6d, e).

Recovery of low-mRNA-content cells
Many immune single-cell experiments profile an undefined mixture of cell types that potentially vary in mRNA content. Efficient cell recovery across diverse cell types is important to accurately characterize the diversity of these populations. We next compared the differences in cell recovery between the four cell types included in our sample mixture. In particular, TALL-104 cells are smaller (5 μm diameter) than the other three cell-types (EL4/IVA12 -11 μm, Jurkat -10 μm diameter) and, in our hands, more difficult to culture, with viability rates under 80% and slow growth. Across all experiments, TALL-104 cells had the lowest distribution of reads, UMIs, and genes recovered (Supplement Figs. 1c and 4a), such that they were more susceptible to exclusion based on read or UMI thresholding of CIDs to distinguish cells from ambient noise.
We classified cells by correlating their expression profile to gene signatures from bulk RNA-seq. This enabled us to examine the recovery of each cell type with common thresholding points on the log-log curve of total reads or UMIs vs rank ordered CIDs. In droplet-based methods, thresholding removes a large fraction of CIDs that are derived from droplets containing a barcoded bead, but no cell. Two points are commonly used as thresholds: the knee point where the signed curvature is minimized and the inflection point where first derivative is minimized [28] within a given range of total reads or UMIs ( Fig. 6a; Supplement Fig. 2a). While the fraction of classifiable cells on each side of these two thresholds varied across experiments, both thresholds were able to capture EL4, IVA12 and Jurkat cells (Fig. 6a). Notably, most TALL-104 cells would be removed by thresholding using the stringent knee point, with only one experiment having any TALL-104 cells above this threshold. While the more permissive inflection point performed better at capturing TALL-104 cells, all TALL-104 cells would be considered ambient noise using this threshold in samples (Fig. 6b).
As low mRNA recovery might reflect poor cell health, e.g., due to mechanical stress in cell preparation, we also examined the fraction of cells below each threshold that had a high rate of mitochondrially encoded UMIs or reads, an indication of broken or poor-quality cells (Fig.  6c). Accordingly, a large fraction of cells removed by the knee point cutoff had a high mitochondrial rate. However, significant numbers of cells had reasonable mitochondrial rates, including many TALL-104 cells. This suggests that lower transcript recovery in TALL-104 cells is related, at least in part, to lower overall mRNA content and not cellular damage, although we cannot completely rule out other cell-quality issues that do not affect the mitochondrial rate. Overall, some cell populations could be lost when thresholding based on total UMI or read count is too stringent. It would be beneficial to include more cells at the initial CID selection step and filter cells more stringently in downstream analyses with other cell-quality criteria to avoid loss of cell populations with low mRNA content. Of note, this issue does not affect ICELL8 methods as all cell-related barcodes are known a priori when cell-containing wells are selected for processing.

mRNA detection sensitivity varies across heterogenous cell types
In heterogeneous populations, mRNA capture rates and read depths may vary across subpopulations. We explored differences in mRNA detection sensitivity across the four cell types in our samples. As it is common in single-cell profiling of mixed populations, we observed differences in read and UMI recovery across cell types in each method (Fig. 7a). When the entire data was used to model dropout rates, we found that cell types with the lowest read distributions, such as TALL-104 cells, had increased dropout probabilities and GD 50 levels across all seven methods tested ( Fig. 7b; Supplement Figs. 4b,  d). We hypothesized that differences in dropout rates were predominantly driven by differences in mRNA detection rates and compared cells from each cell type with similar numbers of UMIs. Cells were separated into six quantile bins based on the number of UMIs (Fig. 7c) and dropout rates for each cell type were modeled. Because there were few TALL-104 cells recovered in many samples (Supplement Table 4), we focused on data from the 10x 3′ v3 method which had sufficient numbers of cells available for analysis. We found that with increasing number of total UMIs, GD 50 values and dropout rates decreased. Notably, GD 50 levels were similar across cell-types within a bin (Fig. 7d). Slight differences in GD 50 were related to variation in mean number of UMIs Fig. 6 Cell recovery by cell identifier (CID) thresholding: a Example of using the transposed log-log empirical cumulative density plot of the total counts of each CID to identify cell-containing droplets . Common thresholding points, the 'knee' and the 'inflection' are indicated with arrows. The knee is the point at which the signed curvature is minimized, the inflection is the point at which the first derivative is minimized. b The fraction of cells above the knee or inflection are plotted. c Fraction of cells below mitochondrial rate threshold (listed in Supplement Table 4) relative to knee point. Samples are colored by cell sample mixture listed in Supplement Table 2 for a particular cell type. TALL-104 cells, which fell into the two lowest bins due to the low numbers of transcripts detected, had similar dropout rates as other cell types in the same bin (Fig. 7d, e).
To explore the range of mRNA capture rates across lymphocytes in vivo, we profiled a single PBMC sample using the 10x 3′ v3 and 5′ v1 methods. Cells were classified by mapping gene expression to a reference PBMC CITE-Seq dataset [27,29] (Fig. 8a). We recovered 9,450 and 7,780 classified cells for the 10x 3′ v3 and 10x 5′ v1 methods, respectively. Of the classified cells, we identified a variety of cell types comprising PBMC cells including cells from the lymphoid (e.g. B and T cells) and myeloid lineages (e.g. dendritic cells and monocytes). We observed similar fractions of cell types in data from both methods (Supplement Figs. 7a, b). Cell classes were found to express expected cell-type-marker genes (Fig. 8b) and had distinct expression of DE genes (Supplement Fig. 7c). For instance, we observed enrichment of CD19, which encodes part of the BCR co-receptor, in B cells and CD3E, which encodes part of the TCR co-receptor in T cells (Fig. 8b). The median numbers of genes and UMIs detected across various cell types ranged from under 500 genes and 1,000 UMIs detected in platelets (3′ v3-333 genes/773 UMIs; 5′ v1-450 genes/941 UMIs) to over 3,500 genes and 18,000 UMIs in proliferating CD4 T cells (3' v3-4,156 genes/ 18,469 UMIs; 5′ v1-3,737 genes/18,877 UMIs) (Fig. 8c, d). Similar to our mixed cell line benchmarking experiments, we observed a variation in read depth across cell types related to gene and UMI detection statistics (Supplement Fig. 7d).

Discussion
In this paper we explored several important quality metrics of single-cell RNA-seq methods: efficiency of cell recovery, library efficiency and mRNA detection sensitivity. High recovery of cells put into a system and minimal loss of reads due to noise are important, especially for limited samples with few cells. The differences in performance we observed across these methods are directly related to the design of cell and mRNA capture. To partition cells, the methods we tested either use microfluidics to generate nanoliter sized droplets or to partition cells on microwell chips. Ideally, those microreactors contain exactly one bead and one cell. In practice, however, the number of cells per microreactor approximately follows a Poisson distribution. While the loading probability of cells is similar across these methods, the distribution of barcoding oligonucleotides varies. The loading statistics of Drop-seq and ddSEQ follow a Poisson distribution, while 10x Chromium chips load beads in a sub-Poissonian fashion. The latter enables an increased theoretical capture rate of~60%. Sparser loading of barcoding beads in ddSEQ and Drop-seq minimizes the occurrence of bead doublets, but at the expense of lower maximum recovery rates of~3% and~5% respectively. Oligonucleotide loading is tightly controlled on ICELL8 chips with the pre-printing of oligonucleotides, providing a priori knowledge of cell-related CIDs when coupled with cell imaging. The ability of the ICELL8 to selectively process a subset of wells, those containing cells identified by fluorescence imaging, greatly improves this method's library efficiency compared to techniques that process all partitions. Accordingly, we observed the highest fraction of cell-related reads in ICELL8 libraries, especially compared to ddSEQ and Drop-seq methods with a large fraction of bead containing droplets lacking a cell and increased potential for ambient RNA. Quality of single-cell suspensions are also important factors to these metrics. Variable cell viability and inefficient cell quantification of our samples may negatively impact cell-capture and multiplet rates and explain the discrepancy between expected and observed rates.
In our experiments, we observed the highest mRNAdetection sensitivity in the 10x 3′ v3 and 5′ v1 methods, with the highest numbers of transcripts and genes detected and lower probabilities of gene dropouts at lower expression levels. Our results corroborated previous reports about the performance of some of the methods assessed in our work [10][11][12][13]. Here, we extended these findings by demonstrating an increased sensitivity of the more recent 5′ v1 and 3′ v3 methods, which also validated claims made by 10x Genomics. Further, we found ICELL8 methods had the lowest mRNA detection sensitivity of the methods tested for the assayed immune cell types. Of note, this is partly in disagreement with two papers reporting better performance of the ICELL8 3′ DE method relative to 10x 3′ v2 and Drop-seq [10,30]. Differences to the performance we have observed may be related to cell types used in each study. For example, ICELL8 3′ DE detected significantly fewer genes per cell compared to 10x 3′ v2, ddSEQ and Drop-seq in B-cells in Mereu et al. [10], which is on par with our findings.
Gene detection rates may be increased by greater sequencing depths, particularly for lowly-expressed genes (Supplement Fig. 2b; Supplement Table 3). However, high-throughput methods aim to sequence many cells concurrently for a broad exploration of populations, at the expense of the completeness of individual transcriptional profiles. Here, libraries are not routinely sequenced to full saturation due to high sequencing costs. To be able properly assess mRNA detection sensitivity, we normalized samples to a common sequencing depth of~50,000 reads per cell by downsampling raw reads. Additional iterations of this stochastic process showed little variation in the resulting analysis (Supplement Table 5), suggesting our normalization step did not introduce any technical bias. Notably, the resulting sequencing depth is typical for common high-throughput single-cell RNA-seq experiments. Therefore, our data can provide expectations for mRNA and gene detection rates in experiments with a similar sequencing depth using other immune cells.
Multiple aspects of single-cell RNA-seq protocols such as efficiencies in mRNA capture, reverse transcription, and cDNA amplification can affect the overall mRNA detection sensitivity. Efficient mRNA capture may be impacted by the template switch mechanism, as only first strand cDNAs which have successfully switched templates can be amplified. ddSEQ, the sole method we have tested that does not utilize template switching, is not as sensitive as the 10x Genomics methods, possibly due to other technical differences. Another source of inefficiency may arise in the reverse transcription cleanup step prior to cDNA amplification. We found that the addition of a primer digestion step to the ICELL8 3′ DE protocol in the 3′ DE-UMI method decreased the mRNA detection sensitivity. Additional improvements to mRNA capture such as improving oligonucleotide chemistry for mRNA capture and cDNA amplification may enhance mRNA detection sensitivity and improve singlecell RNA-seq techniques in the future.
Increasing the sensitivity of mRNA detection greatly benefits downstream analyses of immune profiling datasets. Sampling transcriptomes with high fidelity results in a greater likelihood of detecting rare transcripts for identifying DE genes at lower expression levels. In general, our results showed that expression profiles of cells with high mRNA content generated by methods with a high mRNA detection rate correlated well to bulk-RNAseq data. Also, the number of DE genes as well as the overall correlation in fold-change differences to bulk RNA-seq improved with higher mRNA detection sensitivity. Here, all 10x Genomics methods, which had the highest mRNA detection sensitivity, exhibited a high correlation to bulk RNA-seq data as well as more DE genes with a lower range of expression levels in bulk data. Notably, our results reveal that the higher variance in the detection of lowly expressed transcripts commonly observed in techniques with lower sensitivity is not necessarily overcome by pooling across single cells when performing pseudo-bulk analyses. Strengthening the underlying mRNA detection sensitivity can improve downstream analyses to identify marker genes as well as classify subtle immune subtypes and cell states with small, but significant differences in gene expression, and can facilitate the identification of novel immune subpopulations [31].
Importantly, our data also provides insight into the performance of single-cell techniques across heterogenous populations of immune cells. Although the immune cell lines used in this study may differ from lymphocytes found in vivo, the standardized cell culture conditions for these cells helps reduce expression variability compared to primary cells and facilitated data analysis. Nonetheless, our results provide better guidance for immune profiling in contrast to the higher mRNA content cell lines such as carcinoma or stem cells commonly used in previous comparison papers [8,11,12]. The inclusion of small TALL-104 cells can allow us to assess the sensitivity of these methods in subpopulations with comparatively low mRNA levels. Relaxing CID filtering criteria based on total UMI or total read counts can improve recovery of TALL-104 cells for downstream analyses. Notably, smaller immune cells such as TALL-104 cells have a higher gene dropout rate that is related to the number of transcripts captured from a cell. Thus, additional quality metrics also need to be calibrated carefully for the identification of small immune cell types. Cell imaging on the ICELL8 cx to identify otherwise challenging cells of interest such as TALL-104, can also be used to recover populations of small cells. In general, TALL-104 cells exhibit lower mRNA detection rates and higher dropout rates. Thus, we can expect that other immune cell types with low mRNA content exhibit similar dropout rates as other immune cells with a comparable rate of transcript detection.
Cell-compositional bias may also arise during sample preparation. For example, density gradient purification of PBMCs may selectively reduce the number of granulocytes, or cell populations may be lost due to greater sensitivity to dissociation and purification induced stress. Furthermore, large cells are filtered from single-cell suspensions to avoid clogging narrow microfluidic channels. 10x Chromium microfluidics are~50-60 μm in diameter, while Drop-seq channels are typically~125 μm [15] and ICELL8 nozzles have a 125 μm bore size, and these systems can theoretically process cells up to that diameter in size. However, most immune cells are smaller than the 40 μm filter size commonly used during single-cell sample preparation and can be captured by a variety of systems.
We can translate the observation in our data in vitro that dropout rates and gene detection sensitivity is related to the total number of transcripts captured from a cell (Fig. 7) to the variety of immune cells found in vivo. In our 10x Genomics PBMC datasets, we observed a range of mRNA detection rates across various cell types with proliferating cells and hematopoietic stem cells (HSPCs) displaying the highest rates of UMIs and genes detected. Additionally, dendritic cells (DCs) and monocytes (MCs) displayed elevated UMI and gene detection rates relative to most T and B cells (Fig. 8c, d), similar to observations in other PBMC datasets [10]. Taken together, this suggests that these cells would have a lower dropout rate and more robust DE gene detection. These results can be further extended to immune cell populations, such as granulocytes, which have been excluded currently during sample preparation. It would be interesting to further explore the capabilities of various single-cell RNA-seq and sample preparation methods to assay other immune populations that are particularly difficult to survey.

Conclusions
Our comparison of data from seven high-throughput single-cell methods can help guide method selection for immune profiling experiments. Our data can provide reasonable predictions of transcript and gene detection rates for lymphocytes, as well as insight into performance across heterogenous immune cell populations with varying mRNA content. Our results suggest looser thresholding of CIDs in droplet-based methods can be beneficial to retain cell populations with low-mRNA content. Additionally, smaller cells such as TALL-104 cells have a higher gene-dropout rate that is related to the number of transcripts captured from a cell.
Each method we tested showed advantages that could benefit immune cell profiling. In this study, 10x Genomics methods had the highest cell recovery and mRNA detection sensitivity, making these techniques particularly suited to experiments with limited samples and experiments that require detection of genes with lower expression levels. Here, the performance was comparable between the 10x 5′ v1 chemistry and 3′ v3 methods, making the 5′ v1 chemistry an appropriate substitute when pairing gene expression analysis with TCR/BCR clonotyping. 10x Genomics and Illumina/Bio-Rad (ddSEQ) sell reagents in kits, facilitating adoption of these methods, but limiting customization of protocols. Takara Bio also sells reagent kits for the ICELL8, however, protocols on the instrument are customizable allowing for greater flexibility. Drop-seq is also an open system that is fully customizable and custom reagents such as target-capture oligonucleotide beads [32,33] can be easily integrated into the protocol. The fluorescent imaging capabilities of the ICELL8 cx enable the pairing of sequencing and imaging data in downstream analysis. Our ICELL8 experiments demonstrated high library efficiencies with a large fraction of reads assignable to cells and potential utility to recover low-mRNA-content cells, such as TALL-104 cells, that are more susceptible to stringent read and UMI thresholding. Overall, our data shows that all methods exhibit specific strengths which can be aligned with experimental goals, sample limitations, and budgetary constraints.

Chromium
Cells were resuspended in PBS with 0.04% BSA at a stock concentration within the recommended range (typically~1e6 cells/mL) and loaded at a volume to target between 2000 and 6000 cells depending on sample. Libraries were prepared according to manufacturer's instructions for each chemistry. Libraries were sequenced on a NextSeq500 or NovaSeq (Illumina) according to manufacturer's guidelines: 3′ v3 -28x8x0x91, 3′ v2 -26x8x0x98, and 5′ v1 -26x8x0x110 [read1xindex1xindex2xread2].
ddSEQ Cells were resuspended in PBS + 0.1% BSA and loaded at a concentration of either 2000 or 2500 cells/ul. Libraries were prepared according to manufacturer's instructions. Libraries were sequenced on an Illumina NextSeq500 (68x8x0x150) at a 3pM concentration with provided custom read 1 primers.

Drop-seq
Libraries were prepared following the McCarroll Lab Dropseq protocol (http://mccarrolllab.org/Drop-seq/) [15], with cells and beads encapsulated using the Dolomite scRNA-Seq system. Oligo beads (ChemGenes) contained the original Drop-seq polyT primer with a VN anchor at the 3′ end (TTTTTTTAAGCAGTGGTATCAACGCAGAGTAC JJJJJJJJJJJJNNNNNNNNVTTTTTTTTTTTTTTTTTTT TTTTTTTTTTTVN). Cells were resuspended in PBS + 0.01% BSA and loaded at a concentration of either 150 or 300 cells/ul. For encapsulation, cell and bead solutions were loaded at 30 ul/min and encapsulation oil was loaded at 200 ul/min. Libraries were sequenced on a NextSeq 500 (Illumina) (21x8x0x138) at a 3pM concentration with custom Read1 Drop-seq primers (GCCTGTCCGCGGAAGCAGTGGTATCA ACGCAGAGTAC).

ICELL8 cx
Cells were resuspended in 1X PBS and loaded at a final concentration of 2800 cells/ml. Only wells containing single cells as determined by the Cell-Select software using default settings were processed. Libraries were prepared using the Takara Bio SMARTER ICELL8 cx 3′ DE user manual or an alternate protocol that separates the initial reverse transcription reaction from cDNA amplification. In short, after RT, cDNA was removed from the chip and cleaned and concentrated with a Zymo Clean & Concentrator-5 kit. cDNA was then treated with 20 U of Exonuclease for 30 min at 37°C and the enzyme was deactivated with 20 min at 80°C. cDNA was then amplified and tagmented using the Illumina Nextera XT kit for the final sequencing libraries. Libraries were sequenced at 25x8x0x131.

Peripheral blood mononuclear cells (PBMC) chromium libraries
Blood samples were collected into sodium-heparin tubes (Beckton Dickinson) from a healthy adult donor through an internal blood donation program. Consent was kindly provided by the donor, and the sample was de-identified upon receipt of the sample. Blood was diluted 1:1 with PBS without Ca 2+ and Mg 2+ and centrifuged at 800 g × 30 min x room temperature without braking in Accuspin tubes (Sigma-Aldrich) containing 15 ml of Ficoll-Paque PLUS density gradient media (GE Healthcare). Leukocytes were carefully removed after centrifugation and placed in a new 50 ml tube pre-filled with chilled PBS without Ca 2+ and Mg 2+ and centrifuged again at 800 g × 7 min × 4 degrees C. After centrifugation, the cell pellet was counted and resuspended in PBS with 0.04% BSA at 1000 cells/uL. 10x 5′ v1 and 10x 3′ v3 methods (10x Genomics) were performed in succession on the same day using the same PBMC preparation. Libraries were prepared according to the manufacturer's protocol for each method to capture 10,000 cells per sample. For the 10x 5′ v1 preparation, 13 cycles were used for cDNA amplification, and 16 cycles were used for sample-index PCR amplification. For the 10x 3′ v3 prep, 11 cycles were used for cDNA amplification, and 14 cycles were used for sample-index PCR amplification. Libraries were sequenced on an Illumina NovaSeq6000 instrument using an S4 flow cell at 28x8x0x91.

Bulk RNA sequencing
RNA was isolated from cells using the RNeasy kit (Qiagen). Libraries were generated using 1μg of total RNA using a modified Illumina TruSeq Stranded mRNA protocol. Reverse transcription was performed with the addition of RNaseOut (Invitrogen) and actinomycin-D (MP Biomedicals). The resulting product was cleaned using AMPure RNAClean beads. Additionally, secondstrand synthesis was performed using dUTP instead of dTTP and an additional USER (New England Biolabs) digestion step was incorporated after size-selection. Libraries were sequenced at 101x6x0x101 on a HiSeq (Illumina) to a minimum depth of 30 million reads per sample.

Read alignment and transcript counting
Cell-line data was aligned to a combined human/mouse reference genome obtained from 10x Genomics: reference "cellRanger_1.2.0" composed of hg19 with Ensembl 82 and mm10 with Ensembl 84. PBMC data was aligned to the human reference only. Bulk data was aligned using STAR v2.5.1b and quantified using featureCounts v1.6.3 [34].
For normalizing single-cell libraries, we considered the fact that cell types with low mRNA content are more prone to dropouts and thus, may compromise proper normalization based on the mean read count per CID. Thus, robust library scaling factors were derived by using only cells with sufficiently high mRNA content. For this purpose, we calculated Gaussian kernel density estimates with a smoothing bandwidth determined via the Normal Reference Distribution method as provided by R v4.0.2. Local maxima of the density function were sorted in decreasing order. The first significant mode, d i , was considered the sequencing depth of the cell population with the highest mRNA content for sample i (Supplement Fig. 1a). Scaling factors were then derived by s i = min i (d i )/d i . FASTQ files were downsampled by factor s i using seqtk v1.3-r106. This resulted in libraries with around 50,000 reads per cell. Downsampling was repeated three times for four representative samples for an assement of the margin of error. Alignment statistics for normalized data were generated using PicardTools CollectRNA-seqMetrics using the aligned (and filtered) BAM files from each pipeline. Cell mixture (EL4, IVA12, Jurkat, and Tall) and PBMC experiments were scaled independently to the lowest local maxima value in each group.
To assess step-wise quality metrics of each method's original data we applied a uniform pipeline (Supplement Table 3). First, we demultiplexed and aligned sequenced reads using STAR as recommended by each scRNA-seq method. Here, ddSEQ data was processed using SureCell RNA Single-Cell v1.1.0 (with STAR v2.5.2b). Then, highquality reads were filtered by MAPQ scores (MAPQ = 255 for 10x, Drop-Seq, and ICELL8; MAPQ = 50 for ddSEQ) using samtools 1.9 [37] and mapped against mouse (Ensembl 84) and human (Ensembl 82) gene annotations using featureCounts from the subreadpackage 2.0.1 with parameters "-t exon -g gene_id -C -p --primary". We wrote custom Java code (v JDK 11.0.1) to generate read and UMI count matrices. Here, we considered potential sequencing errors in UMIs and corrected these as follows: reads got grouped by <barcode, gene, UMI> tuples. If two groups had the same <barcode, gene> pair, but their UMIs differed by a single base, the UMI of the smaller group was corrected to the UMI of the larger group.
Cell mixture cell type classification.
Cells were assigned to one of four input cell classes by their similarity to cell type signatures from bulk RNAseq data. First, we selected highly expressed genes with FPKM > 50 in any bulk RNA-seq sample. Next, gene expression was contrasted between bulk RNA-seq samples from the same species (EL4 vs IVA12 and Jurkat vs TALL-104) and we filtered 184 highly variable genes (93 murine, 91 human) with a ln fold difference > 3 between the two cell types. Pearson correlation, r, was calculated between each gene expression vector of each cell x i ∈ ℝ 184 and each gene expression vector of each bulk RNAseq sample y j ∈ ℝ 184 : f i ( y j ) = r(ln(x i + 1), ln(y j + 1)). A cell type was assigned using the following four classification rules derived from the correlation coefficient distributions: We relaxed these rules for ICELL8 data to account for overall lower CID numbers and method-specific distribution differences: Cells with two or more assigned cell types, were removed. For each sample, we classified the top n CIDs ranked by total read count with n = 2 × number of expected cells.
To analyze the thresholding method using the transposed log-log empirical cumulative density plot of the total read counts of each CID, we calculated the knee and inflection points as described in Lun et al. [28]. Briefly, the knee and inflection points are interpreted as determinants of the range in which the curve transitions between cell-containing droplets/wells with high mRNA content and empty droplets/wells with ambient RNA. The total count per CID is modeled as a function of decreasing CID rank by fitting cubic smooth splines with 20 degrees of freedom. The knee point is defined as the point on the curve where the signed curvature is minimized, the inflection point is defined as the point where the first derivative of the spline basis functions is minimized. We defined a lower bound for fitting the smooth splines as the minimum number of total reads of all classified CIDs. Calculations were performed using the R package DropletUtils v1.6.1. For the analysis of the original (i.e., not down-sampled) data described in Supplement Fig. 2, CIDs above the inflection point were considered genuine cells.
PBMC cell type classification PBMC data was projected onto an annotated PBMC CITE-Seq reference dataset [29] using Seurat (v3.9.9.9003) [27]. Each cell received an assignment and prediction score to a cell class in the reference. For analysis, we included CIDs with reads greater than the inflection point calculated as described above using the top 12,000 CIDs by read count and more than 200 genes detected. Data was normalized using SCTransform [38]. Cells with a prediction score > 0.5 and a mitochondrial rate < 20% were retained for further analysis. Marker genes were identified using the FindMarkers() function using the Wilcoxonrank-sum test comparing cells in each cell population with more than 10 cells to all other cells in the dataset.

Doublet rate estimation
CIDs were classified as multi-species multiplets if the number of total counts from each species exceeded the 10th percentile of the distribution for that species. The total count distributions were calculated using cells assigned in the step described above. Multiplet rates were calculated by taking CIDs above the inflection point and dividing the number of multi-species multiplets by the total number of cells. To obtain the true multiplet rate that considers non-detectable intraspecies multiplets and accounts for differing proportions of human and mouse cells, this fraction was divided by an adjustment factor λ: 9. λ ¼ 2 n human n mouse n human þ n mouse − n hm ð Þ 2 where n human is the number of human cells, n mouse is the number of mouse cells, and n hm is the number of interspecies multiplets.

Cell recovery rates
Cell recovery rates were calculated as the number of CIDs above the inflection point divided by the total number of cells loaded onto a system. Theoretical capture rates for 10x Genomics and ddSEQ methods are based on expected recovery numbers given in the user manual. Theoretical capture rates for Drop-seq were based on a 5% droplet occupancy of oligo beads. Theoretical capture rates for iCELL8 protocols were calculated based on a Poisson distribution of wells with single cells based on an average cell occupancy of 1 cell per well.

Estimation of ambient-RNA contamination
For each sample, UMI count matrices of filtered cells were analyzed using the decontX() function in the celda package (v1.2.4) [20] using cell classifications as cluster labels. Estimates of contamination were generated by calculating the difference in UMI counts after decontamination relative to the total number of UMIs detected for species-specific genes of a cell.

Pseudo-bulk analyses
Pseudo-bulk analyses analyzing correlation to bulk RNA-seq data and gene detection rates were performed by summing UMI counts across multiple cells. A subsample with at least 100 classified cells or the maximum number of classified cells recovered that meet the mitochondrial rate threshold were selected from each method. Mitochondrial rate thresholds were determined on a per-sample basis based on distribution of rates (Supplement Table 4). Various numbers of cells (1-50) were randomly sampled from this pool and expression values were averaged. The aggregated expression matrix was used for analyzing its correlation to bulk RNA-seq and for quantifying the number of detected genes. Mean values across 50 iterations for these metrics were used for visualization.

Dropout modeling
Dropout rates denote the fraction of missing values in a gene's expression vector. We estimated the dropout rate for each gene from the species of the cell type considered. Cells included in the analysis were filtered by fraction of mitochondrial counts to remove poor quality cells. Dropout rate of bulk RNA-seq data was modeled by fitting the function f(x) = a * exp(−b * x) where x is the expression level using nonlinear least squares. GD 50 FPKM numbers were calculated as 0.5 = a * exp(−b * x) using the fitted coefficients a and b. Dropout rates were similarly calculated for single-cell RNA-seq data by binning cells by mRNA detection rates. 10x 3′ v3 cells were placed into six bins based on distribution percentiles resulting equivalent numbers of cells in each bin. Dropout models were calculated for a random subset of 50 cells for each cell type with at least 50 cells in each bin and results were averaged across 50 iterations.

Differentially expressed gene identification
Statistical differences between gene expression of EL4 and IVA12 cells were identified using the hurdle model provided in the MAST R package v1.12.0 [24], a Wilcoxon rank-sum test, or the negative binomial generalized linear model available in the MASS R package v7.3-51.5. Genes that had an FDR-adjusted p-value < 1e-4 were declared differentially expressed. Cells from multiple replicates from each method were pooled in order to maximize sample sizes. Downsampling of cells was performed to the smallest number of observed cells from a single cell type (n = 199); this step was repeated 10 times to assess the error margin. UMI count data was used for 10x, ddSEQ, Drop-seq, and ICELL8 3′ DE-UMI samples, read count data was used for ICELL8 3′ DE samples. Expression count matrices were normalized by library size factors (i.e., total counts per cell), multiplied by 10 4 , and log-transformed by log 2 (x + 1). Lognormalized count matrices were subjected to MAST, normalized count matrices were used for the Wilcoxonrank sum test, and raw count data was input to the negative binomial generalized linear model. Library size estimates were also determined using the computeSumFactors() function in the scran (v1.14.6) package [25] using cell classifications as cluster identities. These estimates were then correlated to total counts per cell.
Recall and precision of each method were calculated based on true-positive (TP), false-positive (FP) and falsenegative DE genes (FN) as follows: where TP are significant DE genes as calculated with MAST with a 1.5 fold change in bulk RNA-seq data, FN are not differentially expressed in single-cell data, but had a 1.5 fold change in bulk data, and FP are significant DE genes with less than a 1.5 fold change in bulk data.