Detection of high variability in gene expression from single-cell RNA-seq profiling
- Hung-I Harry Chen^{1, 2},
- Yufang Jin^{2},
- Yufei Huang^{2}Email author and
- Yidong Chen^{1, 3}Email author
Published: 22 August 2016
Abstract
Background
The advancement of the next-generation sequencing technology enables mapping gene expression at the single-cell level, capable of tracking cell heterogeneity and determination of cell subpopulations using single-cell RNA sequencing (scRNA-seq). Unlike the objectives of conventional RNA-seq where differential expression analysis is the integral component, the most important goal of scRNA-seq is to identify highly variable genes across a population of cells, to account for the discrete nature of single-cell gene expression and uniqueness of sequencing library preparation protocol for single-cell sequencing. However, there is lack of generic expression variation model for different scRNA-seq data sets. Hence, the objective of this study is to develop a gene expression variation model (GEVM), utilizing the relationship between coefficient of variation (CV) and average expression level to address the over-dispersion of single-cell data, and its corresponding statistical significance to quantify the variably expressed genes (VEGs).
Results
We have built a simulation framework that generated scRNA-seq data with different number of cells, model parameters, and variation levels. We implemented our GEVM and demonstrated the robustness by using a set of simulated scRNA-seq data under different conditions. We evaluated the regression robustness using root-mean-square error (RMSE) and assessed the parameter estimation process by varying initial model parameters that deviated from homogeneous cell population. We also applied the GEVM on real scRNA-seq data to test the performance under distinct cases.
Conclusions
In this paper, we proposed a gene expression variation model that can be used to determine significant variably expressed genes. Applying the model to the simulated single-cell data, we observed robust parameter estimation under different conditions with minimal root mean square errors. We also examined the model on two distinct scRNA-seq data sets using different single-cell protocols and determined the VEGs. Obtaining VEGs allowed us to observe possible subpopulations, providing further evidences of cell heterogeneity. With the GEVM, we can easily find out significant variably expressed genes in different scRNA-seq data sets.
Keywords
Background
Single-cell analysis has emerged a decade ago to understand the heterogeneity of a cell population, especially in biology contexts such as early embryonic development and tumor etiology [1]. Single-cell quantitative PCR (qPCR) [2–4] or single-molecule RNA fluorescence in situ hybridization (FISH) [5] have been widely used as low-throughput approaches to measure the expression of specific genes at a single-cell level. Although experiments using these methods can provide crucial information of cellular heterogeneity and the presence of distinct cell subpopulations, only a small number of genes can be monitored simultaneously. RNA sequencing (RNA-seq), a developed approach using next-generation sequencing (NGS) technology, can unbiasedly detect the genome-wide gene expression of a sample. Bulk RNA-seq experiments start with a large population of cells (> 10^{5}), and the gene expression levels are considered as the average expression across the population of a cell pool [6]. Bulk RNA-seq might be sufficient in many contexts such as revealing the aberration of mRNA expression between different treatments, conditions, or phenotypes. However, biological questions like diversity in early stage development embryonic cells, which each cell has distinct functions, can't be explained by bulk RNA-seq experiments. With recent introduction of Smart-seq protocol, the required volume of starting materials has been vastly reduced, making the single-cell RNA sequencing (scRNA-seq) achievable [7, 8].
There are already several protocols for sequencing of single cells, which allow researchers to assay high-throughput gene expression profiling at the single-cell level of a large number of cells. However, unlike the conventional RNA-seq where analysis tools are abundantly available, the lack of bioinformatics tools for single-cell RNA-seq limits its huge potential. Comparing with bulk RNA-seq measurements, single-cell RNA-seq data tend to have much lower read counts (~200,000 to 5 million reads per cell) [9], higher variability, and large number of outliers, and all these are poorly accommodated by conventional RNA-seq analysis methods [10]. Unlike the objectives of conventional RNA-seq where differential expression analysis and the detection of differentially expressed genes (DEGs) are integral components, the most important goal of scRNA-seq is to identify variably expressed genes (VEGs) across a population of cells to account for the discrete nature of single-cell gene expression and uniqueness of sequencing library preparation protocol for single-cell sequencing. As we observed, the transcriptional heterogeneity of the cell population can be assessed by the expression variation difference, whether they are lowly or highly expressed, which conventional RNA-seq analysis failed to identify due to the assumption of homogeneity within each cell subtype.
In recent studies, gene expression variation models were proposed specifically for the corresponded scRNA-seq experiments in order to detect VEGs deviated from the Poisson model [11, 12]. However, different scRNA-seq data sets rendered different distributions and a common mathematical description is necessary. Hence, the purpose of this study is to provide a mathematical description of a gene expression variation model (GEVM) for scRNA-seq data. The model addresses the over-dispersion of single-cell data and the additional variability caused by different sources of variation. By exploiting existing statistical tools such as local regression and nonlinear least squares curve fitting, the parameters of gene variation model are estimated and statistical significant VEGs can be identified. To study the robustness of the model, we have built a simulation framework to generate single-cell RNA-seq data using different distributions in each step to imitate the dispersion of real data in different conditions. We demonstrated robustness of our method by applying it to the simulated data and test how precise we can estimate the parameters to the initial settings.
Methods
Modeling of single-cell RNA-seq data
Therefore, by measuring the CV and mean abundance of gene expression μ from all genes, we can estimate the two parameters α and β and dissect the baseline of the cell population. Note that from Eq. 4 when the mean expression level μ becomes larger, \( CV\to \sqrt{\alpha } \), or a constant coefficient variation [15], and when μ ≪ 1 the σ ^{2} → βμ, or equivalently to a Poisson parameter λ ' = βμ.
Estimation of model parameters and selection of significant VEGs
Determination of p-value of VEGs
Instead of picking VEGs by the rank of CV difference D _{ i }, we hypothesize that under the assumption of a homogeneous cell population, the CV difference to the model curve (Eq. 4) possesses a normal distribution (around baseline). We further assume that majority of genes, in a heterogeneous cell population, do not deviate much. Therefore, we use the CV differences of the data points around the model curve (Eq. 4) to fit a normal distribution. Even though robust local regression is used to estimate the expression variation model, the model is still influenced by those outlier genes. Hence, we use kernel density to find the center of the normal distribution. Afterwards, we fit the normal distribution using the CV differences below the center point. We can calculate p-value of each data point from the normal distribution and determine the significance of VEGs comparing to initial homogenous cell population. The procedure of Benjamini and Hochberg [16] is also applied to obtain the false discovery rate (FDR). Fig. 1 shows the overall workflow for detecting VEGs in scRNA-seq data set.
Simulation of scRNA-seq data from a homogeneous cell population
In order to evaluate the robustness of our GEVM, we generated a set of simulated data where we could control the baseline parameters and the differential expression status for a set of genes in a random set of cells. First, we utilized exponential distribution (with 3 different mean values: 0.25, 1, and 10, respectively) to create a “master cell” and its genome-wide expression levels of a cell population. The two lower mean values were designed to reflect the nature of low expression events in scRNA-seq data. The master cell expression level M _{ i } would be the base expression value of gene i for the other single cells in the population (children cells derived from a single master cell).
Equations 6 and 7 were obtained utilizing our model Eq. 3. The mean value of gene i in cell j, μ _{ ij }, was derived from the master cell expression level with a Gaussian distribution of μ _{ ij } = N(M _{ i }, max(0.2, 0.2 * M _{ i })). Here we required standard deviation greater than 0.2 to avoid small or near 0 standard deviation.
Simulation of scRNA-seq data from a heterogeneous cell population
Single-cell RNA-seq data set for testing
Results and discussion
Implementation of noise model on simulation data
To understand the robustness and limitation of the noise model, simulated data sets with different parameters compositions were generated by using R and then proceeded to identify the significantly VEGs following the flow chart in Fig. 1. Simulation modules implemented were: 1) Master cell gene expression generation; 2) homogeneous cell population gene expression generation (with model parameter α and β); 3) heterogeneous cell population generation (with model parameter prct for number of genes deviated from homogeneous cell population, and s for gene expression variation, Eq. 8).
Estimation of model parameters \( \hat{\alpha}\ and\ \hat{\beta} \) under different α and β with fixed number of cells, prct, s, and gene number = 15,000. Comparing with the noise model in Eq. 9, we have obtained fairly low RMSE in each condition
Simulation parameters | Regression results | |||||||
---|---|---|---|---|---|---|---|---|
# of cells | Prct (%) | s | α | β | \( \hat{\alpha} \) | \( \hat{\beta} \) | RMSE | RMSE (Eq. 9) |
1,000 | 10 | 2 | 0 | 1 | 0.0003 ± 0.0002 | 1.0293 ± 0.0014 | 0.0074 ± 0.0006 | 0.044 ± 0.010 |
1.2 | 0.0004 ± 0.0003 | 1.2187 ± 0.0024 | 0.0047 ± 0.0006 | 0.066 ± 0.020 | ||||
1.5 | 0.0007 ± 0.0004 | 1.5032 ± 0.0039 | 0.0028 ± 0.0007 | 0.091 ± 0.019 | ||||
0.15 | 1 | 0.1557 ± 0.0005 | 1.0116 ± 0.0009 | 0.0047 ± 0.0003 | 0.049 ± 0.008 | |||
1.2 | 0.1562 ± 0.0007 | 1.1965 ± 0.0020 | 0.0038 ± 0.0004 | 0.030 ± 0.004 | ||||
1.5 | 0.1569 ± 0.0007 | 1.4756 ± 0.0047 | 0.0043 ± 0.0006 | 0.017 ± 0.001 | ||||
0.5 | 1 | 0.5146 ± 0.0013 | 1.0020 ± 0.0010 | 0.0038 ± 0.0004 | 0.060 ± 0.006 | |||
1.2 | 0.5161 ± 0.0011 | 1.1837 ± 0.0023 | 0.0039 ± 0.0003 | 0.047 ± 0.006 | ||||
1.5 | 0.5187 ± 0.0016 | 1.4561 ± 0.0045 | 0.0054 ± 0.0005 | 0.030 ± 0.004 |
Estimation of model parameters \( \hat{\alpha}\ and\ \hat{\beta} \) under different prct and s with fixed number of cells, α, β, and gene number = 15,000
Simulation parameters | Regression results | |||||||
---|---|---|---|---|---|---|---|---|
# of cells | α | β | s | Prct (%) | \( \hat{\alpha} \) | \( \hat{\beta} \) | RMSE | RMSE (Eq. 9) |
1,000 | 0.15 | 1.2 | 1 | 10 | 0.1563 ± 0.0005 | 1.1965 ± 0.0018 | 0.0037 ± 0.0003 | 0.028 ± 0.002 |
30 | 0.1579 ± 0.0006 | 1.2017 ± 0.0019 | 0.0048 ± 0.0003 | 0.026 ± 0.001 | ||||
50 | 0.1612 ± 0.0009 | 1.2076 ± 0.0023 | 0.0071 ± 0.0005 | 0.027 ± 0.001 | ||||
2 | 10 | 0.1563 ± 0.0005 | 1.1961 ± 0.0019 | 0.0040 ± 0.0005 | 0.033 ± 0.007 | |||
30 | 0.1612 ± 0.0017 | 1.2015 ± 0.0024 | 0.0077 ± 0.0014 | 0.036 ± 0.001 | ||||
50 | 0.1713 ± 0.0014 | 1.2080 ± 0.0024 | 0.0147 ± 0.0012 | 0.050 ± 0.002 | ||||
3 | 10 | 0.1572 ± 0.0012 | 1.1963 ± 0.0026 | 0.0056 ± 0.0009 | 0.048 ± 0.008 | |||
30 | 0.1649 ± 0.0010 | 1.1997 ± 0.0027 | 0.0122 ± 0.0011 | 0.054 ± 0.002 | ||||
50 | 0.1775 ± 0.0012 | 1.2078 ± 0.0030 | 0.0225 ± 0.0011 | 0.096 ± 0.003 |
Estimation of model parameters \( \hat{\alpha}\ and\ \hat{\beta} \) under number of cells with fixed α, β, prct, s, and gene number = 15,000
Simulation parameters | Regression results | |||||||
---|---|---|---|---|---|---|---|---|
Prct (%) | s | α | β | # of cells | \( \hat{\alpha} \) | \( \hat{\beta} \) | RMSE | RMSE (Eq. 9) |
10 | 2 | 0.15 | 1.2 | 50 | 0.1595 ± 0.0024 | 1.1078 ± 0.0040 | 0.0127 ± 0.0007 | 0.037 ± 0.007 |
100 | 0.1587 ± 0.0023 | 1.1416 ± 0.0044 | 0.0085 ± 0.0009 | 0.034 ± 0.006 | ||||
500 | 0.1575 ± 0.0009 | 1.1836 ± 0.0036 | 0.0047 ± 0.0008 | 0.032 ± 0.008 |
Estimation of model parameters (α and β)
We firstly fixed the data set size 15,000 genes and 1,000 cells with prct = 10 % and s = 2, only the model parameters α and β were changed, and the fit results of simulation data are shown in Fig. 2. When α = 0 and β = 1, we simply simulated the data in a Poisson distribution, following a diagonal line in the figure. When α became larger, the curve angled more prominent, which indicated data deviated from Poisson distribution at the larger expression level. The increase of β resulted in the entire data shifting away from the diagonal line, which might be associated with different sources of technical noises. We observed the robust parameter estimation as shown in Table 1 in all initial model parameters (with RMSE less than 0.01 for all these simulated cases). We noted that sometimes the current model failed to fit a straight line when α = 0, which we will investigate further for regression procedures at higher expression level specifically.
When the input parameter β became larger, the two estimated model parameters were deviated from the input parameters. However, even in the extreme case where α = 0.5 and β = 1.5, the RMSE still very consistent in our model (0.0054 ± 0.0005, see Table 1). On the other hand, the orange line - the simple noise model fitting using Eq. 9, can hardly fit the baseline of the simulated data, which results in high RMSE (~0.05, 10x larger than our proposed method) in most conditions.
We further examined the number of significant VEGs under each condition. The pale green points in the log(μ)-log(CV) plots in Fig. 2 were the selected as significant VEGs with FDR < 0.05. In the ideal condition where α = 0 and β = 1, there are in average 940 genes changed by at least two fold change and we have detected around 700 VEGs. Along with the increase α and β, the number of significant VEGs decreased. In the condition where α = 0.5 and β = 1.5, there are only around 250 VEGs detected, where around 950 genes are altered by at least two fold change. It is reasonable since the data are more disperse when α and β become larger. The dispersion affects the fitted normal distribution of CV difference while determining the p-value for VEGs, which results in worse FDR when the model parameters are large.
Test estimation robustness with varying degree of heterogeneity of cell population
Next we tested the performance of model under different percentage of genes affected by random log2 fold change values, which were generated by a normal distribution with zero mean and standard deviation s (Eq. 8). The data set size was still set as 15,000 genes and 1,000 cells, and we fixed the model parameters where α = 0.15 and β = 1.2. From the results in Table 2, we could observe that model parameter β is mostly identical and remained close to 1.2 under different levels and numbers of variable genes. However, the model parameter α became larger (from 0.156 to 0.178) with the increments of s and prct. This is unavoidable because α represents the dispersion of the data set. With more genes deviated from the homogeneous population, the dispersion increased and estimated α biased from the input model parameter value. Due to the deviation of α, RMSE also increased when s and prct became larger. We concluded that the scale and number of variable genes influence the estimation of model parameter α, which results in the increase of RMSE. Nevertheless, this issue is solved during the determination of the distribution of CV difference, where we use kernel density to adjust the center of the normal distribution.
Test estimation robustness with varying number of cells
At last, we would like to know if the model could be properly fit with limited number of cells. We reduced the population size to 50, 100, or 500 cells. To test under a moderate variation condition, we set prct = 10 %, and s = 2, with model parameters remained as α = 0.15 and β = 1.2. The results in Table 3 show that reducing the number of cells slightly affected the estimation of α: α is larger when the number of cells is smaller, in which CV of genes are more disperse. The estimation of β also deviated a bit with the decrease of the population size. Under 50 and 100 cells conditions, the scattering of the data points around the diagonal line resulted in the estimation error of β and a higher RMSE in lower number of cells. Moreover, the two factors that influenced the estimation of α and β also played a role in calling significant VEGs. Under the same number of genes, we determined only about 355 VEGs in 500 cells condition, whereas about 596 VEGs were called in 50 cells condition. With only a small number of cells, the normal distribution of homogeneous genes is difficult to estimate accurately, which might result in the increase or decrease of detected VEGs. Hence, a sufficient number of cells is necessary to accurately determine VEGs among a cell population.
In conclusion, the major factors that influence the robustness of the noise model are how data distributes and the number of cells. Fitting errors arise from two situations, 1) the estimated parameters are unusually large (especially β) in the simulated cases, which is unlikely in real scRNA-seq data, 2) the data distribute close to the diagonal line in the CV-mean plot, but with many variable genes at higher expression level, which results in the failure of fitting a straight line. The cell population size is also a concern; however, in reality a single-cell experiment should be designed with a large number of cells. Hence, the population size may not be a major factor for most single cell applications.
From the simulation results we could find out that a simple fitting method is not enough. By fitting the model in Eq. 9 straightforwardly, we got much larger RMSE in every condition. In contrast, our expression variation model design with multiple layers of estimates can be fitted properly for most of the experiment condition. However, in some cases the fitted model curve (red) lay under the local fit curve (cyan) at the middle mean abundance interval, which it might be a potential problem occasionally.
Application on real data sets
Determination of single-cell subpopulations
Even though the two scRNA-seq experiments obtained from GEO database used two different techniques to capture single cells with vastly different distributions in the CV-mean plots as shown in Figs. 3 and 4 , we could fit the expression variation models properly for both data. In the previous two studies [11, 12], it has been demonstrated that, using VEGs, cell heterogeneity has been detected along with associated biological functions of subpopulations. Clearly, finding the VEGs of a single-cell experiment is just the first step. The subsequent analyses that utilizing VEGs and their expression changes across the cell population are the key of single-cell RNA-seq analysis.
Conclusion
In this paper, we proposed a single cell gene expression variation model, and demonstrated the method to regress the model parameters for a single-cell RNA-seq experiment by exploiting the relationship between the coefficient of variation and mean transcript abundance of all genes in the genome. A single-cell data simulation was also designed and used to determine the robustness of the model parameter estimation. In most condition the model parameters were estimated precisely, and resistant to the influence of factors such as population size, and dispersion of genes. The results of testing on two real scRNA-seq data sets further confirmed our simulation, while additional modeling requirement due to lower total UMI count per cell warrants further investigation.
Abbreviations
CV, Coefficient of Variation; DE, Differential Expression; DEG, Differentially Expressed Gene; GEVM, Gene Expression Variation Model; NB, Negative Binomial (distribution, model); NGS, Next Generation Sequencing; PC, Principal Component; PCA, Principal Component Analysis; RMSE, Root-mean Square Error; scRNA-seq, single-cell RNA-seq; UMI, Unique Molecular Identifier; VEGs, Variably Expressed Genes
Declarations
Acknowledgement
This research was supported in part by the Genome Sequencing Facility of the Greehey Children's Cancer Research Institute, UTHSCSA. Fundings for this research were provided partially by the NCI Cancer Center Shared Resources (NIH-NCI P30CA54174), NIH (CTSA 1UL1RR025767-01 to YC and CPRIT (RP120685-C2) grants to YC and HHC, and NIGMS (R01GM113245) to YH and YC.
Declarations
The publication costs for this article were funded by the CPRIT grants to YC mentioned below.
This article has been published as part of BMC Genomics Volume 17 Supplement 7, 2016: Selected articles from the International Conference on Intelligent Biology and Medicine (ICIBM) 2015: genomics. The full contents of the supplement are available online at http://bmcgenomics.biomedcentral.com/articles/supplements/volume-17-supplement-7.
Availability of data and materials
The R scripts of the algorithm and the UMI data for GSE65525 will be available from GitHub, https://github.com/hillas/scVEGs.
Authors' contributions
All authors contribute to the manuscript. HHC, YJ, YH and YC conceived and designed the study. HHC carried out the simulation procedure and implemented the algorithm in R. All authors read and approved the final manuscript.
Competing interests
Authors declare no competing interest in preparing the paper and developing the software associated to this paper.
Consent for publication
Not applicable.
Ethics approval and consent to participate
Not applicable.
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Authors’ Affiliations
References
- Stegle O, Teichmann SA, Marioni JC. Computational and analytical challenges in single-cell transcriptomics. Nat Rev Genet. 2015;16(3):133–45.View ArticlePubMedGoogle Scholar
- Taniguchi K, Kajiyama T, Kambara H. Quantitative analysis of gene expression in a single cell by qPCR. Nat Methods. 2009;6(7):503–6.View ArticlePubMedGoogle Scholar
- Livak KJ, Wills QF, Tipping AJ, Datta K, Mittal R, Goldson AJ, Sexton DW, Holmes CC. Methods for qPCR gene expression profiling applied to 1440 lymphoblastoid single cells. Methods. 2013;59(1):71–9.Google Scholar
- Sanchez-Freire V, Ebert AD, Kalisky T, Quake SR, Wu JC. Microfluidic single-cell real-time PCR for comparative analysis of gene expression patterns. Nat Protoc. 2012;7(5):829–38.View ArticlePubMedGoogle Scholar
- Raj A, van den Bogaard P, Rifkin SA, van Oudenaarden A, Tyagi S. Imaging individual mRNA molecules using multiple singly labeled probes. Nat Methods. 2008;5(10):877–9.View ArticlePubMedPubMed CentralGoogle Scholar
- Kim JK, Marioni JC. Inferring the kinetics of stochastic gene expression from single-cell RNA-sequencing data. Genome Biol. 2013;14(1):R7.View ArticlePubMedPubMed CentralGoogle Scholar
- Tang F, Barbacioru C, Wang Y, Nordman E, Lee C, Xu N, Wang X, Bodeau J, Tuch BB, Siddiqui A, et al. MRNA-Seq whole-transcriptome analysis of a single cell. Nat Methods. 2009;6(5):377–82.Google Scholar
- Ramskold D, Luo S, Wang YC, Li R, Deng Q, Faridani OR, Daniels GA, Khrebtukova I, Loring JF, Laurent LC, et al. Full-length mRNA-Seq from single-cell levels of RNA and individual circulating tumor cells. Nat Biotechnol. 2012;30(8):777–82.Google Scholar
- Streets AM, Huang Y. How deep is enough in single-cell RNA-seq? Nat Biotechnol. 2014;32(10):1005–6.View ArticlePubMedGoogle Scholar
- Saliba AE, Westermann AJ, Gorski SA, Vogel J. Single-cell RNA-seq: advances and future challenges. Nucleic Acids Res. 2014;42(14):8845–60.View ArticlePubMedPubMed CentralGoogle Scholar
- Klein AM, Mazutis L, Akartuna I, Tallapragada N, Veres A, Li V, Peshkin L, Weitz DA, Kirschner MW. Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells. Cell. 2015;161(5):1187–201.Google Scholar
- Zeisel A, Munoz-Manchado AB, Codeluppi S, Lonnerberg P, La Manno G, Jureus A, Marques S, Munguba H, He L, Betsholtz C, et al. Brain structure. Cell types in the mouse cortex and hippocampus revealed by single-cell RNA-seq. Science. 2015;347(6226):1138–42.Google Scholar
- Robinson MD, McCarthy DJ, Smyth GK. EdgeR: a bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26(1):139–40.View ArticlePubMedGoogle Scholar
- Marinov GK, Williams BA, McCue K, Schroth GP, Gertz J, Myers RM, Wold BJ. From single-cell to cell-pool transcriptomes: stochasticity in gene expression and RNA splicing. Genome Res. 2014;24(3):496–510.Google Scholar
- Chen Y, Kamat V, Dougherty ER, Bittner ML, Meltzer PS, Trent JM. Ratio statistics of gene expression levels and applications to microarray data analysis. Bioinformatics. 2002;18(9):1207–15.View ArticlePubMedGoogle Scholar
- Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B. 1995;57:289–300.Google Scholar
- Hashimshony T, Wagner F, Sher N, Yanai I. CEL-Seq: single-cell RNA-Seq by multiplexed linear amplification. Cell Rep. 2012;2(3):666–73.View ArticlePubMedGoogle Scholar
- Islam S, Zeisel A, Joost S, La Manno G, Zajac P, Kasper M, Lonnerberg P, Linnarsson S. Quantitative single-cell RNA-seq with unique molecular identifiers. Nat Methods. 2014;11(2):163–6.Google Scholar
- Kharchenko PV, Silberstein L, Scadden DT. Bayesian approach to single-cell differential expression analysis. Nat Methods. 2014;11(7):740–2.View ArticlePubMedPubMed CentralGoogle Scholar
- Anders S, Huber W. Differential expression analysis for sequence count data. Genome Biol. 2010;11(10):R106.View ArticlePubMedPubMed CentralGoogle Scholar