Analysis and correction of compositional bias in sparse sequencing count data

BMC Genomics

Table 3 Correlations of compositional scales with orthogonal measurements on absolute abundances/technical biases

Dataset	Type	CLR	TMM	CSS	Scran	W₀	W₁	W₂	W₃
Tara oceans [10]	16S (from whole metagenome)	0 (−2.65×10⁻⁶)	0.26	0.15	0.52	0.58	0.54	0.53	0.53
Rat bodyMap [44]	Bulk RNAseq	−0.36	0.22	0.16	0.18	0.20	0.19	0.20	0.26
Embryonic stem cells [62]	UMI/scRNAseq	−0.70	0.70	0.67	0.67	0.71	0.70	0.70	0.68

Correlations of logged reconstructed abundance factors (1/compositional correction factor) with logged total flow cytometry cell counts are shown for the Tara project. Correlations of logged normalization factors with logged total ERCC counts are shown for the rat body map and embryonic stem cell datasets. Given the high sparsity in these datasets, CLR factors computed by adding pseudocounts carried essentially no information on technical biases. W₁, …, W₃ are estimators proposed in the Methods section that adjust the base estimator W₀ for feature-wise zero-generation properties; all are presented here for comparison. The default Wrench estimator (W₂) compares well at low and high coverage settings. For more details on these estimators and on the distinction in terminology between compositional correction factors and normalization factors, refer to Materials and Methods. Bland-Altman plots for the data underlying these numbers are presented in Additional file 1: Figures S18–S20, and related discussion in Additional file 1: Section 9.
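As an illustration of the quantity tabulated for the Tara project, the following minimal sketch computes a correlation between logged reconstructed abundance factors (1/compositional correction factor) and logged orthogonal totals such as flow cytometry cell counts. It assumes a Pearson correlation, which the caption does not specify; the function names and the toy values are hypothetical and are not taken from the paper or from the Wrench package.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumed choice of coefficient)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def log_abundance_correlation(correction_factors, orthogonal_totals):
    """Correlate log(1 / correction factor) with log(orthogonal total) across samples.

    Both arguments are per-sample positive numbers; the names are hypothetical.
    """
    log_abundance = [math.log(1.0 / c) for c in correction_factors]
    log_totals = [math.log(t) for t in orthogonal_totals]
    return pearson(log_abundance, log_totals)


# Toy, made-up values: samples with larger correction factors have
# proportionally smaller total cell counts.
toy_correction_factors = [0.5, 1.0, 2.0, 4.0]
toy_cell_counts = [8.0e5, 4.0e5, 2.0e5, 1.0e5]
r = log_abundance_correlation(toy_correction_factors, toy_cell_counts)
```

In the toy input the reconstructed abundances are exactly proportional to the cell counts, so the correlation is at its maximum; real data, as in the table above, give intermediate values. For the rat body map and embryonic stem cell rows, the caption describes the analogous computation with logged normalization factors and logged total ERCC counts in place of the two inputs here.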

ISSN: 1471-2164