Fig. 3 | BMC Genomics

Fig. 3

From: Analysis and correction of compositional bias in sparse sequencing count data

Compositional bias introduced by sequencing technology. As a sample \(j\) from group \(g\) of interest is prepared for sequencing, its true internal feature concentrations (organized as a vector) \(X_{gj}^{0}\) are transformed by various technical biases to \(X_{gj}\). A sequencing machine introduces compositional bias by generating counts \(Y_{gj}\) from the input absolute abundances in \(X_{gj}\) according to the proportions \(q_{gj}=\left[\ldots x_{gji}/\left(\sum_{k} x_{gjk}\right)\ldots\right]\), with \(i\) and \(k\) indexing features. Directly performing a differential abundance test on \(Y\) (DE Test 1) using normalization factors proportional to total sequencing output (e.g., R/FPKM, or subsampling in metagenomics) amounts, in general, to testing for changes in the relative abundances of features in \(X\) (not \(X^{0}\)). To infer differences in absolute abundance, we need to reconstruct \(X^{0}\) from \(Y\) (DE Test 3). For compositional bias correction in particular, we care about reconstructing \(X\) from \(Y\) (DE Test 2). We show more formally later that compositional correction can reconstruct \(X^{0}\) if technical biases perturb all feature abundances by the same factor, and that the presence of sequenceable contaminants imposes stricter assumptions on its application.
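
The effect described in the caption can be illustrated with a minimal numerical sketch. It is not the authors' method: the abundance values, the sequencing depth, and the function names `proportions` and `expected_counts` are hypothetical, sampling noise is ignored (expected counts are used instead of random draws), and technical bias is assumed absent so that the sequenced abundances equal the true ones.

```python
import numpy as np

def proportions(x):
    """Return q = x_i / sum_k x_k for a vector of absolute abundances."""
    x = np.asarray(x, dtype=float)
    return x / x.sum()

def expected_counts(x, depth):
    """Expected counts when a fixed sequencing depth is split according to q.

    Assumption: no sampling noise and no technical bias (toy model only).
    """
    return depth * proportions(x)

# Hypothetical absolute abundances for four features in two groups.
# Only the last feature truly changes (7-fold increase).
x_control = np.array([100.0, 100.0, 100.0, 100.0])
x_treated = np.array([100.0, 100.0, 100.0, 700.0])

depth = 10_000  # hypothetical total sequencing output, equal in both samples

y_control = expected_counts(x_control, depth)
y_treated = expected_counts(x_treated, depth)

# Fold changes in absolute abundance versus fold changes seen in the counts.
fold_change_absolute = x_treated / x_control
fold_change_observed = y_treated / y_control

# The compositional scale factor is the ratio of total absolute abundances;
# multiplying by it recovers the absolute fold changes in this toy setting.
scale = x_treated.sum() / x_control.sum()
fold_change_corrected = fold_change_observed * scale

print("absolute :", fold_change_absolute)
print("observed :", fold_change_observed)
print("corrected:", fold_change_corrected)
```

In this toy setting the three unchanged features appear to decrease to about 0.4 of their control level in the counts, although their absolute abundances are identical in both groups. Rescaling by the ratio of total abundances, which is unknown in practice and must be estimated, restores the absolute fold changes.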
