### Preparation of in vitro sample mixtures

Human genomic DNA samples NA12878 and NA24385 were purchased from the Coriell Institute for Medical Research (Camden, NJ, USA). Samples were mixed based on OD measurements, resulting in mixtures containing 10% and 2% NA12878 DNA in a background of NA24385 DNA.

### Custom panel design

The first panel (N0015) was designed to maximize the number of unique NA12878 variants enriched by a limited number of primers. The regions were randomly selected from the genome so that two or more variants would appear within 150bp of the targeting primer. In total, 3587 primers were selected using custom scripts to balance coverage, primer Tm, dimer potential, and predicted specificity within the human genome. The second panel (N0030) was designed to cover the full coding region of 194 genes related to cancer and inherited diseases. 10,857 primers were selected so that amplicons of 150bp or longer from those primers would overlap the entire coding region. An additional universal sequence was added to the 5’ end of all primer sequences, and all oligos were synthesized by IDT (Coralville, IA).

### Single primer enrichment protocol

DNA libraries were prepared using components from the QIAseq Targeted DNA Panel Kit (QIAGEN, Germany). Briefly, 80 ng of DNA was enzymatically fragmented and end repaired in a 25ul reaction containing 2.5ul of 10X fragmentation buffer and 5ul of fragmentation enzyme mix. The reaction was carried out at 4 °C for 1 min, 32 °C for 24 min, and 65 °C for 30 min. Immediately after the reaction, 10ul of 5X ligation buffer, 5ul of DNA ligase, 2.8ul of 25uM barcoded adapters, and water were added to a final volume of 50ul. The reaction continued at 20 °C for 15 min. To ensure complete removal of free barcoded adapters, each reaction was purified for two rounds using 1.8X and 0.9X Ampure beads (Beckman, US). The purified DNA was then mixed in a 20ul reaction with 10nM of each target primer, 400nM IL-Forward primer, 1X TEPCR buffer, and 0.8ul HotStarTaq DNA polymerase. The PCR enrichment condition was: 95 °C for 13 min, 98 °C for 2 min; six cycles of 98 °C for 15 s and 65 °C for 15 min; 72 °C for 5 min. Each reaction was cleaned once using 0.9X Ampure beads (Beckman, US) to remove unused primers. Enriched DNA was combined with 400nM IL-Universal primer, 400nM IL-Index primer, 1X UPCR buffer, and 1ul HotStarTaq DNA polymerase in a volume of 20ul. The universal PCR condition was: 95 °C for 13 min, 98 °C for 2 min; 20 cycles of 98 °C for 15 s and 60 °C for 2 min; 72 °C for 5 min. The DNA library was purified once using 0.9X Ampure beads and quantified using the QIAseq Library Quant System (QIAGEN, Germany). Both libraries were sequenced on an Illumina NextSeq (paired-end, 2x150 bp) following the manufacturer’s user manual (Illumina, CA).

### Read processing

Original reads in the FASTQ file are processed in the following order: adapter trimming, read alignment, barcode clustering, and primer region trimming (the detailed pipeline is described in Additional file 2: Supplementary Methods). As a first step, non-genomic sequences such as sequencing adapters and the molecular barcode region (Additional file 2: Figure S5) are trimmed off by Cutadapt [24] and custom scripts. The trimmed reads are then aligned to the reference genome using BWA-MEM [25], creating the alignment file in BAM format. A custom barcode clustering algorithm is applied to account for possible sequencing or PCR errors in the barcode sequence. In brief, the algorithm identifies barcodes that are within a short edit distance of each other (typically ≤ 1) and combines them if one barcode has significantly more supporting reads than the others. The procedure is described in more detail in [8].
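To illustrate the clustering idea, the sketch below greedily merges each low-count barcode into a higher-count neighbor within edit distance 1. The `ratio` threshold is a hypothetical stand-in for the significance criterion detailed in [8], and the function name is our own:

```python
from collections import Counter

def cluster_barcodes(barcode_counts, max_dist=1, ratio=8):
    """Greedily merge likely-erroneous barcodes into well-supported ones.

    barcode_counts: dict mapping barcode sequence -> supporting read count.
    max_dist and ratio are illustrative parameters, not published values.
    """
    def hamming(a, b):
        # Barcodes are fixed-length, so Hamming distance stands in for edit distance.
        return sum(x != y for x, y in zip(a, b))

    # Visit barcodes from most to least supported.
    ordered = sorted(barcode_counts, key=barcode_counts.get, reverse=True)
    merged, kept = {}, []
    for bc in ordered:
        target = None
        for big in kept:
            # Merge only if a nearby barcode has far more supporting reads.
            if hamming(bc, big) <= max_dist and \
               barcode_counts[big] >= ratio * barcode_counts[bc]:
                target = big
                break
        if target is None:
            kept.append(bc)
        else:
            merged[bc] = target

    # Re-tally read counts after merging.
    counts = Counter()
    for bc, n in barcode_counts.items():
        counts[merged.get(bc, bc)] += n
    return dict(counts)
```

For example, a barcode seen twice that sits one mismatch away from a barcode seen 100 times is absorbed into it, while two comparably supported barcodes are kept separate.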

Standard GATK workflow steps such as indel realignment and base quality score recalibration [26] are not implemented in this work because they are very time consuming for ultra-deep read sets. However, these procedures are recommended and can be completed within a reasonable time for shallower read sets.

### smCounter workflow

smCounter processes individual target loci sequentially and independently. At a target locus, reads covering the locus are collected by samtools mpileup and screened by several pre-processing filters. High-quality reads that pass the pre-processing filters are fed into the statistical model to determine whether a potential (candidate) variant exists. Finally, a variant call is made if the candidate variant passes a number of post-processing filters. We describe the statistical model below and the filters in Additional file 2: Tables S2, S3. In particular, we give a detailed discussion of the strand bias filter in Additional file 2: Supplementary Methods.

smCounter follows two steps to evaluate the likelihood of a mutation at a target locus. The first step aims to estimate the posterior probability of each allele using all reads originating from an individual barcode. For example, the posterior probability of an ‘*A*’ nucleotide for the *k*^{th} barcode can be calculated by Bayes rule as

$$ P(A\mid BC_{k}) = \frac{P(BC_{k} \mid A)P(A) }{\sum_{X \in \Theta} P(BC_{k} \mid X) P(X)}, $$

(1)

where *P*(*X*) is the prior probability of allele *X*, *P*(*BC*_{*k*}∣*X*) is the likelihood of all alleles in barcode *k* given the true allele *X* in the corresponding DNA molecule, and *Θ* is the set containing *A*,*T*,*G*,*C* and any indels observed on the reads covering the position. For simplicity, each allele is given an equal prior probability, i.e., *P*(*X*)=1/|*Θ*|, where |*Θ*| is the cardinality of *Θ*.
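Given the likelihoods *P*(*BC*_{*k*}∣*X*), equation (1) reduces to a single normalization step. A minimal Python sketch, assuming the likelihoods arrive as a dict keyed by allele (the function name is our own):

```python
def allele_posteriors(likelihoods):
    """Equation (1) with a flat prior P(X) = 1 / |Theta|.

    likelihoods: dict mapping each allele X in Theta to P(BC_k | X).
    Returns the posterior P(X | BC_k) for every allele.
    """
    prior = 1.0 / len(likelihoods)               # flat prior over Theta
    denom = sum(lik * prior for lik in likelihoods.values())
    return {x: lik * prior / denom for x, lik in likelihoods.items()}
```

With a flat prior, the prior cancels out and the posteriors are simply the likelihoods renormalized to sum to one.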

*P*(*BC*_{*k*}|*X*) is a key component of the model. However, its analytical form cannot be easily obtained, so instead we give a heuristic approximation. The main assumption is that the non-*X* alleles are caused by a mixture of base-calling errors during sequencing and PCR enzymatic errors during DNA enrichment. Without knowing the relative fraction of the two errors, we approximate *P*(*BC*_{*k*}|*X*) by the weighted sum of *P*(base-calling error) and *P*(PCR error). The idea is illustrated as follows.

Suppose we want to calculate *P*(*BC*_{*k*}|*A*). Assume the barcode has *n* supporting reads, of which *n*_{*A*}, *n*_{*T*}, *n*_{*G*}, *n*_{*C*} have *A*,*T*,*G*,*C* at the position. For simplicity, we assume there is no indel or ambiguous base, so *n*_{*A*}+*n*_{*T*}+*n*_{*G*}+*n*_{*C*}=*n*. Furthermore, we assume the Phred quality scores of the ‘*A*’ bases are \(q_{(A)1}, \ldots, q_{(A)n_{A}}\). The base-calling error probabilities are \(e_{(A)1}, \ldots, e_{(A)n_{A}}\), where \(e_{(A)j}=10^{-q_{(A)j}/10}\). The same notation applies to the *G*,*T*,*C* reads. Given true allele *A*, the probability that the *G*,*T*,*C* alleles are base-calling errors is \(C_{p}\prod _{Y\in \Theta } \prod _{i=1}^{n_{Y}} e_{(Y)i}^{1-I(Y=A)} (1-e_{(Y)i})^{I(Y=A)}\). In this expression, *C*_{*p*}=3×10^{−5} is approximately the probability of no PCR error, estimated assuming 30 PCR cycles and a 10^{−6} PCR error rate in each cycle. *Θ* is the set {*A*,*T*,*G*,*C*}, and *I*(*Y*=*A*) equals 1 if *Y*=*A* and 0 otherwise. The probability that the *G*,*T*,*C* alleles are PCR errors is approximated by the product of \(\prod _{Y\in \Theta } \prod _{i=1}^{n_{Y}} (1-e_{(Y)i})\) (the probability of no base-calling error) and \(\min _{Y\in \{G,T,C\}} 10^{-6(0.5+n_{Y})/\sum _{Z\in \Theta }(0.5+n_{Z})}\) (a heuristic approximation of the PCR error probability). In summary, *P*(*BC*_{*k*}|*A*) is approximated by

$$\begin{array}{*{20}l} P(BC_{k}\mid A) &\approx \alpha C_{p}\prod_{Y\in \Theta} \prod_{i=1}^{n_{Y}} e_{(Y)i}^{1-I(Y=A)} (1-e_{(Y)i})^{I(Y=A)} \\ &+ (1-\alpha) \prod\limits_{Y\in \Theta} \prod\limits_{i=1}^{n_{Y}} (1-e_{(Y)i}) \min_{Y\in\{G,T,C\}}\\&\quad 10^{-6(0.5+n_{Y})/\sum_{Z\in\Theta}(0.5+n_{Z})}, \end{array} $$

(2)

where *α*∈ [ 0,1] represents the relative weight between base-calling errors and PCR errors. The choice of *α* has no significant impact on smCounter’s performance (Additional file 2: Figure S6). Throughout this study, we set *α*=0.5, which is mathematically equivalent to a direct sum of the base-calling and PCR error terms. *P*(*BC*_{*k*}∣*X*) is calculated using the same formula for any other allele *X*. Combining all components and applying equation (1), we obtain the estimated barcode-level posterior probability of each allele.
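As a concrete illustration, the heuristic of equation (2) can be sketched in Python. This is a SNP-only sketch with *Θ* fixed to {*A*,*T*,*G*,*C*}; the function name and input layout are our own:

```python
ALLELES = ("A", "T", "G", "C")

def barcode_likelihood(x, base_quals, alpha=0.5, c_p=3e-5):
    """Heuristic approximation of P(BC_k | X) from equation (2).

    x: the hypothesized true allele, e.g. "A".
    base_quals: dict mapping allele Y -> list of Phred scores of the
        reads showing Y at the position (indels/ambiguous bases ignored).
    """
    # Per-read base-calling error probabilities e = 10^(-q/10).
    errs = {y: [10 ** (-q / 10) for q in base_quals.get(y, [])] for y in ALLELES}

    # Term 1: non-x reads explained as base-calling errors (weighted by C_p).
    seq_term = c_p
    for y in ALLELES:
        for e in errs[y]:
            seq_term *= (1 - e) if y == x else e

    # Term 2: no base-calling error; non-x reads attributed to PCR error.
    no_call_err = 1.0
    for y in ALLELES:
        for e in errs[y]:
            no_call_err *= (1 - e)
    total = sum(0.5 + len(errs[y]) for y in ALLELES)
    pcr_term = min(10 ** (-6 * (0.5 + len(errs[y])) / total)
                   for y in ALLELES if y != x)

    # Weighted sum of the two error explanations, as in equation (2).
    return alpha * seq_term + (1 - alpha) * no_call_err * pcr_term
```

For a barcode whose reads all show ‘A’ at high quality, the likelihood under true allele *A* is orders of magnitude larger than under any other allele, which is what drives the posterior in equation (1).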

The second step is to evaluate the total evidence of each allele at the locus. Assuming the locus is covered by *N* barcodes that are independent from each other, the probability that *A* is a true allele, i.e., at least one of the DNA copies has *A* at the locus, is given by \(P(A) = 1-\prod _{k=1}^{N} \big (1-P(A\mid BC_{k})\big)\). For computational purposes, we take the logarithm of 1−*P*(*A*) and define the prediction index of *A* as

$$ I(A) = -\sum_{k=1}^{N}\log_{10}\big(1-P(A|BC_{k})\big). $$

(3)

*I*(*A*) can be interpreted as a measure of the evidence for *A* at the locus. If *A* is a non-reference base and *I*(*A*) exceeds the preselected threshold, *A* is considered a candidate point mutation.
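Equation (3) then reduces to a sum over barcode-level posteriors. A minimal sketch follows; the `eps` cap that keeps the logarithm finite when a posterior reaches 1 is our own safeguard, not part of the published model:

```python
import math

def prediction_index(posteriors, eps=1e-12):
    """Equation (3): I(A) = -sum_k log10(1 - P(A | BC_k)).

    posteriors: iterable of per-barcode posterior probabilities P(A | BC_k).
    eps caps each (1 - posterior) away from zero so log10 stays finite.
    """
    return -sum(math.log10(max(1.0 - p, eps)) for p in posteriors)
```

Two barcodes each supporting *A* with posterior 0.9 contribute a prediction index of 2, so evidence accumulates additively across independent barcodes.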

### Benchmarking variant calling performance

The initial ground truth set consists of the high confidence heterozygous NA12878 variants (v2.19, ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/release/NA12878_HG001/NISTv2.19/) in the target region, with NA24385’s variants (ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/analysis/NIST_CallsIn2Technologies_05182015/) masked out. In addition, we identified one NA12878 SNP and ten NA24385 SNPs in the N0015 target region, as well as one NA24385 SNP in the N0030 target region, that were not present in the released variant sets. All twelve putative variants were visually verified in IGV, and some were further verified in an independent read set with 80% NA12878 admixture as well as in other publicly available data for NA12878 and NA24385. The ground truth set was modified under the assumption that these putative variants are real.

Variant calling performance was benchmarked against the modified ground truth set. Comparing variants identified by different variant callers is not a trivial task, because complex variants can often be represented in different forms. For this task we used the vcfeval function in RTG Tools [21], which performs sophisticated VCF comparison and determines whether different representations describe the same complex variant.

### Estimation of locus specific detection limit based on molecular barcode counts

The detection limit at a specific locus can be estimated from the barcode depth at the locus, the average barcode depth of the sequencing run, and the desired false positive rate. Let the locus-specific barcode depth be *d* and the average barcode depth be *D*. Assume the desired false positive rate is 20 per megabase. To call a variant at the locus, the prediction index given by equation (3) needs to exceed the cutoff value, approximated by 14+0.012*d* (Table 3). Let *I* be the average prediction index contributed by a barcode with good read evidence. In N0030, *I* is typically around 3.5, which is the value used in Fig. 4. The minimum number of barcodes carrying the alternative allele (alternative barcodes) is estimated as the ceiling of (14+0.012*d*)/*I*.

We seek the minimum allele fraction *f* such that the minimum number of alternative barcodes is reached with at least 95% probability, assuming the actual number of alternative barcodes follows a binomial distribution Bin(*D*,*f*). The probability of observing sufficient alternative barcodes can then be calculated as *P*_{*D*,*f*}(*X*≥(14+0.012*d*)/*I*), where *P*_{*D*,*f*} is the probability distribution of Bin(*D*,*f*). Finally, we find the detection limit *f*^{∗} that ensures this probability is at least 95%:

$$f^{*} = \min\left\{ f\in[0,1] : P_{D,f} \left(X \ge \frac{14+0.012d}{I}\right) \ge 0.95 \right\}. $$

In this estimation, a different equation for the cutoff value or a different barcode-level prediction index *I* may be used, depending on the desired false positive rate and the average number of reads per barcode (rpb).
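Under the stated assumptions (cutoff 14+0.012*d*, 95% probability, Bin(*D*,*f*)), the search for *f*^{∗} can be sketched as a simple grid search over allele fractions. The step size and function names are illustrative choices, not part of the published pipeline:

```python
import math

def binom_sf(k, n, p):
    """P(X >= k) for X ~ Bin(n, p), via the complement of the CDF."""
    return 1.0 - sum(math.comb(n, i) * p**i * (1 - p)**(n - i)
                     for i in range(k))

def detection_limit(d, D, I=3.5, step=1e-4):
    """Smallest allele fraction f that reaches the minimum number of
    alternative barcodes, ceil((14 + 0.012*d) / I), with >= 95%
    probability under Bin(D, f).

    d: locus-specific barcode depth; D: average barcode depth of the run;
    I: average prediction index per supporting barcode (3.5 is the
    typical N0030 value quoted in the text).
    """
    k_min = math.ceil((14 + 0.012 * d) / I)
    f = step
    # Scan allele fractions from small to large; the first f that gives
    # at least 95% probability of enough alternative barcodes is f*.
    while f <= 1.0:
        if binom_sf(k_min, D, f) >= 0.95:
            return f
        f += step
    return 1.0
```

For instance, at *d* = *D* = 1000 the cutoff implies at least 8 alternative barcodes, and the resulting detection limit lands between 1% and 2% allele fraction under these assumptions.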