Figure 4
From: Identification of errors introduced during high throughput sequencing of the T cell receptor repertoire
Complementation in error occurrence. The expected frequency of multiple errors was calculated, assuming that each error is independent, using the formula $p = C \times \mathrm{SER}^{M}$, where SER = observed single error rate, M = number of mutated nt in the sequence, and C = total number of possible erroneous sequence combinations. $C = \frac{N!}{M!\,(N-M)!}$, where N = number of nucleotides in the sequence. The expected frequency of multiple mutations is plotted against the observed frequency in experimental samples, for data sets either not filtered on phred score (q = 0) or filtered at q = 30. Sequences with between 2 and 10 mutated nt are shown for q = 0, and 2 and 4 for q = 30 (no events were observed with 4-10 mutations for q = 30 filtered data).
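The expected-frequency formula in the caption can be sketched in a few lines of Python. This is only an illustration of the stated formula, not the authors' code: the function name `expected_multi_error_frequency` and the values chosen for SER and N are hypothetical and do not come from the article.

```python
import math


def expected_multi_error_frequency(ser: float, n: int, m: int) -> float:
    """Expected frequency of sequences carrying m errors: p = C * SER**M.

    ser: observed single error rate (SER), per nucleotide
    n:   number of nucleotides in the sequence (N)
    m:   number of mutated nucleotides (M)
    """
    if not 0.0 <= ser <= 1.0:
        raise ValueError("ser must be a rate between 0 and 1")
    if n < 0 or m < 0:
        raise ValueError("n and m must be non-negative")
    # C = N! / (M! * (N - M)!): number of ways to place m errors in n positions.
    combinations = math.comb(n, m)
    # Errors are assumed independent, so the per-position rates multiply.
    return combinations * ser ** m


# Hypothetical inputs, chosen only to illustrate the calculation.
ser = 1e-3  # assumed single error rate
n = 100     # assumed sequence length in nucleotides
for m in range(2, 11):
    print(m, expected_multi_error_frequency(ser, n, m))
```

As written in the caption, the formula leaves out the factor (1 - SER)^(N - M) that a full binomial probability would include; that factor is close to 1 when SER × N is small, so the sketch follows the caption rather than the full binomial form.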