Taxa counting from short read data. Taxa count analysis for SYV (EC: 220.127.116.11) based on the 1780 Illumina short reads sharing the common leading SP. A) Counts are displayed as a function of the minimal Hamming distance d between fused strings. Different curves represent different sample sizes, ranging (from bottom to top) from S = 200 to S = 1600. Mean values and errors on the mean are calculated from 20 random realizations at each sample size. B) Mean counts as a function of sample size S for minimal distances from d ≥ 1 to d ≥ 10 (top to bottom). C) Mean counts as a function of sample size for minimal distances from d ≥ 1 to d ≥ 5, for the data in B (solid) and for artificial data (dashed) constructed by introducing artificial errors into the real data with a probability of 1% per amino acid.
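
The subsampling procedure behind these panels can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the caption does not say how strings are fused, so the sketch assumes single-linkage merging of equal-length sequences whose Hamming distance is below d, takes the error on the mean to be the sample standard deviation divided by the square root of the number of realizations, and models an artificial error as a substitution by a different amino acid. The names `count_taxa`, `mean_counts` and `add_errors` are hypothetical.

```python
import math
import random
import statistics

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def count_taxa(seqs, d):
    """Count clusters after fusing sequences closer than d (assumed rule).

    Single linkage via union-find; with d = 1 only identical sequences fuse,
    so the result is the number of distinct sequences.
    """
    parent = list(range(len(seqs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if hamming(seqs[i], seqs[j]) < d:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(len(seqs))})


def mean_counts(seqs, S, d, n_real=20, rng=random):
    """Mean taxa count and error on the mean over n_real random subsamples of size S."""
    counts = [count_taxa(rng.sample(seqs, S), d) for _ in range(n_real)]
    mean = statistics.mean(counts)
    sem = statistics.stdev(counts) / math.sqrt(n_real)  # assumed error estimate
    return mean, sem


def add_errors(seqs, p=0.01, rng=random):
    """Replace each residue by a different amino acid with probability p (assumed error model)."""
    noisy = []
    for s in seqs:
        chars = [
            rng.choice([a for a in AMINO_ACIDS if a != c]) if rng.random() < p else c
            for c in s
        ]
        noisy.append("".join(chars))
    return noisy
```

Under these assumptions, a curve like those in panel B would come from calling `mean_counts(reads, S, d)` over a range of S for each fixed d, and a dashed curve in panel C from the same call on `add_errors(reads)`. The pairwise comparison is quadratic in S, which is acceptable for samples of up to 1600 reads but would need a faster neighbor search for much larger data sets.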