Core- and pan-genome size estimates. Observations and estimates of core- and pan-genome sizes. The horizontal axis is on a log2 scale. Solid blue markers represent the observed data: squares are the number of core genes, circles are the median number of genes in an individual genome, and triangles are the total number of gene families found in the data set. The red "+" represents the estimated core size, whilst the red "x" is the estimated size of the pan-genome using the binomial mixture model. The red "c" is the Chao lower-bound estimate of pan-genome size. The bars represent a 90% naive bootstrap confidence interval for the pan-genome size, giving a rough indication of uncertainty.
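As a rough illustration of two quantities named in the caption, the sketch below computes a Chao-type lower bound and a naive bootstrap interval from a presence/absence matrix. It assumes the standard Chao form, observed families plus y1^2 / (2 * y2), where y1 and y2 are the numbers of gene families seen in exactly one and exactly two genomes. The function names, the toy matrix, and the choice to bootstrap the Chao estimate (rather than the binomial mixture estimate) are assumptions made to keep the example short, not the procedure used to produce the figure.

```python
import random


def chao_lower_bound(pan_matrix):
    """Chao-type lower bound on pan-genome size.

    pan_matrix: list of genomes, each a list with one 0/1 entry per gene family.
    Returns NaN when no family occurs in exactly two genomes (y2 == 0).
    """
    n_families = len(pan_matrix[0])
    # Number of genomes in which each gene family is present.
    counts = [sum(1 for genome in pan_matrix if genome[j] > 0)
              for j in range(n_families)]
    observed = sum(1 for c in counts if c > 0)
    y1 = sum(1 for c in counts if c == 1)  # families seen in exactly one genome
    y2 = sum(1 for c in counts if c == 2)  # families seen in exactly two genomes
    if y2 == 0:
        return float("nan")
    return observed + y1 * y1 / (2 * y2)


def bootstrap_interval(pan_matrix, n_boot=1000, level=0.90, seed=0):
    """Naive bootstrap percentile interval: resample genomes with replacement."""
    rng = random.Random(seed)
    n = len(pan_matrix)
    estimates = []
    for _ in range(n_boot):
        sample = [pan_matrix[rng.randrange(n)] for _ in range(n)]
        est = chao_lower_bound(sample)
        if est == est:  # drop NaN replicates (NaN is not equal to itself)
            estimates.append(est)
    if not estimates:
        return float("nan"), float("nan")
    estimates.sort()
    tail = (1.0 - level) / 2.0
    lower = estimates[int(tail * (len(estimates) - 1))]
    upper = estimates[int((1.0 - tail) * (len(estimates) - 1))]
    return lower, upper


# Toy data (hypothetical): 3 genomes, 5 gene families.
toy = [
    [1, 1, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [1, 0, 0, 0, 1],
]
print(chao_lower_bound(toy))
print(bootstrap_interval(toy, n_boot=200))
```

Resampling whole genomes treats them as independent draws, which is one reason such an interval gives only a rough indication of uncertainty.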