Prediction of core, new and pan genes in L. monocytogenes. (A) Exponential regression analysis predicting the number of core genes shared by N sequenced genomes. For each N, random permutations of the genomes are sampled, and the number of core genes conserved in all N genomes is computed. The estimated number of core genes in the 26 L. monocytogenes genomes ranges from 2,330 to 2,456. The sampled distribution is shown as a smoothed color density plot obtained by kernel density estimation; yellow indicates the lowest density and purple the highest. For each N, black circles indicate the mean and whiskers indicate the 5th and 95th percentiles of the distribution. The solid red curve is an exponential decay fit to the means; the solid black curve is a modified exponential decay that better fits the observed data by accounting for false-negative gene calls. (B) Power law regression analysis predicting the number of new genes that will be discovered by sequencing additional L. monocytogenes genomes. The LIII genomes are outliers that pull the means upward, indicating that LIII diversity has not yet been fully captured by sequencing. (C) Power law regression analysis of the number of L. monocytogenes pan genes accumulated through genome sequencing, which currently stands at 4,052 and is still growing.
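The sampling behind panels A to C can be illustrated with a minimal Python sketch. It assumes, hypothetically, that each genome is given as a set of gene-family identifiers; `sample_curves` and `fit_power_law` are illustrative names, not the authors' code. For each random ordering of the genomes, the sketch records the core genes (shared by all of the first N genomes), the new genes (those in the N-th genome but absent from the first N − 1), and the pan genes (the union). The power law n = κN^(−α) is fitted here by ordinary least squares on log-transformed means, a simplification that may differ from the regression used for the figure, and the exponential decay fits for the core genes (including the modified, false-negative-aware version) are not reproduced.

```python
import math
import random
from statistics import mean


def sample_curves(genomes, n_samples=200, seed=0):
    """Sample random genome orderings and record, for each N, the core,
    new and pan gene counts over the first N genomes of each ordering.

    Assumption: `genomes` is a list of sets of gene-family IDs.
    Returns three dicts mapping N to a list with one value per sample.
    """
    rng = random.Random(seed)
    total = len(genomes)
    core = {n: [] for n in range(1, total + 1)}
    new = {n: [] for n in range(2, total + 1)}  # new genes need N >= 2
    pan = {n: [] for n in range(1, total + 1)}
    for _ in range(n_samples):
        order = rng.sample(genomes, total)  # one random permutation
        shared = set(order[0])  # genes present in every genome so far
        seen = set(order[0])    # genes present in any genome so far
        core[1].append(len(shared))
        pan[1].append(len(seen))
        for n in range(2, total + 1):
            g = order[n - 1]
            new[n].append(len(g - seen))  # genes not seen in the first n - 1
            shared &= g
            seen |= g
            core[n].append(len(shared))
            pan[n].append(len(seen))
    return core, new, pan


def fit_power_law(xs, ys):
    """Fit y = kappa * x**(-alpha) by least squares on
    log y = log kappa - alpha * log x. All xs and ys must be positive."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = mean(lx), mean(ly)
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    slope = sxy / sxx
    kappa = math.exp(my - slope * mx)
    return kappa, -slope


if __name__ == "__main__":
    # Hypothetical toy data: 50 shared families plus 30 random accessory ones.
    rng = random.Random(1)
    toy = [set(range(50)) | {rng.randrange(50, 400) for _ in range(30)}
           for _ in range(10)]
    core, new, pan = sample_curves(toy, n_samples=100)
    ns = [n for n in new if mean(new[n]) > 0]  # log needs positive means
    kappa, alpha = fit_power_law(ns, [mean(new[n]) for n in ns])
    print(f"mean core genes at N={len(toy)}: {mean(core[len(toy)]):.1f}")
    print(f"mean pan genes at N={len(toy)}: {mean(pan[len(toy)]):.1f}")
    print(f"new-gene power law: kappa={kappa:.1f}, alpha={alpha:.2f}")
```

Because every sample starts from a full permutation, the per-N lists stay aligned by sample index, so the first genome's size plus the new-gene counts always adds up to the pan count for that ordering. Percentiles like those drawn as whiskers in panel A could be taken from the same per-N lists.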