Pan- and core genomes of L. crispatus. Development of the pan- (A) and core (B) genomes as a function of the number of sequenced L. crispatus strains. The total number of genes found in the pan- and core genome analyses is shown for increasing numbers of sequenced genomes. The dashed lines represent least-squares fits to the medians, and R² describes the goodness of fit. The box plots show the median (horizontal line) and the 25th and 75th percentiles (solid box), with the data extremes shown by whiskers outside the box. (C) The distribution of core and accessory L. crispatus CDSs within COG functional categories. For each category, the top bar shows the percentage of assigned core CDSs relative to all core CDSs, and the bottom bar shows the percentage of assigned accessory CDSs relative to all accessory CDSs. The proportion of strain-specific CDSs is highlighted (light blue) in the accessory bars. COGs significantly enriched (p-value ≤ 0.01, hypergeometric distribution) in core (1), shared accessory (2), or strain-specific (3) CDSs are marked next to the COG identifiers. Only COG functional categories with more than 20 members are shown. The COG categories are given in the inset at the bottom of the figure. (D) Distribution of ortholog groups at different levels of conservation in each strain. The OrthoMCL-defined ortholog groups were classified into levels of conservation according to the number of strains in which they were detected. Conservation levels are represented by different colors; ortholog groups found in all ten genomes represent the current core (red).
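
The enrichment marks in panel C rely on a hypergeometric test with a threshold of p ≤ 0.01. The following is a minimal sketch of such a one-sided test, not the authors' implementation: the function name, its parameters, and the example counts are hypothetical, and it assumes the upper tail (observing at least the seen number of category members in the subset) is the quantity of interest.

```python
from math import comb


def hypergeom_enrichment_p(total, category_total, subset_size, subset_in_category):
    """Upper-tail hypergeometric p-value, P(X >= subset_in_category).

    total              -- all COG-assigned CDSs (population size)
    category_total     -- CDSs in the population assigned to one COG category
    subset_size        -- CDSs in the subset under test (e.g. the core set)
    subset_in_category -- CDSs in that subset assigned to the category
    """
    upper = min(category_total, subset_size)
    denom = comb(total, subset_size)
    # Sum the probability mass for every outcome at least as extreme as observed.
    tail = sum(
        comb(category_total, k) * comb(total - category_total, subset_size - k)
        for k in range(subset_in_category, upper + 1)
    )
    return tail / denom


# Hypothetical counts, for illustration only (not taken from the figure).
p = hypergeom_enrichment_p(total=3000, category_total=150,
                           subset_size=1200, subset_in_category=85)
is_enriched = p <= 0.01  # threshold stated in the caption
```

Exact integer binomial coefficients keep the sketch free of third-party dependencies; for genome-scale counts a statistics library's survival function would be an equivalent alternative.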