High quality draft genomes can be obtained through de novo assembly of short read sequences
The emergence and maturation of next generation sequencing (NGS) technology, driven in large part by efforts to develop approaches that allow for completion of a human genome for $1,000, has made possible rapid, inexpensive, and high-throughput microbial whole-genome analysis, which promises to improve our understanding of bacterial pathogenesis and our ability to detect and control infectious diseases. Our data show that NGS data facilitate de novo assembly and analysis of bacterial genomes. Although these draft genomes can have up to thousands of sequence gaps, the quality of the assembly is sufficient for automated annotation and subsequent comparative genomics analyses, particularly when studying a conserved group of organisms such as the Listeria species examined here. Thus, NGS represents a feasible approach for rapid and comprehensive pathogen identification, subtyping, source tracking, and surveillance, and has the potential to be developed, in the long term, into routine diagnostic applications. The utility of draft genomes for identification of candidate vaccine targets has also recently been demonstrated.
The genus Listeria sensu stricto has a pan-genome characterized by limited introduction of new genetic material with 2,032 core and 2,918 accessory genes identified to date
Our data show that the members of the genus Listeria have a highly conserved genome with limited acquisition, from other gene pools, of homologous and non-homologous genes, even though horizontal transfer of homologous genes within and between Listeria species has clearly been shown to occur [38, 64]. Although the pan-genome of the genus Listeria is not closed, there seems to be very limited ongoing introduction of new genetic material from external gene pools (i.e., other genera). Data supporting this limited introduction of new genetic material into the pan-genome include (i) the observation that the core and accessory genes identified among the 13 genomes analyzed represent a large proportion (i.e., 76.2%) of the predicted pan-genome, (ii) the similarity in size of the observed core genome (2,032 genes) and the predicted core genome (1,994 genes), suggesting limited gene loss and deletion, (iii) a highly conserved estimated genome size (from 2.8 to 3.2 Mb), (iv) a relatively small fraction (4% on average) of genes that have atypical codon usage, and (v) a small number of prophages and transposons. The fact that most of these prophages have a codon usage pattern similar to that of their host indicates that they have co-evolved with their Listeria hosts. A Listeria pan-genome characterized by limited introduction of genetic material is also supported by the observation that pan-genome coverage for the genus Listeria (except L. grayi) is higher than the pan-genome coverage estimates for most bacterial species, which range from 30% (for Escherichia coli, based on genomes of 22 strains) to 73% (for Francisella tularensis, based on genomes of 7 strains). A pan-genome coverage estimate performed for the Bacillus cereus group, a group of closely related pathogenic and non-pathogenic Gram-positive species, revealed a coverage of 42%, indicating that the pan-genome coverage of Listeria is also high compared to other Gram-positive organisms.
The Bacillus cereus group, however, can be considered a single species from a taxonomic perspective. In the case of Listeria, the high level of shared gene content should not be interpreted to mean that Listeria species are very closely related and may in fact comprise one species. On the contrary, the Listeria species have diverged substantially in the primary sequence of their core genes, with an average pair-wise nucleotide identity of 84.8%, compared to average pair-wise nucleotide identities within species of 99.2% in F. tularensis and 96.7% in E. coli. Phillipy et al. predicted a closed pan-genome for the species L. monocytogenes, which is congruent with our observations for the complete genus.
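The two quantities used in this argument, pan-genome coverage and average pair-wise nucleotide identity, can be illustrated with a minimal Python sketch. The gene counts and the 76.2% coverage are the values reported above; the pan-genome size is back-calculated from them rather than taken from the analysis, and the three short sequences are invented toy data, not Listeria core genes. The function names are hypothetical, and the identity calculation is a simple gap-excluding definition that is not necessarily the method used in this study.

```python
from itertools import combinations
from typing import Dict


def pan_genome_coverage(observed_genes: int, predicted_pan_genome: float) -> float:
    """Fraction of the predicted pan-genome already observed."""
    return observed_genes / predicted_pan_genome


def pairwise_identity(a: str, b: str) -> float:
    """Identity over aligned columns, ignoring columns with a gap in either sequence."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    return sum(x == y for x, y in columns) / len(columns)


def mean_pairwise_identity(alignment: Dict[str, str]) -> float:
    """Average identity over all unordered pairs of sequences in the alignment."""
    pairs = list(combinations(alignment.values(), 2))
    return sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)


# Values reported in the text.
CORE_GENES = 2032
ACCESSORY_GENES = 2918
REPORTED_COVERAGE = 0.762

observed = CORE_GENES + ACCESSORY_GENES
# Assumption: predicted pan-genome size implied by the reported coverage.
implied_pan_genome = observed / REPORTED_COVERAGE

# Toy alignment (invented, for illustration only).
toy_alignment = {
    "taxon_a": "ATGGCTAAAGGT",
    "taxon_b": "ATGGCTAAGGGT",
    "taxon_c": "ATGGCAAAAGGA",
}

if __name__ == "__main__":
    print(f"observed genes: {observed}")
    print(f"implied pan-genome size: {implied_pan_genome:.0f}")
    print(f"coverage: {pan_genome_coverage(observed, implied_pan_genome):.1%}")
    print(f"toy mean identity: {mean_pairwise_identity(toy_alignment):.1%}")
```

Under these assumptions, the 4,950 observed genes imply a predicted pan-genome of roughly 6,500 genes. Coverage measures how much of the gene repertoire has been sampled and identity measures sequence divergence, so the two are independent: a group can share most of its gene content and still differ substantially at the nucleotide level.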
The mechanism behind the limited occurrence of gene acquisition from outside gene pools in Listeria remains to be determined. Although several strains harbor an insertion of prophage A118 in the comK open reading frame, which encodes a transcriptional regulator of competence, comK is intact in L. marthii, L. innocua FSL J1-023 and FSL S4-378, and L. ivanovii subsp. londoniensis, as well as in the previously sequenced L. monocytogenes F2365 and HCC23 genomes. While most of the competence-related genes are present in all Listeria genomes, and while evidence for homologous recombination has been detected by multiple studies [38, 64, 70], natural competence has not yet been reported for any Listeria strain. Limited natural competence may thus at least partially explain the low level of gene acquisition from outside gene pools, particularly since our data suggest that most listeriophages are part of the closed Listeria pan-genome. In addition, limited gene acquisition in the genus Listeria may also be facilitated by the presence, in all genomes of the genus examined so far, of well-developed defense systems against foreign DNA and mobile elements, including restriction-modification (R-M) systems and/or CRISPR systems. Both systems have been shown to limit or block horizontal gene transfer in Staphylococcus aureus [72, 73]. This would explain why functional transposable elements are virtually absent from Listeria, and why those that are present (as is the case for the conjugative elements reported here) contain a putative anti-restriction gene, which protects them from the R-M system.
Despite the overall high conservation of genome content across different Listeria species, gene loss and deletion events, as well as introduction of genetic material through horizontal gene transfer from other gene pools, do occur in this genus, often with phenotypic consequences. For example, the chromosomal region that contains inlAB in L. monocytogenes and L. ivanovii appears to be hypervariable, with evidence for deletion events (e.g., in L. seeligeri) and horizontal introduction of genetic material from other genera (e.g., the presence, in the L. ivanovii inlAB region, of two ORFs with relatively high similarity to Enterococcus genes and the presence, in the inlAB region of L. marthii, of approximately 15 ORFs that were putatively introduced by horizontal gene transfer). These observations are consistent with another report that also suggested putative horizontal gene transfer events in this region.
While Listeria includes a number of species-like clades, many of these putative species include subclades or strains with atypical virulence-associated characteristics and gene profiles
Generally, within the genus Listeria, only members of the species L. monocytogenes are considered to be human pathogens, while members of the species L. ivanovii are considered to be animal pathogens. Key genes that clearly contribute to virulence, as supported by experimental evidence, include (i) genes located in the prfA cluster, which are critical for intracellular survival and cell-to-cell spread, (ii) inlA and inlB, which are critical for invasion of intestinal epithelial and hepatic cells, respectively, and (iii) inlC, which encodes a protein that is specifically required for cell-to-cell spread. Strains representing L. monocytogenes lineages I, II, and III as well as the L. ivanovii subsp. londoniensis strains contained the full complement of these virulence genes (i.e., prfA cluster, inlAB, inlC), consistent with the experimentally verified virulence of these organisms [75, 76]. Our full genome analyses suggest that evolution from a Listeria ancestor that contained all three virulence loci yielded species and strains that have lost one or more of these key virulence genes. L. welshimeri, the majority of L. innocua strains, and non-hemolytic L. seeligeri strains lack all three of these virulence loci, consistent with their observed avirulence [22, 77].
Other strains lack only a subset of the key virulence genes found in most L. monocytogenes and L. ivanovii. Hemolytic L. seeligeri carries the prfA cluster, but lacks inlAB and inlC, consistent with its avirulence in mammalian tissue culture and animal models. Interestingly, some strains (represented by the hemolytic L. innocua strain characterized here) contain the prfA cluster as well as inlA and have the ability to invade human intestinal epithelial cells, while lacking inlC and showing avirulence in a mouse model. Similarly, at least one L. monocytogenes strain (HCC23, representing lineage IIIA) contains the prfA cluster as well as inlAB, while lacking inlC and showing avirulence in mouse infection experiments. These strains represent a particular challenge for virulence classification, as standard assays would typically classify them as virulent (they are hemolytic and positive for some key virulence genes). Overall, our data, along with previously reported virulence characterizations of isolates representing different Listeria species as well as atypical strains (e.g., hemolytic L. innocua) [22–24], clearly indicate the need for a well-designed molecular approach to define pathogenic strains within the genus Listeria. While we hypothesize that use of multiple marker genes (e.g., genes in the prfA cluster, inlA (including identification of virulence-attenuating premature stop codons), inlB, and inlC) is needed to identify virulent strains, further tissue culture and animal studies are needed to confirm appropriate marker genes. In addition, further comparative genomics studies of phenotypically variable Listeria will be needed to identify and validate diagnostic targets and markers.
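The multi-marker idea hypothesized above can be sketched in Python as a simple presence/absence screen. The marker profiles are taken from the descriptions in this section; the rule itself (requiring all four markers before a strain is flagged as a candidate virulent strain), the class and function names, and the output labels are illustrative assumptions, not a validated classification scheme. The sketch also does not model premature stop codons in inlA, which would need sequence-level inspection.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

# Marker loci named in the text; the identifiers are hypothetical labels.
KEY_MARKERS: Tuple[str, ...] = ("prfA_cluster", "inlA", "inlB", "inlC")


@dataclass(frozen=True)
class MarkerProfile:
    name: str
    present: FrozenSet[str]

    def missing(self) -> Tuple[str, ...]:
        """Key markers absent from this profile, in KEY_MARKERS order."""
        return tuple(m for m in KEY_MARKERS if m not in self.present)


def provisional_call(profile: MarkerProfile) -> str:
    """Assumed rule: only a full marker complement is flagged as candidate virulent."""
    missing = profile.missing()
    if not missing:
        return "candidate virulent"
    if len(missing) == len(KEY_MARKERS):
        return "no key markers"
    return "partial complement: " + ", ".join(missing) + " missing"


# Presence/absence profiles as described in this section.
PROFILES = {
    p.name: p
    for p in (
        MarkerProfile("L. monocytogenes lineage I", frozenset(KEY_MARKERS)),
        MarkerProfile("L. monocytogenes HCC23", frozenset({"prfA_cluster", "inlA", "inlB"})),
        MarkerProfile("hemolytic L. innocua", frozenset({"prfA_cluster", "inlA"})),
        MarkerProfile("hemolytic L. seeligeri", frozenset({"prfA_cluster"})),
        MarkerProfile("L. welshimeri", frozenset()),
    )
}

if __name__ == "__main__":
    for name, profile in PROFILES.items():
        print(f"{name}: {provisional_call(profile)}")
```

A screen based on hemolysis or the prfA cluster alone would group HCC23 and the hemolytic L. innocua strain with virulent strains, whereas reporting the missing loci makes their partial complement explicit. Any such rule would still need the tissue culture and animal validation described above.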
The genus Listeria represents two main clades that diverged from a common ancestor that contained the prfA cluster and a number of internalin genes, most likely 47 million years ago
The use of 100 core genes that were previously shown to exhibit no evidence of positive selection or homologous recombination resulted in a robust phylogeny dividing Listeria (except L. grayi) into two main clades: (i) a clade consisting of L. monocytogenes, L. marthii, L. innocua and L. welshimeri, and (ii) a clade consisting of L. ivanovii and L. seeligeri. The existence of two main clades has been shown in several previous studies [3, 8, 23]; however, the placement of L. welshimeri has always been ambiguous. While some studies placed L. welshimeri basal in the L. seeligeri/L. ivanovii clade, others [8, 23], like the majority of the phylogenetic reconstruction methods used here, place L. welshimeri basal in the L. monocytogenes/L. marthii/L. innocua clade. A likely explanation for this ambiguous phylogenetic placement is the "long branch attraction effect", as L. welshimeri is placed on a long branch and seems to have branched off from the MRCA of the L. monocytogenes/L. marthii/L. innocua clade relatively early during the evolution of Listeria sensu stricto. As likelihood-based methods are less prone to long branch attraction, the placement of L. welshimeri in the L. monocytogenes/L. marthii/L. innocua clade by these methods suggests that this placement is correct.
Our data also seem to support the hypothesis that the most recent common ancestor (MRCA) of Listeria possessed not only the prfA virulence cluster, as indicated before, but also many internalins, including A, B and C, which are essential for host invasion and cell-to-cell spread. While a few studies have previously explored the evolution of the internalin multigene family, including one study that proposed the presence of inlB in the MRCA of Listeria (except L. grayi), our analysis allowed for identification of 16 internalin genes that, like inlB, were likely present in the MRCA of Listeria sensu stricto.
Based on a Bayesian molecular clock analysis that used 100 genes of the Listeria core genome, we propose that the MRCA of the genus Listeria (except L. grayi) can be dated to about 40 to 60 mya, similar to the date that has been inferred for the most recent common ancestor of S. enterica and S. bongori. A plausible hypothesis for the emergence of these pathogens during this time period is that a major mammalian radiation during this same epoch provided strains of Listeria and Salmonella that were able to colonize mammalian hosts with a selective advantage over less adapted or environmental strains.
Loss of virulence-associated genes is a recurrent evolutionary pattern in Listeria
While a number of studies have reported that gene loss and genome reduction are general patterns in the evolutionary transition from facultative pathogenic lifestyles to obligate pathogenic lifestyles in bacteria, our data suggest that gene loss events in multiple genomic regions and lineages coincided with multiple evolutionary transitions of Listeria from a facultative pathogenic lifestyle to an obligate saprotrophic lifestyle. The switch from a facultative pathogenic to an obligate saprotrophic lifestyle seems to have occurred at least four times during the evolutionary history of Listeria sensu stricto: (i) during the speciation event leading to L. seeligeri, which coincided with the loss of the inlAB operon and inlC, but not the prfA cluster, (ii) during the speciation event leading to L. welshimeri, which coincided with the loss of the prfA cluster, the inlAB operon and inlC, (iii) during the speciation event leading to L. innocua, which coincided with the loss of inlB and inlC, and (iv) during the speciation event leading to L. marthii, which coincided with the loss of the prfA cluster, the inlAB operon and inlC. Secondary losses of additional virulence-associated genes occurred in the non-hemolytic L. seeligeri strains, which lost the prfA cluster, and in the non-hemolytic L. innocua strains, which lost the prfA cluster as well as inlA.
Despite the observation that loss of virulence genes appears to be a key event in the evolution of Listeria species, several apparently avirulent Listeria strains (hemolytic L. seeligeri strains and hemolytic L. innocua strains) have strongly conserved, and in most cases functional, homologues of key L. monocytogenes virulence genes in their genomes. For example, previous studies demonstrated some functionality of different L. seeligeri virulence factors, and our data suggest that the homologue of internalin A in the hemolytic L. innocua strain supports the ability to invade human intestinal epithelial cells (although future experiments with an isogenic inlA mutant will be required to confirm this). One hypothesis is that the virulence genes in Listeria play a role in survival of, and defense against, predation by protists; however, this hypothesis is not supported by a recent study demonstrating that L. monocytogenes does not survive ingestion by the amoeba Acanthamoeba polyphaga.