To date, sequenced Francisella genomes have mainly been of the human pathogenic species F. tularensis. Consequently, conclusions about evolutionary processes within the genus Francisella have largely been based on F. tularensis genomes.
Moreover, a number of identification assays developed on the basis of F. tularensis genome sequences have been shown to give false-positive results. These have been attributed to the presence of the same sequence signatures in other, environmental Francisella lineages.
In the present study, we provide details of the genetic content of nearly all currently known members of the genus Francisella, with the exception of the recently published F. cantonensis. Despite a generally high sequence similarity within the genus, we found that different genetic clades demonstrated different population structures (recombination frequencies). This suggests differences in lifestyle and host association.
F. tularensis has been found to be a highly clonal species, which differentiates it from environmental lineages characterised by moderate recombination rates. We also found evidence that the fish pathogen F. noatunensis subsp. noatunensis is a highly clonal lineage. The low recombination frequency in F. noatunensis subsp. noatunensis may indicate that a close host association reduced contact with other Francisella species and thereby lowered opportunities for gene acquisition by recombination, as previously hypothesised for F. tularensis. The differences in population structure and variability observed between F. novicida and F. tularensis subspecies tularensis, holarctica and mediasiatica should be taken into consideration in the ongoing discussion of whether F. novicida should be a separate species or included as a subspecies of F. tularensis.
Whole genome phylogeny
Our analyses of 45 Francisella genomes showed that recognised species could be divided into two main genetic clades (Figure 1) and suggested that the pathogenic species in each clade emerged independently by similar evolutionary paths of host adaptation. F. tularensis, which can infect mammals, belonged to clade 1, whereas F. noatunensis subsp. noatunensis, which infects fish, belonged to clade 2. Our analyses based on the phylogeny of Francisella genomes and genetic relatives suggest that the ancestral Francisella species originated in a marine habitat.
Using whole-genome analysis, we found evidence that F. noatunensis subsp. noatunensis and F. noatunensis subsp. orientalis represent a paraphyletic group. F. philomiragia and F. noatunensis subsp. noatunensis together form a sister clade to F. noatunensis subsp. orientalis (Figure 1). This result was not supported by 16S rRNA analysis, which suggested that the two subspecies F. noatunensis subsp. orientalis and F. noatunensis subsp. noatunensis form a clade. However, there are only limited nucleotide differences in the 16S region, and the more comprehensive core genome SNP analysis gives a better view of the real relationship. Identification of genetic factors coding for fish pathogenicity may therefore be possible via comparative analyses. Further phenotypic studies and experiments are needed to confirm this hypothesis. Since F. noatunensis subsp. noatunensis is isolated from cod and salmon, while F. noatunensis subsp. orientalis isolates are recovered from tilapia and three-line grunt, an alternative hypothesis is that these subspecies evolved their abilities for fish infection independently.
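The topological argument above can be illustrated with a minimal sketch. It assumes a rooted tree written as nested tuples; the taxon labels and the helper names `leaves` and `is_monophyletic` are hypothetical and are not taken from the analysis pipeline used in this study. The toy tree only encodes the branching order described in the text.

```python
def leaves(node):
    """Return the set of leaf labels below a node (a string or a tuple of nodes)."""
    if isinstance(node, str):
        return {node}
    found = set()
    for child in node:
        found |= leaves(child)
    return found


def is_monophyletic(tree, group):
    """True if some clade of the rooted tree has exactly `group` as its leaf set."""
    group = set(group)
    if leaves(tree) == group:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, group) for child in tree)


# Hypothetical toy topology mirroring the whole-genome result described above:
# (F. philomiragia, F. noatunensis subsp. noatunensis) sister to subsp. orientalis.
toy_tree = (("F_philomiragia", "F_noat_noatunensis"), "F_noat_orientalis")

both_subspecies = {"F_noat_noatunensis", "F_noat_orientalis"}
print(is_monophyletic(toy_tree, both_subspecies))
print(is_monophyletic(toy_tree, {"F_philomiragia", "F_noat_noatunensis"}))
```

Under this toy topology the two F. noatunensis subspecies do not form a clade on their own, because the smallest clade containing both also contains F. philomiragia.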
In agreement with other studies [26, 40], our analyses verified the phylogenetic position of the tick endosymbiont W. persica within the Francisella genus. Several lines of genomic evidence support the view that this organism adapted as a primary symbiont: its genome was found to be significantly smaller (about 80–85%) than other Francisella genomes. Moreover, in W. persica, regulatory genes were found to be depleted, while biosynthesis pathways supplying amino acids to its tick host were retained, and there was a low abundance of IS elements. Our results confirm that W. persica and other Francisella-like endosymbionts (FLE) [19, 24, 41, 42] belong to the genus Francisella and that W. persica has undergone a significant reduction in its genome.
An important outcome of the present study was reclassification of the isolate 3523 from Australia. It was initially classified as an unusual F. novicida, a classification which remained despite whole genome sequencing. It was clear from the present analyses that isolate 3523 belongs to the F. hispaniensis clade. This result demonstrates the importance of broad-scale intra-genus sequencing for correct classification of new isolates.
Francisella pan- and core-genome
The overall core genome size of 799 genes for the genus Francisella was similar to that reported for the genus Bacillus (Figure). Bacterial pan-genomes at the genus level have only recently been reported in the literature, and the number of gene families in these genera is expected to grow as more genomes are included. This approach may be valuable in revealing genes specific for niche commonalities between subspecies.
We detected different pan-genome sizes for clade 1 and clade 2 (Figure 2B): clade 2 contains more genes than clade 1, and considerably more than the F. tularensis clade alone. As a result, F. tularensis may be fully sampled by sequencing a smaller set of representative isolates, whereas more isolates of clade 2 would need to be fully sequenced to give a true pan-genome size.
The contribution of each genome to the complete pan-genome of the genus Francisella and related organisms is illustrated in Figure 2C. Most genomes contribute to the increase of the pan-genome, although the increase is less pronounced when similar genomes are added.
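The accumulation behaviour described above can be sketched in a few lines. This is a minimal illustration, assuming each genome has already been reduced to a set of gene-family identifiers; the function name `accumulate` and the toy isolates are hypothetical, and the clustering of genes into families (the hard part of a real pan-genome analysis) is not shown.

```python
from typing import Dict, List, Set, Tuple


def accumulate(genomes: Dict[str, Set[str]], order: List[str]) -> List[Tuple[str, int, int]]:
    """Return (genome, pan-genome size, core-genome size) after each genome is added."""
    pan: Set[str] = set()
    core: Set[str] = set()
    curve = []
    for i, name in enumerate(order):
        families = genomes[name]
        pan |= families  # pan-genome: union of all families seen so far
        # core genome: families present in every genome added so far
        core = set(families) if i == 0 else core & families
        curve.append((name, len(pan), len(core)))
    return curve


# Hypothetical toy data: isolate_B is nearly identical to isolate_A,
# isolate_C is a more distant relative.
toy = {
    "isolate_A": {"g1", "g2", "g3", "g4"},
    "isolate_B": {"g1", "g2", "g3", "g4"},
    "isolate_C": {"g1", "g2", "g5", "g6", "g7"},
}

curve = accumulate(toy, ["isolate_A", "isolate_B", "isolate_C"])
for row in curve:
    print(row)
```

In this toy case, adding the near-identical isolate leaves the pan-genome unchanged, while the distant isolate both enlarges the pan-genome and shrinks the core, which is the pattern a clonal versus a diverse clade would be expected to show.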
Clade-specific genes
Genomes in the Francisella genus have, with the exception of W. persica, similar gene composition. W. persica has not retained genes included in the COG group ‘amino acid transport and metabolism’, which it does not require because of its endosymbiotic lifestyle.
An extensive list of 80 potential candidate virulence genes specific for the human pathogen F. tularensis has previously been published. Surprisingly, analysis of the more complete collection of Francisella genomes in the present study reduced this list to only seven F. tularensis-specific genes. The general lack of specific genes agrees with previous observations that pathogens exhibit specialisation and host adaptation by gene loss rather than gene gain. Thus, a pathogen such as F. tularensis should be defined not only by genes that are present but also by those that are lacking.
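Why a broader genome collection shortens a list of species-specific genes can be shown with a small set-based sketch. It assumes genes have already been assigned to families shared across genomes; the function name `specific_genes`, the genome labels, and the gene labels are all hypothetical and do not correspond to real Francisella loci.

```python
from typing import Dict, Set


def specific_genes(genomes: Dict[str, Set[str]], targets: Set[str]) -> Set[str]:
    """Gene families present in every target genome and absent from all other genomes."""
    inside = [genes for name, genes in genomes.items() if name in targets]
    outside = [genes for name, genes in genomes.items() if name not in targets]
    if not inside:
        raise ValueError("no target genomes found")
    shared = set.intersection(*inside)   # present in all targets
    elsewhere = set().union(*outside)    # present in any non-target genome
    return shared - elsewhere


# Hypothetical toy collection with one non-target genome.
small = {
    "target_1": {"a", "b", "c", "d"},
    "target_2": {"a", "b", "c", "e"},
    "other_1": {"a", "x"},
}
targets = {"target_1", "target_2"}
print(sorted(specific_genes(small, targets)))

# Adding one more environmental genome removes a candidate from the list.
larger = dict(small, other_2={"a", "b", "y"})
print(sorted(specific_genes(larger, targets)))
```

The candidate list can only stay the same or shrink as non-target genomes are added, so a list derived from few comparison genomes is best read as an upper bound.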
The seven specific genes in F. tularensis were located in three separate gene clusters. The genes in the first cluster (FTT0794, FTT0795, FTT0796) are organised in an operon and were predicted to be involved in exopolysaccharide synthesis. It has been suggested that an exopolysaccharide capsule plays a role in F. tularensis virulence.
The second gene cluster (FTT1453c, FTT1454c and FTT1458) is present within the wbt locus of F. tularensis. Genes in this locus encode proteins involved in lipopolysaccharide O-antigen synthesis. The O-antigen locus of F. tularensis contains an insertion element that differentiates it from that of F. novicida. A similar insertion element structure has been reported to be essential for virulence in Shigella sonnei. Finally, the last gene cluster contained a single gene, FTT1188, which encodes a hypothetical membrane protein. This protein does not exhibit significant homology to other proteins in the NCBI non-redundant database and has not yet been characterised in depth. Identifying virulence genes and their role in the virulence of Francisella is not straightforward. Our data suggest that F. tularensis virulence cannot be easily explained by a simplistic model that considers only the presence or absence of specific “virulence genes”.
Fa. hongkongensis and P. salmonis both contain many genes that are absent from the Francisella genus and are therefore likely to occupy distinctly different ecological niches.
The two principal clades of the Francisella genus exhibit several similarities and differences in IS element content and composition. F. noatunensis subsp. noatunensis contains an expanded set of IS elements similar to those reported for F. tularensis. The expanded IS elements in F. tularensis are ISFtu1 and ISFtu2, whereas the expanded IS element in F. noatunensis is ISFpil. Interestingly, the two subspecies of F. noatunensis exhibited contrasting contents of IS elements. While F. noatunensis subsp. noatunensis contained the highest number of IS elements of all species in the genus, F. noatunensis subsp. orientalis contained very few IS elements. This may have arisen because of different evolutionary modes in these subspecies.
Recombination is an important process for generating new genetic diversity within bacterial populations and is a known driver of evolution in many bacterial species. The recombination analysis of 45 genomes in the present study confirmed that the level of recombination within the Francisella genus is highly variable, ranging from undetectable in F. tularensis to moderate in F. novicida.
We found that the subspecies of F. noatunensis exhibited apparently low rates of recombination, at levels similar to the human pathogen group of F. tularensis. Otherwise, both clades 1 and 2 displayed moderate recombination rates, largely due to frequent recombination within F. novicida (clade 1) and F. philomiragia (clade 2). According to our findings, 33% of the core gene set provided evidence of past recombination events, similar in magnitude to estimates for Campylobacter and Neisseria, but higher than previously reported for Francisella. It was reported previously that 19.2% of the genes in the core genome of F. novicida and F. philomiragia have been recombined, but our data suggest that this could be an underestimate due to a lack of genomic sequences available at that time.
Our findings are similar to reports for other bacterial genera that include highly specialised pathogens. Variable recombination rates have previously been reported for several pathogenic populations (Yersinia pestis, Bacillus anthracis and Mycobacterium tuberculosis). It should be noted that if closely related clones of pathogens with low recombination rates are over-represented, this can result in an under-estimation of recombination rates for the entire genus [50, 56]. By ensuring proper representation of all subspecies of the Francisella genus in the present work, such effects were minimised.