Whole genome sequencing of the fish pathogen Francisella noatunensis subsp. orientalis Toba04 gives novel insights into Francisella evolution and pathogenecity
DOI: 10.1186/1471-2164-13-598
© Sridhar et al.; licensee BioMed Central Ltd. 2012
Received: 7 June 2012
Accepted: 31 October 2012
Published: 6 November 2012
Abstract
Background
Francisella is a genus of gram-negative bacteria that includes species highly virulent in fish and humans; F. tularensis causes the serious disease tularaemia in humans. Recently, Francisella species have been reported to cause mortality in aquaculture species such as Atlantic cod and tilapia. We have completed the sequencing and draft assembly of the Francisella noatunensis subsp. orientalis Toba04 strain isolated from farmed tilapia. Compared to other available Francisella genomes, it is most similar to the genome of Francisella philomiragia subsp. philomiragia, a free-living bacterium not virulent to humans.
Results
The genome is rearranged compared to the available Francisella genomes even though we found no IS-elements in the genome. Nearly 16% of the predicted ORFs are pseudogenes. Computational pathway analysis indicates that a number of the metabolic pathways are disrupted due to pseudogenes. Comparing the novel genome with other available Francisella genomes, we found that around 2.5% of the genes are unique to Francisella noatunensis subsp. orientalis Toba04, and we compiled a list of genes uniquely present in the human-pathogenic Francisella subspecies. Most of these genes might have been transferred from other bacterial species through horizontal gene transfer. Comparative analysis between human and fish pathogens also provides insights into genes responsible for pathogenicity. Our analysis of pseudogenes indicates that the pseudogenes of the Francisella subspecies from tilapia are old, with a large number of pseudogenes having more than one inactivating mutation.
Conclusions
The fish pathogen lost non-essential genes some time ago. Evolutionary analysis of the Francisella genomes strongly suggests that human and fish pathogenic Francisella species have evolved independently from free-living, metabolically competent Francisella species. These findings will contribute to understanding the evolution and pathogenesis of Francisella species.
Keywords
Francisella, Macrophage, Pseudogenes, Genome, Insertion elements, Comparative analysis
Background
Species in the Francisella genus are facultative intracellular, gram-negative bacteria, well known for causing tularaemia in mammals. Francisella was first found by the American bacteriologist Edward Francis in 1922[1]. The Francisella tularensis subspecies strains can be serious pathogens for humans and can cause tularaemia that leads to mortality, making these bacteria a potential bio-weapon[2]. Until recently the Francisella genus consisted of only two species, F. tularensis subsp. tularensis and F. philomiragia subsp. philomiragia, of which F. philomiragia subsp. philomiragia is a non-virulent species. Recently, more Francisella species and strains have been isolated from several new sources. From farmed Atlantic cod, a new and highly virulent species of Francisella was recently described and has later been given the name F. noatunensis subsp. noatunensis[3, 4]. Francisella has also been reported from other fish species, and another fish pathogenic strain, F. noatunensis subsp. orientalis Toba04, has been obtained from tilapia[4–6]. In addition, Francisella has been identified in environmental samples and from invertebrates like ticks[2, 7]. Although the available Francisella genomes are fairly close to each other in their features, their genomes are highly rearranged[8]. The molecular phylogeny of Francisella species and strains has been reported previously; the family Francisellaceae currently contains one genus only and there is no close pathogenic relative to this bacterial family[9]. The subspecies of F. tularensis are classified into human virulent, non-virulent, and moderately virulent. F. tularensis subsp. tularensis and F. tularensis subsp. holarctica are human virulent strains, the latter being less virulent[9, 10]. F. tularensis subsp. mediasiatica is moderately virulent to humans[11]. F. tularensis subsp. novicida and F. philomiragia subsp. philomiragia are not virulent to humans. The F. noatunensis strains are not able to grow at 37 °C and hence they are not virulent to humans[5].
The Francisella species can also be categorized into metabolically competent and metabolically incompetent. The metabolically competent strains have been found in environmental samples while the incompetent strains depend on a host for growth. The metabolic competence of a species relates to the number of intact metabolic genes present in its genome[10]. F. tularensis subsp. tularensis, F. tularensis subsp. holarctica and F. tularensis subsp. mediasiatica are metabolically incompetent and have a larger number of disrupted genes (i.e. partially conserved genes with internal stop codons or frameshifts) in their genomes, while F. philomiragia subsp. philomiragia and F. tularensis subsp. novicida are metabolically competent and have few disrupted genes. The characterized genomes of strains within the F. tularensis subspecies are highly rearranged between themselves, and insertion elements (IS-elements) have been regarded as a key feature in creating these rearrangements. The genomic breakpoints are typically flanked by IS-elements and associated with a large number of pseudogenes[8–10]. The F. tularensis subspecies genomes possess two copies of the FPI (Francisella pathogenicity island), while the metabolically competent genomes have one copy. Although several studies comparing human virulent, moderately virulent and non-virulent strains have been reported, the mechanisms behind the pathogenicity of Francisella strains are still largely unknown.
Identification of several new highly virulent strains of Francisella from farmed fish has opened the way for a broader comparison between members of this important and very special group of bacteria. The F. noatunensis strains are highly pathogenic to fish and can cause high mortality and losses in farmed fish[12]. To gain more detailed information about F. noatunensis subspecies we sequenced the genome of the F. noatunensis subsp. orientalis Toba04 strain using pyrosequencing. We were able to assemble the F. noatunensis subsp. orientalis Toba04 genome into one high quality scaffold. The genome sequence and assembly of the virulent fish pathogen F. noatunensis subsp. orientalis Toba04 have been annotated and used in a comparative genomic approach to analyze the properties that are shared with the mammalian pathogenic and environmental Francisella strains. We also tried to understand the factors influencing virulence among the Francisella strains used for comparative studies. Our sequence analysis revealed that the F. noatunensis subsp. orientalis Toba04 strain lacks IS-elements, shedding new light on the role and possibly the mechanisms of genome rearrangement in the Francisella species. One feature shared between the orientalis and the human pathogenic Francisella strains is the presence of disrupted genes and metabolic incompetence.
Results and discussion
Genome features
Main characteristics of the genome sequence presented in this paper together with a representative set of other previously sequenced Francisella genomes
Feature | F. noatunensis subsp. orientalis Toba04 | F. philomiragia subsp. philomiragia ATCC 25017 | F. tularensis subsp. novicida U112 | F. tularensis subsp. mediasiatica FSC 147 | F. tularensis subsp. tularensis SCHU S4 | F. tularensis subsp. holarctica OSU18 |
|---|---|---|---|---|---|---|
Lifestyle | Fish parasite | Free-living | Free-living | Mammalian parasite | Mammalian parasite | Mammalian parasite |
Genome size (bp) | 1,847,202 | 2,045,775 | 1,910,031 | 1,893,886 | 1,892,775 | 1,895,727 |
GC content(%) | 32 | 32 | 32 | 32 | 32 | 32 |
ORFs | 2289 | 1966 | 1781 | 1750 | 1852 | 1932 |
Protein coding genes | 1595 | 1911 | 1719 | 1406 | 1604 | 1555 |
Structural RNAs | 39 | 48 | 48 | 47 | 48 | 49 |
IS elements | 0 | 8 | 29 | 85 | 78 | 116 |
Pathogenicity Island | 1 | 1 | 1 | 2 | 2 | 2 |
Pseudogenes | 252 | 3 | 14 | 297 | 200 | 328 |
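As a worked illustration of how the table can be read, the number of pseudogenes relative to the number of intact protein-coding genes can be computed directly from the counts above. This is only a minimal sketch: the dictionary layout and the function name `pseudogene_ratio` are illustrative choices, and this ratio is just one of several possible ways to normalise pseudogene counts.

```python
# Counts copied from the table above: (protein-coding genes, pseudogenes).
GENOMES = {
    "F. noatunensis subsp. orientalis Toba04": (1595, 252),
    "F. philomiragia subsp. philomiragia ATCC 25017": (1911, 3),
    "F. tularensis subsp. novicida U112": (1719, 14),
    "F. tularensis subsp. mediasiatica FSC 147": (1406, 297),
    "F. tularensis subsp. tularensis SCHU S4": (1604, 200),
    "F. tularensis subsp. holarctica OSU18": (1555, 328),
}


def pseudogene_ratio(protein_coding, pseudogenes):
    """Return pseudogenes as a percentage of intact protein-coding genes."""
    return 100.0 * pseudogenes / protein_coding


ratios = {name: pseudogene_ratio(*counts) for name, counts in GENOMES.items()}
for name, value in ratios.items():
    print(f"{name}: {value:.1f}%")
```

On these counts the two free-living genomes stay below 1%, while the fish and mammalian parasites are all above 10%.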
Francisella became pathogenic long before it became pathogenic to mammals
A: Phylogenetic tree made using core genes present in Francisella noatunensis subsp. orientalis Toba04 and Francisella noatunensis subsp. noatunensis (from Atlantic cod; incomplete assembly). B: Whole genome phylogenetic tree made from available Francisella subspecies. Red represents highly virulent species and yellow less virulent ones; orange represents moderately virulent species and green very rarely virulent or non-virulent ones.
The lack of IS elements in the genome of F. noatunensis subsp. orientalis Toba04 suggests that its acquisition of a pathogenic lifestyle happened a long time ago. Moran and Plague found that pathogens or symbionts that have recently adopted an obligate host association have numerous IS elements, while ancient obligate host associations most often have no IS elements[13]. This suggests that the present Francisella strain pathogenic to tilapia results from an ancient event where free-living Francisella-like bacteria infected fish (possibly tilapia) and underwent a period of gene decay and genome rearrangement in which IS elements may have played a major role. As observed by Moran and Plague, bacteria that have ancient obligate host associations lack IS-elements, and the present data on F. noatunensis subsp. orientalis Toba04 suggest that it falls into this category. This is also supported by the results of our analyses of phylogeny and pseudogene content summarized below.
The two fish-infecting Francisella do not form a monophyletic group (Figure1A), which indicates that they may have become parasitic in two independent events. This further indicates that they have lost their IS-elements independently, a view supported by the fact that the two fish parasitic bacteria have different rearrangements in their genomes.
Francisella species have highly rearranged genomes
The following figure shows the rearrangement plot between Francisella philomiragia subsp. philomiragia ATCC25017 and Francisella noatunensis subsp. orientalis Toba04. The purple arrows show the location of 10 IS elements present in Francisella philomiragia subsp. philomiragia. The black arrows show the location of 4 rRNAs present in Francisella noatunensis subsp. orientalis Toba04.
The pseudogenes of F. noatunensis subsp. orientalis Toba04 are old
The graphs show the number of pseudogenes with different number of inactivating mutations (x-axis). Note that the Y-axis is logarithmic. A) This graph is generated using pseudogenes from Francisella tularensis subsp. tularensis SCHUs4. B) This graph is generated using pseudogenes from Francisella tularensis subsp. holarctica OSU18. C) This graph is generated using pseudogenes from Francisella noatunensis subsp. orientalis Toba04.
Old pseudogenes under neutral evolution in F. noatunensis subsp. orientalis Toba04
The graphs show the number of pseudogenes (Y-axis) having different numbers of substitutions (X-axis). Note that the Y-axis is logarithmic. The graphs refer to pseudogenes in Francisella tularensis subsp. tularensis SCHUs4 (A), Francisella tularensis subsp. holarctica OSU18 (B), and Francisella noatunensis subsp. orientalis Toba04 (C).
Comparative analysis of F. noatunensis subsp. orientalis Toba04 with other available Francisella genomes
The Venn diagram shows the number of genes (ortholog clusters) found to be shared between F. noatunensis subsp. orientalis Toba04, the metabolically incompetent genomes, and the metabolically competent genomes.
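The partition behind a three-way Venn diagram of this kind can be sketched with plain set operations. The following is a minimal illustration, assuming each genome group has already been reduced to a set of ortholog-cluster identifiers; the cluster names and the function `venn_regions` are hypothetical and are not taken from the pipeline used in this study.

```python
def venn_regions(toba04, incompetent, competent):
    """Split ortholog clusters into the seven regions of a three-set Venn diagram."""
    return {
        "all_three": toba04 & incompetent & competent,
        "toba04_and_incompetent_only": (toba04 & incompetent) - competent,
        "toba04_and_competent_only": (toba04 & competent) - incompetent,
        "incompetent_and_competent_only": (incompetent & competent) - toba04,
        "toba04_only": toba04 - incompetent - competent,
        "incompetent_only": incompetent - toba04 - competent,
        "competent_only": competent - toba04 - incompetent,
    }


# Toy cluster identifiers (made up for illustration).
toba04 = {"c1", "c2", "c3", "c7"}
incompetent = {"c1", "c2", "c4"}
competent = {"c1", "c3", "c4", "c5", "c6"}

regions = venn_regions(toba04, incompetent, competent)
counts = {name: len(members) for name, members in regions.items()}
print(counts)
```

Because the seven regions are disjoint, their sizes add up to the total number of distinct clusters, which is a convenient sanity check on any such partition.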
Intact unique genes present in free-living Francisella genomes
We identified 522 genes present only in the metabolically competent genomes. The assumption is that these genes were also present in the ancestors of the analyzed pathogenic strains but were lost after they became parasites. In agreement with previous reports, we find that many of the 522 genes are involved in metabolism, intracellular transport and amino acid biosynthesis[8–10]. In addition, we identified 17 membrane-associated proteins and 62 proteins annotated with a signal peptide (3 also annotated as membrane-associated), present only in the metabolically competent genomes, which might play an important role in immunity during host-parasite interactions. Genes like the capsule polysaccharide biosynthesis protein required for immunity to destroy foreign antigens are present only in the metabolically competent genomes. We used DAVID[18] to identify over-represented functional terms (including Gene Ontology terms) among the unique genes from the metabolically competent species (compared to their frequency in the complete Francisella philomiragia genome). Functional terms related to DNA (integration, replication, binding, and metabolism), amino acid metabolism, transporters, membrane-association and signal peptides are among the most significant terms (Additional file1: Figure S3). The present analysis narrows down the list of genes unique to free-living Francisella. Taken together, these results are consistent with earlier analyses and indicate that pathogenic Francisella strains have lost a substantial number of genes (512), and many of these have functions related to metabolism and DNA. This indicates that the parasitic Francisella species share many features in their adaptation to a parasitic lifestyle and that the fish pathogen may serve as a valuable model to study features of the mammalian parasitic Francisella.
Intact unique genes present in the fish parasite F. noatunensis subsp. orientalis Toba04
We found 305 genes unique to the F. noatunensis subsp. orientalis genome. Performing a BLASTp search against the non-redundant protein database, we were able to find significant matches for 260 genes which are not present in any of the other Francisella strains. Among these genes is a putative lipopolysaccharide biosynthesis membrane protein. Lipopolysaccharide biosynthesis is important for the virulence of bacteria causing serious diseases in humans and animals[19, 20]. The other genes represent transporters, transferases, purine biosynthesis, thiamine biosynthesis, oxidation reduction, catalytic activities and hypothetical proteins, and could be important for the pathogenicity of the strain.
Intact unique genes present in the human parasitic Francisella genomes
The human parasitic Francisella genomes contain 233 genes that are not present in the other Francisella genomes studied. We compared these 233 genes with all genes in the non-redundant protein database using BLAST to find homologs in other mammalian strains and found 162 unique genes that are shared between F. tularensis subsp. tularensis, F. tularensis subsp. holarctica, and F. tularensis subsp. mediasiatica (Additional file1: Table S3). There are 63 genes which are present only in the F. tularensis subsp. holarctica strain, all of them hypothetical proteins. In F. tularensis subsp. mediasiatica 11 genes are unique, including a protein involved in the type III restriction-modification system (FTM_0875), part of the defence mechanism against foreign DNA[21], whereas F. tularensis subsp. tularensis has 44 unique genes, all of them hypothetical proteins. The unique proteins present in this strain might be related to its highly virulent nature, although the functions of these proteins are unknown. The other 44 genes are shared between these three strains. Thirty of these genes have homologs in other bacteria and their presence in the Francisella genomes may be due to horizontal gene transfer. These genes represent functions like ATP binding, transferase activities and lipopolysaccharide biosynthesis, and 11 of them are hypothetical proteins.
The studied fish and human pathogenic genomes all contain genes unique among Francisella species. It seems unlikely that they have all been present in an ancestral Francisella genome and more likely that some genes have been horizontally transferred from other organisms, in most cases probably from environmental bacteria. Most of the genes in the horizontal gene transfer list, such as those for lipid biosynthesis, polysaccharide biosynthesis, a cold-shock DNA-binding domain-containing protein and membrane proteins, could have been useful for survival of the pathogen or be involved in virulence. The hypothetical proteins present in F. tularensis subsp. tularensis could be important determinants of the highly virulent nature of this species. Having a novel pathogenic Francisella genomic sequence available, we use the opportunity to analyze it in a comparative manner with the selected set of Francisella genomes (Table1), with a focus on systems that are believed to be important for virulence. The distant evolutionary relationship between the human pathogenic and the fish pathogenic strains can potentially give some insight into virulence, genome decay, and the essential core set of genes in Francisella.
Virulence mechanism
Francisella Pathogenicity Island and its role in virulence
The virulence mechanisms of most intracellular pathogens are activated by type III or type IV secretion systems[22]. However, in Francisella species the genes of the Pathogenicity Island are related to the type VI secretion system[9, 10]. The genomes of the human parasitic Francisella species possess two copies of the FPI, whereas F. noatunensis subsp. orientalis Toba04 and the free-living, metabolically competent species contain only one copy. The human parasites have likely acquired an extra copy of the FPI after diverging from the free-living relatives[11]. The FPI consists of 17 genes.
We compared the FPI regions between the human pathogenic, the free-living, and the fish pathogenic F. noatunensis subsp. orientalis Toba04 genome (Table1); we found that pdpD is missing only from the F. tularensis subsp. holarctica genome. Previous studies report that replacement or loss of pdpD reduced the expression of iglA[23]. It may be a factor for the less virulent nature of these genomes in human. We also found that the pdpC gene is missing in F. noatunensis subsp. orientalis Toba04 and F. philomiragia subsp. philomiragia, but present in all other genomes analyzed. It has been suggested that pdpC is essential for infection in mammalian cells[24]. The absence in F. noatunensis subsp. orientalis Toba04 suggests that pdpC is not essential for Francisella to infect fish. In agreement with the phylogenetic analyses presented earlier, most of the amino acid sequences of the proteins encoded in the FPI region of F. noatunensis subsp. orientalis are more similar to the corresponding genes in F. philomiragia subsp. philomiragia than to those in the human pathogenic genomes analyzed.
Oxidative stress response
Oxidative stress response plays a major role in virulence. The LysR family of regulatory proteins are regulators for oxidative stress response[10, 25]. In the F. noatunensis subsp. orientalis Toba04 genome there are 6 genes encoding for LysR protein family regulators (OOM_0025, OOM_0069, OOM_0378, OOM_0457, OOM_1159, OOM_1654). There are 8 LysR proteins present in F. tularensis subsp. tularensis SCHUs4 and 5 LysR proteins in F. tularensis subsp. holarctica OSU18. Interestingly LysR proteins are not found in F. tularensis subsp. mediasiatica strains and the absence of LysR proteins in F. tularensis subsp. mediasiatica could be a factor explaining their moderate virulence.
Secretion system
Type III secretion system-type proteins present in Francisella noatunensis subsp. orientalis
F. noatunensis subsp. orientalis Toba04 ID | Protein | Homologous T3SS effector’s NCBI GeneID |
|---|---|---|
OOM_1477 | phosphoribosylglycinamide synthetase | 28868095 |
OOM_1473 | phosphoribosylglycinamide synthetase | 28868095 |
OOM_1080 | glycogen branching enzyme | 8714177 |
OOM_1052 | methionine aminopeptidase | 12512890 |
OOM_1036 | haloacid dehalogenase-like hydrolase | 28868062 |
OOM_0422 | hypothetical protein | 34497721 |
OOM_0090 | pyrimidine reductase/pyrimidine deaminase | 28868061 |
Type IV pili proteins in bacteria are generally involved in motility. These proteins play a major role in bacterial virulence since they facilitate entrance of the bacteria into the host[29], and several have been reported to be present in Francisella species[9–11]. There are 6 type IV pili proteins present in the F. noatunensis subsp. orientalis Toba04 genome (OOM_0045, OOM_0401, OOM_0402, OOM_0611, OOM_1408, OOM_1374). We also note that the pilA gene, represented by three ORFs in F. tularensis subsp. tularensis (FTT_0888, FTT_0889, FTT_0890) and one ORF (FTN0413) in F. tularensis subsp. novicida and important for mediating virulence[30], is absent in F. noatunensis subsp. orientalis. A PilA gene family protein (OOM_1408) with a different amino acid sequence is present only in F. noatunensis subsp. orientalis Toba04 and F. philomiragia subsp. philomiragia.
Two-component regulatory system
Two-component regulatory systems are important for recognition of environmental changes and virulence in bacterial pathogens[31, 32]. The kdpD gene belonging to the two-component regulatory system[10] is present in F. tularensis subsp. tularensis, F. tularensis subsp. mediasiatica, F. tularensis subsp. novicida and F. philomiragia subsp. philomiragia but absent in F. noatunensis subsp. orientalis Toba04 and F. tularensis subsp. holarctica. In addition, we found that the two-component regulatory sensor histidine kinase gene (Fphi_1001) is present only in metabolically competent Francisella species and is absent from all the metabolically incompetent genomes, including F. noatunensis subsp. orientalis Toba04. This protein is important for the stimulus response when a virulent species enters the host organism.
Iron acquisition system
Iron acquisition is crucial as a virulence factor. Bacteria need iron inside the phagosomes for growth, and iron deficiency leads to abnormal cell development[25, 33]. The ferric uptake regulatory protein (FTT_0030) modulates the iron uptake system in F. tularensis subsp. tularensis[9], and no other gene regulating iron content has been found in this genome. Two proteins, IucA/IucC (OOM_0522) and ferrous iron transporter (OOM_0685), involved in siderophore synthesis and iron transport, are present only in F. noatunensis subsp. orientalis Toba04 and in the metabolically competent genomes. Siderophore synthesis is one of the major mechanisms for iron acquisition in fish. The absence of genes required for siderophore synthesis in the F. tularensis subspecies points to a different host adaptation for the human virulent Francisella subspecies.
Comparison of metabolic pathways from F. philomiragia subsp. philomiragia ATCC 25017, F. noatunensis subsp. orientalis Toba04 and F. tularensis subsp. tularensis SCHUs4
The important amino acid pathways required for the growth of Francisella subspecies are given
Amino acid biosynthesis pathways required for growth | F. tularensis subsp. tularensis SCHUs4 | F. philomiragia subsp. philomiragia ATCC 25017 | F. noatunensis subsp. orientalis Toba04 |
|---|---|---|---|
Asparagine | 1/1 | 1/1 | no |
Cysteine | 2/2 | 2/2 | 2/2 |
Serine | 2/3 | 2/3 | 2/3 |
Threonine | 1/2 | 2/2 | 2/2 |
Methionine | no | 1/4 | 1/4 |
Tyrosine | 1/3 | 2/3 | 1/3 |
Lysine | 3/10 | 7/10 | 4/10 |
Proline | 2/4 | 4/4 | 3/4 |
Arginine | no | 4/4 | 4/4 |
Histidine | no | 7/10 | no |
Valine | 3/4 | 4/4 | 3/4 |
Iso-leucine | 4/5 | 4/5 | 3/5 |
Leucine | 5/9 | 9/9 | 6/9 |
There are 14 amino acid pathways essential for the growth of the Francisella tularensis subspecies (Asp, Cys, Ser, Thr, Met, Tyr, Lys, Pro, Arg, His, Val, Ile, and Leu)[34]. In addition, the F. tularensis subsp. tularensis SCHUs4 pathways for sulphate assimilation, threonine biosynthesis, valine biosynthesis and isoleucine biosynthesis are incomplete, together with pathways for methionine, arginine, histidine, lysine and tyrosine biosynthesis in the same subspecies[9]. However, our computational prediction of pathways in F. tularensis subsp. tularensis suggests that it has pathways for synthesis of all amino acids except Arg and His. For F. noatunensis subsp. orientalis Toba04 we were not able to find the pathways for His, Asp and Cys.
Enzymes required for the His biosynthesis pathway are found only in F. philomiragia subsp. philomiragia. In F. noatunensis subsp. orientalis, the genes required for His biosynthesis are present as pseudogenes. The pathway is also absent in F. tularensis subsp. tularensis SCHUs4. It is of interest to note the absence of the pathway for Asp synthesis. Asparagine is an essential amino acid in the fish-specific F. noatunensis subsp. orientalis Toba04, suggesting that this amino acid may be taken from the host. We were able to find complete pathways for Asp, Cys, Thr, Pro, Arg, Val and Leu in F. philomiragia subsp. philomiragia, indicating that pathways for synthesizing all these amino acids were present in the ancestral Francisella genome and lost in the metabolically incompetent genomes. The pathway for sulfate assimilation is absent in F. tularensis subsp. tularensis SCHUs4 and present in F. philomiragia subsp. philomiragia (Additional file1: Figure S4).
Conclusions
We have presented the whole genome characterisation of F. noatunensis subsp. orientalis Toba04 and an extensive comparative analysis against other Francisella subspecies. All the Francisella strains that are non-virulent to humans possess one copy of the Pathogenicity Island and a very low number of IS elements. F. noatunensis subsp. orientalis Toba04, which is most closely related to F. philomiragia subsp. philomiragia, has no IS elements in its genome. IS elements are important for generating the genomic rearrangements typically observed between Francisella species. Since the F. noatunensis subsp. orientalis Toba04 genome is significantly rearranged compared to other Francisella species, we propose that IS elements were once present but have since been lost. In addition, we identified 252 pseudogenes in F. noatunensis subsp. orientalis; pseudogenes are typically created as a result of genomic rearrangements. The analysis of the pseudogenes from Francisella species demonstrated that the pseudogenes of F. noatunensis subsp. orientalis are old, as many carry more than one inactivating mutation. The whole genome phylogenetic analysis revealed two main branches that separate the mammalian and fish parasitic Francisella species.
Although the pathogenic Francisella species reside on different phylogenetic branches, they share a set of common features such as a large number of pseudogenes and several interrupted metabolic pathways resulting in metabolic incompetence. The metabolic incompetence is likely an adaptation to an intracellular lifestyle and points to similar evolutionary constraints from the different vertebrate hosts.
Our work provides insight into the evolution of Francisella subspecies, and our comparative analysis and results will help to understand the pathogenicity mechanisms of each Francisella subspecies. We have also listed important genes influencing the virulence mechanisms in each pathogenic strain so that researchers working on Francisella can study those genes to further understand virulence factors. In addition, we found that fish and human pathogens share many features, and it may be possible to use the fish parasites as models to enhance our knowledge about host-parasite interactions for this important group of pathogens.
Methods
Sequencing and assembly
The F. noatunensis subsp. orientalis Toba04 strain was sequenced using 454-pyrosequencing[35], generating 263,717 reads consisting of 56,522,682 bases. These were assembled using Newbler v.2[35] (http://454.com/products-solutions/analysis-tools/gs-de-novo-assembler.asp), leading to 21 contigs with a total length of 1,848,209 bases and an N50 of 215,480 base pairs. The gaps were closed by first analysing the contig graph, which presents the connections between contigs, based on the repeat information present in the reads. The edges between contigs with coverage less than 10 were removed. Further comparative analysis (using BLAST) with fully sequenced strains of F. tularensis subsp. tularensis SCHUs4 was performed, the edges between contigs which were inconsistent across these strains were removed, and incremental assembly using the runViewer program present in Newbler was performed. This led to 8 contigs with a total length of 1,847,034 and an N50 of 429,132. Later a series of PCRs, suggested on the basis of the contig graph, was performed to join these contigs and check the correctness of the assembly. Several sets of specific primers were designed at the ends of each contig.
GoTaq PCR enzyme (Promega) was used in all amplifications. Primers were combined two by two in individual PCR reactions so that all possible combinations of primers were tested (an all against all combination approach). Genomic DNA identical to the DNA used for sequencing was used as template. Fragments appearing as single bands in agarose gel electrophoresis were either purified using ExoSAP-IT (GE Healthcare) and sequenced on both strands using the PCR primers and the BigDye chemistry (Applied Biosystems), or the PCR fragments were cloned into a pCR4_TOPO vector following the supplied instructions (Invitrogen) and several random clones were sequenced using the vector primers. All the sequences were used in the assembly. To check the correctness of the assembly, 10 pairs of primers were designed in one of the contigs and PCR reactions were performed. All reactions gave the expected result. This led to 19 primers, out of which 12 gave products (5 were one sided) and 7 failed. This information was fed back into the assembler as fake paired ends and another series of incremental assembly was performed, leading to 5 contigs with 1,715,028 bases and an N50 of 1,033,009. Another series of PCRs based on these results led to 63 primer products and information about repeats and orientation. A straightforward fake paired-end presentation to the assembler broke up the assembly because short matches of products in various locations confused the assembler. The reassembly was therefore done incrementally in a fashion that maintained consistency between the PCR results and the read-based assembly. This led to 1 big scaffold (length 1,857,341) and 3 small scaffolds (total length 8940). The pink line in Figure5 is the scaffold linking while the plain black line is the contig linking.
Since not all contigs have paired-end connection information, some do not show in the scaffold graph (including contigs of less than 500 bases); these gaps can also be filled manually (by replacing the gap with contigs based on the graph). For example, between 14 and 15 we can put in 3, 2, and 5, but Newbler did not have enough information to do it automatically (Additional file1: Figure S2).
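The N50 values quoted for the intermediate assemblies follow a standard definition: the contig length at which the cumulative length of contigs, taken from longest to shortest, first reaches half of the total assembly length. A minimal sketch of that calculation is given here; the function name `n50` and the toy contig lengths are illustrative and do not correspond to the Toba04 contigs.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length of the
    contig at which the running total first reaches half of the summed length.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contig lengths given")


# Toy assembly: total length 300, so half is 150.
toy_contigs = [100, 80, 50, 30, 20, 10, 10]
print(n50(toy_contigs))  # 100 + 80 = 180 >= 150, so the N50 is 80
```

Because the statistic depends only on the sorted lengths, joining contigs during gap closure raises the N50 even when the total assembly length barely changes.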
Annotation
We used the prokaryotic annotation pipeline from TIGR to annotate the genome: Glimmer 3[36] was used to predict the genes in the genome. NcRNAdb[37] and RNammer[38] were used to predict 23S and 16S rRNA genes. tRNAscan-SE[39] was used to predict tRNAs present in the genome. We used TMHMM[40] to predict transmembrane helices and SignalP[41] to predict signal peptides. The predicted ORFs were compared against NCBI’s non-redundant database using BlastX. Protein coding genes from the predicted results were curated manually using Artemis[15]. All the predicted proteins were searched against NCBI’s COG database (Clusters of Orthologous Groups of proteins based on phylogenetic classification of proteins encoded in complete genomes) to find the protein family. Cognitor was used to predict the COGs for each protein[42]. EC numbers were assigned to the proteins using the BRENDA database[43]. Interpro-Scan was used to add domain-based annotation of the proteins[44]. CGview was used to make the circular genome plot[45].
IS elements and genome rearrangement plot
The ISfinder[46] web server was used to find IS elements. We used BLASTx[47] to compare the genome against the IS elements database, and the results were checked manually. Nucmer in the MUMmer 3.0[48] package was used to prepare the comparison plot between F. noatunensis subsp. orientalis and F. philomiragia subsp. philomiragia. To plot the pseudogenes of F. noatunensis subsp. orientalis against the F. philomiragia subsp. philomiragia genome, we compared the pseudogenes against the F. philomiragia subsp. philomiragia genome and extracted the gene coordinates from the BLASTp results using an in-house Perl program. We used those coordinates to map the pseudogenes in the comparison plot.
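The coordinate-extraction step can be illustrated with a short sketch. The authors used an in-house Perl program that is not reproduced here; the Python below is only an assumed equivalent that reads BLAST tabular output (the default twelve columns of `-outfmt 6`) and keeps, for each query, the subject coordinates of its highest-scoring hit. The function name and the example rows are hypothetical.

```python
import csv
import io

# Column order of BLAST tabular output (-outfmt 6) with default fields.
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hit_coordinates(handle):
    """Map each query ID to (subject ID, start, end) of its best hit.

    "Best" means highest bit score. Start and end are sorted so that
    hits on the reverse strand are reported with start <= end.
    """
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(FIELDS, row))
        score = float(hit["bitscore"])
        query = hit["qseqid"]
        if query not in best or score > best[query][0]:
            start, end = sorted((int(hit["sstart"]), int(hit["send"])))
            best[query] = (score, hit["sseqid"], start, end)
    return {query: value[1:] for query, value in best.items()}


# Hypothetical hits for one pseudogene; the IDs and numbers are made up.
example = io.StringIO(
    "pg1\tchr\t90.0\t100\t10\t0\t1\t100\t5000\t4701\t1e-50\t180\n"
    "pg1\tchr\t70.0\t50\t15\t0\t1\t50\t900\t1049\t1e-10\t60\n"
)
print(best_hit_coordinates(example))
```

The reported coordinates are whatever the search itself produces, so they are genomic positions only when the subject sequences are genomic.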
Mutation count
To count the number of mutations in the pseudogenes of the Francisella subspecies, we compared all the pseudogenes present in the genomes against NCBI’s non-redundant database. From the BLAST results we calculated the possible inactivating mutations, including insertions, substitutions, premature stop codons and point mutations, using a Perl program written in-house. The log-normal graphs were made using MS Excel.
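As an illustration of this kind of count, the sketch below tallies candidate inactivating events from one pairwise protein alignment. It is not the authors' Perl program, whose exact rules are not described; the function name, the event definitions (a run of gaps counts as one insertion or deletion, a `*` in the query opposite a residue counts as a premature stop) and the example alignment are all assumptions made for illustration.

```python
def count_candidate_mutations(query_aln, subject_aln):
    """Count candidate inactivating events in a pairwise alignment.

    Both strings must be the same length, with '-' for gaps and '*'
    for a translated stop codon. The query is the pseudogene.
    """
    if len(query_aln) != len(subject_aln):
        raise ValueError("aligned strings must have equal length")
    counts = {"premature_stop": 0, "insertion": 0,
              "deletion": 0, "substitution": 0}
    previous_gap_state = None
    for q, s in zip(query_aln, subject_aln):
        if q == "-" and s == "-":
            continue  # uninformative column
        if s == "-":
            gap_state = "insertion"   # extra residues in the pseudogene
        elif q == "-":
            gap_state = "deletion"    # residues missing from the pseudogene
        else:
            gap_state = None
            if q == "*" and s != "*":
                counts["premature_stop"] += 1
            elif q != s:
                counts["substitution"] += 1
        # A contiguous run of gaps is counted as a single event.
        if gap_state is not None and gap_state != previous_gap_state:
            counts[gap_state] += 1
        previous_gap_state = gap_state
    return counts


# Hypothetical alignment of a pseudogene translation against its homolog.
print(count_candidate_mutations("MK*LV--AT", "MKQL-GGAS"))
```

Counting a gap run once rather than per column is a design choice made here so that a single multi-residue indel is not inflated into several events.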
Maximum likelihood distance
We calculated the maximum likelihood distance between each pseudogene present in the Francisella species and its top homologous hit from the BLASTp results. We used the biodist module in the Bio++ package to calculate the maximum likelihood distance values using the L95 nucleotide substitution model.
Phylogenetic tree
Mauve 2.3.1[49] was used to prepare the whole-genome alignment of all the Francisella species. The same software was also used to show the rearrangements between F. noatunensis subsp. orientalis and F. philomiragia subsp. philomiragia. The locations of rRNAs and IS elements were marked manually. The genome alignment was loaded into MEGA4[50] for editing, and the Neighbor-joining method was subsequently used to predict the tree, with 1000 replicates for the bootstrap value calculation. We used the same procedure for the phylogenetic tree predicted using core gene sets. The incomplete assembly of F. noatunensis subsp. noatunensis was compared against the protein sequences of F. noatunensis subsp. orientalis to find the core set of genes commonly present in Francisella species.
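The bootstrap procedure itself was carried out inside MEGA4. To show what one replicate involves, the following Python sketch resamples alignment columns with replacement; a tree would then be built from each resampled alignment. The function name and the toy alignment are assumptions for illustration, and tree building is deliberately left out.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Return one bootstrap replicate of a multiple sequence alignment.

    `alignment` maps a taxon name to its aligned sequence. Columns are
    drawn with replacement, and the same columns are used for every taxon.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[name]) != length for name in names):
        raise ValueError("all aligned sequences must have the same length")
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][c] for c in columns)
            for name in names}


# Toy alignment with made-up sequences.
toy = {"taxonA": "ACGTAC", "taxonB": "ACGAAC", "taxonC": "TCGAAT"}
rng = random.Random(1)  # fixed seed so the sketch is repeatable
replicates = [bootstrap_replicate(toy, rng) for _ in range(1000)]
print(len(replicates))  # 1000
```

The bootstrap value of a branch is then the fraction of replicate trees in which that branch appears.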
Metabolic pathways
Pathway Tools software[51] was used to predict the metabolic pathways of F. noatunensis subsp. orientalis. We also predicted the metabolic pathways of F. philomiragia subsp. philomiragia, to serve as a reference for F. noatunensis subsp. orientalis and for comparative analysis. The PathoLogic module was used to predict the pathways for both genomes. The function Overviews->highlight->species comparison was used to compare pathways between the genomes.
Genome comparison
Protein sequences from F. tularensis subsp. tularensis SCHU S4, F. tularensis subsp. holarctica OSU18, F. tularensis subsp. mediasiatica FSC 147, F. tularensis subsp. novicida U112, F. philomiragia subsp. philomiragia ATCC 25017 and F. noatunensis subsp. orientalis were extracted from GenBank files. We used BLASTp[47] to compare the protein sequences against each other to identify genes that are unique and genes that are shared between the genomes. Two values were calculated to decide whether two proteins from different species are the same: (i) The bit-score percentage relative to the top homologous hit. The bit scores of subsequent hits from other Francisella species were expressed relative to the bit score of the top hit, and a hit had to reach >=65% of the top hit’s bit score for the two proteins to be classified as the same. (ii) The alignment-length percentage between query and subject, calculated from the length of the subject protein and the length of the alignment. It had to be >=75% for the two proteins to be classified as the same. We used the same method to cluster the proteins into ortholog groups.
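The two thresholds can be written down as a small decision function. This is a sketch under stated assumptions: the bit-score percentage is taken as the hit's bit score divided by the top hit's bit score, and the alignment-length percentage as the alignment length divided by the subject length. The class, function and constant names are hypothetical, and the example numbers are made up.

```python
from dataclasses import dataclass

MIN_BIT_SCORE_PCT = 65.0   # threshold (i) from the text
MIN_ALIGNMENT_PCT = 75.0   # threshold (ii) from the text


@dataclass
class Hit:
    subject_id: str
    bit_score: float
    alignment_length: int
    subject_length: int


def is_same_protein(hit, top_bit_score):
    """Apply both criteria to one BLASTp hit.

    `top_bit_score` is the bit score of the query's top homologous hit.
    """
    bit_score_pct = 100.0 * hit.bit_score / top_bit_score
    alignment_pct = 100.0 * hit.alignment_length / hit.subject_length
    return (bit_score_pct >= MIN_BIT_SCORE_PCT
            and alignment_pct >= MIN_ALIGNMENT_PCT)


# Made-up hits for one query whose top hit scored 200 bits.
hits = [
    Hit("speciesB_0001", 130.0, 80, 100),   # 65% score, 80% length
    Hit("speciesC_0417", 120.0, 80, 100),   # 60% score: fails (i)
    Hit("speciesD_0093", 200.0, 70, 100),   # 70% length: fails (ii)
]
print([is_same_protein(h, 200.0) for h in hits])  # [True, False, False]
```

Requiring both criteria guards against calling two proteins the same on the basis of a strong but short local match, for example a shared domain.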
Declarations
Acknowledgements
We thank MSD Animal Health for financially supporting the genome sequencing.
References
- Francis E: Tularemia. I. The occurrence of tularemia in nature as a disease of man. Public Health Rep. 1921, 36: 1731-1753. 10.2307/4576069.
- Oyston PC, Sjostedt A, Titball RW: Tularaemia: bioterrorism defence renews interest in Francisella tularensis. Nat Rev Microbiol. 2004, 2: 967-978. 10.1038/nrmicro1045.
- Zerihun MA, Feist SW, Bucke D, Olsen AB, Tandstad NM, Colquhoun DJ: Francisella noatunensis subsp. noatunensis is the aetiological agent of visceral granulomatosis in wild Atlantic cod Gadus morhua. Dis Aquat Org. 2011, 95: 65-71. 10.3354/dao02341.
- Sjödin A, Svensson K, Ohrman C, Ahlinder J, Lindgren P, Duodo S, Hnath J, Burans JP, Johansson A, Colquhoun DJ, Larsson P, Forsman M: Genome characterisation of the genus Francisella reveals insight into similar evolutionary paths in pathogens of mammals and fish. BMC Genomics. 2012, 13 (1): 268. 10.1186/1471-2164-13-268.
- Mikalsen J, Olsen AB, Tengs T, Colquhoun DJ: Francisella philomiragia subsp. noatunensis subsp. nov., isolated from farmed Atlantic cod (Gadus morhua L.). Int J Syst Evol Microbiol. 2007, 57: 1960-1965. 10.1099/ijs.0.64765-0.
- Mikalsen J, Colquhoun DJ: Francisella asiatica sp. nov. isolated from farmed tilapia (Oreochromis sp.) and elevation of Francisella philomiragia subsp. noatunensis to species rank as Francisella noatunensis comb. nov., sp. nov. Int J Syst Evol Microbiol. 2009, 25: [Epub ahead of print]
- Santic M, Molmeret M, Klose KE, Abu Kwaik Y: Francisella tularensis travels a novel, twisted road within macrophages. Trends Microbiology. 2006, 14: 37-44. 10.1016/j.tim.2005.11.008.
- Rohmer L, Fong C, Abmayr S, Wasnick M, Larson Freeman TJ, Radey M, Guina T, Svensson K, Hayden HS, Jacobs M, Gallagher LA, Manoil C, Ernst RK, Drees B, Buckley D, Haugen E, Bovee D, Zhou Y, Chang J, Levy R, Lim R, Gillett W, Guenthener D, Kang A, Shaffer SA, Taylor G, Chen J, Gallis B, D'Argenio DA, Forsman M, et al: Comparison of Francisella tularensis genomes reveals evolutionary events associated with the emergence of human pathogenic strains. Genome Biol. 2007, 8: R102. 10.1186/gb-2007-8-6-r102.
- Larsson P, Oyston PC, Chain P, Chu MC, Duffield M, Fuxelius HH, Garcia E, Hälltorp G, Johansson D, Isherwood KE, Karp PD, Larsson E, Liu Y, Michell S, Prior J, Prior R, Malfatti S, Sjöstedt A, Svensson K, Thompson N, Vergez L, Wagg JK, Wren BW, Lindler LE, Andersson SG, Forsman M, Titball RW: The complete genome sequence of Francisella tularensis, the causative agent of tularemia. Nature Genetics. 2005, 37 (2): 153-159. 10.1038/ng1499.
- Champion MD, Zeng Q, Nix EB, Nano FE, Keim P, Kodira CD, Borowsky M, Young S, Koehrsen M, Engels R, Pearson M, Howarth C, Larson L, White J, Alvarado L, Forsman M, Bearden SW, Sjöstedt A, Titball R, Michell SL, Birren B, Galagan J: Comparative genomic characterization of Francisella tularensis strains belonging to low and high virulence subspecies. PLoS Pathogens. 2009, 5 (5): 1-19.
- Larsson P, Elfsmark D, Svensson K, Wikström P, Forsman M, Brettin T, Keim P, Johansson A: Molecular evolutionary consequences of niche restriction in Francisella tularensis, a facultative intracellular pathogen. PLoS Pathogens. 2009, 5 (6): e1000472. 10.1371/journal.ppat.1000472.
- Birkbeck TH, Feist SW, Verner-Jeffreys DW: Francisella infections in fish and shellfish. J Fish Dis. 2011, 34 (3): 173-187. 10.1111/j.1365-2761.2010.01226.x.
- Moran NA, Plague GR: Genomic changes following host restriction in bacteria. Curr Opin Genet Dev. 2004, 14 (6): 627-633. 10.1016/j.gde.2004.09.003.
- Petrosino JF, Xiang Q, Karpathy SE, Jiang H, Yerrapragada S, Liu Y, Gioia J, Hemphill L, Gonzalez A, Raghavan TM, Uzman A, Fox GE, Highlander S, Reichard M, Morton RJ, Clinkenbeard KD, Weinstock GM: Chromosome rearrangement and diversification of Francisella tularensis revealed by the type B (OSU18) genome sequence. J Bacteriol. 2006, 188 (19): 6977-6985. 10.1128/JB.00506-06.
- Carver T, Berriman M, Tivey A, Patel C, Böhme U, Barrell BG, Parkhill J, Rajandream MA: Artemis and ACT: viewing, annotating and comparing sequences stored in a relational database. Bioinformatics. 2008, 24 (23): 2672-2676. 10.1093/bioinformatics/btn529.
- Andersson SG, Zomorodipour A, Andersson JO, Sicheritz-Pontén T, Alsmark UC, Podowski RM, Näslund AK, Eriksson AS, Winkler HH, Kurland CG: The genome sequence of Rickettsia prowazekii and the origin of mitochondria. Nature. 1998, 396: 133-140. 10.1038/24094.
- Kuo CH, Ochman H: The extinction dynamics of bacterial pseudogenes. PLoS Genet. 2010, 6 (8): e1001050. 10.1371/journal.pgen.1001050.
- da Huang W, Sherman BT, Lempicki RA: Systematic and integrative analysis of large gene lists using DAVID Bioinformatics Resources. Nature Protoc. 2009, 4 (1): 44-57.
- Jun L, Woo NYS: Pathogenecity of Vibrios in Fish. Journal of Ocean University of Qingdao. 2003, 2 (2): 117-128.
- Kanistanon D, Hajjar AM, Pelletier MR, Gallagher LA, Kalhorn T, Shaffer SA, Goodlett DR, Rohmer L, Brittnacher MJ, Skerrett SJ, Ernst RK: A Francisella mutant in lipid A carbohydrate modification elicits protective immunity. PLoS Pathogens. 2008, 4 (2): e24. 10.1371/journal.ppat.0040024.
- Gallagher LA, McKevitt M, Ramage ER, Manoil C: Genetic dissection of the Francisella novicida restriction barrier. J Bacteriol. 2008, 190: 7830-7837. 10.1128/JB.01188-08.
- Santic M, Al-Khodor S, Abu Kwaik Y: Cell biology and molecular ecology of Francisella tularensis. Cellular Microbiology. 2010, 12: 129-139. 10.1111/j.1462-5822.2009.01400.x.
- Nano FE, Zhang N, Cowley SC, Klose KE, Cheung KK, Roberts MJ, Ludu JS, Letendre GW, Meierovics AI, Stephens G, Elkins KL: A Francisella tularensis pathogenicity island required for intramacrophage growth. J Bacteriol. 2004, 186: 6430-6436. 10.1128/JB.186.19.6430-6436.2004.
- Hazlett KR, Cirillo KA: Environmental adaptation of Francisella tularensis. Microbes Infect. 2009, 11: 828-834. 10.1016/j.micinf.2009.06.001.
- Groisman EA: Principles of Bacterial Pathogenesis. 2001, San Diego, California: Academic Press
- Winstanley C, Hart CA: Type III secretion systems and pathogenicity islands. J Med Microbiol. 2001, 50: 116-126.
- Tobe T, Beatson SA, Taniguchi H, Abe H, Bailey CM, Fivian A, Younis R, Matthews S, Marches O, Frankel G, Hayashi T, Pallen MJ: An extensive repertoire of type III secretion effectors in Escherichia coli O157 and the role of lambdoid phages in their dissemination. Proc Natl Acad Sci USA. 2006, 103: 14941-14946. 10.1073/pnas.0604891103.
- Forsberg A, Guina T: Type II secretion and type IV pili of Francisella. Ann N Y Acad Sci. 2007, 1105: 187-201. 10.1196/annals.1409.016.
- Chakraborty S, Monfett M, Maier TM, Benach JL, Frank DW, Thanassi DG: Type IV pili in Francisella tularensis: roles of pilF and pilT in fiber assembly, host cell adherence and virulence. Infection and Immunity. 2008, 76 (7): 2852-2861. 10.1128/IAI.01726-07.
- Forsberg A, Guina T: Type II secretion and type IV pili of Francisella. Ann N Y Acad Sci. 2007, 1105: 187-201. 10.1196/annals.1409.016.
- Parish T, Smith DA, Kendall S, Casali N, Bancroft GJ, Stoker NG: Deletion of two-component regulatory systems increases the virulence of Mycobacterium tuberculosis. Infection and Immunity. 2003, 71: 1134-1140.
- Flamez C, Ricard I, Arafah S, Simonet M, Marceau M: Two-component system regulon plasticity in bacteria: a concept emerging from phenotypic analysis of Yersinia pseudotuberculosis response regulator mutants. Adv Exp Med Biol. 2007, 603: 145-155. 10.1007/978-0-387-72124-8_12.
- Nano FE, Schmerk C: The Francisella pathogenicity island. Ann N Y Acad Sci. 2007, 1105: 122-137. 10.1196/annals.1409.000.
- Traub A, Mager J, Grossowicz N: Studies on the nutrition of Pasteurella tularensis. J Bacteriol. 1955, 70: 60-69.
- Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, Bemben LA, Berka J, Braverman MS, Chen YJ, Chen Z, Dewell SB, Du L, Fierro JM, Gomes XV, Godwin BC, He W, Helgesen S, Ho CH, Irzyk GP, Jando SC, Alenquer ML, Jarvie TP, Jirage KB, Kim JB, Knight JR, Lanza JR, Leamon JH, Lefkowitz SM, Lei M, Li J, et al: Genome sequencing in microfabricated high-density picolitre reactors. Nature. 2005, 437: 376-380.
- Delcher AL, Bratke KA, Powers EC, Salzberg SL: Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics. 2007, 23 (6): 673-679. 10.1093/bioinformatics/btm009.
- Szymanski M, Erdmann VA, Barciszewski J: Noncoding RNAs database (ncRNAdb). Nucleic Acids Res. 2007, 35: D162-D164. 10.1093/nar/gkl994.
- Lagesen K, Hallin P, Rodland EA, Staerfeldt HH, Rognes T, Ussery DW: RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Res. 2007, 35 (9): 3100-3108. 10.1093/nar/gkm160.
- Lowe TM, Eddy SR: tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 1997, 25 (5): 955-964.
- Krogh A, Larsson B, von Heijne G, Sonnhammer EL: Predicting transmembrane protein topology with a hidden Markov model: Application to complete genomes. J Mol Biol. 2001, 305 (3): 567-580. 10.1006/jmbi.2000.4315.
- Emanuelsson O, Brunak S, von Heijne G, Nielsen H: Locating proteins in the cell using TargetP, SignalP, and related tools. Nature Protocols. 2007, 2: 953-971. 10.1038/nprot.2007.131.
- Tatusov RL, Natale DA, Garkavtsev IV, Tatusova TA, Shankavaram UT, Rao BS, Kiryutin B, Galperin MY, Fedorova ND, Koonin EV: The COG database: new developments in phylogenetic classification of proteins from complete genomes. Nucleic Acids Res. 2001, 29 (1): 22-28. 10.1093/nar/29.1.22.
- Chang A, Scheer M, Grote A, Schomburg I, Schomburg D: BRENDA, AMENDA and FRENDA the enzyme information system: new content and tools in 2009. Nucleic Acids Res. 2009, 37: D588-D592. 10.1093/nar/gkn820.
- Zdobnov EM, Apweiler R: InterProScan - an integration platform for the signature-recognition methods in InterPro. Bioinformatics. 2001, 17: 847-848.
- Stothard P, Wishart DS: Circular genome visualization and exploration using CGView. Bioinformatics. 2005, 21: 537-539. 10.1093/bioinformatics/bti054.
- Siguier P, Perochon J, Lestrade L, Mahillon J, Chandler M: ISfinder: the reference centre for bacterial insertion sequences. Nucleic Acids Res. 2006, 34 (Database issue): D32-D36.
- Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25: 3389-3402. 10.1093/nar/25.17.3389.
- Kurtz S, Phillippy A, Delcher AL, Smoot M, Shumway M, Antonescu C, Salzberg SL: Versatile and open software for comparing large genomes. Genome Biology. 2004, 5: R12. 10.1186/gb-2004-5-2-r12.
- Darling AC, Mau B, Blattner FR, Perna NT: Mauve: multiple alignment of conserved genomic sequence with rearrangements. Genome Res. 2004, 14 (7): 1394-1403. 10.1101/gr.2289704.
- Tamura K, Dudley J, Nei M, Kumar S: MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Mol Biol Evol. 2007, 24 (8): 1596-1599. 10.1093/molbev/msm092.
- Karp PD, Paley S, Romero P: The pathway tools software. Bioinformatics. 2002, 18: 225-232. 10.1093/bioinformatics/18.suppl_1.S225.
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.




