
Pseudogene accumulation in the evolutionary histories of Salmonella enterica serovars Paratyphi A and Typhi



Of the > 2000 serovars of Salmonella enterica subspecies I, most cause self-limiting gastrointestinal disease in a wide range of mammalian hosts. However, S. enterica serovars Typhi and Paratyphi A are restricted to the human host and cause the similar systemic diseases typhoid and paratyphoid fever. Genome sequence similarity between Paratyphi A and Typhi has been attributed to convergent evolution via relatively recent recombination of a quarter of their genomes. The accumulation of pseudogenes is a key feature of these and other host-adapted pathogens, and overlapping pseudogene complements are evident in Paratyphi A and Typhi.


We report the 4.5 Mbp genome of a clinical isolate of Paratyphi A, strain AKU_12601, completely sequenced using capillary techniques and subsequently checked using Illumina/Solexa resequencing. Comparison with the published genome of Paratyphi A ATCC9150 revealed the two are collinear and highly similar, with 188 single nucleotide polymorphisms and 39 insertions/deletions. A comparative analysis of pseudogene complements of these and two finished Typhi genomes (CT18, Ty2) identified several pseudogenes that had been overlooked in prior genome annotations of one or both serovars, and identified 66 pseudogenes shared between serovars. By determining whether each shared and serovar-specific pseudogene had been recombined between Paratyphi A and Typhi, we found evidence that most pseudogenes have accumulated after the recombination between serovars. We also divided pseudogenes into relative-time groups: ancestral pseudogenes inherited from a common ancestor, pseudogenes recombined between serovars which likely arose between initial divergence and later recombination, serovar-specific pseudogenes arising after recombination but prior to the last evolutionary bottlenecks in each population, and more recent strain-specific pseudogenes.


Recombination and pseudogene-formation have been important mechanisms of genetic convergence between Paratyphi A and Typhi, with most pseudogenes arising independently after extensive recombination between the serovars. The recombination events, along with divergence of and within each serovar, provide a relative time scale for pseudogene-forming mutations, affording rare insights into the progression of functional gene loss associated with host adaptation in Salmonella.


Salmonella enterica serovars Typhi and Paratyphi A (Typhi, Paratyphi A) are human-restricted bacterial pathogens that cause related systemic diseases, known as typhoid, paratyphoid or enteric fever [1]. Together, these pathogens infect more than 25 million people annually worldwide, resulting in > 200,000 deaths [2]. Historically, Paratyphi A was responsible for less than 20% of these infections [2]; however, Paratyphi A infection rates have been rising, particularly in South East Asia where this serovar is now responsible for 30–50% of enteric fever cases [3–6]. This increase has been associated with rises in antibiotic resistance among paratyphoid infections [3, 7, 8]. It may also be associated with vaccination against Typhi, which unfortunately provides little cross-protection against Paratyphi A [9, 10]. Finished genomic sequence is currently available for two Typhi isolates (recent clinical isolate CT18 and laboratory strain Ty2) and one Paratyphi A isolate (laboratory strain ATCC9150) [11–13].

Typhi and Paratyphi A are unusual among S. enterica, as most serovars infect a broad range of host species and cause self-limiting gastroenteritis, while Typhi and Paratyphi A infect only humans and cause systemic disease [14]. The basis for their unusual shared phenotype is unclear. Whole-genome sequence comparisons suggest that the Paratyphi A and Typhi chromosomes are much more closely related at the DNA level than other S. enterica serovars. Furthermore, the genomes of both organisms harbour a large number of pseudogenes (> 4% of coding sequences in each genome) [11–13] compared to host-generalist relatives such as S. enterica serovar Typhimurium (0.9%) or E. coli K12 (0.7%).

A recent study showed that the apparent similarity between Paratyphi A and Typhi genome sequences is due to low nucleotide divergence (mean 0.18%) across a quarter of the genome, while the rest of the genome sequences are as divergent as any other pair of S. enterica serovars (mean 1.2%) [15]. The study used model-based approaches to demonstrate that this is due to relatively recent convergence via recombination between 23% of the Paratyphi A and Typhi genomes, whose initial divergence occurred around the same time as that of other S. enterica serovars. It is possible that this extensive recombination was responsible for the convergence of Paratyphi A and Typhi on a human-restricted lifestyle, however it is also plausible that the serovars followed independent paths to host-restriction and the opportunity for recombination arose after they became isolated together in this shared niche. The direction of recombination cannot be determined, and may have been uni- or bi-directional.

Pseudogenes are coding sequences (CDS) that are putatively inactivated by mutations including nonsense substitutions, frameshifts, or truncation by deletion or rearrangement. Loss of gene function through pseudogene formation and gene deletion appears to be a hallmark of host-restricted pathogenic bacteria compared to their host-generalist relatives [11, 13, 16–19]. This is likely due to a combination of adaptation (whereby loss of gene function is selected for in the new host) and genetic drift associated with population bottlenecks during or following adaptation to the new niche. It has been reported that Paratyphi A and Typhi share some of their pseudogenes [13], resulting in convergent loss of gene functions which may be associated with adaptation to their shared niche. The genomes of S. enterica encode two type III secretion systems (TTSS), which mediate secretion of a range of effector proteins into host cells [20]. Many of these effectors are encoded in Salmonella pathogenicity islands 1 and 2 (SPI-1 and SPI-2, reviewed in [20, 21]), including several that are pseudogenes in Typhi and/or Paratyphi A. The inactivation of these and other genes involved in interactions between Salmonella and host is thought to play a key role in the host adaptation of these serovars [11, 13].

Here we report the 4.5 Mbp genome sequence of a recent clinical isolate of Paratyphi A, strain AKU_12601, allowing the first comparative analysis between two Paratyphi A isolates at the whole-genome sequence level. We also present a novel comparative annotation of pseudogenes in all four Paratyphi A and Typhi genomes. This is combined with previously reported divergence data [15] in order to tease apart the roles that recombination and pseudogene formation have played in the genetic and phenotypic convergence of Paratyphi A and Typhi.

Results and Discussion

Sequencing the Paratyphi A AKU_12601 genome

The whole genome sequence of Paratyphi A strain AKU_12601 was assembled, finished and annotated as described in the Methods section. The genome consists of a 4,581,797 bp circular chromosome, encoding 4,285 CDS, and a 212,711 bp IncHI1 multidrug resistance plasmid pAKU_1 [EMBL:AM412236] which has been described in detail elsewhere [22]. The AKU_12601 genome was also resequenced using the Illumina Genome Analyzer (Illumina), to a depth of 20-fold coverage. Short reads (35 bp) generated by resequencing were aligned to the finished sequence, which identified five high quality single base discrepancies between the assemblies (see Methods). One was found to be an erroneous base call in the finished sequence following checking of trace files and was corrected prior to EMBL submission. The remaining four bases (6-, 8-, 10-, and 20-fold read depth in Illumina data) may be errors in the Illumina resequencing, or reflect genuine mutations arising during culturing in the laboratory.

Data accessions

The finished sequence and annotation of the AKU_12601 genome are available in EMBL under accession FM200053, and the Illumina resequencing data are available under accession ERA000012.

Comparison of Paratyphi A strains AKU_12601 and ATCC9150

Comparative analysis revealed the two Paratyphi A genomes to be collinear, with no rearrangements and no acquisitions of phage or other large mobile elements. In contrast, Typhi Ty2 contains an inversion of half the genome between two rRNA operons and large-scale phage variation compared to Typhi CT18 [12]. Several insertion/deletion events and substitutions were identified between the Paratyphi A genomes.

Insertions and deletions

A total of 39 insertion/deletion events, including 13 differences in homopolymeric tracts, were identified between AKU_12601 and ATCC9150 (Table 1). Two IS10 elements were inserted in AKU_12601, within the nmpC gene and a hypothetical pseudogene (SSPA4008a/SPA4318). Six variable number tandem repeats (VNTRs) were identified, including one fewer tandem copy each of the tRNA-Gly and rrT RNA genes in AKU_12601.

Table 1 Insertion/deletion events between Paratyphi A AKU_12601 and ATCC9150

The largest single locus difference between the two genomes occurs within the O-antigen biosynthetic cluster rfb, where a 2.7 kb sequence including the 3' end of putative O-antigen transporter rfbX (SSPA0733) and two putative glycosyltransferase genes (rfbV/SSPA0734 and 5' end of rfbU/SSPA0735) is present in three tandem copies in ATCC9150. A single copy of this sequence is present in other S. enterica serovars [23], therefore the AKU_12601 sequence is assumed to be the ancestral form. The repeats in ATCC9150 generate two copies of a chimeric coding sequence, combining the 5' end of rfbU with the 3' end of rfbX (Figure 1). These genes are involved in synthesis and transport of O-antigen [23], but it is unclear whether the increased copy number and chimeric sequences generated by these repeats cause any functional differences in O-antigen expression between ATCC9150 and AKU_12601.

Figure 1

Tandem repeats in the O-antigen biosynthesis cluster in Paratyphi A ATCC9150. Bottom row: gene arrangement in Paratyphi A AKU_12601 and Typhi, presumed to be the ancestral form. Top row: gene arrangement in Paratyphi A ATCC9150, apparently resulting from two tandem duplications. Labels give systematic identifiers for the gene sequences in each genome. Identical coding sequences are shown in the same colours, and identical sequences are joined by lines.

An additional 122 bp sequence was present in AKU_12601 between the iap and ygbF genes, including two additional copies of a 30 bp repeat sequence present in six copies in ATCC9150. Smaller VNTRs were identified within pduP and rcnA, resulting in repeats of two and four amino acids respectively in the encoded proteins. VNTRs are useful as genetic markers for typing Salmonella enterica serovars, and variability in the rcnA VNTR among Paratyphi A isolates has been reported previously [24].

Single nucleotide polymorphisms (SNPs)

In addition to insertion/deletion events, 188 SNPs were identified. These include 101 non-synonymous and 51 synonymous SNPs, giving a dN/dS ratio of 0.62, similar to that observed between diverse Typhi strains [25]. While extreme care must be taken in interpreting dN/dS ratios based on the comparison of two closely related genomes [26], this ratio is consistent with some degree of purifying selection in the Paratyphi A population.

Differences in pseudogene complements

The Paratyphi A AKU_12601 genome contains 204 pseudogenes, constituting 4.8% of annotated CDSs. Although our comparative analysis revealed very few sequence differences between the two Paratyphi A genomes (188 SNPs, 39 insertion/deletion events), these differences include 22 pseudogene-forming mutations (see Table 2). The mutations include six nonsense SNPs and 16 insertion/deletion events, and were verified by inspecting the capillary sequencing traces and Illumina reads data for Paratyphi A AKU_12601. This suggests that pseudogene-forming mutations are continuing to accumulate in Paratyphi A, as has been observed in Typhi [12, 25].

Table 2 Inactivating mutations unique to either AKU_12601 or ATCC9150

Comparison of pseudogenes in Paratyphi A and Typhi genomes

In order to comprehensively investigate the mechanisms of convergent gene loss in Paratyphi A and Typhi, we assembled a comparative table of pseudogenes present in each serovar (Additional file 1). This analysis includes all previously annotated pseudogenes, some additional Typhi pseudogenes suggested previously [13] and some novel pseudogenes identified by manually inspecting Typhi and Paratyphi A sequences for all genes annotated as pseudogenes in any of the AKU_12601, ATCC9150, CT18 or Ty2 genomes (see Methods).

Shared pseudogenes

The resulting table includes 66 pseudogenes common to Typhi (strains CT18, Ty2) and Paratyphi A (strains AKU_12601, ATCC9150) (Additional file 1). This is almost double the figure reported previously [13], although many of the additional pseudogenes are remnants of transposase or bacteriophage genes. By aligning the Typhi and Paratyphi A DNA sequences for the shared pseudogenes, we identified shared and independent inactivating mutations (Additional file 1). Contrary to previous reports [13], we found common inactivating mutations in many of the shared pseudogenes.

The functions of most of the shared pseudogenes were discussed by the authors of the ATCC9150 genome study [13] and need not be repeated here. Of particular note, however, 20 of the shared pseudogenes (54% of non-phage/transposase shared pseudogenes) encode secreted or surface-exposed proteins (Table 3), and are thus likely to have contributed to convergence upon similar patterns of host interactions. Furthermore, inactivation of different genes in the same pathway will often result in similar loss of function, so the true contribution of pseudogene formation to phenotypic convergence between Typhi and Paratyphi A is likely underestimated by considering only shared pseudogenes. For example, different members of the cbi cluster are inactivated in Typhi and Paratyphi A, which may result in similar inactivation of the cobalamin synthesis pathway [13].

Table 3 Pseudogenes shared between Paratyphi A and Typhi

Were pseudogenes shared by recombination?

Recombination has clearly been an important mechanism of convergence between Paratyphi A and Typhi [15]. The accumulation of pseudogenes is a convergent trait evident in these genomes, and shared patterns of pseudogene formation are a likely mechanism for phenotypic convergence. But did recombination contribute to the sharing of pseudogenes?

More than 30% of the pseudogene complements of Typhi and Paratyphi A were shared (Additional file 1), consistent with the possibility that recombination of 23% of the genomes resulted in direct sharing of many of their pseudogenes. We determined whether each pseudogene lay in regions that were predicted to have undergone relatively recent recombination between Paratyphi A and Typhi (sequence divergence < 0.3% between serovars according to [15]) (see Additional file 1). Of all the pseudogenes present in both Paratyphi A AKU_12601 and ATCC9150, 24.3% lie in recently recombined regions; of the pseudogenes present in both Typhi CT18 and Ty2, 25.0% lie in recombined regions. According to [15], 25.6% of genes in CT18 lie in the recently recombined regions.
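The region assignment described above can be sketched in a few lines of Python. This is only an illustration: the `Pseudogene` record, the gene names and the divergence values are hypothetical, and the only value taken from the text is the 0.3% divergence threshold attributed to [15].

```python
from dataclasses import dataclass

# Threshold quoted in the text (from [15]): regions with < 0.3% divergence
# between serovars are treated as recently recombined.
RECOMBINED_MAX_DIVERGENCE_PCT = 0.3


@dataclass
class Pseudogene:
    name: str              # hypothetical identifier
    divergence_pct: float  # Paratyphi A vs Typhi divergence of the surrounding region


def in_recombined_region(gene):
    """True if the gene lies in a region of low between-serovar divergence."""
    return gene.divergence_pct < RECOMBINED_MAX_DIVERGENCE_PCT


def fraction_recombined(genes):
    """Fraction of the given pseudogenes that lie in recombined regions."""
    if not genes:
        return 0.0
    return sum(in_recombined_region(g) for g in genes) / len(genes)


# Hypothetical example data, not taken from Additional file 1.
example = [
    Pseudogene("geneA", 0.18),
    Pseudogene("geneB", 1.20),
    Pseudogene("geneC", 0.25),
    Pseudogene("geneD", 1.40),
]
print(f"{fraction_recombined(example):.1%} of example pseudogenes lie in recombined regions")
```

Comparing this fraction with the fraction of all genes in recombined regions (25.6% in CT18 according to [15]) is the comparison made in the paragraph above.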

These observations are consistent with two scenarios, illustrated in Figure 2: (1) most pseudogenes were inactivated prior to recombination, and recombination was random with respect to the location of pseudogenes (Figure 2b); or (2) most pseudogenes were inactivated after recombination, and these pseudogene-forming mutations were random with respect to recombined regions (Figure 2c). If (1) were true, we would expect that (i) genes that are pseudogenes in one serovar but intact in the other (i.e. serovar-specific pseudogenes) would not lie in recombined regions, and (ii) most pseudogenes in recombined regions would have been shared during recombination, i.e. they would be pseudogenes in both Paratyphi A and Typhi and share common inactivating mutations in both genomes (red circles in Figure 2b). If (2) were true, we would expect that (i) serovar-specific pseudogenes would be distributed randomly with respect to recombined and nonrecombined regions, and (ii) very few pseudogenes would have been shared during recombination, i.e. very few pseudogenes in recombined regions would share inactivating mutations (red circles in Figure 2c).

Figure 2

Scenarios of recombination and pseudogene formation in Paratyphi A and Typhi. (a) True distribution of pseudogenes in the Paratyphi A AKU_12601 and Typhi CT18 genomes (gene order based on gene co-ordinates in Typhi CT18). (b-c) Distribution of pseudogenes resulting from data simulated under two scenarios, under both of which 40 pseudogenes are inherited from the most recent common ancestor of Paratyphi A and Typhi, and extensive accumulation of pseudogenes occurs before or after recombination of 25% of genes. For ease of simulation, the recombination shown is uni-directional, but bi-directional exchange would result in similar patterns. (b) Scenario 1: 150 additional pseudogenes accumulate in each serovar, followed by recombination. (c) Scenario 2: only 20 additional pseudogenes arise before recombination, after which a further 150 pseudogenes accumulate in each serovar.
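The two scenarios can be illustrated with a minimal simulation. The sketch below is not the simulation used to draw Figure 2; it assumes a hypothetical genome of 4,000 genes, a randomly chosen (rather than contiguous) set of recombined genes, and uni-directional transfer from a "donor" to a "recipient" serovar. The counts of 40 ancestral pseudogenes, 150 or 20 pseudogenes gained before recombination, 150 gained afterwards, and 25% of genes recombined follow the caption.

```python
import random


def gain(state, label, n, n_genes, rng):
    """Inactivate n currently intact genes, tagging each mutation with its origin."""
    intact = [g for g in range(n_genes) if g not in state]
    for g in rng.sample(intact, n):
        state[g] = (label, g)


def simulate(n_before, n_after, n_genes=4000, n_ancestral=40,
             recombined_fraction=0.25, seed=1):
    """Return pseudogene counts in the recombined regions of the recipient."""
    rng = random.Random(seed)
    recombined = set(rng.sample(range(n_genes), int(n_genes * recombined_fraction)))

    # Pseudogenes inherited from the common ancestor carry the same mutation.
    donor = {g: ("ancestor", g) for g in rng.sample(range(n_genes), n_ancestral)}
    recipient = dict(donor)

    # Independent accumulation before recombination.
    gain(donor, "donor", n_before, n_genes, rng)
    gain(recipient, "recipient", n_before, n_genes, rng)

    # Uni-directional recombination: the recipient copies the donor state.
    for g in recombined:
        if g in donor:
            recipient[g] = donor[g]
        else:
            recipient.pop(g, None)

    # Independent accumulation after recombination.
    gain(donor, "donor", n_after, n_genes, rng)
    gain(recipient, "recipient", n_after, n_genes, rng)

    in_rec = [g for g in recipient if g in recombined]
    return {
        "in_recombined": len(in_rec),
        "same_mutation": sum(1 for g in in_rec if donor.get(g) == recipient[g]),
        "recipient_specific": sum(1 for g in in_rec if g not in donor),
    }


scenario_1 = simulate(n_before=150, n_after=0)   # accumulation before recombination
scenario_2 = simulate(n_before=20, n_after=150)  # accumulation after recombination
print("Scenario 1:", scenario_1)
print("Scenario 2:", scenario_2)
```

Under scenario 1 every recipient pseudogene in a recombined region carries the donor's mutation and none is serovar-specific, whereas under scenario 2 serovar-specific pseudogenes appear in recombined regions and identical mutations become a minority there.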

The distribution of serovar-specific and shared pseudogenes in recombined and nonrecombined regions is shown in Figure 2a and summarised in Table 4. Pearson χ² tests for each serovar based on these data give non-significant results (p-value > 0.2, Table 4), thus there is no evidence of association between shared or serovar-specific pseudogenes and regions of recombination, consistent with scenario (2). More than 20% of serovar-specific pseudogenes lie in recombined regions of each genome (Figure 2a, black lines in inner ring), consistent with scenario (2) whereby serovar-specific pseudogenes are expected to be randomly distributed in the genome of which 23% has been recombined (Figure 2c, black lines in inner ring). These observations are extremely unlikely under scenario (1), which would predict recombination to result in shared but not serovar-specific pseudogenes being present in recombined regions (Figure 2b, inner ring).
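For readers who wish to reproduce this kind of test, the following is a minimal sketch of a Pearson χ² test on a 2 × 2 table (shared versus serovar-specific pseudogenes, recombined versus nonrecombined regions). The counts in the example are hypothetical placeholders, not the values in Table 4, and the sketch applies no continuity correction, which may differ from the procedure actually used.

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-squared test (1 degree of freedom, no continuity correction)
    for the table [[a, b], [c, d]]. Returns (statistic, p_value)."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    denom = row1 * row2 * col1 * col2
    if denom == 0:
        raise ValueError("every row and column must have a non-zero total")
    stat = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom the upper tail of the chi-squared
    # distribution equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts for one serovar:
#                     recombined  nonrecombined
# shared                  15           50
# serovar-specific        35          100
stat, p = pearson_chi2_2x2(15, 50, 35, 100)
print(f"chi2 = {stat:.3f}, p = {p:.3f}")
```

A non-significant result indicates that pseudogene class is not detectably associated with recombined regions, which is the pattern expected under scenario (2).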

Table 4 Distribution of serovar-specific and shared pseudogenes in recombined regions

We found only 18 pseudogenes in recombined regions harboured the same inactivating mutations (red lines and circles in inner rings, Figure 2a), less than 20% of pseudogenes in the recombined regions of each genome (Additional file 1). As illustrated in Figure 2, this is consistent with scenario (2) but not scenario (1), which would predict that most pseudogenes lying in recombined regions would be shared by virtue of recombination and therefore carry the same inactivating mutations (red circles in Figure 2).

The patterns of pseudogene distribution we observe therefore suggest that the majority of pseudogenes present in the extant genomes of Paratyphi A and Typhi accumulated after the recombination of 23% of their genomes. Whether this relationship is causal, though, remains to be proven. The acceleration of pseudogene formation is most likely due to a combination of host-adaptation and genetic drift associated with a population bottleneck in the new human-restricted niche. However, whether the extensive recombination between Typhi and Paratyphi A resulted in, or resulted from, human-restriction of the two organisms is unknown. It is plausible that host-restriction occurred independently in Typhi and Paratyphi A, providing both (a) an opportunity for recombination soon after they became isolated together in this shared niche, and (b) a trigger for accelerated pseudogene formation. Alternatively, a chance recombination event may have led to host-restriction of both organisms. It has been noted that recombination between Paratyphi A and Typhi involved sharing of intact serovar-specific or rare genes, resulting in many more shared rare genes than would be expected otherwise [15] and presumably promoting the sharing of novel functions. It is plausible therefore that recombination between Paratyphi A and Typhi led to a combination of gene acquisition and loss-of-function resulting in restriction to the human host, bestowing upon these serovars a unique and novel genetic profile that contributed to host restriction and the ability to cause systemic infection. Such an event would likely set Paratyphi A and Typhi on a similar trajectory of host adaptation and associated population bottlenecks, which might account for their similar profiles of rapid accumulation of pseudogenes through adaptive selection and genetic drift.

Tracing pseudogene formation in the evolutionary histories of Paratyphi A and Typhi

The recombination described between Paratyphi A and Typhi provides a rare marker of relative time in the evolutionary histories of these organisms. The recombination was discovered by analysing the distribution of nucleotide divergence levels between different regions of the two genomes, which clearly identified a distinct sub-population of low divergence corresponding to the recombined regions (mean 0.18% compared to genome average of 1.2%) [15]. Although not providing a precise measure of age, this suggests that the recombination event happened approximately 15% (0.18/1.2 = 0.15) as long ago as the initial divergence of Paratyphi A, Typhi and other S. enterica serovars. This implies that recombination occurred well before the most recent common ancestors of each serovar (see Figure 3), and thus prior to the last population bottlenecks in the Paratyphi A and Typhi populations.

Figure 3

Pseudogene formation in the evolutionary histories of Paratyphi A and Typhi. Phylogenetic tree based on multiple alignments of all nonrecombined genes as defined in [15], rooted using S. bongori and E. coli as outgroups. Scale bar is nucleotide divergence. The timing of the recombination between Paratyphi A and Typhi is an approximation inferred from published divergence data [15]. Group (i) pseudogenes were inactivated prior to the divergence of Paratyphi A and Typhi, some are also inactivated in Typhimurium and Paratyphi B; following their divergence Paratyphi A and Typhi likely accumulated few additional pseudogenes; during the recombination of 23% of their genomes (direction of transfer unknown) 18 pseudogene sequences were shared between Paratyphi A and Typhi, including five non-ancestral pseudogenes (group ii); many pseudogenes were formed during a period of accelerated pseudogene accumulation in both serovars, including most group (iii) pseudogenes; pseudogenes continue to accumulate in individual sub-lineages after the most recent common ancestor of each serovar (group iv).

We divided the pseudogenes into distinct categories with different relative ages (Additional file 1): (i) ancestral pseudogenes (shared pseudogenes inactivated prior to the divergence of Paratyphi A and Typhi), (ii) recombined pseudogenes (shared pseudogenes in recombined regions, with shared inactivating mutations assumed to have arisen after initial divergence), (iii) recent conserved pseudogenes (including serovar-specific pseudogenes, and shared pseudogenes containing different inactivating mutations in Paratyphi A and Typhi; the majority of these are expected to have become pseudogenes after recombination) and (iv) recent strain-specific pseudogenes (pseudogenes in some but not all strains belonging to their respective serovar). Table 3 summarises the shared pseudogenes in each category (excluding ancestral transposase/phage gene remnants) and Figure 3 shows their approximate timing overlaid on a phylogenetic tree of S. enterica serovars. Note that some serovar-specific pseudogenes (group iii) will likely be shown to be strain-specific (group iv) as more strains are sequenced (see below).
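The grouping can be expressed as a simple decision rule. The sketch below is one possible reading of the four categories, not the procedure used to build Additional file 1: the function name, its arguments and the `predates_divergence` flag (standing in for whatever evidence marks a shared mutation as ancestral) are all assumptions made for illustration.

```python
def classify_pseudogene(paratyphi_a, typhi, same_mutation=False,
                        in_recombined=False, predates_divergence=False):
    """Assign a pseudogene to relative-time group 'i', 'ii', 'iii' or 'iv'.

    paratyphi_a, typhi: 'all', 'some' or 'none', describing how many sequenced
    strains of that serovar carry the gene in inactivated form.
    """
    valid = {"all", "some", "none"}
    if paratyphi_a not in valid or typhi not in valid:
        raise ValueError("strain status must be 'all', 'some' or 'none'")
    if paratyphi_a == "none" and typhi == "none":
        raise ValueError("gene is not a pseudogene in either serovar")

    # (iv) inactivated in some but not all strains of a serovar.
    if "some" in (paratyphi_a, typhi):
        return "iv"

    shared = paratyphi_a == "all" and typhi == "all"
    if shared and same_mutation:
        # (ii) identical mutation in a recombined region, arising after divergence.
        if in_recombined and not predates_divergence:
            return "ii"
        # (i) identical mutation inherited from the common ancestor.
        return "i"

    # (iii) serovar-specific, or shared with independent inactivating mutations.
    return "iii"


print(classify_pseudogene("all", "all", same_mutation=True, in_recombined=True))
print(classify_pseudogene("all", "none"))
```

Genes with a strain-specific mutation in one serovar and an independent mutation in the other are simply returned as group (iv) here; a fuller treatment would record the status of each serovar separately.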

Ancestral pseudogenes

The inactivating mutations in group (i) pseudogenes are assumed to have been inherited by Paratyphi A and Typhi from a common ancestor (Figure 3). Alternatively some may have been exchanged between Paratyphi A and Typhi soon after their divergence from other S. enterica. Either way, these pseudogenes were among the earliest to arise in the evolutionary history of Paratyphi A and Typhi, thus their inactivation has been well tolerated in these serovars (most have also accumulated secondary mutations). This is unsurprising for the majority of ancestral pseudogenes which are insertion sequence (IS) transposase and phage genes/fragments. However the inactivation of seven genes known to be functional in Typhimurium and other Salmonella, in particular those that are secreted or surface exposed (Table 3), is likely to have had significant functional impact including potential modulations of host interactions. It is also possible that the loss of these genes had little effect on the pathogenic potential of Paratyphi A and Typhi and that they had classic S. enterica host-generalist lifestyles until much later on. However the best described of these seven co-inherited pseudogenes is the secreted effector protein sopD2, which in Typhimurium is involved in host interactions and virulence [27] and therefore constitutes a plausible candidate for an early modulator of host interactions in Paratyphi A and Typhi.

Pseudogenes shared by recombination

Group (ii) contains five recombined pseudogenes (Table 3), which display 0.14–0.25% nucleotide divergence between the two serovars compared to a genome average of 1.2% and thus were likely exchanged long after the initial divergence of Paratyphi A and Typhi (Figure 3). One of these encodes an IS transposase, leaving four candidates for convergence via shared gene inactivation directly attributable to recombination. These include the secreted effector protein sopA, which mimics mammalian ubiquitin ligase and is recognized and degraded by the human ubiquitination pathway [28]. It is necessary for virulence in both murine systemic infections and bovine gastrointestinal infections by Typhimurium [29, 30], thus is clearly important for interactions between Salmonella and mammalian hosts. The loss of this gene in Paratyphi A and Typhi may therefore have been an important factor in the restriction or adaptation of these serovars to the human systemic niche. SopA is also a pseudogene in the sequenced Paratyphi B strain SPB7 [EMBL:CP000886], although this is difficult to interpret as it is unclear whether this strain is of the systemic or enteric pathotype (negative for tartrate fermentation, but also sopE-negative using PCR described in [31]). The other genes are putative uncharacterised SPI-3 protein sugR, and two genes not annotated previously in the ATCC9150 genome – putative secreted protein SSPA0097 (interrupted by IS200 insertion) and putative L-asparaginase protein SSPA3228 (truncated at both ends by deletions).

Recent pseudogenes: convergence after recombination

In addition to > 100 pseudogenes specific to each serovar, group (iii) includes 22 shared pseudogenes containing different inactivating mutations in Paratyphi A and Typhi (Table 3). While it is possible that some of those lying outside recombined regions may have been present prior to recombination, we propose that most of these mutations arose in the period of rapid pseudogene accumulation after recombination. These pseudogenes are examples of convergent gene loss through independent mutation, and are therefore good candidates for involvement in adaptation to the human host. They include only one transposase gene, the remainder being genes of known or putative function, many of which have been implicated in host interactions in serovar Typhimurium (e.g. fhuA, fhuE, shdA, ratB, sivH) [13, 32]. Two of the independently acquired pseudogenes, both members of fimbrial clusters lying in Salmonella pathogenicity islands (safE in SPI-6, sefD in SPI-10), were not identified in previous pseudogene comparisons [13].

It is not possible to distinguish whether there has been adaptive selection against the activity of these genes in Paratyphi A and Typhi, or simply shared tolerance for their inactivation. For example, it has been noted [13] that three of these genes (shdA, ratB and sivH, part of the 25 kbp pathogenicity island CS54 [32]) are involved in intestinal colonization and persistence, which does not occur in typhoid or paratyphoid infection. However we cannot distinguish whether the independent inactivation of these genes in each serovar is due to selection against colonization of the intestine (which may stimulate host immune responses), or genetic drift since intestinal colonization is not required to sustain a systemic infection.

Ongoing accumulation of strain-specific pseudogenes

A recent comparative analysis of whole-genome variation in 19 Typhi strains inferred that their last common ancestor harboured only 180 pseudogenes, while individual isolates had each accumulated at least 10–28 additional pseudogenes since their divergence from that ancestor [25]. The number was predicted to be an underestimate, as it did not take into account pseudogene formation via insertion/deletion of one or two nucleotides which would introduce frameshifts. In our comparison of the AKU_12601 and ATCC9150 genomes we found 22 mutations resulting in strain-specific pseudogene formation (10–12 per strain, Table 2), and we predict that future comparative analyses of additional strains will uncover further examples of recently acquired strain-specific pseudogenes. These strain-specific pseudogenes must have arisen since the most recent common ancestors of the respective Paratyphi A and Typhi populations and are therefore more recent than those that are conserved within the serovars (see Figure 3). It is interesting to note that three genes were identified with strain-specific mutations in one serovar and independent mutations in the other serovar (see Additional file 1). This may provide the opportunity for ongoing convergence between sub-lineages of the Typhi and Paratyphi A populations as each serovar continues to evolve and adapt.


The Paratyphi A AKU_12601 genome sequence presented here allowed the first whole-genome comparison between Paratyphi A strains. By comparing the annotation of pseudogenes in these Paratyphi A genomes and the two finished Typhi genomes CT18 and Ty2, we were able to identify novel examples of pseudogenes that are shared between these human-adapted serovars. Paratyphi A and Typhi have each undergone a parallel, rapid accumulation of pseudogenes after extensive recombination of their genomes.

Although Paratyphi A and Typhi share 27 pseudogenes over and above those inherited in inactive form from a common ancestor, only five were shared via recombination while 22 are the result of more recent convergence through independent adaptive mutation. Therefore recombination and pseudogene formation have played largely independent roles in the genetic convergence of Paratyphi A and Typhi.

The recombination between Paratyphi A and Typhi enabled us to identify different groups of pseudogenes that have arisen in these genomes at different points in their evolutionary histories. This implicates loss-of-function of a few genes in early restriction to the human host (ancestral pseudogenes including sopD2) and some in subsequent convergent adaptation to the new niche (conserved and in particular shared conserved pseudogenes including shdA, ratB, sivH). Pseudogenes shared by recombination (e.g. sopA) may have contributed to host-restriction or host-adaptation.

While the analysis presented here considers only Paratyphi A and Typhi, there are other examples of human-adapted S. enterica serovars, including Sendai, Paratyphi C and the systemic pathovar of Paratyphi B. It can be expected that as genome sequences for these become available, comparative analysis may yield further insights into their mechanisms of host adaptation. However the occurrence of relatively recent recombination between Paratyphi A and Typhi has afforded a unique insight into the order of events and mechanisms involved in their convergent evolution, a scenario which has likely been played out in many other host-adapted bacteria.


Sequencing of AKU_12601

Paratyphi A strain AKU_12601 was isolated from a paratyphoid patient in Karachi, Pakistan, in 2002. The whole-genome shotgun consisted of 83,857 paired-end reads from libraries of 2 to 2.8 kb in pUC19, 5 to 6 kb in pMAQ1, and 6 to 9 kb in pMAQ1, giving 9.8-fold coverage. A scaffold was produced using 1,180 paired-end reads from a 20- to 30-kb library in pBACe3.6. The whole genome sequence was finished to standard criteria [33], using 9,879 directed sequencing reads. The sequence was annotated, and the annotation was manually curated using Artemis software [34] as previously described [33]. The sequence includes both the chromosome, presented here, and the 212,711 bp IncHI1 multidrug resistance plasmid pAKU_1, which has been described in detail elsewhere [22]. AKU_12601 was also resequenced using the Illumina Genome Analyzer (Illumina), with 3,191,127 single-end 35 bp reads providing 21.9-fold coverage of the chromosome.

Sequence comparisons

Maq [35] was used to map Illumina/Solexa 35 bp reads to the finished AKU_12601 sequence and identify potential errors (reported as SNPs by Maq using default parameters). Capillary traces were manually inspected for the five loci at which SNPs were reported by Maq with consensus base quality > 20 and read depth > 5.
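The filtering step described above can be illustrated with a minimal Python sketch. It assumes the SNP calls have been exported as a tab-separated table whose first six columns are sequence name, position, reference base, consensus base, consensus quality and read depth; this column layout, the function name filter_snps and the example rows are assumptions made for illustration, not a description of the files used in the study.

```python
import csv
import io

def filter_snps(lines, min_quality=20, min_depth=5):
    """Keep calls whose consensus quality and read depth both exceed the thresholds.

    ASSUMPTION: tab-separated columns are
    name, position, reference base, consensus base, consensus quality, read depth.
    """
    kept = []
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        quality = int(row[4])
        depth = int(row[5])
        # Strict inequalities, matching "quality > 20 and read depth > 5".
        if quality > min_quality and depth > min_depth:
            kept.append((row[0], int(row[1]), row[2], row[3], quality, depth))
    return kept

# Hypothetical rows, invented purely to exercise the filter.
example = io.StringIO(
    "chr\t1021\tA\tG\t45\t18\n"
    "chr\t5310\tC\tT\t12\t30\n"
    "chr\t9904\tG\tA\t60\t4\n"
)
print(filter_snps(example))
```

Only the first hypothetical row passes both thresholds; loci retained in this way would then be candidates for manual inspection of the capillary traces.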

Pairwise whole-genome sequence comparisons were generated with blastn and visualized using ACT [36]. Insertions, deletions and nucleotide substitutions between the collinear Paratyphi A AKU_12601 and ATCC9150 genomes were identified using diffseq (EMBOSS [37]).
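As an illustration of how differences between two collinear sequences can be tallied, the following Python sketch counts substitutions, insertion events and deletion events in a gapped pairwise alignment. It is not the diffseq algorithm; it assumes the two sequences have already been aligned to equal length with "-" as the gap character, and the function name and example strings are hypothetical.

```python
def tally_differences(ref_aln, qry_aln):
    """Count substitutions and indel events in a gapped pairwise alignment.

    A run of consecutive gap columns of the same type counts as one event.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")
    counts = {"substitutions": 0, "insertions": 0, "deletions": 0}
    open_gap = None  # type of the gap run currently open, if any
    for r, q in zip(ref_aln.upper(), qry_aln.upper()):
        if r == "-" and q == "-":
            continue  # a column that is a gap in both rows carries no information
        if r == "-":
            # Base present only in the query: insertion relative to the reference.
            if open_gap != "ins":
                counts["insertions"] += 1
            open_gap = "ins"
        elif q == "-":
            # Base present only in the reference: deletion relative to the reference.
            if open_gap != "del":
                counts["deletions"] += 1
            open_gap = "del"
        else:
            open_gap = None
            if r != q:
                counts["substitutions"] += 1
    return counts

# Hypothetical aligned fragments.
print(tally_differences("ACGT--ACGTA", "ACCTGGAC-TA"))
```

For these invented fragments the tally is one substitution, one insertion event (two bases long) and one deletion event.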

Comparison and annotation of pseudogenes

In order to compare annotated genomes of Paratyphi A AKU_12601 [EMBL:FM200053] and ATCC9150 [EMBL:CP000026], Typhi CT18 [EMBL:AL513382] and Ty2 [EMBL:AE014613] with Typhimurium LT2 [EMBL:AE006468], pairwise whole-genome sequence comparisons were generated with blastn and visualized using ACT [36]. Every gene annotated as a pseudogene in any Typhi or Paratyphi A genome was manually inspected in all five genomes, and its pseudogene status in each genome reassessed. All pseudogenes identified in this way are present in the AKU_12601 genome annotation, although many such genes are not annotated in all of ATCC9150, CT18 and Ty2. For each coding sequence found to be a pseudogene in more than one serovar, multiple alignments were used to determine whether the same or independent inactivating mutation(s) were present in the different serovars.
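Once pseudogene status has been reassessed in each genome, genes can be grouped by where they are inactivated. The Python sketch below shows one simple way to do this from a per-genome table of True/False values. The category rules are a simplification chosen for illustration (for example, a gene counts as shared if it is a pseudogene in at least one strain of each serovar), and the gene labels are placeholders, not results from the study. Distinguishing the same from independent inactivating mutations still requires the alignment-based inspection described above.

```python
PARATYPHI_A = ("AKU_12601", "ATCC9150")
TYPHI = ("CT18", "Ty2")

def classify(status):
    """Classify one gene from a dict mapping genome name -> True if it is a pseudogene there.

    ASSUMPTION: these category rules are illustrative, not the study's exact criteria.
    """
    in_pa = [status[g] for g in PARATYPHI_A]
    in_ty = [status[g] for g in TYPHI]
    if any(in_pa) and any(in_ty):
        return "shared between serovars"
    if all(in_pa):
        return "Paratyphi A-specific"
    if all(in_ty):
        return "Typhi-specific"
    if any(in_pa) or any(in_ty):
        return "strain-specific"
    return "intact in all four genomes"

# Placeholder genes with invented status values.
table = {
    "gene_1": {"AKU_12601": True, "ATCC9150": True, "CT18": True, "Ty2": True},
    "gene_2": {"AKU_12601": True, "ATCC9150": True, "CT18": False, "Ty2": False},
    "gene_3": {"AKU_12601": False, "ATCC9150": False, "CT18": True, "Ty2": False},
}
for gene, status in table.items():
    print(gene, classify(status))
```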

Data simulation

An initial set of 40 genes was selected at random to represent ancestral pseudogenes. Additional sets of 20 and 150 genes were selected at random (sampling with replacement) for each of two serovars, to represent pseudogenes that accumulated after initial divergence of the serovars. The same random sets of pseudogenes were used to simulate both scenarios, with only the timing varying (the set of 150 pseudogenes arising either before or after recombination). To simulate the uni-directional recombination events depicted in Figure 2, serovar 2 pseudogenes lying in recombined regions were replaced with serovar 1 pseudogenes lying in recombined regions. All genes were selected at random from the 4600 annotated in Typhi CT18, and their status as recombined or nonrecombined was taken directly from the table of Typhi genes provided in [15].
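The simulation can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the code used in the study: genes are represented by integers, recombined status is assigned at random to a quarter of the genes instead of being read from the table in [15], the ancestral set is drawn without replacement, and the function name, seed and returned summary (number of pseudogenes present in both serovars) are hypothetical choices.

```python
import random

def simulate_shared_pseudogenes(n_genes=4600, recombined_fraction=0.25, seed=1):
    """Return the number of pseudogenes shared by two simulated serovars under both timings."""
    rng = random.Random(seed)
    genes = range(n_genes)
    # ASSUMPTION: recombined status is drawn at random; the study took it from [15].
    recombined = set(rng.sample(genes, int(n_genes * recombined_fraction)))
    ancestral = set(rng.sample(genes, 40))
    # Sets of 20 and 150 genes per serovar, sampled with replacement
    # (duplicate draws collapse when stored as a set).
    small = {s: set(rng.choices(genes, k=20)) for s in (1, 2)}
    large = {s: set(rng.choices(genes, k=150)) for s in (1, 2)}

    def shared_after(before, after):
        s1 = ancestral | before[1]
        s2 = ancestral | before[2]
        # Uni-directional recombination: in recombined regions serovar 2
        # loses its own pseudogenes and receives those of serovar 1.
        s2 = (s2 - recombined) | (s1 & recombined)
        # Pseudogenes arising after recombination are added unchanged.
        s1 = s1 | after[1]
        s2 = s2 | after[2]
        return len(s1 & s2)

    # The same random sets are reused; only the timing of the 150-gene set differs.
    return {
        "150_before_recombination": shared_after(large, small),
        "150_after_recombination": shared_after(small, large),
    }

print(simulate_shared_pseudogenes())
```

Reusing the same random sets for both calls mirrors the design described above, so any difference between the two counts reflects timing alone.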

Phylogenetic analysis

Nucleotide sequences for genes that have not undergone recent recombination between Typhi and Paratyphi A (according to the table provided in [15]) were extracted from the CT18 genome sequence using Artemis. Homologous sequences in other genomes were identified using blastn. The top-scoring gapped sequence alignments for each genome were assembled into a single multiple alignment for each gene using MView [38], and these per-gene alignments were then concatenated. The analysis included Typhimurium (strains LT2, SL1344) and S. enterica serovar Paratyphi B SPB7 [EMBL:CP000886]; S. bongori and E. coli K12 [EMBL:U00096] were included as outgroups to root the tree. The S. bongori and Typhimurium SL1344 sequences are available from the Wellcome Trust Sanger Institute [39]. MrBayes [40] was used to fit a phylogenetic model to the concatenated multiple alignment of all (nonrecombined) genes (GTR+Γ model, 200,000 iterations). Figure 3 shows the consensus tree.
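The concatenation of per-gene alignments can be illustrated with a short Python sketch. It assumes each gene alignment is held as a dictionary mapping a genome label to its aligned sequence, and that a genome missing from a gene alignment is padded with gap characters; both the data layout and the padding rule are assumptions made for illustration, and the sequences shown are invented.

```python
def concatenate_alignments(gene_alignments, taxa):
    """Join per-gene alignments into one alignment with one row per taxon.

    gene_alignments: list of dicts mapping taxon -> aligned sequence.
    ASSUMPTION: a taxon absent from a gene alignment is padded with '-'.
    """
    parts = {taxon: [] for taxon in taxa}
    for alignment in gene_alignments:
        lengths = {len(seq) for seq in alignment.values()}
        if len(lengths) != 1:
            raise ValueError("all sequences in one gene alignment must have equal length")
        width = lengths.pop()
        for taxon in taxa:
            parts[taxon].append(alignment.get(taxon, "-" * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}

# Invented toy alignments for two genes.
gene_a = {"CT18": "ATGA", "AKU_12601": "ATGC", "LT2": "ATGG"}
gene_b = {"CT18": "CC-T", "LT2": "CCAT"}
print(concatenate_alignments([gene_a, gene_b], ["CT18", "AKU_12601", "LT2"]))
```

Every row of the result has the same length, which is the property a phylogenetic program expects of its input alignment.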


  1. Parry CM, Hien TT, Dougan G, White NJ, Farrar JJ: Typhoid fever. N Engl J Med. 2002, 347 (22): 1770-1782. 10.1056/NEJMra020201.

  2. Crump JA, Luby SP, Mintz ED: The global burden of typhoid fever. Bull World Health Organ. 2004, 82 (5): 346-353.

  3. Maskey AP, Basnyat B, Thwaites GE, Campbell JI, Farrar JJ, Zimmerman MD: Emerging trends in enteric fever in Nepal: 9124 cases confirmed by blood culture 1993–2003. Trans R Soc Trop Med Hyg. 2008, 102: 91-95. 10.1016/j.trstmh.2007.10.003.

  4. Woods CW, Murdoch DR, Zimmerman MD, Glover WA, Basnyat B, Wolf L, Belbase RH, Reller LB: Emergence of Salmonella enterica serotype Paratyphi A as a major cause of enteric fever in Kathmandu, Nepal. Trans R Soc Trop Med Hyg. 2006, 100 (11): 1063-1067. 10.1016/j.trstmh.2005.12.011.

  5. Ochiai RL, Wang X, von Seidlein L, Yang J, Bhutta ZA, Bhattacharya SK, Agtini M, Deen JL, Wain J, Kim DR, Ali M, Acosta CJ, Jodar L, Clemens JD: Salmonella paratyphi A rates, Asia. Emerg Infect Dis. 2005, 11 (11): 1764-1766.

  6. Zhang ZK, Huang YN, Guo BC, Deng ML, Yuan RZ, Wang QS: Surveillance of the antibiotic resistance and plasmid of Salmonella paratyphoid. Chin J Antibiot. 2004, 29 (10): 610-

  7. Pokharel BM, Koirala J, Dahal RK, Mishra SK, Khadga PK, Tuladhar NR: Multidrug-resistant and extended-spectrum beta-lactamase (ESBL)-producing Salmonella enterica (serotypes Typhi and Paratyphi A) from blood isolates in Nepal: surveillance of resistance and a search for newer alternatives. Internat J Infect Dis. 2006, 10 (6): 434-438. 10.1016/j.ijid.2006.07.001.

  8. Tankhiwale SS, Agrawal G, Jalgaonkar SV: An unusually high occurrence of Salmonella enterica serotype paratyphi A in patients with enteric fever. Indian J Med Res. 2003, 117: 10-12.

  9. Yang J: Enteric Fever in South China: Guangxi Province. J Infection Developing Countries. 2008, 2 (4): 292-297.

  10. Guzman CA, Borsutzky S, Griot-Wenk M, Metcalfe IC, Pearman J, Collioud A, Favre D, Dietrich G: Vaccines against typhoid fever. Vaccine. 2006, 24 (18): 3804-3811. 10.1016/j.vaccine.2005.07.111.

  11. Parkhill J, Dougan G, James KD, Thomson NR, Pickard D, Wain J, Churcher C, Mungall KL, Bentley SD, Holden MT, Sebaihia M, Baker S, Basham D, Brooks K, Chillingworth T, Connerton P, Cronin A, Davis P, Davies RM, Dowd L, White N, Farrar J, Feltwell T, Hamlin N, Haque A, Hien TT, Holroyd S, Jagels K, Krogh A, Larsen TS, Leather S, Moule S, O'Gaora P, Parry C, Quail M, Rutherford K, Simmonds M, Skelton J, Stevens K, Whitehead S, Barrell BG: Complete genome sequence of a multiple drug resistant Salmonella enterica serovar Typhi CT18. Nature. 2001, 413 (6858): 848-852. 10.1038/35101607.

  12. Deng W, Liou SR, Plunkett G 3rd, Mayhew GF, Rose DJ, Burland V, Kodoyianni V, Schwartz DC, Blattner FR: Comparative genomics of Salmonella enterica serovar Typhi strains Ty2 and CT18. J Bacteriol. 2003, 185 (7): 2330-2337. 10.1128/JB.185.7.2330-2337.2003.

  13. McClelland M, Sanderson KE, Clifton SW, Latreille P, Porwollik S, Sabo A, Meyer R, Bieri T, Ozersky P, McLellan M, Harkins CR, Wang C, Nguyen C, Berghoff A, Elliott G, Kohlberg S, Strong C, Du F, Carter J, Kremizki C, Layman D, Leonard S, Sun H, Fulton L, Nash W, Miner T, Minx P, Delehaunty K, Fronick C, Magrini V, Nhan M, Warren W, Florea L, Spieth J, Wilson RK: Comparison of genome degradation in Paratyphi A and Typhi, human-restricted serovars of Salmonella enterica that cause typhoid. Nat Genet. 2004, 36 (12): 1268-1274. 10.1038/ng1470.

  14. Coburn B, Grassl GA, Finlay BB: Salmonella, the host and disease: a brief review. Immunol Cell Biol. 2007, 85 (2): 112-118. 10.1038/sj.icb.7100007.

  15. Didelot X, Achtman M, Parkhill J, Thomson NR, Falush D: A bimodal pattern of relatedness between the Salmonella Paratyphi A and Typhi genomes: convergence or divergence by homologous recombination?. Genome Res. 2007, 17: 61-68. 10.1101/gr.5512906.

  16. Andersson JO, Andersson SGE: Genome degradation is an ongoing process in Rickettsia. Mol Biol Evol. 1999, 16 (9): 1178-1191.

  17. Cole ST, Eiglmeier K, Parkhill J, James KD, Thomson NR, Wheeler PR, Honore N, Garnier T, Churcher C, Harris D, Mungall K, Basham D, Brown D, Chillingworth T, Connor R, Davies RM, Devlin K, Duthoy S, Feltwell T, Fraser A, Hamlin N, Holroyd S, Hornsby T, Jagels K, Lacroix C, Maclean J, Moule S, Murphy L, Oliver K, Quail MA, Rajandream MA, Rutherford KM, Rutter S, Seeger K, Simon S, Simmonds M, Skelton J, Squares R, Squares S, Stevens K, Taylor K, Whitehead S, Woodward JR, Barrell BG: Massive gene decay in the leprosy bacillus. Nature. 2001, 409 (6823): 1007-1011. 10.1038/35059006.

  18. Parkhill J, Sebaihia M, Preston A, Murphy LD, Thomson N, Harris DE, Holden MT, Churcher CM, Bentley SD, Mungall KL, Cerdeno-Tarraga AM, Temple L, James K, Harris B, Quail MA, Achtman M, Atkin R, Baker S, Basham D, Bason N, Cherevach I, Chillingworth T, Collins M, Cronin A, Davis P, Doggett J, Feltwell T, Goble A, Hamlin N, Hauser H, Holroyd S, Jagels K, Leather S, Moule S, Norberczak H, O'Neil S, Ormond D, Price C, Rabbinowitsch E, Rutter S, Sanders M, Saunders D, Seeger K, Sharp S, Simmonds M, Skelton J, Squares R, Squares S, Stevens K, Unwin L, Whitehead S, Barrell BG, Maskell DJ: Comparative analysis of the genome sequences of Bordetella pertussis, Bordetella parapertussis and Bordetella bronchiseptica. Nat Genet. 2003, 35: 32-40. 10.1038/ng1227.

  19. Thomson NR, Howard S, Wren BW, Holden MT, Crossman L, Challis GL, Churcher C, Mungall K, Brooks K, Chillingworth T, Feltwell T, Abdellah Z, Hauser H, Jagels K, Maddison M, Moule S, Sanders M, Whitehead S, Quail MA, Dougan G, Parkhill J, Prentice MB: The complete genome sequence and comparative genome analysis of the high pathogenicity Yersinia enterocolitica strain 8081. PLoS Genet. 2006, 2 (12): e206. 10.1371/journal.pgen.0020206.

  20. Schlumberger MC, Hardt WD: Salmonella type III secretion effectors: pulling the host cell's strings. Curr Opin Microbiol. 2006, 9: 46-54. 10.1016/j.mib.2005.12.006.

  21. Haraga A, Ohlson MB, Miller SI: Salmonellae interplay with host cells. Nat Rev Microbiol. 2008, 6: 53-66. 10.1038/nrmicro1788.

  22. Holt KE, Thomson NR, Wain J, Minh DP, Nair S, Hasan R, Bhutta ZA, Quail MA, Norbertczak H, Walker D, Dougan G, Parkhill J: Multidrug-resistant Salmonella enterica serovar Paratyphi A harbors IncHI1 plasmids similar to those found in serovar Typhi. J Bacteriol. 2007, 189 (11): 4257-4264. 10.1128/JB.00232-07.

  23. Schnaitman CA, Klena JD: Genetics of lipopolysaccharide biosynthesis in enteric bacteria. Microbiol Rev. 1993, 57 (3): 655-682.

  24. Ramisse V, Houssu P, Hernandez E, Denoeud F, Hilaire V, Lisanti O, Ramisse F, Cavallo JD, Vergnaud G: Variable Number of Tandem Repeats in Salmonella enterica subsp. enterica for Typing Purposes. J Clin Microbiol. 2004, 42 (12): 5722-5730. 10.1128/JCM.42.12.5722-5730.2004.

  25. Holt KE, Parkhill J, Mazzoni CJ, Roumagnac P, Weill FX, Goodhead I, Rance R, Baker S, Maskell DJ, Wain J, Dolecek C, Achtman M, Dougan G: High-throughput sequencing provides insights into genome variation and evolution in Salmonella Typhi. Nat Genet. 2008, 40 (8): 987-993. 10.1038/ng.195.

  26. Rocha EPC, Smith JM, Hurst LD, Holden MTG, Cooper JE, Smith NH, Feil EJ: Comparisons of dN/dS are time dependent for closely related bacterial genomes. J Theor Biol. 2006, 239 (2): 226-235. 10.1016/j.jtbi.2005.08.037.

  27. Jiang X, Rossanese OW, Brown NF, Kujat-Choy S, Galán JE, Finlay BB, Brumell JH: The related effector proteins SopD and SopD2 from Salmonella enterica serovar Typhimurium contribute to virulence during systemic infection of mice. Mol Microbiol. 2004, 54 (5): 1186-1198. 10.1111/j.1365-2958.2004.04344.x.

  28. Zhang Y, Higashide W, Dai S, Sherman DM, Zhou D: Recognition and Ubiquitination of Salmonella Type III Effector SopA by a Ubiquitin E3 Ligase, HsRMA1. J Biol Chem. 2005, 280 (46): 38682-38688. 10.1074/jbc.M506309200.

  29. Raffatellu M, Wilson RP, Chessa D, Andrews-Polymenis H, Tran QT, Lawhon S, Khare S, Adams LG, Baumler AJ: SipA, SopA, SopB, SopD, and SopE2 Contribute to Salmonella enterica Serotype Typhimurium Invasion of Epithelial Cells. Infect Immun. 2005, 73: 146-154. 10.1128/IAI.73.1.146-154.2005.

  30. Zhang S, Santos RL, Tsolis RM, Stender S, Hardt WD, Baumler AJ, Adams LG: The Salmonella enterica Serotype Typhimurium Effector Proteins SipA, SopA, SopB, SopD, and SopE2 Act in Concert To Induce Diarrhea in Calves. Infect Immun. 2002, 70 (7): 3843-3855. 10.1128/IAI.70.7.3843-3855.2002.

  31. Prager R, Rabsch W, Streckel W, Voigt W, Tietze E, Tschäpe H: Molecular properties of Salmonella enterica serotype paratyphi B distinguish between its systemic and its enteric pathovars. J Clin Microbiol. 2003, 41 (9): 4270-4278. 10.1128/JCM.41.9.4270-4278.2003.

  32. Kingsley RA, Humphries AD, Weening EH, De Zoete MR, Winter S, Papaconstantinopoulou A, Dougan G, Bäumler AJ: Molecular and phenotypic analysis of the CS54 island of Salmonella enterica serotype typhimurium: identification of intestinal colonization and persistence determinants. Infect Immun. 2003, 71 (2): 629-640. 10.1128/IAI.71.2.629-640.2003.

  33. Parkhill J, Achtman M, James KD, Bentley SD, Churcher C, Klee SR, Morelli G, Basham D, Brown D, Chillingworth T, Davies RM, Davis P, Devlin K, Feltwell T, Hamlin N, Holroyd S, Jagels K, Leather S, Moule S, Mungall K, Quail MA, Rajandream MA, Rutherford KM, Simmonds M, Skelton J, Whitehead S, Spratt BG, Barrell BG: Complete DNA sequence of a serogroup A strain of Neisseria meningitidis Z2491. Nature. 2000, 404 (6777): 502-506. 10.1038/35006655.

  34. Rutherford K, Parkhill J, Crook J, Horsnell T, Rice P, Rajandream MA, Barrell B: Artemis: sequence visualization and annotation. Bioinformatics. 2000, 16 (10): 944-945. 10.1093/bioinformatics/16.10.944.

  35. Li H, Ruan J, Durbin R: Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res. 2008, 18 (11): 1851-1858. 10.1101/gr.078212.108.

  36. Carver TJ, Rutherford KM, Berriman M, Rajandream MA, Barrell BG, Parkhill J: ACT: the Artemis comparison tool. Bioinformatics. 2005, 21 (16): 3422-3423. 10.1093/bioinformatics/bti553.

  37. Rice P, Longden I, Bleasby A: EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet. 2000, 16 (6): 276-277. 10.1016/S0168-9525(00)02024-2.

  38. Brown NP, Leroy C, Sander C: MView: a web-compatible database search or multiple alignment viewer. Bioinformatics. 1998, 14 (4): 380-381. 10.1093/bioinformatics/14.4.380.

  39. Sanger Institute – Salmonella species comparative sequencing. []

  40. Ronquist F, Huelsenbeck JP: MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics. 2003, 19 (12): 1572-1574. 10.1093/bioinformatics/btg180.


This work was supported by the Wellcome Trust. John Wain is also supported by the MRC. We acknowledge the support of the Wellcome Trust Sanger Institute Pathogen Sequencing Unit and core sequencing and informatics groups.

Author information

Corresponding author

Correspondence to Kathryn E Holt.

Additional information

Authors' contributions

KH performed comparative annotation, sequence analysis and phylogenetic analysis of the genomes and drafted the manuscript. NRT participated in annotation and GCL participated in comparative annotation and analysis. RH and ZB isolated AKU_12601 and provided DNA for sequencing. MQ, NB and HN participated in sequencing, while DW, MS, BW and KM participated in finishing the AKU_12601 chromosome sequence. JP, GD and JW conceived of the study, participated in its design and coordination and helped to draft the manuscript. All authors read and approved the final manuscript.

Electronic supplementary material


Additional file 1: Pseudogenes present in Paratyphi A AKU_12601, ATCC9150 or Typhi CT18, Ty2. Details of all pseudogenes present in finished genomes of Paratyphi A or Typhi, including gene identifiers in all four genomes, nucleotide divergence between Paratyphi A and Typhi, and classification into different classes: pseudogenes in both Paratyphi A and Typhi (ancestral, shared by recombination, independent mutations), pseudogenes in either Paratyphi A or Typhi, and genome-specific pseudogenes. (XLS 149 KB)

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Holt, K.E., Thomson, N.R., Wain, J. et al. Pseudogene accumulation in the evolutionary histories of Salmonella enterica serovars Paratyphi A and Typhi. BMC Genomics 10, 36 (2009).

  • Illumina Genome Analyzer
  • Host Adaptation
  • Paratyphoid Fever
  • Recent Clinical Isolate
  • Secrete Effector Protein