Genomes of Helicobacter pylori from native Peruvians suggest admixture of ancestral and modern lineages and reveal a western type cag-pathogenicity island

Background Helicobacter pylori is presumed to have co-evolved with its human host and is a genetically highly diverse gastric pathogen. The ancient origins of H. pylori in the New World are still debated. It is not clear how different waves of human migration in South America contributed to the evolution of strain diversity in H. pylori. The objective of our 'phylogeographic' study was to gain fresh insights into these issues by mapping the genetic origins of H. pylori from native Peruvians (of Amerindian ancestry) and comparing their genomes with isolates from Spain and Japan. Results For this purpose, we dissected the genetic identity of strains by fluorescent amplified fragment length polymorphism (FAFLP) analysis, multilocus sequence typing (MLST) of the 7 housekeeping genes (atpA, efp, ureI, ppa, mutY, trpC, yphC) and sequence analyses of the babB adhesin and oipA genes. The whole cag pathogenicity island (cagPAI) from these strains was analyzed using PCR, and the geographic type of the cagA phosphorylation motif EPIYA was determined by gene sequencing. We observed that while the European genotype (hp-Europe) predominates in native Peruvian strains, approximately 20% of these strains represent a sub-population with an Amerindian ancestry (hsp-Amerind). All of these strains, however, irrespective of their ancestral affiliation, harbored a complete, 'western' type cagPAI and the motifs surrounding it. This indicates a possible acquisition of the cagPAI by the hsp-Amerind strains from the European strains during decades of co-colonization. Conclusion Our observations suggest the presence of ancestral H. pylori (hsp-Amerind) in Peruvian Amerindians, which possibly managed to survive and compete against the Spanish strains that arrived in the New World about 500 years ago. We suggest that this might have happened after native Peruvian H. pylori strains acquired cagPAI sequences, either by new acquisition in cag-negative strains or by recombination in cag-positive Amerindian strains.


Background
Helicobacter pylori is a Gram-negative bacterium that established itself in the human stomach possibly thousands of years ago [1]. This opportunistic pathogen infects over 50% of the world's population, causing no harm to most colonized people [2]. Only a small subset of infected people experience H. pylori-associated illnesses such as chronic gastritis, peptic ulcer disease, gastric carcinoma, and mucosa-associated lymphoid tissue (MALT) lymphoma. Associations of various clinical outcomes with disease-specific virulence factors remain dogmatic [3] years after the completion of genome sequences [4]. The debate has further intensified, as some studies have raised the possibility that H. pylori infection has some protective effects in esophageal diseases [3]. Possible symbiotic associations have also been proposed, based on the finding that H. pylori exerts a protective, bacteriocin-like effect and may therefore be beneficial to its host [5].
After the potential of polymorphic DNA markers for reconstructing human migration and phylogeography was recognized [6,7], pathogen genotypes were successfully used to track and analyze patterns of human migration [8][9][10] on different continents. Recently, sequence variation in H. pylori has provided a window into human population migration [11] and has also revealed that the impact of religions on the stratification of human ethnic groups can be analyzed based on H. pylori haplotypes [12].
The ancient origins and dissemination of H. pylori are quite debatable in the context of the vast South American continent, which has witnessed many different waves of population migration [13], especially in view of the fact that H. pylori has been present on this continent since pre-Columbian times [14]. However, the evolution of virulence and fitness in such 'ancient' strains, which arrived first in the Americas and were then possibly out-competed by the influx of 'modern' strains from Europe [14], remains largely unexplored.
A landmark study based on PCR-based DNA motif analysis proposed that H. pylori jumped recently from animals to humans and, therefore, that the acquisition of H. pylori by humans may be a recent phenomenon [15]. This study has been the basis for the idea of an 'H. pylori free New World' [15]. However, two independent studies based on large-scale analyses of candidate gene polymorphisms contradicted the idea of recent acquisition and suggested that H. pylori might have co-evolved with humans [11,16]. In view of these intriguing ideas on the ancient origin of H. pylori, additional evidence based on strains from different geographical regions (especially those with a rich history of multiple waves of human migration, such as South America) is clearly needed.
We attempted to dissect the gene pool diversity of Amerindian isolates of H. pylori from Peru, with the objective of exploring ancient and modern features of the Peruvian H. pylori strains corresponding to different waves of human migration. We also examined whether some of the native Peruvian strains could be linked to their ancestors in Asia.

FAFLP based genotyping, candidate gene sequence analysis and multi-locus sequencing
Phylogenetic relationships assessed by FAFLP genotyping of Peruvian isolates (Figure 1A) revealed 2 different lineages corresponding to hp-Europe and hsp-Amerind. Representative isolates from both lineages identified by FAFLP were subjected to reconfirmation by MLST analysis. All the isolates with an Amerind type FAFLP profile were found to be genetically closer to strains from Alaska and were therefore accepted as genotype hsp-Amerind. All the other isolates from Peru clustered with those from Spain and were therefore assigned to MLST genotype hp-Europe.
We looked at the geographic signatures of oipA gene sequences in Peruvian isolates and found that 7 of the 27 isolates, including the 2 hsp-Amerind isolates, had an Eastern type oipA (CT < 6) signature; 4 of these had a non-functional gene (out of frame) (Table 1). Nineteen of the Peruvian isolates had a western type oipA (CT ≥ 6), 15 of which had a functionally intact coding frame. It appears that the oipA sequence conveys European ancestry for a majority of these isolates, including some hsp-Amerind. However, oipA is a virulence-linked gene and its resolving power for lineage identification is not as robust as that of the 7 housekeeping genes we used, or of the babB gene. It is likely that oipA has been exchanged or recombined between different lineages (like most other virulence genes, such as those of the cagPAI) to give hybrid genotypes.
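The CT-repeat rule described above can be illustrated with a minimal Python sketch. It assumes that the signature is read from the longest uninterrupted run of CT dinucleotides in the sequenced oipA fragment and that the cut-off is 6 repeats, as stated in the text; the function names are hypothetical, and the sketch does not attempt to assess the reading frame.

```python
import re

def longest_ct_run(seq: str) -> int:
    """Number of CT dinucleotides in the longest uninterrupted CT run."""
    runs = re.findall(r"(?:CT)+", seq.upper())
    return max((len(run) // 2 for run in runs), default=0)

def oipa_signature(seq: str, threshold: int = 6) -> str:
    """Classify a fragment as 'western' (CT >= threshold) or 'eastern' (CT < threshold).

    Assumption: the longest CT run stands in for the repeat count used for typing.
    """
    return "western" if longest_ct_run(seq) >= threshold else "eastern"

# Toy fragments, not sequences from this study.
print(oipa_signature("GGA" + "CT" * 7 + "AAG"))  # 7 repeats
print(oipa_signature("GGA" + "CT" * 3 + "AAG"))  # 3 repeats
```

Interrupted repeat tracts would need a more careful rule than the longest-run heuristic used here.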
The babB gene sequences of our hsp-Amerind isolates were compared with the sequences previously reported for Amerindian isolates from Venezuela by Ghose et al. [16] by multiple sequence alignment and phylogenetic analysis (Figure 1B). These phylogenies, when studied in the light of MLST-based inferences, revealed distinct affinities between Peruvian Amerindian strains and Venezuelan Amerind strains, thus pointing to their common Asian origins (Figure 1B).
In summary, our overall phylogenetic analyses demonstrated that about 20% of the strains from the native Peruvian patients we examined share common genotypic patterns with Asian (Amerind) strains; an ancient Asian gene pool is therefore the likely origin of these present-day strains inhabiting native Peruvians.

Analysis of the cagPAI and the cagA gene in isolates from Spain, Peru and Japan
The overlapping primer amplification strategy to span the entire cagPAI worked very well with our hsp-Amerind isolates (Figure 3A), where all the constituent genes of the PAI were successfully amplified. This indicates that the 5 Amerind strains we looked at represent a 'chimera genome' made up of an ancient-like core genome component (MLST and babB typing) and a modern type flexible genome component (cagPAI and its right junction typing). Tyrosine phosphorylation of the immunodominant cagA protein is known to occur at the EPIYA motifs at the C-terminus by the Src family of kinases [17,18]. There are 4 distinct EPIYA sites, EPIYA-A, -B, -C and -D, defined by the amino acid sequence that neighbors each site (Figure 3B). Based on the presence of these EPIYA sites, cagA can be classified as the Western type (W) when the -C site is present and as the East Asian (EA) type when the -C site is replaced by the -D site. Our data on the type of EPIYA motifs present in Spanish (n = 6), Peruvian (n = 27), and Japanese (n = 16) strains revealed that the western type of cagA was predominant among all the Spanish and Peruvian strains. The Japanese isolates alone revealed East Asian type EPIYA-D signatures (Figure 3B).
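The EPIYA typing rule can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure used in the study: the five residues following each EPIYA are taken as the site signature (KVNKK for -A, QVAKK for -B, TIDDL for -C and TIDFD for -D), and the function names are hypothetical.

```python
import re

# Assumed flanking signatures (five residues after EPIYA); illustrative only.
EPIYA_FLANKS = {"KVNKK": "A", "QVAKK": "B", "TIDDL": "C", "TIDFD": "D"}

def epiya_sites(protein: str) -> str:
    """Return the ordered EPIYA site labels found in a CagA fragment, e.g. 'ABC'."""
    protein = re.sub(r"\s+", "", protein.upper())
    labels = []
    for match in re.finditer(r"EPIYA(.{5})", protein):
        labels.append(EPIYA_FLANKS.get(match.group(1), "?"))  # '?' = unrecognized flank
    return "".join(labels)

def caga_type(protein: str) -> str:
    """Western if a -C site is present without -D, East Asian if -D without -C."""
    sites = epiya_sites(protein)
    if "C" in sites and "D" not in sites:
        return "Western"
    if "D" in sites and "C" not in sites:
        return "East Asian"
    return "undetermined"

# Toy fragments assembled for illustration, not sequences from this study.
western_like = "GLKNEPIYAKVNKKKTGQVASPEEPIYAQVAKKVTQKSVSPEPIYATIDDLGGPF"
east_asian_like = "EPIYAKVNKKAAEPIYAQVAKKAAEPIYATIDFDEANQ"
print(epiya_sites(western_like), caga_type(western_like))
print(epiya_sites(east_asian_like), caga_type(east_asian_like))
```

Variant motifs (for example EPIYT) are not recognized by this sketch and would be skipped.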
Results of the PCR analysis of the cagPAI from 26 Peruvian isolates studied in the present context are shown in Table 1. Briefly, the strains from Peru and Spain did not amplify regions of the cagPAI as readily as the Japanese isolates did, suggesting a distinct allelic diversity at the primer binding sites in the western PAI of these isolates. This observation places native Peruvian strains closer to Spanish strains and therefore hints that both the Spanish and Peruvian strains harbor a similar type of cagPAI.
The right junction of the cagPAI also revealed similar affinities, based on PCR-based insertion, deletion and substitution analysis of the region spanning the cagA right junction to glr, for the 6 Peruvian isolates we analyzed. Such genotypes for all the isolates we used from Spain, Japan and Peru were determined earlier by Kersulyte et al. [15]. Collectively, this study and the previous observations [15] demonstrated that the cag right junction motif types were shared by Spanish and Peruvian isolates and that Japanese isolates did not share genetic affinities with Peruvian strains. This observation again places native Peruvian strains closer to Spanish strains and therefore hints that both the Spanish and Peruvian strains share cagPAI insertion sites and the regions flanking them.

Discussion
H. pylori is presumed to be an ancient colonizer of the human stomach which possibly co-evolved with its host. We report that at least some of the H. pylori strains found in Peru share considerable homology with strains found in Asia. This supports the hypothesis that H. pylori was associated with its host well before Asian people crossed the Bering Strait (20,000 years BP) to colonize America. We also report that the cagPAI was acquired by native Peruvian strains, probably from a European source. This might have occurred as a single import, most probably during the last 500 years of the spread of H. pylori on the South American continent. This is evident from the fact that the Amerindian Peruvian strains, though of Asian descent, do not share characteristic features of Asian cagPAIs but show homology to the PAIs of European strains. Alternatively, the Amerindian strains might have gained the PAI through a series of recombination events over a period of time. This hypothesis appears weak, however, when we take into account the short time span of 500 years within which the European strains spread in the Americas.
We tried to substantiate these ideas and to provide evidence in favor of the ancient origin of the pathogen and the possibility that some extraneous gene cassettes might have been acquired by an otherwise symbiotic H. pylori at some time during its natural history. To support this proposition, our methodology followed a two-pronged approach: i) to further substantiate the ancient link of the pathogen and ii) to show that the pathogenicity island was a 'recent' addition to the genome of H. pylori. Our analyses based on FAFLP and MLST linked about 20% of the Peruvian isolates to the Amerindian genotype and indicated that H. pylori was most probably introduced to the New World by Asian people. We therefore disagree with the idea of an 'H. pylori free New World' [15]. This disagreement in interpretation possibly arose because Kersulyte et al. [15] looked at only a few loci in the genome and focused mainly on the motifs surrounding the cagPAI at its right junction.
We also looked at the cagPAI of such strains and found that the Peruvian isolates we tested carried western type (EPIYA-C) cag islands (Figure 3B). We did not record any eastern type signature in the cagPAI (EPIYA-D motif) in Peruvian isolates, despite a distinct presence of ethnic Japanese in Peru. The same inference came from the MLST data, where none of the Peruvian isolates clustered with the Japanese genotype hp-EastAsia (Figure 2). Similarly, the absence of Asian/Amerindian type islands (or their remnants) in the native Peruvian isolates we analyzed leads us to speculate that their ancestors in Asia were seemingly benign due to a (natural?) absence of functional cag genes. This finding strengthens the idea that cag genes in Peru mainly originated in Europe and therefore supports the scenario proposed by Kersulyte et al. [15] as far as the cagPAI and its right junction are concerned.
Kauser et al. [19] from our group have previously described PCR analyses of the cagPAI for more than 300 strains from different parts of the world. Of these, the majority of strains from which a complete PAI was amplified were from Japan (57%), whereas only 18.6% of strains from Peru and 13.3% of strains from Spain could support amplification of an intact cagPAI, due mainly to allelic diversity at the primer annealing sites. These observations were further endorsed by our present analysis, which revealed that all the Peruvian isolates we analyzed carried only the European type PAI.
Figure 2 MLST analysis based on concatenated gene sequences of 7 housekeeping genes of H. pylori (Kimura-2 parameter). The phylogenetic tree was based on a total of 19 sequence records (concatenated) obtained under this study (SJM, HUPB, HU, CPY) while incorporating ~400 other sequence records from the pubMLST database (pubmlst.org) which were specific to different genotypes in the world (Courtesy, Daniel Falush). Different genetic populations (hp) and subpopulations (hsp) or genotypes are named and differentially colored after previous conventions [11,20]. All SJM isolates, Amerindians (SJM23 and 92-highlighted) and non-Amerindians (arrowheads) analyzed by us from Peru are highlighted in bold face black fonts with green twigs indicating presence of a western type cagPAI.

[Figure residue: CagA C-terminal amino acid sequence containing EPIYA sites] HLNFSDIKKELNAKLGNFNNNNNNGLKNEPIYAKVNKKKTGQVASPEEPIYAQVAKKVTQKIDRLNQIASGLGGVGQAAGFPLKKHDKVDDLSKVGRSVSPEPIYATIDDLGGPFPLKRHDKVDDLSK
It has recently been reported that true Amerind strains either do not carry a cagPAI or carry only a vestigial, incomplete PAI [20]. It is already well known that H. pylori can import short stretches of DNA from strains of very different populations when they (presumably co-) infect individuals in the same location. However, it appears that the cagPAI and the region surrounding it were exchanged by the Amerindian strains in Peru, where the human population underwent major changes in recent history with the arrival of European conquerors and settlers. The long-isolated Amerindian H. pylori strains thus came into contact with the European strains, which harbored a cagPAI. It is established that the cagPAI can confer selective advantages during host colonization; therefore, "endogenous" Helicobacter in Peru could have been outcompeted in a human community by newly arrived cagPAI-positive strains introduced by the European conquerors. The "endogenous" Helicobacter strains could, moreover, have acquired the cagPAI during mixed infection with western cagPAI-positive strains, leading to the observed strains with an Eastern-like core genome content and a western-type cagPAI. This finding would be of particular interest if it could be shown that the entire cag island had been exchanged. That would raise the question of the mechanism by which an entire island is transferred; such exchanges have undoubtedly happened at some point, but if they happened within the last 500 years, the chances of catching one in action would be much better. This is where the present data lead us, and we suggest that future efforts be directed towards confirming this evolutionary mechanism.
Finally, it is possible that phylogenetic methods based on highly recombining gene loci [21][22][23][24][25][26][27] may not be fully reliable for predicting genetic relationships in terms of inheritance from different ancestral populations, especially when we use tools such as Mega 3.0 [28], which do not support admixture analysis. Partly in view of this possibility, and to ensure that our conclusions did not reflect the shortcomings of a single approach, we adopted an integrated genotyping strategy for the cagPAI [29,30] as well as for the core genome, through MLST of less recombining, neutral genes that encode cytoplasmic enzymes. Given that these housekeeping genes are selectively neutral and uniform compared with virulence-associated loci such as the flagellins and vacA [31], recombinant and hybrid alleles that blur phylogeographic inferences could be a rare occurrence rather than the rule. Nonetheless, it will be important to ascertain the proportions of nearly pure and hybrid alleles among native Peruvian H. pylori through admixture analyses based on sophisticated population genetics tools [32] that reveal contemporary gene flow and the proportion of nucleotides inherited from different ancestral populations on an evolutionary time-scale.

Conclusion
This could also possibly provide a reasonable model of geographic evolution to understand the acquisition of virulence in pathogenic bacteria over a period of time.

Bacterial strains, genomic DNA and diagnostic PCR
Genomic DNA preparations for strains originating from Peru, Spain and Japan were provided by D. E. Berg and Asish Mukhopadhyay (Washington University, St. Louis, Mo.). These DNA samples were isolated from patients diagnosed with gastric ulcers from Spain (HupB); gastric cancer and duodenal ulcer (DU) cases from Japan (CPY, Hu); and gastritis cases alone from Peru (SJM). However, in the current study, the clinical background of the individual isolates was not taken into account. The Peruvian isolates we looked at (n = 27) were originally from native Peruvian people, mainly of Amerindian ancestry, from Lima [15]. PCR-based analysis of the genes glmM, babB [16] and oipA was carried out to ascertain the quality of the DNA samples we used. These PCR assays also served as amplification-level controls for the analysis of insertion, deletion and substitution in the cagPAI.

Integrated genotyping of H. pylori based on chromosomal DNA signatures
Whole genome fingerprinting based on FAFLP genotyping was performed as described previously [21][22][23]. Briefly, the profiling of whole genome micro-restriction fingerprints with EcoRI/MseI enzymes using fluorescence tagged primer pairs EcoRI+A/MseI+0 and EcoRI+G or A/MseI+0 was performed for all the strains. The PCR amplified fragments for each of the strains were then subjected to electrophoretic separation on a 5% acrylamide gel and scoring of the fluorescent markers was done using an automated DNA analysis workstation (ABI Prism 3100 DNA sequencer). Cluster analysis of DNA profiles was conducted on the basis of fingerprint characteristics. All the data obtained through molecular genotyping and DNA profiling were deposited in the genoBASE pylori database [33]. The genoBASE pylori server was queried for comparative analyses.
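As an illustration of how fingerprint profiles can be compared before clustering, the following Python sketch scores each isolate as a set of fragment sizes and computes pairwise Dice similarities. The text does not state which similarity coefficient was used, so the Dice coefficient here is an assumption, and the function names and fragment sizes are hypothetical.

```python
from itertools import combinations

def dice_similarity(profile_a, profile_b):
    """Dice coefficient between two sets of scored fragment sizes (bp)."""
    a, b = set(profile_a), set(profile_b)
    if not a and not b:
        return 1.0  # two empty profiles are treated as identical
    return 2 * len(a & b) / (len(a) + len(b))

def pairwise_similarity(profiles):
    """Map each unordered pair of isolate names to its Dice similarity."""
    return {
        (x, y): dice_similarity(profiles[x], profiles[y])
        for x, y in combinations(sorted(profiles), 2)
    }

# Hypothetical fragment sizes, not data from this study.
toy_profiles = {
    "isolate_1": {102, 150, 211, 305},
    "isolate_2": {102, 150, 211, 340},
    "isolate_3": {98, 175, 260, 340},
}
similarities = pairwise_similarity(toy_profiles)
for pair, value in similarities.items():
    print(pair, round(value, 2))
```

A matrix of such similarities (or the corresponding distances, 1 minus similarity) could then be passed to a tree-building method such as neighbor joining.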
Genotyping based on the candidate genes oipA and babB was carried out as described [24][25][26]. Short stretches of the oipA gene were analyzed to determine the 'geographic signature' based on the CT repeat [27]. In addition, a 600 bp region from each of the 7 housekeeping genes spread throughout the genome (atpA, efp, ureI, ppa, mutY, trpC, yphC) was amplified and sequenced for all the isolates exactly as described previously [11]. Sequencing was performed with both forward and reverse primers, using an ABI Prism 3100 DNA sequencer (Applied Biosystems, USA). PCR and direct sequencing were performed at least twice to determine and confirm the DNA sequences for each isolate. A consensus sequence for each of the samples was generated using GeneDoc (version 2.6.002). Multiple alignments of sequenced nucleotides were carried out using Clustal X (version 1.81). Neighbor-joining trees were constructed in Mega 3.0 [28], using bootstrapping with 10000 bootstrap trials (FAFLP and babB) and the Kimura 2-parameter model (for MLST). For construction of phylogenetic trees based on MLST genotyping procedures, sequences of the 7 housekeeping genes of strains belonging to different established genotypes were obtained from the pubMLST database [34] (courtesy, Daniel Falush).
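The distance underlying the MLST trees can be illustrated with a short Python sketch of the standard Kimura 2-parameter formula, d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), where P and Q are the proportions of transitional and transversional differences between two aligned sequences. This is a sketch of the textbook formula, not of Mega 3.0 internals; the function names are hypothetical, and skipping gapped or ambiguous positions is an assumption.

```python
import math

MLST_LOCI = ("atpA", "efp", "ureI", "ppa", "mutY", "trpC", "yphC")
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def concatenate_loci(alleles, order=MLST_LOCI):
    """Join per-locus aligned sequences (dict: locus -> sequence) in a fixed order."""
    return "".join(alleles[locus] for locus in order)

def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned nucleotide sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # assumption: skip gaps and ambiguous bases
        compared += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent (saturation).
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)

# Toy sequences, not data from this study: one transition in ten sites.
print(round(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"), 4))
```

Applying `k2p_distance` to every pair of concatenated sequences gives a distance matrix from which a neighbor-joining tree could be built.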
Profiling of the cagPAI
PCR analyses were carried out to find the status of the cag-PAI using 8 sets of primers that amplified the cagA gene, its promoter region, the cagE and cagT genes and the left end of the cagPAI [25,26,29]. We also analyzed whole cag-PAI of the representative isolates (SJM92 and SJM23) from hspAmerind by PCR using overlapping primers as described by Blomstergren and colleagues [30]. The 3' end of the cagA gene was amplified using primers mentioned elsewhere [17] and the amplified products for strains from Spain, Peru, and Japan were sequenced with forward and reverse primers. The consensus sequences were then translated into amino acid sequences using GeneDoc software (version 2.6.002) and were then assigned to the Western or the East Asian group based on the C or D repeat present respectively in the EPIYA motif [18]. Chromosomal rearrangements are known to give rise to 5 types of insertion-deletion and substitution motifs in the region between the 3' end of cagA gene and the 3' end of the glutamate racemase (glr) gene. Although the statuses of these motifs for the Peruvian strains we analyzed were described previously by Kersulyte and colleagues [15], we re-assessed 6 of them by PCR exactly as described earlier [15].

Authors' contributions
SMD and IA performed MLST typing and phylogenetic analysis. SMD also helped in analysis of babB and oipA genotyping. AAK and AA performed PCR based analysis of the cagPAI genes. SAR extended bioinformatics support. LAS provided expert clinical consultation and contributed to manuscript writing. NA performed FAFLP analysis, planned and supervised the study, wrote the manuscript and provided overall leadership. All the authors read and approved the final manuscript.