Although the Indian peninsula has seen many different waves of population migration, the Paleolithic archaeological evidence is not clear enough to explain the peopling of this country. Nonetheless, the Indus Valley and Harappan civilizations bear footprints of the Neolithic period suggestive of the arrival of Indo-European speakers who established the caste system, an anthropologically significant prehistoric event [24, 25]. The cultural and historical importance of the arrival and settlement of the Indo-Aryans is undisputed, but it is not clear whether this came about through 'replacement of the existing people by outsiders' or whether the 'people already in India changed their habits and cultures'. Such questions have never been addressed unambiguously, even though the potential of polymorphic DNA markers for reconstructing human migration and phylogeography [26, 27] has long been appreciated. It appears that even carefully planned geographic genomics studies have remained largely speculative owing to the lack of a universal 'gold standard': the classical mitochondrial DNA markers offer too few informative polymorphisms, and the newly developed Y-chromosome markers are even less polymorphic than the mitochondrial hypervariable regions. Lately, new genetic models have been successfully harnessed, based on parasites and pathogens that probably accompanied their human hosts through evolution and much of human history, including migrations and expansions on different continents [2, 4, 5]. Such approaches constitute an attractive alternative for reconstructing human origins and spreads, population dynamics and bottlenecks, wars and displacements, farming, plagues, etc.
Our study aimed to track the ancient origins of Indian H. pylori through a two-pronged approach: i) to substantiate the European link of the pathogen in India, and ii) to prove that the cag pathogenicity island (cagPAI) was also of European origin and has not been a 'recent' addition to the genome of Indian H. pylori. Our analyses, based on MLST and comprehensive genotyping of the cagPAI, linked nearly 100% of the Indian isolates to the H. pylori sub-population hpEurope. This suggests that H. pylori was most probably introduced to the Indian subcontinent by ancient Indo-European nomadic people. Our findings are therefore consistent with the idea of a possible gene flow into India with the arrival of the Indo-Aryans.
Overall, based on the MLST data (Figure 2) and the cagPAI patterns (Figure 3), we suggest that H. pylori probably arrived in India at about the time when Indo-European-speaking people crossed into India (~4000–10,000 years before present). Alternatively, the unquestionable common origin of the Indian and European strains could actually be more ancient, dating to the Upper Paleolithic spread of Homo sapiens in Eurasia, as suggested by mtDNA variability, and our H. pylori MLST data do not rule out this possibility.
Present-day India represents a 'genetic playground' with a tremendous diversity of cultures and languages. However, the people are largely stratified as tribals and nontribals. Four main language families are spoken: the largest, Indo-European (IE), is prevalent in the North, and the second largest, Dravidian (DR), comprises the languages spoken in the South. The other two groups are the Tibeto-Burman (TB) branch of the Sino-Tibetan family and the Austro-Asiatic (AA) family, largely spoken in the far North and in North-east India. While most of the IE speakers belong to castes, the majority of the tribal communities (>450) speak about 750 different dialects that fit within one of the other three language families (DR, TB, AA) [25, 28]. Such enormous cultural diversity might argue for many different populations and sub-populations of H. pylori. But until now, and including this study, only H. pylori with the genetic features of hpEurope have been reported from India [29, 30]. Even the newly described sub-population hpAsia2 from Ladakh is a variant of hpEurope, and many of the Ladakhi strains that we looked at in this study clustered with the European H. pylori clade (Figure 2). Also, the cagA sequences from H. pylori of the tribal Oraon and Santhals were indistinguishable from those of mainstream Indians and Europeans (Figure 4), indicating a sweeping spread of a single H. pylori genotype across the Indian peninsula. Moreover, we did not document the presence of any other H. pylori populations and sub-populations, such as hspAmerind, hspMaori, hpAfrica and hpEastAsia, in the limited but representative culture collection that we looked at. However, the visible footprints of other migrations into India, such as those through the North-eastern corridor, and the presence of phenotypic features resembling those of Africans in the South make it unwise to presume an 'H. pylori free India' at the time of the arrival of the Indo-European-speaking invaders. Given this, and the fact that H. pylori's first association with humans traces back millions of years before present, in Africa [6, 17], it is more realistic to hypothesize that H. pylori of the African and Asian gene pools might already have been present in India. The predominance of a single H. pylori population might therefore point to a distinct survival advantage conferred by a fully functional (western type) cagPAI. This interpretation is consistent with the scenario we previously reported for the South American, Amerindian strains, which were presumably outcompeted by their Spanish counterparts arriving with an intact and functional western cagPAI.
Finally, it is possible that phylogeny based on highly recombining gene loci [15, 29, 31–35] may not be completely foolproof for inferring inheritance from different ancestral populations, especially when we use tools such as MEGA 3.0, which do not support admixture analysis. Moreover, phylogenetic methods based on bifurcating trees, such as neighbor-joining analysis, may not be fully appropriate for analysis at the intra-species level [37, 38], especially for hypervariable genomic regions, where multiple homoplasy (due to reversions, recurrent mutations, etc.) or polytomy may sometimes confound the phylogenetic interpretation. However, the housekeeping genes used here are selectively neutral and more uniform than virulence-associated loci such as the flagellins and vacA, and therefore recombinant and hybrid alleles that blur lineage inferences are likely to be rare rather than routine. Partly in view of this assumption, and given our previous experience in dissecting the complex ancestry of native Peruvian isolates using phylogenetic methods, we did not attempt admixture analysis with complicated Bayesian statistics. However, to ensure that our conclusions did not reflect the shortcomings of a single method, we adopted an integrated phylogenetic approach combining MEGA/NETWORK-based analyses with a genotyping strategy based on the full cagPAI and its left- and right-end sequences. Interestingly, these approaches unambiguously show the Indian H. pylori genotypes scattered among the European ones. This would be consistent with gene flow into India with the Indo-Aryans, or with even more ancient origins following the Paleolithic expansion of humans in Eurasia, but it is also consistent with another scenario: migration from India to Europe. However, the latter scenario appears unlikely given the lack of supporting archaeological, linguistic and historical data.
Nonetheless, an understanding of the time scale, obtained by estimating divergence times between the H. pylori sequences from the different human populations, would help in choosing between such explanations. These issues therefore need to be addressed in future studies.
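As an illustration only, the kind of divergence-time estimate referred to above can be sketched in a few lines of Python. This is not the analysis performed in this study. It assumes a pair of aligned sequences, the Jukes-Cantor substitution model and a strict molecular clock, none of which is claimed here to hold for H. pylori, where frequent recombination would bias such an estimate. The function names, the toy sequences and the substitution rate are hypothetical placeholders.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance (substitutions per site)."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


def divergence_time(d: float, rate_per_site_per_year: float) -> float:
    """Years since two lineages split, assuming a strict molecular clock.

    The factor 2 accounts for substitutions accumulating on both lineages.
    """
    return d / (2.0 * rate_per_site_per_year)


if __name__ == "__main__":
    # Toy sequences and rate: hypothetical values, not data from this study.
    seq_india = "ATGGCTAAAGAAATCAAATTTTCAGATAGC"
    seq_europe = "ATGGCTAAAGAGATCAAATTTTCAGACAGC"
    assumed_rate = 1e-6  # placeholder, not a measured H. pylori rate

    p = p_distance(seq_india, seq_europe)
    d = jukes_cantor(p)
    print(f"p-distance: {p:.4f}")
    print(f"JC distance: {d:.4f}")
    print(f"divergence time (years): {divergence_time(d, assumed_rate):.0f}")
```

Because the resulting time scales inversely with the assumed rate, any such estimate is only as reliable as the rate calibration and the clock assumption behind it.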