Evaluating the possibility of detecting evidence of positive selection across Asia with sparse genotype data from the HUGO Pan-Asian SNP Consortium
© Liu et al.; licensee BioMed Central Ltd. 2014
Received: 11 October 2013
Accepted: 25 April 2014
Published: 2 May 2014
The HUGO Pan-Asian SNP Consortium (PASNP) has generated a genetic resource of almost 55,000 autosomal single nucleotide polymorphisms (SNPs) across more than 1,800 individuals from 73 urban and indigenous populations in Asia. This has offered valuable insights into the correlation between the genetic ancestry of these populations with major linguistic systems and geography. Here, we attempt to understand whether adaptation to local climate, diet and environment partly explains the genetic variation present in these populations by investigating the genomic signatures of positive selection.
To evaluate the impact to the selection analyses due to the considerably lower SNP density as compared to other population genetics resources such as the International HapMap Project (HapMap) or the Singapore Genome Variation Project, we evaluated the extent of haplotype phasing switch errors and the consistency of selection signals from three haplotype-based approaches (iHS, XP-EHH, haploPS) when the HapMap data is thinned to a similar density as PASNP. We subsequently applied haploPS to detect and characterize positive selection in the PASNP populations, identifying 59 genomics regions that were selected in at least one PASNP populations. A cluster analysis on the basis of these 59 signals showed that indigenous populations such as the Negrito from Malaysia and Philippines, the China Hmong, and the Taiwan Ami and Atayal shared more of these signals. We also reported evidence of a positive selection signal encompassing the beta globin gene in the Taiwan Ami and Atayal that was distinct from the signal in the HapMap Africans, suggesting the possibility of convergent evolution at this locus due to malarial selection.
We established that the lower SNP content of the PASNP data conferred weaker ability to detect signatures of positive selection, but the availability of the new approach haploPS retained modest power. Out of all the populations in PASNP, we identified only 59 signals, suggesting a strong need for high-density population-level genotyping data or sequencing data in order to achieve a comprehensive survey of positive selection in Asian populations.
KeywordsHaplotype phasing Positive selection Population structure Genetic diversity
Asia is the largest continent on Earth, covering 30% of the available land area and playing host to more than 60% of the human populations in the world. With a latitudinal range of 11.6°S to 81.9°N and a longitudinal range of 27.3°E to 169.0°W, Asia possesses extremely diverse climates and geographical conditions, with temperatures ranging from arctic in northern Asia to tropical along the Equator and humidity ranging between below 10% in the interior of the continent to in excess of 90% along the coast and in Southeast Asia. Comprising 49 countries, many of which contain a wide variety of ethnicities and subpopulations, Asia also hosts a myriad of unrelated language families such as the Sino-Tibetan languages predominantly spoken in East Asia; the Indo-European and Dravidian languages predominantly spoken in south and central Asia; the Altaic languages predominantly found in Korea, Japan, and central and northern Asia; and the Austronesian and Tai-Kadai languages commonly spoken in Southeast Asia.
The distinct languages in Asia have limited the extent of historical interactions between different population groups, leading to a greater degree of genetic homogeneity between populations sharing the same linguistic system while extending the genetic differences between populations with different linguistic systems. This situation is similar to that present in the Africa continent . The diverse geographical and climatic conditions have directly influenced the rate of population growth and movement, as well as urbanization and agricultural land use in different parts of Asia, where differential sanitation and health systems have exerted profound influence on the burdens of diseases in different parts of Asia, particularly those of vector-borne infectious diseases such as malaria and dengue .
The availability of genetic datasets for global populations from the International HapMap Project (HapMap) [3–5], the Human Genome Diversity Project (HGDP)  and the Singapore Genome Variation Project  have facilitated research into how humans have adapted differentially to the climate of their habitats [8–10], prevalent diet [11, 12] and other environmental assaults including those from pathogens and allergens [13–17]. Several reports have also described the convergent evolution of hemoglobin genes in populations residing in different high-altitude locations around the world, allowing humans to adapt to an environment with reduced levels of oxygen [18–22]. Given the diversity in geography, culture, environment and language that is present in Asia, there have not been many systematic reports investigating the genomic evidence of local adaptations of Asian populations, especially those of the ethnic minorities and indigenous populations [8, 10, 13, 23, 24].
The HUGO Pan-Asian SNP Consortium (PASNP) was a collaborative agreement established to perform an unprecedented genetic survey of Asian populations, and has provided a valuable resource of around 54,974 autosomal single nucleotide polymorphisms (SNPs) for 1,928 individuals from 73 Asian and two non-Asian HapMap populations (Europeans: CEU, and Nigerian Africans: YRI) . The individuals surveyed in PASNP not only included those from urban populations, but also from the indigenous populations and ethnic minorities. Due to the sparse density of SNPs across the genome, surveys into genomic evidence of local adaption in this dataset have depended on SNP-based methods such as the Wright FST index that highlights striking differences in allele frequencies across a region , instead of haplotype-based approaches such as the iHS  or XP-EHH  that confer higher statistical power. The assumption behind both allele frequency and haplotype-based methods is the same: the frequency of the beneficial allele in a population will rise uncharacteristically rapidly, such that (1) the variants from neighboring SNPs that reside on the same haplotype as the beneficial allele will be co-inherited and there is insufficient time for recombination events to break down this extended haplotype; and (2) the beneficial allele and these variants will be found at notably higher frequencies in this population, than in other populations not experiencing the same evolutionary pressure to adapt.
The introduction of a new haplotype-based approach haploPS  presents the opportunity to perform a systematic survey and characterization of genomic signatures of local adaptation in the PASNP dataset. As haploPS relies on locating the extended haplotype form that the advantageous allele resides on, the expectation is the sparser SNP density from PASNP will still allow sufficient number of SNPs to anchor an extended haplotype. Comparing the selected haplotypes from different populations will also allow the inference of whether the signals across different populations are attributed to the same evolutionary event, or whether they are independent and are the consequence of convergent evolution.
Here, we aim to discover and characterize the origin and segregation of positive selection signals that are present in the Asian populations in PASNP. As the ability to detect the extended haplotypes using haplotype-based approaches relies on the accuracy and fidelity of the haplotype phasing, we first compared the extent of switch errors found in phasing the haplotypes for the full set of SNPs from the three population panels in Phase 2 of the HapMap (HapMap2), to the extent of the errors present when the SNP density for the HapMap samples has been reduced to a similar content as the PASNP data. In addition, we evaluated the power of haploPS via simulations to locate true signals of positive selection for haplotype data with a sparse set of SNPs, and also performed an empirical comparison of the degree of overlap in the selection signals found for the HapMap populations using the full and sparse sets of SNPs. The latter two exercises will allow an evaluation of the degree of power loss as a result of the sparser SNP density. By clustering the 73 PASNP populations into 31 groups according to shared linguistic systems and geographical proximity of the populations, we run haploPS to discover evidence of local adaptations in these 31 groups. The results provide the first systematic and large-scale survey of local adaptation in Asia, particularly in mapping the genomic features for the indigenous populations that are attributed to evolutionary pressures.
Population structure analyses
Inference of haplotype phasing accuracy with sparse genotype data
Switch error rates in phasing the HapMap2 samples with SHAPEIT on the original 2.6 million SNPs and on a reduced panel of around 55,000 SNPs
Switch error rates
JPT + CHB
Power of positive selection methods with haplotypes from sparse genotype data
Empirical comparison of consistency in positive selection signals in HapMap2 panels using haploPS, iHS and XP-EHH on the original 2.6 million SNPs and on a reduced panel of around 55,000 SNPs
Number of positively selected region identified
Reduced density (overlapped with original1)
Positive selection in the PASNP data
A region on chromosome 2 between 196.8 Mb and 198.0 Mb was identified in 12 of the 31 population groupings (Additional file 1: Table S2), and encompassed selection signals located between 197.0 Mb and 197.5 Mb previously reported in nine populations from HapMap and SGVP  that spanned PGAP1, a gene which caused perinatal lethality and male infertility in mice . This region similarly exhibited consistent evidence from iHS and XP-EHH analyses in HGDP populations from Europe, East Asia and South Asia (Additional file 1: Figure S1). The 12 extended haplotype forms from the population groupings were perfectly identical (Additional file 1: Figure S2), yielding a haplotype similarity index (HSI) of 1.00. We adopted the interpretation of the HSI as suggested by the simulation results on the sensitivity and specificity from a previous study : a high HSI (defined as ≥ 0.98) means the extended haplotype forms from the different populations are highly similar and are thus likely to be carrying the same selected allele from a single mutation event; a low HSI (defined as ≤ 0.9) suggests that the haplotype forms are considerably different and the selection signals are likely to be independent and indicative of convergent evolution. The HSI of 1.00 thus suggests that the same advantageous mutation is likely to be responsible for the selection signals present in the 12 population groupings, and this mutation has emerged prior to the divergence of these populations.
We also did a comparison of the selection regions identified by HaploPS and the top genes under selection detected by the Fst approach as reported in the previous study by Qian and colleagues . We observed that of the 193 genes found by Fst, 57 were similarly present in 29 of the 59 regions identified by HaploPS (Additional file 1: Table S2). The overlapping genes include PIK3R3, which was among the strongest signals by Fst approach and was functionally related immune protection and signal transduction. ERBB4 was also found to be under positive selection in non-African populations by iHS, XLR and XP-EHH in previous studies . The MHC region on chromosome 6 was also detected by both Fst and HaploPS. The remaining 30 regions were discovered uniquely by HaploPS, and which included the HBB gene in Taiwan aboriginals. This shows that our investigation using the haplotype-based method provides additional evolutionary insights for the PASNP data, where it not only provided additional evidence for regions identified by Fst, but also discovered novel regions under positive selection.
Due to the sparse SNP density of the PASNP data, locating genomic signatures of positive selection has previously relied on a SNP-based approach that essentially prioritized genomic regions with significant differentiation in allele frequency to indicate the presence of positive selection. Here we have utilized haploPS, a haplotype-based method of detecting positive selection by explicitly characterizing the haplotype form that is carrying the advantageous variant, to identify evidence of positive selection in 31 groups of populations that have been clustered on the basis of genetic, linguistic and geographical similarities. Empirical comparison of the consistency in selection signals identified with the original and thinned SNP data from HapMap2 indicated haploPS had the highest specificity, and simulations indicated that the method was also effective at detecting selection signals at low frequencies in the populations. HaploPS successfully located 59 genomic regions undergoing positive selection that distinguished the aboriginal people from Malaysia, Philippines, Taiwan and China from the rest of the populations. Characterizing the inferred frequencies of the advantageous variants indicated that most of the low frequency signals were found in the ethnic minorities and indigenous populations, with urban and cosmopolitan populations being more likely to carry medium to high frequency signals.
Haplotype-based approaches to locate selection signals require: (i) accurate haplotype phasing across the SNPs to preserve the extended haplotype structure; and (ii) sufficient number of SNPs to be present on the extended haplotype to anchor the signal. In this paper, we have illustrated that the sparse SNP density provided a double whammy to the statistical power to locate selection signals, since the lower density also affected the accuracy of haplotype phasing by introducing more switch errors that can break up the structure of extended haplotypes. In light of this, we have focused on reporting only what have been observed to be positively selected, and the absence of classical signals such as the pigmentation-linked KITLG and East Asian hair morphology-linked EDAR is likely to be attributed to the lower power to identify genuine signals. One example that illustrates this is the skin pigmentation locus ADAM17 that was found to be positively selected in all four East Asian populations (CHB, CHD, CHS, JPT) in HapMap and SGVP. Our analyses similarly identified this locus in the Japan Okinawans and Koreans, but failed to locate a signal in other East Asian populations.
Of the 59 regions identified in this analysis, 34 of them overlapped with positive selection signals previously reported in East and Southeast Asian populations from HapMap and SGVP, and where 24 of the signals were present in at least three of these populations . However, the sparse SNP density meant that some of these regions spanned considerable distances and thus encompassed multiple signals discovered in the HapMap and SGVP datasets with almost 30-fold higher SNP density. Indeed one of the limitations of the current analysis is the over-estimation in the size of the genomic regions which will require data of higher SNP density in the corresponding populations to fine map. Interestingly, of the 25 signals that were present only in the PASNP dataset, 24 of them were found in ethnic minorities or aboriginal populations, raising the possibility that the majority of these signals were evidence of local adaptation found only in these indigenous groups.
The discovery of a positive selection signal that extends for almost 5 Mb around the cluster of hemoglobin genes (including HBB) in the Taiwan Ami and Atayal may present the first evidence of genetic resistance against malaria. The incidence of thalassemia is higher in the Taiwan indigenous populations than the urban populations, and there are at least two possible hypotheses: (i) malaria is endemic in Malaysia and the Philippines and based on existing evidence that suggests the Taiwan aborigines are related to the indigenous people of Malaysia and/or the Philippines, the advantageous mutations may have arisen prior to the divergence of the Taiwan and these Southeast Asian indigenous people; (ii) malaria is present in Taiwan and that has driven the emergence of genetic factors providing host resistance to malaria, in a similar situation as in African populations such as the Gambia, Nigeria and Kenya. The PASNP data included samples from Malaysian and Philippines Negrito, and our survey did not present any conclusive evidence of extended haplotypes in these populations, or any indication that the haplotypes from these Southeast Asian populations were similar to the selected haplotype form in the Ami and Atayal. There have however been historical reports of migrant Chinese being more susceptible to malaria than the Taiwan aborigines [30, 31] and evidence of malarial selection on specific immunoglobulin allotypes in the Taiwan aborigines , indicating that the second hypothesis is possible in light of the high infant mortality attributed to malaria in the absence of modern healthcare.
Population bottlenecks can reduce genetic variation in a population and the resultant homogeneity can be mistaken as evidence of positive selection. Whether indigenous populations in Asia had experienced strong population bottlenecks as those recently reported in native Americans is unclear , but it is important to acknowledge that bottlenecks or inbreeding can increase false discoveries of positive selection in a population. We observed that majority of the signals identified in the ethnic minorities or indigenous populations were less frequent in the populations, similar to previous observations of positive selection signals in African populations . These signals may belong to very recent evolutionary events that have occurred in the population, or may actually correspond to balancing selection which prevented the derived allele from sweeping to fixation. Access to modern healthcare can mitigate the selective pressure of genetic factors in determining survival and reproductive advantages, and genetic adaptation is likely to be an ongoing process in indigenous populations due to the tendency to reside in natural habitats and rely on traditional medicine.
This study has established that the lower SNP content of the PASNP data conferred weaker ability to detect signatures of positive selection, but the availability of the new approach haploPS retained modest power. Despite the analysis of 73 Asian populations, we have only identified 59 signals, highlighting the need for a comprehensive survey of Asian genomics with either microarray data of higher SNP density or population-level whole-genome sequencing such as that by the 1000 Genomes Project, in order to understand the subtle variations in local adaption between Asian populations due to climate, diet and environmental differences. Indeed in the study of population genetics, both the presence and absence of adaptation can be insightful, evident in classical signals at the lactase gene (LCT) and genes related to skin pigmentation (KITLG, SLC24A5). However, genetics studies involving Asian ethnic minorities and indigenous populations will require careful engagement of local communities, as the challenges of educating and obtaining informed consent from these communities are comparable to the situation in Africa. This may require the cooperation and knowledge transfer between genetic scientists across Asia, to share experiences and leadership in research ethics, data analyses and the principles of data sharing and ownership in order to develop a genomics research network in Asia.
The dataset from the HUGO Pan-Asian SNP Consortium consists of 1,928 individuals from 73 Asian populations nd 2 non-Asian HapMap populations (Europeans CEU, Nigeria Africans YRI) that have been genotyped on the Affymetrix GeneChip HumanMapping 50 K Xba Array. A total of 54,794 SNPs passed quality control and were present in all the populations. The 73 Asian populations represent major, ethnic minority and indigenous populations from East Asia, Southeast Asia and South Asia. The sample size of each population ranges from 5 in Melanesian to 90 in the Koreans.
Grouping of Asian populations
The 73 Asian populations were partitioned into 31 groups according to the maximum likelihood phylogenetic tree for the populations as described in the original PASNP publication . Briefly, this used the genotype data for SNPs in the autosomal chromosomes to perform a maximum-likelihood inference of population similarity with the CONTML program in the PHYLIP package . Branches with insufficient bootstrap support (defined as <50%) were merged, and populations that were found in the same major branch but were geographically and linguistically similar were also merged to increase the sample sizes during the analysis for signatures of positive natural selection.
Haplotype phasing and accuracy evaluation
Haplotypes for each individual were estimated from the genotype data with SHAPEIT  using the reference haplotypes from 1,092 individuals in Phase 1 of the 1000 Genomes Project . Phasing was performed on 54,787 shared SNPs across all 1,928 individuals from the 73 Asian and two non-Asian HapMap populations within the same batch runs, which did not consider the existence of different populations. To evaluate phasing accuracy, we repeated the haplotype phasing using genotype data in the four populations (CEU, CHB, JPT, YRI) in Phase 2 of the International HapMap Project (HapMap2)  in two settings: (i) with the full set of autosomal SNP data; (ii) with a reduced set of autosomal SNPs that have been thinned to reflect the SNP density similar to the Affymetrix 50 K Xba array. The thinning was done using the PASNP dataset as a template. SNP markers that were common to both the HapMap2 and PASNP datasets were first selected. If a SNP marker in PASNP dataset was missing in HapMap2, the nearest neighboring position was chosen to represent the position. The resultant haplotypes across the 22 chromosomes for these samples are compared against those from the HapMap which we considered as the benchmark, as these have been phased with PHASE  and incorporated pedigree information in inferring the haplotypes for CEU and YRI trios . The quality of the phasing was quantified by the switch error, obtained by the ratio of the number of switches in the SHAPEIT haplotypes that were needed to recover the HapMap-phased haplotypes to the total number of heterozygote markers minus one across the genome in each individual. The switch error was calculated for every individual and subsequently averaged across all the individuals in each population.
Detecting positive selection in PASNP groups
Three different haplotype-based methods (haploPS, iHS, XP-EHH) and one allele frequency based method (Fst) were used to detect genomic signatures of positive selection on the reduced set of SNPs from the populations in HapMap2. However, only haploPS was used in the analysis of the PASNP data. Population-average recombination rates were used by all three methods. We used the C++ software for haploPS, iHS and XP-EHH that were publicly available at http://www.statgen.nus.edu.sg/~haplops and http://hgdp.uchicago.edu/Software/.
HaploPS performs an explicit search for uncharacteristically long haplotypes in the genome that are found at a particular frequency . By performing an exhaustive search across the SNPs, haploPS quantifies the evidence of a long haplotype on the basis of the genetic distance (in cM) spanned and the number of SNPs that is present on the haplotype. Each of these haplotypes is assigned two empirical p-values, defined as: (i) the proportion of haplotypes across the genome that span a genetic distance at least as large as the candidate haplotype; (ii) the proportion of haplotypes that span as many SNPs as the candidate haplotype. These two empirical p-values are used to construct the haploPS score, defined as the product of the two empirical p-values multiplied by the total number of haplotypes across the genome. Note that the haploPS score does not have the interpretation of a traditional p-value, and can be larger than 1. Haplotypes with haploPS score < 0.05 are deemed to exhibit evidence of positive selection. This procedure is performed for each population grouping across a range of haplotype frequencies from 0.05 to 0.95, at a step-size increment of 0.05. At each genomic location, the significant haplotype found at the highest haplotype frequency is reported, and the estimated frequency of the advantageous allele is taken as the highest haplotype frequency where the haploPS score is significant.
The integrated haplotype score (iHS) was calculated by estimating the extended haplotype homozygosity (EHH) score. The EHH is the probability of identity-by-descent for two haplotypes that carry a core haplotype within a distance to a pre-defined focal SNP , and the iHS is the integration of EHH scores up to the SNP with an EHH score of 0.05, or until there is a gap of more than 2.5 Mb . The iHS statistic can be artificially inflated in the presence of gaps ranging from 20 kb to 200 kb, and the statistic is corrected by a scaling factor according to that described by Voight and colleagues . The raw iHS statistics are normalized within 20 derived allele frequency bins, and SNPs are subsequently grouped into non-overlapping windows of 1 Mb. The proportion of SNPs in each window with |iHS| > 2 is calculated, and windows with a degree of over-representation as found in the top 1% of all the windows are considered as candidate regions of positive selection.
The cross-population extended haplotype homozygosity (XP-EHH) contrasts evidence of positive selection between a target population and a reference population at a focal SNP . At each focal SNP position, neighboring SNPs that are present in both populations and within 1 Mb of the focal SNP are used to calculate the XP-EHH score, provided there is at least one SNP in the region with an EHH between 0.03 and 0.05. A SNP with EHH nearest to 0.04 is identified, and the EHH scores across all the SNPs between the focal SNP and the identified SNP are integrated. The XP-EHH statistic is defined as the logarithm of the ratio of this integral in the target population with respect to the reference population. The genome-wide distribution of the raw XP-EHH statistics is standardized to zero mean and unit variance. SNPs are subsequently grouped into non-overlapping windows of 1 Mb, and the maximum XP-EHH score in each window is denoted. Candidate selection regions are identified as windows found in the top 1% of the distribution of the maximum XP-EHH scores. We used YRI as the reference population for all non-African population groups, and CEU was used as the reference population for YRI.
Fst analysis was performed following description of Qian et al. . For each pair of populations, we calculated the Weir and Hill unbiased Fst on the common set of SNPs. Then a sliding window approach was used, where the window size is set to be 500 Kb. In each window, the average value of the highest three Fst values were used as the window’s statistics. Windows with smaller than five SNPs were excluded from the analysis. The significance threshold was defined as top 1% and windows with statistics larger than the threshold were considered as regions with positive selection.
Power simulation for haploPS at a reduced SNP density
In order to assess the degree of power loss of haploPS at a reduced SNP density similar to that of the PASNP data, we utilized the simulation data for haploPS that have been made publicly available at http://www.statgen.nus.edu.sg/~haplops/ and thinned the available dataset to 1/20th of the original SNP density. Briefly, the dataset has been simulated using SelSim which produced genotype data for a region undergoing positive selection by introducing an advantageous mutation at a pre-specified location. The selection coefficient was set as 0.01, and the frequency of the advantageous mutation was set to range between 10% and 100%, in increments of 10%. The effective population size Ne was assumed to be 17,469, and the mutation rate was set to 3 × 10−8 per base per generation. The recombination rate was generated by cosi with a baseline rate of 1 cM/Mb. The simulation was performed to generate SNPs in 100 kb regions, where the original simulations yielded an average of 200 SNPs per 100 kb (a SNP density similar to that present in HapMap Phase 2) although we thinned the density to 1/20th of the original density (or around 10 SNPs per 100 kb) to reflect the SNP density that is present in the PASNP data. Null simulations were performed with cosi to generate regions without positive selection. A total of 2,000 positively selected regions and 2,000 neutral regions were generated. HaploPS was applied to both the original simulated dataset and the reduced-density dataset to evaluate statistical power. The power is quantified as the fraction of the 2,000 positive selection iterations where the haploPS score obtained is less than the 1st percentile of the distribution of haploPS scores obtained from the 2,000 iterations under the null model.
Population clustering of positive selection signals
For the 59 genomic regions that have been identified to be positively selected in at least one of the 31 population groupings, we constructed a 31 × 59 indicator matrix, where the (i, j)th element of the matrix takes value 1 if the jth region is found to be positively selected in population i, and is 0 otherwise. This is used to calculate a 31 × 31 correlation matrix, which indicates the degree of sharing of the 59 selection signals across the 31 population groupings. This correlation matrix is used to perform a hierarchical clustering using the Ward’s minimum variance method implemented in hclust in R.
Quantifying haplotype similarity and inferring origin of shared selection events
To infer whether positive selection events shared by multiple populations originated from the same mutation event or from separate mutation events, we assessed the degree of similarity between the identified haplotype forms in the different populations using the haplotype similarity index (HSI) . This assumes that if the advantageous mutation arises before the different populations diverged, then the same haplotype form will be identified to carry the advantageous allele across the different populations and we will thus expect a significant degree of similarity in the identified haplotype forms. Conversely, if the locus is positively selected in multiple populations due to independent emergence of the same or different advantageous alleles in the locus, these alleles would have arisen on different haplotypes and thus the identified haplotype forms will be considerably different. Thus, for a region that is found to be positively selected by haploPS in K populations, we can identify the K selected haplotype forms for these populations respectively, and compare the alleles at the common set of L SNPs. The K × K similarity matrix M is calculated such that the leading diagonal entries are all ones, and the (i, j)th entry of the matrix corresponds to the scaled Manhattan distance between the selected haplotype forms for population i and population j defined as M(i, j) = 1–l/L with l represent the number of sites out of L where the two haplotypes carry different alleles. An eigen-decomposition is performed on the matrix M, and the haplotype similarity index (HSI) is defined as the amount of variance explained by the first principal component. We infer a shared signal as a single mutation event if the HSI > 0.98, and as convergent evolution if HSI < 0.9.
X.L., R.T.H.O. and Y.Y.T., acknowledge support from the Saw Swee Hock School of Public Health from the National University of Singapore and the National Research Foundation Singapore (NRF-RF-2010-05). M.A., W.Y.S. and Y.Y.T. also acknowledge support from the Life Sciences Institute, National University of Singapore.
- Tishkoff SA, Reed FA, Friedlaender FR, Ehret C, Ranciaro A, Froment A, Hirbo JB, Awomoyi AA, Bodo JM, Doumbo O, Ibrahim M, Juma AT, Kotze MJ, Lema G, Moore JH, Mortensen H, Nyambo TB, Omar SA, Powell K, Pretorius GS, Smith MW, Thera MA, Wambebe C, Weber JL, Williams SM: The genetic structure and history of Africans and African Americans. Science. 2009, 324 (5930): 1035-1044. 10.1126/science.1172257.PubMed CentralPubMedView ArticleGoogle Scholar
- Coker RJ, Hunter BM, Rudge JW, Liverani M, Hanvoravongchai P: Emerging infectious diseases in southeast Asia: regional challenges to control. Lancet. 2011, 377 (9765): 599-609. 10.1016/S0140-6736(10)62004-1.PubMedView ArticleGoogle Scholar
- Frazer KA, Ballinger DG, Cox DR, Hinds DA, Stuve LL, Gibbs RA, Belmont JW, Boudreau A, Hardenbol P, Leal SM, Pasternak S, Wheeler DA, Willis TD, Yu F, Yang H, Zeng C, Gao Y, Hu H, Hu W, Li C, Lin W, Liu S, Pan H, Tang X, Wang J, Wang W, Yu J, Zhang B, Zhang Q, Zhao H, et al: A second generation human haplotype map of over 3.1 million SNPs. Nature. 2007, 449 (7164): 851-861. 10.1038/nature06258.PubMedView ArticleGoogle Scholar
- The International HapMap Project. Nature. 2003, 426 (6968): 789-796. 10.1038/nature02168.Google Scholar
- Altshuler DM, Gibbs RA, Peltonen L, Dermitzakis E, Schaffner SF, Yu F, Bonnen PE, de Bakker PI, Deloukas P, Gabriel SB, Gwilliam R, Hunt S, Inouye M, Jia X, Palotie A, Parkin M, Whittaker P, Yu F, Chang K, Hawes A, Lewis LR, Ren Y, Wheeler D, Muzny DM, Barnes C, Darvishi K, Hurles M, Korn JM, Kristiansson K, International HapMap 3 Consortium1, et al: Integrating common and rare genetic variation in diverse human populations. Nature. 2010, 467 (7311): 52-58. 10.1038/nature09298.PubMedView ArticleGoogle Scholar
- Li JZ, Absher DM, Tang H, Southwick AM, Casto AM, Ramachandran S, Cann HM, Barsh GS, Feldman M, Cavalli-Sforza LL, Myers RM: Worldwide human relationships inferred from genome-wide patterns of variation. Science. 2008, 319 (5866): 1100-1104. 10.1126/science.1153717.PubMedView ArticleGoogle Scholar
- Teo YY, Sim X, Ong RT, Tan AK, Chen J, Tantoso E, Small KS, Ku CS, Lee EJ, Seielstad M, Chia KS: Singapore genome variation project: a haplotype map of three Southeast Asian populations. Genome Res. 2009, 19 (11): 2154-2162. 10.1101/gr.095000.109.PubMed CentralPubMedView ArticleGoogle Scholar
- Voight BF, Kudaravalli S, Wen X, Pritchard JK: A map of recent positive selection in the human genome. PLoS Biol. 2006, 4 (3): e72-10.1371/journal.pbio.0040072.PubMed CentralPubMedView ArticleGoogle Scholar
- Hancock AM, Witonsky DB, Alkorta-Aranburu G, Beall CM, Gebremedhin A, Sukernik R, Utermann G, Pritchard JK, Coop G, Di Rienzo A: Adaptations to climate-mediated selective pressures in humans. PLoS Genet. 2011, 7 (4): e1001375-10.1371/journal.pgen.1001375.PubMed CentralPubMedView ArticleGoogle Scholar
- Suo C, Xu H, Khor CC, Ong RT, Sim X, Chen J, Tay WT, Sim KS, Zeng YX, Zhang X, Liu J, Tai ES, Wong TY, Chia KS, Teo YY: Natural positive selection and north-south genetic diversity in East Asia. Eur J Hum Genet. 2012, 20 (1): 102-110. 10.1038/ejhg.2011.139.PubMed CentralPubMedView ArticleGoogle Scholar
- Tishkoff SA, Reed FA, Ranciaro A, Voight BF, Babbitt CC, Silverman JS, Powell K, Mortensen HM, Hirbo JB, Osman M, Ibrahim M, Omar SA, Lema G, Nyambo TB, Ghori J, Bumpstead S, Pritchard JK, Wray GA, Deloukas P: Convergent adaptation of human lactase persistence in Africa and Europe. Nat Genet. 2007, 39 (1): 31-40. 10.1038/ng1946.PubMed CentralPubMedView ArticleGoogle Scholar
- Bersaglieri T, Sabeti PC, Patterson N, Vanderploeg T, Schaffner SF, Drake JA, Rhodes M, Reich DE, Hirschhorn JN: Genetic signatures of strong recent positive selection at the lactase gene. Am J Hum Genet. 2004, 74 (6): 1111-1120. 10.1086/421051.PubMed CentralPubMedView ArticleGoogle Scholar
- Sabeti PC, Varilly P, Fry B, Lohmueller J, Hostetter E, Cotsapas C, Xie X, Byrne EH, McCarroll SA, Gaudet R, Schaffner SF, Lander ES, International HapMap C, Frazer KA, Ballinger DG, Cox DR, Hinds DA, Stuve LL, Gibbs RA, Belmont JW, Boudreau A, Hardenbol P, Leal SM, Pasternak S, Wheeler DA, Willis TD, Yu F, Yang H, Zeng C, Gao Y, et al: Genome-wide detection and characterization of positive selection in human populations. Nature. 2007, 449 (7164): 913-918. 10.1038/nature06250.PubMed CentralPubMedView ArticleGoogle Scholar
- Sabeti PC, Schaffner SF, Fry B, Lohmueller J, Varilly P, Shamovsky O, Palma A, Mikkelsen TS, Altshuler D, Lander ES: Positive natural selection in the human lineage. Science. 2006, 312 (5780): 1614-1620. 10.1126/science.1124309.PubMedView ArticleGoogle Scholar
- Genovese G, Friedman DJ, Ross MD, Lecordier L, Uzureau P, Freedman BI, Bowden DW, Langefeld CD, Oleksyk TK, Uscinski Knob AL, Bernhardy AJ, Hicks PJ, Nelson GW, Vanhollebeke B, Winkler CA, Kopp JB, Pays E, Pollak MR: Association of trypanolytic ApoL1 variants with kidney disease in African Americans. Science. 2010, 329 (5993): 841-845. 10.1126/science.1193032.PubMed CentralPubMedView ArticleGoogle Scholar
- Walsh EC, Sabeti P, Hutcheson HB, Fry B, Schaffner SF, de Bakker PI, Varilly P, Palma AA, Roy J, Cooper R, Winkler C, Zeng Y, de The G, Lander ES, O'Brien S, Altshuler D: Searching for signals of evolutionary selection in 168 genes related to immune function. Hum Genet. 2006, 119 (1–2): 92-102.PubMedView ArticleGoogle Scholar
- Carnero-Montoro E, Bonet L, Engelken J, Bielig T, Martinez-Florensa M, Lozano F, Bosch E: Evolutionary and functional evidence for positive selection at the human CD5 immune receptor gene. Mol Biol Evol. 2012, 29 (2): 811-823. 10.1093/molbev/msr251.PubMedView ArticleGoogle Scholar
- Simonson TS, McClain DA, Jorde LB, Prchal JT: Genetic determinants of Tibetan high-altitude adaptation. Hum Genet. 2012, 131 (4): 527-533. 10.1007/s00439-011-1109-3.PubMedView ArticleGoogle Scholar
- Simonson TS, Yang Y, Huff CD, Yun H, Qin G, Witherspoon DJ, Bai Z, Lorenzo FR, Xing J, Jorde LB, Prchal JT, Ge R: Genetic evidence for high-altitude adaptation in Tibet. Science. 2010, 329 (5987): 72-75. 10.1126/science.1189406.PubMedView ArticleGoogle Scholar
- Scheinfeldt LB, Soi S, Thompson S, Ranciaro A, Woldemeskel D, Beggs W, Lambert C, Jarvis JP, Abate D, Belay G, Tishkoff SA: Genetic adaptation to high altitude in the Ethiopian highlands. Genome Biol. 2012, 13 (1): R1-10.1186/gb-2012-13-1-r1.PubMed CentralPubMedView ArticleGoogle Scholar
- Scheinfeldt LB, Tishkoff SA: Living the high life: high-altitude adaptation. Genome Biol. 2010, 11 (9): 133-10.1186/gb-2010-11-9-133.PubMed CentralPubMedView ArticleGoogle Scholar
- Huerta-Sanchez E, Degiorgio M, Pagani L, Tarekegn A, Ekong R, Antao T, Cardona A, Montgomery HE, Cavalleri GL, Robbins PA, Weale ME, Bradman N, Bekele E, Kivisild T, Tyler-Smith C, Nielsen R: Genetic signatures reveal high-altitude adaptation in a set of Ethiopian populations. Mol Biol Evol. 2013, 30 (8): 1877-1888. 10.1093/molbev/mst089.PubMed CentralPubMedView ArticleGoogle Scholar
- Liu X, Ong RT, Pillai EN, Elzein AM, Small KS, Clark TG, Kwiatkowski DP, Teo YY: Detecting and characterizing genomic signatures of positive selection in global populations. Am J Hum Genet. 2013Google Scholar
- Qian W, Deng L, Lu D, Xu S: Genome-wide landscapes of human local adaptation in Asia. PLoS ONE. 2013, 8 (1): e54224-10.1371/journal.pone.0054224.PubMed CentralPubMedView ArticleGoogle Scholar
- Abdulla MA, Ahmed I, Assawamakin A, Bhak J, Brahmachari SK, Calacal GC, Chaurasia A, Chen CH, Chen J, Chen YT, Chu J, Cutiongco-de la Paz EM, De Ungria MC, Delfin FC, Edo J, Fuchareon S, Ghang H, Gojobori T, Han J, Ho SF, Hoh BP, Huang W, Inoko H, Jha P, Jinam TA, Jin L, Jung J, Kangwanpong D, Kampuansai J, HP-AS Consortium, et al: Mapping human genetic diversity in Asia. Science. 2009, 326 (5959): 1541-1545.PubMedView ArticleGoogle Scholar
- Menelaou A, Marchini J: Genotype calling and phasing using next-generation sequencing reads and a haplotype scaffold. Bioinformatics. 2013, 29 (1): 84-91. 10.1093/bioinformatics/bts632.PubMedView ArticleGoogle Scholar
- Ueda Y, Yamaguchi R, Ikawa M, Okabe M, Morii E, Maeda Y, Kinoshita T: PGAP1 knock-out mice show otocephaly and male infertility. J Biol Chem. 2007, 282 (42): 30373-30380. 10.1074/jbc.M705601200.PubMedView ArticleGoogle Scholar
- Ong RT, Liu X, Poh WT, Sim X, Chia KS, Teo YY: A method for identifying haplotypes carrying the causative allele in positive natural selection and genome-wide association studies. Bioinformatics. 2011Google Scholar
- Pickrell JK, Coop G, Novembre J, Kudaravalli S, Li JZ, Absher D, Srinivasan BS, Barsh GS, Myers RM, Feldman MW, Pritchard JK: Signals of recent positive selection in a worldwide sample of human populations. Genome Res. 2009, 19 (5): 826-837. 10.1101/gr.087577.108.PubMed CentralPubMedView ArticleGoogle Scholar
- Worcester M, Clark University: The Journal of Race Development, Volume 2. 1912, : Princeton University: Clark UniversityGoogle Scholar
- Brown MJ: Is Taiwan Chinese? The Impact of Culture, Power, and Migration on Changing Identities. 2004, London, England: University of California Press Ltd.View ArticleGoogle Scholar
- Schanfield MS, Ohkura K, Lin M, Shyu R, Gershowitz H: Immunoglobulin allotypes among Taiwan aborigines: evidence of malarial selection could affect studies of population affinity. Hum Biol. 2002, 74 (3): 363-379. 10.1353/hub.2002.0033.PubMedView ArticleGoogle Scholar
- O’Fallon BD, Fehren-Schmitz L: Native Americans experienced a strong population bottleneck coincident with European contact. Proc Natl Acad Sci U S A. 2011, 108 (51): 20444-20448. 10.1073/pnas.1112563108.PubMed CentralPubMedView ArticleGoogle Scholar
- Felsenstein J: PHYLIP - phylogeny inference package (Version 3.2). Cladistics. 1989, 5: 164-166.Google Scholar
- Delaneau O, Zagury JF, Marchini J: Improved whole-chromosome phasing for disease and population genetic studies. Nat Methods. 2013, 10 (1): 5-6. 10.1038/nchembio.1414.PubMedView ArticleGoogle Scholar
- Abecasis GR, Auton A, Brooks LD, DePristo MA, Durbin RM, Handsaker RE, Kang HM, Marth GT, McVean GA: An integrated map of genetic variation from 1,092 human genomes. Nature. 2012, 491 (7422): 56-65. 10.1038/nature11632.PubMedView ArticleGoogle Scholar
- Stephens M, Smith NJ, Donnelly P: A new statistical method for haplotype reconstruction from population data. Am J Hum Genet. 2001, 68 (4): 978-989. 10.1086/319501.PubMed CentralPubMedView ArticleGoogle Scholar
- Marchini J, Cutler D, Patterson N, Stephens M, Eskin E, Halperin E, Lin S, Qin ZS, Munro HM, Abecasis GR, Donnelly P, International HapMap Consortium: A comparison of phasing algorithms for trios and unrelated individuals. Am J Hum Genet. 2006, 78 (3): 437-450. 10.1086/500808.PubMed CentralPubMedView ArticleGoogle Scholar
- Sabeti PC, Reich DE, Higgins JM, Levine HZ, Richter DJ, Schaffner SF, Gabriel SB, Platko JV, Patterson NJ, McDonald GJ, Ackerman HC, Campbell SJ, Altshuler D, Cooper R, Kwiatkowski D, Ward R, Lander ES: Detecting recent positive selection in the human genome from haplotype structure. Nature. 2002, 419 (6909): 832-837. 10.1038/nature01140.PubMedView ArticleGoogle Scholar
- Spencer CC, Coop G: SelSim: a program to simulate population genetic data with natural selection and recombination. Bioinformatics. 2004, 20 (18): 3673-3675. 10.1093/bioinformatics/bth417.PubMedView ArticleGoogle Scholar
- Schaffner SF, Foo C, Gabriel S, Reich D, Daly MJ, Altshuler D: Calibrating a coalescent simulation of human genome sequence variation. Genome Res. 2005, 15 (11): 1576-1583. 10.1101/gr.3709305.PubMed CentralPubMedView ArticleGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.