Development of admixture mapping panels for African Americans from commercial high-density SNP arrays

Chen, Guanjie; Shriner, Daniel; Zhou, Jie; Doumatey, Ayo; Huang, Hanxia; Gerry, Norman P; Herbert, Alan; Christman, Michael F; Chen, Yuanxiu; Dunston, Georgia M; Faruque, Mezbah U; Rotimi, Charles N; Adeyemo, Adebowale

doi:10.1186/1471-2164-11-417

Research article
Open access
Published: 05 July 2010

BMC Genomics volume 11, Article number: 417 (2010)

Abstract

Background

Admixture mapping is a powerful approach for identifying genetic variants involved in human disease that exploits the unique genomic structure in recently admixed populations. To use existing published panels of ancestry-informative markers (AIMs) for admixture mapping, markers have to be genotyped de novo for each admixed study sample and samples representing the ancestral parental populations. The increased availability of dense marker data on commercial chips has made it feasible to develop panels wherein the markers need not be predetermined.

Results

We developed two panels of AIMs (~2,000 markers each) based on the Affymetrix Genome-Wide Human SNP Array 6.0 for admixture mapping with African American samples. These two AIM panels had good map power that was higher than that of a denser panel of ~20,000 random markers as well as other published panels of AIMs. As a test case, we applied the panels in an admixture mapping study of hypertension in African Americans in the Washington, D.C. metropolitan area.

Conclusions

Developing marker panels for admixture mapping from existing genome-wide genotype data offers two major advantages: (1) no de novo genotyping needs to be done, thereby saving costs, and (2) markers can be filtered for various quality measures and replacement markers (to minimize gaps) can be selected at no additional cost. Panels of carefully selected AIMs have two major advantages over panels of random markers: (1) the map power from sparser panels of AIMs is higher than that of ~10-fold denser panels of random markers, and (2) clusters can be labeled based on information from the parental populations. With current technology, chip-based genome-wide genotyping is less expensive than genotyping ~20,000 random markers. The major advantage of using random markers is the absence of ascertainment effects resulting from the process of selecting markers. The ability to develop marker panels informative for ancestry from SNP chip genotype data provides a fresh opportunity to conduct admixture mapping for disease genes in admixed populations when genome-wide association data exist or are planned.

Background

Admixture mapping is an approach for localizing disease susceptibility loci that attempts to capitalize on the long-range linkage disequilibrium occurring in populations formed by recent mixing of ancestral populations [1–6]. The approach uses samples from recently admixed populations to detect susceptibility loci at which the risk alleles have different frequencies in the ancestral parental populations. Admixture mapping is an economical and theoretically powerful approach. Compared to linkage, admixture mapping does not require families and has more power. Compared to association, admixture mapping requires ~200-500-fold fewer markers, is not susceptible to allelic heterogeneity, and can be used with either case-only or case-control study designs. Admixture mapping can also be performed with generalized linear models to accommodate quantitative traits [1]. Admixture mapping has been performed for many complex traits which exhibit strong differences in prevalence across ethnicities, such as end-stage renal disease [7, 8], hypertension [9–11], multiple sclerosis [12], obesity [13–15], peripheral arterial disease [16], prostate cancer [17, 18], rheumatoid arthritis [19], serum inflammatory markers [20], systemic lupus erythematosus [21], type 2 diabetes [22], and white blood cell count [23].

Several groups have built panels of ancestry-informative markers (AIMs) based on multiple databases of human genetic variation [24–27]. Previously, admixture mapping required the construction of panels of AIMs based on screening large reference sets of genetic variation for ancestry-informative markers followed by de novo genotyping at those preselected markers in the admixed study sample and the samples representing the (putative) ancestral parental populations [12, 20, 23, 26, 28, 29]. However, given commercially available high-density marker arrays, it is now possible to construct customized panels from markers already genotyped in the admixed study sample(s) [27, 30–34].

In this study, we constructed marker panels for admixture mapping with African American populations, starting from the Affymetrix Genome-Wide Human SNP Array 6.0, which probes variation at 909,508 single-nucleotide polymorphisms (SNPs). Using genome-wide genotypes in our study sample of African Americans already experimentally determined for genome-wide association studies and HapMap data to represent the presumed ancestral parental populations, we constructed one panel consisting of SNPs with large differences in allele frequencies between the ancestral parental populations and a second panel consisting of SNPs with large F_ST values between the ancestral parental populations. We also constructed a panel consisting of random markers not selected to be ancestrally informative. Characteristics of these panels, including the number of markers and information content, are presented. As a test case, we apply these panels to a study of hypertension in African Americans.

Methods

Study Population

The admixed population under study comprised participants in the Howard University Family Study (HUFS) from the Washington, D.C. metropolitan area [35]. The first phase of recruitment involved enrolling and examining a randomly ascertained cohort of African American families with members in multiple generations. To facilitate nested case-control study designs, additional unrelated individuals from the same geographic area were enrolled in a second phase of recruitment. Participants were not ascertained based on any phenotypes. Participants were interviewed and measured for various anthropometric and clinical variables. Blood pressure was measured in the sitting position using an oscillometric device (Omron Healthcare, Kyoto, Japan). Three readings were taken with a ten-minute interval between readings. The reported systolic and diastolic blood pressure readings were the average of the second and third readings. Hypertension case status was defined as systolic blood pressure ≥ 140 mmHg, or diastolic blood pressure ≥ 90 mmHg, or treatment with antihypertensive medication. We identified a subset of 1,017 unrelated individuals, including 509 hypertensive cases and 508 controls, for use in admixture mapping.
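The case definition above can be written as a small rule. The following is a minimal sketch; the function names and the input layout (three readings per measure, in order) are illustrative assumptions rather than part of the study protocol.

```python
def mean_of_last_two(readings):
    """Average the second and third of three readings, as described in the text."""
    if len(readings) != 3:
        raise ValueError("expected exactly three readings")
    return (readings[1] + readings[2]) / 2.0


def is_hypertensive(sbp_readings, dbp_readings, on_medication):
    """Apply the case definition: SBP >= 140 mmHg, or DBP >= 90 mmHg, or treated."""
    sbp = mean_of_last_two(sbp_readings)
    dbp = mean_of_last_two(dbp_readings)
    return bool(sbp >= 140 or dbp >= 90 or on_medication)
```

Note that the first reading is discarded, so a single high initial value does not by itself make a participant a case.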

Genome-wide genotyping in the HUFS was performed using the Affymetrix Genome-Wide Human SNP Array 6.0. DNA samples were prepared and hybridized following the manufacturer's instructions [35]. Genotype calls were made using the Birdseed algorithm, version 2 [36]. We applied four inclusion criteria: the individual sample call rate had to be ≥ 95% (no samples excluded), the SNP call rate had to be ≥ 95% (41,885 SNPs excluded), the minor allele frequency had to be ≥ 0.01 (19,154 SNPs excluded), and the p-value for the test of Hardy-Weinberg equilibrium (HWE) had to be ≥ 1.0×10^-3 (6,317 SNPs excluded). After filtering, 842,074 autosomal and X chromosomal SNPs remained.
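The SNP-level filters can be sketched as follows. This is an illustration only: the text does not state which HWE test was used, so the 1-df chi-square test here is an assumption, and the function names and genotype-count inputs are hypothetical.

```python
import math


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """P-value of a 1-df chi-square goodness-of-fit test for HWE (assumed test)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0  # monomorphic SNP: no deviation can be tested
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_snp_qc(n_aa, n_ab, n_bb, n_missing,
                  min_call_rate=0.95, min_maf=0.01, min_hwe_p=1.0e-3):
    """Apply the SNP call rate, minor allele frequency, and HWE thresholds."""
    n_called = n_aa + n_ab + n_bb
    if n_called == 0:
        return False
    call_rate = n_called / (n_called + n_missing)
    p = (2 * n_aa + n_ab) / (2 * n_called)
    maf = min(p, 1.0 - p)
    return (call_rate >= min_call_rate
            and maf >= min_maf
            and hwe_chi2_pvalue(n_aa, n_ab, n_bb) >= min_hwe_p)
```

An exact test would be preferable for rare alleles; the chi-square approximation is used here only to keep the sketch dependency-free.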

HapMap phase III CEU (1,403,896 SNPs and 180 individuals), YRI (1,484,416 SNPs and 180 individuals), and ASW (1,536,247 SNPs and 90 individuals) genotype data were obtained from the International HapMap Project (http://hapmap.ncbi.nlm.nih.gov/downloads/genotypes/2008-07_phaseIII/). We retained unrelated individuals, leaving 109 CEU individuals, 108 YRI individuals, and 55 ASW individuals. We used the same criteria as for the HUFS data (sample call rate ≥ 95%, SNP call rate ≥ 95%, minor allele frequency ≥ 0.01, HWE p ≥ 1.0×10^-3) for filtering genotypes. After filtering, the intersection of the CEU, YRI, and HUFS data sets included 708,383 SNPs. We used these contemporary samples of 109 unrelated CEU individuals and 108 unrelated YRI individuals as proxy samples for the presumed ancestral parental populations of our African American sample.

δ and F_ST Calculations

For a given SNP, δ was calculated as the absolute difference in allele frequencies in the CEU and YRI data, δ = |p_CEU − p_YRI|. Wright [37] suggested the fixation index F_ST to evaluate population differentiation. We estimated F_ST between the CEU and YRI samples using the formula F_ST = σ²_p / [p̄(1 − p̄)], in which p̄ = (p_CEU + p_YRI)/2 and σ²_p = [(p_CEU − p̄)² + (p_YRI − p̄)²]/2. Wright [38] suggested qualitative guidelines for the interpretation of F_ST: values from 0 to 0.05 indicate little population differentiation, values between 0.05 and 0.15 indicate moderate population differentiation, values between 0.15 and 0.25 indicate large population differentiation, and values above 0.25 indicate very large population differentiation.
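A minimal sketch of the two statistics follows. It assumes the two-population form of Wright's F_ST with equal weights, F_ST = σ²_p / [p̄(1 − p̄)], which reduces to δ² / [4p̄(1 − p̄)]; the function names and the handling of the category boundaries are illustrative assumptions.

```python
def delta(p_ceu, p_yri):
    """Absolute allele frequency difference between the two parental samples."""
    return abs(p_ceu - p_yri)


def fst(p_ceu, p_yri):
    """Two-population F_ST with equal weights (assumed form; see lead-in)."""
    p_bar = (p_ceu + p_yri) / 2.0
    denom = p_bar * (1.0 - p_bar)
    if denom == 0:
        return 0.0  # allele fixed or absent in both samples
    var_p = ((p_ceu - p_bar) ** 2 + (p_yri - p_bar) ** 2) / 2.0
    return var_p / denom


def wright_category(f):
    """Wright's qualitative guidelines; boundary handling is an assumption."""
    if f < 0.05:
        return "little"
    if f < 0.15:
        return "moderate"
    if f <= 0.25:
        return "large"
    return "very large"
```

For example, allele frequencies of 0.8 and 0.2 give δ = 0.6 and F_ST = 0.36 = δ², the smallest F_ST attainable for that δ.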

Genetic Map of SNPs

The Rutgers Combined Linkage-Physical Map of the Human Genome was used to locate markers on the genetic map (in cM) given positions on the physical map (in bp). The positions of SNPs on the genetic map were obtained using a web-based application (http://integrin.ucd.ie/cgi-bin/rs2cm.cgi).
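Converting a physical position to a genetic position is commonly done by interpolating between mapped reference markers. The sketch below assumes simple linear interpolation and clamping outside the mapped range; the web application used in the study may apply a different scheme, and the function name and inputs are hypothetical.

```python
from bisect import bisect_right


def interpolate_cm(pos_bp, map_bp, map_cm):
    """Linearly interpolate a genetic position (cM) from a physical position (bp).

    map_bp must be strictly increasing and aligned with map_cm.
    Positions outside the mapped range are clamped to the nearest end.
    """
    if pos_bp <= map_bp[0]:
        return map_cm[0]
    if pos_bp >= map_bp[-1]:
        return map_cm[-1]
    i = bisect_right(map_bp, pos_bp)  # first reference marker beyond pos_bp
    x0, x1 = map_bp[i - 1], map_bp[i]
    y0, y1 = map_cm[i - 1], map_cm[i]
    return y0 + (y1 - y0) * (pos_bp - x0) / (x1 - x0)
```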

Selection of Ancestry-Informative Markers from HapMap Data

We followed a six-step process to select AIMs. First, we selected SNPs for which the minor allele frequency was ≥ 0.01 in both ancestral populations (CEU and YRI). Second, we retained SNPs for which δ ≥ 0.6 between CEU and YRI. Third, we divided each chromosome into consecutive, non-overlapping bins of size 1 Mb and sorted the SNPs within each bin in descending order according to the δ values. Fourth, for each chromosome, we estimated pairwise correlations between the top-ranked SNPs across the bins. Fifth, for each pair of SNPs, if r² ≥ 0.4 in either the CEU or YRI sample, we discarded the SNP with the smaller δ value from its bin and promoted all remaining SNPs in that bin. If δ values were equal (to the fourth decimal place), we discarded the distal SNP. Sixth, we iterated steps 4-5 until r² < 0.4 in both the CEU and YRI samples for all pairs of top-ranked SNPs. The resulting panel comprised 2,076 AIMs. We repeated this entire process based on F_ST ≥ 0.4, yielding a second panel consisting of 1,923 AIMs. Given δ = 0.6, the allowable values of F_ST range from δ² = 0.36 to δ/(2 − δ) ≈ 0.43 [39]. Similarly, given F_ST = 0.4, the allowable values of δ range from 2F_ST/(1 + F_ST) ≈ 0.57 to √F_ST ≈ 0.63 [39]. These calculations show the comparability of the two thresholds.
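The binning and iterative pruning steps can be sketched for a single chromosome as follows. This is one possible reading of the procedure, not the authors' implementation: `r2` is a hypothetical caller-supplied function returning the larger of the CEU and YRI r² values for a pair of SNPs, "distal" is taken to mean the larger base-pair position, and conflicts are resolved one pair at a time.

```python
from itertools import combinations


def select_aims(snps, r2, bin_size=1_000_000, min_stat=0.6, max_r2=0.4):
    """Pick at most one weakly correlated, highly differentiated SNP per bin.

    snps: iterable of (snp_id, position_bp, statistic) for one chromosome,
          where statistic is delta (or F_ST) between the parental samples.
    r2:   callable(id_a, id_b) -> max r^2 across the two parental samples.
    """
    def rank(snp):
        # Higher statistic first (compared to 4 decimals), then proximal first.
        return (-round(snp[2], 4), snp[1])

    bins = {}
    for snp in snps:
        if snp[2] >= min_stat:
            bins.setdefault(snp[1] // bin_size, []).append(snp)
    for members in bins.values():
        members.sort(key=rank)

    while True:
        tops = [members[0] for members in bins.values() if members]
        conflict = next(
            ((a, b) for a, b in combinations(tops, 2)
             if r2(a[0], b[0]) >= max_r2),
            None,
        )
        if conflict is None:
            return sorted(tops, key=lambda s: s[1])
        # Discard the lower-ranked SNP; the next SNP in its bin is promoted.
        loser = max(conflict, key=rank)
        bins[loser[1] // bin_size].remove(loser)
```

Each pass removes exactly one SNP, so the loop terminates; a bin whose candidates are all discarded simply contributes no marker.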

Information Content and Map Power

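We calculated the Shannon information content (SIC), defined as

SIC = Σ_i Σ_j a_ij log(a_ij) − Σ_i a_i· log(a_i·) − Σ_j a_·j log(a_·j),

in which a₀₀ = (1 - m) × p_YRI, a₀₁ = m × p_CEU, a₁₀ = (1 - m) × (1 - p_YRI), a₁₁ = m × (1 - p_CEU), a_i· = a_i0 + a_i1, a_·j = a_0j + a_1j, and m is the proportion of European ancestry.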
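As an illustration, the SIC can be computed as the mutual information between allele and ancestry, SIC = Σ a_ij log(a_ij) − Σ a_i· log(a_i·) − Σ a_·j log(a_·j), using the a_ij defined above. The form of the sum and the use of natural logarithms are assumptions made for this sketch, and the function name is hypothetical.

```python
import math


def shannon_information_content(p_yri, p_ceu, m):
    """Mutual information between allele (rows) and ancestry (columns).

    p_yri, p_ceu: allele frequencies in the parental samples.
    m: proportion of European ancestry.
    Natural logarithms are assumed.
    """
    a = [
        [(1 - m) * p_yri, m * p_ceu],
        [(1 - m) * (1 - p_yri), m * (1 - p_ceu)],
    ]

    def xlogx(x):
        return x * math.log(x) if x > 0 else 0.0  # limit of x*log(x) at 0 is 0

    joint = sum(xlogx(a[i][j]) for i in range(2) for j in range(2))
    rows = sum(xlogx(a[i][0] + a[i][1]) for i in range(2))
    cols = sum(xlogx(a[0][j] + a[1][j]) for j in range(2))
    return joint - rows - cols
```

A marker with identical frequencies in both parental samples carries no ancestry information (SIC = 0), whereas a fixed difference with m = 0.5 gives the maximum of log 2.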

For a locus i and individual j, X_ij was defined as the entropy of the locus-specific ancestry estimate and G_j was defined as the entropy of the genome-wide ancestry estimate. The relative power at locus i, r_i, was defined in terms of X_ij and G_j. If X_ij = G_j for all j, then r_i = 0 and there is no additional information about local ancestry beyond information about genome-wide ancestry. If X_ij = 0 for all j, then r_i = 1 and there is perfect information for local ancestry [31]. The statistic r_i and the average of r_i across loci, r_avg, were estimated using ANCESTRYMAP [3]. Relative to a study with perfect information about local ancestry (r_avg = 1), 1/r_avg times as many samples must be genotyped to achieve comparable power [31].

Estimation of Individual Admixture and Population Structure

We used the variance inflation factor (VIF) to prune markers in linkage disequilibrium (LD). The VIF is equal to 1/(1 − R²), in which R is the multiple correlation coefficient. A VIF of 1 implies that the index SNP is completely independent of all other SNPs. Starting from a common set of SNPs passing quality control among the HapMap CEU, HapMap YRI, and HUFS data sets, we used LD-based pruning (VIF threshold of 1.1, window size of 50 SNPs, window slide of 5 SNPs) to generate a set of 74,546 SNPs with minimal LD between the markers. We then randomly selected one-third of the SNPs to obtain a random marker panel (21 k random panel) that had 10-fold greater marker density than the AIMs panels. We also generated an additional panel (2 k random panel) by randomly sub-sampling 10% of the 21 k random panel to match the marker density of the AIMs panels. We examined clustering using a parametric approach implemented in STRUCTURE [40] and a nonparametric approach implemented in AWclust [41]. Analysis was performed in STRUCTURE without any prior population assignment and was performed ten times for each number of clusters (K), with 10,000 burn-in steps and a run length of 10,000 steps under the admixture model. We recorded the log likelihood of each analysis conditional on K estimated by STRUCTURE. Compared with this parametric approach, the nonparametric approach in AWclust [41] uses allele-sharing distance (ASD) and Ward's minimum variance algorithm to cluster the individuals in the ASD matrix. AWclust does not assume Hardy-Weinberg equilibrium or linkage equilibrium and does not require allele frequency estimates. We varied K from one to six in both programs.
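To make the pruning threshold concrete, the standard relationship VIF = 1/(1 − R²) can be inverted to find the largest R² a SNP may have with the other SNPs in its window before it is pruned. The helper names below are illustrative.

```python
def vif(r_squared):
    """Variance inflation factor for a given squared multiple correlation."""
    return 1.0 / (1.0 - r_squared)


def max_r_squared_for_vif(threshold):
    """Largest R^2 compatible with a VIF threshold."""
    return 1.0 - 1.0 / threshold
```

Under this relationship, a VIF threshold of 1.1 corresponds to R² of about 0.09, so the retained random markers are close to mutually uncorrelated.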

Application of the panels to a study of hypertension

Two statistics were used to test for the presence of disease loci using ANCESTRYMAP [3]. One was the locus-genome statistic, which compared the admixture proportion at one locus with the genome-wide average among cases only. The locus-genome statistic was tested via a likelihood-ratio statistic, i.e., the ratio of the likelihood of a locus being a disease locus to the likelihood of the locus not being a disease locus. The LOD score was defined as the likelihood-ratio test statistic divided by 2ln(10). The genome-wide significance threshold of the LOD score was set at 2 [3]. The other statistic was the case-control statistic, which compared cases with controls at every point in the genome, testing for differences in ancestry estimates. A deviation from the genome-wide average of one parental population ancestry seen in cases but not in controls provided evidence of a disease locus. The case-control statistic followed the standard normal distribution under the null hypothesis that a locus was not a disease locus. The genome-wide significance threshold of the z-statistic was set at ± 4.2 for the two panels of AIMs and ± 4.7 for the panel based on random markers. We specified in the disease model that the relative risk for hypertensive heart disease among African Americans was 2.80 compared to European Americans [26].
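The conversion from a likelihood-ratio test statistic to a LOD score, and the two significance rules, can be sketched as below. The function names are hypothetical, and treating values exactly at a threshold as significant is an assumption; these helpers are not part of ANCESTRYMAP.

```python
import math


def lod_from_lrt(lrt_statistic):
    """LOD score from a likelihood-ratio test statistic, 2*ln(L1/L0)."""
    return lrt_statistic / (2.0 * math.log(10))


def locus_genome_significant(lod, threshold=2.0):
    """Genome-wide significance rule for the locus-genome statistic."""
    return lod >= threshold


def case_control_significant(z, threshold=4.2):
    """Two-sided rule for the case-control z-statistic.

    Use threshold=4.2 for the AIM panels and 4.7 for the random marker panel.
    """
    return abs(z) >= threshold
```

Dividing by 2ln(10) makes the LOD score the base-10 logarithm of the likelihood ratio, so a threshold of 2 corresponds to a likelihood ratio of 100.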

Results

Marker Panels for Admixture Mapping in African Americans

The distribution of SNPs across the two AIMs panels (one based on δ, containing 2,076 AIMs (Additional file 1), and one based on F_ST, containing 1,923 AIMs (Additional file 2)) and the two random marker panels (21 k and 2 k random marker panels, Additional file 3) is shown in Table 1. The panels covered all 22 autosomes and the X chromosome (Table 1). All marker panels showed lower heterozygosities in the parental samples than in the admixed sample, with the two panels of AIMs showing ascertainment effects of lower heterozygosities in the parental samples and higher heterozygosity in the admixed sample (Table 2). Scatter plots of allele frequencies for AIMs showed clear differentiation of the two parental populations (Figure 1), as did the STRUCTURE plot assuming K = 2 populations (Figure 2) and the AWclust plot (Additional file 4). Excluding centromeres, the average inter-marker distance was 1.33 cM for the panel based on δ, 1.43 cM for the panel based on F_ST, 0.124 cM for the panel based on 21 k random markers, and 1.17 cM for the panel based on 2 k random markers (Additional file 5). The average values of δ, F_ST, and SIC were 0.715, 0.519, and 0.300 for the δ panel, 0.708, 0.531, and 0.308 for the F_ST panel, 0.142, 0.049, and 0.026 for the 21 k random marker panel, and 0.143, 0.050, and 0.026 for the 2 k random marker panel, respectively (Additional file 6).

Table 1 Distribution of markers

Table 2 Average heterozygosities

The two panels of AIMs shared 1,745 markers. The remaining markers (331 in the panel based on δ, 178 in the panel based on F_ST) showed no significant difference in Shannon information content (SIC) (t-test, p = 0.10). The δ and F_ST values in the two panels were highly positively correlated (r = 0.92, p < 0.0001). The δ in the panel based on δ was significantly higher than δ estimated from the panel based on F_ST (p = 0.0004). Similarly, F_ST in the panel based on F_ST was significantly higher than F_ST in the panel based on δ (p < 0.0001).

Sample Characteristics

The genome-wide average F_ST between HUFS and YRI was 0.0295, indicating little population differentiation. The genome-wide average F_ST was 0.0656 between HUFS and CEU and 0.0753 between CEU and YRI, both indicating moderate population differentiation. As expected, these results indicated that our admixed HUFS sample was more similar to YRI than CEU, i.e., the proportion of African ancestry exceeded the proportion of European ancestry. Similarly, principal coordinate analysis showed that the HUFS sample was intermediate between the two ancestral parental populations and on average closer to YRI than CEU (Additional file 4). The estimated proportions of African ancestry in the HUFS sample using ANCESTRYMAP were 0.81 ± 0.11 and 0.84 ± 0.08 for the autosomes and the X chromosome, respectively.

Admixture Information Content

We evaluated the informativeness of the two panels of random markers compared to the informativeness of the two panels of AIMs. The proportions of markers in the panel of 21 k random markers for which r_i ≥ 0.50, r_i ≥ 0.75, and r_i ≥ 0.80 were 96.74%, 7.68%, and 1.20%, respectively, and the panel had a map power of r_avg = 0.65. The proportions of markers in the panel of AIMs based on δ for which r_i ≥ 0.50, r_i ≥ 0.75, and r_i ≥ 0.80 were 98.82%, 38.86%, and 2.28%, respectively. The panel of AIMs based on F_ST yielded values similar to those from the panel of AIMs based on δ. The map power was r_avg = 0.73 for the panels based on δ and F_ST (Figures 3 and 4). The proportions of markers in the panel based on 2 k random markers for which r_i ≥ 0.50, r_i ≥ 0.75, and r_i ≥ 0.80 were 0.19%, 0%, and 0%, respectively, and the panel had a map power of r_avg = 0.13 (Figures 3 and 4). These estimates indicate that the two panels of AIMs extracted more ancestry information than a 10-fold denser panel of random markers and much more than the 2 k random marker panel. Using the r_avg statistic, one would need to study 1.37 (= 1/0.73), 1.37 (= 1/0.73), 1.54 (= 1/0.65), and 7.69 (= 1/0.13) times as many samples to maintain power to detect disease genes as would be necessary if one had full ancestry information, using the panels based on δ, F_ST, 21 k random markers, and 2 k random markers, respectively.
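The sample-size inflation factors quoted above follow directly from 1/r_avg. A minimal sketch, using the map power values reported in the text (the dictionary keys and function name are illustrative labels):

```python
def sample_inflation(r_avg):
    """Factor by which the sample must grow relative to perfect ancestry information."""
    return 1.0 / r_avg


# Map power values as reported in the text.
MAP_POWER = {
    "delta AIMs": 0.73,
    "F_ST AIMs": 0.73,
    "21 k random": 0.65,
    "2 k random": 0.13,
}

for panel, r_avg in MAP_POWER.items():
    print(f"{panel}: {sample_inflation(r_avg):.2f}x")
```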

We constructed panels conditional on approximate linkage equilibrium over 1 Mb bins. Our iterative pruning procedure was designed to avoid gaps in coverage and to eliminate background linkage disequilibrium. To compare our panels with previously published panels, we obtained two panels of AIMs developed for African Americans by Tian et al. [28]. From their panel of 4,222 AIMs, 682 AIMs were in common with the CEU, YRI, and HUFS data sets and all 682 AIMs passed quality control. Similarly, 321 AIMs from their panel of 2,000 AIMs were in common with the CEU, YRI, and HUFS data sets and all 321 AIMs passed quality control. As a result of the substantial reduction in marker density, the map power was reduced for both panels of Tian et al. using our HUFS data set (Table 3). The substantial reduction in marker density occurred because the panels of Tian et al. were developed independently of the Affymetrix chip we used for genotyping our sample and there was little overlap in the SNPs in their panels and on the chip. To investigate if this limitation also applied to another African American data set, we obtained the HapMap phase III ASW data. In the ASW data set, ~50% of the AIMs in either panel of Tian et al. were present, compared to > 98% of the AIMs from our panels, whereas almost every AIM present in the data passed quality control (Table 4). These comparisons highlight the advantage of being able to customize a panel using preexisting GWAS genotypes, especially for filling in gaps to improve coverage.

Table 3 Comparison of map power for different panels using HUFS

Table 4 Percentages of markers passing quality control for different panels using the HapMap ASW sample

Application of the Admixture Panels

As an example of applying our newly developed panels, we investigated hypertension in the HUFS. The relative risk for hypertensive heart disease among African Americans was 2.80 compared to European Americans [26]. Averaged genome-wide, the individual proportion of European ancestry was 0.192 ± 0.098, 0.193 ± 0.098, and 0.264 ± 0.106 among normotensive subjects and 0.196 ± 0.119, 0.196 ± 0.119, and 0.268 ± 0.109 among hypertensive subjects, for the panels based on δ, F_ST, and 21 k random markers, respectively. Although this result suggests that most of the differential risk in hypertension is probably not explainable by genetics, it does not preclude specific loci from significantly contributing to differential risk. Assuming the hybrid isolation model, i.e., a single generation of admixture with no subsequent gene flow, the estimated number of generations since the original admixture event was 7.44 ± 3.35, 7.33 ± 3.01, and 8.65 ± 5.31 for the panels based on δ, F_ST, and 21 k random markers, respectively.

We performed admixture mapping using both the locus-genome and case-control statistics for hypertension in the HUFS data. No marker reached genome-wide significance for hypertension case/control status using ANCESTRYMAP (Figure 5). Using a pairwise score test for markers shared between the two AIM panels, no significant difference was found between the panels (p = 0.8616 for the locus-genome statistics, p = 0.3087 for the case-control statistics). Similarly, using a t-test for AIMs not shared between the two panels, no significant difference was found between the panels (p = 0.6099 for the locus-genome statistics, p = 0.5607 for the case-control statistics).

Discussion

In this study, we constructed panels of markers with variable informativeness for ancestry in admixed African Americans. We had previously genotyped our sample using the Affymetrix Genome-Wide Human SNP Array 6.0 for genome-wide association studies. Repurposing markers for admixture mapping eliminates the need for de novo genotyping. After linkage disequilibrium-based pruning, we constructed a set of 2,076 uncorrelated markers with large differences in allele frequencies and another set of 1,923 uncorrelated markers with large F_ST values. Using these ancestry-informative markers, we estimated that the proportion of European ancestry in our sample of 1,017 unrelated African Americans from Washington, D.C. was 0.19 ± 0.11 for both panels, comparable to an estimated proportion of 0.21 ± 0.11 in a sample of 442 African Americans with multiple sclerosis and 276 controls [3]. Using a set of 21 k random markers (i.e., not ascertained to be informative for ancestry) in our study yielded a slightly higher estimate of admixture proportions (0.266 ± 0.108). Although it is possible to perform genome-wide admixture mapping using panels of markers not preselected to be informative for ancestry [30], our results confirm that a few thousand AIMs can be used to estimate admixture proportions as efficiently as 10-fold more random markers.

The admixed populations most commonly used in admixture mapping to date are those formed by recent admixture between groups originating from different continents as a result of European maritime expansion during the past few hundred years [4]. The number of generations since the original admixture event based on our sample of African Americans was estimated at 7.44 ± 3.35 and 7.33 ± 3.01 generations for the panels based on δ and F_ST, respectively. These estimates are similar to previous estimates of 6.0 ± 1.6 [3], 6.3 ± 1.1 [26], and 7 [42]. Thus, these estimates are stable across different marker panels and different samples of African Americans.

The power of admixture mapping is affected by the information content of the marker map, the sample size, and admixture proportions. We estimated that both AIM panels had an average map power of 0.73 ± 0.08, which is similar to 0.71 ± 0.09 for a previously constructed panel of 2,154 AIMs in African Americans [26]. The two panels had higher map power than the panel of 21 k random markers, which had an average map power of 0.65 ± 0.08. For the locus-genome statistic, a sample size of 500 cases provides 70% power to detect a locus conferring 1.7-fold increased risk due to ancestry [3]. Our study sample size of 509 cases and 508 controls was underpowered for loci conferring 1.5-fold or less risk due to ancestry. Although the power of admixture mapping decreases in populations with a much larger contribution from only one parental population [26], the map power is fairly constant for values of admixture proportion from 10% to 90% [3]. Our estimated values of 19% European ancestry and 81% African ancestry both fall within this range.

Conclusions

We constructed two panels of AIMs for admixture mapping in African Americans from experimentally determined genotypes using the Affymetrix Genome-Wide Human SNP Array 6.0. We constructed the panels conditional on linkage equilibrium over 1 Mb bins. Our iterative pruning procedure was designed to avoid gaps in coverage and to eliminate background linkage disequilibrium. Given the mathematical relationship between δ and F_ST, we recommend both panels of AIMs equally.

Developing marker panels for admixture mapping from existing genotype data derived from commercial high density SNP chips offers two major advantages. (1) No de novo genotyping needs to be done, thereby saving costs. (2) Markers can be filtered for various quality measures and replacement markers (to minimize gaps) can be selected at no additional cost. For our African American sample, we took advantage of preexisting HapMap genotypes for the CEU and YRI samples, but appropriate parental populations may not have already been sampled for some admixed populations. We found that the map power for sparser panels of AIMs is higher than for denser panels of 21 k random markers. Historically, the number of AIMs in an admixture panel reflected the trade-off between maximizing genomic coverage and minimizing genotyping costs. Currently, custom genotyping a panel of ~2,000 AIMs is less expensive than chip-based genome-wide genotyping. However, chip-based genome-wide genotyping is currently less expensive than custom genotyping a panel of ~20,000 random markers. Presumed parental populations are necessary to characterize AIMs. In contrast, parental populations are not needed to characterize random markers prior to estimating admixture proportions. Apart from needing many more random markers compared to AIMs, the major disadvantage of using a panel of random markers without parental populations or external reference samples is the inability to label clusters. Taken together, the ability to develop dense panels of markers from commercial chips provides a fresh opportunity to conduct admixture mapping for disease genes in admixed populations.

References

Hoggart CJ, Shriver MD, Kittles RA, Clayton DG, McKeigue PM: Design and analysis of admixture mapping studies. Am J Hum Genet. 2004, 74 (5): 965-978. 10.1086/420855.
Article CAS PubMed Central PubMed Google Scholar
Montana G, Pritchard JK: Statistical tests for admixture mapping with case-control and cases-only data. Am J Hum Genet. 2004, 75 (5): 771-789. 10.1086/425281.
Article CAS PubMed Central PubMed Google Scholar
Patterson N, Hattangadi N, Lane B, Lohmueller KE, Hafler DA, Oksenberg JR, Hauser SL, Smith MW, O'Brien SJ, Altshuler D: Methods for high-density admixture mapping of disease genes. Am J Hum Genet. 2004, 74 (5): 979-1000. 10.1086/420871.
Article CAS PubMed Central PubMed Google Scholar
McKeigue PM: Prospects for admixture mapping of complex traits. Am J Hum Genet. 2005, 76 (1): 1-7. 10.1086/426949.
Article CAS PubMed Central PubMed Google Scholar
Smith MW, O'Brien SJ: Mapping by admixture linkage disequilibrium: advances, limitations and guidelines. Nat Rev Genet. 2005, 6 (8): 623-632. 10.1038/nrg1657.
Article CAS PubMed Google Scholar
Zhu X, Zhang S, Tang H, Cooper R: A classical likelihood based approach for admixture mapping using EM algorithm. Hum Genet. 2006, 120 (3): 431-445. 10.1007/s00439-006-0224-z.
Article PubMed Google Scholar
Kao WH, Klag MJ, Meoni LA, Reich D, Berthier-Schaad Y, Li M, Coresh J, Patterson N, Tandon A, Powe NR: MYH9 is associated with nondiabetic end-stage renal disease in African Americans. Nat Genet. 2008, 40 (10): 1185-1192. 10.1038/ng.232.
Article CAS PubMed Google Scholar
Kopp JB, Smith MW, Nelson GW, Johnson RC, Freedman BI, Bowden DW, Oleksyk T, McKenzie LM, Kajiyama H, Ahuja TS: MYH9 is a major-effect risk gene for focal segmental glomerulosclerosis. Nat Genet. 2008, 40 (10): 1175-1184. 10.1038/ng.226.
Article CAS PubMed Central PubMed Google Scholar
Zhu X, Luke A, Cooper RS, Quertermous T, Hanis C, Mosley T, Gu CC, Tang H, Rao DC, Risch N: Admixture mapping for hypertension loci with genome-scan markers. Nat Genet. 2005, 37 (2): 177-181. 10.1038/ng1510.
Article CAS PubMed Google Scholar
Deo RC, Patterson N, Tandon A, McDonald GJ, Haiman CA, Ardlie K, Henderson BE, Henderson SO, Reich D: A high-density admixture scan in 1,670 African Americans with hypertension. PLoS Genet. 2007, 3 (11): e196-10.1371/journal.pgen.0030196.
Article PubMed Central PubMed Google Scholar
Zhu X, Cooper RS: Admixture mapping provides evidence of association of the VNN1 gene with hypertension. PLoS One. 2007, 2 (11): e1244-10.1371/journal.pone.0001244.
Article PubMed Central PubMed Google Scholar
Reich D, Patterson N, De Jager PL, McDonald GJ, Waliszewska A, Tandon A, Lincoln RR, DeLoa C, Fruhan SA, Cabre P: A whole-genome admixture scan finds a candidate locus for multiple sclerosis susceptibility. Nat Genet. 2005, 37 (10): 1113-1118. 10.1038/ng1646.
Article CAS PubMed Google Scholar
Basu A, Tang H, Arnett D, Gu CC, Mosley T, Kardia S, Luke A, Tayo B, Cooper R, Zhu X: Admixture mapping of quantitative trait loci for BMI in African Americans: evidence for loci on chromosomes 3q, 5q, and 15q. Obesity. 2009, 17 (6): 1226-1231.
CAS PubMed Central PubMed Google Scholar
Cheng CY, Kao WH, Patterson N, Tandon A, Haiman CA, Harris TB, Xing C, John EM, Ambrosone CB, Brancati FL: Admixture mapping of 15,280 African Americans identifies obesity susceptibility loci on chromosomes 5 and X. PLoS Genet. 2009, 5 (5): e1000490-10.1371/journal.pgen.1000490.
Article PubMed Central PubMed Google Scholar
Cheng C-Y, Reich D, Coresh J, Boerwinkle E, Patterson N, Li M, North KE, Tandon A, Bailey-Wilson JE, Wilson JG: Admixture mapping of obesity-related traits in African Americans: the Atherosclerosis Risk in Communities (ARIC) Study. Obesity. 2010, 18 (3): 563-572. 10.1038/oby.2009.282.
Article PubMed Central PubMed Google Scholar
Scherer ML, Nalls MA, Pawlikowska L, Ziv E, Mitchell GF, Huntsman S, Hu D, Sutton-Tyrrell K, Lakatta EG, Hsueh WC: Admixture mapping of ankle-arm index: identification of a candidate locus associated with peripheral arterial disease. J Med Genet. 2010, 47 (1): 1-7. 10.1136/jmg.2008.064808.
Freedman ML, Haiman CA, Patterson N, McDonald GJ, Tandon A, Waliszewska A, Penney K, Steen RG, Ardlie K, John EM: Admixture mapping identifies 8q24 as a prostate cancer risk locus in African-American men. Proc Natl Acad Sci USA. 2006, 103 (38): 14068-14073. 10.1073/pnas.0605832103.
Bock CH, Schwartz AG, Ruterbusch JJ, Levin AM, Neslund-Dudas C, Land SJ, Wenzlaff AS, Reich D, McKeigue P, Chen W: Results from a prostate cancer admixture mapping study in African-American men. Hum Genet. 2009, 126: 637-642. 10.1007/s00439-009-0712-z.
Hughes LB, Morrison D, Kelley JM, Padilla MA, Vaughan LK, Westfall AO, Dwivedi H, Mikuls TR, Holers VM, Parrish LA: The HLA-DRB1 shared epitope is associated with susceptibility to rheumatoid arthritis in African Americans through European genetic admixture. Arthritis Rheum. 2008, 58 (2): 349-358. 10.1002/art.23166.
Reich D, Patterson N, Ramesh V, De Jager PL, McDonald GJ, Tandon A, Choy E, Hu D, Tamraz B, Pawlikowska L: Admixture mapping of an allele affecting interleukin 6 soluble receptor and interleukin 6 levels. Am J Hum Genet. 2007, 80 (4): 716-726. 10.1086/513206.
Molokhia M, Hoggart C, Patrick AL, Shriver M, Parra E, Ye J, Silman AJ, McKeigue PM: Relation of risk of systemic lupus erythematosus to west African admixture in a Caribbean population. Hum Genet. 2003, 112 (3): 310-318.
Elbein SC, Das SK, Hallman DM, Hanis CL, Hasstedt SJ: Genome-wide linkage and admixture mapping of type 2 diabetes in African American families from the American Diabetes Association GENNID (Genetics of NIDDM) Study Cohort. Diabetes. 2009, 58 (1): 268-274. 10.2337/db08-0931.
Nalls MA, Wilson JG, Patterson NJ, Tandon A, Zmuda JM, Huntsman S, Garcia M, Hu D, Li R, Beamer BA: Admixture mapping of white cell count: genetic locus responsible for lower white blood cell count in the Health ABC and Jackson Heart studies. Am J Hum Genet. 2008, 82 (1): 81-87. 10.1016/j.ajhg.2007.09.003.
Chiang CW, Gajdos ZK, Korn JM, Kuruvilla FG, Butler JL, Hackett R, Guiducci C, Nguyen TT, Wilks R, Forrester T: Rapid assessment of genetic ancestry in populations of unknown origin by genome-wide genotyping of pooled samples. PLoS Genet. 2010, 6 (3): e1000866-10.1371/journal.pgen.1000866.
Xu S, Jin L: A genome-wide analysis of admixture in Uyghurs and a high-density admixture map for disease-gene discovery. Am J Hum Genet. 2008, 83 (3): 322-336. 10.1016/j.ajhg.2008.08.001.
Smith MW, Patterson N, Lautenberger JA, Truelove AL, McDonald GJ, Waliszewska A, Kessing BD, Malasky MJ, Scafe C, Le E: A high-density admixture map for disease gene discovery in African Americans. Am J Hum Genet. 2004, 74 (5): 1001-1013. 10.1086/420856.
Mao X, Bigham AW, Mei R, Gutierrez G, Weiss KM, Brutsaert TD, Leon-Velarde F, Moore LG, Vargas E, McKeigue PM: A genomewide admixture mapping panel for Hispanic/Latino populations. Am J Hum Genet. 2007, 80 (6): 1171-1178. 10.1086/518564.
Tian C, Hinds DA, Shigeta R, Kittles R, Ballinger DG, Seldin MF: A genomewide single-nucleotide-polymorphism panel with high ancestry information for African American admixture mapping. Am J Hum Genet. 2006, 79 (4): 640-649. 10.1086/507954.
Kosoy R, Nassir R, Tian C, White PA, Butler LM, Silva G, Kittles R, Alarcon-Riquelme ME, Gregersen PK, Belmont JW: Ancestry informative marker sets for determining continental origin and admixture proportions in common populations in America. Hum Mutat. 2009, 30 (1): 69-78. 10.1002/humu.20822.
Tang H, Coram M, Wang P, Zhu X, Risch N: Reconstructing genetic ancestry blocks in admixed individuals. Am J Hum Genet. 2006, 79 (1): 1-12. 10.1086/504302.
Price AL, Patterson N, Yu F, Cox DR, Waliszewska A, McDonald GJ, Tandon A, Schirmer C, Neubauer J, Bedoya G: A genomewide admixture map for Latino populations. Am J Hum Genet. 2007, 80 (6): 1024-1036. 10.1086/518313.
Tian C, Hinds DA, Shigeta R, Adler SG, Lee A, Pahl MV, Silva G, Belmont JW, Hanson RL, Knowler WC: A genomewide single-nucleotide-polymorphism panel for Mexican American admixture mapping. Am J Hum Genet. 2007, 80 (6): 1014-1023. 10.1086/513522.
Tian C, Kosoy R, Lee A, Ransom M, Belmont JW, Gregersen PK, Seldin MF: Analysis of East Asia genetic substructure using genome-wide SNP arrays. PLoS One. 2008, 3 (12): e3862-10.1371/journal.pone.0003862.
Xu S, Jin L: A genome-wide analysis of admixture in Uyghurs and a high-density admixture map for disease-gene discovery. Am J Hum Genet. 2008, 83 (3): 322-336. 10.1016/j.ajhg.2008.08.001.
Adeyemo A, Gerry N, Chen G, Herbert A, Doumatey A, Huang H, Zhou J, Lashley K, Chen Y, Christman M: A genome-wide association study of hypertension and blood pressure in African Americans. PLoS Genet. 2009, 5 (7): e1000564-10.1371/journal.pgen.1000564.
Korn JM, Kuruvilla FG, McCarroll SA, Wysoker A, Nemesh J, Cawley S, Hubbell E, Veitch J, Collins PJ, Darvishi K: Integrated genotype calling and association analysis of SNPs, common copy number polymorphisms and rare CNVs. Nat Genet. 2008, 40 (10): 1253-1260. 10.1038/ng.237.
Wright S: The genetical structure of populations. Ann Eugen. 1951, 15: 323-354.
Wright S: Evolution and the Genetics of Populations, Vol. 4 Variability Within and Among Natural Populations. 1978, Chicago, Illinois: Univ. Chicago Press
Rosenberg NA, Li LM, Ward R, Pritchard JK: Informativeness of genetic markers for inference of ancestry. Am J Hum Genet. 2003, 73 (6): 1402-1422. 10.1086/380416.
Falush D, Stephens M, Pritchard JK: Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies. Genetics. 2003, 164 (4): 1567-1587.
Gao X, Starmer JD: AWclust: point-and-click software for non-parametric population structure analysis. BMC Bioinformatics. 2008, 9: 77-10.1186/1471-2105-9-77.
Price AL, Tandon A, Patterson N, Barnes KC, Rafaels N, Ruczinski I, Beaty TH, Mathias R, Reich D, Myers S: Sensitive detection of chromosomal segments of distinct ancestry in admixed populations. PLoS Genet. 2009, 5 (6): e1000519-10.1371/journal.pgen.1000519.

Acknowledgements

The study was supported by grants S06GM008016-320107 to CNR and S06GM008016-380111 to AA, both from the NIGMS/MBRS/SCORE Program. Participant enrollment was carried out at the Howard University General Clinical Research Center (GCRC), supported by grant number 2M01RR010284 from the National Center for Research Resources (NCRR), a component of the National Institutes of Health (NIH). The contents of this publication are solely the responsibility of the authors and do not necessarily represent the official view of NCRR or NIH. Additional support was provided by the Coriell Institute for Medical Research. This research was supported by the Intramural Research Program of the Center for Research on Genomics and Global Health (CRGGH). The CRGGH is supported by the National Human Genome Research Institute, the National Institute of Diabetes and Digestive and Kidney Diseases, the Center for Information Technology, and the Office of the Director at the National Institutes of Health (Z01HG200362).

Author information

Authors and Affiliations

Center for Research on Genomics and Global Health, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, 20892, USA
Guanjie Chen, Daniel Shriner, Jie Zhou, Ayo Doumatey, Hanxia Huang, Charles N Rotimi & Adebowale Adeyemo
Coriell Institute for Medical Research, Camden, NJ, 08103, USA
Norman P Gerry & Michael F Christman
Department of Genetics and Genomics, Boston University School of Medicine, Boston, Massachusetts, 02118, USA
Alan Herbert
National Human Genome Center, Howard University, Washington, DC, 20060, USA
Yuanxiu Chen, Georgia M Dunston & Mezbah U Faruque

Corresponding author

Correspondence to Adebowale Adeyemo.

Additional information

Authors' contributions

GC performed the statistical analysis and drafted the manuscript. DS participated in statistical analysis and interpretation and drafted the manuscript. JZ managed the data and participated in data analysis. AD and HH conducted molecular laboratory analysis. NPG conducted molecular laboratory analysis and genotype calling. AH and MFC conceived and designed the study. YC, GMD, and MUF contributed to study coordination and reviewed the manuscript. CNR and AA conceived and designed the study, participated in data interpretation, and wrote the manuscript. All authors read and approved the final manuscript.

Electronic supplementary material

12864_2009_3011_MOESM1_ESM.XLS

Additional file 1: Markers in the panel based on δ. δ, F_ST, and SIC values for AIMs in the panel based on δ. (XLS 374 KB)

12864_2009_3011_MOESM2_ESM.XLS

Additional file 2: Markers in the panel based on F_ST. δ, F_ST, and SIC values for AIMs in the panel based on F_ST. (XLS 348 KB)

12864_2009_3011_MOESM3_ESM.XLS

Additional file 3: Markers in the 2 k and 21 k random marker panels. δ, F_ST, and SIC values for markers in the 2 k and 21 k random marker panels. (XLS 4 MB)

12864_2009_3011_MOESM4_ESM.DOC

Additional file 4: Multidimensional scaling plot. Top four dimensions from multidimensional scaling plot showing HUFS in blue circles, CEU in red squares, and YRI in green diamonds. (DOC 102 KB)

12864_2009_3011_MOESM5_ESM.DOC

Additional file 5: Inter-marker genetic distances (excluding centromeres). Average inter-marker distances in the panels based on δ and F_ST and in the 2 k and 21 k random marker panels. (DOC 58 KB)

12864_2009_3011_MOESM6_ESM.DOC

Additional file 6: Distributions of δ, F_ST, and SIC for the AIMs panels. Genome-wide distributions of δ, F_ST, and SIC values for AIMs. Red represents values from the panel based on δ, blue represents values from the panel based on F_ST, and dark green represents values from the panel of 21 k random markers. Top) δ values. Middle) F_ST values. Bottom) SIC values. (DOC 597 KB)

Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Chen, G., Shriner, D., Zhou, J. et al. Development of admixture mapping panels for African Americans from commercial high-density SNP arrays. BMC Genomics 11, 417 (2010). https://doi.org/10.1186/1471-2164-11-417

Received: 25 November 2009
Accepted: 05 July 2010
Published: 05 July 2010
DOI: https://doi.org/10.1186/1471-2164-11-417
