A genome-wide Asian genetic map and ethnic comparison: The GENDISCAN study

Background Genetic maps provide specific positions of genetic markers, which are required for performing genetic studies. Linkage analyses of Asian families have been performed with Caucasian genetic maps, since appropriate genetic maps of Asians were not available. Different ethnic groups may have different recombination rates as a result of genomic variations, which would generate misspecification of the genetic map and reduce the power of linkage analyses. Results We constructed the genetic map of a Mongolian population in Asia with CRIMAP software. This new map, called the GENDISCAN map, is based on genotype data collected from 1026 individuals of 73 large Mongolian families, and includes 1790 total and 1500 observable meioses. The GENDISCAN map provides sex-averaged and sex-specific genetic positions of 1039 microsatellite markers in Kosambi centimorgans (cM) with physical positions. We also determined 95% confidence intervals of genetic distances of the adjacent marker intervals. Genetic lengths of the whole genome, chromosomes and adjacent marker intervals are compared with those of Rutgers Map v.2, which was constructed based on Caucasian populations (Centre d'Etudes du Polymorphisme Humain (CEPH) and Icelandic families) by mapping methods identical to those of the GENDISCAN map, CRIMAP software and the Kosambi map function. Mongolians showed approximately 1.9 fewer recombinations per meiosis than Caucasians. As a result, genetic lengths of the whole genome and chromosomes of the GENDISCAN map are shorter than those of Rutgers Map v.2. Thirty-eight marker intervals differed significantly between the Mongolian and Caucasian genetic maps. Conclusion The new GENDISCAN map is applicable to the genetic study of Asian populations. Differences in the genetic distances between the GENDISCAN and Caucasian maps could facilitate elucidation of genomic variations between different ethnic groups.


Background
Genetic maps provide specific positions of genetic markers, which are required for performing genetic studies. Linkage analyses, which aim to identify genetic loci related to human phenotypes and complex diseases, have been performed with Caucasian genetic maps even in Asian populations, because no comprehensive Asian genetic maps with dense markers have yet been introduced. Since multipoint methods are frequently used in linkage analyses, it is important to use correct maps for the population being studied [1].
Distance between adjacent genetic markers in genetic maps is calculated from average recombination rates between markers during meiosis with map functions. The Kosambi map function is widely used nowadays.
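For reference, the standard form of the Kosambi map function is given below, with the recombination fraction between two markers written as \theta and the map distance as d. The worked value uses a hypothetical \theta = 0.1 chosen only for illustration; it is not a value from this study.

```latex
\[
d_{\mathrm{cM}} = 100 \times \frac{1}{4}\,\ln\!\left(\frac{1 + 2\theta}{1 - 2\theta}\right),
\qquad
\theta = \frac{1}{2}\tanh\!\left(\frac{2\,d_{\mathrm{cM}}}{100}\right)
\]
% Illustrative value: \theta = 0.1 gives d = 25 \ln(1.2 / 0.8) = 25 \ln 1.5 \approx 10.14 cM
```

For small \theta the function is nearly linear (d in cM is close to 100\theta); it departs from linearity as \theta approaches 0.5, where it allows for multiple crossovers under interference.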
Other than for Caucasians, however, there are few human genetic maps for different ethnic groups. Although genetic maps of four different ethnic groups (African Americans, Mexican Americans, East Asians, and Whites) were recently constructed, the number of markers was quite small (n = 353) and the maps were constructed based on nuclear families [11]. Misspecification of genetic maps may reduce the power of linkage analyses [1,12], and different ethnic groups may have different recombination rates [13]. Therefore, separate genetic maps for Asian populations are needed to investigate the Asian genome more precisely.
We constructed an Asian genetic map with 1039 microsatellite markers using 1026 genotyped individuals in 73 large Mongolian families. This study was undertaken as part of the GENDISCAN (GENe DIScovery for Complex traits in isolated large families of Asians of Northeast) project. The Asian genetic map may be applicable to further linkage studies of Asian ethnic groups, as well as to understanding the genomic variations between Asians and Caucasians at megabase resolution.

Results
Files providing details of the GENDISCAN map are available (see additional file 1: Details of GENDISCAN map). They include, among other details, the genetic and physical positions, the genetic distance of intervals, the 95% confidence intervals of the genetic distance of intervals, the genetic distance of Rutgers Map v.2 intervals, the p-values denoting the significance levels for differences in the genetic distance between the GENDISCAN map and Rutgers Map v.2 intervals, the heterozygosity of markers, the number of informative meioses of markers and the number of informative meioses between markers.
We genotyped 73 families, consisting of 1446 family members and a total of 1790 meioses. Among the 1446 family members, 1026 were genotyped and 1500 meioses became available for investigation. Considering the heterozygosity of the 1039 microsatellite markers genotyped in this study, 47 to 1098 informative meioses of each marker (average, 711.5) were obtained using CRIMAP software [14]. Only 18 markers (1.7%) showed fewer than 400 informative meioses.
The GENDISCAN map shows the genetic positions of the markers, both sex-averaged and sex-specific, along with the physical positions. A summary of the map is presented in Table 1. The physical lengths of the chromosomes are also included. The GENDISCAN map covers 2703.1 Mb, which is 94.3% of the human genome assembly Build 36.2. When we excluded the telomeric heterochromatic regions, which contain extensive sequencing gaps, especially in the acrocentric chromosomes (13, 14, 15, 21 and 22), the coverage increased to 96.9%.
We compared our GENDISCAN map with Rutgers Map v.2, one of the most accurate genetic maps of Caucasians, which was generated from CEPH and Icelandic families and includes 28121 polymorphic markers with an average of 301 informative meioses [8]. Of the 1039 markers in the GENDISCAN map, 1006 microsatellite markers are also present in Rutgers Map v.2, and we took their Rutgers genetic positions directly from that map. The Rutgers genetic positions of the remaining 33 markers, which are not present in Rutgers Map v.2, were estimated by interpolation based on the physical positions of the markers in human genome assembly Build 36.2.
The sex-averaged, female and male whole genome lengths were 3230.6 cM, 3906.0 cM and 2394.0 cM, respectively, which are 5-9% shorter than the respective genome lengths of Rutgers Map v.2 (Figure 1). The genetic lengths of chromosomes of the sex-averaged, female and male maps are illustrated in Figures 2, 3 and 4, respectively. All chromosomal lengths of the GENDISCAN map were shorter than those of Rutgers Map v.2 except for male chromosomes 3 and 6, and sex-averaged chromosome 3. Paired t-tests demonstrated significant differences between the two maps in whole genome and chromosome genetic lengths (Table 2). The whole genome and chromosome 2 lengths of all three types of GENDISCAN map were significantly shorter than those of Rutgers Map v.2.
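The interpolation step for markers absent from the reference map can be sketched as follows. The text does not specify the interpolation scheme, so this minimal sketch assumes simple linear interpolation between the two nearest flanking reference markers; the function name `interpolate_cm` and the example coordinates are hypothetical.

```python
from bisect import bisect_left


def interpolate_cm(target_bp, ref_bp, ref_cm):
    """Estimate a genetic position (cM) from a physical position (bp).

    ASSUMPTION: linear interpolation between the two flanking reference
    markers; the paper only states that "an interpolation" was used.
    ref_bp must be sorted ascending and aligned with ref_cm.
    """
    if len(ref_bp) != len(ref_cm) or len(ref_bp) < 2:
        raise ValueError("need at least two aligned reference markers")
    i = bisect_left(ref_bp, target_bp)
    # Exact hit on a reference marker: reuse its genetic position.
    if i < len(ref_bp) and ref_bp[i] == target_bp:
        return ref_cm[i]
    # No flanking marker on one side: refuse to extrapolate.
    if i == 0 or i == len(ref_bp):
        raise ValueError("target lies outside the reference markers")
    left, right = i - 1, i
    frac = (target_bp - ref_bp[left]) / (ref_bp[right] - ref_bp[left])
    return ref_cm[left] + frac * (ref_cm[right] - ref_cm[left])


if __name__ == "__main__":
    # Hypothetical flanking markers at 1.0 Mb (10 cM) and 2.0 Mb (12 cM).
    print(interpolate_cm(1_500_000, [1_000_000, 2_000_000], [10.0, 12.0]))
```

Linear interpolation treats the recombination rate as constant between the flanking markers, which is one reason intervals with interpolated reference distances are best handled cautiously in map comparisons.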
Figure 1. Comparison of whole genome lengths between the GENDISCAN map and Rutgers Map v.2. Error bars represent the 95% confidence intervals from the paired t-tests.
Figure 2. Comparison of sex-averaged genetic lengths of the chromosomes between the GENDISCAN map and Rutgers Map v.2. Error bars represent the 95% confidence intervals from the paired t-tests.
The genetic distances between adjacent markers and the 95% confidence intervals were estimated. The average intermarker spacing was 2.66 Mb and 3.17 cM. The intermarker recombination rate was derived by dividing the intermarker genetic distance by the physical distance. The recombination rate patterns for chromosomes are illustrated in additional files (see additional files 2, 3, and 4: recombination rates of the sex-averaged, female and male maps, respectively). These figures are helpful in comparing the genome-wide recombination patterns of GENDISCAN and Rutgers Map v.2. The sex-averaged recombination rate patterns of chromosome 8p were quite different (Figure 5). We compared 1017 intermarker genetic distances between GENDISCAN and Rutgers Map v.2, and calculated p-values for the significance of these differences (Figure 6, see Methods). A histogram of these 1017 normalized intermarker-interval-differences, or z scores transformed from the corresponding p-values, showed that the distribution of intermarker-interval-differences between the GENDISCAN map and Rutgers Map v.2 was close to normal (Figure 7).
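The two per-interval quantities described here, the recombination rate in cM/Mb and a z score recovered from a p-value, can be sketched as below. This is an illustrative sketch, not the authors' code: it assumes two-sided p-values and takes the sign of the z score from the direction of the difference, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def recombination_rate(genetic_cm, physical_mb):
    """Recombination rate of one interval in cM per Mb."""
    if physical_mb <= 0:
        raise ValueError("physical distance must be positive")
    return genetic_cm / physical_mb


def p_to_signed_z(p_value, diff):
    """Convert a two-sided p-value into a signed z score.

    ASSUMPTION: p-values are two-sided and the sign of z follows the sign
    of `diff` (for example, GENDISCAN minus Rutgers distance).
    """
    if not 0 < p_value <= 1:
        raise ValueError("p-value must be in (0, 1]")
    # Use the lower tail so that very small p-values stay numerically stable.
    z = -NormalDist().inv_cdf(p_value / 2)
    return math.copysign(z, diff) if diff != 0 else 0.0


if __name__ == "__main__":
    # Average spacing reported in the text: 3.17 cM over 2.66 Mb.
    print(round(recombination_rate(3.17, 2.66), 2))
    print(round(p_to_signed_z(0.05, -1.0), 2))
```

Applied to the average spacing reported above, the genome-wide mean rate comes out at roughly 1.2 cM/Mb.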
Although most of the intervals of the GENDISCAN map and Rutgers Map v.2 were in good agreement, 40 intervals (3.9%) differed significantly after Bonferroni's multiple comparison correction (p-value < 4.9 × 10⁻⁵). Two of these 40 intervals were excluded, since their intermarker genetic distances on Rutgers Map v.2 were derived by interpolation. Thus, we identified 38 ethnically different marker intervals (Table 3). Differences in local genomic structure in these intervals may cause local recombination rate differences among ethnic groups.
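The reported threshold is consistent with a Bonferroni correction over the 1017 compared intervals, assuming a family-wise significance level of \alpha = 0.05. The text does not state this level explicitly, but it reproduces the reported cutoff:

```latex
\[
p_{\text{threshold}} = \frac{\alpha}{m} = \frac{0.05}{1017} \approx 4.9 \times 10^{-5},
\qquad
\frac{40}{1017} \approx 3.9\%
\]
```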

Discussion
The human genome varies among ethnic groups as a result of their diverse history. Complex phenotypes result from the interaction of different genes with the unique environments to which humans are exposed. Finding specific disease loci within families helps identify the genetic causes of complex diseases effectively.
The GENDISCAN study, which began in 2003, was designed to identify specific genetic loci and genes that influence complex traits and diseases in Northeast Asian populations. As the lifestyle of Northeast Asians has become more westernized, the prevalence of complex diseases, such as diabetes, obesity, cardiovascular diseases and cancers, has increased. Linkage analyses require appropriate genetic maps for identifying correct loci. Hence we constructed a genetic map of an Asian population as an initial step in our GENDISCAN study.
World populations can be grouped into nine clusters based on genetic distances: African; New Guinean and Australian; Pacific Islander; Southeast Asian; Northeast Asian; Arctic Northeast Asian; Amerind; North African and West Asian; and European [15,16]. The Northeast Asian cluster includes Japanese, North Chinese, Koreans and Mongolians. Since genetic distances within clusters are smaller than those between clusters, our genetic map of Mongolians may be more applicable to Japanese, North Chinese and Koreans than genetic maps of Caucasians.
When we compared the GENDISCAN map with the Caucasian Rutgers Map v.2, we found that the genetic length of the GENDISCAN map was 5-9% shorter. Genome-wide, Mongolians show about 1.9 fewer recombinations per meiosis than Caucasians. This reflects a general trend across all 1017 marker intervals rather than a difference confined to a few specific genomic regions. Although the genome-wide recombination rate patterns showed good agreement between the two ethnic groups, the recombination rates of Asians were generally lower. However, we also identified several regions in which the patterns did not correlate; that is, some recombination jungles in the GENDISCAN map appear as recombination deserts in Rutgers Map v.2, and vice versa.
A previous examination of ethnic differences in genetic maps identified no significant differences in genome-wide genetic length between Caucasians and Asians, but found a significant local difference on 8p, which is consistent with our finding [11]. The ethnic difference on 8p is likely due to a frequent local polymorphic inversion [11]. Interestingly, using the FLIPS option of CRIMAP, we found a suggestive inversion of marker order relative to the physical map on 8p22 (D8S520 and D8S1759, with a likelihood ratio of [inversion]/[original] = 17.8). The sex-averaged map of chromosome 8, reflecting the inversion, is available in additional file 5.
Since previous ethnic-specific maps were constructed using a small number of genetic markers (n = 353) and nuclear families, most of which had no grandparents and several children, those findings may be less robust than ours [11]. Generally, determining the phase of genotypes to find recombinations requires genetic information from three generations, or from two generations with many children.
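The link between map length and recombinations per meiosis follows from the definition of the Morgan: a map length of 100 cM corresponds to one expected crossover per meiosis. As a rough check using the sex-averaged genome length reported in the Results:

```latex
\[
\frac{3230.6\ \text{cM}}{100\ \text{cM per crossover}} \approx 32.3\ \text{crossovers per meiosis},
\qquad
1.9\ \text{crossovers} \times 100\ \text{cM} = 190\ \text{cM}
\]
```

A deficit of about 1.9 recombinations per meiosis therefore corresponds to a genome-wide map roughly 190 cM shorter.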
Recombination is related to diversity of DNA sequences, linkage disequilibrium (LD) and copy number changes [17]. Asian-Americans have fewer single nucleotide polymorphisms (SNPs) than European-Americans (5050 versus 6736), and fewer of those SNPs have minor allele frequencies (MAFs) > 5% (820 (16%) versus 1579 (23%)) [18].  [20]. These findings indicate that Asians have a more homogeneous genome than Caucasians, probably as a result of their low recombination rate.
Although genetic studies using SNP markers or resequencing of random populations rather than families provide fine-scale recombination pattern data, these data are indirect and may be biased by mutation, selection, drift and demography [21]. Most recombinations occur in short kilobase-scale regions known as recombination hotspots [22]. Fine-scale data have suggested that these hotspots and their intensities may differ among ethnic groups [23]. Moreover, population-specific hotspots have also been identified, and populations from geographically close regions tend to show similar hotspot intensities [24]. The properties of recombination hotspots are not well known, but some characteristics have been described [22]. The local DNA sequences of recombination hotspots contain more long terminal repeats of the retrotransposons THE1A and THE1B, as well as more CT-rich and GA-rich repeats [22]. Some DNA motifs, such as the CCTCCCT oligomer of THE1A and THE1B, and the CCCCACCCC oligomer within recombination hotspots, may be local DNA signals of recombination hotspots [22]. Additional studies of recombination patterns, with comparisons among different ethnic groups, are necessary to understand the nature of recombination hotspots. The present study, which found significantly different marker intervals between two ethnic groups, can facilitate further comparisons of genomic variations among ethnic groups.
[7,8], the marker-matched genome-wide genetic length of Rutgers Map v.2 is 2 cM shorter than that of Rutgers Map v.1 (data not shown). Moreover, construction of the GENDISCAN genetic map of chromosome 1 using fewer and fewer markers increases, rather than decreases, the genetic length (data not shown). Biologically, double recombinations are considered very rare in human meiosis, since not only is recombination uncommon (about 32 per genome per meiosis), but one chiasma also inhibits the formation of another chiasma nearby (positive interference).
The Kosambi map function, which is widely used and thought to reflect adequate levels of double recombination in humans, has been applied for calibrating the slight possibility of double recombination when constructing genetic maps [25]. Therefore, if non-Mendelian genotyping errors, which appear as double recombinations, were properly eliminated during the cleaning process, there is no reason to expect that small numbers of markers reduce genetic length in genetic maps.
The statistical methods used in the comparisons are also worthy of note. The paired t-test, which was used to compare the genetic lengths of the whole genome and chromosomes, is the method of choice for testing the difference between the averages of intervals, but not for testing the difference between sums of intervals (that is, the genetic lengths of the whole genome and chromosomes). However, since the interval distances of the GENDISCAN map are, on average, significantly shorter than those of Rutgers Map v.2, the sum of the interval distances of GENDISCAN is likely shorter as well. We therefore used paired t-tests to compare whole genome genetic lengths and to estimate the significance levels of their differences.
We assumed that the number of recombination events between markers would follow a binomial distribution, which we then approximated by a normal distribution to estimate the 95% confidence interval of each intermarker genetic distance (see Methods, Statistical Methods for details). Adjusted Wald methods were used for marker intervals whose estimated recombination fraction (θ) equals zero. The 95% confidence intervals of the GENDISCAN interval distances and the p-values for the significance of differences between GENDISCAN and Rutgers Map v.2 intervals must be interpreted with caution, since they were calculated under those assumptions. However, we believe that those values are helpful parameters for assessing the certainty of the estimates and for finding significant differences between genetic maps.
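The paired comparison of interval distances can be illustrated with a short sketch using only the Python standard library. The five interval distances below are made-up numbers for illustration, not data from either map, and the function name is hypothetical.

```python
import math
import statistics


def paired_t_statistic(a, b):
    """Return (t, degrees of freedom) for a paired t-test of a versus b.

    Each position in `a` and `b` is the genetic distance (cM) of the same
    marker interval on two different maps.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two aligned samples with at least two pairs")
    diffs = [x - y for x, y in zip(a, b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    n = len(diffs)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


if __name__ == "__main__":
    # HYPOTHETICAL interval distances in cM, for illustration only.
    map_a = [3.0, 2.5, 4.1, 1.8, 3.6]
    map_b = [3.4, 2.6, 4.5, 2.1, 3.5]
    t, df = paired_t_statistic(map_a, map_b)
    print(f"t = {t:.3f}, df = {df}")
```

Because every interval contributes one paired difference, the test statistic concerns the mean difference per interval; with the number of intervals fixed, a negative mean difference implies a shorter summed length, which is the reasoning used above.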

Conclusion
In summary, we constructed a genetic map with large Asian families. The GENDISCAN map may provide better results than Caucasian genetic maps in linkage analysis of Asians. We also found that the GENDISCAN map shows shorter genetic distances than a Caucasian genetic map, with Asians having 1.9 fewer recombination events per meiosis than Caucasians. The recombination rates of some marker intervals differed significantly between populations. Our results illustrate the differences in recombination patterns between ethnic groups and provide clues to their underlying genomic variations.

Subjects
Genetic mapping was performed as part of the GENDISCAN study, designed for linkage analysis of a number of complex traits of the Asian population. In 2006, we collected and genotyped 978 individuals in Dashbalbar, Dornod Province, Mongolia. Their relationships were determined from interviews and confirmed by genotype data (see Methods, Genotyping). Informed consent was obtained from all enrolled subjects, and the study protocol was approved by the institutional review board (IRB) of Seoul National University (approval number, H-0307-105-002).
The pedigree was too large for effective analysis: it comprised seven families, the largest of which included 949 genotyped subjects. Hence, we separated the seven families into 73 families, which caused 44 genotyped subjects to be included in more than one family. However, no meioses overlapped as a result of this procedure. The final pedigree included 1446 individuals, with 1026 subjects genotyped, and a total of 1790 meioses, with 895 for each gender, and 1500 meioses available for investigation.

Genotyping
Venous blood was collected and DNA was extracted from leukocytes using standard protocols. Genotyping was completed with 1039 microsatellite markers throughout the autosomes by deCODE genetics.
Genotyping errors were detected and removed using three software packages: nonpaternity was checked with PREST [26]; individual relationships other than paternity were identified and corrected with PEDCHECK [27]; and non-Mendelian errors were investigated with SimWalk [28]. Mendelian and non-Mendelian errors constituted 0.13% and 0.26%, respectively, of all genotype data.

Genetic Mapping
After correcting genotype errors, the GENDISCAN map was generated with CRIMAP software [14]. The orders of markers were determined by comparison with the physical map of the human genome assembly Build 36.2. We confirmed the order of markers with the FLIPS option.
The FIXED option of CRIMAP was used to calculate recombination fractions between two successive markers. The Kosambi map function was used for the estimation of genetic distances. Both sex-specific and sex-averaged genetic maps were generated.

Statistical Methods
The genetic lengths of the whole genome and each chromosome were compared between the GENDISCAN and Rutgers Map v.2 using paired t-tests. For these comparisons, all 1017 intermarker intervals contributed to genome-wide genetic distance estimation and the marker intervals on the particular chromosome to the genetic distance of each chromosome.
To estimate the 95% confidence interval of each intermarker genetic distance of the GENDISCAN map, we assumed that the number of recombination events between adjacent markers would follow a binomial distribution (Nr ~ B(Nm, θ): Nr, the number of recombinations; Nm, the number of informative meioses; θ, the recombination fraction). This distribution was then approximated by a normal distribution, since the number of meioses between markers was not small (n > 30). We obtained θ and Nm of each intermarker interval from the CRIMAP software [14], which calculates recombination fractions through both two-point and multipoint maximum likelihood estimation. For example, if M1, M2, M3 and M4 are genetic markers with the correct order in a pedigree, the CRIMAP software estimates θ between M2 and M3, even if the interval (M2, M3) is not informative, using the flanking informative markers (M1 and M4). Therefore, the Nm obtained above is not the exact value used in calculating the final recombination fraction, but serves as an approximate indication of how many meioses contributed to the estimate of θ.
The final estimates of θ and Nm for each interval were used to calculate the 95% confidence intervals of recombination fractions using the following equations:
\[
\theta \pm 1.96 \times \mathrm{SE}(\theta), \qquad \mathrm{SE}(\theta) = \sqrt{\frac{\theta\,(1 - \theta)}{N_m}}
\]
which are from the normal distribution and binomial distribution, respectively. In the case where the estimated θ equals 0, we applied the adjusted Wald method, in which θ is replaced by θ' = 2/(Nm + 4) [8]. In this study, 25 of the 1017 intermarker recombination fractions were calculated by adjusted Wald methods. Finally, the 1017 recombination fractions and the confidence intervals were transformed into genetic distances (cM) by the Kosambi map function. In addition, the statistical significance levels of differences in θ between GENDISCAN and Rutgers Map v.2 were tested based on the confidence intervals. Bonferroni's multiple comparison correction method was applied to determine significant differences (p-value < 4.9 × 10⁻⁵).
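The confidence-interval procedure described in this section can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: it uses the normal-approximation (Wald) interval with a binomial standard error, substitutes θ' = 2/(Nm + 4) only when θ is zero as the text describes, and clips the bounds to the range where the Kosambi function is defined. The clipping and all function names are assumptions made for this sketch.

```python
import math

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution


def kosambi_cm(theta):
    """Kosambi map distance in cM for a recombination fraction theta."""
    if not 0 <= theta < 0.5:
        raise ValueError("theta must be in [0, 0.5)")
    return 25.0 * math.log((1 + 2 * theta) / (1 - 2 * theta))


def theta_ci(theta, n_meioses, z=Z_975):
    """95% confidence interval (lower, upper) for a recombination fraction.

    ASSUMPTIONS: normal approximation with binomial standard error; when
    theta == 0 it is replaced by 2 / (n_meioses + 4) as stated in the text;
    bounds are clipped to [0, 0.5), a choice made only for this sketch.
    """
    if n_meioses <= 0:
        raise ValueError("number of informative meioses must be positive")
    if theta == 0:
        theta = 2.0 / (n_meioses + 4)
    se = math.sqrt(theta * (1 - theta) / n_meioses)
    lower = max(0.0, theta - z * se)
    upper = min(0.499999, theta + z * se)
    return lower, upper


def distance_ci_cm(theta, n_meioses):
    """Transform the confidence bounds on theta into Kosambi cM."""
    lower, upper = theta_ci(theta, n_meioses)
    return kosambi_cm(lower), kosambi_cm(upper)


if __name__ == "__main__":
    # HYPOTHETICAL interval: theta = 0.03 estimated from 700 meioses.
    print(round(kosambi_cm(0.03), 2), distance_ci_cm(0.03, 700))
```

Because the Kosambi function is monotonic in θ, transforming the two bounds of the θ interval yields a valid interval on the cM scale without re-deriving a standard error for the genetic distance.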