Analysis of human meiotic recombination events with a parent-sibling tracing approach
© Lee et al; licensee BioMed Central Ltd. 2011
Received: 9 December 2010
Accepted: 26 August 2011
Published: 26 August 2011
Meiotic recombination ensures that each child inherits distinct genetic material from each parent, but the distribution of crossovers along meiotic chromosomes remains difficult to identify. In this study, we developed a parent-sibling tracing (PST) approach from previously reported methods to identify meiotic crossover sites in the GEO GSE6754 data set. This approach requires only the single nucleotide polymorphism (SNP) data of both parents and at least two of their children.
Compared to other SNP-based algorithms (identity by descent or pediSNP), fewer uninformative SNPs were derived with the use of PST. Analysis of a GEO GSE6754 data set containing 2,145 maternal and paternal meiotic events revealed that the pattern and distribution of paternal and maternal recombination sites vary along the chromosomes. Lower crossover rates near the centromeres were more prominent in males than in females. Based on analysis of repetitive sequences, we also showed that recombination hotspots are positively correlated with SINE/MIR repetitive elements and negatively correlated with LINE/L1 elements. The number of meiotic recombination events was positively correlated with the number of shorter tandem repeat sequences.
The advantages of the PST approach include the ability to use only two-generation pedigrees with two siblings and the ability to perform gender-specific analyses of repetitive elements and tandem repeat sequences while including fewer uninformative SNP regions in the results.
Meiotic recombination is important for generating genetic diversity. Meiotic recombination occurs between homologous chromosomes during chiasmata formation, a process that is required for normal chromosomal segregation during meiosis. While variation in recombination rates is a ubiquitous feature of the human genome, the mechanisms governing the distribution of crossovers along meiotic chromosomes remain largely unclear, with the exception of the recent discovery that Prdm9 is involved in the activation of mammalian recombination hotspots [2–5]. Sex-specific effects [6–8] on regional meiotic recombination have been described. Recombination rates are approximately 1.7-fold higher in female meiosis than in male meiosis. In addition, crossover rates in males are 5-fold lower near centromeres but 10-fold higher near telomeres compared with those in females. These differences could be related to sex-specific patterns of initiation of synapses between homologs. For example, synaptonemal complex lengths are shorter in males than in females, and synapses appear preferentially in subtelomeric regions in males.
Meiotic recombination events can be measured directly or indirectly. Physical crossovers between homologous chromosomes, indicating meiotic recombination events, can be directly observed at specific time points during spermatogenesis. Alternatively, crossovers may be analyzed directly in cytogenetic analysis by labeling meiosis-related proteins, such as MLH1. Despite the unequivocal value of direct analysis, these techniques are labor-intensive and their precision is limited. Therefore, most analyses of human recombination currently rely on indirect approaches such as genetic linkage analysis of human pedigrees. This involves tracking the inheritance of alleles at multiple polymorphic markers (short tandem repeat polymorphisms, STRP; or single nucleotide polymorphisms, SNP) along the chromosomes across generations [15–17].
Based on the distribution of SNPs in both parents and multiple siblings, meiotic crossover sites in human chromosomes can be identified. This method was first proposed by Coop et al. in 2008 to trace the "informative markers" transmitted by the father to each offspring. They defined the "informative markers" as SNPs that are heterozygous in the father and homozygous in the mother. In 2009, Chowdhury et al. used two datasets, namely, the Autism Genetic Research Exchange (AGRE) and the Framingham Heart Study (FHS), to characterize the variation in recombination phenotypes. They analyzed sex differences and recombination jungles across the human genome, and described the gene loci associated with recombination phenotypes.
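The definition of a paternally informative marker given above can be illustrated with a minimal sketch. The genotype encoding (two-letter strings such as "AB", with None for a no-call), the SNP names, and the function names are assumptions made for this example only; they are not taken from Coop et al. or from the PSTReader program.

```python
def is_paternally_informative(father, mother):
    """Return True when the father is heterozygous and the mother homozygous.

    Genotypes are hypothetical two-letter strings (e.g. "AB"); None marks a no-call.
    """
    if father is None or mother is None:
        return False  # a missing call cannot be classified
    father_het = father[0] != father[1]
    mother_hom = mother[0] == mother[1]
    return father_het and mother_hom


def paternally_informative_snps(parent_genotypes):
    """Keep SNP names whose (father, mother) genotype pair is informative."""
    return [
        snp
        for snp, (father, mother) in parent_genotypes.items()
        if is_paternally_informative(father, mother)
    ]


# Toy, made-up genotypes for five SNPs: (father, mother)
example_genotypes = {
    "snp1": ("AB", "AA"),  # informative
    "snp2": ("AA", "AB"),  # father homozygous
    "snp3": ("AB", "AB"),  # mother heterozygous
    "snp4": ("AB", "BB"),  # informative
    "snp5": (None, "AA"),  # no-call in father
}
informative = paternally_informative_snps(example_genotypes)
```

Swapping the roles of the two parents in the same test would give the maternally informative markers.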
In this study, we have used a parent-sibling tracing (PST) approach, which was derived from two previous reports [6, 20], to analyze the Genomic Medicine Research Core Laboratory, Taiwan (GMRCL) dataset of Affymetrix SNP6.0 arrays, which consists of 900 K SNP markers, and the GSE6754 dataset from Gene Expression Omnibus (GEO), which consists of 853 families. Our analyses of this dataset of 2,145 meioses resulted in a 1-Mb-resolution recombination map. In addition, we were able to characterize the relationships between recombination sites and repetitive elements as well as the relationships between recombination sites and tandem repeat sequences.
Comparison of the code calling schemas between the IBD and PST methods showed that IBD identified fewer genotyping combination calls than the PST approach. For instance, when we analyzed the recombination sites in the 100-kb genomic region located at 114.6 Mb on chromosome 1 (Figures 2B and 2C, indicated with the arrow), the numbers of uninformative SNPs in the recombination site for the IBD and PST methods were 22 and 19, respectively (Figures 2D and 2E), resulting in uninformative regions of 54 kb for the IBD method (Figure 2D) and 48 kb for the PST approach (Figure 2E).
Table 1. Comparison of the size and SNP numbers in uninformative regions
Q2 (Q1 - Q3) kb †
Q2 (Q1 - Q3) kb †
110 (58 - 336)
60 (23 - 157)
61 (19 - 189)
3291 (2255 - 5738)
1751 (1270 - 3347)
2683 (1249 - 5796)
1806 (947 - 3389)
3768 (1858 - 6420)
1701 (938 - 2853)
2842 (1145 - 5789)
2151 (1234 - 3712)
3557 (1877 - 6415)
1892 (1195 - 3230)
2046 (1130 - 4031)
The Affymetrix Human Mapping 10 K 2.0 Arrays (containing 10 K SNPs) were used to map autism susceptibility loci in the GSE6754 dataset. Three three-generation pedigrees (family ID: 3117, 3180, 8071) were selected to compare the usefulness of the IBD and PST methods. Since the 10 K 2.0 array covered fewer SNPs, the mean size of uninformative regions was about 20-fold higher and the number of uninformative SNPs was approximately 6-fold lower than those of SNP 6.0 Arrays. Compared to other approaches, the PST approach identified fewer uninformative SNPs and smaller uninformative genomic regions (Table 1).
Table 2. Number of recombination sites in 2,145 siblings from 853 families
To identify the regions with the highest and the lowest number of recombination events, we scanned the entire human genome. We first divided the genome into 2,765 bins of 1 Mb each. We then identified the number of recombination sites in each bin separately for female and male meioses. The results obtained from chromosome 1 are shown in Figure 3B (see Additional file 3 for the results on other chromosomes). We also compared the recombination maps obtained from dataset GSE6754 with the Marshfield map (Figure 3B, middle panel) and the Icelandic map (Figure 3B, lower panel). The correlation coefficients between the GSE6754 map and the Icelandic and Marshfield maps were r = 0.49 and r = 0.31, respectively.
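The map comparison described above reduces to a correlation between two vectors of per-bin values. The following is a minimal sketch, assuming a Pearson coefficient (the text does not name the coefficient used) and using made-up per-bin values; the function name and the toy numbers are illustrative only and do not reproduce the reported r values.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of per-bin values."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 bins")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values for five 1-Mb bins: counts from one map, rates from another
map_a_counts = [2, 1, 0, 0, 1]
map_b_rates = [1.8, 1.1, 0.2, 0.4, 0.9]
r_example = pearson_r(map_a_counts, map_b_rates)
```

In a genome-wide comparison the two lists would each hold one value per 1-Mb bin, in the same bin order.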
Table 3. Correlation of the distance from the recombination site to the centromere with the number of recombination events
Table 4. Correlation between the recombination sites and particular repeats
We divided the genome into 2,765 bins of 1 Mb each and determined the number of tandem repeats in each bin. We then analyzed the correlation between the number of maternal meiotic recombination sites and the number of tandem repeats (Figure 6B); the correlation coefficient was 0.11 (P < 2 × 10^-7). Furthermore, we grouped tandem repeats into 4 quartiles by the length of these repeat sequences, as (Q1) 1-4, (Q2) 5-15, (Q3) 16-24 and (Q4) > 25 bp. The correlation coefficients between recombination sites and the 4 quartiles were 0.25 (P < 1 × 10^-16), 0.11 (P < 2 × 10^-8), 0.04 (P = 0.08) and 0.03 (P = 0.16), respectively (Figures 6C-F). These results showed that the maternal meiotic recombination sites were positively correlated with shorter repeat sequences and less correlated with longer repeat sequences. Similarly, we analyzed the correlation between the number of paternal meiotic recombination sites and the number of tandem repeats, with r = 0.12 (P < 5 × 10^-9). The correlation coefficients for the 4 subgroups were 0.19 (P < 1 × 10^-16), 0.09 (P < 4 × 10^-6), 0.09 (P < 3 × 10^-6) and 0.05 (P = 0.004), respectively (Additional file 4).
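The grouping of tandem repeats by length and their counting per 1-Mb bin can be sketched as follows. This is an illustration under stated assumptions: the input format (start position, repeat length in bp) and the function names are hypothetical, and a length of exactly 25 bp, which the stated class limits leave unassigned, is placed in Q4 here.

```python
def length_class(repeat_len_bp):
    """Assign a repeat length to one of the four classes given in the text."""
    if repeat_len_bp <= 4:
        return "Q1"
    if repeat_len_bp <= 15:
        return "Q2"
    if repeat_len_bp <= 24:
        return "Q3"
    return "Q4"  # assumption: 25 bp is grouped with the longest class


def count_repeats_per_bin(repeats, n_bins, bin_size_bp=1_000_000):
    """Count repeats per bin, separately for each length class.

    `repeats` is a list of (start_bp, repeat_len_bp) tuples on one chromosome.
    """
    counts = {q: [0] * n_bins for q in ("Q1", "Q2", "Q3", "Q4")}
    for start_bp, repeat_len_bp in repeats:
        bin_index = start_bp // bin_size_bp
        if 0 <= bin_index < n_bins:  # ignore positions outside the binned range
            counts[length_class(repeat_len_bp)][bin_index] += 1
    return counts


# Made-up repeats on a 3-Mb toy chromosome
toy_repeats = [(100_000, 2), (1_200_000, 10), (1_900_000, 3),
               (2_500_000, 30), (2_600_000, 20)]
toy_counts = count_repeats_per_bin(toy_repeats, n_bins=3)
```

Each per-class list can then be correlated with the per-bin number of maternal or paternal recombination sites.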
In this study, we used a PST approach to analyze the sites of meiotic recombination in two-generation pedigrees. We first tested it on a GMRCL dataset of the Affymetrix SNP 6.0 array consisting of 900 K SNP markers, followed by a 10 K GSE6754 dataset. In the GSE6754 dataset, which was previously used for mapping autism risk loci, most data are based on two-generation pedigrees (1,168 families) as this dataset contains only 29 three-generation pedigrees. Although the PST approach requires only pedigrees of two generations, it requires information from at least two siblings. The use of SNPs as genetic markers to identify recombination sites can often result in the inclusion of uninformative regions. However, the uninformative regions that result from the PST approach are significantly smaller than those from the IBD method (Table 1).
We next assessed whether crossovers may alter the DNA sequence by causing de novo mutations at sites of recombination. Given that the uninformative regions of PST were relatively small, eight recombination events were identified with sizes of less than 2 kb. Notably, we did not identify any sequence variation at these recombination points (data not shown). This observation needs further validation by sequencing more datasets.
The average number of recombination events observed with the PST approach was similar to the findings of other studies. The distribution of recombination events showed a mean value of 23.8 for paternal origin and 39.5 for maternal origin. Chowdhury et al. reported that genome-wide recombination events ranged from 25.9 to 27.3 for paternal origin and from 38.4 to 47.2 for maternal origin. Another study by Cheung et al. demonstrated that the mean numbers of recombination events were 24.0 in male meiosis and 38.4 in female meiosis.
In an indirect pedigree analysis using SNPs as genetic markers, Cheung et al. reported that several recombination events appeared to occur nearer to the telomeres. Using the PST approach, we analyzed the distance between the recombination site and the centromere for each gender separately (Table 3). In male meiosis, most of the crossovers are located in the q arms, and the number of recombination events increased significantly when moving from centromeres to telomeres. Interestingly, we observed fewer recombination events in the p arms of female chromosomes, resulting in the male-to-female ratio of 1.67 (Table 2). In women, only chromosomes 1q and 6q showed a significant, positive correlation between the number of recombination sites and distance from the centromere (Table 3).
To determine the extensive sequence-context variation in recombination hotspots, Myers et al. constructed a fine-scale map of recombination rates and hotspots across the human genome based on genotypes of 1.6 million SNPs in three sample populations, including 24 European Americans, 23 African Americans, and 24 Han Chinese. The authors reported an increase of recombination hotspots in the regions surrounding coding genes, though these were preferentially located outside the transcribed regions. The analysis of the relationships between recombination hotspots and repeat elements indicated that L2 and THE1B are unusually high in hotspots, whereas L1 elements are low. In this study, we identified a similar pattern of frequent hotspots in L2 as opposed to the low number of hotspots in L1 elements (Table 4). Of note, results showed that the majority of the hotspots in both paternal and maternal meioses were similar.
Human chromosomes are characterized by prominent differences in the pattern and rate of meiotic recombination events. Significant inter-individual and gender differences also exist. The major advantages of the PST approach include the use of two-generation pedigrees with two or more siblings, fewer uninformative SNP regions, and the ability to perform gender-specific analyses of recombination hotspots (using databases derived from high density arrays such as Affymetrix SNP6.0) and repetitive elements. An accurate determination of meiotic crossovers using this approach may prove useful to explore the biology of human chromosomes.
In the present study we compared different SNP-based methods for detecting recombination points, i.e. IBD (Figure 1A) and PST (Figure 1B). The code calling schemas for the IBD and PST methods are depicted in Additional files 1A and 1B. The meiotic recombination sites were exported from the PSTReader, a MATLAB-based program (version 7.9). The PSTReader was used to define the recombination sites for the IBD and PST methods. The MATLAB source code, example data, and a standalone application can be freely downloaded from: http://www.mcu.edu.tw/department/biotec/en_page/PSTReader/index.htm.
In this study, a set of the Affymetrix Genome-Wide Human SNP array 6.0 (GMRCL dataset) consisting of 900 K SNP markers was used as a template. DNA was extracted from blood collected in a study that was approved by the Chang Gung Memorial Hospital Institute Review Board (IRB#99-0229B). SNP genotyping was performed using the SNP array 6.0 (Affymetrix, Santa Clara, CA, http://www.affymetrix.com) at the Genomic Medicine Research Core Laboratory (GMRCL), Chang Gung Memorial Hospital. The GMRCL dataset includes the genotypes of an anonymous family consisting of the paternal/maternal grandfather, paternal/maternal grandmother, father, mother and two children. The identity-delinked SNP genotypes and pedigree information for each member can be downloaded from http://www.mcu.edu.tw/department/biotec/en_page/PSTReader/index.htm.
The GSE6754 dataset was downloaded from the Gene Expression Omnibus (GEO), and contains information from 6,971 Affymetrix GeneChip Human Mapping 10 K 2.0 Arrays. Data from parental and sibling genotypes are available for 1,168 families in this dataset. To increase analytic accuracy, we excluded samples with genotyping call rates less than 90%, those lacking pedigree information, and individuals with chromosomal abnormalities (n = 22). The remaining 3,864 arrays of 853 families (1,721 parents and 2,145 siblings) were included in the PST analysis of recombination events in human meiosis. The details on individuals, families, and pedigrees are provided in Additional file 5.
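The call-rate exclusion described above can be sketched as a simple per-sample filter. This is an illustrative sketch only: the genotype encoding (None for a no-call), the sample identifiers, and the function names are assumptions, and the other exclusion criteria (missing pedigree information, chromosomal abnormalities) are not modeled.

```python
def call_rate(genotypes):
    """Fraction of SNPs with a genotype call; None marks a no-call (assumed encoding)."""
    if not genotypes:
        raise ValueError("sample has no genotypes")
    called = sum(1 for g in genotypes if g is not None)
    return called / len(genotypes)


def passes_call_rate(genotypes, min_call_rate=0.90):
    """Keep samples whose call rate is not below the 90% threshold stated in the text."""
    return call_rate(genotypes) >= min_call_rate


# Hypothetical samples with ten SNPs each
samples = {
    "sample_a": ["AA", "AB", "BB", "AA", "AB", "AA", "BB", "AB", "AA", None],
    "sample_b": ["AA", None, "BB", "AA", None, "AA", "BB", "AB", "AA", "AB"],
}
kept_samples = [name for name, g in samples.items() if passes_call_rate(g)]
```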
The recombination sites and repetitive elements were mapped using the hg18 (NCBI Build 36) human reference assembly. The classes and characters of major repetitive elements were downloaded from RepeatMasker, and the tandem repeat sequences were identified using the Tandem Repeats Finder program. Correlations between recombination sites and repetitive elements or tandem repeat sequences were analyzed with MATLAB (version 7.9). To assess the distribution and correlation between recombination sites and repetitive elements or tandem repeat sequences, we calculated the number of recombination sites (or repetitive elements or tandem repeat sequences) using a window width set to 1 Mb. We divided the human genome into 2,765 bins of 1 Mb each and determined the number of recombination sites in each bin. The distance for each 1 Mb window was calculated based on SNP positions according to the Affymetrix data, assuming a constant crossover rate between two adjacent SNP markers. To calculate the correlation coefficients between the GSE6754 map, the Icelandic map and the Marshfield map, we divided the human genome into 2,765 bins of 1 Mb each and determined the number of recombination sites in each bin, as described above.
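One possible reading of the constant-rate assumption is that a crossover localized only to the interval between two flanking SNP markers is spread over the 1-Mb bins in proportion to the overlap of that interval with each bin. The sketch below illustrates this reading; it is not the PSTReader implementation, and the function name, coordinates, and half-open interval convention are assumptions made for the example.

```python
def spread_crossover(start_bp, end_bp, n_bins, bin_size_bp=1_000_000):
    """Distribute one crossover over bins, proportionally to interval overlap.

    The crossover is assumed to lie somewhere in the half-open interval
    [start_bp, end_bp) with uniform probability (constant rate between markers).
    """
    span = end_bp - start_bp
    if span <= 0:
        raise ValueError("end_bp must be greater than start_bp")
    weights = [0.0] * n_bins
    first_bin = start_bp // bin_size_bp
    # Cap at the last available bin; any part beyond the binned range is dropped
    last_bin = min((end_bp - 1) // bin_size_bp, n_bins - 1)
    for b in range(first_bin, last_bin + 1):
        overlap_start = max(start_bp, b * bin_size_bp)
        overlap_end = min(end_bp, (b + 1) * bin_size_bp)
        weights[b] = (overlap_end - overlap_start) / span
    return weights


# A crossover bracketed by markers at 0.8 Mb and 1.2 Mb straddles two bins
toy_weights = spread_crossover(800_000, 1_200_000, n_bins=3)
```

Summing such weight vectors over all crossovers gives a per-bin count that can be fractional when an uninformative region spans a bin boundary.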
IBD: identity by descent
IBS: identity by state
STRP: simple tandem repeat polymorphisms
SNP: single nucleotide polymorphisms.
This study was supported by grants: NSC 97-2320-B-130-001-MY2 (to YS Lee), NSC 98-3112-B-001-027 from the National Research Program for Genomic Medicine (to YS Lee and CH Chen); DOH99-TD-C-111-006 (to A Chao and TH Wang) and DOH99-TD-I-111-TM013 (to TH Wang) from the Department of Health, Taiwan; and CMRPG340463 (to TH Wang) from the Chang Gung Medical Foundation. The authors wish to thank Dr. Chi-Nue Tsai (Chang Gung University) and Dr. Shih-Tien T. Wang of Children's Hospital of Wisconsin, Milwaukee, for helpful discussion.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.