Recent and historical recombination in the admixed Norwegian Red cattle breed
© Sodeland et al; licensee BioMed Central Ltd. 2011
Received: 14 September 2010
Accepted: 14 January 2011
Published: 14 January 2011
Skip to main content
© Sodeland et al; licensee BioMed Central Ltd. 2011
Received: 14 September 2010
Accepted: 14 January 2011
Published: 14 January 2011
Comparison of recent patterns of recombination derived from linkage maps to historical patterns of recombination from linkage disequilibrium (LD) could help identify genomic regions affected by strong artificial selection, appearing as reduced recent recombination. Norwegian Red cattle (NRF) make an interesting case study for investigating these patterns as it is an admixed breed with an extensively recorded pedigree. NRF have been under strong artificial selection for traits such as milk and meat production, fertility and health.
While measures of LD is also crucial for determining the number of markers required for association mapping studies, estimates of recombination rate can be used to assess quality of genomic assemblies.
A dataset containing more than 17,000 genome-wide distributed SNPs and 2600 animals was used to assess recombination rates and LD in NRF. Although low LD measured by r2 was observed in NRF relative to some of the breeds from which this breed originates, reports from breeds other than those assessed in this study have described more rapid decline in r2 at short distances than what was found in NRF. Rate of decline in r2 for NRF suggested that to obtain an expected r2 between markers and a causal polymorphism of at least 0.5 for genome-wide association studies, approximately one SNP every 15 kb or a total of 200,000 SNPs would be required. For well known quantitative trait loci (QTLs) for milk production traits on Bos Taurus chromosomes 1, 6 and 20, map length based on historic recombination was greater than map length based on recent recombination in NRF.
Further, positions for 130 previously unpositioned contigs from assembly of the bovine genome sequence (Btau_4.0) found using comparative sequence analysis were validated by linkage analysis, and 28% of these positions corresponded to extreme values of population recombination rate.
While LD is reduced in NRF compared to some of the breeds from which this admixed breed originated, it is elevated over short distances compared to some other cattle breeds. Genomic regions in NRF where map length based on historic recombination was greater than map length based on recent recombination coincided with some well known QTL regions for milk production traits.
Linkage analysis in combination with comparative sequence analysis and detection of regions with extreme values of population recombination rate proved to be valuable for detecting problematic regions in the Btau_4.0 genome assembly.
The historical pattern of recombination in the population of genomes of a species or breed contain an enormous amount of information on history of population size, including expansions and contractions, gene flow between other breeds, and selection . It has also been demonstrated that rate of recombination is not uniform across a chromosomal segment, rather recombination events tend to occur in recombination hotspots [2, 3]. The pattern of linkage disequilibrium (LD) in the current generation of a species reflects all of these processes. While the pattern of LD therefore contains much information, deciphering the relative contribution of each process to the current pattern of LD is challenging [1, 4–11].
Some additional insight into the relative contribution of each process can be gained from comparing historical patterns of recombination inferred from LD to recent patterns of recombination inferred from genetic maps. One hypothesis would be that in genome regions where large discrepancies are observed between map distances inferred from LD and genetic map distances, strong selection is occurring. Norwegian Red cattle (NRF) was developed mainly through crosses of old Norwegian breeds with other Scandinavian breeds like Swedish Red and White, Black and White Swedish and Finnish Ayrshire. Pedigree data has been recorded since formation of NRF, and the breed has been under strong artificial selection for traits such as milk and meat production, fertility and health. A further attraction of using NRF for this type of study is the extensive pedigree data available, assisting determination of frequency of recombination events between adjacent markers. The extent of LD in cattle has been investigated in a number of studies [7, 12–15]. Relative to humans cattle display elevated LD, which is likely due to small recent effective population size generally observed in livestock populations [7, 12–16]. Previous studies have shown some variation between cattle breeds in rate of decline in LD with increasing distance between genetic markers [15, 17, 18], which is also at least partly attributable to population history.
Another application for recombination rate estimates is within validation of positioning and assembly of genome contigs by linkage analysis. The bovine genome has recently been sequenced by a combined bacterial artificial chromosome and whole-genome shotgun approach . The resulting Btau_4.0 assembly has contig and scaffold N50 sizes of 48.7 kb and 1.9 Mb respectively, and represents 95% of the total genome sequence placed on the 29 autosomes and the X chromosome. Construction of genetic maps in NRF was used to assess quality of the Btau_4.0 assembly and indicated a positional error rate of less than 0.8% .
The sequencing and assembly of larger genomes is a complex task with many challenges, and will usually result in imperfect assemblies. The desire to build a complete assembly is often at odds with the application of stringent merging criteria, and a compromise strategy resulting in longer scaffolds containing some assembly errors is usually the end result [20–22].
Aim of this study was to provide maps of historic and recent recombination rate in NRF, and then to attempt to use these to infer aspects of population history. Recombination rate information was also used to assess quality of the Btau_4.0 assembly.
For NRF the highest and lowest chromosomal mean values for r2 were found on BTA22 and BTA1, and highest and lowest mean values of r2 for inter marker distances less than 10 Mb were found for BTA5 and BTA19. Chromosomal mean values for NRF for r2, and for r2 for inter marker distances less than 10 Mb, for all chromosomes are presented in Additional file 3.
Expected linkage disequilibrium by inter-marker distance
Correlation between total cumulative ρ(h) and ρ(r) over all chromosomes was found to be 0.84. A reduced ρ(r) relative to ρ(h) for a genomic region could be an indication that animals in the observed pedigree have been under strong artificial selection for traits affected by polymorphisms in that particular region. Regions where ρ(r) was most strikingly reduced relative to ρ(h) were in the middle of BTA1 and in the middle of BTA20. Reduced ρ(r) relative to ρ(h) was also found on BTAs 6, 10,15, 16, 18, 19, 27 and 29, while elevated ρ(r) relative to ρ(h) was found on BTAs 3, 4, 7, 9, 11, 14, and 17.
On BTA1 several QTLs affecting milk production traits have been reported [27–32], and a meta-analysis reported by Khatkar et al.  indicated presence of three QTLs for milk yield on this chromosome. The BTA20 region centres around a mutation reported to affect protein percentage in the GHR gene . Hayes et al.  reported evidence for strong selection in this region in a study of divergence between dairy cattle and beef cattle. On BTA6 two QTLs affecting milk production traits have been reported in NRF [36, 37] and signatures of strong selection have been detected .
Elevated population recombination rate may be due to population expansion or gene conversion, while reduced recombination rate may be due to directional selection, genetic drift, gene flow, population substructure or low effective population size . Regions under strong selection in both historic and recent generations might not show differences between ρ(h) and ρ(r).
It can be seen that ρ(h) is elevated for BTAs 1, 13, 15, 16, 18, 19, 20 and 29, while ρ(r) is elevated for BTAs 3, 7, 13, 18 and 19. Further, reduced ρ(h)is observed for BTAs 4, 5, 7, 8, 9, 14, 17, 23, 24 and 26, while ρ(r)is reduced for BTAs 20 and 24. Consistently elevated ρ(h) and ρ(r) are found for BTAs 13, 18 and 19 and consistently reduced ρ(h) and ρ(r) are found for BTA24. In accordance with results described above, ρ(r) is significantly reduced relative to ρ(h) for BTA1, BTA6 and BTA20.
In addition to chromosome region and chromosome size, sex-specific differences in recombination rates have been reported [42–44]. In this study male genetic maps for NRF were used, which might differ from female genetic maps. The patterns of recombination described here might also be sex-specific.
Moreover, rate of recombination is not uniform across a chromosomal segment and recombination events tend to occur in recombination hotspots [2, 3]. It has been estimated that these hotspots occur on average every 50 - 100 kb in the human genome [45, 46]. Although the procedure applied here for estimation of ρ(h) incorporates a model accounting for variable recombination rates across chromosomes , a SNP density higher than obtained in this study would be required in order to detect such fine-scale recombination hotspots in the bovine genome.
Approximately 5% of the genomic sequence is expected to be missing in Btau_4.0 , and positioning of previously un-positioned contigs could aid completion of the assembly by pointing towards regions of special interest for re-sequencing efforts. Here a comparative analysis of the Btau_4.0 assembly with the human genome Build 19 allowed 4,276 previously un-positioned bovine contigs to be given putative genome positions. Determining recombination events between adjacent markers in an extensive pedigree can be used to construct dense genetic maps, and sufficient information was available from our NRF linkage analysis to validate the positions of 321 of these contigs . Comparative analysis and linkage analysis identified 130 new contig positions as being less than 5 Mb apart (Additional file 4). Even though large synteny blocks exist between species comparative analysis will yield spurious positions. Here 40% of positions identified by comparative analysis were validated by linkage analysis.
Putative contig positions not coinciding with elevated ρ may be due to failure to detect regions with elevated recombination rate, incorrect positioning of un-positioned contigs or un-positioned contigs containing sequence overlap with already assembled contigs. Inability to detect regions with elevated recombination rate could result from surrounding SNPs not containing enough information or from un-positioned contigs being too short to detectably affect ρ. Incorrect positioning of un-positioned contigs could be due to repeat sequence mapping to similar but not equal genomic sequence or random mapping to the wrong position by linkage analysis. Incorrect positioning of un-positioned contigs could also explain why not all of positions found by linkage analysis for the 321 contigs validated positions found by comparative analysis.
An alternative Bos Taurus genome assembly (UMD2) was reported by Zimin et al. . This assembly had 95% identity with the Btau_4.0 assembly but had more genomic sequence placed on the bovine chromosomes. Regions differing between UMD2 and Btau_4.0 were identified by sequence alignments  and some of these regions coincide with regions identified as problematic here. Some examples are the region around 105 Mb on BTA7, the region around 70 Mb on BTA12, the region around 10 Mb on BTA13, the region around 30 Mb on BTA18 and the proximal ends of BTA20, BTA21 and BTA28 (Additional file 5).
Genotyping of SNPs positioned on un-positioned contigs in a large pedigree and consequent linkage analyses as described here provide useful information for improving the current bovine genome assembly (Btau_4.0). The approach will gain even higher power and more accurate predictions as denser genetic maps become available. Likewise comparative sequence analysis would be a good supplement for correct contig or scaffold positioning.
Low levels of LD were observed in NRF relative to some of the breeds from which this breed originates. This is likely due to elevated heterogeneity in NRF from historic admixture, recent attempts to maintain a large effective population size through control of inbreeding and gene flow through import of sires from other Nordic countries. Reports from breeds other than those assessed in this study have described more rapid decline in r2 at short distances [17, 18] than was found in NRF. The results suggested that to obtain an expected r2 between markers and a causal polymorphism of at least 0.5 for genome-wide association studies in NRF, approximately one SNP every 15 kb or a total of 200,000 SNPs would be required for the 2.87 Gb genome.
For well known QTL regions for milk production on BTA1, BTA6 and BTA20, map length based on historic recombination was greater than map length based on recent recombination in NRF. Selective sweeps have previously been identified for the QTL regions on BTA20  and BTA6 . Reduced ρ(r) relative to ρ(h) was also found on BTAs 10,15, 16, 18, 19, 27 and 29, while elevated ρ(r) relative to ρ(h) was found on BTAs 3, 4, 7, 9, 11, 14, and 17.
While over 95% of the total genome sequence is included in bovine genomic assembly Btau_4.0, problematic regions exists and should be identified to facilitate assembly completion. Here such regions were identified by combining comparative sequence analysis, linkage analyses and detection of regions with extreme values of population recombination rate.
The Affymetrix 25 K MIP array  was used to genotype 2,589 NRF sires with paternal half-sib pedigree structure. In addition, 53 Holstein, 40 Finnish Ayrshire, 19 Sided Troender and Nordland Cattle and 39 Icelandic sires were genotyped. Genotypes were filtered for discordants (<2.5%), MAF (>0.025) and genotyped percentage (>75%). After initial filtering 17,483 SNPs remained. MAF were calculated for these SNPs with the Haploview 4.1 software .
Genetic maps for each of the 29 BTAs were constructed by use of the CRI-MAP 2.4 package . The map file created by use of the CRI-MAP fixed option was checked for elevated recombination rates between adjacent SNPs. Elevated recombination rates could be an indication of a wrongly positioned contig in the assembly. SNPs with genetic distance >6 cM between its two flanking SNPs or a genetic distance >4 cM between itself and one of its flanking SNPs were identified as suspicious and temporarily taken out of the genetic map. The CRI-MAP chrompic option was used to identify double recombinants. Double recombinants were manually inspected and corrected. SNPs showing up as double recombinants in more than 30 animals from 5 or more families were identified as suspicious and temporarily taken out of the genetic map. The fixed and chrompic procedures were repeated until no SNPs showed a genetic distance >6 cM between its two flanking SNPs or a genetic distance >4 cM between itself and one of its flanking SNPs. The SNPs temporarily taken out of the genetic map were attempted repositioned by use of the CRI-MAP twopoint option. SNPs mapping more strongly to positions within 2.5 Mb of their original positions were not repositioned. The genetic maps has previously been reported in Liu et al. .
The PHASE software  and the locally developed CRIHAP package were applied to utilize both linkage and LD information for determining haplotypes and impute missing genotypes.
A principal component analysis of the genomic relationship matrix among individuals of different and the same breed  was conducted to evaluate genetic distances between breeds.
Recombination rates in telomeric regions were compared to average recombination rates across chromosomes. Mean recombination rate per bp in a 10 Mb telomeric region for all autosomes was compared to overall mean recombination rate per bp. A t-test was used to compare means.
Estimates of pair-wise linkage disequilibrium measure r2 were calculated with the Haploview 4.1 software .
Scaled population recombination rate (ρ) between adjacent markers relative to inter marker distances was estimated using the LDhat 2.1 software  with haplotypes from 17,347 SNPs distributed on the 29 BTAs. The LDhat 2.1 software incorporates a model which allows for variable recombination rates across chromosomes . The reversible-jump markov chain monte carlo (rjMCMC) chain was run for 10,000,000 iterations, performing 5000 iterations between each sample. A block penalty of 5 was applied and the first 500,000 iterations were discarded as burn-ins. Some extreme values of ρ were observed, which could to be due to wrongly assembled contigs or other assembly artefacts. To determine historical recombination rate (ρ(h)) extreme values were corrected by replacing extreme values of ρ by a maximum value (max interval ρ = 10). Less than 5% of intervals between adjacent SNPs had ρ higher than this maximum value. To determine values of recent scaled population recombination rate from the observed pedigree (ρ(r)), pedigree based estimates of recombination (c) was scaled by a factor corresponding to 4Ne (from ρ = 4cNe) [3, 8]. The applied scaling factor ρ(h)/c = 4Ne was found by taking average values of total cumulative ρ(h) and total cumulative c for all autosomes. Cumulative values of recombination over each interval was used because we were interested in comparing a recent population recombination map based on the observed pedigree with a historic population recombination map, rather than comparing point estimates of recombination rate per distance. Cumulative interval values for ρ(r) and ρ(h) were found by multiplying recombination rate per length unit for each interval with interval length and then calculating cumulative values across each autosomal chromosome.
Positioning of previously un-positioned contigs from the bovine genome sequencing (Btau_4.0)  was done both by comparative sequence analysis with the human genome and by linkage analysis with already positioned SNPs in the bovine genome (Btau_4.0). In the comparative sequence analysis positioning of un-positioned bovine contigs was performed by combining MegaBLAST  searches for un-positioned contigs against the human genome Build 19 followed by MegaBLAST searches of hits in the human genome against the bovine genome (Btau_4.0). The first search revealed which areas of the human genome was most similar to each unknown contig, while the second search mapped those areas in the human genome sequence back to the bovine genome sequence. When two human sequences from one chromosomal region gave MegaBLAST hits against two sequences of a bovine chromosomal region it was assumed that the sequence between those human sequences on the human chromosome would also have similarity to the bovine genomic sequence between the two hits on the bovine chromosome. Bovine positions were predicted for 4,276 previously un-positioned contigs by this comparative method. Moreover, linkage analysis was conducted to position 568 SNPs distributed on 321 of the un-positioned contigs. The twopoint option in CRI-MAP 2.4  was used to map these 568 SNPs to SNPs already positioned in the bovine genome assembly. The results from linkage analysis were also presented in Liu et al. .
Thanks to Hanne Hamland and Monica Aasland Opsal for sample processing and Torfinn Nome for bioinformatics assistance. Thanks also to Johanna Vilkki and Emma Eyþórsdóttir for proving samples of Finish Ayshire and Icelandic Cattle, respectively. This project has been funded by The Research Council of Norway, GENO Breeding and AI association and BoviBank Ltd.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.