Genome-wide analysis in chicken reveals that local levels of genetic diversity are mainly governed by the rate of recombination
© Mugal et al; licensee BioMed Central Ltd. 2013
Received: 11 June 2012
Accepted: 4 February 2013
Published: 8 February 2013
Polymorphism is key to the evolutionary potential of populations. Understanding which factors shape levels of genetic diversity within genomes is a central question in evolutionary genomics and is important for inferring episodes of adaptive evolution from signs of reduced diversity. There is an on-going debate on the relative role of mutation and selection in governing diversity levels. This question is also related to the role of recombination because recombination is expected to indirectly affect polymorphism via the efficacy of selection. Moreover, recombination might itself be mutagenic and thereby exert a direct effect on diversity levels.
We used whole-genome re-sequencing data from domestic chicken (broiler and layer breeds) and its wild ancestor (the red jungle fowl) to study the relationship between genetic diversity and several genomic parameters. We found that recombination rate had the largest effect on local levels of nucleotide diversity. The fact that divergence (a proxy for mutation rate) and recombination rate were negatively correlated argues against a mutagenic role of recombination. Furthermore, divergence had limited influence on polymorphism.
Overall, our results are consistent with a selection model, in which regions within a short distance from loci under selection show reduced polymorphism levels. This conclusion gains further support from the observation of strong correlations between intergenic levels of diversity and diversity at synonymous as well as non-synonymous sites. Our results also demonstrate differences between the two domestic breeds and red jungle fowl, where the domestic breeds show a stronger relationship between intergenic diversity levels and diversity at synonymous and non-synonymous sites. This finding, together with overall lower diversity levels in domesticates compared to red jungle fowl, seems attributable to artificial selection during domestication.
Keywords: Genetic diversity, Recombination rate, Chicken, Mutation, Selection, Gene density
Modern population genetics has attempted to explain patterns of genetic variation in light of the evolutionary forces thought to affect DNA sequence evolution. One obvious candidate for governing local polymorphism levels is the rate of mutation since, in the absence of selection, sequence divergence should be proportional to the mutation rate. Another obvious factor is selection, since both positive and negative selection reduce levels of genetic diversity at target loci. Selection should also affect diversity levels in regions linked to target loci. In the absence of recombination, the entire haplotype within which a selected allele is contained will be subject to change in frequency by selection. It follows that recombination should itself be a factor of importance for levels of polymorphism. Specifically, when the local recombination rate is high, only regions within a relatively short physical distance from loci under selection are expected to show reduced polymorphism levels. There is well-developed theory for the expected effects of both types of selection relevant in this context, i.e. background selection arising from purifying selection and selective sweeps arising from positive selection [3, 4].
Empirically, one of the clearest patterns to have emerged from studies of the distribution of polymorphism levels across the genome is the positive effect of recombination rate on genetic diversity. This relationship was first observed in Drosophila melanogaster [5, 6] and then confirmed in various organisms including mouse, human [8–10], nematodes of the genus Caenorhabditis [11, 12], sea beet and grasses [14, 15]. However, a direct effect of recombination on mutation rate, i.e., a neutral scenario, has also been proposed to explain the correlation between recombination and polymorphism [9, 16], although this possible mutagenic effect of recombination is debated [17–19]. A recent large-scale analysis failed to demonstrate a relationship between recombination hotspots and mutation rate in the human genome. However, the fact that recombination rate and the rate of substitution often covary with several other genomic features impedes the understanding of any causal relationships [9, 21].
The possibility to capture patterns of sequence polymorphism across whole genomes allows critical tests of the importance of different evolutionary factors in shaping diversity levels. This information is especially important when making inferences of selection, not least when it comes to detecting signs of positive selection in searches for candidate loci for adaptive evolution. It is then imperative that variation in polymorphism caused by different factors can be distinguished. Here we analyze genome-wide patterns of genetic diversity in domestic chicken (G. gallus domesticus) and its wild ancestor, the red jungle fowl (G. gallus gallus). This system is of particular interest given that intense artificial selection during domestication may have left strong footprints on patterns of genetic diversity within the genome [23–26]. We examine how nucleotide diversity in chicken is related to recombination rate, divergence in intergenic regions and at synonymous and non-synonymous sites, gene density and local GC content.
Pairwise Pearson correlation coefficients (and associated p-values) between local diversity level and pS and pN, based on 1 Mb windows for three chicken populations
Red jungle fowl
diversity level – pS: 0.237 (1.17 · 10^-12); 0.454 (< 2.2 · 10^-16); 0.403 (< 2.2 · 10^-16)
diversity level – pN: 0.137 (4.68 · 10^-05); 0.213 (1.65 · 10^-10); 0.184 (3.98 · 10^-08)
pS – pN: 0.118 (4.36 · 10^-04); 0.199 (2.70 · 10^-09); 0.201 (1.67 · 10^-09)
Estimates, by which we refer to multiple regression coefficients, and p-values in multi-linear regression analysis for six possible explanatory variables of chicken diversity levels in 1 Mb windows
Red jungle fowl
Each row lists estimate (p-value) pairs.
3.29 · 10^-05 (2.51 · 10^-16); 3.07 · 10^-05 (6.07 · 10^-16); 2.32 · 10^-05 (8.40 · 10^-10); 5.03 · 10^-05 (9.45 · 10^-16)
−7.94 · 10^-06 (1.68 · 10^-02); −7.77 · 10^-06 (1.33 · 10^-02); −1.82 · 10^-05 (1.07 · 10^-08); −1.93 · 10^-05 (2.09 · 10^-04)
2.30 · 10^-05 (4.45 · 10^-08); 2.11 · 10^-05 (1.07 · 10^-07); 2.19 · 10^-05 (3.95 · 10^-08); 3.81 · 10^-05 (6.54 · 10^-09)
−1.68 · 10^-05 (7.10 · 10^-04); −1.50 · 10^-05 (1.37 · 10^-03); −1.31 · 10^-05 (5.49 · 10^-03); −2.60 · 10^-05 (7.94 · 10^-04)
−7.23 · 10^-06 (8.18 · 10^-02); −7.18 · 10^-06 (6.76 · 10^-02); −1.01 · 10^-05 (1.03 · 10^-02); −1.41 · 10^-05 (3.00 · 10^-02)
−3.66 · 10^-06 (3.70 · 10^-01); −4.64 · 10^-06 (2.29 · 10^-01); −4.08 · 10^-06 (2.92 · 10^-01); −7.13 · 10^-06 (2.63 · 10^-01)
Multiple R² = 0.1513; 0.1509; 0.1601; 0.1693
p < 2.2 · 10^-16; p < 2.2 · 10^-16; p < 2.2 · 10^-16; p < 2.2 · 10^-16
As multi-linear regression analysis is sensitive to multi-collinearity in the explanatory variables, we performed partial least square regression (PLSR) analysis, a regression setup that accounts for multi-collinearity in the explanatory variables and allows dissection of the interrelationships between explanatory variables. PLSR groups explanatory variables into principal components (PCs) based on their correlations with each other. Subsequent regression analysis and the number of significant PCs then indicate the number of independent effects on the response variable. Each significant PC represents an independent effect on the response variable by one of the contributors to that PC, most likely the main contributor, which we refer to as the true explanatory variable. The remaining contributors to the PC are likely to be dragged along by the true explanatory variable via their correlations with it. As such, PLSR enables us to quantify a lower bound on the amount of variation explained by the true explanatory variable, where the upper bound is given by the R² obtained by simple linear regression.
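To make the idea of components that carry independent effects more concrete, the following is a minimal sketch of single-response PLS regression in the NIPALS style. It is not the implementation used in this study, which was run in R; the function name `pls1` and the toy data are hypothetical, and the inputs are assumed to be already centred (e.g. Z-transformed).

```python
import math


def pls1(X, y, n_components):
    """Single-response PLS (NIPALS-style sketch).

    X: list of rows with centred explanatory variables.
    y: list with the centred response.
    Returns one (weights, fraction_of_y_variance_explained) per component.
    """
    n, p = len(X), len(X[0])
    X = [row[:] for row in X]  # work on copies; both are deflated below
    y = y[:]
    total_ss = sum(v * v for v in y)
    components = []
    for _ in range(n_components):
        # Weight vector: covariance of each explanatory variable with y.
        w = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm == 0:
            break  # nothing left in X that covaries with y
        w = [v / norm for v in w]
        # Scores: projection of each observation onto the weight vector.
        t = [sum(X[i][j] * w[j] for j in range(p)) for i in range(n)]
        tt = sum(v * v for v in t)
        loadings = [sum(X[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(y[i] * t[i] for i in range(n)) / tt
        components.append((w, q * q * tt / total_ss))
        # Deflate X and y so the next component captures an independent effect.
        for i in range(n):
            for j in range(p):
                X[i][j] -= t[i] * loadings[j]
            y[i] -= q * t[i]
    return components
```

In this sketch, the weights of a component show which explanatory variables contribute to it, and the returned fraction is the share of response variance attributed to that component.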
Percentage of genetic variation explained by six possible explanatory variables according to PLSR analysis of chicken diversity level in 1 Mb windows
Red jungle fowl
Averages of diversity level, pS, pN and the six explanatory variables used for the regression analysis
diversity level × 10^4
pS × 10^4
pN × 10^5
dS × 10^2
dN × 10^2
Recombination as a determinant of levels of genetic diversity in chicken
Diversity levels are expected to reflect the product of the mutation rate and Ne. As a consequence, intra-genomic variation in either of these two parameters should lead to genomic heterogeneity in diversity levels. This heterogeneity can thus be framed under a neutral scenario, reflecting a mutation-driven pattern, under a model invoking natural selection causing variation in Ne, or both. It is well established that there is significant variation in the rate of mutation across the genome, providing a basis for variation in diversity level. Ne is also expected to vary among genomic regions, in this case for reasons related to the incidence and efficiency of natural selection. Importantly, more functionally important sites mean more targets for selection. Moreover, when recombination rate is low, selection at linked sites will lower Ne over larger physical distances along chromosomes. Whether mutation or selection is the main factor governing diversity levels is a matter of on-going debate.
We found that recombination rate had the strongest effect on diversity levels in two domestic chicken breeds and in RJF. In contrast, divergence, taken as a proxy for the rate of mutation, had either no effect or an unexpected minor negative effect. This supports a selection model where the effect of background selection and/or selective sweeps is (physically) more widely reaching in genomic regions with low recombination rates. A positive correlation between diversity and recombination has been frequently observed in previous work; however, an indirect effect of recombination via selection has been difficult to disentangle from a possible direct mutagenic effect of recombination [9, 18, 31]. In our case, the negative correlation between recombination rate and divergence strengthens the selection model and argues against a mutagenic effect. Moreover, this interpretation is supported by the positive correlation of both pS and, in particular, pN with intergenic diversity level, showing that selective events that reduce the diversity within coding regions also reduce diversity at nearby linked sites, or vice versa.
As mentioned above, selective effects are expected to increase with gene density, each coding site representing a potential target for selection. In this respect, the positive correlation between gene density and diversity goes against the theoretical expectation. We suggest that this is a statistical artifact caused by collinearity of explanatory variables, such as the relatively strong positive correlation between recombination rate and gene density (r = 0.4). Estimates from multiple regressions should be interpreted cautiously if explanatory variables are correlated, since such correlation may produce spurious, non-causative associations, which may well be the case here. Moreover, this might explain a similarly surprising result recently reported for Asian rice (Oryza sativa), where the association between gene density and recombination rate could potentially explain a negative relationship between recombination rate and polymorphism, similar to earlier findings in Arabidopsis thaliana and A. lyrata [34, 35].
Several studies have shown that recombination rate is correlated with chromosome size [27, 36, 37], which is not unexpected given the requirement of at least one recombination event per chromosome (or chromosome arm) for successful meiosis; as a consequence, smaller chromosomes will have a higher recombination rate per physical distance compared to larger chromosomes. This is confirmed in our data, with a negative correlation between chromosome size and recombination rate (r = −0.31, p < 2.2e-16). Because recombination rate is correlated with both variables, a correlation between chromosome size and recombination rate together with a correlation between recombination rate and diversity can induce a correlation between chromosome size and diversity; this has been empirically demonstrated in previous analyses of birds. In our main analysis we did not include chromosome size as a candidate explanatory variable as we had no a priori reason to expect it to exert a direct effect on diversity level (in contrast to an indirect effect, via recombination). This was subsequently justified by a multi-linear regression analysis including chromosome size as an explanatory variable (electronic Additional file 1: Table S4). This analysis suggested that chromosome size had no effect on diversity in the RJF and broiler populations and, in fact, an unexpected positive effect in the layer population. Thus, we conclude that chromosome size per se does not explain a negative correlation between chromosome size and diversity, as was also suggested by Megens et al.
The absence of an effect of divergence on diversity levels
We approximated local mutation rate by divergence estimates of the chicken branch after the split from turkey based on CpG-masked intergenic sequences as well as divergence at synonymous sites, dS. Using these estimates we failed to find a persuasive effect of divergence on level of genetic diversity. This is surprising considering theory (that diversity and divergence are correlated is a basic tenet of the neutral theory of molecular evolution) and empirical evidence from some [9, 12, 18, 31] but not all previous studies [6, 40]. The lack of a significant effect could be explained by several factors. First, in theory, it could reflect a lack of local variation in mutation rate across the chicken genome. However, this is clearly not in line with earlier observations from avian genomes [41, 42]. Also, the auto-correlation analysis of divergence and diversity level (Figure 2) suggests that the two vary on a similar scale. Second, sequence features such as the local GC content appear to be strongly related to divergence in avian genomes. However, GC content is also strongly correlated with recombination rate via GC-biased gene conversion (gBGC), a recombination-linked process that mimics natural selection and leads to high GC content in highly recombining regions. As a consequence, the covariation of recombination rate, GC content and divergence, together with a strong impact of recombination rate on diversity, could blur independent signals between divergence and diversity, as suggested by the PLSR analysis (Figure 1). Thus, taken together, the positive correlation between recombination rate and diversity and the absence of correlation between divergence and diversity support a selection model, where the weaker impact of mutation rate on diversity, if any, is obscured by the stronger impact of recombination rate on diversity.
The impact of genomic features on identifying targets of adaptive evolution
There is considerable current interest in using population genomic data to identify regions that have been subject to recent events of positive selection. One means to do so is to search for outlier regions of nucleotide diversity, specifically regions of reduced diversity. The demonstration that recombination rate has a large impact on diversity level in the chicken genome should have at least two implications in this context. First, footprints of selection (selective sweeps) will be most easily seen in regions of low recombination, even if they occur less frequently in such regions due to Hill-Robertson interference. In addition, regions with low recombination and an associated reduced Ne are more likely to show hard sweeps than soft sweeps. This should be manifested both in diversity level being reduced over a larger genomic region and in the reduction being visible over a longer time scale since the sweep. Second, and as a consequence of the former, studies of the genomic distribution of adaptively evolving loci will be biased towards regions with a low rate of recombination. This was confirmed when we analyzed the location of candidate loci for selective sweeps identified by Rubin et al., with recombination rate being significantly lower in these candidate regions compared to the genomic average. This emphasizes the necessity of a rigorous statistical framework that incorporates genomic features such as recombination rate when interpreting polymorphism levels.
The footprint of domestication on patterns of chicken diversity
Although a significant part of the variation in diversity level was common to all three populations, we observed several important differences between domesticates and their wild ancestor. Rubin and colleagues found significantly lower overall heterozygosity in the two domestic breeds than in RJF. Our results corroborate this observation, with the most pronounced difference seen in intergenic regions, where the diversity level of the broiler and the layer was 83–93% of that of RJF. The difference is clearly in the expected direction given bottlenecks during domestication and genetic drift in closed commercial populations. Microsatellite-based genotyping has revealed this to be a common feature among chicken breeds and, to a varying extent, the same trend has also been seen among other domestic animals and plants [46–48].
Strong artificial selection for traits of agronomical interest during domestication should also act to lower Ne [25, 49]. If artificial selection occurs frequently genome-wide, it could create a stronger link between polymorphism at functional and neutral sites in domesticates than in natural populations. In agreement with this prediction, the correlation of pS and pN with diversity level was stronger in the two domestic chicken breeds than in RJF (cf. Figure 2).
Two previous studies have sought to address the influence of recombination on chicken diversity levels. Fang et al. used low-coverage genome sequence data from three birds to obtain polymorphism estimates and performed pairwise linear regression between diversity and recombination rate estimates from a medium-density linkage map. Rao et al. sequenced 15 introns, used data from Sundström et al. for another 14 introns, and performed pairwise linear regression between diversity and recombination rate, and between diversity and chicken-turkey divergence. Similar to our findings, these two studies reported a correlation between diversity and recombination. However, our study adds to previous work in several ways. Notably, by combining a genome-wide approach for diversity estimation from population samples with access to divergence data from across the genome, we were able to address and quantify the role of mutation and recombination on diversity level in a rigorous statistical framework. In addition, we considered possible impacts of dS, dN, gene density and the local GC content. Based on our analysis we suggest that local levels of genetic diversity in the chicken genome are mainly governed by the rate of recombination. The fact that divergence and recombination rate were negatively correlated argues against a mutagenic role of recombination and for a selection model. In support of the selection model, divergence, taken as a proxy for the rate of mutation, had either no effect or an unexpected minor negative effect. Moreover, by including genome-wide estimates of pS and pN we were able to directly study the role of selection and to integrate information from functional sites in the genome. In addition, the genome-wide approach allowed us to test for possible effects of various genomic features on the ability to identify target loci of adaptive evolution.
Further, we showed that artificial selection during domestication is likely to explain several differences in levels of diversity between domestic breeds and the wild ancestor (red jungle fowl), for example a stronger relationship between recombination rate and intergenic diversity, as well as a stronger relationship between intergenic diversity levels and diversity at synonymous as well as non-synonymous sites.
Short read sequences and read mapping
We used a dataset by Rubin and colleagues available at the European Nucleotide Archive (http://www.ebi.ac.uk/ena/) under the study accession number SRP001870. This dataset is composed of 35 bp reads obtained by SOLiD sequencing technology of genomic DNA pools of unrelated chickens from the red jungle fowl (RJF; 8 males and 1 female) and two domesticated populations, broiler (24 males and 18 females) and layer (29 males) (accession numbers SRR035386, SRR035383 and SRR035384 for RJF, SRR035377, SRR035378, SRR035387, SRR035381, SRR035382, SRR035379 and SRR035380 for broiler and SRR035375, SRR035376, SRR035389, SRR035390 and SRR035385 for layer).
The reads were mapped against the chicken reference genome (WUGSC 2.1, May 2006 version) downloaded from the UCSC Genome Browser website (http://genome.ucsc.edu/). The mapping was performed using the software BWA, allowing for a maximum of four mismatched bases and not allowing for insertions/deletions. Reads that mapped to several locations in the genome were excluded. Further, a genomic position had to fulfill four criteria in order to be included for downstream diversity level computation: (i) be covered by more than 4 and less than 50 reads; (ii) be outside of repeat sequences (based on the UCSC Genome Browser chicken repeat annotations); (iii) not correspond to a CpG prone site; and (iv) be outside exons and untranslated regions (UTRs), which are likely to be affected by natural selection. In order to obtain data on synonymous and non-synonymous polymorphisms, we separately considered the corresponding positions within exons, still fulfilling criteria (i) to (iii).
Exon and UTR coordinates were obtained through the BioMart query interface (http://www.ensembl.org/biomart/martview). When no UTR was annotated for a transcript, we excluded 77 bp upstream of the transcript (i.e. in 5' direction) and 372 bp downstream of the transcript (i.e. in 3' direction), sizes corresponding to the mean lengths of annotated 5' and 3' UTRs in chicken, respectively. A CpG prone site was defined as any C followed by a G or any G preceded by a C, as well as any C/T polymorphism followed by a G or any G/A polymorphism preceded by a C, following .
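As an illustration only, the CpG prone definition could be expressed as the short check below. The function name `is_cpg_prone` and the `alleles` argument are hypothetical and not part of the original pipeline, which relied on in-house scripts. The sketch inspects only the reference sequence for the neighbouring base, which is a simplifying assumption.

```python
def is_cpg_prone(seq, i, alleles=frozenset()):
    """Return True if position i of the reference sequence is CpG prone.

    seq: reference sequence (string of A/C/G/T).
    i: 0-based position to classify.
    alleles: optional non-reference alleles observed at position i.
    """
    seq = seq.upper()
    ref = seq[i]
    observed = {a.upper() for a in alleles} | {ref}
    prev_base = seq[i - 1] if i > 0 else ""
    next_base = seq[i + 1] if i + 1 < len(seq) else ""
    # A C, or a C/T polymorphism, followed by a G.
    if next_base == "G" and (ref == "C" or observed == {"C", "T"}):
        return True
    # A G, or a G/A polymorphism, preceded by a C.
    if prev_base == "C" and (ref == "G" or observed == {"G", "A"}):
        return True
    return False
```

For example, in the sequence `ATGA` a site with reference T and an observed C allele counts as CpG prone because the C/T polymorphism is followed by a G.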
SNP calling and estimates of diversity level
To call SNPs, we followed the approach by Rubin et al.; that is, we required that the alternative nucleotide state, i.e. the non-reference allele, be supported by at least three reads differing from the nucleotide state found in the reference genome. While diversity estimates were obtained for RJF, broiler and layer populations separately, the support of the alternative nucleotide state was based on combined data of multiple populations. For example, consider a hypothetical position with a C in the reference genome. If the broiler population had two reads with A and one read with C at this position, and the layer population had one with A and two with C, the position was called a SNP in both layer and broiler populations, because the number of non-reference alleles (i.e., A) summed to 3. Once the SNPs were called, we validated our SNP calls against the SNPs called by Rubin et al., and only used consistently called SNPs.
After the validation step, we computed the number of SNPs per non-overlapping window (SNP density) for 1 Mb, 500 kb, 250 kb and 100 kb windows. Additionally, we computed the mean coverage per window as the average read depth per validated genomic position. These tasks were performed using in-house Perl scripts. We determined the number of synonymous and non-synonymous SNPs per synonymous and non-synonymous sites with in-house C++ programs incorporating the Bio++ library.
where [SNP] represents the SNP density and n denotes the mean coverage. This computation was performed for all four window sizes for intergenic SNPs, where in the following θ is referred to as diversity level. For synonymous and non-synonymous polymorphisms the estimation was only performed for 1 Mb windows in order to ensure a reasonable signal-to-noise ratio. Here, the mean coverage was exclusively based on the read depth of synonymous and non-synonymous sites, respectively. Synonymous and non-synonymous diversity levels are in the following referred to as pS and pN, respectively. Note that the absolute values of our diversity estimates are not directly comparable to other studies in chicken, because we used NGS data based on pooled samples rather than Sanger sequencing data and we employed stringent filtering criteria.
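The pooled-support criterion in the worked example can be sketched as follows. This is a hedged illustration, not the authors' script: the function name `supported_alt_alleles` and the dictionary layout of the reads are assumptions, and the subsequent validation against previously published calls is not modelled.

```python
from collections import Counter


def supported_alt_alleles(ref_base, reads_by_population, min_alt_reads=3):
    """Return the non-reference alleles supported by enough pooled reads.

    ref_base: nucleotide in the reference genome at this position.
    reads_by_population: mapping of population name to the bases observed
        in the reads covering the position (string or list of bases).
    min_alt_reads: minimum number of reads, summed over populations,
        that must carry the same non-reference allele.
    """
    pooled = Counter()
    for bases in reads_by_population.values():
        pooled.update(b for b in bases if b != ref_base)
    return {allele for allele, count in pooled.items() if count >= min_alt_reads}
```

With reference base C, broiler reads `AAC` and layer reads `ACC`, the allele A is supported by three reads in total, so the position qualifies even though neither population reaches the threshold alone.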
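A minimal sketch of the per-window bookkeeping is given below. The input layout (one tuple per validated position on a single chromosome) and the function name `window_stats` are assumptions made for illustration; the original computation used in-house scripts. The number of validated sites is returned alongside the SNP count so that a per-site density can be derived if desired.

```python
from collections import defaultdict


def window_stats(sites, window_size=1_000_000):
    """Summarise validated positions in non-overlapping windows.

    sites: iterable of (position, read_depth, is_snp) for validated
        positions on one chromosome, with 0-based positions.
    Returns {window_index: (snp_count, n_validated_sites, mean_coverage)}.
    """
    counts = defaultdict(lambda: [0, 0, 0])  # SNPs, sites, summed depth
    for pos, depth, is_snp in sites:
        window = counts[pos // window_size]
        window[0] += 1 if is_snp else 0
        window[1] += 1
        window[2] += depth
    return {
        idx: (snps, n_sites, total_depth / n_sites)
        for idx, (snps, n_sites, total_depth) in counts.items()
    }
```

Changing `window_size` to 500,000, 250,000 or 100,000 gives the other window sizes used in the analysis.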
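The estimator for θ itself is not reproduced in the text here. One common estimator that is consistent with the description (SNP density scaled by a function of the sample size n) is Watterson's estimator, θ = [SNP] / Σ_{i=1}^{n−1} 1/i. Whether exactly this form was used is an assumption, as is rounding the mean coverage to an integer; the function name `theta_watterson` is hypothetical.

```python
def theta_watterson(snp_density, mean_coverage):
    """Watterson-style estimate of theta from SNP density and coverage.

    Assumption: the mean coverage is rounded to the nearest integer and
    used in place of the sample size n in the harmonic-sum correction.
    """
    n = int(round(mean_coverage))
    if n < 2:
        raise ValueError("mean coverage must round to at least 2")
    harmonic_sum = sum(1.0 / i for i in range(1, n))  # sum over i = 1 .. n-1
    return snp_density / harmonic_sum
```

Under this assumption, windows with higher coverage are corrected by a larger denominator, which compensates for the greater chance of observing a variant when more reads are sampled.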
Sequence alignments of orthologous intergenic regions for chicken, turkey (Meleagris gallopavo) and zebra finch (Taeniopygia guttata) were retrieved from whole-genome alignments from the Ensembl database release 61 via the Ensembl perl Application Programme Interfaces (APIs). We partitioned the whole-genome alignments into the four window sizes stated above, respectively, each with reference to the chicken genome. Then positions of transcribed regions including UTRs were established and masked with reference to the chicken genome. For each dataset, we restricted the data to windows with a minimum of 10,000 unambiguous sites, of which there were 1,038 windows of size 1 Mb, 2,040 of size 500 kb, 3,986 of size 250 kb and 9,205 of size 100 kb.
Coding sequence (CDS) alignments of orthologous genes in chicken, turkey and zebra finch were retrieved through the protein trees from the Ensembl database release 61 via Ensembl perl APIs. Orthologous genes were restricted to one-to-one orthologs, as defined in the Ensembl database. Alignments of one-to-one orthologs were then concatenated based on the windows defined for intergenic regions; for genes spanning more than one window, the different parts were assigned to the respective window. Windows containing no one-to-one orthologs were discarded from downstream analysis. Sequence alignments were cleaned of possibly misaligned sites by running Gblocks with default parameter settings.
Estimation of divergence, dN, dS, gene density and recombination rate
We estimated chicken-specific divergence for intergenic regions as the branch length between chicken and its common ancestor with turkey after all sites showing a CpG in any of chicken, turkey and zebra finch had been masked from the 3-way alignments. Estimation of branch length was based on the PAML software package version 4.1 and the general time-reversible substitution model implemented in baseml. CpG sites were masked from the alignments in order to avoid substitution rate variation caused by hypermutability of CpG sites and thus divergence being affected by the local CpG content.
We estimated chicken-specific rates of non-synonymous (dN) and synonymous (dS) substitution for the concatenated CDS alignments using PAML software package version 4.1. CDS alignments were concatenated in a given window of size 1 Mb, 500 kb and 250 kb, respectively. To estimate chicken-specific dS and dN, we then used the branch-model implemented in codeml allowing the dN/dS ratio to vary between the chicken branch and the remaining tree.
We estimated gene density as the proportion of exonic sites within a particular window. We also included UTRs and exon-intron boundaries as “genic” sites, as they might represent functionally important sequences. For the exon-intron boundaries, we included 10 bp of intronic sequence after the end and before the start of each exon.
We computed the sex-averaged chicken recombination rate using data from Groenen et al. and the WUGSC 2.1 chicken assembly. Recombination rate per 1 Mb window was computed as the mean recombination rate (genetic distance/physical distance) between markers weighted by the physical distance between markers, ranging from 0 to 28.6 cM per 1 Mb window (a histogram of recombination rate is provided in the Additional file 1: Figure S3).
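The gene density calculation can be sketched as interval arithmetic. In this hedged example the function name `gene_density` and the half-open coordinate convention are assumptions; UTRs would simply be passed in as additional intervals. For brevity the 10 bp flank is added on both sides of every interval, which overstates the flank at transcript ends where no intron is adjacent.

```python
def gene_density(exons, window_start, window_end, flank=10):
    """Proportion of a window covered by genic sites.

    exons: list of (start, end) half-open intervals on the chromosome.
    flank: number of adjacent intronic bases counted as genic on each side.
    """
    # Extend each interval by the flank and clip it to the window.
    intervals = sorted(
        (max(start - flank, window_start), min(end + flank, window_end))
        for start, end in exons
    )
    covered = 0
    current_start = current_end = None
    for start, end in intervals:
        if start >= end:
            continue  # interval lies outside the window
        if current_end is None or start > current_end:
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            current_end = max(current_end, end)  # merge overlapping intervals
    if current_end is not None:
        covered += current_end - current_start
    return covered / (window_end - window_start)
```

Merging before counting ensures that bases shared by overlapping transcripts, or by two flanks that meet in a short intron, are counted only once.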
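The weighted mean described here can be sketched as below. The marker layout, the function name `window_recombination_rate` and, in particular, the choice to clip each marker interval to the window before weighting are assumptions; the text specifies only that rates between markers were weighted by physical distance.

```python
def window_recombination_rate(markers, start, end):
    """Distance-weighted mean recombination rate (cM/Mb) in [start, end).

    markers: list of (physical_position_bp, genetic_position_cM), sorted
        by physical position along one chromosome.
    Returns None when no marker interval overlaps the window.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for (p0, g0), (p1, g1) in zip(markers, markers[1:]):
        # Physical overlap of this marker interval with the window.
        overlap = min(p1, end) - max(p0, start)
        if overlap <= 0:
            continue
        rate = (g1 - g0) / ((p1 - p0) / 1e6)  # cM per Mb between the markers
        weighted_sum += rate * overlap
        total_weight += overlap
    return weighted_sum / total_weight if total_weight else None
```

Weighting by physical distance prevents a pair of closely spaced markers with a noisy local rate from dominating the window estimate.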
To investigate the degree of common variation in genetic diversity between the three chicken populations we performed PCA of local diversity level for 1 Mb, 500 kb and 250 kb non-overlapping windows. The computation of the PCs was performed via an eigenvalue-decomposition of the associated covariance matrix as implemented in the “princomp” function of the statistical software package R version 2.9.2. The degree of common variation was then defined as the leading PC, i.e. PC I, and as a local measure of the common genetic variation in the three populations we projected diversity levels on PC I, which can be seen as a smoothing function through genetic diversity in all three chicken populations.
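For illustration, the leading principal component of the three population series and the projection onto it can be obtained as sketched below. The study used the `princomp` function in R; this stand-alone version instead uses power iteration on the covariance matrix, which is a substitute technique, and the function name `leading_pc` is hypothetical. The sign of the returned vector is arbitrary.

```python
import math


def leading_pc(rows, n_iter=500):
    """Leading principal component and scores via power iteration.

    rows: one list per window, holding the diversity level of each
        population in that window.
    Returns (unit_vector, scores), with scores being the projection of
    the centred data onto the leading component.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centred = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Covariance matrix with divisor n; the eigenvectors do not depend
    # on whether n or n - 1 is used.
    cov = [
        [sum(c[a] * c[b] for c in centred) / n for b in range(p)]
        for a in range(p)
    ]
    v = [1.0] * p  # starting vector; assumed not orthogonal to PC I
    for _ in range(n_iter):
        v = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    scores = [sum(c[j] * v[j] for j in range(p)) for c in centred]
    return v, scores
```

The scores play the role of the local measure of common variation: windows in which all three populations are jointly high or low in diversity receive extreme values.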
We performed multi-linear regression analysis for diversity level grouped into four groups: common genetic variation in diversity level (PC I) and variation in each of the three chicken populations separately. For all four groups we conducted regression analysis based on 880 out of 1,038 non-overlapping windows of size 1 Mb, where data on the six possible explanatory variables recombination rate, divergence, gene density, GC content, dS and dN were available. We further conducted regression analysis for 1,623 windows of size 500 kb and 2,651 windows of size 250 kb. We transformed the explanatory variables in order to reduce the skewness in their distributions. Recombination rate was log-transformed to base 10, after adding a constant of 1 in order to allow for zero rate values. All the other explanatory variables were transformed by the square root. Regression analysis was then performed after Z-transformation of the explanatory variables, which means standardization of the mean value to 0 and of the standard deviation to 1.
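The transformation steps can be written out as a short sketch. The function names are hypothetical, and the use of the sample standard deviation (divisor n − 1) in the Z-transformation is an assumption, since the text does not state which divisor was used.

```python
import math
from statistics import mean, stdev


def z_transform(values):
    """Standardise values to mean 0 and (sample) standard deviation 1."""
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]


def transform_recombination(rates):
    """log10(rate + 1), which keeps zero rates defined, then Z-transform."""
    return z_transform([math.log10(r + 1) for r in rates])


def transform_other(values):
    """Square-root transform, then Z-transform (all other variables)."""
    return z_transform([math.sqrt(v) for v in values])
```

Because every explanatory variable ends up on the same scale, the regression coefficients can be compared directly as effect sizes per standard deviation.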
We performed PLSR analysis, a regression setup that accounts for multi-collinearity in the explanatory variables. As stated above for the multi-linear regression analysis, explanatory variables were first transformed to reduce the skewness in their distributions and then Z-transformed. In addition, diversity level estimates were also Z-transformed. PLSR was then conducted for diversity level estimates based on 1 Mb windows for each of the four groups of genetic variation separately.
We performed an autocorrelation analysis of local diversity level and divergence based on 100 kb windows. This was done by computing Pearson correlation coefficients and their p-values for measurements of nearest neighboring windows (k = 1) up to windows lying 5 Mb apart (k = 50).
Genomic regions of candidate selective sweeps identified by Rubin and colleagues were mapped onto the 1 Mb windows used throughout our analysis. Averages of diversity levels, pS and pN, as well as averages of the six explanatory variables used in the regression analysis described above, were determined for the candidate loci as the arithmetic means of the respective 1 Mb windows. Genome-wide averages of the same variables were determined as the arithmetic means over all windows. To assess the significance of the difference between the averages for the candidate loci and the genome-wide averages, we bootstrapped the genome-wide averages based on a sample size of 9999 and computed p-values based on their bootstrap confidence intervals.
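A minimal sketch of the lagged correlation is shown below. It assumes a gap-free series of consecutive 100 kb windows from a single chromosome and omits the p-values; the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lag_autocorrelation(values, max_lag=50):
    """Correlation between each window and the window k steps away.

    values: measurements for consecutive windows (no missing windows).
    Returns {k: r_k} for k = 1 .. max_lag.
    """
    return {
        k: pearson(values[:-k], values[k:])
        for k in range(1, max_lag + 1)
    }
```

With 100 kb windows, `max_lag=50` corresponds to windows lying 5 Mb apart.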
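One way to carry out such a bootstrap comparison is sketched below. The text does not specify how p-values were derived from the bootstrap confidence intervals, so the two-sided percentile rule used here is an assumption, as are the function name `bootstrap_p_value` and the fixed random seed.

```python
import random


def bootstrap_p_value(genome_values, candidate_mean, n_boot=9999, seed=0):
    """Two-sided bootstrap p-value for a candidate-region average.

    genome_values: per-window values across the genome.
    candidate_mean: arithmetic mean over the candidate windows.
    The genome-wide mean is resampled n_boot times and the p-value is the
    (doubled) share of bootstrap means at least as extreme as the
    candidate mean, with the usual +1 correction.
    """
    rng = random.Random(seed)
    n = len(genome_values)
    boot_means = [
        sum(rng.choices(genome_values, k=n)) / n for _ in range(n_boot)
    ]
    lower = sum(m <= candidate_mean for m in boot_means)
    upper = sum(m >= candidate_mean for m in boot_means)
    return min(1.0, 2 * (min(lower, upper) + 1) / (n_boot + 1))
```

With 9999 bootstrap replicates the smallest attainable p-value under this rule is 2/10000.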
All statistical analyses were performed with the software package R version 2.9.2.
RJF: Red jungle fowl
PCA: Principal component analysis
PLSR: Partial least square regression
Ne: Effective population size
gBGC: GC-biased gene conversion
API: Application Programme Interface
We thank Yves Clément for helpful discussions on the statistical analysis. Financial support was obtained from the European Research Council, the Knut and Alice Wallenberg Foundation and the Swedish Research Council.
- Kimura M: Evolutionary rate at the molecular level. Nature. 1968, 217 (5129): 624-626.
- Charlesworth B, Morgan MT, Charlesworth D: The effect of deleterious mutations on neutral molecular variation. Genetics. 1993, 134 (4): 1289-1303.
- Smith JM, Haigh J: The hitch-hiking effect of a favourable gene. Genet Res. 1974, 23 (1): 23-35.
- Andolfatto P: Adaptive hitchhiking effects on genome variability. Curr Opin Genet Dev. 2001, 11 (6): 635-641.
- Berry AJ, Ajioka JW, Kreitman M: Lack of polymorphism on the Drosophila fourth chromosome resulting from selection. Genetics. 1991, 129 (4): 1111-1117.
- Begun DJ, Aquadro CF: Levels of naturally occurring DNA polymorphism correlate with recombination rates in D. melanogaster. Nature. 1992, 356 (6369): 519-520.
- Nachman MW: Patterns of DNA variability at X-linked loci in Mus domesticus. Genetics. 1997, 147 (3): 1303-1316.
- Nachman MW, Bauer VL, Crowell SL, Aquadro CF: DNA variability and recombination rates at X-linked loci in humans. Genetics. 1998, 150 (3): 1133-1141.
- Hellmann I, Prüfer K, Ji H, Zody MC, Pääbo S, Ptak SE: Why do human diversity levels vary at a megabase scale? Genome Res. 2005, 15 (9): 1222-1231.
- Spencer CCA, Deloukas P, Hunt S, Mullikin J, Myers S, Silverman B, Donnelly P, Bentley D, McVean G: The influence of recombination on human genetic diversity. PLoS Genetics. 2006, 2 (9): e148.
- Cutter AD, Payseur BA: Selection at linked sites in the partial selfer Caenorhabditis elegans. Mol Biol Evol. 2003, 20 (5): 665-673.
- Cutter AD, Choi JY: Natural selection shapes nucleotide polymorphism across the genome of the nematode Caenorhabditis briggsae. Genome Res. 2010, 20 (8): 1103-1111.
- Kraft T, Säll T, Magnusson-Rading I, Nilsson N-O, Halldén C: Positive correlation between recombination rates and levels of genetic variation in natural populations of sea beet (Beta vulgaris subsp. maritima). Genetics. 1998, 150 (3): 1239-1244.
- Dvorák J, Luo M-C, Yang Z-L: Restriction fragment length polymorphism and divergence in the genomic regions of high and low recombination in self-fertilizing and cross-fertilizing Aegilops species. Genetics. 1998, 148 (1): 423-434.
- Tenaillon MI, Sawkins MC, Long AD, Gaut RL, Doebley JF, Gaut BS: Patterns of DNA sequence polymorphism along chromosome 1 of maize (Zea mays ssp. mays L.). Proc Natl Acad Sci U S A. 2001, 98 (16): 9161-9166.
- Lercher MJ, Hurst LD: Human SNP variability and mutation rate are higher in regions of high recombination. Trends in Genetics. 2002, 18 (7): 337-340.
- Begun DJ, Holloway AK, Stevens K, Hillier LW, Poh Y-P, Hahn MW, Nista PM, Jones CD, Kern AD, Dewey CN: Population genomics: whole-genome analysis of polymorphism and divergence in Drosophila simulans. PLoS Biology. 2007, 5 (11): e310.
- Kulathinal RJ, Bennett SM, Fitzpatrick CL, Noor MAF: Fine-scale mapping of recombination rate in Drosophila refines its correlation to diversity and divergence. Proc Natl Acad Sci U S A. 2008, 105 (29): 10051-10056.
- Huang S-W, Friedman R, Yu N, Yu A, Li W-H: How strong is the mutagenicity of recombination in mammals? Mol Biol Evol. 2005, 22 (3): 426-431.
- 1000 GC: A map of human genome variation from population-scale sequencing. Nature. 2010, 467 (7319): 1061-1073.
- Flowers JM, Molina J, Rubinstein S, Huang P, Schaal BA, Purugganan MD: Natural selection in gene-dense regions shapes the genomic pattern of polymorphism in wild and domesticated rice. Mol Biol Evol. 2012, 29 (2): 675-687.
- Gossmann TI, Woolfit M, Eyre-Walker A: Quantifying the variation in the effective population size within a genome. Genetics. 2011, 189 (4): 1389-1402.
- Vigouroux Y, McMullen M, Hittinger CT, Houchins K, Schulz L, Kresovich S, Matsuoka Y, Doebley J: Identifying genes of agronomic importance in maize by screening microsatellites for evidence of selection during domestication. Proc Natl Acad Sci U S A. 2002, 99 (15): 9650-9655.
- Clark RM, Linton E, Messing J, Doebley JF: Pattern of diversity in the genomic region near the maize domestication gene tb1. Proc Natl Acad Sci U S A. 2004, 101 (3): 700-707.
- Rubin CJ, Zody MC, Eriksson J, Meadows JR, Sherwood E, Webster MT, Jiang L, Ingman M, Sharpe T, Ka S: Whole-genome resequencing reveals loci under selection during chicken domestication. Nature. 2010, 464 (7288): 587-591.
- Innan H, Kim Y: Pattern of polymorphism after strong artificial selection in a domestication event. Proc Natl Acad Sci U S A. 2004, 101 (29): 10667-10672.
- ICGSC: Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. Nature. 2004, 432 (7018): 695-716.
- ICPMC: A genetic variation map for chicken with 2.8 million single-nucleotide polymorphisms. Nature. 2004, 432 (7018): 717-722.
- Betancourt AJ, Welch JJ, Charlesworth B: Reduced effectiveness of selection caused by a lack of recombination. Curr Biol. 2009, 19 (8): 655-660.
- Hodgkinson A, Eyre-Walker A: Variation in the mutation rate across mammalian genomes. Nat Rev Genet. 2011, 12 (11): 756-766.
- Slotte T, Bataillon T, Hansen TT, St. Onge K, Wright SI, Schierup MH: Genomic determinants of protein evolution and polymorphism in Arabidopsis. Genome Biol Evol. 2011, 3: 1210-1219.
- Cutter AD, Moses AM: Polymorphism, divergence, and the role of recombination in Saccharomyces cerevisiae genome evolution. Mol Biol Evol. 2011, 28 (5): 1745-1754.
- Payseur BA, Nachman MW: Gene density and human nucleotide polymorphism. Mol Biol Evol. 2002, 19 (3): 336-340.
- Nordborg M, Hu TT, Ishino Y, Jhaveri J, Toomajian C, Zheng HG, Bakker E, Calabrese P, Gladstone J, Goyal R: The pattern of polymorphism in Arabidopsis thaliana. PLoS Biology. 2005, 3 (7): 1289-1299.
- Kawabe A, Forrest A, Wright SI, Charlesworth D: High DNA sequence diversity in pericentromeric genes of the plant Arabidopsis lyrata. Genetics. 2008, 179 (2): 985-995.
- Groenen MAM, Wahlberg P, Foglio M, Cheng HH, Megens H-J, Crooijmans RPMA, Besnier F, Lathrop M, Muir WM, Wong GK-S: A high-density SNP-based linkage map of the chicken genome reveals sequence features correlated with recombination rate. Genome Res. 2009, 19 (3): 510-519.
- Groenen MAM, Cheng HH, Bumstead N, Benkel BF, Briles WE, Burke T, Burt DW, Crittenden LB, Dodgson J, Hillel J: A consensus linkage map of the chicken genome. Genome Res. 2000, 10 (1): 137-147.
- Huynh LY, Maney DL, Thomas JW: Contrasting population genetic patterns within the white-throated sparrow genome (Zonotrichia albicollis). BMC Genet. 2010, 11: 96.
- Megens HJ, Crooijmans RPMA, Bastiaansen JWM, Kerstens HHD, Coster A, Jalving R, Vereijken A, Silva P, Muir WM, Cheng HH: Comparison of linkage disequilibrium and haplotype diversity on macro- and microchromosomes in chicken. BMC Genet. 2009, 10.
- Roselius K, Stephan W, Städler T: The relationship of nucleotide polymorphism, recombination rate and selection in wild tomato species. Genetics. 2005, 171 (2): 753-763.
- Axelsson E, Webster MT, Smith NGC, Burt DW, Ellegren H: Comparison of the chicken and turkey genomes reveals a higher rate of nucleotide divergence on microchromosomes than macrochromosomes. Genome Res. 2005, 15 (1): 120-125.
- Webster MT, Axelsson E, Ellegren H: Strong regional biases in nucleotide substitution in the chicken genome. Mol Biol Evol. 2006, 23 (6): 1203-1216.
- Duret L, Galtier N: Biased gene conversion and the evolution of mammalian genomic landscapes. Annual Rev Genomics Human Genetics. 2009, 10 (1): 285-311.
- Hermisson J, Pennings PS: Soft sweeps: molecular population genetics of adaptation from standing genetic variation. Genetics. 2005, 169 (4): 2335-2352.
- Granevitze Z, Hillel J, Chen GH, Cuc NTK, Feldman M, Eding H, Weigend S: Genetic diversity within chicken populations from different continents and management histories. Anim Genet. 2007, 38 (6): 576-583.
- vonHoldt BM, Pollinger JP, Lohmueller KE, Han E, Parker HG, Quignon P, Degenhardt JD, Boyko AR, Earl DA, Auton A: Genome-wide SNP and haplotype analyses reveal a rich history underlying dog domestication. Nature. 2010, 464 (7290): 898-902.
- Amaral AJ, Ferretti L, Megens H-J, Crooijmans RPMA, Nie H, Ramos-Onsins SE, Perez-Enciso M, Schook LB, Groenen MAM: Genome-wide footprints of pig domestication and selection revealed through massive parallel sequencing of pooled DNA. PLoS One. 2011, 6 (4): e14782.
- Haudry A, Cenci A, Ravel C, Bataillon T, Brunel D, Poncet C, Hochu I, Poirier S, Santoni S, Glémin S: Grinding up wheat: A massive loss of nucleotide diversity since domestication. Mol Biol Evol. 2007, 24 (7): 1506-1517.
- Wright SI, Bi IV, Schroeder SG, Yamasaki M, Doebley JF, McMullen MD, Gaut BS: The effects of artificial selection on the maize genome. Science. 2005, 308 (5726): 1310-1314.
- Fang L, Ye J, Li N, Zhang Y, Li S, Wong G, Wang J: Positive correlation between recombination rate and nucleotide diversity is shown under domestication selection in the chicken genome. Chin Sci Bull. 2008, 53 (5): 746-750.
- Rao Y, Sun L, Nie Q, Zhang X: The influence of recombination on SNP diversity in chickens. Hereditas. 2011, 148 (2): 63-69.
- Sundström H, Webster MT, Ellegren H: Reduced variation on the chicken Z chromosome. Genetics. 2004, 167: 377-385.
- Fujita PA, Rhead B, Zweig AS, Hinrichs AS, Karolchik D, Cline MS, Goldman M, Barber GP, Clawson H, Coelho A: The UCSC genome browser database: update 2011. Nucleic Acids Res. 2011, 39 (suppl 1): D876-D882.
- Smedley D, Haider S, Ballester B, Holland R, London D, Thorisson G, Kasprzyk A: BioMart - biological queries made easy. BMC Genomics. 2009, 10 (1): 22.
- Keightley PD, Gaffney DJ: Functional constraints and frequency of deleterious mutations in noncoding DNA of rodents. Proc Natl Acad Sci U S A. 2003, 100 (23): 13402-13406.
- Dutheil J, Gaillard S, Bazin E, Glemin S, Ranwez V, Galtier N, Belkhir K: Bio++: a set of C++ libraries for sequence analysis, phylogenetics, molecular evolution and population genetics. BMC Bioinforma. 2006, 7 (1): 188.
- Castresana J: Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Mol Biol Evol. 2000, 17 (4): 540-552.
- Abril JF, Castelo R, Guigó R: Comparison of splice sites in mammals and chicken. Genome Res. 2005, 15 (1): 111-119.
- Naes T, Martens H: Multivariate calibration. 2. Chemometric methods. Trac-Trend Anal Chem. 1984, 3 (10): 266-271.