### Birds and housing

This study was conducted on birds originating from two lines divergently selected for PM-pHu according to a breeding scheme described in Alnahhas et al. [14]. In this experiment, sires were selected with higher intensity: at each generation, the best 19% sires (i.e. with the highest genetic value for the selected trait) and 50% dams were chosen to produce the next generation. Consistent with this higher selection intensity, genotyping in the first 5 generations was focused on sires, as described below. At generation 0 (G0), where the divergence process was not yet started and all birds were reared as a single population, 51 sires were genotyped. At generations 1 to 5 (G1-G5), the number of sires with descendants varied between 26 and 29 in the pHu- line and between 23 and 31 in the pHu + line, and the number of these sires that were genotyped varied between 17 and 28 in the pHu- line and between 14 and 30 in the pHu + line. QTL detection by association study was performed on a total of 558 offsprings (253 males and 305 females) of the 6th generation of selection (G6). Birds were reared in two successive batches and phenotyped for PM-pHu and SART-pHu as described in Alnahhas et al. [29].

### Genotyping

Birds used for QTL mapping (*n* = 558) and sires used for the scan of positive selection (*n* = 288) were genotyped by the Labogena Laboratory (Jouy en Josas, France) using the Illumina chicken SNP 60 K Beadchip containing 57,636 SNPs. After filtering SNPs for their minor allele frequency (higher than 0.05) and their call rate (higher than 0.95), 40,590 SNPs located on 28 autosomes were retained for QTL detection on a total of 558 birds which had all a call rate higher than 0.95. When applying the same criteria, a total of 42,026 SNPs and 288 birds were considered for the detection of regions under selection.

### Statistical analyses

#### Detection of selection signatures

We looked for genomic signatures of positive selection using the FLK [15] and hapFLK [16] methods, both implemented in the hapFLK software (https://forge-dga.jouy.inra.fr/projects/hapflk). These methods consider genotype data from multiple populations and detect regions where genetic differentiation between these populations is not consistent with a neutral evolution model. First, based on the matrix of observed genome-wide pairwise Reynold’s distances between populations, they estimate a population tree, using a Neighbour-Joining algorithm, and summarize this tree by a kinship matrix quantifying the amount of drift accumulated in each population since their divergence from a common ancestral population. Then, they scan the genome for loci where genetic differentiation between populations is significantly larger than expected under this pure drift model. Similar to the classical Fst, the statistic measuring genetic differentiation for one allele is a sum of the terms (pi – p)^2, where pi is the frequency of this allele in population i and p the estimated frequency of this allele in the ancestral population. The difference with Fst is that the terms of this sum (i.e. the populations) are weighted using the entries of the kinship matrix estimated previously, in order to account for differences of population size and for shared ancestry between populations. In the case of FLK, the statistic is computed for one of the two alleles of each SNP (the choice is arbitrary and has no influence on the results). In the case of hapFLK, local haplotypes are estimated (see below) and genetic differentiation is summed over all these haplotypes.

We applied these two tests for all generations of the selection experiment. At each generation (G1 to G5), 3 populations were compared: the pHu + line, the pHu- line, and the G0 population. G0 was defined as outgroup when building the population tree and computing the kinship matrix (option –outgroup). Data from G0 were also used when computing genetic differentiation at each locus (option –keep-outgroup). As a result, estimated allele frequencies in the ancestral population were always very close to the allele frequencies observed in G0 (p0), so the allele frequency differences computed within FLK (the terms (pi-p)^2 mentioned above) were almost equal to the allele frequency differences (pi-p0)^2 observed along each selected line.

HapFLK makes use of the local clustering approach of Scheet and Stephens [30] to estimate haplotype diversity. This approach assumes that for each position on the genome, local haplotypes can be divided into K main clusters, where each cluster may include several similar haplotypes. As a result of ancestral recombinations, the pair of clusters corresponding to a given diploïd individual is allowed to change from one locus to the other, and is assumed to form a Markov chain along the genome. Using a hidden Markov model where these clusters are the hidden variables and individual genotypes are the observed variables, haplotype clusters at all loci can be estimated for all individuals. One advantage of this approach is that it avoids defining fixed windows where haplotypes should be estimated: instead, transitions of the Markov chain representing the local haplotype clusters of an individual are automatically learnt from the data. However, this approach requires specifying the number K of haplotype clusters. Applying the cross-validation procedure implemented in the fastPHASE software to genotype data from G0, we set this number to 12 (option -K). We inferred haplotype clusters using 50 Expectation-Maximisation (EM) runs (option –nfit). Following Fariello et al. [16, 20], we assumed that hapFLK statistic was normally distributed under neutral evolution, and estimated the parameters of this normal distribution by fitting genome wide hapFLK values with a robust linear regression approach. HapFLK *p*-values at each SNP were computed from this estimated distribution. To identify selected regions, we used a q-value threshold of 0.05, therefore controlling false discovery rate at the 5% level [31].

For FLK analyses, *p*-values were directly obtained from the hapFLK software (they can easily be computed because the distribution of the FLK statistics under neutral evolution was shown to follow a chi-squared distribution with known degrees of freedom). As described above, we used an FDR threshold of 5% to identify significant SNPs. In contrast to hapFLK values, which are relatively smooth along the genome, FLK values, as most single SNP statistics, are highly variable along the genome. Consequently, defining significant regions is more difficult in this case. However, in our analysis, only few SNPs had significant FLK values (1 in G4 and 57 in G5) and several of them were located in regions already detected by hapFLK. Based on significant FLK values that did not overlap with a hapFLK region, we defined 10 new regions under selection including either one significant SNP or several significant SNPs distant by less than 1 Mb.

For all regions detected under selection using FLK or hapFLK, we performed another hapFLK analysis combining animals from all generations, which allowed studying the evolution of haplotype cluster frequencies along generations (haplotype clusters obtained by independent hapFLK analyses are not comparable). For each region, we estimated haplotype clusters using 3 EM runs, and plotted cluster frequencies for each of these runs using the R script hapflk-clusterplot.R, which is provided on hapFLK website. These 3 runs always led to very similar interpretations concerning the selection history in the region, so we only presented one of them.

#### Forward simulations

In order to evaluate the likelihood of an allele frequency trajectory going from p0 to pf in 5 generations, we simulated 10 millions of 5 generation allele frequency trajectories starting from p0, recorded the final allele frequency pj in each simulation j, and computed the proportion of simulations such that (pj-p0)^2 was greater than (pf-p0)^2. Two simulation strategies were implemented. In the first strategy, we considered a standard Wright-Fisher model with N haploïd individuals: if one allele has x copies out of N individuals at generation g, the number of copies at generation (g + 1) is a binomial variable with parameters (N, x/N). In the second strategy, we accounted for the number of reproducing males Nm(g) and females Nf (g) at each generation g: if one allele has xm copies among the 2*Nm(g) male alleles and xf copies among the 2*Nf(g) female alleles at generation g, the number of copies among the males at generation (g + 1) is Xmm + Xmf, where Xmm is a binomial variable with parameters (Nm(g + 1), xm/2*Nm(g)) and Xmf is a binomial variable with parameters (Nm(g + 1), xf/2*Nf(g)). Similarly, the number of copies among the females at generation (g + 1) is Xfm + Xff, where Xfm is a binomial variable with parameters (Nf(g + 1), xm/2*Nm(g)) and Xff is a binomial variable with parameters (Nf(g + 1), xf/2*Nf(g)). For each simulation strategy, a first set of simulations was performed using parameters mimicking the pHu- line, and a second set was performed using parameters mimicking the pHu + line. For strategy 1, the number of haploïd individuals was the haploïd effective population size estimated by FLK. For strategy 2, the number of male and female reproducers at every generation was the same as in the divergent selection experiment. The final *p*-value at each SNP (provided in Additional file 6: Table S1) was the product of the likelihoods obtained in the two lines.

#### QTL detection

QTL detection on birds from generation 6 was performed by the multi-marker Bayes Cπ analysis [32] implemented in GS3 software [33]. The statistical model was:

$$ {y}_j=\mu +\sum \limits_{i=1}^M{x}_{ij}{a}_i+{e}_j $$

where *y*_{
j
} is the phenotype for an individual *j* corrected for the fixed effects of sex and hatch*, μ* the overall mean, *M* the number of markers analysed, *x*_{
ij
} the genotype score (coded as 0, 1 or 2) of SNP i for individual j, a_{i} the additive effect of SNP i, and *e*_{
j
} the random residual for individual j with ej ∼ N(0, **I**σ^{2}_{e}). **I** is an identity matrix and σ^{2}_{e} is the residual variance. All unknown parameters were assigned prior distributions and sampled with a Monte Carlo Markov chain (MCMC) using Gibbs sampling. The MCMC was run for 400,000 iterations, with a burn-in of 80,000 iterations and thin interval of 400. The Bayes Cπ analysis allows introducing in the model all the SNPs at the same time but makes the assumption that only a small proportion of them has a significant effect on the trait. Thus, the prior parameter used for a_{i} is a mixture distribution with a_{i} ∼ N (0, σ^{2}_{a}) if the SNP is in the model (with a probability π) and a_{i} = 0 if the SNP is not in the model (with a probability 1-π). σ^{2}_{a} is the common marker effect variance and the hyper parameter (1-π) the prior probability that the effect of marker *i* is 0. For π the prior distribution was set at a Beta distribution with parameters α = 0.5 × 10^{4} and β = 99.5 × 10^{4}, meaning that π was almost fixed at 0.005. Variances σ^{2}_{a} and σ^{2}_{e} were assigned inverted chi-squared distributions with *v* = 4.2 degrees of freedom and a scale parameter \( {S}^2=\frac{{\widehat{\sigma}}^2\left(\nu -2\right)}{\nu } \) where \( {\widehat{\sigma}}^2 \)equals the prior value of σ^{2}_{a} or σ^{2}_{e}. The statistics used to detect significant SNP was the Bayes Factor (BF), which corresponds to the increase from prior to posterior probabilities of the SNP being “in” the model [34]. In this study we retained markers corresponding to BF = 20–150 and BF > 150 which indicated strong and very strong evidence of QTL linkage, respectively [34]. We defined QTL regions by merging candidate SNPs distant by less than 1 Mb, and further extended each obtained region by including the two 1 Mb flanking regions (that is 1 Mb before the first SNP and 1 Mb after the last SNP).

By including all the SNPs simultaneously in the model while assuming that only a proportion of those (π) has a significant effect, Bayes Cπ ensures that strong BF can only be reached at SNPs where the association between genotypes and traits clearly exceeds that expected from population structure and cannot result from drift alone. BF statistics were also calculated with a model including the fixed effect of the line to test whether the SNPs contributing to the variability of PM-pHu or SART-pHu in the whole population constituted from the two divergent lines also contributed to the intra-line variability.

*Computation of linkage disequilibrium*

The linkage disequilibrium measure r2 was computed for all SNP pairs distant by less than 500 kb and with minor allele frequency greater than 0.1, using the –r2 command of the Plink software, version v1.90b3.34 (
__https://www.cog-genomics.org/plink2__
). This computation was performed using genotyped animals from generation 0. For a given category of regions (for instance those listed in Table 1), all SNP pairs belonging to this category were collected and the average r2 among these pairs was estimated.