Data from two elite breeding lines of egg-laying chickens from Hy-Line Int., Line 1 (White Egg) and Line 2 (Brown Egg), were analyzed. The data included average adjusted progeny performance of sires for two egg-production and 10 egg-quality traits, pedigree information, and genotypes for a 3,000 SNP panel. Phenotypic data for each sire were the average performance of his daughters, with each daughter's phenotype adjusted for random effects of half of the Estimated Breeding Value (EBV) of the daughter's dam and for the fixed effect of hatch, as estimated by routine genetic evaluation procedures utilized by Hy-Line, International. Traits were classified into two categories based on age of measurement: early (first 3 egg weight, or measured at week 26) and late traits (measured at week 38). Early traits included Early Albumen Height (EAH, mm), First 3 Egg Weight (E3, gr), Early Egg Weight (EEW, gr), Early Production (EPD, % egg production), Early Shell Quality (EPS, gr in pin point pressure), Sexual Maturity (SM, day) and Early Yolk Weight (EYW, gr). Late traits included Late Albumen Height (LAH, mm), Late Egg Weight (LEW, gr), Late Production (LPD, % egg production), Late Shell Quality (LPS, gr in pin point pressure) and Late Yolk Weight (LYW, gr).

The DNA samples from sires in two consecutive generations (2003 and 2004) and from their paternal ancestors (3 to 6 generations back) were genotyped for about 3000 SNPs using an Illumina SNP array [14] that has been previously described [10, 15]. In total, 132 and 131 sires were genotyped in Line 1 and Line 2, respectively.

### Linkage disequilibrium analyses

Linkage disequilibrium was analyzed using the SNP data for chromosomes 1 and 4 on roosters from the two lines from two consecutive years (2003 and 2004), reflecting two subsequent generations. These two chromosomes were chosen because they had sufficiently larger numbers of SNPs to obtain accurate results. In addition, results are expected to be representative of other chromosomes (see discussion section). The numbers of genotyped roosters were 60 and 67 for Line 1 and 2, respectively, in year 2003; and 42 and 20 in year 2004. Chromosomes 1 and 4 had 449 and 184 genotyped SNPs, which were spaced on average 450–500 kb apart. Because low minor allele frequency (MAF) can skew measures of LD [10], only SNPs with MAF greater than or equal to 0.05 were used for LD analyses.

The SNP data were analyzed using Haploview [16] and PowerMarker [17] to measure the LD between all pairs of markers by r^{2} and r. To assess the decline of LD with distance, r^{2} was plotted against distance and a moving average of LD was calculated. A predicted LD curve was estimated by fitting the Sved [9] equation to the r^{2} data: r^{2} = 1/(1+4*N_{e} *d), where d is distance in Morgans estimated using 2.4 cM/Mb for both chromosomes, and N_{e} is the effective population size. This model was fitted to data from each line and each chromosome using methods described by Zhao et al. [18]. To assess consistency of LD across generations and between the two lines, correlations of LD were calculated for matching marker pairs, by distance between markers.

### SNP association analyses

For the purpose of detecting SNPs associated with traits, which was performed in line 1 only, Illumina's top A and B allele calls [19] for SNPs that were segregating in the population (MAF>0) were coded as 0 or 1, respectively, and the number of copies of the 0 allele that a genotyped sire carried at each locus (0, 1 or 2) was included in statistical models. For missing genotypes (<1% of all genotype calls), twice the frequency of allele 0 among genotyped sires within the line was used as the expected number of 0 alleles for that individual.

Initial analyses of SNP data of Line 1 showed 22 roosters with genotypes for one or more SNPs that were incompatible with the genotype of the sire (eg., genotype = 0 and sire genotype = 2). For most of these individuals, incompatibility was limited to one SNP, which was considered as a genotyping error (<0.005% of total genotypes) and coded as missing. For individuals with multiple incompatible SNPs (n = 21 to 60), parentage testing was carried out to determine the most likely sire. Briefly, SNP genotypes of these individuals were compared with the SNP genotypes of all genotyped individuals and the number of incompatible SNP genotypes was counted for each pair of individuals. For each of these individuals, there was a single sire in the previous generation with fully compatible SNP genotypes and these were, therefore, used as their sire in the analysis.

Association analyses were conducted using two types of analyses: 1) SNP-wise analyses, in which each SNP or each window of three consecutive SNPs was fitted separately, along with a polygenic effect, and 2) Genome-wise analyses, in which all SNPs were fitted simultaneously, using the BayesB method of Meuwissen et al. [6].

### SNP-wise analyses

Two different mixed models were used for the SNP-wise analyses: a 1-SNP model and a 3-SNP model. Both models were implemented using the Mixed Model procedure of SAS based on programs developed by Hassen et al. [

20] and using SAS macro LORG [

21] to fit a polygenic effect with relationships. The general model equation used was:

**Y** is an n × 1 vector of average adjusted daughters' performance for n sires

**X**
_{gen} is design matrix which relates observations to fixed generation effects

**Gen** is a vector of solutions for the generation effect

**X**
_{i} is the design matrix for SNP effects, with elements equal to the number of 0 alleles carried by each sire. For the 1-SNP analysis, this is an n × 1 matrix with elements equal to the number of 0 alleles carried for SNP i. For the 3-SNP analysis, this is an n × 3 matrix with elements corresponding to the number of 0 alleles for SNPs i-1, i, and i+1.

β
_{i} is a 1 × 1 (1-SNP analysis) or a 3 × 1 (3-SNP analysis) vector of SNP allele substitution effects, fitted as fixed effects.

**Z** is an incidence matrix relating random sire effects to the vector of observations

**s** is a vector of random effects of sires, assumed distributed normal with mean 0 and variance **A**
, where **A** is the additive genetic relationship matrix [22] derived from four generations of pedigree of male and female ancestors, and
is the sire variance

**e** is a vector of random error effects, which was assumed distributed normal with mean 0 and variance **D**
where **D** is a diagonal matrix with elements equal to the reciprocal of the number of progeny included into each sire's progeny average, and
is the error variance.

Estimates of
and
were obtained by ASReml [23] using the above model but without SNP effects.

The significance of SNP effects was obtained from a likelihood ratio test of the full model that included SNP effects to a reduced model without SNP effects [22]. Associated *P*-values were obtained from a chi-square distribution with one (for the 1-SNP model) or three (for the 3-SNP model) degrees of freedom.

### Genome-wise analyses

The BayesB method of Meuwissen et al. [

6] was used to identify SNPs associated with traits by fitting all SNPs across the genome simultaneously. In addition to the mean and random markers and error effects that were modelled in the BayesB method of Meuwissen et al. [

6], a polygenic effect of sires was also included. The model fitted to the average adjusted daughters' performances of sires from the two last generations was:

where **Y**, **X**
_{i}, *β*
_{i}, **Z**, **s**, and **e** are as in the 1-SNP analysis model described previously, *μ* is an intercept, I_{i} is a 0/1 indicator variable indicating whether SNP i is included in the model (see below), and the summation is over all SNPs that are segregating at MAF ≥ 0.05. Substitution effects *β*
_{i} were fitted as random with variance
. The prior probability that a SNP has genetic variance greater than zero (
>0 and so I_{i} = 1), 1-*π*, was set equal to 0.05 [6]. The prior distribution of the variance of such loci,
, was an inverted chi-square distribution *X*
^{-2}(*v*, *S*), with parameters *v* = 4.2 [6] and
, where
is the expected additive genetic variance, which was set to 4
, where
is the estimate of the sire variance obtained from the ASReml analysis described previously, *k* is the expected number of segregating QTL, which was set equal to 200 and
is mean allele frequency variance, which was set equal to 0.5. Note that the *S* parameter used here differs from what was used by Meuwissen et al. [6] because here
was taken to be
[24, 25] rather than
/*k*, as in Meuwissen et al. [6]. The Markov chain Monte Carlo (MCMC) algorithm used to fit this model was run for 150,000 iterations, which included running one Gibbs chain in each iteration for sampling *μ*,
, **s**,
, and *β*
_{i}, and running 100 cycles of the Metropolis-Hastings algorithm within each Gibbs chain for sampling
. The first 100,000 cycles were removed from the analysis as burn-in. If, in an iteration of the MCMC,
= 0 for SNP i, then that SNP was not included in the model in that iteration (I_{i} = 0).

Polygenic effects, **s**, and polygenic variance,
, were sampled using a fully blocked implementation of the Gibbs sampler presented by García-Cortés and Sorensen [26] in the context of the Gaussian linear model. The prior distribution of polygenic effects **s** was assumed Normal (0, **A**
), where **A** is the numerator relationship matrix, derived as described previously. The prior distribution of
was an inverted chi-square distribution *X*
^{-2}(*v*, *S*), with *v* = 4.2 and
, where
is the expected polygenic variance which was set to half of the genetic variance estimated for the model without SNPs using ASReml.

The fraction of cycles with I_{i} = 1 for each SNP gives a marginal posterior probability (*φ*
_{
i
}) of a SNP being associated with the trait, conditional on all other markers included in the model. An association was considered significant if *φ*
_{
i
} > 0.2. Given a prior probability of a SNP having non-zero variance of 1-*π* = 0.05, *φ*
_{
i
} > 0.2 can be interpreted as the probability that a SNP has non-zero variance at least 4 times more often than expected. However, this number is likely underestimated as effects of a QTL could be distributed across multiple correlated SNPs that may be genotyped in the QTL region (see discussion). Furthermore, if the sum of *φ*
_{
i
} for a window of 3 adjacent SNPs was greater than 0.35 (excluding windows already identified as significant by containing a SNP with *φ*
_{
i
} > 0.2), that 3-SNP window was considered to be significantly associated with the trait. The rationale for applying this approach is that the sum of *φ*
_{
i
} for 3 adjacent SNPs might represent an important signal from a region in which no single SNP was above the *φ*
_{
i
} > 0.2 threshold.

In addition to estimating trait polygenic and residual variances by ASReml, as described previously, they were also estimated using the Bayesian model without fitting SNP effects. In these analyses, the mean of the prior distribution of the polygenic variance was set equal to the estimate obtained from ASReml.