Skip to main content

Runs of homozygosity and selection signature analyses reveal putative genomic regions for artificial selection in layer breeding



The breeding of layers emphasizes the continual selection of egg-related traits, such as egg production, egg quality and eggshell, which enhance their productivity and meet the demand of market. As the breeding process continued, the genomic homozygosity of layers gradually increased, resulting in the emergence of runs of homozygosity (ROH). Therefore, ROH analysis can be used in conjunction with other methods to detect selection signatures and identify candidate genes associated with various important traits in layer breeding.


In this study, we generated whole-genome sequencing data from 686 hens in a Rhode Island Red population that had undergone fifteen consecutive generations of intensive artificial selection. We performed a genome-wide ROH analysis and utilized multiple methods to detect signatures of selection. A total of 141,720 ROH segments were discovered in whole population, and most of them (97.35%) were less than 3 Mb in length. Twenty-three ROH islands were identified, and they overlapped with some regions bearing selection signatures, which were detected by the De-correlated composite of multiple signals methods (DCMS). Sixty genes were discovered and functional annotation analysis revealed the possible roles of them in growth, development, immunity and signaling in layers. Additionally, two-tailed analyses including DCMS and ROH for 44 phenotypes of layers were conducted to find out the genomic differences between subgroups of top and bottom 10% phenotype of individuals. Combining the results of GWAS, we observed that regions significantly associated with traits also exhibited selection signatures between the high and low subgroups. We identified a region significantly associated with egg weight near the 25 Mb region of GGA 1, which exhibited selection signatures and has higher genomic homozygosity in the low egg weight subpopulation. This suggests that the region may be play a role in the decline in egg weight.


In summary, through the combined analysis of ROH, selection signatures, and GWAS, we identified several genomic regions that associated with the production traits of layers, providing reference for the study of layer genome.

Peer Review reports


Layer breeding places significant emphasis on the selection of egg-related traits such as egg number, weight, quality and appearance. Layer strains are subjected to long-term and intensive selection for the target traits, ensuring the development of efficient and high-performing commercial strains. Selection indexes including egg number (EN), egg weight (EW), quality and appearance traits have been used for decades in commercial breeding programs. It is commonly held that EW has a negative correlation with EN, prompting commercial breeders to prioritize higher EN and lower EW to meet the rising market demand for more eggs [1]. In addition, consumers seem to be focusing on the quality and appearance of eggs, which requires breeders to consider comprehensively the yield, quality, and appearance of eggs in order to meet consumers’ needs. With the advent of genomic selection (GS), it has become possible to investigate the genetic mechanism of important economic traits in greater depth [2]. The process of artificial selection has been accelerated by the application of GS, which has led to an increase in homozygosity in regions of the layer genome linked with egg-related traits [3].

Recently, runs of homozygosity (ROH) analysis has gained attention and is being increasingly adopted by researchers in the field to study population history and measure the degree of inbreeding [4,5,6]. ROH are continuous segments of a genome that are homozygous, meaning that the individual has inherited identical copies of genetic information from parents [7, 8]. ROHs can occur naturally in individuals as a result of inbreeding or heavy selection pressure and have hypothesized relevance with genes that influence disease susceptibility, cognitive ability, and production performance in individuals [6, 9, 10]. By utilizing ROH, scientists are able to better understand a population’s evolutionary history, inbreeding levels, and changes in genomic homozygosity in specific environments [11,12,13]. In agricultural research, ROH analysis has become an important tool for identifying genes and selection signatures that are associated with economic traits in livestock [3, 5, 14, 15]. Nevertheless, there are limitations in ROH research, mainly concerning the precise characterization of ROH in relation to the length and the number of loci [4, 16]. Presently, no definitive criteria have been established to determine the extent of ROH segments. Overall, the combination of ROH and selective sweep analysis can help us to gain a more comprehensive understanding of the genetic characteristics of different regions in the genome, and help to reveal the mechanisms associated with artificial selection and genetic variation to which chickens are subjected.

In this study, we used whole-genome sequencing (WGS) data to perform a genome-wide ROH study and multiple selective sweeps methods in order to detect and identify the putative regions subject to artificial selection for egg-related traits in a Rhode Island Red pure line with complete pedigree records. Rhode Island Red chickens are a widely used standard breed in the poultry industry due to their excellent egg production and quality. Specifically, we aimed to (i) investigate the distribution and frequency of ROH in this population, (ii) identify regions and genes within ROH islands that bear signals of selection across the whole population and annotate their functions, and (iii) combine the results of GWAS and selection signatures to analyze genomic differentiation and the differences of the incidence of SNP in ROH segments between subpopulations with high and low phenotype, to discovery genes or regions that may influence the phenotype. The traits used in this study include records at multiple time points of body weight (BW), egg number (EN), egg weight (EW), albumen height (AH), four kinds of eggshell color (ESCA, ESCB, ESCI and ESCL), eggshell strength (ESS), and glossiness (SINS) (Table 1). Our results will provide valuable insights into the genetic diversity and population structure of highly selected chicken populations and can inform breeding strategies to maintain and improve their productivity.

Table 1 Descriptive statistics of important chicken economical traits along with aging process


Characteristics of ROH and F(ROH)

A total of 141,720 ROH segments were identified in this population with an average length of 1.043 Mb, an average of 206.6 segments per individual, and 4,085 SNPs per segment. The largest ROH segment was 10.17 Mb in length and located between 117.39 and 127.56 Mb on GGA 2, containing 55,091 SNPs. Most of the ROH segments were located on chromosomes 1–4 (including 82,292 ROHs), accounting for 58.07% of the total (Fig. 1a). To distinguish ROH segments of different lengths, we artificially set three thresholds (1 Mb, 3 Mb and 5 Mb) to categorize them into four classes, and found that the proportions of these four ROH categories on each chromosome were not identical to the proportions in the entire genome. For instance, on large chromosomes, the proportions of each type were comparable; however, on small chromosomes, there tended to be a larger proportion of ROH segments longer than 3 Mb (Fig. 1a).

Fig. 1
figure 1

Descriptive graphics of runs of homozygosity (ROH) in Rhode Island Red chickens. a The distribution of ROH on chromosomes and the proportion of ROH for different lengths; b Inbreeding coefficients calculated by four methods: F(ROH) by PLINK, F(GRM) by G matrix in GCTA, F(PED) by pedigree in CFC, and F(HOM) by homozygosity in PLINK

When comparing inbreeding coefficients, F(ROH), with a mean of 0.224, was much higher than other calculations (F(PED): 0.019, F(GRM): 0.088, F(HOM): 0.070, Fig. 1b). And due to the different SNPs used, it was also observed that F(ROH) had low correlation coefficients (0.097 ~ 0.331) with F(GRM) and F(HOM), which used independent SNPs obtained by linkage disequilibrium (LD) pruning rather than all SNPs on genome. In addition, low Pearson correlation coefficient (0.097 ~ 0.348) between F(PED) and F(ROH), F(GRM), and F(HOM) were observed (see Additional file 1 Table S1), reflecting the fact that there might be relatively large differences between pedigree and genomic data for judging the degree of inbreeding.

In the population, we also found some individuals whose parents shared a common ancestor within the six-generation pedigree. Consequently, we formed subsets of individuals whose parents had a common ancestor from the same generation. Moreover, we posited that if the parents’ common ancestor appeared in later generations, the individual would exhibit a higher degree of inbreeding. We found that individuals whose parents shared a common ancestor in last four generations had a significantly higher number of ROH (223.57 ~ 228.14, Fig. 2a, p value < 0.01) and F(ROH) (0.242 ~ 0.247, Fig. 2b, p value < 0.01) than the population mean (number of ROH: 206.59; F(ROH): 0.224), and the greater the degree of inbreeding was, the larger the difference. However, this difference was no longer significant when the common ancestor appeared five generations earlier, suggesting that ROH was adept at reflecting recent inbreeding events. Similarly, individuals whose parents shared a common ancestor in last four generations had a higher proportion of long ROH (0.65 ~ 0.66%) than the population mean (0.62%) (Fig. 2c). These results confirmed that the use of F(ROH) to characterize the degree of inbreeding was relatively reliable in this Rhode Island Red pure line. Additionally, we observed that the deviation of F(ROH) between full siblings (mean = 0.032) was lower than the mean deviation between a random pair (0.037, p value = 0.025), indicating a relatively stable and common occurrence of ROH segments (Fig. 2d). Overall, ROH analysis is perfectly suited for studying inbreeding and selection in this group.

Fig. 2
figure 2

The degree of inbreeding measured by runs of homozygosity (ROH). “IndG-n” represents individuals whose parents’ common ancestor appeared “” generations ago. a The number of ROH in samples with different degrees of inbreeding; b F(ROH) of samples with different degrees of inbreeding; c The percentage of total ROH within each ROH length category; d The difference between two full siblings’ F(ROH) and random pairs. *p < 0.05, **p < 0.01, ***p < 0.001, ****p < 0.0001

ROH islands and De-correlated composite of multiple signals

For the whole population, we initially identified 23 ROH islands based on regions containing the top 1% most frequent loci in the ROH segments (Fig. 3a, threshold = 0.687) and then annotated the known QTL regions that overlapped with them (Table 2). Most ROH islands were distributed on GGA 1–6 (number: 18, accounting for 78.26%), and the average length of ROH islands was 1.13 Mb, containing a mean of 2058.6 SNPs, which revealed that ROH islands might be more common in regions with low SNP density (the average of SNP density of ROH segments was 4,085, as mentioned above). The average of minor allele frequency (MAF) of SNPs (n = 47,348) in ROH islands was 0.083, much lower than the average of all SNPs (0.243). In addition, the average incidence of SNPs in ROH segments for all ROH island regions was 73.41%, ranging from 68.68% to 84.81%, much higher than the average of all SNPs (17.70%).

Fig. 3
figure 3

Manhattan plot of the incidence of SNPs in ROH and selective sweeps detected by DCMS. a Whole genome-wide incidence of SNPs in ROH, which indicates the ROH islands in the population.; b Selective sweeps detected by DCMS on whole genome

Table 2 ROH islands overlapped with reported QTLs

The De-correlated composite of multiple signals (DCMS) consolidates three within-population statistics into a comprehensive score, including nucleotide diversity (Pi), Tajima’s D, and integrated Haplotype Score (iHS). Upon computing the DCMS statistic, we fitted the p-values to a normal distribution to identify candidate sweep regions by evaluating the empirical distribution’s top 1% (Fig. 3b, threshold = 2.20). This approach delineated a total of 176 candidate regions, with 127 of these regions situated on GGA 1–6 (72.16%). Out of 23 detected ROH islands, 12 coincided with these candidate regions, while 4 lay within 1 Mb proximity. Consequently, we identified 59 regions of overlap between ROH islands and selection signatures, corresponding to 60 genes (Table 3). Functional annotation of these identified genes suggested that they contributed to encompassing growth, immunity, disease resistance, cardiovascular and neurological functions in layers (Table 4). Notably, a number of pathways indicate that certain genes influence mRNA synthesis and metabolism. Moreover, the LARGE1, SLC18A2, and CACNB2 genes interact with various neural signal transduction pathways, while the LYN and LGALS2 genes are implicated in the regulation of immune responses.

Table 3 Genes located in overlapping regions of ROH islands and selective sweeps
Table 4 Functional annotation for overlap genes of ROH islands with selective sweeps

Combination of ROH and selection signature analysis with GWAS

To investigate the relationship between potentially selected regions and phenotypes, we performed GWAS and two-tailed analyses for 44 phenotypes on this population. We compared differences of the incidence of SNP in ROH segments and detected selection signatures with DCMS methods between the highest and lowest 10% subpopulations of samples for each phenotype, and combined these results with GWAS. First, we found that for all phenotypes, the average differences of the frequency of SNP appearing in ROH between the high and low subpopulations were almost zero, indicating that there were no obvious differences in genome homozygosity between two subpopulations (see Additional File 2, Table S2, where positive values indicate a higher frequency of SNPs occurring in the ROH segments of the low group, with the number representing the frequency difference; negative values indicate a higher frequency of SNPs in ROH segments in the high group). Then, we scanned the genome using FST, Pi ratio (Pi ratio = Pi(low) / Pi(high)), and cross-population extended haplotype homozygosity (XP-EHH, high group was set to be reference subpopulation) to analyze the genomic differentiation between two subpopulations for each phenotype, respectively, and established a DCMS framework individually to obtain P-values for each genomic window (Fig. 4). We retained genomic windows with FDR-corrected P values of less than 0.05 as potential selection signatures for each  phenotype. In addition, GWAS was performed to identify the significant SNPs for each phenotype on the whole population. In summary, we identified 1,682 significant and 4,095 suggestive significant SNPs (see Additional file 3 Table S3 and Additional file 4 Figure S1). Following GWAS, we also defined 29 QTL regions under 1 Mb in length, within 20 phenotype models (Table 5).

Fig. 4
figure 4

Manhattan plots of selective sweeps for EW56, BW80, AH72 and ESCA56. Selective sweeps detected by DCMS on whole genome for EW56, BW80, AH72 and ESCA56. Abbreviations: AH, albumen height; BW, body weight; ESC: eggshell color; EW, egg weight. The number following the trait indicates the age of week

Table 5 QTL regions and lead SNPs of phenotypes

Among these 29 QTL regions, we found that 13 overlapped with selection signatures identified by the DCMS method, involving multiple time points for EW, ESC and SINS, as well as BW80, AH72 and ESS36 (Fig. 5). Taking egg weight as an example (Fig. 6), we located the QTL near 25 Mb on GGA1. This region was highlighted not only by the DCMS methods but also exhibited selection signatures through the XP-EHH and FST methods. Since the standardized XP-EHH value is greater than 2, it suggests a higher extended haplotype homozygosity in this region of the low egg weight subpopulation. Furthermore, in this region, it was also observed that SNPs within ROH segments occur at a higher frequency in the low egg weight subgroup than the high (4.4 ~ 8.9%), implying that the low egg weight subpopulation has a higher degree of homozygosity in this region. In addition, six genes are located within this QTL region: CAPZA2, MET, CAV1, CAV2, TES, TFEC, and we performed functional annotation for them (Table 6).

Fig. 5
figure 5

Regional point and line plots of GWAS and selective sweeps. The combination of the regional point plots of GWAS with the line plots of selective sweeps for EW56, BW80, AH72 and ESCA56. Within the black vertical dashed line are the QTL regions defined through GWAS results. Red diamond dots indicate lead SNP within the corresponding QTL regions. The colored lines in each subfigure represent the P-values of the windows calculated by DCMS methods. Abbreviations: AH, albumen height; BW, body weight; ESC: eggshell color; EW, egg weight. The number following the trait indicates the age of week

Fig. 6
figure 6

Line plots of multi selective sweeps methods on the QTL associated with EW56. The upper, middle, and lower line plots represent the trends of the XP-EHH, FST and the differences of incidence of SNP in ROH segments near the QTL region associated with EW56, respectively

Table 6 Functional annotation for genes located in QTL regions associated with egg weight


In the current study, we utilized WGS data for ROH, selective sweeps and GWAS analyses in a population of 686 Rhode Island Red hens. With this study, we aim to explore the impact artificial selection might have on the genomes of layers and provide a theoretical foundation for specific breeding practices.

It is important to emphasize that this population has a relatively small number of founders and has undergone several generations of intense confinement selection for egg-related traits. These are important contextual factors for interpreting and discussing our results.

Compared to the results of previous studies on chicken [15, 17,18,19], the distribution and number of ROH in this population did not exhibit significant differences. ROH segments are commonly found on larger chromosomes, and most are classified as short in length (< 3 Mb). It is generally believed that longer ROH indicate inbreeding or a strong selection pressure, while shorter ROH can reflect the population structure of ancestors [7, 8]. In addition, we have observed a relatively higher proportion of long ROH (> 3 Mb) on smaller chromosomes; however, there is no clear explanation for this phenomenon yet.

One common application of ROH analysis is to assess inbreeding levels within a population. In our population, the value of F(ROH) was much larger than the other methods, which is relatively rare in various previous animal studies [3, 13, 20]. Comparative analysis of various methods revealed that F(ROH) and F(PED) exhibit relatively low correlations with alternative approaches, whereas F(GRM) displays a higher correlation with F(HOM). The discrepancy may arise from the use of different SNP datasets in the computation process—F(GRM) and F(HOM) typically employ loci independently filtered for LD [16]. In fact, when we used independent SNPs for detection of ROH, we found that the value of F(ROH) decreased, while the correlation with F(HOM) and F(GRM) increased. In the commercial breeding, pedigrees are conventionally employed to circumvent inbreeding, culminating in diminished inbreeding coefficients. Nonetheless, genomic analysis indicates that sustained intensive selective breeding across multiple generations can escalate genomic homozygosity, thereby exacerbating inbreeding levels. This phenomenon potentially accounts for the pronounced disparity in F(PED) relative to other genomic-based inbreeding coefficients. Further, our research indicates that individuals with more closely related parents possess an increased number of ROH segments and elevated F(ROH) values, confirming that ROH can measure the degree of inbreeding.

In analyzing the entire population, our objective was to pinpoint regions and corresponding genes under strong selection pressure by investigating genomic regions featuring overlaps between ROH islands and selective sweeps. We commenced by determining the frequency of each SNP within ROH segments, isolating the top 1% of SNPs genome-wide to identify 23 ROH islands. The findings revealed that both the SNP density and the MAF on these islands were considerably lower than the overall genomic mean. We speculate that reduced SNP density may aid ROH detection, and a diminished MAF indicates a propensity for SNP fixation within ROH islands.

DCMS strategy effectively amalgamates diverse selection signatures methods within a population into a single score, enhancing the precision of signal detection [21]. Within this framework, the methods Pi, Tajima’s D, and iHS were employed. We designated the top 1% of windows by DCMS P-value as potential signals of selection. Our analysis revealed that over half of ROH islands coincide with these candidate regions, culminating in the delineation of 60 genes. Subsequent functional annotation indicated that these pathways could influence a range of physiological activities in layers, encompassing growth, immunity, disease resistance, cardiovascular and neurological functions, kinesthetic capabilities, behavior, and metabolic processes at the cellular level. We have discovered several genes that may be related to the physiological activities of laying hens. The gene LARGE1 (Like-Glycosyltransferase 1) is associated with pathways related to ion channels and signal transduction in the body of laying hens. This gene encodes a glycosyltransferase that participates in the modification of glycoproteins. It has been reported as a gene potentially related to the abundance of intestinal microbiota in chickens, influencing the deposition of abdominal fat in chickens [22]. LYN (LYN Proto-Oncogene, Src Family Tyrosine Kinase) is a tyrosine kinase involved in various pathways related to immunity and cell apoptosis, and is also associated with growth [23] and aging [24]. These results describe the changes that occur in genome on whole population when it is subjected to artificial selection, yet it is difficult to directly relate to a specific production trait or breeding purpose.

The above findings mainly focused on regions of the genome that have been subjected to selection in the entire population, but did not address specific traits. To gain a deeper understanding of the impact of artificial selection on specific traits and associated genes, we conducted a two-tailed analysis. The analysis included: comparing the ROH islands and computing multiple selection signatures and building the DCMS framework between high and low subgroups, and combining these results with GWAS on each phenotype. We first found that in the high and low subgroups for most traits, the mean of the frequency differences of SNPs in ROH was close to zero, with extremes ranging from 0.22 to 0.38. These results suggest that there is no difference in the genomic homozygosity between high and low subgroups at the whole-genome level, though significant variations may exist in localized regions.

GWAS is capable of identifying common variants that explain genetic variation, and through it, we identified thousands of SNPs associated with different traits and defined QTL regions. We then explored the selection signatures by DCMS between high and low subgroups of each phenotype and corrected the results using FDR. Our aim was to find selection signatures in regions significantly associated with traits (QTL). Overall, we defined 29 QTL regions in 20 phenotypes (related to 8 traits, including AH, BW, ESCA, ESCI, ESCL, ESS, EW and SINS), and 13 QTLs overlapped with the significant regions in DCMS, which meant that these regions significantly associated with the traits were at the same time selection signatures identified by DCMS. In addition, since XP-EHH could be used to determine which subgroups had higher extended haplotype homozygosity, while ROH islands analysis could determine which had higher homozygosity, it might be possible to link the targets of artificial selection to the actual changes on the genome. Using EW56 as an example, we found significant DCMS results near the QTL region associated with EW56, implying that there might be genomic differences between the high and low EW56 subpopulations. The XP-EHH results were greater than 2, this suggested that the extended haplotype homozygosity was higher in the low EW56 subpopulation. In addition, the ROH island analysis also indicated that the degree of homozygosity of low EW56 subpopulation was higher near this region. Since the aim of artificial selection in this population is to reduce egg weight, our results seem to match this aim, but the exact relationship remains unknown.

Six genes are located within the QTL regions, and we have performed functional annotation on them. CAPZA2 and TES are involved in cell adhesion and regulating the cytoskeleton, affecting cell shape and movement. F-actin-capping protein subunit alpha-2 (CAPZA2) is a cytoskeleton assembly-associated protein that may be associated with the formation of small intestinal microvilli in poultry [25, 26]; Testin (TES) testosterone is a protein that is expressed in virtually all normal human tissues, and it plays an important role in its cell motility, adhesion and cytoskeleton [27]. Caveolin-1 (CAV1) is involved in several physiological activities such as signaling, apoptosis, and lipid metabolism. It has been reported to be associated with eggshell quality and organismal aging [28]. Caveolin-2 (CAV2) and CAV1, are members of the CAV family and are involved in the formation of lipid rafts (cholesterol-rich microdomains within membranes). They play a role in a variety of signaling, lipid metabolism, and cell protection from programmed death [29]. TFEC, as a transcription factor, may regulate genes related to cell differentiation and metabolism and has been reported to be expressed in chicken macrophages [30]. These genes are associated with the biological processes listed in Table 6, which may indirectly or directly influence the egg weight of laying hens. This includes (a) the modulation of membrane channels and signal transduction, such as potassium channel inhibition and calcium concentration regulation, which may affect muscle contraction and the function of the oviduct; (b) regulation of cell survival and programmed cell death, which may impact on follicle survival and lipid metabolism, related to the size and quality of the egg; © regulation of the cytoskeleton and adhesive structures, affecting cell morphology, differentiation, and the physical stability of follicles; (d) lipid and energy metabolism, such as lipid storage and cholesterol balance, which may play an important role in regulating the composition and size of the yolk.


In this research, we used WGS data to perform a genomic analysis of Rhode Island Red inbred lines of laying hens, employing ROH and selection signatures analyses. We detected 60 candidate genes within the intersection of ROH and selection signatures, potentially influencing productive attributes related to growth, immune response, disease resistance, cardiovascular health, and neurological functions in laying hens. Integrating both two-tailed analyses with GWAS identified multiple QTL subjected to selection for various phenotypes. Specifically, for egg weight, a QTL was pinpointed near the 25MB region on GGA 1. And both XP-EHH and ROH analyses indicated higher degree of extended haplotype homozygosity and genomic homozygosity in the low egg weight subpopulation near this region, consistent with the direction of artificial selection. Functional annotation of six genes (CAPZA2, MET, CAV1, CAV2, TES, TFEC) within this QTL indicated associations with vital physiological pathways, such as signal transduction, apoptotic processes, and lipid metabolism. Overall, our findings enhance the comprehension of the genome of layers and inform for poultry breeding programs.


Samples and whole genome sequencing

The population used in this study was obtained from a Rhode Island Red pure line in a commercial laying breeding program in Beijing, China. This line had been subjected to fifteen consecutive generations of intensive selection with selection indexes including EN, ESC and egg quality traits. The breeding stock was selected and reproduced for one generation per year, with 80 ~ 100 sire families per generation. Inbreeding was avoided in mating plans based on the pedigree. A total of 686 female chickens with full pedigree records over the past six generations and complete measurements of body weight and egg-related trait at different ages were used in this study. For each chicken, DNA was extracted from a volume of approximately 2 ml venous blood, which was collected from the wing and then placed in an anticoagulation tube with EDTA. The integrity of the DNA was verified, and whole-genome sequencing was performed using an Illumina HiSeq 2500 Sequencer (Illumina, Inc., San Diego, CA, USA). Paired-end reads of 150 bp were generated for each sample.

Data processing and quality control

Raw sequencing data were processed to acquire high-quality single nucleotide polymorphisms (SNPs) by adhering to the following protocols. Low-quality reads were filtered using FastQC v0.11.9 software [31]. Clean reads were then aligned to the Gallus gallus 6.0 reference genome using the Burrows‒Wheeler Alignment (BWA) v0.7.17 tool with default settings [32]. Potential duplicate reads were removed using the Picard toolkit, and the resulting alignments were indexed via SAMtools v1.17 [33]. Genome Analysis Toolkit (GATK) v4.2.3 was employed to process the alignments according to best practices [34]. To derive high-quality SNPs, a minimum quality score of 20 was applied to both bases and mapped reads for variant calling. SNPs from each bird were combined, and the resulting data were filtered using the GATK Variant Filtration module by applying stringent criteria: quality by depth > 5.0, mapping quality score > 40.0, FS < 60.0, MQRankSum > -12.5, ReadPosRankSum > -8.0, and excluding any three SNPs clustered within a 10 bp window. Subsequently, we employed PLINK v1.9 [35] to filter the SNP data using set parameters: a sample call rate exceeding 0.9, a SNP call rate over 0.9, and MAF greater than 0.01. Following filtration, the remaining SNPs and individuals were earmarked for imputation via BEAGLE v5.2 [36]. We then reperformed the PLINK v1.9 analysis, adhering to the same criteria previously mentioned. After these procedures, a comprehensive total of 5,904,820 SNPs spread across 32 chromosomes from 686 birds remained for subsequent analysis. Finally, these phased and filtered SNP data were annotated with SNPEff v5.2 [37] utilizing the chicken reference genome.

Detection of runs of homozygosity (ROH) and calculation of the inbreeding coefficient

Two studies were referenced for ROH setting [12, 38], and PLINK v1.9 was used for ROH detection with the following parameters: a minimum length of 500 Kb in an ROH (-homozyg-kb 500), a minimum of 50 SNPs in an ROH (-homozyg-snp 50), the minimum SNP density set to 1 SNP per 50 kb (-homozyg-density 50), and the maximum gap between two consecutive SNPs set to 1000 kb (-homozyg-gap 1000). For a sliding window of 50 SNPs, an allowance of no more than 5 missing SNPs per window (-homozyg-window-missing 5) and a maximum of one heterozygous SNP per window (-homozyg-window-het 1) were set.

Following ROH detection, we computed the coefficient of inbreeding using four different methods: (a) F(ROH), calculated as the sum of ROH segment lengths divided by the whole genome length; (b) F(PED), computed based on the pedigree using CFC software [39]; (c) F(GRM), the result of the diagonal of the genomic relationship matrix (GRM) constructed by GCTA v1.26.0 [40]; and (d) F(HOM), based on the homozygous sites by PLINK v1.9. We compared the results of all four methods in R ( to evaluate the degree of inbreeding and calculated the correlation coefficients of F(ROH) with F(PED), F(GRM), and F(HOM).

Detection of ROH islands

To delineate the genomic regions exhibiting the strongest association with ROH, we quantified the prevalence of SNPs within ROH by enumerating the occurrences of a specific SNP in an ROH across a diverse set of individuals. Subsequently, we selected the top 1% of SNPs demonstrating a prevalence exceeding 67.93% and combined proximate SNPs into genomic regions that were representative of ROH islands, which were then subjected to in-depth investigations. This result is presented by a Manhattan plot. All Manhattan plots in this paper were generated by the R packages CMplot [41] and ggplot2 [42].

Selective sweep identification and De-correlated composite of multiple signals

In addition to the analysis of selection signatures for within-population, we designed a two-tailed analysis to discover genomic differences between high- and low-level individuals for each trait. Before the detection for selection signatures, we placed the individuals ranking in the top and bottom 10% for each trait into the high- and low-level subgroups of this trait. To detect selection signatures within our population or between subgroups of each trait, we employed various strategies to examine the entire genome. Using VCFtools v0.1.16 [43], we computed fixation index (FST), nucleotide diversity (Pi) and Tajima’s D values with a sliding window approach, setting the window size to 50Kb and the step size to 25Kb for FST and Pi, with Tajima’s D also analyzed across 50Kb windows. And the Pi ratio is equal to the Pi value for each window in the low-level subgroup divided by the Pi value in the high-level subgroup (Pi ratio = Pi(low) / Pi(high)). Further, we determined genome-wide XP-EHH [44] and iHS values [45] using the –xpehh and -ihs command in the Selscan v1.3.0 software [46], normalized these values with the -norm command, and obtained the average XP-EHH and iHS value for each 50 Kb region. And in XP-EHH analysis, high subgroup was set to be reference subpopulation.

We incorporated Tajima’s D, Pi and iHS in the intra-population De-correlated composite of multiple signals (DCMS) [21] framework for detecting selection signatures of whole population, and FST, Pi ratio and XP-EHH in the inter-subpopulation DCMS framework for genomic differentiation between high- and low-level subgroups of each trait. In every DCMS, we calculated this statistic for each 50kb window using the MINOTAUR package in R [47]. The analysis steps were performed with reference to other studies [48,49,50]. To compute genome-wide P-values, we employed the stat_to_pvalue function, performing a left-tailed test for Pi, Tajima’s D, and iHS (setting two.tailed = FALSE, right.tailed = FALSE) and a right-tailed test for FST, Pi ratio and XP-EHH (setting two.tailed = FALSE, right.tailed = TRUE). An n × n correlation matrix was generated across these statistics utilizing the covNAMcd function in the rrcovNA R package (with parameters alpha = 0.75, nsamp = 300,000). This matrix provided the basis for calculating genome-wide DCMS values via the DCMS function in the MINOTAUR package. Subsequent to the DCMS value computation, we fitted a robust linear model to these values and normalized the distribution using the rlm function (with model dcms ~ 1) from the MASS R package. Finally, we applied the pnorm function to calculate P-values for the DCMS statistics and identify candidate sweep regions by evaluating the empirical distribution’s top 1%. For the DCMS between high- and low-subgroups, we applied the p.adjust function to conduct FDR correlation for the significant intervals associated with traits.

Then, we used BEDTools v2.26.0 [51] intersect to identify SNPs that were located in both ROH islands and potentially selected regions with low DCMS P-values. These SNPs were regarded as putative selection signatures and annotated to candidate genes and reported QTLs.

Functional annotation

We employed the online website g:Profiler ( [52] to retrieve enriched functional terms for these genes, including Gene Ontology (GO) categories and KEGG pathways.

Genome Wide Association Study (GWAS)

We conducted a genome-wide association study (GWAS) on all 42 traits incorporating all valid samples and SNPs passing the quality control of MAF (0.05) and Hardy–Weinberg test (1e-6), utilizing a univariate linear mixed model (LMM). This analysis was executed with GEMMA (version 0.98.4) software [53]. The statistical model implemented in this investigation is as follows:


Here, y represents the phenotypes of 686 individuals; W is a matrix of covariates (fixed effects: top five principal components and hatch effects) accounting for population structure, with α being a vector of corresponding effects that form the intercept; x signifies the marker genotypes and β refers to the associated marker effects; μ constitutes a vector of random polygenic effects with a covariance structure; and ε denotes a vector of random residuals. Additionally, the P value from the likelihood ratio test was chosen as a benchmark to evaluate the significance of the association between SNPs and egg traits. The threshold for genome-wide significance was established using a modified Bonferroni correction implemented through the R package simpleM. A total of 84,633 valid SNPs were carried out, and the thresholds of genome-wide significance and suggestive significance were defined as 5.91e-7 (0.05/84,633) and 1.18e-5 (1/84,633), respectively.

QTL region definition

For each trait, we used GWAS results to define QTL as chromosomal regions where the distance between adjacent pairs of significant variants was less than 1 Mb [54]. Within each locus, we identified the most significant variant as the lead variant. A maximum distance of 0.5 Mb on either side of the lead variant was allowed.

Availability of data and materials

Whole-genome resequencing data are available on the NCBI Sequence Read Archive (SRA) under PRJNA980845.


  1. Liu W, Li D, Liu J, Chen S, Qu L, Zheng J, et al. A genome-wide SNP scan reveals novel loci for egg production and quality traits in white leghorn and brown-egg dwarf layers. PLoS ONE. 2011;6:e28600.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  2. Yan Y, Wu G, Liu A, Sun C, Han W, Li G, et al. Genomic prediction in a nuclear population of layers using single-step models. Poult Sci. 2018;97:397–402.

    Article  PubMed  Google Scholar 

  3. Shi L, Wang L, Liu J, Deng T, Yan H, Zhang L, et al. Estimation of inbreeding and identification of regions under heavy selection based on runs of homozygosity in a large white pig population. J Anim Sci Biotechnol. 2020;11:46.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  4. Peripolli E, Munari DP, Silva M, Lima ALF, Irgang R, Baldi F. Runs of homozygosity: current knowledge and applications in livestock. Anim Genet. 2017;48:255–71.

    Article  CAS  PubMed  Google Scholar 

  5. Dixit SP, Singh S, Ganguly I, Bhatia AK, Sharma A, Kumar NA, et al. Genome-wide runs of homozygosity revealed selection signatures in Bos indicus. Front Genet. 2020;11:92.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  6. Beynon SE, Slavov GT, Farre M, Sunduimijid B, Waddams K, Davies B, et al. Population structure and history of the Welsh sheep breeds determined by whole genome genotyping. BMC Genet. 2015;16:65.

    Article  PubMed  PubMed Central  Google Scholar 

  7. Gibson J, Morton NE, Collins A. Extended tracts of homozygosity in outbred human populations. Hum Mol Genet. 2006;15:789–95.

    Article  CAS  PubMed  Google Scholar 

  8. Ceballos FC, Joshi PK, Clark DW, Ramsay M, Wilson JF. Runs of homozygosity: windows into population history and trait architecture. Nat Rev Genet. 2018;19:220–34.

    Article  CAS  PubMed  Google Scholar 

  9. Zavarez LB, Utsunomiya YT, Carmo AS, Neves HH, Carvalheiro R, Ferencakovic M, et al. Assessment of autozygosity in Nellore cows (Bos indicus) through high-density SNP genotypes. Front Genet. 2015;6:5.

    Article  PubMed  PubMed Central  Google Scholar 

  10. Marsden CD, Ortega-Del Vecchyo D, O’Brien DP, Taylor JF, Ramirez O, Vilà C, et al. Bottlenecks and selective sweeps during domestication have increased deleterious genetic variation in dogs. Proc Natl Acad Sci. 2016;113:152–7.

    Article  CAS  PubMed  Google Scholar 

  11. Xu Z, Sun H, Zhang Z, Zhao Q, Olasege BS, Li Q, et al. Assessment of autozygosity derived from runs of homozygosity in Jinhua pigs disclosed by sequencing data. Front Genet. 2019;10:274.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  12. Yuan J, Li S, Sheng Z, Zhang M, Liu X, Yuan Z, et al. Genome-wide run of homozygosity analysis reveals candidate genomic regions associated with environmental adaptations of Tibetan native chickens. BMC Genomics. 2022;23:91.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  13. Wang H, Wang Q, Tan X, Wang J, Zhang J, Zheng M, et al. Estimation of genetic variability and identification of regions under selection based on runs of homozygosity in Beijing-You chickens. Poult Sci. 2023;102:102342.

    Article  CAS  PubMed  Google Scholar 

  14. Metzger J, Karwath M, Tonda R, Beltran S, Agueda L, Gut M, et al. Runs of homozygosity reveal signatures of positive selection for reproduction traits in breed and non-breed horses. BMC Genomics. 2015;16:764.

    Article  PubMed  PubMed Central  Google Scholar 

  15. Bello SF, Lawal RA, Adeola AC, Nie Q. The study of selection signature and its applications on identification of candidate genes using whole genome sequencing data in chicken - a review. Poult Sci. 2023;102:102657.

  16. Meyermans R, Gorssen W, Buys N, Janssens S. How to study runs of homozygosity using PLINK? A guide for analyzing medium density SNP data in livestock and pet species. BMC Genomics. 2020;21:94.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  17. Marchesi JAP, Buzanskas ME, Cantão ME, Ibelli AMG, Peixoto JO, Joaquim LB, et al. Relationship of runs of homozygosity with adaptive and production traits in a paternal broiler line. Animal. 2018;12:1126–34.

    Article  CAS  PubMed  Google Scholar 

  18. Almeida OAC, Moreira GCM, Rezende FM, Boschiero C, de Oliveira PJ, Ibelli AMG, et al. Identification of selection signatures involved in performance traits in a paternal broiler line. BMC Genomics. 2019;20:449.

    Article  PubMed  PubMed Central  Google Scholar 

  19. Boschiero C, Moreira GCM, Gheyas AA, Godoy TF, Gasparin G, Mariani P, et al. Genome-wide characterization of genetic variants and putative regions under selection in meat and egg-type chicken lines. BMC Genomics. 2018;19:83.

    Article  PubMed  PubMed Central  Google Scholar 

  20. Peripolli E, Stafuzza NB, Munari DP, Lima ALF, Irgang R, Machado MA, et al. Assessment of runs of homozygosity islands and estimates of genomic inbreeding in Gyr (Bos indicus) dairy cattle. BMC Genomics. 2018;19:34.

    Article  PubMed  PubMed Central  Google Scholar 

  21. Ma Y, Ding X, Qanbari S, Weigend S, Zhang Q, Simianer H. Properties of different selection signature statistics and a new strategy for combining them. Heredity (Edinb). 2015;115:426–36.

    Article  CAS  PubMed  Google Scholar 

  22. Wen C, Yan W, Mai C, Duan Z, Zheng J, Sun C, et al. Joint contributions of the gut microbiota and host genetics to feed efficiency in chickens. Microbiome. 2021;9:126.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  23. Ji J, Luo CL, Zou X, Lv XH, Xu YB, Shu DM, et al. Association of host genetics with intestinal microbial relevant to body weight in a chicken F2 resource population. Poult Sci. 2019;98:4084–93.

    Article  CAS  PubMed  Google Scholar 

  24. Park JW, Ji YI, Choi YH, Kang MY, Jung E, Cho SY, et al. Candidate gene polymorphisms for diabetes mellitus, cardiovascular disease and cancer are associated with longevity in Koreans. Exp Mol Med. 2009;41:772–81.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  25. Huang Y, Mao X, van Jaarsveld RH, Shu L, Terhal PA, Jia Z, et al. Variants in CAPZA2, a member of an F-actin capping complex, cause intellectual disability and developmental delay. Hum Mol Genet. 2020;29:1537–46.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  26. Kaewsatuan P, Poompramun C, Kubota S, Yongsawatdigul J, Molee W, Uimari P, et al. Comparative proteomics revealed duodenal metabolic function associated with feed efficiency in slow-growing chicken. Poult Sci. 2022;101:101824.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  27. Popiel A, Kobierzycki C, Dziegiel P. The role of testin in human cancers. Pathol Oncol Res. 2019;25:1279–84.

    Article  CAS  PubMed  Google Scholar 

  28. Cheng X, Li X, Liu Y, Ma Y, Zhang R, Zhang Y, et al. DNA methylome and transcriptome identified key genes and pathways involved in speckled eggshell formation in aged laying hens. BMC Genomics. 2023;24:31.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  29. Shu G, Liao WY, Feng JY, Yu KF, Zhai YF, Wang SB, et al. Active immunization of fatty acid translocase specifically decreased visceral fat deposition in male broilers. Poult Sci. 2011;90:2557–64.

    Article  CAS  PubMed  Google Scholar 

  30. Bush SJ, Freem L, MacCallum AJ, O’Dell J, Wu C, Afrasiabi C, et al. Combination of novel and public RNA-seq datasets to generate an mRNA expression atlas for the domestic chicken. BMC Genomics. 2018;19:594.

    Article  PubMed  PubMed Central  Google Scholar 

  31. Yan L, Yang M, Guo H, Yang L, Wu J, Li R, et al. Single-cell RNA-seq profiling of human preimplantation embryos and embryonic stem cells. Nat Struct Mol Biol. 2013;20:1131–9.

    Article  CAS  PubMed  Google Scholar 

  32. Li H, Durbin R. Fast and accurate short read alignment with burrows-wheeler transform. Bioinformatics. 2009;25:1754–60.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  33. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The sequence alignment/map format and SAMtools. Bioinformatics. 2009;25:2078–9.

    Article  PubMed  PubMed Central  Google Scholar 

  34. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, et al. The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20:1297–303.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  35. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81:559–75.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  36. Browning BL, Tian X, Zhou Y, Browning SR. Fast two-stage phasing of large-scale sequence data. Am J Hum Genet. 2021;108:1880–90.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  37. Cingolani P, Platts A, Wang LL, Coon M, Nguyen T, Wang L, et al. A program for annotating and predicting the effects of single nucleotide polymorphisms. SnpEff Fly. 2014;6:80–92.

    Article  Google Scholar 

  38. Ceballos FC, Hazelhurst S, Ramsay M. Assessing runs of homozygosity: a comparison of SNP array and whole genome sequence low coverage data. BMC Genomics. 2018;19:106.

    Article  PubMed  PubMed Central  Google Scholar 

  39. Sargolzaei M, Iwaisaki H, Colleau JJ. CFC: a tool for monitoring genetic diversity. In Proceedings of the 8th world congress on genetics applied livestock production: 13–18 August 2006. Belo Horizonte; 2006. p. 27–8.;jsessionid=4CCC46A48E8EAD41038094118D68D660.

  40. Yang J, Lee SH, Goddard ME, Visscher PM. GCTA: a tool for genome-wide complex trait analysis. Am J Hum Genet. 2011;88:76–82.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  41. Yin L, Zhang H, Tang Z, Xu J, Yin D, Zhang Z, et al. rMVP: A Memory-efficient, Visualization-enhanced, and Parallel-accelerated Tool for Genome-wide Association Study. Genomics Proteomics Bioinformatics. 2021;19:619–28.

    Article  PubMed  PubMed Central  Google Scholar 

  42. Wickham H. ggplot2: elegant graphics for data analysis. New York: Springer-Verlag; 2016.

  43. Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, et al. The variant call format and VCFtools. Bioinformatics. 2011;27:2156–8.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  44. Sabeti PC, Varilly P, Fry B, Lohmueller J, Hostetter E, Cotsapas C, et al. Genome-wide detection and characterization of positive selection in human populations. Nature. 2007;449:913–8.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  45. Voight BF, Kudaravalli S, Wen X, Pritchard JK. A map of recent positive selection in the human genome. PLoS Biol. 2006;4:e72.

    Article  PubMed  PubMed Central  Google Scholar 

  46. Szpiech ZA, Hernandez RD. selscan: an efficient multithreaded program to perform EHH-based scans for positive selection. Mol Biol Evol. 2014;31:2824–7.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  47. Verity R, Collins C, Card DC, Schaal SM, Wang L, Lotterhos KE. Minotaur: a platform for the analysis and visualization of multivariate results from genome scans with R shiny. Mol Ecol Resour. 2017;17:33–43.

    Article  CAS  PubMed  Google Scholar 

  48. Zhang S, Yao Z, Li X, Zhang Z, Liu X, Yang P, et al. Assessing genomic diversity and signatures of selection in Pinan cattle using whole-genome sequencing data. BMC Genomics. 2022;23:460.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  49. Peripolli E, Reimer C, Ha NT, Geibel J, Machado MA, Panetto J, et al. Genome-wide detection of signatures of selection in indicine and Brazilian locally adapted taurine cattle breeds using whole-genome re-sequencing data. BMC Genomics. 2020;21:624.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  50. Illa SK, Mukherjee S, Nath S, Mukherjee A. Genome-wide scanning for signatures of selection revealed the putative genomic regions and candidate genes controlling milk composition and coat color traits in sahiwal cattle. Front Genet. 2021;12:699422.

    Article  PubMed  PubMed Central  Google Scholar 

  51. Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26:841–2.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  52. Reimand J, Arak T, Vilo J. g:Profiler–a web server for functional interpretation of gene lists. Nucleic Acids Res. 2011;39:W307–15.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  53. Zhou X, Stephens M. Genome-wide efficient mixed-model analysis for association studies. Nat Genet. 2012;44:821–4.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  54. Bouwman AC, Daetwyler HD, Chamberlain AJ, Ponce CH, Sargolzaei M, Schenkel FS, et al. Meta-analysis of genome-wide association studies for cattle stature identifies common genes that regulate body size in mammals. Nat Genet. 2018;50:362–7.

    Article  CAS  PubMed  Google Scholar 

Download references


The authors would like to thank Beijing Engineering Research Centre of Layer for providing the experimental chickens.


This work was supported by the National Key Research and Development Program of China (2022YFF1000204 and 2021YFD1300600), STI2030—Major Projects (2023ZD04052), China Agriculture Research Systems [CARS-40] and 2115 Talent Development Program of China Agricultural University [00109015].

Author information

Authors and Affiliations



XL performed the analysis and wrote the first draft; FL and XC assisted in preparing the datasets; YY, GL, and GW assisted in preparing samples in study and provided comments on the manuscript. CS and NY were in charge of the project management, contributed to the design of the study, and revised the manuscript. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Congjiao Sun or Ning Yang.

Ethics declarations

Ethics approval and consent to participate

All animal experiments were approved by the Animal Care and Use Committee of China Agricultural University (Permit number: SYXK 2007–0023). All animal experiments were performed strictly according to the requirements of the Animal Ethics Procedures and Guidelines of the People’s Republic of China. All methods are reported in accordance with ARRIVE guidelines for the reporting of animal experiments.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Li, X., Lan, F., Chen, X. et al. Runs of homozygosity and selection signature analyses reveal putative genomic regions for artificial selection in layer breeding. BMC Genomics 25, 638 (2024).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: