Identification of candidate genomic regions for chicken egg number traits based on genome-wide association study

Background Since the domestication of chicken, various breeds have been developed for food production, entertainment, and so on. Compared to indigenous chicken breeds which generally do not show elite production performance, commercial breeds or lines are selected intensely for meat or egg production. In the present study, in order to understand the molecular mechanisms underlying the dramatic differences of egg number between commercial egg-type chickens and indigenous chickens, we performed a genome-wide association study (GWAS) in a mixed linear model. Results We obtained 148 single nucleotide polymorphisms (SNPs) associated with egg number traits (57 significantly, 91 suggestively). Among them, 4 SNPs overlapped with previously reported quantitative trait loci (QTL), including 2 for egg production and 2 for reproductive traits. Furthermore, we identified 32 candidate genes based on the function of the screened genes. These genes were found to be mainly involved in regulating hormones, playing a role in the formation, growth, and development of follicles, and in the development of the reproductive system. Some genes such as NELL2 (neural EGFL like 2), KITLG (KIT ligand), GHRHR (Growth hormone releasing hormone receptor), NCOA1 (Nuclear receptor coactivator 1), ITPR1 (inositol 1, 4, 5-trisphosphate receptor type 1), GAMT (guanidinoacetate N-methyltransferase), and CAMK4 (calcium/calmodulin-dependent protein kinase IV) deserve our attention and further study since they have been reported to be closely related to egg production, egg number and reproductive traits. In addition, the most significant genomic region obtained in this study was located at 48.61–48.84 Mb on GGA5. In this region, we have repeatedly identified four genes, in which YY1 (YY1 transcription factor) and WDR25 (WD repeat domain 25) have been shown to be related to oocytes and reproductive tissues, respectively, which implies that this region may be a candidate region underlying egg number traits. Conclusion Our study utilized the genomic information from various chicken breeds or populations differed in the average annual egg number to understand the molecular genetic mechanisms involved in egg number traits. We identified a series of SNPs, candidate genes, or genomic regions that associated with egg number, which could help us in developing the egg production trait in chickens. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-021-07755-3.


Background
Reproduction traits, especially egg production, are the most important economic cares in chickens [1]. Laying performance usually reflects a chicken's reproductive performance [2]. As an important source of animal protein, the consumption of poultry eggs worldwide has increased significantly over the past few decades [3]. Each person consumes approximately 12.5 kg of eggs per year [4]. Egg consumption may continue to increase with accretion in urban populations [5,6]. Therefore, it is of great practical and economic significance to understand the genetic mechanisms of chicken reproductive traits. However, egg production is a polygenic genetic trait with low to medium heritability and is affected by both genetic components and environmental factors [7,8]. It can be evaluated by many indicators such as age at first egg, egg number, egg production rate and so on. The egg number is an important reproductive trait in poultry breeding, an important indicator that can effectively evaluate individual egg production at a certain stage and the fertility of breeding chicken [9].
It is possible to analyze the genetic mechanisms of complex traits by using GWAS with the development of sequencing technology. GWAS can not only take full advantage of molecular markers at the genome level, but, owing to the use of whole genome sequences, avoid the effects of linkage imbalance between SNPs and underlying genes [10]. At present, a few candidate genes and regions related to egg number have been reported based on GWAS technology [9,[11][12][13][14][15][16]. According to the QTL database [17], 12,782 QTLs related to chicken economic traits have been identified, 332 of which are associated with egg number.
Some commercial lines or populations are intensively selected for their production traits. Rhode Island Red and White Leghorn chickens are well known for their distinguished egg productivity. Dwarf chickens in China have also been developed for egg production. The average annual egg number of them is approximately about 300 eggs. Chinese indigenous chickens grow relatively slowly and are known to produce less than 200 eggs per year. Therefore, we performed a GWAS based on the differences of egg number between egg-type chickens and local chickens to explore the underlying molecular genetic mechanisms and identify candidate genes or genomic regions related to egg number traits. The results of this study are supposed to be beneficial for layer breeding.

Population structure testing
Principal component analysis (PCA) using the first two principal components showed that there was an obvious stratification phenomenon between Chinese indigenous chicken breeds (black circle) and commercial egg-type chickens. At the same time, we found a Tibetan chicken was mixed with White Leghorn, as shown by the red circle in Fig. 1 and (see Additional File 4: Figure S1). Meanwhile, WL_CAU and WL_YQ were divided into two groups as they came from two different population. The first principal component (PC1) and the second principal component (PC2) explain 15.92 and 7.49% of the total variance (or 59.65 and 28.08% of the top three PCs), respectively. In GWAS, population stratification might lead to false-positive results. So we used principal components as covariates correct for stratification in this study [18]. When the covariate was added as 1st PC, top two PCs, top three PCs, top four PCs and top five PCs, we performed GWAS and calculated the genomic inflation factors (λ) respectively. λ was 1.004, 0.917, 0.916, 0.891, 0.882, respectively. When the 1st PC was added as a covariate, λ is the closest to 1, indicating that the correction effect of population stratification is the best [19]. So we finally decided to add the 1st PC in the GWAS mixed model to adjust for population stratification.
Admixture software was used to analyze the population structure. We displayed a bar plot based on the cross-validation error rate (Additional file 5: Figure S2). When K = 2, Rhode Island Red and one of the White Leghorn groups (WL_YQ) appeared as two differentiated clusters. When K = 3-4, two White Leghorn populations (WL_CAU, WL_YQ) gathered in the same group. When K = 5, two White Leghorn populations were separated. When K = 6-9, the high productivity layers from four populations (WL_CAU, WL_YQ, RIR, DW) were separated, indicating that the genetic backgrounds of these layers were different (Additional file 4: Figure S1). These results reinforce the subsequent analyses.

Genome-wide association study
We took the high productivity group as the case and the low productivity group as the control, and performed GWAS on the dramatic differences in egg number between two groups.
The QQ plot was presented as Fig. 2a, the λ was equal to 1.004, which means that there was no population stratification phenomenon and the GWAS results were reliable.
In addition, the most significant peak in this study was located at 48.61-48.84 Mb on chromosome 5. The chisquare test was carried out to compare the allele frequencies of the significant SNPs identified in this region between the high and low productivity groups. The results showed that the allele frequencies of these 20 SNPs were significantly different between the two groups (Additional file 2: Table S2). At the same time, four genes were repeatedly identified in this region. YY1 is involved in oocyte growth and maturation [34]. A member  Table 3. PC1, principal components one; PC2, principal components two. PC1 and PC2 explain 15.92 and 7.49% of the total variance (or 59.65 and 28.08% of the top three PCs), respectively   [35]. Brain enriched guanylate kinase (BEGAIN) is a gene specifically expressed in the brain and is involved in the regulation of postsynaptic neurotransmitter receptor activity [36].

Comparing with previously reported QTLs
Through Animal QTLdb, we detected 4 QTLs that overlapped with SNPs obtained from this study. Two of these 4 QTLs were associated with egg production, including 1 with egg production rate and 1 with small yellow follicle number. The remaining QTLs were related to reproductive traits, including 2 with ovary weight (Additional file 3: Table S3).

GWAS and QTL overlapping
An important condition for GWAS to achieve better results is to eliminate false associations caused by differences in allele frequencies arising from population stratification, recessive kinship, and genotyping errors [37]. The GEMMA adopted in this study considers the group stratification and sample structure. At the same time, we also added the PC1 as a covariate to reduce the group stratification effect. The results of the QQ plot and λ show that the correction effect is good, and there is no population stratification phenomenon. Four of the 148 QTLs in this study were those identified in previous studies. Among these overlapping QTLs, 2 QTLs were associated with egg production. This reinforces the results of our study. AX-75745363 is located the ENSGALG00000029896 gene on chromosome 13. AX-76715084 is located 0.046 Mb upstream of the GRXCR1 (glutaredoxin and cysteine-rich domain containing 1) gene on chromosome 4. Although these genes  have not been very well studied in chickens, and their functions have not been fully elucidated, they provide a reference and idea to understand the molecular mechanism for egg number traits.

Candidate genes
As far as we know, this study has the largest variety of breeds so far in the research of reproductive traits in chicken, which not only improves the detection ability of related QTLs, but also allows us to detect some QTLs related to fat and heat resistance. We speculate that this was due to the relatively slow fat formation and deposition [38,39] and the relatively high heat generation of layers [40]. Thus, we identified 32 candidate genes based on their function. These genes mainly affect egg number in three ways ( Table 2). Some genes regulate hormone levels, including gonadotropin-releasing hormone (GNRH), oxytocin (OXT), growth hormone (GH), and thyroid hormone (TH). All these hormones play a vital role in the female reproductive system [41][42][43][44][45][46]. Some genes affect egg number traits by affecting the growth and development of follicles. It is well known that the growth and development of follicles are critical for reproductive function, especially in chickens. The remaining genes directly affect reproductive system development. Among these 32 candidate genes, we also found 7 important genes that have been identified as related to reproductive traits such as egg production, egg number and litter size in previous studies, which further validates our findings. NELL2 and KITLG are located on GGA1. NELL2 not only affects the synthesis and secretion of GNRH [47], but has also been shown to be involved in maintaining the normal female reproductive cycle of mammals [20]. KITLG plays an important role in the growth and development of follicles [21,48]. It has been shown to be related to the litter size of goat and sheep, and has been considered an excellent candidate gene for reproductive traits of humans and livestock [21][22][23]. Therefore, it is reasonable to speculate that KITLG has an important impact on egg number traits. GHRHR located on GGA2 participates in the secretion and synthesis of GH. It is believed to be involved in the growth and reproduction of livestock [49]. Liu et al. identified three SNPs in the GHRHR promoter that are significantly related to egg number traits in Beijing You chickens [24]. NCOA1, located on GGA3, is involved in Regarding the sample size, we tried to make the ratio between case and control was 1:1 to avoid the large number difference between the two groups interfering the results regulating signal pathways mediated by TH and estrogen. It has not only been shown to be an important gene that influences reproductive traits in pigs and sheep, but also related to egg production, fertility, and reproductive traits in chicken [25][26][27][28][29][30]. ITPR1 repeatedly identified in the 18.56-18.57 Mb on chromosome 12 can not only participate in the signaling pathway of GnRH, estrogen, and the synthesis and secretion of GH and TH, but also affects the growth and differentiation of follicles. In addition, ITPR1 has been reported to be involved in the transport of Ca 2+ and may be associated with egg number [31]. This suggests that this region located on chromosome 12 and the ITPR1 gene may be important for chicken egg number traits. GAMT located on GGA28 has been shown to be associated with the reproductive system and development [32]. CAMK4 located on chromosome Z is involved in the signaling pathway of OXT and may play a role in the development of follicles and ovulation [50]. It is believed to play a significant role in the reproductive processes of females [33]. However, NELL2, GAMT, and CAMK4 have not been studied before in chickens, and the results of this study may pave the way for future researchers to explore the relationship between these genes and egg number traits. Also, the specific functions of these genes need to be further verified.

Candidate region
In this study, the most significant peak obtained was located at 48.61-48.84 Mb region on GGA5. In this region, YY1, SLC25A29, WDR25, and BEGAIN were annotated. Among them, YY1 and WDR25 have been shown to be related to oocytes and reproductive tissues, respectively [34,35]. However, there is no concrete literature to prove that they are associated with egg number traits in chickens, thus further research is still required. At the same time, interestingly enough, a number of studies have detected regions associated with egg number traits on chromosome 5 [12,15]. The region identified in our study was about 1.2 Mb away from the QTL reported by Zhang et al. [15]. Although the results are different, it has once again proved that chromosome 5 is an important candidate region that affects the reproductive traits of chickens.

Conclusions
In this study, we performed a GWAS based on the difference of egg number between high productivity layers and Chinese indigenous chickens and identified a series of SNPs and candidate genes related to reproductive traits. Four of the SNP effects overlapped with previously reported QTL regions, which supports the results of this study. These results may help us to better understand the molecular mechanisms underlying reproductive traits in chickens and even other species.

Experimental animals
For this study, 442 chicks were available. Among them, White Leghorn and Rhode Island Red are intensively selected commercial breed, Dwarf Chicken is a synthetic layer line. Both they are egg-type chickens and produce about 300 eggs per year. They were placed in the high egg productivity group in this study. The other ten Chinese indigenous breeds laying less than 200 eggs annually were classified into the low egg productivity group. The details of the samples are presented in Table 3.

Genotyping and quality control
In total, blood samples from 442 chickens from the high and low egg productivity groups were collected by standard venipuncture. After DNA extraction using the standard phenol/chloroform method [51], the chickens were genotyped using a 600 K Affymetrix Axiom Chicken Genotyping Array with a total of 580,961 SNPs [52]. Quality control was performed using Plink v1.9 [53]. SNPs with a minor allele frequency ≥ 1% and genotyping rate ≥ 98% were retained. Individuals with a genotype deletion rate of > 5% were excluded. SNPs with Hardy-Weinberg equilibrium P < 10 − 6 were eliminated. After filtering, 439 chickens, including 218 in high and 221 in low egg productivity groups, and 456,647 SNPs were retained for further analyses.

Population structure analysis
Prior to GWAS, population structure was examined by PCA. Plink 1.9 was used to determine the population structure and generate eigenvectors and eigenvalues, and the "ggplot2" package in R studio was used to visualize the results of PCA. We selected the first two principal components with the largest variance interpretation rate as the horizontal and vertical coordinates to create a PCA plot. At the same time, we calculated the principal component contribution rate based on 439 eigenvalues.
We retained relatively dependent SNPs with the plink '--indep-pairwise 25 5 0.2' command. The genetic structure was estimated using Admixture software [54]. We calculated the ancestor coefficient matrix, simulated the situation of genetic clusters (K) from 1 to 20, and computed the cross-validation error rate. Furthermore, we used an online pophelper to display a population structure bar plot (http:// pophelper.com/) [55].

Genome-wide association study
GWAS analyses of the egg number were performed using a univariate mixed linear model in GEMMA [56]. In the current study, only the PC1 was used as a covariate to correct population stratification. The model is as follows: where y denotes a phenotypic value vector of 439 individuals, W is a matrix of covariates (fixed effects that contain a column of 1 s and the first principal component), α represents a vector of the corresponding coefficients consisting of intercepts, x is a vector of marker genotypes, β is the effect size of a marker, u is a vector of random effects with a covariance structure that follows a normal distribution as u~N (0, KVg), where K is a genetic relationship matrix and Vg is the polygenic additive variance; and ɛ is a vector of random residuals. In this study, the Wald statistic was used to test each SNP.
Manhattan and Quantile-Quantile (QQ) plot were made by R package "CMplot" and "qqman" respectively, and we also calculated λ based on p-values from GWAS to judge the degree of false-positive [57]. λ was calculated by the median of the resulting chi-squared test statistics divided by the expected median of the chisquared distribution. We set the median of a chisquared distribution with one degree of freedom was 0.454 in this study.
The traditional Bonferroni correction is too strict, resulting in a higher false-negative rate and omission of some SNPs truly associated with the target trait [58]. Therefore, in this study, we calculated the sum of the number of independent SNPs and LD blocks for correction [59]. The effective number of independent tests was 127,862 in this study. Hence, the threshold P value was adjusted to 3.91 × 10 − 7 for a genome-wide significance level, and 7.82 × 10 − 6 for a genome-wide suggestive significance level. This means that SNPs with P values below 7.82 × 10 − 6 are considered and may be associated with egg number traits.

Bioinformatics analysis of candidate genes
We identified candidate genes by searching for the nearest genes located within 400 bp upstream or downstream of the significant associated SNPs and annotated based on the Galgal 5.0 assembly supported by Ensembl (http://www. ensembl.org/index.html) databases. We then checked the biological functions of these genes in PubMed (https:// pubmed.ncbi.nlm.nih.gov).

Overlap with known QTLs
In addition, regions within 100 kb of a candidate SNP were searched for previously reported QTLs with egg number, egg production or reproductive traits in the chicken QTL database (https://www.animalgenome.org/ cgi-bin/QTLdb/GG/index).