A new chicken 55K SNP genotyping array

Background China has the richest local chicken breeding resources in the world and is the world’s second largest producer of meat-type chickens. Development of a moderate-density SNP array for genetic analysis of chickens and breeding of meat-type chickens taking utility of those resources is urgently needed for conventional farms, breeding industry, and research areas. Results Eight representative local breeds or commercial broiler lines with 3 pools of 48 individuals within each breed/line were sequenced and supplied the major SNPs resource. There were 7.09 million - 9.41 million SNPs detected in each breed/line. After filtering using multiple criteria such as preferred incorporation of trait-related SNPs and uniformity of distribution across the genome, 52.18 K SNPs were selected in the final array. It consists of: (i) 19.22 K SNPs from the genomes of yellow-feathered, cyan-shank partridge and white-feathered chickens; (ii) 5.98 K SNPs related to economic traits from the Illumina 60 K SNP Bead Chip, which were found as significant associated SNPs with 15 traits in a Beijing-You crossed Cobb F2 resource population by genome-wide association study analysis; (iii) 7.63 K SNPs from 861 candidate genes of economic traits; (iv) the 0.94 K SNPs related to residual feed intake; and (v) 18.41 K from chicken SNPdb. The polymorphisms of 9 extra local breeds and 3 commercial lines were examined with this array, and 40 K - 47 K SNPs were polymorphic (with minor allele frequency > 0.05) in those breeds. The MDS result showed that those breeds can be clearly distinguished by this newly developed genotyping array. Conclusions We successfully developed a 55K genotyping array by using SNPs segregated from typical local breeds and commercial lines. Compared to the existing Affy 600 K and Illumina 60 K arrays, there were 21,41 K new SNPs included on our Affy 55K array. The results of the 55K genotyping data can therefore be imputed to high-density SNPs genotyping data. The array offers a wide range of potential applications such as genomic selection breeding, GWAS of interested traits, and investigation of diversity of different chicken breeds. Electronic supplementary material The online version of this article (10.1186/s12864-019-5736-8) contains supplementary material, which is available to authorized users.


Background
With a total of 107 chicken breeds, China has one of the richest local breed resources [1]. This diverse chicken genetic resource is a vital part of the diversity of biological genetic resources around the world and provides excellent material for breeding new varieties or to genetically improve breed.
China is the second-largest broiler producer and consumer all over the world, which accounts for approximately 11% of the chicken production across the globe (FAOSTAT, 2017). In China, chicken is the second largest meat product after pork, making up to 17% of the total meat production. Chicken meat is mainly obtained from the introduced white feather broilers and domestic yellow-feathered meat-type chickens (meat-type local chicken breed, meat-type bred variety and a relevant strain containing the consanguinity of Chinese indigenous chicken), each accounting for half of the consumption. However, the current challenge is how to effectively protect and maintain the existing local varieties. On the other hand, if breeding efficiency is promoted, new chicken lines breeding would be accelerated. The genome-wide SNP chip, also known as SNP array, arranges up to 25 million of DNA marker flanks on glass or special silicon chip to form the SNP probe array. It functions by means of the reaction of base pairing between the chip fixed DNA marker flanks with the target genome, so as to accurately identify the genetic information.
The genotyping arrays have been developed for pig [2], cow [3], dairy cattle [4], sheep [5], salmon [6], and buffalo [7] et al. In chicken, the first 3 K genotyping array was developed in 2005 with 3072 SNPs [8]. After that, in 2008, Groenen et al. did develop a 60 K bead chip for chicken which evenly covered the whole genome [9]. To date, the only available commercial arrays for chicken is Chicken the Affy 600 K SNP Array (Axiom Genome-Wide Chicken Genotyping Array), which was developed by Kranis et al [10]. The other arrays are privately owned by commercial companies. The array supplied an important tool for the genetic diversity analysis, breeds relationship analysis, GWAS, quantitative character positioning analysis of QTL, selective evolution investigation, and Genomic Selection [11]. Up till now the most efficient ways for SNP genotyping, biodiversity measuring, QTL mapping and genomic selection is using SNP arrays. These applications provide improved technical support for the conservation of indigenous breeds and development of new genetic lines/breeds.
One pitfall of all current chicken SNP arrays is the bias towards western commercial lines. The current chicken arrays, however, lack the genomic variation information of Chinese indigenous breeds. Therefore, it is imperative to develop a new type of genome-wide SNP chip with moderate flux in the chicken breeding industry, and also contains the genetic variation information specific to Chinese indigenous breeds. Overlap with the current arrays of the different platforms (Axiom and Illumina) is essential to link the commercial SNP arrays.
Through whole genome re-sequencing of a variety of Chinese native breeds and commercial chicken lines, integrating SNPs associated with economic traits detected in a crossing breed (either indigenous and commercial), a new public available moderate density (55 K) chicken array (IASCHICK) has been developed.

Results
The SNPs selection was performed in four groups. The roadmap is shown in Fig. 1, and the establishment of the four groups are indicated in the following paragraphs.
Genome re-sequencing of chickens supplying the first SNP group Eight Chinese local chicken breeds or inbred lines were selected for whole genome sequencing. Each breed/line holds 3 pools of 16 individuals per library without individual barcodes ( Table 1). The data summary of each library is provided in the Additional file 1. The number of SNPs per breed/line varied from 7.09 million to 9.41 million SNPs. The average number of detected SNPs was 8.61 M in the local lines, and 7.73 M in the commercial broilers. The total number of SNPs detected overall 8 breeds/lines was 15.2 M. The SNPs with minor allele frequency (MAF) < 0.05 and with low ΔF were excluded for further steps. The 140 K SNPs, which allelic frequencies distinct to the control breeds, were subsequently used as the first group of candidate SNPs.   [14], and 7.24 K SNPs in 46 genes that exhibited possible influence on other chicken economic traits [15] were selected (Additional file 3). The SNPs located in the 5 Kb of either side of the genes' up-and down-stream were also considered.
According to the SNPs detected by the genome resequencing of the previously mentioned 8 breeds, 15.18 K candidate SNPs were selected from 118.47 K SNPs on all those genes, which had priority with mutations in exons, splicing regions, promotors, and the 3′ and 5′ untranslated regions (UTRs).
In addition, a batch of 798 SNPs from an unpublished capture sequencing of chicken chromosomes 11, 16, and 19 were included in the third candidate group (Additional file 4). The SNPs were significantly related to high IgY levels in Beijing-You and White Leghorn chickens.
There were 15.98 K candidate SNPs that were selected for the design of the final genotyping array.
The fourth group of candidate SNPs are derived from whole genome sequences of low-and high-RFI chickens Whole genome sequencing of low-and high-RFI chickens were performed to locate the genomic variants for RFI based on differences in allelic frequency between high-and low-RFI chickens as described in our previous study [16]. The selected 4.32 K SNPs (3.74 K RFI related SNP in Beijing-You chickens and 0.58 K RFI related SNPs in Cobb chickens) were used as the candidate SNPs for the design of the final genotyping array in the next step.
Designing the Affy 55K genotyping array Based on the above four groups of candidate SNPs, a custom-made algorithm was used to fix the final array. Finally, 52,184 SNPs were selected for the final array. The mean physical distance of SNPs in each involved chromosome shows in Table 2. The priority 1 SNPs (the SNPs in group 2, 3 and 4) and 25 INDELs were first placed on the final SNP panel. The next step was addition of the priority 2 SNPs (the SNPs in group 1). The remaining 18.41 K SNPs was selected for the blank windows in the whole chicken genome which the SNPs in the four groups cannot be covered.
The SNPs positions of 55K array were given in Additional file 5. The selected SNPs were derived from the following five groups ( The comparisons of the Affy 55K array with the existing chicken arrays (Affy 600 K array, and Illumina 60 K) All the SNPs of this 55K array, Affy 600 K array [10], and Illumina 60 K array [9] were mapped to the latest chicken genome (GRCg6a). The overlap of the 3 arrays is shown in Fig. 3. There are 6740 SNPs (13%) which overlap between the Affy 55K array and the Illumina 60 K array. When comparing to the Affy 600 K array, there are 24,227 SNPs that overlap between the 55K array which accounts for 46%. There were 21,412 new SNPs included in 55K array compared to the existing arrays.
An MDS analysis was performed using the genotyped data to investigate the ability of the 55K panel to detect population stratification in the validated samples. Figure 4 shows the relative coordinates of individuals when plotted using the two largest principal components. Individuals originating from the commercial broilers, Hubbard and Cobb tightly clustered. The Chinese indigenous meat-type breeds clustered together. The two Chinese indigenous egg-type breeds, Xianju and Bai'er, clustered together. The remaining local breeds (mainly   characteristic of meat-types) were located relatively close to each other compared to egg-type breeds and commercial broilers. The commercial layer White Leghorn chickens were placed relative far away from to the Chinese local breeds and commercial broilers in Fig. 4. The linkage disequilibrium (LD) in Jingxing-Huang chicken and Cobb paternal line chicken were calculated, respectively.

Discussion
The 52.18 K SNPs selection was performed in four groups and NCBI SNPdb using several criteria. The first    The Beijing-You chicken and Guangxi Sanhuang chicken are representative of Chinese yellow-feathered chickens, which possess excellent meat quality and flavor [17]. Two Cyan-shank partridge lines possess meat flavor and appearance that are usually chosen by consumers. The Jingxing-Huang line is widely used in local breeding programs because of its dwarfism, feed-saving and space-saving characteristics [18]. The Cobb paternal line is a type of fast-growing line. The Recessive White chicken is a fast-growing line, which is popular in Chinese breeding programs because it can improve the growth rate of commercial generations without changing the appearance of offspring, when crossed with local colorful breeds. The SNPs with high ΔF between the local breeds and the commercial lines were used to determine the polymorphisms that have a larger difference in allelic frequency between breeds/lines. The main aim of whole-genome sequencing of different chicken breeds is to detect SNPs, although the pooled sequencing might generate potential bias. The second, third and fourth groups are those potentially associated with economic traits, including 7.42 K SNPs associated with weight, carcass, immune and meat quality traits, 15.98 K SNPs for breast muscle, body fat, reproduction etc. Improving feed efficiency is an important goal in poultry industry to reduce costs. RFI was considered independent of body weight and weight gain, selection for RFI would improve the feed efficiency without changing the economic traits [19]. For special interests, 4.32 K SNPs were selected from a whole genomic sequencing research of low-and high-RFI Cobb and Beijing-You chickens. The strategy for the first selection of SNPs in candidate genes for the array is that these SNPs have a higher potential to be in LD to the causative SNPs for the target traits. Finally, the last 18.41 K SNPs were selected from chicken SNPs database to make all SNPs cover the whole genome evenly. The average distance among SNPs are 22 kb generally (Fig.  2). Due to specific selection on immune related genes, a high SNP density in chromosome 16 is observed. Based on the limited known information and substandard assembly (galGal-5.0) of the micro-chromosomes, we would not obtain SNPs of insufficient quality in the micro-chromosomes. In summary, the setting of the algorithm was to select SNPs with relevant function and even distribution across the genome in terms of physical distance and obtained the representation of SNPs in local or commercial breeds.
When comparing to the Affy 600 K array, there are 24,227 SNPs that do overlap with the 55K array which accounts for 46%. The reason for this high percentage of overlap is that the 18.41 K SNPs for filling the gaps in the whole chicken genome were selected from chicken SNPdb, and the probe validated SNPs hold a high priority. This result showed that there were 21,41 K new SNPs included on the Affy 55K array compared to the two existing arrays. The results indicate that imputation of the 55K genotyping data to the high-density SNPs genotyping data is possible. In the new 55K genotyping array, 69% of SNPs are within genes (non-intergenic variant), the proportion is higher than the proportion in the Affy 600 K array (54%), and lower than the proportion in Illumina 60 K array (86%).
To investigate the ability of our 55K panel to detect polymorphisms and population structure in local or commercial breeds/lines. Nine Chinese local breeds (Chahua, Dagu, Liyang, Luhua, Qingyuan, Silkie, Wenchang, Bai'er, and Xianju) and 3 commercial lines (Hubbard, Cobb, and White Leghorn) were tested. The average call rate for each breed ranged from 97.0% to 98.7%. Across all populations, 76.7% to 88.0% SNPs were polymorphic (Table 5), which indicates that the 55K genotyping array can be used to determine genetic variation both in various local Chinese breeds and in commercial meat-type and egg-type breeds.
According to the results of MDS analysis (Fig. 4), individuals originating from the commercial broilers, Hubbard and Cobb clustered together tightly and the two Chinese indigenous egg-type breeds, Xianju and Bai'er, clustered together. It might be due to the fact that the two breeds were selected in the same direction [1]. This result was supported by the previous study on the genetic diversity of Chinese domestic fowls by mtDNA analyzing: the inter-population net nucleotide divergence (Da) between Xianju and Bai'er was 1.006, which was lower than the Da (1.115) between Xianju and Dagu chickens [20]. The remaining local breeds (mainly characteristic of meat-types) were located relatively close to each other compared to egg-type breeds and commercial broilers. The commercial layer White Leghorn chickens were placed relative far away from to the Chinese local breeds and commercial broilers. The relative proximity of Chinese local meat-type chicken and Chinese egg-type chicken in the MDS plot might be due to their shared region and ancestry. Thus, the MDS results were in agreement with the existing knowledge of the lines/ breeds and were also in similar to the previous studies, which showed phylogenetic relationships among different chicken breeds [1].
The linkage disequilibrium (LD) in the Jingxing-Huang breeds and Cobb paternal line were calculated and compared. The mean LD level decay to around 0.22 in 40 Kb. This result is similar with the result of Fu et al. in 2015 [21]. The r 2 of LD in the Jingxing-Huang breed is larger than that in Cobb paternal line. The Jingxing-Huang is an inbred line has a relatively small effective population size whereas the Cobb paternal line is three to four times larger.
In China, indigenous yellow-feathered chickens are highly diverse (more than 100 local breeds and 70 crosses). The major obstacle in applying genomic selection for improvement of local breeds is the cost of genotyping array. The 55K array has a medium SNPs density, cost-efficient, and optimal for Chinese local breeds compared with the existing 600 K commercial array. Furthermore, the 55K genotyping array incorporated known SNPs loci that possess a high potential for association with economic traits and traits that are expensive and difficult to measure, which will be interesting for both GWAS and genomic selection (GS) projects.
With the rapid development of next-generation sequencing technologies and reduction of the costs, genotyping with re-sequencing (IBS) will be the focus of future research. In the current phase, however, the IBS system is more complex and not as solid as the SNP array. The array genotyped data can be easily analyzed and standardized according to constant array SNP positions. The batch effect can be excluded by different laboratories and companies.

Conclusions
In conclusion, we developed Affy 55K genotyping array that was designed to use SNPs that are segregated in Chines local chicken breeds and commercial lines/ breeds, and where large number of SNPs are associated with economic traits. Compared to the existing Affy 600 K and Illumina 60 K arrays, 21,41 K new SNPs were included in the 55K SNP array. The results from the our Affy 55K genotyping array can be imputed to the high-density SNPs genotyping data. This array offers wide range of potential applications, such as the evaluation of germplasm resources of chicken breeds, investigation of diversity of different chicken breeds, implementation of genome-wide association studies and genomic selection.

Animals
For whole genome sequencing, the 384 chickens were sampled from eight local breeds or inbred lines (Table  1) Whole genome re-sequencing Genomic DNA was isolated from blood samples by the phenol-chloroform method. Samples DNA quality were validated by gel electrophoresis and Nanophotometer. The individual DNA samples (48 from each breed/line) were pooled to construct three libraries, with each library containing 8 males and 8 females. The libraries were constructed using the Nextera DNA Library Preparation Kit (Illumina Inc., San Diego, CA) according to the manufacturer's standard protocol. All libraries were sequenced on the Illumina Hiseq2500 (2 × 125 bp).

Genome sequence alignment and detection of the first group of candidate SNPs
Reads were filtered for low quality (> 10 consecutive nucleotides with Phred scores < 10), adaptor sequences, and sequences without a quality control-passed paired read using NGSQC toolkit (v2.3.3) [22]. Each trimmed pool sequencing coverage are shown in Table S5. Filtered sequenced reads were mapped to the reference genome (Gallus_gallus_4.0) by BWA software (v0.7.10) [23]. PCR duplications were removed with -rmdup argument in Samtools (version 0.1.1.18) [24]. SNPs were identified and genotyped for each data set with mpileup function in Samtools, then called by VarScan [25]. Only those highly confident variants supported by both methods were kept for downstream analyses. The SNPs calling details parameter were described by Liu et al [16]. The SNPs with MAF < 0.05 and the INDELs in each breed/line were filtered by vcftools [26]. In Beijing-you chicken, Jingxing-Huang chicken, Sanhuang chicken, and the two lines of cyan-shank partridges minus the MAFs of Cobb paternal line, as well as the MAFs of Recessive White chicken, and the paternal and maternal generation of Cobb minus the MAFs of Beijing-You chicken, respectively. The SNPs with low ΔF were excluded. The value of ΔF was adjusted for 140 K SNPs reserved in local breeds and commercial lines to generate the first group of candidate SNPs. The threshold of △F in local breeds and commercial lines are 0.609 and 0.731, respectively. The SNPs acquired through genome re-sequencing of eight breeds/lines supplied the major data for the first group of SNPs in the array. SNPs specific for chromosome W were removed and were not considered in current designing. There are also 25 INDELs for special interest, which were defined as priory 1.

Selection of the second group of candidate SNPs based on GWAS analysis of 15 traits
The second group of candidate SNPs was selected according to a GWAS analysis of 15 traits. Phenotype and genotype data were generated from the CAAS chicken F2 resource population as described in Sun's report [27]. Briefly, the population was derived from a cross between local Beijing-You chickens and commercial Cobb broilers (Cobb-Vantress, Inc.). The weight, carcass, immune and meat quality traits were measured from 367 F2 chickens. The 15 traits were as follows, (a.) body weight of day 28 and day 42, (b.) carcass traits including total weight percentage after slaughtering, breast muscle weight percentage, leg muscle weight percentage, abdominal fat percentage, (c.) meat quality traits including the breast muscle intramuscular fat ratio, ultimate pH (24 h), meat lightness, redness value and yellowness value of breast muscle, (d.) immune traits including IgY level to sheep red blood cell, the heterophil and lymphocyte ratio, IgY level in serum, and the average red blood cell backlog.
SNPs were genotyped by using Illumina 60 K SNP Bead chip for chicken [9]. All description of the phenotypes had been reported by Sun et al. in 2013 [27]. To maximize the polymorphism resources for SNP array, the GLM procedures were used for the GWAS analysis and was performed by PLINK software (version 1.07) [28] with 42,585 SNPs passed quality control. The details were described by Sun et al. [27]. The SNPs with top 1% lowest p-values were used in the following procedures.
Selection of the third group of candidate SNP based on the associated genes for target traits Known candidate genes for economic traits were collected and used for the SNP array design. All genes were identified through previous researches by our group [12,13,29,30]. We retrieved total 861 genes related to skeletal muscle and intramuscular fat development, chicken fat metabolism, salmonella enteritidis resistance etc. (Additional file 2). The SNPs were annotated by the Ensembl tool VEP [31]. Mutations and the SNPs in the exons, splicing region, and UTRs were firstly selected out. A maximum of 5 candidate SNPs were selected out for each gene.
In addition, the SNPs in this group also included a batch SNPs detected from a set of capture sequencing of Chr. 11, Chr. 16, and Chr. 19 of White Leghorns and Beijing-You chickens with low or high serum IgY (Liu et al., unpublished, Supplement Table S3).

Selection of the fourth group of candidate SNPs for RFI
The fourth group candidate SNPs were selected from a whole genomic re-sequencing research of low-and high-RFI Cobb and Beijing-You chickens. SNPs calling results showed that 8,505,214 and 8,479,041 single nucleotide polymorphisms (SNPs) were detected in low-and high-RFI Beijing-You chickens, respectively; 8,352,008 and 8,372,769 SNPs were detected in low-and high-RFI Cobb chickens, respectively. The SNPs with Fst value < 5% in each breed were excluded followed by SNPs with mean ΔF < 0.35 between low-and high-RFI chickens. Through the above filtering processes, 3.74 K SNPs assigned to 1137 candidate genes in Beijing-You chickens and 0.58 K SNPs (448 genes) in Cobb chickens were reserved [16].

Selection of the SNPs from chicken SNPs database
The first four groups cannot cover the whole genome evenly. In the fifth group, SNPs were selected from chicken SNPs database from NCBI (ftp://ftp.ncbi.nih. gov/snp/organisms/archive/chicken_9031/).

SNP screening according to the scoring of probes
All the SNPs' positions were transformed from WASHUC2.1 (Illumina 60 K), and Gallus_gallus-4.0 (Affy 600 K) to Gallus_gallus-5.0 (Affy 55 K) by the Lift-Over tool on UCSC Genome Browser. Take utility of all SNPs from the five candidate groups above, in silico validation, was performed using the AxiomGTv1 algorithm of APT, which generated an output score file containing p-convert values, signifying the SNP array quality and list of recommended and non-recommended SNP probes. For a high-quality SNP array design, non-recommended SNP probes were all excluded in the following procedure.

SNPs selection procedure for the final 55K array
The final SNPs selection was done in multiple steps using several criteria. The roadmap is shown in Fig. 1.
A custom-made algorithm was applied as described below. According to the Gallus_gallus-5.0, the chicken genome length is about 1.2 Gb. To ensure the probe position evenly distributed in the chicken genome, the whole genome was distributed by windows with 22 Kb length. The backward window started from the probe position of the forward probe position. The selection of the final array was performed on each chromosome separately. The first four groups SNPs were divided as 2 priorities. The SNPs in group 2, group 3, group 4, and the INDELs in group 1 were defined as priority 1, and the SNPs in group 1 were defined as priority 2.
1. a) The selection of the SNPs in priority 1. If there is no SNP in a 22 kb window, the window will be reserved. b) If there are one or two SNPs in the window, the SNP(s) was reserved. c) If there are 3 or more SNPs in a window, only 2 SNPs in this window will be reserved, which can make the SNPs even distributed in this window according to the following formula. . In the formula above, the S and E are the start position and the end position of the window respectively; and N i and N j are the target SNPs position in the window. The SNPs N i and N j which can minimum the SD 2 , will be reserved. 2. The selection of priority 2 SNPs. The windows reserved 1 or 2 SNPs will be skipped. The windows without SNP will be filled by one SNP of priority 2 according to the formula described above. 3. The windows without any SNP will be filled by 1 SNP from the NCBI SNPdb of chicken, while the validated SNPs will have a priority for filling.
The final array contains 55K probes for 52 K SNPs, which were manufactured by Affymetrix® using photolithography. The redundant probes are used for interrogating each SNPs [32,33]. The final 52 K SNPs were annotated by the online tool Ensembl VEP [34].
The comparisons of the 55K Affy array with the existing arrays (Affy 600 K array, and Illumina 60 K) All the SNPs' positions were transformed from WASHUC2.1 (Illumina 60 K), Gallus_gallus-4.0 (Affy 600 K) and Gallus_gallus-5.0 (Affy 55 K) to GRCg6a by the LiftOver tool on UCSC Genome Browser. All the SNP positions of the three genotyping arrays were compared. The SNPs on 600 K array and 60 K array were also performed by Ensembl VEP [31]. Overlapping Venn plot was performed by the Calculate and draw custom Venn diagrams website (http://bioinformatics.psb.ugent. be/webtools/Venn/).
Basic genotype statistics for each marker, including call rate, MAF, Hardy-Weinberg Equilibrium (HWE), allele and genotype counts were calculated using the Quality Assurance Module from the SNP Variation Suite version 7 (SVS; Golden Helix Inc., Bozeman, Montana: www. goldenhelix.com). The following quality control criteria (filtering) were used to remove SNPs with less than 95% call rate for further analysis. The SNPs with less than 0.05 MAF. SNPs were tested for HWE (P < 0.001) to identify possible typing error. Samples with more than 10% missing genotypes were removed from the study.
The MDS was performed using the genotype data of the SNPs from the 55K panel on all the breeds samples (n = 226) to assess the utility of the panel in detecting population structure. Population structure between 12 breeds was carried out using PLINK software (version 1.90b3) [28] with the MDS method on, and the plot was performed by ggplot2 [35]. The linkage disequilibrium in 2 populations were performed by the GAPIT [36]. The LD decay plot performed by PopLDdecay software are presented as whole genome levels and as chromosome levels with the parameter of smaller break point size of 5 Kb and bigger break point size of 40 Kb [37].