In this study, we performed whole-genome sequencing for SNP discovery and used the identified SNPs to characterize genetic diversity in the turkey genome. To avoid imputation of genotype calls across the different populations, mpileup was applied within each population separately, because mpileup relies in part on Hardy-Weinberg equilibrium (HWE) for imputation of genotypes.
By using an NGS (Illumina GAIIx) approach, we discovered millions of high-quality SNPs in the turkey. Next-generation sequencing approaches are considered highly reliable for genome-wide discovery of sequence variation when used to compare different lines/strains to a reference genome. The adoption of NGS platforms for the discovery of genomic variation has now become mainstream.
The high quality of the SNP discovery reported here is reflected by the low FDR of 0.00002 per nucleotide in the genome. This FDR suggests around 2.1 × 10⁴ falsely discovered heterozygous positions per turkey genome (size of 1.1 × 10⁹ base pairs). The SNP FDR for the same 10 animals from distinct turkey populations was estimated after correcting for coverage and using the estimates of FDR per nucleotide position. The SNP FDR was found to be 2.6%, a number similar in magnitude to that found previously in the human 1000 Genomes Project. In addition to the low FDR, we found a transition/transversion (Ti/Tv) ratio within the expected range. The expected Ti/Tv ratio of true novel variants can vary with the targeted region (whole genome, exome, specific genes) and species, and can also vary greatly with the CpG and GC content of the region [59–61]. In the case of exomes, an increased presence of methylated cytosine in CpG dinucleotides in exonic regions leads to an increased Ti/Tv ratio, because methylated cytosine is readily deaminated to thymine, which is a transition. GC content has also been observed to be higher in birds and mammals than in invertebrates. The Ti/Tv ratio observed in our study of turkey is in concordance with the findings of Dalloul et al., but slightly higher (2.45) than that of human. This higher ratio is most likely explained by the smaller genome size and the higher GC percentage of bird genomes.
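As an illustration of the Ti/Tv statistic discussed above, the following minimal sketch counts transitions (A<->G, C<->T) and transversions (all other single-base changes) in a list of biallelic SNPs. Of the 12 possible directed single-base substitutions, 4 are transitions and 8 are transversions, so a ratio of 0.5 would be expected if all substitutions were equally likely. The function names and the `(ref, alt)` input format are assumptions made for this example and are not taken from the pipeline used in the study.

```python
from typing import Iterable, Tuple

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")
BASES = PURINES | PYRIMIDINES


def is_transition(ref: str, alt: str) -> bool:
    """True if both alleles are purines or both are pyrimidines.

    The caller is expected to pass two different bases.
    """
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def ti_tv_ratio(snps: Iterable[Tuple[str, str]]) -> float:
    """Ti/Tv ratio for (ref, alt) pairs (hypothetical input format)."""
    ti = tv = 0
    for ref, alt in snps:
        ref, alt = ref.upper(), alt.upper()
        # Skip anything that is not a simple A/C/G/T substitution.
        if ref == alt or ref not in BASES or alt not in BASES:
            continue
        if is_transition(ref, alt):
            ti += 1
        else:
            tv += 1
    if tv == 0:
        raise ValueError("no transversions observed; ratio is undefined")
    return ti / tv


# Toy data: 3 transitions and 2 transversions.
example = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("T", "G")]
print(ti_tv_ratio(example))  # 1.5
```

In practice the `(ref, alt)` pairs would be read from the variant calls of each population, and multi-allelic sites or indels would need to be handled explicitly rather than skipped.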
We report the number of segregating SNPs as well as the total number of SNPs, together with their functional annotation. The 23,795 nonsynonymous variants that were observed can potentially change the structure of proteins, possibly resulting in altered phenotypes. Of these nonsynonymous SNPs, 9,204 were unique to the commercial population, which may reflect the higher coverage and larger number of individuals sequenced for the commercial turkey population. We observed 5,417,069 SNPs in non-protein-coding DNA. Furthermore, we discovered 1,749,427 intronic variants, some of which may alter gene expression or result in alternative splicing [64, 65]. Variants located in intergenic regions, such as promoter, enhancer and silencer regions, can result in altered gene expression. The human genome comprises over 98% non-protein-coding DNA. Estimates suggest that at least 5.5% of the human genome, including 3.5% of its noncoding fraction, consists of regions under purifying natural selection against deleterious alleles [67–69]. In addition, most of the variants involved in complex genetic diseases in humans are not located in coding regions. Likewise, variation outside coding regions may be responsible for economically important traits in domesticated species, e.g. disease resistance, meat quality, efficient growth, or high egg production. Functional information on these variants can help predict phenotypes or genetic merit with higher accuracy, so that individuals can be selected accordingly.
The estimated average frequency of 1.07 heterozygous SNPs kb⁻¹ in the turkey is substantially lower than in chicken, for which 4.28 and 2.24 heterozygous SNPs kb⁻¹ were previously reported in two different studies [28, 29]. In our study, heterozygous SNP discovery was affected by sequence coverage (e.g. sequence coverage in L6a, Nset1 and the SM animals was low and, as a result, the number of observed heterozygous SNPs was also low). Estimates of heterozygosity were therefore obtained only from genomic regions covered at 5 to 10X, to adjust for the effect of low sequence coverage.
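The coverage restriction described above can be sketched as follows: only sites whose depth falls within the 5 to 10X window count towards the denominator, and only heterozygous calls at those sites count towards the numerator. This is a minimal sketch under stated assumptions; the function name, the per-site depth dictionary and the set of heterozygous positions are hypothetical inputs chosen for illustration, not the data structures used in the study.

```python
from typing import Dict, Iterable


def het_snps_per_kb(
    depth_by_pos: Dict[int, int],
    het_positions: Iterable[int],
    min_depth: int = 5,
    max_depth: int = 10,
) -> float:
    """Heterozygous SNPs per kb, restricted to sites with min_depth..max_depth coverage."""
    # Sites that fall inside the accepted coverage window.
    callable_sites = {
        pos for pos, depth in depth_by_pos.items()
        if min_depth <= depth <= max_depth
    }
    if not callable_sites:
        raise ValueError("no sites within the requested coverage window")
    # Count heterozygous calls only where coverage is acceptable.
    n_het = sum(1 for pos in set(het_positions) if pos in callable_sites)
    return 1000.0 * n_het / len(callable_sites)


# Toy data: 2,000 sites at 7X and 1,000 sites at 3X (the latter are excluded).
depth = {pos: 7 for pos in range(2000)}
depth.update({pos: 3 for pos in range(2000, 3000)})
hets = {10, 500, 2500}  # position 2500 lies in the low-coverage block
print(het_snps_per_kb(depth, hets))  # 1.0
```

Restricting both the numerator and the denominator to the same coverage window is what keeps animals sequenced at different depths comparable.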
Modern commercial turkey lines are derived from historic turkey populations that displayed low variation as a result of small effective population size [70, 71]. The heritage (Nset and RP) and wild SM turkey populations showed higher heterozygosity than the commercial populations, which is concordant with the findings of previous studies on ancient and overexploited species [72–74]. The heritage variety BvSW showed the lowest heterozygosity of all turkey populations, which is consistent with the severe bottleneck that this population went through in 2000 (Alexandra Scupham, personal communication).
Most birds have a characteristic division in chromosome size, with 5 or 6 large chromosomes, around 5 intermediate-size chromosomes, and 25 to 30 very small chromosome pairs. In our study, we observed higher nucleotide diversity on the smaller chromosomes than on the larger turkey chromosomes, which is in agreement with a previous study. The recombination rate is far higher on the smaller turkey chromosomes than on the large chromosomes, which leads to lower linkage disequilibrium and higher haplotype diversity on the smaller chromosomes. Although the high gene density of the smaller chromosomes would make them susceptible to hitchhiking effects that could erode genetic variation, these effects appear to be offset by the far higher recombination rate of the micro-chromosomes. Chromosome Z showed the lowest nucleotide diversity, which is concordant with the findings of Dalloul et al. This low nucleotide diversity of chromosome Z is likely the result of the lower effective population size of this chromosome and its lower recombination rate.
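For readers less familiar with the statistic, per-site nucleotide diversity (the average number of pairwise differences per site) can be computed from alternate-allele counts at biallelic SNPs, as in the minimal sketch below. The function name and inputs are hypothetical, and the estimator shown (with the n/(n-1) sample-size correction) is a standard textbook form that is not necessarily the exact procedure used in this study.

```python
from typing import Sequence


def nucleotide_diversity(
    alt_counts: Sequence[int],
    n_chromosomes: int,
    n_sites: int,
) -> float:
    """Per-site nucleotide diversity from biallelic SNP allele counts.

    alt_counts    -- alternate-allele count at each SNP in the region
    n_chromosomes -- number of sampled chromosomes (2 x diploid individuals)
    n_sites       -- total number of sites in the region, including invariant ones
    """
    if n_chromosomes < 2:
        raise ValueError("at least two chromosomes are required")
    if n_sites <= 0:
        raise ValueError("n_sites must be positive")
    n = n_chromosomes
    # Sum of 2pq over SNPs; invariant sites contribute zero.
    total = sum(2.0 * (k / n) * (1.0 - k / n) for k in alt_counts)
    # n / (n - 1) makes the result equal to the mean pairwise difference.
    return (n / (n - 1)) * total / n_sites


# Toy check: 4 chromosomes, one SNP with 2 alternate alleles, region of 1 site.
# 4 of the 6 chromosome pairs differ at this site, so diversity is 4/6.
print(nucleotide_diversity([2], n_chromosomes=4, n_sites=1))
```

Computing this value per chromosome, with `n_sites` set to the number of callable sites on that chromosome, would allow the kind of macro- versus micro-chromosome comparison discussed above.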
The presence of different allelic states in the wild SM and the domesticated populations demonstrates their divergence during the course of domestication. Domesticated turkey lines were selected (artificially or naturally) for non-wild-type alleles. Domestication has involved selection on one or more desired traits, and previous studies on domesticated animals have demonstrated selective pressures on genes related to growth and coat colour [80, 81]. Such studies have also demonstrated that artificial selection might have contributed to reduced polymorphism levels and increased LD in domesticated species [10, 82–84]. Ongoing directional selection leaves footprints of selection, identifiable as regions where the derived allele frequency is higher than in non-selected regions [29, 85, 86]. Most of the turkey chromosomes are acrocentric, and the five genomic regions that were found to be fixed for the reference alleles within the domesticated populations seem to be located close to the centromere. This may explain the presence of a strong hitchhiking effect, owing to the low recombination rate close to the centromeres. These fixed turkey genomic regions were then investigated for the presence of reported QTLs. While no QTLs were found within the fixed regions, there were QTLs for growth and meat quality on chromosome 3, a QTL for percentage drip loss on chromosome 14 and a growth-related QTL on chromosome 22. These QTLs for different traits on chromosomes 3, 14 and 22 were located at distinct positions that did not coincide with the observed regions of high reference allele frequency. Given the evidence for structural and functional conservation between the turkey and chicken genomes [76, 88], and the limited availability of information on turkey QTLs, the 5 turkey genomic regions that were found to be fixed for reference alleles within the domesticated populations were aligned with the chicken genome sequence (WASHUC2) to determine their positions within the chicken genome (Additional file 1). Regions of the chicken genome exhibiting synteny with turkey were then examined for the presence of known chicken QTLs. Several QTLs were identified within these 5 genomic regions (Additional file 1) and most were related to growth traits (Additional file 1). Production census data for turkeys from the last few decades show that turkeys are highly selected for growth, and this high selection pressure might have favoured reference alleles in the domesticated populations. Since several of the regions identified in this study are probably close to a centromere, the effect of selection may have extended over a larger region because of the likely reduced recombination rate in centromeric parts of the genome.
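One simple way to look for regions fixed for the reference allele, as discussed above, is to bin SNPs into fixed-size windows and flag windows in which every SNP has a reference allele frequency of 1 in the domesticated populations. The sketch below is illustrative only: the window size, the minimum SNP count, the function name and the input format are assumptions made for this example and do not describe the procedure actually used in the study.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def fixed_reference_windows(
    snps: Iterable[Tuple[int, float]],
    window_size: int = 100_000,  # assumed window size, for illustration
    min_snps: int = 5,           # assumed minimum SNPs per window
    threshold: float = 1.0,      # reference allele frequency counted as fixed
) -> List[Tuple[int, int]]:
    """Return (start, end) of windows where all SNPs reach the threshold.

    snps -- (position, reference_allele_frequency) pairs for one chromosome
    """
    by_window: Dict[int, List[float]] = defaultdict(list)
    for pos, ref_freq in snps:
        by_window[pos // window_size].append(ref_freq)

    hits = []
    for idx in sorted(by_window):
        freqs = by_window[idx]
        # Require enough SNPs, all at or above the fixation threshold.
        if len(freqs) >= min_snps and min(freqs) >= threshold:
            hits.append((idx * window_size, (idx + 1) * window_size))
    return hits


# Toy data: only the first window has enough SNPs that are all fixed.
toy = [(10, 1.0), (20, 1.0), (30, 1.0),
       (110, 1.0), (120, 0.5), (130, 1.0),
       (210, 1.0)]
print(fixed_reference_windows(toy, window_size=100, min_snps=3))  # [(0, 100)]
```

Candidate windows found this way could then be intersected with QTL coordinates, after mapping to the chicken genome where turkey QTL information is lacking.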
The genetic diversity analysis among the 11 different turkey lines showed that the heritage varieties and the commercial populations are derived from the wild South Mexican population. All of the heritage varieties (BvSW, RP and Nset) are closely related, which is in agreement with previously published data [43, 44]. The relatedness of these heritage varieties can probably be explained by their historic nature, a common origin, selection for similar traits/phenotypes, or a relatively low selection pressure in these varieties. The Nset, RP and BvSW heritage lines were developed in America in 1800, 1920 and 1930, respectively [70, 71]. It is assumed that the colour pattern of RP is derived from crossbreeding with Narragansett and perhaps another variety, as the Nset colour mutation is a component of the final RP colour (Smith et al., 2005). The close genetic relatedness observed between RP and Nset in our study is concordant with that assumption and with previous studies [43, 44]. According to Figure 2, commercial lines from different breeding companies did not resolve into two separate groups. The close relatedness of the L5 commercial line to the heritage lines is not surprising, as L5 is a female line selected for medium weight, conformation and egg production, traits that are characteristic of the heritage lines. The other commercial lines, which cluster separately from L5 in the dendrogram, were selected for different objectives such as higher body weight and rapid growth.