Effect of sample stratification on dairy GWAS results

Background Artificial insemination and genetic selection are major factors contributing to population stratification in dairy cattle. In this study, we analyzed the effect of sample stratification and the effect of stratification correction on results of a dairy genome-wide association study (GWAS). Three methods for stratification correction were used: the efficient mixed-model association expedited (EMMAX) method accounting for correlation among all individuals, a generalized least squares (GLS) method based on half-sib intraclass correlation, and a principal component analysis (PCA) approach. Results Historical pedigree data revealed that the 1,654 contemporary cows in the GWAS were all related when traced through approximately 10–15 generations of ancestors. Genome and phenotype stratifications had a striking overlap with the half-sib structure. A large elite half-sib family of cows contributed to the detection of favorable alleles that had low frequencies in the general population and high frequencies in the elite cows and contributed to the detection of X chromosome effects. All three methods for stratification correction reduced the number of significant effects. EMMAX method had the most severe reduction in the number of significant effects, and the PCA method using 20 principal components and GLS had similar significance levels. Removal of the elite cows from the analysis without using stratification correction removed many effects that were also removed by the three methods for stratification correction, indicating that stratification correction could have removed some true effects due to the elite cows. SNP effects with good consensus between different methods and effect size distributions from USDA’s Holstein genomic evaluation included the DGAT1-NIBP region of BTA14 for production traits, a SNP 45kb upstream from PIGY on BTA6 and two SNPs in NIBP on BTA14 for protein percentage. However, most of these consensus effects had similar frequencies in the elite and average cows. Conclusions Genetic selection and extensive use of artificial insemination contributed to overlapped genome, pedigree and phenotype stratifications. The presence of an elite cluster of cows was related to the detection of rare favorable alleles that had high frequencies in the elite cluster and low frequencies in the remaining cows. Methods for stratification correction could have removed some true effects associated with genetic selection.


Background
Genome-wide association study (GWAS) is a powerful tool for identifying genetic factors associated with phenotypes. Population stratification refers to systematic differences in allele frequencies between subpopulations and is a source for false positive results in GWAS [1][2][3][4]. In human populations, geographical separation followed by genetic drift is the basic cause of population stratification.
In dairy cattle, population stratification could be a result of several causes, genetic selection and hitchhiking, artificial selection, and genetic sampling or drift. In U.S. dairy cattle breeding, artificial insemination has been widely used and this has increased the likelihood of the presence of related individuals in randomly selected samples and the presence of large half-sib families; both of which contribute to population stratification. Many years of genetic selection in dairy cattle has caused substantial allele frequency changes [5] that also contribute to population stratification. Methods proposed to address population or sample stratification in GWAS include the genomic control approach [1], principal component analysis (PCA) [2], and a mixed model approach [3,4]. Our dairy GWAS report identified highly significant SNP effects with minor favorable allele frequencies and a large number of X chromosome effects [6]. This article is a follow-up study of our GWAS report to analyze potential sample stratification of the study population, the relationship between the significant effects and sample stratification, and the effect of sample stratification correction on our reported GWAS results. Effect of stratification correction was evaluated with three methods: the efficient mixed-model association expedited (EMMAX) method which accounted for correlation among all individuals [3], a generalized least squares (GLS) method based on half-sib intraclass correlation [7,8], and a PCA method [2] that used the top 20 principal components as covariables.

Overlap between genome, pedigree and phenotype stratifications
The genome stratification based on 45,878 SNP markers was shown using the multidimensional scaling (MDS) method [9] for two sets of data: the association data set of 1,654 contemporary Holstein cows used in the GWAS by Cole et al. [6], and the selection signature data set [5] that included the 1,654 contemporary cows and historical U.S. Holstein cattle for a total of 2,366 cows and bulls. The majority of the historical cattle (301 cows and bulls) were the University of Minnesota Holstein control line that remained unselected since 1964 [5].
The association data of 1,654 cows showed some genome stratification, with a large cluster on the left and a small cluster on the right of Figure 1A. The X chromosome is the only individual chromosome with a similar stratification pattern ( Figure 1B) as the stratification for all chromosomes ( Figure 1A), and no individual autosome had such a similar pattern (Additional file 1: Figure S1).
The selection signature data set of 2,366 contemporary and historical cattle showed more stratification with three clusters ( Figure 1C). The third cluster on the far left of Figure 1C had only eight cows, which were not separated as a cluster by Figure 1A. The X chromosome stratification pattern in Figure 1D did not resemble the stratification pattern of all chromosomes shown in Figure 1C but showed more stratification than any individual autosomes (Additional file 1: Figure S1).
Despite the fact that the cow DNA samples in our GWAS were from diverse academic and industry donors, analysis of USDA's historical Holstein pedigree data revealed that the 1,654 cows were all related when traced through approximately 10 to 15 generations (back to the 1930's) of ancestors (Additional file 2: Figure S2). Of the 1,654 cows, 1,600 cows were related to a bull born in 1947, and 1382 cows were related to another bull also born in 1947. These two bulls had 1360 common descendents among the 1,654 cows, and only 32 of the 1,654 cows were not descendents of these two bulls.
Although all 1,654 cows were related, the genome stratification patterns shown in Figure 1 had a striking overlap with the half-sib pedigree structure from 355 sires (see Figure 2 for four examples of sire families and Additional file 3: Figure S3 for 18 sire families). The number of cows per half-sib family (per sire) ranged from 1 to 160. The upper-right cluster of Figure 1C had 160 cows, of which 153 cows were half-sibs from one sire (family 228 in Figure 2) that had 160 daughters among the 1,654 cows. The other seven daughters in family 228 were classified into the left two clusters of Figure 1C (family 228 in Figure 2). The genome from this sire, including autosomes and the X chromosome, is responsible for the stratification of the upper-right cluster of Figure 1C (or the right cluster of Figure 1A). The 153 half-sib cows should carry the same sire X chromosome (except the pseudo-autosomal region), and this should be the reason for the strong X stratification pattern shown in Figure 1B. In contrast, the 153 half-sib cows should have different sire autosomes due to recombination between each pair of sire autosomes. Collectively, the 29 autosomes had nearly identical stratification pattern as in Figure 1A (data not shown).
The genome and pedigree stratifications had various degrees of overlap with the phenotypic stratification for many of the 31 traits (see Figure 3 for the examples of four traits and Additional file 4: Figure S4 for all 31 traits). Fat, protein and milk yields had the strongest overlap between genome and phenotype stratifications. The genome stratification pattern using the data set of 2,366 contemporary and historical cattle shown in Figure 1C had a stronger overlap with the phenotype stratification than the genome stratification pattern using the 1,654 contemporary cows shown in Figure 1A. Dimension I (C1) of Figure 1C is strongly related to milk, fat and protein yields, with lowproducing cows on the left and high-producing cows on the right. Fitting C1 calculated from the 2,366 historical and contemporary cattle in the statistical model virtually removed all significant effects for milk, fat and protein yields detected by the same method without C1 in the model, whereas the C1 values calculated from the 1,654 contemporary cows essentially had no effect on SNP effects (data not shown).
The 160 cows that comprised the small cluster in the upper right of Figure 1C (or the small cluster on the right of Figure 1A) were primarily from one sire family (family 228 in Figure 2A) of elite cows. This cluster had a high frequency of cows with high PTA values for fat, protein and milk yields, fat and protein percentages, productive life, daughter pregnancy rate, net merit, strength, stature, body depth, rump angle, fore udder attachment, udder depth, foot angle, rear legs (rear view), feet-legs score, final score, and had low PTA values for somatic cell score, service-sire calving ease, daughter calving ease, service-sire stillbirth, dairy form, udder cleft, teat length, front teat placement, and rear legs (side view) (Additional file 4: Figure S4). This cluster had the most obvious overlap between the genome and phenotype stratifications for many of the 31 traits. The presence of the 160 elite cows in this small cluster and the presence of the 153 elite cows from one sire did not result in deviations from normal phenotypic distributions (Figure 4). The X chromosome's genetic stratification of the 1,654 contemporary cows ( Figures 1B and 1D) had a similar overlap with phenotypic stratification as all chromosomes discussed above. In contrast, individual autosomes did not show such a strong overlap (a comparison between chromosome 1 and the X chromosome is given in Additional file 5: Figure S5). The overlap between the X chromosome stratification and the phenotypic stratification could be a major reason why many X chromosome effects were detected in Cole et al. [6]. This overlap    should be an opportunity for detecting potential X chromosome SNP effects associated with the elite cows, although it is unclear whether this overlap contributed to detecting true or spurious X chromosome SNP effects.
Stratification analysis showed that the genome, pedigree and phenotype stratifications of the 1,654 contemporary Holstein cows used in the GWAS [6] overlapped. The implication of this overlap is that the removal of population stratification effects could remove some true SNP effects confounded with the population stratification that likely was a result of genetic selection, drift, and genetic hitchhiking of selection.

Association results from three methods for stratification correction
The three methods for stratification correction included a mixed model method implemented by EMMAX [3] using IBS (EMMAX-IBS; IBS = identify by state) or the Balding-Nichols (BN) kinship matrix [10] (EMMAX-BN) among all 1,654 cows, a GLS method using intraclass correlation among half sibs [7], and a PCA approach [2] using the top 20 principal components as covariables. The EMMAX method (including EMMAX-IBS and EMMAX-BN) was most severe in removing stratification effects or in reducing the number of significant effects, the PCA method using 20 Figure 2 Examples of overlap between genome stratification and half-sib family structure. C1 = dimension 1, C2 = dimension 2. C1 and C2 values were calculated using 2,366 Holstein cattle, including the University of Minnesota Holstein control line that remained unselected since 1964. Graphs for more selected families are given in Additional file 3: Figure S3. components and GLS method had similar significance levels for most traits. For 31 dairy traits, EMMAX-IBS had 15, and EMMAX-BN had six SNP effects reaching genome-wide significance, in addition to a cluster of SNP effects for fat percentage in and near the DGAT1-NIBP region of BTA14 identified by all three methods. The 15 significant effects of EMMAX-IBS (other than those for fat percentage) included two effects for fat yield, two effects for protein percentage, three effects for service-sire calving ease, one effect for daughter calving ease, three effects for service-sire stillbirth, one effect for front teat placement, one effect for teat length, one effect for rear legs (rear view), and two effects for foot angle. GLS and PCA had more effects reaching genome-wide significance, generally dozens more per trait than EMMAX. A global view of all SNP effects for each trait from the three methods is presented in Manhattan plots [11,12] in Additional file 6: Figure S6. The top 100 effects of each trait from EMMAX-IBS and EMMAX-BN are given in Additional file 7: Table S1, the top 100 effects of each trait from GLS are given in Additional file 8: Table S2, and the top 100 effects of each trait from PCA are given in Additional file 9: Table S3.
Effects from EMMAX, GLS, and PCA methods with genomewide significance The three methods had a small number of common effects reaching genome-wide significance. The common effects included a cluster of SNP effects for fat percentage in and near the DGAT1-NIBP region of BTA14, a SNP in DGAT1 and a SNP in A5D786 on BTA14 for fat yield (ranked #1 and #2 by all three methods), and a SNP 45kb upstream from PIGY on BTA6 with a 97% frequency of the favorable allele for protein percentage (#1 by EMMAX, GLS and PCA), and a SNP in NIBP on BTA14 for protein percentage (ranked #1 by EMMAX and PCA, and #16 by GLS).
All three methods identified the DGAT1-NIBP region as the most significant region for fat percentage and eliminated most significant effects of other regions detected by the least squares (LS) method without stratification correction ( Figure 5; Additional file 6: Figure S6), and identified a SNP in DGAT1 as the most significant and two NIBP markers among the top 15 effects for fat percentage (Additional file 7, Additional file 8, Additional file 9: Tables S1, S2, S3). EMMAX (EMMAX-IBS and EMMAX-BN) identified the DGAT1-NIBP region as the only significant region for fat percentage, while GLS and PCA also had SNP effects on other chromosomes that reached genome-wide significance but were far less significant than the DGAT1-NIBP region ( Figure 5). All three methods for stratification correction identified DGAT1 as the most significant for fat yield, and GLS also identified the A5D786-CYHR1-VPS28-DGAT1 region as the most significant for milk yield.
The three methods for stratification correction had no overlapping effects with genome-wide significance for protein yield by any two of the three methods, but EMMAX and GLS identified a 0.5 Mb region upstream of PLCB1 as highly significant for protein yield (#1 by EMMAX and #3 by GLS at 739,720 bp), with the EMMAX effect below the genome-wide significance level. EMMAX-IBS had three effects for service-sire calving ease and EMMAX-BN had a SNP in MBTPS2 of BTAX for fat yield reaching genome-wide significance, but none of these was among the top 100 effects by the other methods. For body conformation traits, EMMAX-IBS had five SNP effects with genome-wide significance for front teat placement (#78 from GLS and #75 from PCA), teat length (#8 from GLS), foot angle (#24 from GLS), and rear legs (rear view). EMMAX-BN generally had similar results as EMMAX-IBS with slightly lower significance (higher p-values).
Common effects among top 100 effects/trait from methods with and without stratification correction The two EMMAX models had only a small number of SNP effects reaching genome-wide significance so that significant results of this method had minimal overlap with results from the other methods. Since the exact threshold p-value that should be used for declaring significance for various methods under various conditions is unknown, the top 100 effects of each trait from the three methods with stratification correction (EMMAX-IBS, GLS and PCA) were compared with those from the LS method without stratification correction [6] (Table 1, Additional file 10: Table S4). EMMAX-IBS had 270 SNP effects, GLS had 364 and PCA had 95 effects overlapping with results from the LS method without stratification correction [6] ( Table 1). The four methods with and without stratification correction had limited common effects among the top 100 effects per trait, a total of 41 common effects for 13 of the 31 traits (Table 1).
For milk yield, Hapmap42685-BTA-81134 of BTA8 about 87 kb downstream of ANXA1 was ranked #1 by PCA, #4 by EMMAX-BN, #5 by EMMAX-IBS, #12 by GLS, and #19 by LS. The favorable allele frequency was 0.92 in the elite cluster of 160 cows (to be referred to as 'elite cluster' , the upper-right cluster of Figure 1C) and was 0.83 in the remaining 1,494 cows in the left two clusters of Figure 1C (to be referred to as 'average cluster'). For protein percentage, ARS-BFGL-NGS-56327 in NIBP was #2 by EMMAX-IBS and EMMAX-BN, #3 by GLS, #7 by PCA, and #19 by LS. The favorable allele frequency was 0.65 in the elite cluster and 0.29 in the average cluster. For daughter pregnancy rate, the four methods had two common SNP effects on the X chromosome. Hapmap55242-rs29011414 was #1 by EMMAX-IBS, #2 by EMMAX-BN and GLS, #31 by LS and #37 by PCA, and ARS-BFGL-NGS-94205 in GRIA3 was #2 by EMMAX-IBS, #11 by EMMAX-BN, #7 by LS, #11 by GLS, and #40 by PCA. The favorable allele frequency of Hapmap55242-rs29011414 was 0.88 in the elite cluster and 0.74 in the average cluster, and the favorable allele frequency of ARS-BFGL-NGS-94205 was 0.71 in the elite cluster and 0.44 in the average cluster. For service-sire stillbirth, the four methods had two common SNP effects on the X chromosome: Hapmap28373-BTA-160078 was #2 by EMMAX-IBS and PCA, #4 by LS, and #87 by GLS, and ARS-BFGL-NGS-61325 was #3 by EMMAX-IBS, #4 by PCA, #6 by LS and #91 by GLS. The favorable allele frequency was 0.58 in the EMMAX-IBS EMMAX-BN LS GLS PCA LS_1494 Figure 5 Global view of test results from four method for fat percentage. EMMAX-IBS is the EMMAX method using correlation measured by identity by state (IBS) among all individuals. EMMAX-BN is the EMMAX using the Balding-Nichols kinship matrix among all individuals. GLS is the generalized least squares method accounting for half-sib intraclass correlation. PCA is the method of principal component analysis for stratification correction using to top 20 principal components as covariables. LS is the least squares method without stratification correction reported in Cole et al. [6]. LS_1494 is the LS analysis without the 160 elite cows.
elite cluster and 0.18 in the average cluster for both markers. For body conformation traits, eight traits had overlapping effects among the four methods with and without stratification correction (Table 1). Allele frequency differences between the elite and average clusters generally were not as large as those discussed above (Additional file 10: Table S4).
Comparison with effect size distribution from USDA genomic prediction The USDA genomic prediction uses genomic correlations among all individuals [13] and routinely publishes the SNP effect size distributions at the website of Animal Improvement Programs Laboratory (AIPL) [14,15]. Effect size distributions from the April 2012 Holstein genomic evaluation based on 18,181 bulls and 21,118 cows were compared with the effect significance in this study for consensus SNP effects of the 31 dairy traits. A relatively small number of large effects (AIPL effects) involving an even smaller number of genes were observed from the USDA genomic evaluation. The three gene regions that accounted for most of the large AIPL effects included DGAT1 of BTA14, SIGLEC12 of BTA18 and the MIR584-8~KAL1 pseudo-autosomal region of BTAX. Of these three gene regions, DGAT1 and SIGLEC12 were reported in an AIPL study [14]. The five methods for SNP testing (LS, EMMAX-IBS, EMMAX-BN, GLS and PCA) had confirmation with the AIPL effects for 17 of the 31 traits ( Table 2, Additional file 11: Figure S7), and the LS method had two BTA18 markers that were the most significant for three traits (daughter pregnancy rate, daughter calving ease and net merit) with the largest AIPL effects at SIGLEC12. All five methods confirmed the large effects for fat percentage in the first 4.5 Mb region of BTA14. EMMAX, GLS and PCA confirmed the largest effect for fat yield in the DGAT1 region and the largest effect for protein percentage at a SNP 45kb upstream from PIGY on BTA6. Other confirmations included those by GLS and EMMAX for milk yield at DGAT1; LS and GLS for somatic cell score in the GC-NPFFR2 region of BTA6, EMMAX and PCA for productive life, service-sire calving ease, body depth and front teat placement at SIGLEC12; LS for stature, fore udder attachment, rear udder height and final score in the MIR584-8~KAL1 region of BTAX; EMMAX for rump width at SIGLEC12 and dairy form in the MIR584-8~KAL1 region of BTAX, and PCA for service-sire stillbirth at SIGLEC12. GLS had the best confirmation of the AIPL effects for milk yield ( Table 2) and had the strongest milk effects with seven markers among the top 10 effects for milk yield in the A5D786-GML region of BTA14 that had 13 genes including DGAT1. GLS had significant SNP effects for protein yield upstream of DGAT1 in VPS28 (#4) and CYHR1 (#6) but did not confirm DGAT1 which had the largest AIPL effect for protein yield and was located next to VPS28. LS confirmed a large AIPL effect at 93. 2  service-sire calving ease and daughter calving ease, and at 58. 7 Mb for daughter pregnancy rate. For these traits, the AIPL effects and the EMMAX and PCA effects for servicesire calving ease and the PCA effect for service-sire stillbirth were at SIGLEC12 ( Table 2, Additional file 11: Figure S7).
The SNP effects confirmed by AIPL's large effect sizes and statistical significance in this study could be considered as consensus SNP effects in U.S. Holstein cattle given the large sample size in the USDA Holstein genomic evaluation.  The number on the left is the AIPL effect rank and the number on the right is the significance rank of this method for the same marker effect. 2 Underlined ranking is not an exact match to an AIPL effect and the exact location is the underlined location in the same row under 'Gene region'.

Consensus effects in elite cows
Genetic variants relevant to the unique genetics of the elite cows should have high frequencies in the elite cows and low frequencies in the average cows. Similarly, SNP markers with similar frequencies between the elite and average clusters should be less likely to be the genetic factors separating the elite cows from the average cows. Allele frequency differences (AFD) between the elite and average cows showed that the consensus effects unlikely were the most important genetic effects for the unique genetics of the elite cows. The widely confirmed DGAT1 gene had the most significant SNP (ARS-BFGL-NGS-4939) for fat percentage. The 'A' allele of this SNP was favorable for milk yield but was unfavorable for fat yield and fat percentage, and the 'G' allele was the opposite, favorable for fat percentage and fat yield and unfavorable for milk yield. This SNP had AFD of 0.05 between the elite and average clusters, noting that more than half of the 45,878 SNP loci had AFD greater than 0.10 between the two clusters. Moreover, the favorable allele frequency in the elite cluster for fat yield was 0.05 less than in the average cluster, 0.13 in the elite cluster and 0.18 in the average cluster. Therefore, it is questionable whether DGAT1 had a major role in the elite cows for fat yield given that 87% of the DGAT1 alleles in the elite cluster had unfavorable effects for fat yield. The 87% frequency of the favorable DGAT1 allele for milk yield (unfavorable for fat yield) was consistent with the high milk production levels of the elite cows but again was unlikely to have a major role in the milk production levels of the elite cows because the frequency of the favorable allele for milk yield was only 5% higher in the elite than in the average. In contrast, the NIBP region with highly significant LS effects for fat percentage, fat yield, protein yield and protein percentage had the largest AFD (0.36) between the elite and the average cows in the DGAT1-NIBP region ( Figure 6A).
The SIGLEC12 gene had large AIPL effects for six functional traits, productive life, daughter pregnancy rate, service-sire calving ease, service-sire stillbirth, daughter calving ease, and net merit. The effect of service-sire calving ease was confirmed by EMMAX and PCA, and the effect of service-sire stillbirth was confirmed by PCA (Table 2). However, the favorable allele of SIGLEC12 had similar frequencies in the elite cows (0.945) and the average cows (0.914). The SNP immediately upstream of the SIGLEC12 effect had large AFD (0.376) but was insignificant by any of the five methods of SNP testing. The AIPL effect size distributions ranked this marker #3 for daughter calving ease, #18 for service-sire calving ease, and #20 for body depth. For five of the six functional traits (except service-sire stillbirth) with large AIPL effects at the SIGLEC12 marker, the most significant markers by LS had large AFD between the elite and the average at 53.95 Mb (AFD = 0.48) and LOC787057 (AFD = 0.46) ( Figure 6B).
For protein percentage, the favorable allele of Hap-map24324-BTC-062449), #1 by all methods for stratification correction and by the AIPL effect size distribution, had nearly identical frequency in both elite and average, about 0.967 in each cluster ( Figure 6C). For somatic cell score, the favorable allele of the SNP marker (BTB-01654826) between GC and NPFFR2 (#1 by AIPL effect size, #10 by LS and #16 by GLS) had an AFD 0.08 (allele frequency of 0.48 in the elite and 0.40 in the average). The favorable allele of the SNP (ARS-BFGL-NGS-17376) immediately upstream of BTB-01654826 had an AFD of 0.37 (0.64 in the elite and 0.27 in the average) and was ranked #15 by LS and #9 by GLS, but this marker was not among the top 100 effects for AIPL, EMMAX (#179) and PCA (#2902) ( Figure 6D).
The results of AFD for the favorable alleles of the consensus effect showed that the consensus effects for production, fertility and calving traits had similar frequencies for the favorable alleles in the elite and the average cows. These results indicate that the consensus effects shown in Figure 6 may not explain the unique genetics of the elite cows and that the unique genetics of the elite cows likely involved additional genes not represented by genes with consensus effects.

Elite cows and favorable alleles with minor frequencies
Dairy GWAS results in Cole et al. [6] had highly significant effects with low frequencies for the favorable alleles in the study population of 1,654 cows and had a large number of significant effects from the X chromosome. The analysis of genome, pedigree and phenotype stratifications identified the most likely reason for the rare favorable alleles: the rare favorable alleles had high frequencies in the elite cluster of 160 cows and low frequencies in the remaining 1,494 cows in the average cluster. Since each sire contributes 50% of the alleles to its daughters, sire alleles that had low population allele frequencies could have allele frequencies around 50% among the 153 daughters in the elite cluster, creating substantial comparison of allelic effects between the elite and the rest. This type of allelic comparison could identify true effects because elite cows are a small percentage of the population and should have high frequencies of favorable alleles that were rare in general populations. Although alleles satisfying this condition could have false positive effects due to chance, alleles that do not satisfy this condition should be less likely to be relevant to the elite status. The SNP markers with favorable alleles overwhelmingly possessed by the elite cluster should have provided a pool of SNP markers that are potentially responsible for the unique genetics of the elite cows.
Many of the highly significant SNP markers with rare favorable alleles had high frequencies in the elite cluster and low frequencies in the average cluster. A total of 298 SNP markers had AFD of 0.4 or greater and 2,248 SNP markers had AFD of 0.3 or greater between the elite and average clusters. Most of these markers had some effects on the 31 dairy traits by the LS method [6], particularly those with larger AFD. This fact indicates that the presence of the elite cluster contributed to the detection of significant SNP effects, although it is unknown which effects are true or false positive effects.
A notable SNP is BTA18's BFGL-NGS-117985 with a minor allele frequency of 0.091 (for allele G) in the 1,654 Figure 6 SNP effects and allele frequency differences (AFD) between the elite cows and the average cows for four chromosome regions with confirmation between GWAS and effect size distribution from USDA genomic evaluation. A) The DGAT1-NIBP region for production traits, showing that the consensus effect in DGAT1 had a low AFD of 0.05 while a SNP in NIBP had the largest AFD (0.36) in this region and also was also highly significant for protein percentage (#2 by EMMAX-IBS and EMMAX-BN, #3 by GLS, #7 by PCA, and #19 by LS). B) The BTA18 region for production, somatic cell score, daughter pregnancy rate and calving traits, showing that the two SNP markers detected by the LS method for many traits had the largest AFD (0.48 at 53.95 Mb and 0.46 at 58. 7 Mb) in this region while the consensus effect in SIGLEC12 had a low AFD of 0.03. In this figure, the vertical red line indicates the significant marker, and the vertical blue line indicates an adjacent marker. C) The BTA6 region for protein percentage, showing that the consensus effect between HERC3 and PIGY had nearly identical frequencies in the elite and the average cows. D) The BTA6 region for somatic cell score, showing that the consensus effect had a low AFD of 0.09, while the upstream marker identified by LS and GLS as highly significant for somatic cell had a high AFD of 0.37. cows. Among the 160 elite cows, the same G allele had a frequency of 0.53, compared to 0.05 in the average cluster of 1,494 cows (Additional file 10: Table S4). The G allele was the favorable allele for fat, protein and milk yields, fat and protein percentages, productive life, and net merit (high PTA values); and was the favorable allele for somatic cell score, service-sire calving ease, daughter calving ease and service-sire stillbirth (low PTA values). This SNP had the eighth largest AFD (0.48) between the elite and average clusters. Other highly significant SNP with large AFD included the most significant SNP for daughter pregnancy rate and somatic cell score (ARS-BFGL-NGS-4774) near INSR on BTA7 with AFD of 0.34, the second most significant SNP on BTA2 (ARS-BFGL-NGS-77438) for somatic cell score with AFD of 0.41, the most significant SNP for milk yield on BTA13 (ARS-BFGL-NGS-4774) with AFD of 0.38, the SNP with the largest number of highly significant effects for body conformation traits in REN on BTA16 (ARS-BFGL-NGS-83607) with AFD of 0.23, the most significant SNP on BTA26 near MGMT for foot angle (Hapmap28514-BTA-163525) with AFD of 0.43, and many SNP loci for dairy form, rear legs (side view), foot angle, strength, udder depth, and teat length (Additional file 10: Table S4). The unequal allele frequencies between the elite and average clusters indicate that the above SNP markers could be relevant to the elite cows. Nearly all SNP markers highly significant for fat yield without stratification correction [6] had large AFD between the elite and average clusters. For example, the NIBP marker that was ranked #76 by the LS method had AFD of 0.36, with favorable allele frequency of 0.65 in the elite cluster and 0.29 in the average cluster (Additional file 10: Table S4). This marker was also highly significant for protein percentage (#2 by EMMAX-IBS and EMMAX-BN, #3 by GLS, #7 by PCA, and #19 by LS).

Elite large half-sib family and X chromosome effects
Some of the highly significant X chromosome SNP markers reported in Cole et al. [6] had large AFD between the elite and average clusters. However, not all X chromosome markers with large AFD were highly significant. The 153 half sibs of the 160 elite cows should have identical sire X chromosomes except the pseudo-autosomal region. Therefore, X chromosome allele frequencies in the elite cluster could be substantially higher than in the average cluster. Among the 298 SNP markers with AFD of 0.4 or greater, 50 markers were the X chromosome markers (16.8%). In Cole et al. [6], ARS-BFGL-NGS-1096 was in a gene desert on chromosome X but was highly significant for protein percentage (#1), fat percentage (#4), service-sire calving ease (#5), fat yield (#9), protein yield (#11) and productive life (#12). This marker had AFD of 0.51 (second largest AFD), with favorable allele frequency of 0.53 in the elite cluster and 0.02 in the average cluster. The marker with the largest AFD (0.54) was also on the X chromosome (ARS-BFGL-NGS-19641), with favorable allele frequency of 0.545 in the elite and 0.032 in the average, but this marker was not among the top 20 effects for any of the 31 traits, indicating that large AFD alone was not guaranteed to be highly significant. The X chromosome SNP (Hapmap50291-BTA-25028) with the fourth largest AFD also was not among the top 20 effects for any trait. The SNP (ARS-BFGL-NGS-18028) 30.1 kb upstream of LOC52005 had AFD of 0.47, and was #1 for productive life and #4 for somatic cell score. The X chromosome SNP that was highly significant for several body conformation traits, Hapmap46795-BTA-30632 in PHKA2, had AFD of 0.32.

LS analysis without the elite cows
To test the hypothesis that the significant SNP results from the LS analysis was influenced by the elite cows and that stratification correction may have removed some true SNP effects we performed a separate LS analysis for 1,494 of the 1,654 cows by removing the 160 elite cows from the data set. Without the elite cows, many of the highly significant LS results for dairy functional traits disappeared, including BTA180s BFGL-NGS-117985 for a large number of traits, and X chromosome markers ARS-BFGL-NGS-1096 and ARS-BFGL-NGS-18028 (Additional file 11: Figure S7). For the widely confirmed fat percentage effects on BTA14, the LS analysis without the elite cows nearly achieved the effect of stratification correction and eliminated most significant LS results with the elite cows except the DGAT1-NIBP region of BTA14. The highly significant BTA18, BTA26 and BTAX SNP effects with the elite cows were eliminated and the highly significant BTA5 and BTA17 effects became much less visible when the elite cows were removed ( Figure 5). Similarly, many highly significant LS results for fat yield, protein yield, protein percentage, service-sire sire calving ease, daughter calving ease and foot angle were eliminated once the elite cows were removed. This analysis showed that many of the significant LS results for fat percentage and the other traits were due to the elite cows and that stratification correction removed effects associated with the elite cows. Without the LS analysis, genetic information unique to the elite cows would have been left undetected.
The GLS and EMMAX methods for stratification correction were little affected by the removal of the elite cows, further indicating that the SNP effects removed by stratification corrections were largely associated with the elite cows. A few exceptions were increased significance for some traits with the removal of the 160 elite cows. Increased significance of EMMAX-IBS included the BTA26 effects for milk yield, somatic cell score and feet and legs, and BTA23 effects for fore udder attachment and rear teat placement, and increased GLS significance included some BTA20 effects for udder cleft (Additional file 11: Figure S7).

Combined analysis of correlation based methods and PCA
The methods of correlation based, EMMAX and GLS, were combined with PCA by adding the top 20 PC's to the statistical model for EMMAX-IBS and GLS. EMMAX-PCA generally had slight decreases in significance, and GLS-PCA analysis substantially reduced the GLS significance for most traits. Two exceptions to the reduced significance resulting from adding PCA included the increased significance of the BTA18 SNP effect for PL by EMMAX-PCA and for GLS-PCA, and the increased significance for the BTA18 effect on RW by GLS-PCA that were consistent with the AIPL results. However, adding PCA diminished several significant effects from GLS that were consistent with the AIPL results, including the BTA14 effects for protein yield and the BTA6 effect for somatic cell score (Additional file 11: Figure S7).

Stratification correction may remove true effects associated with selection
Genetic selection and artificial insemination are major factors contributing to population stratification in dairy cattle. Genetic selection since 1964 has resulted in phenotypic changes for many traits. Most of the 31 traits in this study had phenotypic changes in one direction since 1964 (Table 3). Associated with phenotypic changes and breeding practice are gene frequency changes due to genetic selection. The widespread use of artificial insemination in dairy breeding accelerates the replication of selected male genomes. Genome sampling (drift) in breeding practice inevitably would contribute to genome stratification and spurious effects because only a fraction of the population is maintained for producing the next generation. Genetic hitchhiking of selection, which refers to allele frequency changes due to linkage disequilibrium with loci subjected to selection [16], also contributes to genome stratification and spurious association effects.
The three methods used in this study for stratification correction eliminated most effects associated with the elite cluster, indicating that the removal of stratification effects likely have removed some true effects associated with genetic selection.
The mechanism of removing true effects associated with selection is different for different methods. For methods using correlation among individuals such as EMMAX and GLS, the mechanism of removing true effects associated with selection is the correction of an individual's phenotypic value by phenotypic values of all related individuals so that the corrected individual phenotypic values may become less different from each other. For the PCA approach, the mechanism of removing true effects associated with selection is the correlation between principal components and phenotypic values. The stratification analysis in this study showed that genome and phenotype stratifications overlapped (Additional file 4: Figure S4). Consequently, the removal of genome stratification by PCA likely also removes true genetic effects associated with the phenotypic stratification. It should be noted that the PCA approach has a problem of subjectivity in choosing the number of principal components because the use of different numbers of principal components may result in different results. For the dairy GWAS data in this study, the use of the first component (data not shown) and the use of the first 20 components (Additional file 9: Table S3) had drastically different results. Current methods for stratification correction in association analysis do not separate drift and hitchhiking from selection. Selection signature analysis may provide evidence for genes and chromosome regions affected by selection [17,18]. Sample structure may differ for different genomic regions, and adjustment for local stratification may be instructive to accurately localize true signals [19,20].

Cleaner signals from stratification correction for large effects without genetic selection
The methods for stratification correction in association analysis could yield cleaner signals of SNP effects and could be used for detecting potential large effects for traits without genetic selection. Two of the 31 traits in our study clearly were not subjected to genetic selection since 1964: fat and protein percentages. These two traits were unchanged since 1964, with slight increase before 1985 and slight decrease after 1985 for fat percentage, and slight decrease before 1985 and slight increase after 1985 for protein percentage ( Table 3). The fat percentage effects of BTA14 is an excellent example showing how stratification correction could reduce potential noise and produce much cleaner signals of significant effects ( Figure 5). The two models of EMMAX, EMMAX-IBS and EMMAX-BN, had the cleanest signals, virtually eliminating all effects except those in and near the DGAT1-NIBP region of BTA14. However, it is unknown whether such clean signals came at the expense of the elimination of some less significant true effects. Another unknown is whether the fat percentage effects could have been detected with similar statistical confidence had fat percentage been subjected to many years of directional selection. Comparing protein percentage that also was not subjected to selection with fat percentage, it should be apparent that protein percentage did not have similar major effects to those for fat percentage in our study population of 1,654 contemporary Holstein cows. This assumption was supported by the effect sizes from the USDA Holstein genomic evaluation. The effect size range of the top 20 effects for fat percentage was 3.9 times as large as the effect size range of the top 20 effects for protein percentage (Table 2).

Conclusions
Genetic selection and artificial insemination were major factors associated with Holstein genome stratification that overlapped with pedigree and phenotype stratifications. The confounding between genome and phenotype stratifications may provide opportunity to discover genetic effects and cause difficulty in separating true and spurious effects. Stratification correction in genome-wide association analysis in Holstein cattle could remove true effects associated with stratification due to genetic selection.

Phenotypic data, study population and SNP genotyping
Thirty one production, health, and reproduction traits were studied, including 13 production, health, reproduction

Three methods for stratification correction
Three methods for stratification correction were evaluated, including a mixed model method implemented by EMMAX [3], a GLS method based on intraclass correlation [7], and a PCA method [2]. EMMAX was used in our analysis because of its optimized algorithm and fast computational speed in solving large mixed model systems [3]. The EMMAX method uses a linear mixed model approach that includes correlation among all individuals. The statistical model can be expressed as: y = μ + Zβ Z + Gβ + e , where y = column vector of phenotypic values, μ = population mean of the phenotypic values, Z = incidence matrix for a random additive effects, β Z = column vector of the random additive effects, G = the genotype value of the candidate SNP, β = the regression coefficient of the candidate SNP, and e = random residual. The phenotypic variance-covariance matrix is: var(y) = Var(β Z ) + var(e) = Kσ a 2 + Iσ e 2 , where K = IBS [3] or the Balding-Nichols kinship matrix [10], I = identify matrix, σ a 2 = additive variance, and σ e 2 = variance of random residuals.
Since the genome stratification overlapped with half-sib families, we used a GLS approach [7] to account for correlation within half-sib families based on intraclass correlation [22]. The phenotypic values of a quantitative trait is assumed to be: y = Xg + Zf + e, where y = vector of phenotypic values, g = effects of SNP genotypes, X = model matrix of g, f = random family effects with a common variance σ f 2 for sibs in the same family that could include common genetic and environmental effects, Z = model matrix of f. The variance-covariance matrix of the family effects is assumed to be G = var(f) = Iσ f 2 .
Then, the phenotypic variance-covariance matrix is Var(y) = V = ZGZ' + Iσ e 2 . The phenotypic values are assumed to follow a normal distribution with mean Xg and variance-covariance matrix V, which can also be expressed as Var where V i = variancecovariance matrix of phenotypic values of sibs in family i, and m = number of families. To improve computational efficiency in the setting of genome-wide SNP analysis, a simple formula of the V inverse was developed so that the direct inversion of the V matrix is no longer needed [7,8]. Let σ 2 = σ f 2 + σ e 2 , ρ = σ f 2 /σ 2 = intraclass correlation, n = total number of observations, r = ρ/[1 + (n i -1)ρ], and λ = r/(1r), where n i = the number of sibs in family i. Also let V -1 = L'L, where L is an upper triangular matrix. Then, each element in L can be calculated as With the above formulae, the V -1 can be constructed without actually inverting the V matrix. Variance components in the intraclass correlation, σ f 2 and σ e 2 , were estimated using a maximum likelihood approach [7,23]. Estimates of the three SNP genotypic values were obtained by the following GLS solution: where X 2 * = L'X 2 and y* = L'y = corrected phenotypic values after removing fixed non-genetic effects. Let k = the rank of X = [X 1, X 2 ], s 2 = residual mean sum of squares, and s i = row vector of contrast coefficient for defining additive or dominance effects based on the extended Kempthorne model [24]. Then, the following test statistic for testing each genetic effect (additive or dominance effect) follows a Student-t distribution with nk degrees of freedom: Stratification correction to the phenotypic value of each individual was also applied using the approach of principal component analysis (PCA) [2] as a comparison to the two methods using correlation among individuals. The top 20 principal components were used as covariables for stratification correction. The statistical model was: y = μ + Σx i b i + g + e, where Σx i b i = sum of principal component effects, with x i = eigenvector of principal component i, and b i = regression coefficient of principal component i, i = 1,. . .20.
Statistical tests of SNP effects by the GLS and PCA methods were implemented using the epiSNP computer package [25]. A genome-wide 5% type-I error with the Bonferroni correction was considered as the threshold P value (10 −6.4 ) for genome-wide significance. Gene and SNP locations were identified based on the University of Maryland bovine genome assembly [26,27]. SNP positions were based on UMD3.1 and gene locations were based on ENSEMBL [28] and NCBI [29].

Additional files
Additional file 1: Figure S1. Additional file 2: Figure S2. Pedigree of the 1,654 contemporary cows tracing back to ancestors born in 1930's (approximately 10-15 generations). Circles in gold color are the 1,654 cows used in the genome-wide association analysis. The pedigree shows that all 1,654 cows are related.
Additional file 3: Figure S3. Overlap between genome stratification and half-sib family structure. Additional file 4: Figure S4. Overlap between genome stratification and phenotypic stratification of 31 traits. Additional file 5: Figure S5. Overlap between genome stratification and phenotypic stratification for chromosome 1 and the X chromosome. Additional file 10: Table S4. (Excel file) Overlap between top 100 effects per trait for 31 dairy traits from methods for stratification correction and the top 100 effects from the analysis without stratification correction. A1_elite160: frequency of allele 1 in the elite cluster of 160 cows; A1_1494: frequency of allele 1 in the remaining 1,494 cows excluding the elite cluster; Allele 1 = A for AC, AG and AT, = C for CG and CT, = G for GT (GT not observed in our SNP data set). MY, milk yield; FY, fat yield; PY, protein yield; FPC, fat percentage; PPC, protein percentage; SCS, somatic cell score; DPR, daughter pregnancy rate; PL, productive life; SCE, service-sire calving ease; DCE, daughter calving ease; SSB, service-sire stillbirth; DSB, daughter stillbirth; NM, net merit; STA, stature; STR, strength; BD, body depth; RW, rump width; DF, dairy form; RA, rump angle; FUA, fore udder attachment; RUH, rear udder height; UD, udder depth; UC, udder cleft; FTP, front teat placement; RTP, rear teat placement; TL, teat length; FA, foot angle; RLS, rear legs (side view); RLR, rear legs (rear view); FL, feet and legs; FS, final score. E: the effect from the method without stratification correction [6] was among the top 100 effects from EMMAX-IBS; G: the effect from the method without stratification correction [6] was among the top 100 effects from the GLS method; E: the effect from the method without stratification correction [6] was among the top 100 effects from PCA methods; EG: the effect from the method without stratification correction [6] was among the top 100 effects from EMMAX-IBS and GLS; EP: the effect from the method without stratification correction [6] was among the top 100 effects from EMMAX-IBS and PCA; GP: the effect from the method without stratification correction [6] was among the top 100 effects from GLS and PCA; EGP: the effect from the method without stratification correction [6]