Heterozygosity and inbreeding
The current results for \({H}_{obs}\) were consistent with previous findings for the AUS Merino [13, 15, 16], SA Merino [21], Coopworth [14], Poll Dorset and White Suffolk [13] (Table 1). Current values were marginally higher than those previously reported for the African Dorper [13] and Border Leicester [13, 15], but the general ranking of the Merino, Poll Dorset and Border Leicester as breeds of respectively high, intermediate and low gene diversity agreed with previous results from marker data [13, 15, 22]. A similar range of 0.33 to 0.38 has been reported for mean heterozygosity of New Zealand pure and composite populations [14, 23] that originated from programs that included high levels of crossbreeding. The ‘pure’ breeds currently investigated could be expected to be less diverse than highly crossbred populations, suggesting either particularly high diversity for the current breed groups or that \({H}_{exp}\) and \({H}_{obs}\) have limited sensitivity as measures of genetic diversity.
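For reference, the two heterozygosity measures compared above can be computed from SNP genotypes as in the following minimal sketch. It assumes genotypes coded as 0/1/2 copies of one allele with no missing calls; the function name and the coding are illustrative and are not taken from the pipeline used in this study.

```python
import numpy as np

def heterozygosity(genotypes):
    """Return (mean H_obs, mean H_exp) across SNPs.

    genotypes: (n_animals, n_snps) array coded 0/1/2 (illustrative coding,
    no missing values assumed).
    """
    g = np.asarray(genotypes, dtype=float)
    # Observed heterozygosity: fraction of heterozygous animals per SNP
    h_obs = np.mean(g == 1, axis=0)
    # Expected heterozygosity under Hardy-Weinberg: 2p(1 - p) per SNP
    p = g.mean(axis=0) / 2.0
    h_exp = 2.0 * p * (1.0 - p)
    return float(h_obs.mean()), float(h_exp.mean())

# Toy example: 4 animals, 2 SNPs
h_obs, h_exp = heterozygosity([[0, 1], [1, 1], [2, 1], [1, 1]])
print(h_obs, h_exp)  # 0.75 0.5
```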
In deriving estimates of inbreeding by marker-based methods, it is important to account for the original definitions of inbreeding described as the correlations between homologous genes within a haploid individual [24], or the probability of identity by descent [25]. The former definition accommodates a negative F-value [26]. The latter does not, as probabilities are, by definition, bounded between 0 and 1. In addition, estimates of inbreeding derived from the diagonal of the GRM are sensitive to the extent that breed effects influence allele frequencies. The mean (1.13) of the diagonal of \(\boldsymbol{G}\) suggested that the multi-breed composition of this GRM slightly inflated estimates of FVR across all groups. This is supported by the fact that FVR estimates of Merino bloodlines, derived from \({\boldsymbol{G}}_{M}\) (mean diagonal of 1.05), were slightly lower than the mean FVR for Merinos according to \(\boldsymbol{G}.\) These estimates of FVR could thus be considered appropriate for comparing mean levels across groups within the same GRM, but represent slightly inflated measures of individual inbreeding.
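The dependence of diagonal-based inbreeding on the allele frequencies used can be illustrated with a minimal sketch. It assumes a VanRaden-type GRM and takes FVR as the diagonal of the GRM minus one; the exact construction used for \(\boldsymbol{G}\) and \({\boldsymbol{G}}_{M}\) is not restated in this section, so the function names, the 0/1/2 coding and the optional `freqs` argument are illustrative assumptions.

```python
import numpy as np

def vanraden_grm(genotypes, freqs=None):
    """VanRaden-type GRM from a (n_animals, n_snps) 0/1/2 matrix.

    freqs: allele frequencies used for centring and scaling; if omitted,
    they are estimated from the supplied genotypes (an assumption here).
    """
    m = np.asarray(genotypes, dtype=float)
    if freqs is None:
        freqs = m.mean(axis=0) / 2.0
    p = np.asarray(freqs, dtype=float)
    z = m - 2.0 * p                       # centre by twice the allele frequency
    scale = 2.0 * np.sum(p * (1.0 - p))   # expected variance summed over SNPs
    return z @ z.T / scale

def inbreeding_from_grm(grm):
    """Correlation-type inbreeding: diagonal of the GRM minus one."""
    return np.diag(grm) - 1.0

genotypes = np.array([[0, 2], [2, 0], [1, 1], [1, 1]])
F = inbreeding_from_grm(vanraden_grm(genotypes))
print(F)  # [ 1.  1. -1. -1.]
```

In this toy example the two fully homozygous animals receive F = 1 and the two fully heterozygous animals receive F = -1, showing how a correlation-type estimate can be negative. Passing different base frequencies through `freqs` changes the diagonal, which is the sensitivity described above.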
The high mean FVR suggested relative uniformity within populations such as the Dorper, SAMM, Poll Dorset and Border Leicester groups. The high levels of inbreeding for the AUS Dorper and AUS SAMM could be expected as these breeds were established in Australia from limited importation of genetic material from South Africa followed by grading-up programs using back-crossing. Also, FVR was similar or higher in the SA populations, implying that inbreeding in the AUS populations was also a characteristic of their ancestral lines, rather than only a consequence of across-country isolation. High levels of inbreeding precipitate a decline in quantitative genetic variance [27], and detrimental effects associated with excess homozygosity have been reported in sheep [28,29,30]. The overall level of inbreeding in populations observed here was generally low, but the comparatively high values for breeds such as the Border Leicester and Poll Dorset are notable for consideration in their breeding program design.
Pairwise FST statistics
Given analogous definitions for FST statistics [31, 32], FST estimates can be thought of as (1) the correlation between randomly sampled alleles within subpopulations relative to the total population or (2) the proportion of genetic variance that can be attributed to variance in allele frequencies between subpopulations [33]. With ‘across breed’ estimates ranging from 0.05 to 0.26 (Table 2), it was clear that the implications of this segregation depend greatly on the historic admixture between any two distinct breeds. Current FST values were higher than previous estimates of 0.062 and 0.053 between the Merino and, respectively, the Poll Dorset and Border Leicester groups [15], but the methods were not the same. The low levels of divergence across countries for the SAMM, Dorper and Dohne Merino breed groups reflect the formation of these breeds in AUS by the importation of genetic material directly from SA. The low estimate for SA x AUS Merino groups suggested that the two Merino populations have not diverged greatly, or that the few recent genetic links were influential in reducing the genetic distance of samples included in this study. Interestingly, some FST values between bloodlines were comparable to or larger than certain across breed estimates (Table 3). This further highlights the importance of lines, which has already been noted for AUS Merinos [19], and also suggests that partial restriction of gene flow could have important implications, regardless of being ‘within breed’.
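Definition (2) can be made concrete with a minimal sketch of a heterozygosity-based pairwise FST, computed as a ratio of sums over SNPs with the two populations weighted equally. This is a simplified illustrative estimator without sample-size correction, and it is not necessarily the estimator of [31, 32] used for Tables 2 and 3; the function name and inputs are assumptions.

```python
import numpy as np

def pairwise_fst(freqs_a, freqs_b):
    """Heterozygosity-based FST between two populations.

    freqs_a, freqs_b: per-SNP allele frequencies in each population.
    Returns (H_T - H_S) / H_T, with sums taken over SNPs.
    """
    pa = np.asarray(freqs_a, dtype=float)
    pb = np.asarray(freqs_b, dtype=float)
    # Mean within-population expected heterozygosity per SNP
    h_s = (2.0 * pa * (1.0 - pa) + 2.0 * pb * (1.0 - pb)) / 2.0
    # Expected heterozygosity of the pooled population per SNP
    p_bar = (pa + pb) / 2.0
    h_t = 2.0 * p_bar * (1.0 - p_bar)
    total = h_t.sum()
    if total == 0:
        return float("nan")  # no polymorphic SNP in the pooled population
    return float((total - h_s.sum()) / total)

print(pairwise_fst([0.5, 0.5], [0.5, 0.5]))  # 0.0
```

Identical frequencies give FST = 0, and populations fixed for alternative alleles give FST = 1.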
The low level of divergence between the Ultrafine and Cradock groups is somewhat expected since these bloodlines were known to have recent across-country links. However, the FST estimate in this across-country comparison was as low as pair-wise comparisons between SA lines, which is promising for the prospect of a common genomic evaluation for these populations.
Linkage disequilibrium and effective population size
Overall, the reported LD at ~ 55 kb agreed with previous findings that characterized LD in sheep as generally low [22] compared to other domesticated species such as pigs [34] and dairy cattle [35]. Current results for Merinos agreed with earlier reports that characterized this breed by rapid LD decay [15, 22, 36]. However, Kijas et al. [22] showed substantial differences in LD measurements when determined at shorter distances such as ~ 10 kb, and higher density platforms could thus provide a more accurate indication of LD decay than currently reported.
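As a point of reference for the LD values discussed, pairwise LD from unphased SNP data is often summarised as the squared correlation between genotype dosages. The sketch below assumes this genotype-based r² and 0/1/2 coding; the LD measure and software actually used in this study are described in the methods and may differ.

```python
import numpy as np

def genotype_r2(snp_a, snp_b):
    """Squared Pearson correlation between two SNPs' 0/1/2 dosages."""
    a = np.asarray(snp_a, dtype=float)
    b = np.asarray(snp_b, dtype=float)
    if a.std() == 0 or b.std() == 0:
        return float("nan")  # r2 is undefined for a monomorphic SNP
    r = np.corrcoef(a, b)[0, 1]
    return float(r * r)

# Complete LD (same or mirrored dosages) gives r2 = 1
print(genotype_r2([0, 1, 2, 1], [2, 1, 0, 1]))
```

Averaging such values over SNP pairs binned by physical distance gives a decay curve of the kind discussed here.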
LD observed for bloodlines reflected the importance of structuring by subpopulations. When compared to the Merino breed groups, which comprise the pooled bloodlines, the pattern of decay showed relative consistency, but the absolute level of LD was substantially higher when structured by subpopulation (Fig. 6 vs. Fig. 8). Thus, while overall levels remained low, the persistence of LD was noticeably sensitive to the connectedness of set populations. Extrapolating this pattern, it could be speculated that the Ultrafine line consists of influential population substructures not accounted for by the current population assignment, which is supported by the combination of a low level of relatedness and a slightly higher FVR within the Ultrafine line. However, no obvious substructures were observed in either PCA or ADMIXTURE results, as discussed below, and it is thus difficult to explain the uncharacteristic pattern of decay of this bloodline.
Calculating historical Ne by the rate of LD decay [37] has numerous examples in sheep [13, 14, 23]. Domestic populations under strong selection programmes, with the widespread use of preferential sires, deviate heavily from the assumption of random mating of Wright-Fisher populations [38], and the loss of diversity seen in Fig. 8 is expected. However, compared to other domestic species, sheep have been characterized by relatively high levels of diversity [13], attributed to large founding populations combined with less intense selection compared to other domestic species, such as cattle [39]. Using the same methodology, Kijas et al. [13] reported roughly similar estimates for African Dorper (264) and Border Leicester (242), but considerably higher estimates for the Australian Merino (833), Poll Merino (918), Industry Merino (853) and Australian Poll Dorset (318), which indicates some inconsistency in Ne estimates for the same breeds.
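The link between LD at a given distance and historical Ne can be sketched with one widely used approximation, \(E[r^2] \approx 1/(1 + 4N_e c)\), where c is the recombination distance in Morgans and the estimate refers to roughly \(1/(2c)\) generations ago. The sketch assumes this approximation, a uniform recombination rate of 1 cM per Mb and no sample-size correction of r²; the exact formula and corrections behind the current estimates are those of [37] and the methods, and may differ.

```python
def ne_from_ld(r2, distance_bp, cm_per_mb=1.0):
    """Approximate historical Ne from mean r2 at a physical distance.

    Assumes E[r2] = 1 / (1 + 4 * Ne * c); cm_per_mb is an assumed
    uniform recombination rate. Returns (Ne, generations_ago).
    """
    if not 0.0 < r2 <= 1.0:
        raise ValueError("r2 must be in (0, 1]")
    if distance_bp <= 0:
        raise ValueError("distance_bp must be positive")
    c = distance_bp * cm_per_mb * 1e-8   # 1 cM/Mb equals 1e-8 Morgan per bp
    ne = (1.0 / r2 - 1.0) / (4.0 * c)
    generations_ago = 1.0 / (2.0 * c)
    return ne, generations_ago

# Illustrative values only: r2 = 0.2 at 1 Mb
ne, t = ne_from_ld(0.2, 1_000_000)
print(round(ne), round(t))  # 100 50
```

Because short distances map to distant generations and long distances to recent ones, estimates for recent generations rest on LD between widely spaced markers, where small changes in r² translate into large changes in Ne.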
The unexpectedly increasing trend of Ne over recent generations seen in Fig. 8(b) made it difficult to evaluate the breeds’ comparative ranking across more recent generations. Other studies have also reported increasing Ne estimates in recent generations. Using a similar methodology, Brito et al. [23] reported an increasing Ne for the Primera and Lamb Supreme breeds over the last 5 generations. An increase in Ne was also observed for the Romney breed around 20 generations ago, followed by a decline at around 5 generations ago [14]. The latter authors ascribed the increase in Ne to an increase in animal numbers following successful management and the recent implementation of technologies like artificial insemination [14]. However, non-random mating reduces Ne below census size, N [38]. Thus, given the high intensity of artificial selection and genetic drift, a population-wide increase in genetic diversity is unexpected in the absence of crossbreeding. Also, according to Hill & Robertson [9], LD and fixation have a linear relationship. Given that the Border Leicester group had the highest FVR and within-group relatedness, it was surprising not to observe a pattern of LD, and consequently Ne, more comparable to less diverse breeds such as the Poll Dorset group.
Genomic relatedness, PCA and ADMIXTURE
Estimates of relatedness according to \(\boldsymbol{G}\) agreed with FST estimates in describing a close association between breeds in common between countries. The across-country relationships for the SAMM and Dorper breeds were only marginally weaker than the internal relatedness of individual populations (within country). The lower level of relatedness between the AUS and SA Dohne Merino populations could be due to a more diverse SA population, as the SA Dohne Merino maintained a lower FVR and internal relatedness compared to the SA SAMM and SA Dorper groups.
For the Merino breed, the across-country relationship is known to be connected by both deep ancestral and relatively recent relationships, but the positive relationship was expected to be low for multiple reasons. Both groups were internally diverse, and it can be reasoned that a population cannot, on average, be more related to another population than it is to itself, i.e. than its internal level of relatedness. Also, the ancestral relationship between SA and AUS Merinos is defined by a distant linkage in the original development of the AUS Merino [4]. Lastly, the known recent links are limited to those between the Cradock and Ultrafine lines. Considering the defining influence of structure between bloodlines, this across-country relationship is likely to be isolated within these two lines, a tendency also observed in the ADMIXTURE, PCA and FST results reported here.
This study also reported across breed relationships that were comparatively high in magnitude. These cannot be attributed to recent links and must thus reflect a deeper co-ancestry. For example, the SAMM and Dohne Merino share origins in the German Mutton Merino [5], and the origin of the AUS Coopworth is linked to the Border Leicester (www.coopworth.org.au).
From \({\boldsymbol{G}}_{\boldsymbol{M}}\), the positive relatedness between SA bloodlines suggested that separation on a flock level was less restricting compared to that of lines or strains, which was expected. The generally weak relationships with the Elsenburg line reflect the initial management on an isolated basis. The positive relationship between the Grootfontein and Industry lines is a good indication of the resource flock’s objective to represent commercial SA Merinos. Given the relatively few genetic links between the Ultrafine and Cradock lines, it was surprising to observe this across-country relationship as the only positive relationship in a comparison involving AUS bloodlines, including relationships to other AUS Merino bloodlines within country.
Results from PCA and ADMIXTURE analyses generally indicated very similar clusters of breed and bloodline populations. The discrete breed structures of PC1 and PC2 appeared to capture deep ancestral relationships and very little within-group variation. From ADMIXTURE analysis, the distinct genetic structures of Merino, Border Leicester and Poll Dorset agreed with those three breeds occupying the most distant branches of PC1 and PC2.
The ADMIXTURE and PCA results also agreed well with previous parameters that indicated only small effects of across-country separation. However, a study that imputed SA breeds from AUS reference panels showed a markedly higher accuracy of imputation for Dorpers compared to Dohne Merinos [40]. Although imputation accuracy is not directly related to ADMIXTURE and PCA results, it is thought that the homogenous nature of the Dorpers suggested by Fig. 5 is likely to have facilitated the more accurate imputation for this breed.
Generally, breed structures did not dissipate in succeeding principal components as groups remained clustered beyond PC1 and PC2 (Fig. S2), and the genotypes of most breeds were defined by a similar genetic composition in ADMIXTURE analysis (Fig. 5a). However, Merino groups were an exception and often segregated into multiple clusters across PC3 to PC14, and substructures of Merinos were also clear from ADMIXTURE analysis that commenced on an ‘across breed’ level. Following the advent of artificial insemination (AI), across bloodline links are considered to have become more common in AUS [18], but these results strongly suggested that the subpopulation, i.e., bloodline of origin, is an important determinant of the genetic composition of Merinos. This has been demonstrated by high levels of quantitative genetic variance across similar groupings (Ultrafine, Fine/Fine-medium, Medium/Strong) of AUS Merinos for key production traits [18]. Also, markedly different accuracies for genomic breeding values were reported for similar groupings (superfine-, fine- and strong-wool types) following genomic prediction of production traits from the same reference set [41].
The ADMIXTURE analysis of bloodlines revealed further complexities of these population structures (Fig. 5b). The identification of the Ultrafine and Strong lines within opposing clusters has been previously reported [17]. Similar to other metrics presented in this study, a close association existed between the Cradock line and the Ultrafine and Fine-Medium-1 lines. However, ADMIXTURE results at K = 7 suggested that the remaining SA lines might not directly benefit from the relationship between the Cradock and Ultrafine lines. These results should also be seen in combination with the discussion below with reference to POV and the CV error.
The highly defined population structure of the Ultrafine line is notable considering the high estimates of diversity according to Ne and relatively low internal relatedness. However, this is possibly due to the orthogonal nature of principal components, which separated the ancestral relationships for which the Ultrafine line appears uniform, from within-line variation between sampled individuals. This could also partly explain the homogenous composition observed in ADMIXTURE analysis. If the Ultrafine line did consist of many small substructures as previously speculated in this paper, the level of K was likely too low to capture such structures for a population with a strongly defined ancestry.
Despite the good agreement and accuracy of PCA and ADMIXTURE analysis in identifying the known groups of origin, the low POV explained by the initial principal components in Fig. 5a supports previous results that specifically noted high dimensionality as a characteristic of the genetic architecture of sheep [13, 42]. Also, the lack of an inflection point in CV errors from K = 3 to 20 (Fig. S5b) implied that the model had difficulty estimating an ideal value of K within this range. It is possible that higher levels of K would perpetually identify lower-level structures, such as families or sire groups, as unique genetic groups. The high diversity of the animals in this study and in sheep in general could exacerbate this problem, causing difficulty in identifying a ‘best’ value for K. Thus, higher increments of K were not explored, also because a similar pattern has been observed for similar sheep datasets with fewer breeds until K = 40 (P. Gurman, unpublished data).
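The POV referred to above is the share of total genotypic variance captured by each principal component. The following minimal sketch shows one way to obtain it, assuming a column-centred 0/1/2 genotype matrix with at least one polymorphic SNP and no further scaling; the PCA implementation and any standardisation used in this study may differ, and the function name is illustrative.

```python
import numpy as np

def pca_proportion_of_variance(genotypes):
    """Proportion of variance explained by each principal component.

    genotypes: (n_animals, n_snps) 0/1/2 matrix; columns are centred
    but not standardised (an assumption of this sketch).
    """
    m = np.asarray(genotypes, dtype=float)
    centred = m - m.mean(axis=0)
    # Singular values of the centred matrix; their squares are
    # proportional to the variance along each principal component
    s = np.linalg.svd(centred, compute_uv=False)
    variance = s ** 2
    return variance / variance.sum()

# Toy example with a single axis of variation: PC1 explains everything
pov = pca_proportion_of_variance([[0, 0], [1, 1], [2, 2]])
print(pov.round(3))
```

When variation is spread over many independent axes, as suggested here for sheep, the leading components each explain only a small share and the POV curve decays slowly.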
Implications for genomic selection
The combining of populations into the same pool - such as the current scenario of merging bloodlines into respective SA and AUS Merino groups - is intended to benefit prediction by increasing sample size. In the presence of heterogeneity this will be accompanied by an associated increase in Ne, and thus a decrease in LD. This trade-off could be an important determinant in breeding program design, and the more diverse and diverged the populations, the more challenging this trade-off is likely to be. The alternative is therefore also worth considering: a population of high diversity could benefit from being divided into smaller groups of better-connected animals. Regarding genomic selection, Van der Werf et al. [11] showed that a small number of highly related individuals could be more informative than large numbers of distant individuals. While this previous study binned relatedness by categories (e.g. groups of half-sibs), the principle should also be valid across a continuous scale of heterogeneity such as currently seen in comparing combinations of populations. In a narrow spectrum approach, knowledge of important population structures would be essential to identify pockets within the population that could deliver optimal results. For a subset of Merino populations such as the Cradock and Ultrafine lines, the mean relatedness was as high as the mean internal relatedness for AUS Merinos, which are all currently evaluated in a single analysis [2]. Further measures of PCA, ADMIXTURE and FST indicated that these two groups are likely to be the best starting point for an across-country platform. However, these ‘bloodlines’ were the only examples of some, albeit low, linkage by pedigree. A minimum level of genetic exchange is thus likely to remain an important factor unless more distant population structures are better accounted for in future evaluations.
Accounting for population structures derived from the extended pedigree has delivered increased accuracies for predicted genomic breeding values in the AUS Merino [17]. Including eigenvalues from PCA [42], or group proportions from the ADMIXTURE Q-matrix [17], has decreased accuracy, but the higher accuracy of non-adjusted values is likely biased by picking up on breed effects rather than individual variation. Initially, adjusting for population gene frequencies showed little benefit [12], but improved results have recently been reported by Gurman et al. [43] for a multi-breed GRM. Given that both PCA and ADMIXTURE results proved informative and accurate in characterizing populations by known group of origin, further research is needed to make efficient use of this information. However, other problems could persist in the likely case where across-country prediction would utilize both pedigree and genomic information in the ‘single-step’ approach [44], which is now common for Australian sheep [2]. The assumption of unrelatedness of founder parents in the pedigree could be particularly problematic in the case of across-country separation, where disjoined pedigrees could in fact be well connected by unknown links. In this regard, the use of so-called ‘metafounders’ [45] could be a promising approach to better align the pedigree and GRM for both disconnected and highly related base populations, but it depends on all genetic groups being well represented in the genotypic dataset.