Analyses of key genes involved in Arctic adaptation in polar bears suggest selection on both standing variation and de novo mutations played an important role

Background Polar bears are uniquely adapted to an Arctic existence. Since their relatively recent divergence from their closest living relative, brown bears, less than 500,000 years ago, the species has evolved an array of novel traits suited to its Arctic lifestyle. Previous studies sought to uncover the genomic underpinnings of these unique characteristics, and disclosed the genes showing the strongest signal of positive selection in the polar bear lineage. Here, we survey a comprehensive dataset of 109 polar bear and 33 brown bear genomes to investigate the genomic variants within these top genes present in each species. Specifically, we investigate whether fixed homozygous variants in polar bears derived from selection on standing variation in the ancestral gene pool or on de novo mutation in the polar bear lineage. Results We find that a large number of sites fixed in polar bears are biallelic in brown bears, suggesting selection on standing variation. Moreover, we uncover sites in which polar bears are fixed for a derived allele while brown bears are fixed for the ancestral allele, which we suggest may be a signal of de novo mutation in the polar bear lineage. Conclusions Our findings suggest that, among other mechanisms, natural selection acting on changes in genes derived from a combination of variation already in the ancestral gene pool, and from de novo missense mutations in the polar bear lineage, may have enabled the rapid adaptation of polar bears to their new Arctic environment.


Background
The divergence of polar bears (Ursus maritimus) from brown bears (Ursus arctos) and their adaptation to a novel environment and lifestyle in the Arctic is a wellknown example of rapid evolution [1,2]. Despite having diverged relatively recently (ca. 479-343 thousand years ago (kya)) [1], polar bears evolved a novel and distinct ecology, behaviour, and morphology. This rapid evolution was proven even more substantial by stable isotope analysis of an ancient polar bear jawbone from Svalbard, indicating that the species had already adapted to a marine diet and life in the High Arctic by at least 110 kya [3]. Therefore, in perhaps as little as 20,000 generations, polar bears evolved a suite of unique adaptations that enable them to maintain homeostasis under low temperatures, occupy a hypercarnivorous niche, subsist on a diet of primarily seals and their blubber [4], process lipids as their predominant energy source [5,6], and camouflage into their surroundings with pigment-free fur [7].
The rapid adaptation of such an iconic species has sparked a number of studies seeking to unravel the genomic underpinnings through the use of various datasets and analyses [1,2,8]. Both Miller et al. [2] and Liu et al. [1], investigated whole-genome datasets to uncover genomic regions showing signatures of selection. Together, these studies revealed different yet complementary sets of genes, pathways, and phenotypes that were likely shaped by natural selection during polar bear evolution. Genes showing the strongest signal of positive selection are involved in adipose tissue development, fatty acid metabolism, heart function, and fur pigmentation [1], indicating these are key processes in polar bear adaptation to the Arctic.
Using all currently available nuclear genomes from polar bears (n = 109) and brown bears (n = 33), we set out to determine the nature of the polar bear-specific amino acid changes in the top genes showing the strongest signal of positive selection in the polar bear lineage as reported by Liu et al., (Table 1 and supplementary  Table S1) [1]. This increase in sample number and spatial coverage enabled us to more comprehensively investigate the origins of polar bear-specific substitutions. By comparing the target gene regions across the two species based on this comprehensive dataset, we (i) determine the number and location of missense substitutions specific to the polar bear lineage (ii) address whether the polar bear variants derive from standing variation or de novo mutations, (iii) evaluate whether the variants fixed in polar bears may lead to functional changes in the genes.

Population structure and admixture
To gain a better understanding of whether selection acted upon standing variation in the polar/brown bear ancestral lineage or on de novo mutations in the polar bear lineage, we first needed to determine whether derived variants were already present in the ancestral gene pool. However, as admixture from polar bears into brown bears has been well documented [1,2,[9][10][11], we first investigated admixture at our selected genes. If any introgression from polar bears persists in the brown bear gene pool, inferences of the origins of these variants (standing variation or de novo) may be biased. To do this, we ran independent PCAs on each of the twelve genes analysed including the 50 kb flanking regions up and downstream of the genes.
Clear genetic differentiation between polar bears and brown bears has been shown at a genome-wide scale [1,2]. For eleven of the twelve genes, polar bears and brown bears form separate clusters, which we expect if there is no introgressive admixture/ILS at these loci (Supplementary Figs. 1-5 and 7-12). However, for EDH3, seven of the brown bears cluster within the diversity of polar bears, indicating some level of introgression or ILS around this gene (Supplementary Fig. 6). The seven brown bears were all from the Admiralty, Baranof, and Chichagof islands in Alaska, an area known to be inhabited by introgressed brown bears [9]. We therefore excluded EDH3 from further analysis.

Non-synonymous differences between polar bears and brown bears
For this analysis, we only considered dinucleotide sites coding for non-synonymous amino acid changes. Firstly, we calculated the percentages of sites fixed within polar bears (Table 1), which ranged between 10 of 8424 coding sites (0.12%) in FCGBP to 18 of 3957 coding sites (0.45%) in LAMC3. We compared this to the number of sites also fixed in brown bears for an alternative allele. Seven out of the eleven genes contained sites showing this pattern (Table 1). More specific information regarding gene positions, amino acid changes, and allele counts can be found in supplementary Table S2.

Fixed derived missense mutations in polar bear
To understand the origins of the variants fixed in the polar bear lineage relative to brown bears (i.e. whether they are derived or ancestral), we compared the polar bear and brown bear alleles to the giant panda (Ailuropoda melanoleuca) reference sequence, assuming the ancestral variant is retained in the panda. This was to reduce any noise caused by changes occurring in the brown bear lineage after its divergence from polar bears. Therefore, we only included sites where polar bears acquired a derived allele, while brown bears retained the ancestral state. Seven of the eleven genes of interest had at least one site showing this pattern (Table 1, Fig. 1). The number of such sites varied from zero (CUL7, FCGBP, LAMC3, and XIRP1) to twelve (TTN). More specific information regarding the gene positions and allele counts can be found in supplementary Table S3.

Influence on protein structure
To evaluate whether the fixed derived amino acid changes in polar bears (also fixed for the ancestral allele in brown bears) had any functional influence on the protein function, we implemented three independent analyses to predict the functional effects of the amino acid change in humans. Amino acid changes that appeared to have a functional influence according to at least one of the three tests were found in four genes, with varying numbers of sites displaying this in each gene, TTN (9 sites), LYST (3 sites), AIM1 (2 sites), and COL5A3 (1 Table 1  Putative Arctic-associated phenotypes are included for each gene; NA refers to genes in which no associated phenotype appears to be explicitly related to adaptations to the Arctic. Phenotype information was obtained from GeneCards (genecards.org). Sites fixed in polar bears are either biallelic (segregate with two alleles), fixed for an alternative allele, or fixed for the ancestral allele (also found in giant panda), in brown bears. A schematic showing the distribution of alleles (coloured dots) in the polar bear (light blue circle) and brown bear (light brown circle) gene pools has been included to ease interpretation. Blue dots represent the allele fixed in polar bears; brown dots are the alternative allele (with unknown ancestral state) in brown bears; white dots represent the ancestral allele (also found in giant panda). Asterisk indicates the seven genes that have sites putatively indicating de novo mutations: they are fixed in polar bears for the derived allele, and fixed in brown bears for the ancestral allele. Table S3). However, none of these sites showed consistent deleterious effects across all three analyses. All other amino acid changes were deemed either benign, neutral or tolerated. It should be noted that these amino acid changes may have a different influence on polar bear proteins compared to humans, and this should be taken into account in the interpretation of these results.

Discussion
To increase our understanding of how polar bears adapted to the Arctic, we investigated the origins of the genomic variants in eleven candidate genes previously found to show the strongest signals of positive selection in the polar bear lineage [1], nine of which have functions that could be linked to Arctic-specific adaptations, including adipose tissue development, fatty acid metabolism, heart function, and fur pigmentation (Table 1). We analysed a comprehensive panel of polar bear and brown bear nuclear genomes, and identified biallelic sites in which alternative alleles result in an amino acid change within these genes. We find the majority of sites fixed in polar bears are biallelic in brown bears (Table 1, Fig. 1). This result may reflect that natural selection more readily acted upon standing variation already in the ancestral polar/brown bear gene pool, allowing for more rapid adaptation compared to selection on de novo mutations. However, we also identify a number of sites in seven of the eleven genes analysed, in which polar bears are fixed for the derived allele and brown bears are fixed for the alternative, ancestral allele, relative to the outgroup giant panda. Although the absolute number of sites showing this pattern is relatively low (ranging from 1 to 12 sites in each gene), these findings suggest selection may also have acted on de novo mutations in the polar bear lineage.
However, our results may also reflect the relatively low genetic diversity in polar bears compared to brown bears ( Supplementary Figs. 1-12), which is due to the longterm, low effective population size of polar bears [1,2]. Such low diversity would have exacerbated the impact of In contrast, it is unlikely that de novo mutations reached fixation by genetic drift alone, even in the fairly homogenous polar bear gene pool. The initial low frequency of a de novo mutation would more likely lead to the mutation being purged from the gene pool rather than being fixed via genetic drift. Therefore, it is more probable that the de novo polar bear-specific substitutions reached fixation due to strong selective pressures, as opposed to simply reflecting the relatively limited genetic diversity of polar bears relative to brown bears. Moreover, at both variant types (standing variation and de novo), we find evidence for protein structure-altering substitutions in the polar bear lineage (Supplementary  Table S3), potentially leading to phenotypic change in the species. This adds weight to our hypothesis that selection acted on both standing variation and on de novo mutations. However, we cannot exclude that the fixation of some of these phenotypically relevant mutations may be due to genetic hitchhiking from one causative mutation in the gene. Regardless, the fact that putatively phenotypic changes occur at both variant types suggests natural selection acted upon both standing variation and de novo mutations.
The genes with the highest proportion of sites indicating de novo mutations were APOB and ABCC6 (with four and two such sites, respectively), which are associated with the cardiovascular system, and LYST and AIM1 (with seven and two such sites, respectively), which are associated with pigmentation ( Table 1).
The APOB gene codes for apolipoprotein B (apoB), the primary lipid-binding protein of chylomicrons and low-density lipoproteins (LDL), which enables the mobility of fat molecules around the body [12,13]. The ABCC6 gene encodes for a protein belonging to the superfamily of ATP-binding cassette (ABC) transporters and is involved in transporting various molecules across extra-and intra-cellular membranes. Diseases associated with ABCC6 include Pseudoxanthoma elasticum (PXE), a neurocutaneous disorder that affects the elastic tissue of the cardiovascular system, causes arterial calcification, and increases the risk of coronary artery disease [14][15][16]. Selection on the APOB and ABCC6 genes may have played a role in the novel adaptation of polar bears to a lipid-rich diet, and increased the efficacy of cholesterol clearance from the blood [1].
The LYST gene codes for the lysosomal trafficking regulator Lyst. Mutations in the LYST gene have been reported to cause hypopigmentation, a melanosome defect characterized by light coat color [17,18]. AIM1 is also associated with colouration, as variable expression of AIM1 has been associated with tumor suppression in human melanoma, influencing melanin pigment production [19]. Selection on LYST and AIM1 may have led to the lack of fur pigmentation in polar bears, resulting in the characteristic white phenotype of the species that may confer a selective advantage in the Arctic. Detrimental amino acid changes may have significantly hindered the function of these genes, resulting in the lack of pigmentation for natural selection to act upon.

Conclusion
Although genes involved in adaptation of polar bears to their Arctic lifestyle have previously been uncovered [1,2], a comprehensive assessment of whether selection on these genes acted upon standing variation or de novo mutations has been lacking. In the present study, through the analysis of a comprehensive data set of polar bear and brown bear genomes, we were able to address this question and provide new insights into the origins of variants found in genes under selection (did they derive from standing variation in the ancestral gene pool or de novo mutation?) that putatively enabled the rapid adaptation in polar bears to the Arctic.

Population structure and admixture
To investigate whether admixture or incomplete lineage sorting (ILS) may be present between polar bears and brown bears at the genes of interest, we performed independent principal component analyses (PCAs) for each gene including the 50 kb regions up and downstream of the gene. For this, we included all polar and brown bear individuals. We used a genotype likelihood approach to construct the PCAs: input genotype likelihood files were constructed using ANGSD v0.929 [29], with the SAMtools genotype likelihood algorithm (−GL 1), and specifying the following parameters: remove reads that have multiple mapping best hits (−unique_only), remove reads with a flag above 255/secondary hits (−remove_ bads), include only read pairs with both mates mapping correctly (−only_proper_pairs), adjust mapQ for reads with excessive mismatches (−C 50), adjust quality scores around indels (−baq 1), a minimum mapping quality of 20 (−minMapQ 20), a minimum base quality of 20 (−minQ 20), discard sites where there is no data on at least 95% of the individuals (−minInd), skip tri-allelic sites (−skipTriallelic), and remove SNP sites with a pvalue larger than 1e − 6 (−SNP_pval 1e-6). The ANGSD output beagle file was run through PCAngsd v0.95 [30].

Gene investigation
We analysed twelve of the genes previously found to show the strongest signal of positive selection in the polar bear [1]. These included ABCC6, AIM1, APOB, COL5A3, CUL7, EHD3, FCGBP, LAMC3, LYST, POLR1A, TTN, and XIRP1. The phenotypes putatively associated with these genes can be found in Table 1 and  supplementary Table S1.
Genotypes were called using ANGSD, specifying the same parameters as the PCA analyses with the additional parameters: write major and minor alleles and the genotype directly (−doGeno 5), estimate the posterior genotype probability based on the allele frequency as a prior (−doPost 1), use the reference allele as the major allele (−doMajorMinor 4), output as beagle likelihood file (−doGlf 2), and calculate allele frequencies assuming a fixed major allele and an unknown minor allele (−doMaf 2). In order to decrease biases that could arise when calling heterozygous alleles from the low coverage genomes, we only called genotypes from individuals that had at least 4x coverage at the site of interest (−geno_minDepth 4). For the investigation into which alleles represent the ancestral variant, we downloaded each of the relevant giant panda gene transcript sequences from Genbank (Supplementary Table S5). Although assembled genome data is available from Ursidae species more closely related to the polar bear and brown bear (e.g. black bear [31]), we specifically avoided the use of other Ursine bears to determine the ancestral state, as introgressive gene flow has been identified across nearly all species of Ursine bears [32] which may lead to the misidentification of the ancestral variant. We additionally included genomic data from a~110 k year old polar bear sample from Svalbard [3] (NCBI Bioproject ID: PRJNA169236) in an attempt to understand when the variants may have arisen. However, due to the very low coverage (~0.34x) manner of the Poolepynten bear, we did not recover any sites with enough data to incorporate into our analyses.
We performed predictions of the effects of amino acid changes found in putative de novo mutations on the polar bear lineage on the function of homologous human proteins using Polyphen-2 [33], SIFT [34], and PROVEAN [35]. For this analysis we downloaded the relevant human gene transcript sequences from Genbank (Supplementary Table S5). This was done by aligning the human and polar bear protein genes and selecting only the positions that were fixed for the derived allele in polar bears and fixed for the ancestral allele (shared allele with the giant panda) in brown bears. Positions and amino acid changes were submitted to Polyphen-2 batch web service using HumDiv and Hum-Var model classifiers as well as to PROVEAN human protein batch web service for the PROVEAN and SIFT analyses.
Additional file 1: Supplementary Table S1. Phenotypes associated with the genes of interest. Phenotype information taken from GeneCards (genecards.org). * UniProtKB/Swiss-Prot summary as the associated phenotypes are not available on Genecards. Supplementary Table S5: Genbank accession codes for the polar bear, giant panda (annotation version 102*), and human transcript sequences used in the study. *Available from: https://ftp.ncbi.nlm.nih.gov/genomes/all/annotation_ releases/9646/102/. Supplementary Fig. 1: Principal component analysis of ABCC6 and the 50 kb flanking regions using all individuals included in this study. Supplementary Fig. 2: Principal component analysis of AIM1 and the 50 kb flanking regions using all individuals included in this study. Supplementary Fig. 3: Principal component analysis of APOB and the 50 kb flanking regions using all individuals included in this study. Supplementary Fig. 4: Principal component analysis of COL5A and the 50 kb flanking regions using all individuals included in this study. Supplementary Fig. 5: Principal component analysis of CUL7 and the 50 kb flanking regions using all individuals included in this study. Supplementary Fig. 6: Principal component analysis of EHD3 and the 50 kb flanking regions using all individuals included in this study. Supplementary Fig. 7: Principal component analysis of FCGBP and the 50 kb flanking regions using all individuals included in this study. Supplementary Fig. 8: Principal component analysis of LAMC3 and the 50 kb flanking regions using all individuals included in this study. Supplementary Fig. 9: Principal component analysis of LYST and the 50 kb flanking regions using all individuals included in this study. Supplementary Fig. 10: Principal component analysis of POLR1A and the 50 kb flanking regions using all individuals included in this study. Supplementary Fig. 11: Principal component analysis of TTN and the 50 kb flanking regions using all individuals included in this study. Supplementary Fig. 12: Principal component analysis of XIRP1 and the 50 kb flanking regions using all individuals included in this study.
Additional file 2: Supplementary Table S2. Table including all relevant information at the genes of interest, including nucleotide information, gene location, amino acid information, and allele distributions of the non-synonymous differences between polar bears and brown bears.
Additional file 3: Supplementary Table S3. Table including all relevant information at the genes of interest, including nucleotide information, gene location, amino acid information, allele distributions, and predictions of amino acid changes in the sites in which the alternative allele is also found in the giant panda. Red coloured cells indicate sites in which amino acid changes may have a functional influence according to at least one of the three tests: Polyphen-2, SIFT, or PROVEAN. D/D represents homozygous derived, D/A represents heterozygous derived and ancestral, A/A represents homozygous ancestral. Table S4. Polar bear and brown bear sample locality and mapping information.