TASRs and the components of the AR circuitry are key genes in keeping body homeostasis as they recognize chemical molecules that could be both sources of energy or threatening toxins and promote an adequate response. With the aim to characterize the coding genetic variation affecting swine TASRs and the AR circuitry genes, we sequenced 16 gDNA pools corresponding to 304 pigs from ten breeds and European wild boar and from two pools of an experimental F2 population with records on growth and retroperitoneal fat content. We have mapped thousands of coding region genetic variants, hundreds of which are expected to have a strong impact on protein sequence, some of which are breed specific. By comparing the pAAFs of these variants in two F2 pools divergent for growth and fat deposition, we also identified many genotype - phenotype relationships. Our data provides detailed information of the genetic variation present in TASRs and AR genes. We also developed an assay to genotype 128 of the most functionally relevant variants which is available to perform association studies with relevant traits in pig populations.
Technical considerations
Differences in DNA extraction methods and inaccuracies with respect to quantification might have a negative impact on the even distribution of sequencing reads among the genomes within a pool thereby reducing the accuracy of the pAAF/pMAFs as a proxy of the real allelic frequency. Nonetheless, the comparison of predicted versus observed allele frequencies in the 57 genotyped variants showed that pAAF/pMAFs were very good predictors and supply additional confidence of the quality of our results (Fig. 3).
To the best of our knowledge, this is the first published study from a targeted genome enrichment experiment in swine. As this approach reduces the sequencing throughput requirements, we were able to sequence the target sub-genome (the exome of 201 TASRs and AR genes) in the largest number of pigs (n = 304) reported to date, in a single experiment. By mapping nearly 163 million reads on the target exons, we reached an average depth of sequencing of 72× for each of the haploid genomes. This read depth allowed us to detect all variants present in the pools regardless of their frequency. Given that we sequenced 304 animals harbouring in total 608 chromosome sets (2 alleles each), we were able to detect rare variants with MAF ≥ 0.0016. No previous high throughput sequencing study in pigs reached this sensitivity to detect rare variants. We identified a large number of variant events, most mapping to exon flanking regions including promoters, introns, and upstream and downstream segments. Although some of these non-coding variants could be functional, their regulatory relevance is difficult to predict and was not the subject of our study, which focused mainly, on the detection of protein-damaging coding variation.
Variant identification
Overall, we have identified 2,793 coding variants but 27 of these are fixed in our sample set and are thus likely to be either errors in or private to the reference sequence. We annotated 217 variants segregating in the 10 canonical swine TASRs in our samples. Da Silva et al. [18] described 279 coding variants in their list of 21 taste and nutrient receptor genes after sequencing 79 pigs. However, Da Silva and co-authors included 8 genes that we did not study. These are 5 likely-paralogs of canonical TASRs and 3 genes that are not canonical TASRs but that are related to tasting fat (GPR41, and GPR84) and amino acids (GPRC6A). On the contrary, we included 2 canonical TASRs (TAS2R4 and TAS2R40) not studied by Da Silva et al. Moreover, 4 we classified as AR genes (GPR40, GPR120, CASR, GRM1 and GRM4) that are shared in both studies as they are not canonical TASRs or are receptors belonging to the glutamate pathway. 138 of the 279 variants, 71 in canonical TASR and 67 in the other shared genes, are common to both studies. The difference in the catalogue of identified variants is likely to be mostly due to the fact that Da Silva et al. studied a different set of animals, some belonging to breeds we did not target (Hampshire, creole, Brazilian and Tamworth). 395 of the 2,766 “segregating” variants are predicted to have a high impact (H and Mdel) on the protein sequence of 133 genes and are thus very likely to disrupt or strongly alter their function (Table 1 and Additional file 2). Some of these variants showed allelic frequency relationship with daily gain and retro-peritoneal fat deposition in the F2 resource with phenotypic records (Table 5 and Additional file 4). In keeping with previous findings in human [13], we also detected that H and M variants tend to be more abundant and have higher pAAFs in TASRs than in non-sensory genes (Tables 1 and 2) which indicates that TASRs are subject to balancing selection. TASRs are comparable to the major histocompatibility complex genes (HLA in humans and SLA in pigs) as both are expressed at the surface of cells to detect particular molecules that could be hazardous for the body. Whilst HLA and SLA detect antigens to promote an immune reaction, TASRs sense chemical compounds to stimulate appropriate responses (reward, acrosomal reaction, smooth muscle contraction or immune function among others) depending on the cell type that is involved. Thus, a healthy animal population needs to be highly polymorphic in these genes in order to maximize its adaptability and survival to variable environments facing multiple threads. As expected, H and Mdel variants had on average, lower pMAFs than Mtol or L variants (Table 2), as the former are more likely to be deleterious and consequently, subject to purifying selection. This is well exemplified by the recent exome sequencing experiments that have successfully identified rare deleterious variants causing rare and severe mendelian disorders in humans [14]. Indeed, H and Mdel variants are more abundant and frequent at the 3′-end of the protein, where they are less likely to have an impact of the function of the protein (Fig. 1).
H rare variants
Remarkably, we found 32 rare H variants which mapped to TAS1R1, TAS1R3 and 20 AR genes (Table 3). These mutations are predicted to fully disrupt the function of the affected gene and might thus have an impact on taste and AR perceptions. Eight of the 20 AR genes carrying rare H variants, have been directly associated to taste, food intake, body size, diabetes or triglyceride levels. Two of them, GRM4 and GABRR1, are glutamate receptors that have been associated to body size [21] and feeding behaviour [22], respectively. Natural knock-out pigs, (i.e., pigs homozygous or compound heterozygous for rare H variants) in these genes may have severe consequences in taste perception and feeding attitude. Reverse genetics experiments to study the phenotypic changes that occur in such natural knock-outs would be very informative to understand the importance of these mutations on feeding behaviour and on a broad range of other traits of relevance in animal breeding and bio-medical sciences. However, the identification of such homozygotes may require the screening of thousands of pigs. A convenient alternative would be to identify heterozygous animals and cross them to generate these natural knock-outs.
Breed particularities
We found several variants with breed particularities that could both explain in part the population history of these breeds and be the result of genetic adaptation to the particular environment or artificial selection. Overall, the pools of commercial breeds (Large White, Landrace, Pietrain, Duroc) contained more variants than the pools of the ancient/traditional breeds (Iberian, Majorcan Black, Mangalitza, Bazna). The sequencing of the commercial breeds involved a larger number of samples and consequently, more genetic variation could be captured. This difference is possibly due also to first, population bottlenecks and a low effective population size within ancient breeds and second, to the introgression of Asian germplasm in the European commercial breeds [23, 24] (Table 4). The Duroc pool is somehow in between European commercial and traditional breeds, with a relative low number of variants (Table 4). However, this was expected as the pig reference genome sequence was obtained from a Duroc animal and is therefore expected to share more similarities with our Duroc pool. On the other hand, the Asian breeds showed the largest number of variants (Table 4) which was also expected as our pool was made of two different swine populations (15 Meishan and 7 Vietnamese pigs) and also, these animals diverged from the European counterparts around one million years ago [25]. It is worth pointing out that as we merged the Meishan and the Vietnamese pigs, we cannot have specific information for these populations. Although the comparison of both Asian breeds would have been interesting, our aim was not to study the genetic variation within these breeds but to identify a large catalogue of variants by sequencing a set of divergent populations. For this reason, and to optimise the high throughput sequencing resources, we merged all Asian animals.
Although we tried to minimise co-ancestry between samples, we do not have pedigree data and hence, we cannot exclude the possibility that the trend on the reduction in the genetic diversity observed in some breeds is due to close familiar relationships. This is particularly true for our Mangalitza samples, which come from a highly inbred population from Romania and might thus not represent the genetic diversity existing within the Mangalitza breed as a whole. To better determine genetic diversity within each breed, we compared the genetic variability in the synonymous sites, which are considered to be neutral in evolutionary terms, in our TASR and AR genes using the method described by Watterson. This method corrects the number of variants by the number of individuals that were analysed [20]. We further corrected these values by the expected number of silent sites in the sequenced cds. Genetic variability among western breeds is highly similar (0.00090–0.00131 variants per mutant site) being lowest in Duroc and highest in Pietrain. The genetic variability seems to be higher in the Asian pool (0.00182).
We also compared the relative abundance of H + Mdel variants in each breed. In contrast to the recent results by Bianco and co-authors [26] who detected a higher ratio of deleterious variants in Western than in Asian breeds, we did not identify large differences across pools. This inconsistency could be explained by the fact that we interrogated a particular set of genes whilst Bianco et al. assessed all annotated genes in the pig genome. Also, both studies screened different animals from distinct breeds. Furthermore, the two studies used different sequencing strategies. Whilst we deep-sequenced 304 pigs in a pool-based strategy, Bianco et al. performed low depth whole-genome shotgun sequencing of 128 pigs. Our approach is better suited for the identification of rare variants and this difference could have altered the catalogue of variants identified in both studies. Therefore, the comparison of the two datasets needs to be taken with caution. Of note, we observed that 25 % of the 28 variants identified in TAS2R1 were wild boar specific (Additional file 3). This could indicate a particular haplotype that might have been lost in the European domestic breeds by artificial selection. Alternatively, as our wild boar gDNA pool was made with samples from three different European locations (Catalonia, Belgium, Romania), these variants, which have a relatively low pAAF [0.16–0.18], could well belong to one of these populations with no germplasm contributed to the analyzed domestic breeds. Noteworthy, some of these TAS2R1 variants are predicted to have a strong effect on the protein and could thus indicate adaptive selection to particular foods or environments (Additional file 3). Especially relevant are the breed-specific H variants with relatively high frequency in the affected breed (pAAF > 0.1) (Additional file 3). The stop codon that prematurely truncates TAS1R1 right in the middle of the protein in 17 % of the Mangalitza genomes might impair the ability to sense umami and might thus affect food preferences in a similar way as described in giant pandas, which lack a functional TAS1R1 potentially disrupting preferences for protein-rich sources [27]. Furthermore, two H variants in HTR3C, and CYP2A6 are unique and almost fixed in the Asian breeds (Additional file 3). As the pAAF between Asian and European pools are so dramatically different, we hypothesize that this might reflect an AR adaptation to very different environments.
Variant and phenotype relationships
In our study, we aimed not only at identifying damaging variants but also at checking whether we were able to detect potential relationships with production traits. This comparison was done with only 38 pigs, which would typically be a very small number of animals and as such, the power to detect genetic associations is low. Thus, it does not aim at finding statistically significant genetic associations but at detecting a trend that could indicate this association. Indeed, we identified several allele - phenotype relationships that could indicate real genetic associations (Table 5). We believe that comparing the pAAFs of the two F2 pools was a good strategy for the identification of allele - phenotype relationships, as these animals share a common genetic background and the only criteria used to make the pools, was their extreme and opposed phenotypic characteristics. Four of the 11 (36 %) TAS2R4 variants segregating in our F2 resource showed significant differences at the nominal level in pAAF between the obese and fast growing F2_F pool and the lean slow growing F2_L pool (Table 5). As this is a large percentage, we believe that this could be a real association and that a polymorphism, perhaps a regulatory variant not assessed here, is in part responsible of the phenotypic differences between the divergent groups of pigs. Five of the six genes with significant allelic differences at the nominal level (all with p-val ≤ 0.01) between the F2 pools have been directly linked to both taste and growth (umami taste and GRM1 [28], fat taste and CD36 [29], taste in general and P2RX2/P2RX7 [30] and weight gain and GABRA6 [31]) (Table 5) and could thus indicate a difference in the eating behaviour. This would explain the difference in growth and fat deposition between the F2 groups. Remarkably, three variants in CD36, a gene associated to fat taste, had significant pAAF differences between the high and the low fat deposition pigs after correction for multiple testing (Table 5).
Two of the markers, chr2_3566521_A_C and chr18_7019623_G_A, in the ALDH3B1 ortholog ENSSSCG00000026349 and in TAS2R41, respectively, were also genotyped in the F2 animals with TaqMan probes using the OpenArray technology. Only chr2_3566521_A_C was nominally significant for daily gain (p-val = 0.025), but did not reach the significance threshold after multiple testing (Additional file 7). chr18_7019623_G_A was not significant, but a marker 236 bp upstream from this SNP, chr18_7019387_A_T, was nominally significant (p-val = 0.044) for retroperitoneal fat content (Additional file 7). We seek to screen larger populations with phenotypic data to determine whether these associations could become significant.
Selection and genotype-based validation of SNPs
We also developed a genotyping array to validate the variants with highest damaging potential regardless of their frequency. We believe that a proportion of the 59 non-polymorphic positions might be real low-frequency polymorphisms that were present only in animals that we did not genotype due to the lack of available gDNA. However, we cannot rule out the possibility that some of these variants could simply be false positives caused by the erroneous mapping of some reads to highly similar regions. For example, none of the 17 frame-shift variants we included could be confirmed as a polymorphism but the high false positive rate among indels in high throughput sequencing experiments is well documented. Most of the H variants failed to proof polymorphic but they had very low pMAFs whilst most of the M variants, which on average showed higher pMAFs, were confirmed (Additional file 6). None of the rare H polymorphisms displayed the homozygous state for the minor allele in the genotyped samples. However, we identified two H variants that approached the rare MAF threshold, which showed the three genotypic classes in our animals (Additional file 6). The Vietnamese rare variant in HTR1A is a stop codon that was homozygous in two Vietnamese pigs. Nonetheless, we cannot consider these two animals as natural knockouts as this mutation is located at the very last amino acid of the canonical HTR1A protein, and it is not expected to have an impact on HTR1A function. Therefore, the low frequency of the minor allele might not reflect the existence of purifying selection forces. The Meishan H variant in GABRG1 is also an introduction of a stop codon, but this is located in the middle of the gene and is thus likely to disrupt its function. This gene increases neuronal activity and is associated to eating disorders and anxiety [32] and even with alcohol dependence in humans. Thus, natural GABRG1 knock-out pigs carrying this stop codon may show both signs of anxiety and a particular eating behaviour.
We have developed the first assay for massive genotyping of variants with high functional potential in swine TASR and AR genes, and we now seek to perform association analysis on pig populations with phenotypic records for eating attitude, feed intake, growth, obesity, but also on semen quality and fertility, infection and immunity and behaviour abnormalities such as stress-related stereotypies and tail biting. This approach will help us understanding the impact of genetic variation in TASR and AR genes on traits of interest for the pig breeding industry. This genotyping assay will also allow us to identify natural knock-outs that could then be used in reverse genetics studies. Nonetheless, we acknowledge that this array tags a very small proportion (eight TASRs and 23 reward) of the genes involved in taste, appetite and reward. Ideally, the list of variants and genes should be extended to achieve a comprehensive analysis of the impact of these gene pathways in pig breeding.