Identification of protein-damaging mutations in 10 swine taste receptors and 191 appetite-reward genes

Background Taste receptors (TASRs) are essential for the body’s recognition of chemical compounds. In the tongue, TASRs sense the sweet and umami and the toxin-related bitter taste thus promoting a particular eating behaviour. Moreover, their relevance in other organs is now becoming evident. In the intestine, they regulate nutrient absorption and gut motility. Upon ligand binding, TASRs activate the appetite-reward circuitry to signal the nervous system and keep body homeostasis. With the aim to identify genetic variation in the swine TASRs and in the genes from the appetite and the reward pathways, we have sequenced the exons of 201 TASRs and appetite-reward genes from 304 pigs belonging to ten breeds, wild boars and to two phenotypically extreme groups from a F2 resource with data on growth and fat deposition. Results We identified 2,766 coding variants 395 of which were predicted to have a strong impact on protein sequence and function. 334 variants were present in only one breed and at predicted alternative allele frequency (pAAF) ≥ 0.1. The Asian pigs and the wild boars showed the largest proportion of breed specific variants. We also compared the pAAF of the two F2 groups and found that variants in TAS2R39 and CD36 display significant differences suggesting that these genes could influence growth and fat deposition. We developed a 128-variant genotyping assay and confirmed 57 of these variants. Conclusions We have identified thousands of variants affecting TASRs as well as genes involved in the appetite and the reward mechanisms. Some of these genes have been already associated to taste preferences, appetite or behaviour in humans and mouse. We have also detected indications of a potential relationship of some of these genes with growth and fat deposition, which could have been caused by changes in taste preferences, appetite or reward and ultimately impact on food intake. A genotyping array with 57 variants in 31 of these genes is now available for genotyping and start elucidating the impact of genetic variation in these genes on pig biology and breeding. Electronic supplementary material The online version of this article (doi:10.1186/s12864-016-2972-z) contains supplementary material, which is available to authorized users.


Background
There are five canonical tastes that are sensed in the taste buds of the tongue, which include salty, sour, sweet, umami and bitter. While salty and sour are detected by ion channels, the other three are sensed by a group of G-protein coupled receptors called taste receptors (TASRs). Sweet and umami are appetizing tastes that characterize energy-rich food sources, namely sugar and amino acid molecules, respectively, and are sensed by the TAS1R sub-type TAS1R1, TAS1R2 and TAS1R3 [1]. On the other hand, theunpleasant -bitter taste indicating the presence of toxic molecules, is sensed by TAS2Rs, also known as bitter taste receptors [1], which include a variable list of highly polymorphic genes with many species-specific orthologs. The annotation of the pig genome contains ten TAS2Rs according to the Ensembl database (www.ensembl.org). In the recent years, it has become obvious that TASRs are expressed in many other tissues and have additional chemosensing functions. For example, they are present in the respiratory system where they regulate innate immunity and infection [2], and in sperm they have been linked to motility and acrosomal reaction [3]. In the gastrointestinal tract, TASRs detect the molecules that are on transit and stimulate the appetite and reward (AR) circuitries to promote the appropriate feeding behaviour, thus keeping energy balance and body homeostasis [4,5]. The AR mechanisms are highly interconnected and involve complex networks containing nutrients, neuropeptides, neurotransmitters, hormones and their related receptors and enzymes. These pathways engage the gastrointestinal tract, pancreas, liver, muscle, adipose tissue and brain. Appetite-related genes such as leptin (LEP), leptin receptor (LEPR), cholecystokinin (CCK), Ghrelin (GHRL), Agouti-related protein (AgRP), neuropeptide Y (NPY), proopiomelanocortin (POMC) and melanocortin 4 receptor (MC4R) encode for products that inhibit or excite the dopamine, epinephrine, norepinephrine, serotonin, and glutamate receptor pathways [6,7] and modulate food intake and energy balance. In a nutshell, ghrelin and LEP are two hormones with opposite excitatory and inhibitory effects on the same neurons secreting appetite inducing NPY and AgRP or the feeding inhibitor POMC. These neuropeptides in turn, inactivate or excite MC4R, which is a hunger repressor. Ghrelin is secreted by the stomach when it is empty and LEP is released by adipocytes as a response to high energy stores [6]. CCK is a hormone secreted in the duodenum as a response to luminal fat and protein and is a strong inhibitor of food intake probably by decreasing gastric emptying and stimulating the vagus nerve [8]. LEP and ghrelin also inhibit and excite dopamine secretion, respectively [9]. Dopamine signalling in certain parts of the brain promotes appetite. Ghrelin secretion during fasting also promotes glutamate release. This neurotransmitter, via a large catalogue of receptors, will also excite NPY, AgRP and inhibit POMC neurons and boost appetite. The catabolic product of glutamate, gamma aminobutyric acid (GABA) also boosts appetite but using different neuronal mechanisms [10]. Moreover, glutamate is also able to excite dopamine-secreting neurons in appetite-relevant areas of the brain thereby indirectly promoting the feeling of hunger [9]. In contrast, GABA can inhibit the same neurons and indirectly promote satiety [9]. Serotonin, a neurotransmitter that is mostly present in the gastrointestinal tract but also at much lower levels in the central nervous system, modulates gastrointestinal motility, mood and appetite [11]. Serotonin inhibits appetite by stimulating its receptors HTR2C and HTR1B, which in turn, activate the wellknown appetite-inhibitors POMC and MC4R and inhibit the appetite-promoter genes NPY and AGRP. Epinephrine/norepinephrine are two additional neurotransmitters that seem to be key in food intake and in keeping energy balance and have been shown to respond to starvation and low glucose levels in blood by activating the secretion of ghrelin [12].
In humans, the recent advent of whole genome [13] and exome [14] sequencing has shown that mutations severely impacting on protein sequence are more abundant than previously thought, although due to purifying selection, they tend to have very low allele frequencies.
Thus, protein-damaging polymorphisms in TASRs and AR genes are likely to have an important impact on a broad range of traits including feed intake, immune function, behaviour or fertility both in livestock and humans. Consequently, understanding how these variants affect phenotypes in farm animals may both help improving the sustainability of the animal breeding sector as well as benefit bio-medical research. Scientific interest in this gene family in the pig is now emerging and it has recently been shown that TASRs are expressed in multiple porcine systems (immune, gastro intestinal, spermatogenic, etc. [15,16]). The recent publication of the swine genome sequence and annotation [17] and the development of genome capture assays open unprecedented possibilities to identify deleterious genetic variation. For instance, 295 coding variants in swine TASRs have recently been identified after analyzing the low coverage whole genome sequences of 79 domestic and wild pigs from Europe and Asia [18].
Motivated by their functional relevance in the pig, we have sequenced ten porcine canonical TASRs and 191 AR genes in 304 pigs from multiple breeds in 16 DNA pools with the aim to identify coding polymorphisms and to provide a catalogue of potentially deleterious mutations likely to affect the function of these genes. Moreover, we have also detected indications that some of these variants might be associated with growth and fatness giving new insights in regard to the potential phenotypic relevance of polymorphisms within these genes.

Sequencing statistics
We initially selected 459 kb of target genomic DNA (gDNA) covering the exons from the 12 TASRs and 201 AR genes. After genome capturing, sequencing and read mapping, we successfully covered 372 kb of target sequence with a read depth at each nucleotide position (DP) above 1,000 in the 16 libraries as a whole. This corresponds to 81 % of the initial target size and 201 genes fully or partially sequenced to a DP > 1,000. The poorly sequenced genes include TAS2R3, ENSSSCG 000000029894 (a swine ortholog of human TAS2R16), and 10 AR genes. The list of successfully sequenced genes is shown in Additional file 1. After genome capture, sequencing and read mapping, 162,848,637 reads mapped to the target gDNA regions.

Variant identification in TASR and AR genes
We successfully sequenced (DP ≥ 1,000) 14,598 bp (94.5 % of the initial selection) of TASR exons and identified 219 coding variants in TASR exons, 113 of which (52.1 %) do not have a dbSNP identifier and are thus considered novel (Additional file 2). Two TASR variants had the alternative allele fixed in the 16 pools and are thus likely to be either errors or private variants in the Duroc animal used to generate the reference sequence. We excluded them from our list of putative polymorphisms. Ten of the 217 remaining variants were classified by snpEff to have a high impact (H) on the coding sequence and consequently, on the function of the gene according to the gene annotation in the swine genome (Table 1). These include four single nucleotide variants (SNVs) and six short indels, which cause three stop-codon gains, one stop-codon loss, five frame-shift, and one novel splice-site donor. These variants affect four TASRs with a clear over-representation in TAS1R1 (Table 1). Three of these variants affecting TAS1R1 and TAS1R3 had a predicted minor allele frequency (pMAF) ≤ 0.01 (Tables 2 and 3). In addition, we identified 125 non-synonymous-coding variants and one codon-deletion, which are classified by snpEff as having a moderate (M) impact. Thirty-four of the nonsynonymous changes were predicted to be deleterious (Mdel) by SIFT [19] (Table 1). The remaining M variants were either SIFT predicted as tolerated (Mtol) or did not yield any prediction. Hence, we have identified 44 variants (10 H and 34 Mdel) that are likely to have an important effect on swine TASR function. Remarkably, all TASRs showed H or Mdel variants (Additional file 2). Finally, 81 variants were predicted to be synonymous changes with no apparent impact (L) on the subjacent proteins (Table 1). On average, the variants with strong impact on protein sequence (H and Mdel) were predicted to be rarer in the species than those having a mild impact (Mtol and L) ( Table 2) according to pMAF.
Likewise, we sequenced 357,844 (80.1 % of the initial selection) bp of exonic sequence from AR genes at a DP > 1,000, and identified 2,570 variant positions in AR exons, most of which were bi-allelic SNVs. 1,092 (42.4 %) variants were not annotated in dbSNP (Additional file 2). Four of these positions were multiallelic with each allele predicted to have a different effect on protein sequence. Of the 2,574 assigned variant effects, 25 had the alternative allele fixed in the 16 pools. From the remaining 2,549 variants, 350 were classified as deleterious (H and Mdel) and affected 123 genes (Table 1 and Additional file 2). As for TASRs, H and Mdel as a whole tended to have lower pMAF than Mtol and L ( Table 2). We identified 74 H variants in AR genes, 29 of which had pMAFs ≤ 0.01 and are thus considered rare. These H rare variants affected 21 genes (Table 3).
We also plotted the distribution of the AR gene variants predicted to have a strong impact on protein (H and Mdel) and those with a mild impact (Mtol and L) along the protein sequence divided in 10 consecutive position bins of equal amino acid length. We noted that the strong impact variants tended to be more abundant at the end of the protein (bin9 + bin10), and that these also tended to have, on average, larger pMAF. This trend was not observed in the set of mild impact variants ( Fig. 1).

Per breed variant distribution
We compared the ten purebred pools and observed that, as expected, larger pools contained more variants (both in TASR and AR genes). Nonetheless, the two Duroc pools together, with 45 samples, displayed less genetic diversity (1,042 variants), than the Large White, Landrace and the Pietrain, which had a similar number of animals and were above 1,300 variants each ( We also compared the percentage of variants that were H or Mdel per breed. No obvious differences were observed (p-val = 0.07) and the Duroc and Landrace were at the top (12.7 %) and bottom (9.4 %) ends, respectively (Table 4).
Altogether, 26 TASR variants were present in a single breed at pAAF ≥ 0.1 and might thus be breed-specific. Not surprisingly, the Asian pool, which involved 15 Chinese Meishan and 7 Vietnamese pigs, showed the highest proportion of pool-specific variants, with 12 being present only in this group (Additional file 3). The wild boar also displayed several allelic particularities, with nine unique variants. Noteworthy, seven of these wild boar variants, all with similar pAAF, mapped to TAS2R1 (Additional file 3). Overall, five (one H and four Mdel) breed-specific variants were predicted to have a strong impact on protein sequence (Additional file 3). The H variant is a stop gain in TAS1R1 that is present in 17 % of the Mangalitza genomes, respectively. We also identified two variants, a synonymous (L) and a nonsynonymous tolerated (Mtol) SNVs that affect TAS2R1, that whilst being present at very high frequencies or even fixed in all breeds (pAAF ≥ 0.5), were absent in the Asian pool (Additional file 3).
We detected 306 AR coding variants that were uniquely present or absent in one breed (Additional file 3). Of these, 35 variants involving 32 genes were predicted to be of functional importance (Additional file 3). As in TASRs, the breed specific H and Mdel variants in the AR genes were more abundant in the Asian and the wild boar with 20 and five unique features, respectively. It was noteworthy that two of these H and Mdel variants, both in the Asian pools, were close to fixation (pAAF ≥ 0.9). These variants map to the serotonin receptor, HTR3C, and to the cytochrome P450 gene, CYP2A6 (Additional file 3).
We performed hierarchical clustering using an Unweighted Pair Group Method with Arithmetic Mean (UPGMA) based on the 2,523 (217 TASR and 2,306 AR) variants that were present in at least one of the ten The percentages are for the total number of variants within the groups: TASR strong impact, TASR mild impact, AR strong impact and AR mild impact breeds and with pAAF information available in all these populations. One-hundred variants were excluded from the purebred comparison since they were unique to the F 2 animals. The resulting phylogenetic dendrogram is in line with what has been shown in other studies (Fig. 2). Briefly, the Western (European and USA) breeds cluster together and the Asian pool form a separate branch.
Within the Western cluster, the Duroc is the only member of a distant branch whilst the European commercial breeds (Large White, Landrace and Pietrain) belong to another sub-group and the more ancient breeds Iberian, Majorcan Black, and Mangalitza cluster together with the wild boar.

pAAF and phenotype relationships in the F 2 groups
We also wanted to see whether we were able to detect an indication of an effect of the variants on production traits. This was investigated by comparing the pAAFs of two F 2 pools from the same experimental population, each belonging to one of the tails of the phenotypic distribution, for average daily gain and retroperitoneal fat content. Out of the 217 TASR coding variants, 97 segregated in the F 2 resource, and within these, eight displayed significantly different pAAFs in five TASRs (Table 5 and Additional file 4). Remarkably, four TAS2R4 variants showed significant differences between the pools. Likewise, we could compare 1,280 variants in The variant identifier (ID) contains information on chromosome _ position _ reference allele _ alternative allele AR genes segregating in this population and identified 56 significant differences (p-val ≤ 0.05) involving 25 genes (Table 5 and Additional file 4). After correcting for multiple testing (p-val ≤ 0.05/(97 + 1,280) = 0.00003), one M variant in TAS2R39 and three variants in CD36 remained significant (Table 5).

Selection and genotype-based validation of a set of protein-damaging variants
With the double aim to validate the subset of variants with more likely damaging impact and, to evaluate by genetic association, their impact in pig breeding, we designed a TaqMan genotyping assay targeting 128 putative polymorphisms. We originally aimed at including all the H variants found in the study and to fill the remaining of the assay with several M variants to cover all TASRs and several AR genes. However, the assay had design requirements that not all the variants fulfilled.    We next compared the predicted and the matched genotype-observed frequency of the minor allele and detected a strong correlation (r 2 = 0.93) between the two (Fig. 3).
We also carried a genetic association analysis with the genotypes and phenotypic values for retroperitoneal fat and daily gain in the F 2 . 36 of the 57 variants segregated in the F 2 animals. The array contains two of the variants (chr2_3566521_A_C and chr18_7019623_G_A) that showed significant pAAF differences between the F 2 groups with the Fisher test. For daily gain, only the marker chr2_3566521_A_C was nominally significant (pval = 0.025) (Additional file 7) but it did not reach significance after Bonferroni correction for multiple testing (p-val ≤ 0.05/36 = 0.0013). For retroperitoneal fat, two markers, chr9_46397500_A_G and chr18_7019387_A_T, were nominally significant (p-val = 0.028 and 0.044, respectively) but again, none reached statistical significance after Bonferroni correction (Additional file 7). Chr18_7019623_G_A, which was significant at the pAAF comparison between both F 2 groups, did not reach significance for any of the two traits (pval = 0.09 and 0.1 for daily gain and retroperitoneal fat, respectively).

Discussion
TASRs and the components of the AR circuitry are key genes in keeping body homeostasis as they recognize chemical molecules that could be both sources of energy or threatening toxins and promote an adequate response. With the aim to characterize the coding genetic variation affecting swine TASRs and the AR circuitry genes, we sequenced 16 gDNA pools corresponding to 304 pigs from ten breeds and European wild boar and from two pools of an experimental F 2 population with records on growth and retroperitoneal fat content. We have mapped thousands of coding region genetic variants, hundreds of which are expected to have a strong impact on protein sequence, some of which are breed specific. By comparing the pAAFs of these variants in two F 2 pools divergent for growth and fat deposition, we also identified many genotype -phenotype relationships. Our data provides detailed information of the genetic variation present in TASRs and AR genes. We also developed an assay to genotype 128 of the most functionally relevant variants which is available to perform association studies with relevant traits in pig populations.

Technical considerations
Differences in DNA extraction methods and inaccuracies with respect to quantification might have a negative impact on the even distribution of sequencing reads among the genomes within a pool thereby reducing the accuracy of the pAAF/pMAFs as a proxy of the real allelic frequency. Nonetheless, the comparison of predicted versus Fig. 2 Phylogenetic dendrogram with the 10 breeds. The numbers indicate the support for each node according to 1,000 bootstrap iterations observed allele frequencies in the 57 genotyped variants showed that pAAF/pMAFs were very good predictors and supply additional confidence of the quality of our results (Fig. 3).
To the best of our knowledge, this is the first published study from a targeted genome enrichment experiment in swine. As this approach reduces the sequencing throughput requirements, we were able to sequence the target sub-genome (the exome of 201 TASRs and AR genes) in the largest number of pigs (n = 304) reported to date, in a single experiment. By mapping nearly 163 million reads on the target exons, we reached an average depth of sequencing of 72× for each of the haploid genomes. This read depth allowed us to detect all variants present in the pools regardless of their frequency. Given that we sequenced 304 animals harbouring in total 608 chromosome sets (2 alleles each), we were able to detect rare variants with MAF ≥ 0.0016. No previous high throughput sequencing study in pigs reached this sensitivity to detect rare variants. We identified a large number of variant events, most mapping to exon flanking regions including promoters, introns, and upstream and downstream segments. Although some of these noncoding variants could be functional, their regulatory relevance is difficult to predict and was not the subject of our study, which focused mainly, on the detection of protein-damaging coding variation.

Variant identification
Overall, we have identified 2,793 coding variants but 27 of these are fixed in our sample set and are thus likely to be either errors in or private to the reference sequence. We annotated 217 variants segregating in the 10 canonical swine TASRs in our samples. Da Silva et al. [18] described 279 coding variants in their list of 21 taste and nutrient receptor genes after sequencing 79 pigs. However, Da Silva and co-authors included 8 genes that we did not study. These are 5 likely-paralogs of canonical TASRs and 3 genes that are not canonical TASRs but that are related to tasting fat (GPR41, and GPR84) and amino acids (GPRC6A  (Table 1 and Additional file 2). Some of these variants showed allelic frequency relationship with daily gain and retroperitoneal fat deposition in the F 2 resource with phenotypic records (Table 5 and Additional file 4). In keeping with previous findings in human [13], we also detected that H and M variants tend to be more abundant and have higher pAAFs in TASRs than in non-sensory genes (Tables 1 and 2) which indicates that TASRs are subject to balancing selection. TASRs are comparable to the major histocompatibility complex genes (HLA in humans and SLA in pigs) as both are expressed at the surface of cells to detect particular molecules that could be hazardous for the body. Whilst HLA and SLA detect antigens to promote an immune reaction, TASRs sense chemical compounds to stimulate appropriate responses (reward, acrosomal reaction, smooth muscle contraction or immune function among others) depending on the cell type that is involved. Thus, a healthy animal population needs to be highly polymorphic in these genes in order to maximize its adaptability and survival to variable environments facing multiple threads. As expected, H and Mdel variants had on average, lower pMAFs than Mtol or L variants (Table 2), as the former are more likely to be deleterious and consequently, subject to purifying selection. This is well exemplified by the recent exome sequencing experiments that have successfully identified rare deleterious variants causing rare and severe mendelian disorders in humans [14]. Indeed, H and Mdel variants are more abundant and frequent at the 3′-end of the protein, where they are less likely to have an impact of the function of the protein (Fig. 1).

H rare variants
Remarkably, we found 32 rare H variants which mapped to TAS1R1, TAS1R3 and 20 AR genes (Table 3). These mutations are predicted to fully disrupt the function of the affected gene and might thus have an impact on taste and AR perceptions. Eight of the 20 AR genes carrying rare H variants, have been directly associated to taste, food intake, body size, diabetes or triglyceride levels. Two of them, GRM4 and GABRR1, are glutamate receptors that have been associated to body size [21] and feeding behaviour [22], respectively. Natural knockout pigs, (i.e., pigs homozygous or compound heterozygous for rare H variants) in these genes may have severe consequences in taste perception and feeding attitude.
Reverse genetics experiments to study the phenotypic changes that occur in such natural knock-outs would be very informative to understand the importance of these mutations on feeding behaviour and on a broad range of other traits of relevance in animal breeding and biomedical sciences. However, the identification of such homozygotes may require the screening of thousands of pigs. A convenient alternative would be to identify heterozygous animals and cross them to generate these natural knock-outs.

Breed particularities
We found several variants with breed particularities that could both explain in part the population history of these breeds and be the result of genetic adaptation to the particular environment or artificial selection. Overall, the pools of commercial breeds (Large White, Landrace, Pietrain, Duroc) contained more variants than the pools of the ancient/traditional breeds (Iberian, Majorcan Black, Mangalitza, Bazna). The sequencing of the commercial breeds involved a larger number of samples and consequently, more genetic variation could be captured. This difference is possibly due also to first, population bottlenecks and a low effective population size within ancient breeds and second, to the introgression of Asian germplasm in the European commercial breeds [23,24] ( Table 4). The Duroc pool is somehow in between European commercial and traditional breeds, with a relative low number of variants (Table 4). However, this was expected as the pig reference genome sequence was obtained from a Duroc animal and is therefore expected to share more similarities with our Duroc pool. On the other hand, the Asian breeds showed the largest number of variants (Table 4) which was also expected as our pool was made of two different swine populations (15 Meishan and 7 Vietnamese pigs) and also, these animals diverged from the European counterparts around one million years ago [25]. It is worth pointing out that as we merged the Meishan and the Vietnamese pigs, we cannot have specific information for these populations.
Although the comparison of both Asian breeds would have been interesting, our aim was not to study the genetic variation within these breeds but to identify a large catalogue of variants by sequencing a set of divergent populations. For this reason, and to optimise the high throughput sequencing resources, we merged all Asian animals.
Although we tried to minimise co-ancestry between samples, we do not have pedigree data and hence, we cannot exclude the possibility that the trend on the reduction in the genetic diversity observed in some breeds is due to close familiar relationships. This is particularly true for our Mangalitza samples, which come from a highly inbred population from Romania and might thus not represent the genetic diversity existing within the Mangalitza breed as a whole. To better determine genetic diversity within each breed, we compared the genetic variability in the synonymous sites, which are considered to be neutral in evolutionary terms, in our TASR and AR genes using the method described by Watterson. This method corrects the number of variants by the number of individuals that were analysed [20]. We further corrected these values by the expected number of silent sites in the sequenced cds. Genetic variability among western breeds is highly similar (0.00090-0.00131 variants per mutant site) being lowest in Duroc and highest in Pietrain. The genetic variability seems to be higher in the Asian pool (0.00182).
We also compared the relative abundance of H + Mdel variants in each breed. In contrast to the recent results by Bianco and co-authors [26] who detected a higher ratio of deleterious variants in Western than in Asian breeds, we did not identify large differences across pools. This inconsistency could be explained by the fact that we interrogated a particular set of genes whilst Bianco et al. assessed all annotated genes in the pig genome. Also, both studies screened different animals from distinct breeds. Furthermore, the two studies used different sequencing strategies. Whilst we deep-sequenced 304 pigs in a pool-based strategy, Bianco et al. performed low depth whole-genome shotgun sequencing of 128 pigs. Our approach is better suited for the identification of rare variants and this difference could have altered the catalogue of variants identified in both studies. Therefore, the comparison of the two datasets needs to be taken with caution. Of note, we observed that 25 % of the 28 variants identified in TAS2R1 were wild boar specific (Additional file 3). This could indicate a particular haplotype that might have been lost in the European domestic breeds by artificial selection. Alternatively, as our wild boar gDNA pool was made with samples from three different European locations (Catalonia, Belgium, Romania), these variants, which have a relatively low pAAF [0. 16-0.18], could well belong to one of these populations with no germplasm contributed to the analyzed domestic breeds. Noteworthy, some of these TAS2R1 variants are predicted to have a strong effect on the protein and could thus indicate adaptive selection to particular foods or environments (Additional file 3). Especially relevant are the breed-specific H variants with relatively high frequency in the affected breed (pAAF > 0.1) (Additional file 3). The stop codon that prematurely truncates TAS1R1 right in the middle of the protein in 17 % of the Mangalitza genomes might impair the ability to sense umami and might thus affect food preferences in a similar way as described in giant pandas, which lack a functional TAS1R1 potentially disrupting preferences for protein-rich sources [27]. Furthermore, two H variants in HTR3C, and CYP2A6 are unique and almost fixed in the Asian breeds (Additional file 3). As the pAAF between Asian and European pools are so dramatically different, we hypothesize that this might reflect an AR adaptation to very different environments.

Variant and phenotype relationships
In our study, we aimed not only at identifying damaging variants but also at checking whether we were able to detect potential relationships with production traits. This comparison was done with only 38 pigs, which would typically be a very small number of animals and as such, the power to detect genetic associations is low. Thus, it does not aim at finding statistically significant genetic associations but at detecting a trend that could indicate this association. Indeed, we identified several allele -phenotype relationships that could indicate real genetic associations (Table 5). We believe that comparing the pAAFs of the two F 2 pools was a good strategy for the identification of allele -phenotype relationships, as these animals share a common genetic background and the only criteria used to make the pools, was their extreme and opposed phenotypic characteristics. Four of the 11 (36 %) TAS2R4 variants segregating in our F 2 resource showed significant differences at the nominal level in pAAF between the obese and fast growing F2_F pool and the lean slow growing F2_L pool ( Table 5). As this is a large percentage, we believe that this could be a real association and that a polymorphism, perhaps a regulatory variant not assessed here, is in part responsible of the phenotypic differences between the divergent groups of pigs. Five of the six genes with significant allelic differences at the nominal level (all with p-val ≤ 0.01) between the F 2 pools have been directly linked to both taste and growth (umami taste and GRM1 [28], fat taste and CD36 [29], taste in general and P2RX2/P2RX7 [30] and weight gain and GABRA6 [31]) ( Table 5) and could thus indicate a difference in the eating behaviour. This would explain the difference in growth and fat deposition between the F 2 groups. Remarkably, three variants in CD36, a gene associated to fat taste, had significant pAAF differences between the high and the low fat deposition pigs after correction for multiple testing ( Table 5).
Two of the markers, chr2_3566521_A_C and chr18_7019623_G_A, in the ALDH3B1 ortholog ENSSSCG00000026349 and in TAS2R41, respectively, were also genotyped in the F 2 animals with TaqMan probes using the OpenArray technology. Only chr2_3566521_A_C was nominally significant for daily gain (p-val = 0.025), but did not reach the significance threshold after multiple testing (Additional file 7). chr18_7019623_G_A was not significant, but a marker 236 bp upstream from this SNP, chr18_7019387_A_T, was nominally significant (p-val = 0.044) for retroperitoneal fat content (Additional file 7). We seek to screen larger populations with phenotypic data to determine whether these associations could become significant.

Selection and genotype-based validation of SNPs
We also developed a genotyping array to validate the variants with highest damaging potential regardless of their frequency. We believe that a proportion of the 59 non-polymorphic positions might be real low-frequency polymorphisms that were present only in animals that we did not genotype due to the lack of available gDNA. However, we cannot rule out the possibility that some of these variants could simply be false positives caused by the erroneous mapping of some reads to highly similar regions. For example, none of the 17 frame-shift variants we included could be confirmed as a polymorphism but the high false positive rate among indels in high throughput sequencing experiments is well documented. Most of the H variants failed to proof polymorphic but they had very low pMAFs whilst most of the M variants, which on average showed higher pMAFs, were confirmed (Additional file 6). None of the rare H polymorphisms displayed the homozygous state for the minor allele in the genotyped samples. However, we identified two H variants that approached the rare MAF threshold, which showed the three genotypic classes in our animals (Additional file 6). The Vietnamese rare variant in HTR1A is a stop codon that was homozygous in two Vietnamese pigs. Nonetheless, we cannot consider these two animals as natural knockouts as this mutation is located at the very last amino acid of the canonical HTR1A protein, and it is not expected to have an impact on HTR1A function. Therefore, the low frequency of the minor allele might not reflect the existence of purifying selection forces. The Meishan H variant in GABRG1 is also an introduction of a stop codon, but this is located in the middle of the gene and is thus likely to disrupt its function. This gene increases neuronal activity and is associated to eating disorders and anxiety [32] and even with alcohol dependence in humans. Thus, natural GABRG1 knock-out pigs carrying this stop codon may show both signs of anxiety and a particular eating behaviour.
We have developed the first assay for massive genotyping of variants with high functional potential in swine TASR and AR genes, and we now seek to perform association analysis on pig populations with phenotypic records for eating attitude, feed intake, growth, obesity, but also on semen quality and fertility, infection and immunity and behaviour abnormalities such as stressrelated stereotypies and tail biting. This approach will help us understanding the impact of genetic variation in TASR and AR genes on traits of interest for the pig breeding industry. This genotyping assay will also allow us to identify natural knock-outs that could then be used in reverse genetics studies. Nonetheless, we acknowledge that this array tags a very small proportion (eight TASRs and 23 reward) of the genes involved in taste, appetite and reward. Ideally, the list of variants and genes should be extended to achieve a comprehensive analysis of the impact of these gene pathways in pig breeding.

Conclusions
We detected 2,766 variants predicted to have a potential (high, moderate or low) impact in the protein sequence of 201 TASR and AR genes. Of these, 395 were predicted to strongly impact on the protein sequence of the 10 TASRs and 123 AR genes and consequently, in their function. The importance of these genes in many traits contributing to body homeostasis has been well documented in human and animal models but remains unexplored in livestock. We have found significant relationships between the pAAF of some variants and growth and fat deposition. This, albeit not more than a mere indication, strongly encourages further studying the effect of these genes on traits of interest in body homeostasis and animal breeding. For this reason, we have developed a genotyping array with a subset of these variants and have validated 57 by genotyping the initially sequenced animals. This array is now ready to be used in genetic association studies for relevant traits including taste preferences, food intake, fertility or behaviour. Although this array is not comprehensive and does not contain all the variants that we identified, it contains a careful selection of the most likely deleterious variants and involves eight TASRs and 23 AR genes.

Selection of target regions
The genomic regions of TASR exons were selected using the genome annotations accessible via Ensembl's Biomart (www.ensembl.org/biomart; version 72, June 2013). We selected two TAS1Rs and the 10 annotated TAS2R genes (Additional file 1). TAS1R2 could not be included due to its unknown genomic location at the time of selection.
In total, 166 genes from the AR circuitries were selected from a study aimed at understanding the genetic signatures in giant panda that confer the highly selective diet based on bamboo only [33]. In their study, these authors retrieved the 166 genes from 4 review articles on appetite and food intake behaviour as described in their material and methods. In addition, 35 AR genes were identified by searching NCBI's PubMed database using the keywords: "appetite", "food intake", "dopamine", "serotonin", "glutamate receptor", "epinephrine", "norepinephrine", "reward". We used Ensembl's Biomart to identify and select the genomic coordinates of the exons from the swine AR orthologous genes (Additional file 1).

Samples
We used gDNA from 266 pigs belonging to eleven breeds or populations including Large White (United Kingdom), Landrace (Denmark), Pietrain (Belgium), Duroc (United States), Iberian (mainland Spain and Portugal), Majorcan Black (Balearic islands), Bazna (Romania), Mangalitza (Hungary), Meishan (China), Vietnamese pot-bellied and to the European wild boar. The Large White, Landrace, Pietrain and Duroc have been subjected to strong selection pressures in the last decades and at present are the most commonly used in intensive production systems. On the contrary, Iberian (mainland Spain and Portugal), Majorcan Black (Balearic islands), Mangalitza (Hungary, although our animals come from a closed herd in Romania) and Bazna (a Romanian breed obtained in the 19th century by crossing Mangalitza and Berkshire) are local breeds with much lower selection intensity and with very localised geographic locations and relatively small productions. The Large White was strongly introgressed with pigs from Asian origin back in the 18th and 19th century. The Landrace was developed in the late 19th century by crossing native Danish breeds with Large White pigs. All samples were collected from farms in Catalonia, Canada and France. Samples were selected based on DNA availability in our DNA archive. Although we do not have pedigree information, we tried to minimise co-ancestry by selecting samples from different farms when possible with the exception of the Mangalitza, which samples come from a close and highly inbred population and are thus likely to have strong familiar relationships. No phenotypic data is available from these samples. Furthermore, DNA from 38 F 2 pigs from an experimental intercross created with the aim of studying obesityrelated traits was used. This experimental intercross is described in detail by Kogelman et al. [34]. Briefly, the F 2 population was created by inseminating seven Large White and seven Duroc sows with Göttingen minipig semen from 14 males. Both Large White and Duroc breeds have been selected for leanness and growth traits during many years, while Göttingen minipigs are prone to obesity. 563 F 2 pigs were created. The animals were housed at a regular pig farm, and slaughtered at a commercial slaughterhouse under veterinary supervision. Tissue and blood samples were collected at slaughter. Extensive phenotypic collection was performed from birth to slaughter (242 ± 48 days) including obesity, obesity-related, and metabolic phenotypes; and measurements of fat compartments at slaughter [34]. The retroperitoneal fat compartment was removed from the carcass by blunt dissection and weighted on a bench scale. To calculate daily gain, body weight was measured individually at birth and at 7 month of age (220 ± 45 days). The 38 F 2 pigs used in this study were selected to represent two divergent groups with respect to average daily gain (g/day) and retroperitoneal fat content (kg). The two divergent groups, i.e. the fast growing, obese (F2_F) and the slow growing, lean (F2_L), showed highly significant differences (T-test) for both traits (Additional file 8). gDNAs from purebred animals were extracted using different tissues (blood and a variety of solid tissues) and protocols including standard phenol -chloroform -Isoamyl alcohool organic extraction and the Charge Switch gDNA Micro Tissue kit (Invitrogen). DNA from the F 2 pigs was extracted from EDTA stabilized blood using a salting out procedure. Samples were pooled in 16 tubes on a per-breed basis using semi-equal amounts of gDNA as measured by Nanodrop. The sample size of the pools ranged between 12 and 24 (Additional file 5). Likewise, the sample size of each breed ranged between 12 and 45 (Additional file 5).
The F 2 animals belong to an experimental population in Denmark and they were subject to animal care, maintenance and experimental work according to the ' Animal Maintenance Act' (Act 432 dated 09/06/2004) and the approval from the Danish Animal Experimentation Board (J.nr. 2007/561-1434). Specialized professionals at each institution obtained all the other blood and tissue samples following standard routine monitoring procedures and relevant guidelines. No animal experiment has been performed in the scope of this research.

Capture of genomic regions, library prep and high throughput sequencing
The 16 gDNAs pools were subjected to genomic capture and library preparation following Agilent's SureSelect protocol for Illumina paired-end sequencing. Briefly, three μg of porcine gDNA were sheared on a Covaris™ E220 instrument. The fragment size (150-300 bp) and the quantity were confirmed with the Agilent 2100 Bioanalyzer 1000 chip. Fragmented DNA was end-repaired, adenylated and ligated to Agilent specific paired-end adapters. The DNA with adapter-modified ends was PCR amplified (six cycles, Herculase II fusion DNA polymerase). PCR product size and quantity were determined on the Agilent 2100 Bioanalyzer DNA 1000 assay and hybridized to the genomic capture baits for 24 h at 65°C (Applied Biosystems 2720 Thermal Cycler). The hybridization mix was washed in the presence of magnetic beads (Dynabeads MyOne Streptavidin T1, Life Technologies) and the eluate was PCR amplified (16 cycles) in order to add the indexed tags using 6 bp SureSe-lectXT indexes for Illumina. The final library size and concentration was determined on an Agilent 2100 Bioanalyzer 1000 assay.
Each library was sequenced on an Illumina HiSeq 2000 instrument in a fraction of a sequencing lane following the manufacturer's protocol, with paired end run of 2x101bp. Images analysis, base calling and quality scoring of the run were processed using the manufacturer's software Real Time Analysis (RTA 1.13.48) and followed by generation of FASTQ sequence files by Illumina's proprietary CASAVA software.

Read mapping and variant calling
Reads were hard trimmed from the end of the read up to the first base with a quality of at least 10. Reads with at least 40 nt of length were mapped to Sus scrofa reference version 3 (http://hgdownload.cse.ucsc.edu/golden Path/susScr3/bigZips/susScr3.fa.gz). As the list of TASRs was clearly shorter than the catalogue of AR genes and to minimize type I error (false negative variants), we applied a slightly different read mapping strategy for the two gene sets: 1) For TASR variant detection, reads were mapped first with the GEM toolkit [35] allowing up to four mismatches, and unmapped reads were then aligned to the swine genome using the more permissive BFAST read aligner [36]. We further manually curated the TASR variant list by removing these variants that clustered in high variant density regions as these indicate the presence of wrongly mapped reads -most of them probably aligned by Bfast -and thus false variant calls; 2) For the detection of variants in the AR genes, reads were mapped using only GEM as the manual curation would have been too labour intensive and prone to errors. Alignment (.bam) files containing only properly paired, uniquely mapped reads without duplicates were submitted to variant calling. Each pool was processed separately. The ploidy of the pool was calculated as two times the number of individuals in the pool and input as an optional argument in GATK 3.1 Unified-Genotyper [37]. For variant calling, read numbers were down-sampled to 1,000 reads per position.
Single pool variant calls were merged into a multisample.vcf file using GATK CombineVariants [37]. To confirm that variant positions not called in certain pools had a homozygous reference genotype, a second round of single pool variant calling was performed restricted to the list of variant positions in the merged.vcf file. Subsequently, results were merged again. Functional annotations were added using snpEff [38] with the Sscrofa10.2.69 database, and variants were classified according to their predicted impact as High (H), Moderate (M), Low (L) and Modifier (Additional file 9). Porcine dbSNP version 138 and porcine SIFT scores and deleteriousness prediction were annotated using snpSift [39] and the Ensembl Variant Effect Predictor (VEP) online tool (http://www.ensembl.org/Tools/VEP). M variants were further classified as deleterious (Mdel) or tolerated (Mtol) according to SIFT predictions. Genes of interest and the original target region of the capture experiment were annotated using vcftools [40]. Base counts at variant positions in the merged.vcf file were annotated using GATK Variant Annotator [37]. We used the proportion of reads carrying each allele as an estimator of the allelic frequency for both the alternative allele (pAAF) and for the minor allele (pMAF).

Identification of breed-specific variants
We searched for allelic variants that were uniquely present or uniquely absent in only one breed. We chose these variants that were either breed-specific but with a pAAF in the specific breed ≥ 0.1 and the variants that having a pAAF ≥ 0.5 in all breeds, were absent in only one breed. These breed specific features were assessed on the 2,793 variants identified in TASR and AR genes.

Phylogenetic tree
We calculated the pair-wise Pearson correlation of the pAAFs from the eight European breeds, the wild boars and the Asian animals (Meishan and Vietnamese together) with the 2,523 variants with the alternative allele present in at least one breed and with allelic information in the ten breeds and the wild boar. We then constructed a phylogenetic tree using an UPGMA method. These calculations were done using an in-house developed R script (Do.upgma.pops.bootstrap; https://bioinformatics.cragenomica.es/numgenomics// people/sebas/software/software.html).
Assessment of allele frequencyphenotype relationships pAAFs at each variant position between the two F 2 pools F2_F and F2_L were compared and the significance of these differences were determined using the Fisher exact test in an R environment. We excluded multi-allelic variants and variants with no read count in at least one pool from the analysis. After this filtering, we were able to compare 1,377 variants.

OpenArray design, genotyping and variant validation
In order to validate the most likely deleterious variants and to develop a genotyping assay to be used in future association studies in pig populations with relevant phenotypes, we developed a genotyping array containing 128 of such potential polymorphisms. We chose the TaqMan® OpenArray® Real-Time technology for Genotyping (Life Technologies) and designed the baits using the online Custom Assay Design tool. This design requires the absence of polymorphisms in the 2 bp window centred at the target variant and has also some constrains on the probe's melting temperature. As a consequence, we could not design assays for all the variants using this approach. We first selected all the H variants in TASRs and AR genes and supplemented the assay with M variants in both gene groups.
The genotyping was performed in a QuantStudioTM 12 K Flex Real-Time PCR System (Life Technologies). This platform is a high performance, high-throughput technology based on real-time PCR, which enables to run up to 12,000 data points, including SNV, small insertions and deletions (indels), simultaneously. 250 ng of gDNA and master mix were loaded to the OpenArray plates using the AccuFillTM robotic system (Life Technologies), filled with an immersion fluid and sealed. OpenArray plates were genotyped according to the manufacturer's recommendations. Genotype analysis was performed using both Taqman Genotyper version 1.3 and Symphoni Suite software (Life Technologies).
Genetic association of OpenArray genotypes with daily gain and retroperitoneal fat in the F 2 We used a univariate mixed model to determine genetic associations of the 36 polymorphisms genotyped and segregating in the F 2 animals with the phenotypic records for daily gain weight and retroperitoneal fat content. The analysis was calculated with the software GEMMA [41].