Genetic diversity of canine olfactory receptors

Background Evolution has resulted in large repertoires of olfactory receptor (OR) genes, forming the largest gene families in mammalian genomes. Knowledge of the genetic diversity of olfactory receptors is essential if we are to understand the differences in olfactory sensory capability between individuals. Canine breeds constitute an attractive model system for such investigations. Results We sequenced 109 OR genes considered representative of the whole OR canine repertoire, which consists of more than 800 genes, in a cohort of 48 dogs of six different breeds. SNP frequency showed the overall level of polymorphism to be high. However, the distribution of SNP was highly heterogeneous among OR genes. More than 50% of OR genes were found to harbour a large number of SNP, whereas the rest were devoid of SNP or only slightly polymorphic. Heterogeneity was also observed across breeds, with 25% of the SNP breed-specific. Linkage disequilibrium within OR genes and OR clusters suggested a gene conversion process, consistent with a mean level of polymorphism higher than that observed for introns and intergenic sequences. A large proportion (47%) of SNP induced amino-acid changes and the Ka/Ks ratio calculated for all alleles with a complete ORF indicated a low selective constraint with respect to the high level of redundancy of the olfactory combinatory code and an ongoing pseudogenisation process, which affects dog breeds differently. Conclusion Our demonstration of a high overall level of polymorphism, likely to modify the ligand-binding capacity of receptors distributed differently within the six breeds tested, is the first step towards understanding why Labrador Retrievers and German Shepherd Dogs have a much greater potential for use as sniffer dogs than Pekingese dogs or Greyhounds. Furthermore, the heterogeneity in OR polymorphism observed raises questions as to why, in a context in which most OR genes are highly polymorphic, a subset of these genes is not? This phenomenon may be related to the nature of their ligands and their importance in everyday life.


Background
Olfactory receptors (OR) are expressed on the cilial membranes of olfactory sensory neurons embedded in the olfactory mucosa [1][2][3]. OR are transmembrane G-protein-coupled receptors and constitute the first element in a biochemical cascade leading to the perception and recognition of an odorant. OR genes constitute the largest mammalian gene family, with several hundred genes in the human genome and up to 1550 in the rat genome [4][5][6][7][8]. Comparisons of the amino-acid sequences deduced from orthologous and paralogous OR genes have shown a large number of positions to be highly conserved and others to be variable. The conserved residues are thought to be involved in signal transduction, whereas the variable residues are thought to be involved in binding thousands of odorant molecules in specific interactions [7,[9][10][11].
Mammals have evolved sophisticated systems for sensing the outside world and, in particular, for sensing odorant molecules indicating danger or the presence of a mate or food. Dogs are particularly interesting in this respect. They were domesticated from wolves some 15,000 years ago and have since undergone extensive breeding and selection, resulting in 400 or so different breeds, some of which were developed specifically for hunting, in which olfaction plays a central role [12][13][14][15]. The astounding ability of dogs to detect an odorant molecule and follow its trace results from the interaction of several brain functions. The first step in this process involves the efficient binding of an odorant molecule to a given set of OR. The absence of a particular OR or the presence of alleles giving rise to OR with a low binding efficiency would lead to poor downstream processing or the complete absence of such processing. As a case in point, links between nucleotide polymorphisms in two OR genes in humans (OR7D4 and OR11H7P) and the perception of specific odorants -androstenone and isovaleric acid, respectively -have recently been demonstrated [16][17][18].
We therefore wondered whether breeds or individual dogs known to be particularly skilled at odorant detection have different gene alleles encoding OR with a higher affinity for their ligands or more efficient at initiating the signal transduction cascade. In a preliminary study on a subset of 16 OR genes, we showed the rate of polymorphism to be high, with all genes having at least one SNP in their open reading frame (ORF) [19]. This finding led us to analyse the DNA sequences of a larger number of OR genes (109 OR genes) in a cohort of 48 dogs from six breeds known to differ in their ability to detect odorants: four breeds known for their strong sense of olfaction (German Shepherd, Belgian Malinois, English Springer Spaniel, and Labrador Retriever) and two breeds known to have a weak sense of olfaction (Greyhound and Pekingese).
We show here that OR genes are generally highly polymorphic, with a mean of one SNP per 577 nucleotides. However, the degree of polymorphism observed is highly variable, with some OR genes having few if any SNP and others being highly polymorphic (1 SNP/122 nt). This high level of genetic polymorphism, resulting in a large number of amino-acid substitutions in all parts of the OR, strongly suggests that a large proportion of the mutations occurring during DNA replication are not counterselected, facilitating the evolution of the OR repertoire and increasing its potential to recognise odorants.

DNA samples
DNA was obtained from 48 dogs from six breeds: German Shepherd Dog (GSD), Belgian Malinois (BM), Labrador Retriever (LR), English Springer Spaniel (ESS), Greyhound (Grey), Pekingese (Pek). In addition, blood samples from 8 Boxer (Box) dogs were processed for the analysis of a subset of OR genes.
Most of the DNA samples were obtained from the caniDNA bank [20] and were selected from dogs with no family links up to grandparental level. We also selected dogs from different breeders from different regions or countries, to minimise possible links between animals. When necessary, the panel was completed with additional samples provided by Gary S. Johnson (Department of Veterinary Pathobiology-University of Missouri, USA) and Paul G. Jones from Masterfoods (England).
DNA was extracted with the Nucleospin Blood L kit (Macherey Nagel). For samples with low DNA concentrations, whole genome amplification was carried out with the Illustra GenomiPhi V2 DNA Amplification Kit (GE Healthcare).

PCR amplification and OR gene sequencing
Pairs of specific primers (20-30 bp) were designed with Primer3 [21], for binding outside the reading frame, for amplification of the whole OR ORF. Primers were also designed to bind to regions with a unique sequence, to ensure that paralogous genes were not amplified. The nomenclature and sequences of OR genes were extracted from the paper by Quignon et al. [7] and can be obtained from [22]. PCR amplification was carried out in a final volume of 10 μl, containing 35 ng of dog DNA, GeneAmp 1 × PCR Gold Buffer, 2 mM MgCl 2 (Applied Biosystems), 250 μM dNTP (GE Healthcare), 0.5 U AmpliTaq Gold DNA Polymerase (Applied Biosystems) and 0.3 μM of each specific primer. PCR conditions were as follows: initial denaturation at 95°C for 7 min, 20 cycles of 94°C for 30 s, 61°C for 30 s with a touch-down process (-0.5°C per cycle) and 72°C for 1 min, 15 cycles of 94°C for 30 s, 51°C for 30 s, 72°C for 1 min, and a final extension at 72°C for 3 min. Aliquots of PCR products were subjected to electrophoresis in 1% agarose gels in 0.5 × TBE. We then purified 2.5 μl of PCR products from faithful amplifications using 1 μl of ExoSAP-IT (USB). The purified PCR products were incubated at 37°C for 15 min and then at 80°C for 15 min. Pairs of specific internal primers (18-21 bp) designed with Primer3 [21] were used for sequencing PCR products with the BigDye V3.1 Cycle Sequencing Kit (Applied Biosystems), used according to the manufacturer's instructions. Sequencing products were fractionated on a 3130xl genetic analyser from Applied Biosystems.

SNP identification
Sequences were aligned and analysed with SeqScape software V2.5 (Applied Biosystems), using the CanFam2 DNA sequence as a reference [23]. Only SNP corresponding to nucleotide sequence of the highest quality, as determined by the Phred algorithm [24], were retained.

Data analysis
Haploview software v4.0 [25] was used to calculate the SNP MAF (minor allele frequency) and LD values. We calculated r 2 values for OR genes and D' values for clusters, making it possible to compare our results with those of previous studies [23,26].

Haplotypes
Haplotypes were inferred using fastPHASE software v 1.0.1 with the default settings [27]. This software estimates the missing genotypes and reconstructs haplotypes from unphased genotype data from unrelated individuals.

N value calculation
As an index of the level of OR polymorphism, a mean distance N between SNP was calculated, based on the number of SNP detected through the pairwise comparison of all OR sequences and the occurrence of the two alleles of each SNP. Thus, the smallest N value denotes the highest level of polymorphism.
The N value for individual OR genes was calculated as follows: where n is the number of SNP per OR gene and a and b the occurrences of the two alleles.
The N value for the complete set of OR genes was calculated with the same formula, in which n is the total number of SNP and the individual ORF size is replaced by the sum of individual ORF sizes.

Ka/Ks
Ka/Ks was calculated for each OR gene, as described by Goldman and Yang [28], using the CODEML program (model = 0) from the PAML package [29]. Ka/Ks for the whole set of OR genes was obtained by determining mean Ka/mean Ks.

SNP number and distribution
We analysed the nucleotide sequences of 109 OR genes (102 genes and seven pseudogenes, as defined in the genome sequence [23]) selected from the entire OR repertoire of 872 genes and 222 pseudogenes [7,30]. These OR genes were selected to be representative of a large number of families (the five class I families and 15 of the 18 class II families), subfamilies and clusters (33 of 54) located on 20 chromosomes (Additional file 1). They were also selected as representative of genomic regions very rich in OR genes, as for cluster @40-44 on canine chromosome 18 (CFA18), or with a lower density of OR genes, as for cluster @3 on CFA15. We also studied five isolated OR genes. We determined the nucleotide sequences of PCR fragments amplified from DNA purified from a cohort of 48 dogs of six breeds: German Shepherd Dog (GSD), Belgian Malinois (BM), Labrador Retriever (LR), English Springer Spaniel (ESS), Greyhound (Grey) and Pekingese (Pek). We also analysed a subset of 27 OR genes in eight Boxers (Box).
Visual inspection of all sequencing traces obtained with the cohort of 48 dogs led to the identification of 710 SNP, corresponding to 549 transitions and 161 transversions. We also observed 17 short insertions/deletions (indels, 1 to 3 nt) and five longer indels of 6 to 74 nucleotides. As the occurrence of each indel probably corresponded to a single mutational event, these 732 mutations (SNP + indels) were combined for further analysis. Figure 1 shows the distribution of SNP within the 109 OR genes. It shows that all but four of the OR genes are polymorphic, with one to 22 SNP per OR gene. When analysed at the breed level, the total number of SNP differed significantly (chi 2 , P < 10 -3 ) between breeds, whereas their distribution did not (Wilcoxon-Mann-Whitney) ( Figure 2). However the numbers of OR genes without SNP differed markedly between breeds (chi 2 , P < 0.05), with 24 and 21 OR genes with no SNP for German Shepherd Dog and Greyhound, respectively, 14 for Labrador Retriever and only 10 for each of the three other breeds. The set of OR genes with no SNP was either breedspecific or shared by only a few breeds, in different combinations (Table 1) We investigated the possible correlation between OR gene polymorphism and the organization of these OR genes into clusters of different sizes, by ranking the 109 OR genes according to SNP content. We selected the 22 OR genes with no more than two SNP and the 27 OR genes with 10 or more SNP and compared the sizes of the clusters harbouring these OR genes. As shown in Figures 3.1 and 3.2, the least polymorphic OR genes were preferentially localised in small clusters (median cluster size 4.5 OR genes) and the highly polymorphic OR genes, in large clusters (median cluster size 240 OR genes). Mann-Whit-ney test showed this relationship to be significant (P < 10 -3 ). In addition, the 109 OR genes were ranked according to cluster size and we selected the 20 OR genes located in clusters containing five or fewer OR genes and the 18 OR genes present in the largest cluster (containing 243 OR genes). Again, OR genes in small clusters tended to be less polymorphic than OR genes in large clusters (median SNP numbers of 2 and 8 for the smallest and largest clusters, respectively, Mann-Whitney test; P < 10 -3 ) (Figures 3.3 and 3.4). Interestingly, the OR genes with the highest number of SNP tended to have paralogous genes with higher sequence homology (> 90%) than OR genes devoid of SNP or harbouring a small number of SNP.
Allele frequency SNP minor allele frequency (MAF) ranged from 1% to 50% (see additional file 3). However, MAF within breeds might differ considerably from MAF across breeds, with some alleles absent in all but one breed, in which they could be the major allele (see  ) were common to all breeds, whereas 79 were common to two breeds and 50 were common to three breeds (Tables 3, 4 and 5).
Assuming, as is most likely, that each SNP appeared once in the evolutionary history of the dog, it follows that the 199 SNP common to all breeds probably arose before the separation of the six breeds and that most of the private SNP arose following breed separation. Based on the same rationale, it could be hypothesised that SNP common to two or three breeds arose before the separation of these breeds. Although the number of pairs in common differed significantly (chi 2 , P <10 -3 ), the use of HCLUST [31] to construct dendrograms did not result in any clusters matching breed history. This is probably because the  number of SNP common to pairs of breeds with a MAF > 10% was too small.

Polymorphism level
Nucleotide polymorphism level reflects the number of differences between two sequences. It can be represented by N, the mean distance, expressed in nucleotides, between two SNP. OR genes are generally highly polymorphic, but the distribution of SNP is far from even ( Figure  4). CfOR0034, in which 22 SNP were detected, was the most polymorphic OR gene studied, with an N of 98 for the whole population, ranging from 89 for Pekingese to Distribution of SNP within the 6 breeds  4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18  Calculation, at the whole-population level, of N for the 109 OR genes gave a mean value of 577. Comparison at the breed level indicated that the English Springer Spaniel was the most polymorphic breed, with an N value of 594, whereas the German Shepherd Dog was the least polymorphic breed, with an N value of 926 (chi 2 , P < 10 -3 ) ( Table 6).
Only 27 OR genes were analysed in Boxer, and we obtained an N value of 1728. We therefore wondered whether the large differences in N values between the other six breeds and Boxer were due to the 27 OR genes selected for study in Boxer or whether they reflected a truly lower level of polymorphism in Boxer. However the N values for these same 27 OR genes calculated for each of the six breeds were not statistically different (Mann-Whitney test) from those calculated for the whole set of 109 OR genes (Table 6). This last finding ruled out the possibility of a bias due to the sampling of this subset of OR genes and indicated that the level of polymorphism really was lower for Boxer OR genes -this finding is relevant to the choice of the Boxer Tasha DNA sample (less polymorphic than the other DNA samples tested) for determination of the dog genome sequence [23].
We compared the level of OR gene polymorphism with that of non-coding regions and coding regions devoid of OR, by sequencing a series of exons, introns (only regions close to splice sites) and intergenic sequences with no known coding function. We obtained N values of 8631 for exons, 1992 for introns and 732 for anonymous intergenic sequences (Table 6). These values are consistent with previous reports [23]. A comparison of these values indicates that the coding regions of OR genes are more polymorphic than most exon sequences and more polymorphic than the non-coding DNA (chi 2 , P <10 -3 ).
In a similar study, Sutter et al. [26] sequenced five noncoding regions of the dog genome in a cohort of 95 dogs of five breeds and detected 201 SNP and 19 indels. These results, indicating a lower level of genetic diversity than that observed in OR genes, confirm the high level of genetic diversity of the OR coding exons. The isolated OR genes and genes belonging to small clusters analysed in this study were overrepresented among the 109 OR genes as with respect to their presence in the whole repertoire. As these OR genes tended to be less polymorphic than the OR genes from large clusters, their presence increases the value of N, and the actual difference between OR genes and intergenic sequences should thus be even greater.

Ka/Ks and protein sequence polymorphism
We noted that 152 of the 732 SNP identified within the 109 OR genes led to pseudoalleles (alleles with an interrupted coding frame). Theoretical translation of intact OR genes showed that 307 of the remaining 580 SNP were silent mutations. Of the 273 missense mutations (47% of Numbers correspond to the frequency of the allele identified as the minor allele in the whole population. The second allele in the SNP column is always the minor allele in the whole population. As expected, the minor allele at the whole population level is also the minor allele in most breeds. Exceptions in which the minor allele at the whole population level is the major allele in one breed are indicated in bold typeface. This table is a subset of additional file 3.  the total), 130 would result in the incorporation of an amino acid of a different chemical group ( Table 7).
Calculation of the Ka/Ks ratio, where Ka is the number of non-synonymous substitutions (missense mutations) per non-synonymous site and Ks is the number of synonymous substitutions (silent mutations) per synonymous site between two closely related species, is the traditional method of assessing the strength of selection affecting proteins during evolution. In a recent study, it was shown that the A/S ratio calculated from the SNP content of the human genome is equivalent to the Ka/Ks ratio for the assessment of selective pressure [32].
Using the SNP detected in this study, a Ka/Ks value of 0.37 was obtained for the 95 OR genes analysed here (109 minus pseudogenes and non-polymorphic genes). Similar values were obtained at the breed level (from 0.31 for Labrador Retriever to 0.37 for Pekingese). A Ka/Ks value of 0.098 has been reported for a large set (n = 13,816) of canine genes [23]. Comparison of these two values (0.37 and 0.098) indicates an absence of strong selective constraint, resulting in greater diversification for the OR genes, as already observed for a small subset of human and chimpanzee OR genes and for the gene encoding the human bitter taste receptor, than for most other genes [33,34] We also analysed the distribution of SNP within codon positions and found that 161, 130 and 289 of the 580 SNP were located at the first, second and third codon positions, respectively. This distribution, with 50% of mutations affecting one of the first two positions, at which nearly all mutations induce an amino-acid change, and 50% affecting the third position, at which half of all mutations induce an amino-acid change, is consistent with many mutations (75%) randomly affecting the DNA sequence being retained and not counter-selected.
SNP were found throughout the OR gene sequences, resulting in amino-acid substitutions evenly distributed along the length of corresponding proteins, in the transmembrane, inner and outer parts of the receptors (Table  7).
However, if we take into account the respective sizes of the various domains, the number of missense mutations is significantly larger in intracellular (IC) than in extracellular (EC) and transmembrane (TM) domains (chi 2 , P < 10 -3 ), whereas the number of silent mutations does not appear to differ significantly between domains (chi 2 , P > 0.7). These results were obtained for the whole set of data considered together, or when OR belonging to small clusters (≤ 5 OR genes) and OR belonging to the large cluster (243 OR genes) were considered independently. This indicates the existence of stronger selective pressure to maintain the structural conformation of the parts of the OR related to ligand binding (TM 3, TM5 and EC3 [9]) than to maintain the structure of the part of the protein involved in signal transduction and processing. This finding, which conflicts with those of Buck and Axel [1], should be interpreted taking into account the fact that we compared the sequences of the same gene in different breeds, whereas Buck and Axel [1] compared paralogous OR genes from a single rat and thus compared OR with different binding properties. It would thus be of interest to determine whether the amino-acid changes within IC domains affect the efficiency of the transduction pathway and, in turn, odorant sensing properties. The distributions of missense and silent mutations for the 136 SNP present in only one breed (private SNP) and for the 168 SNP shared by all six breeds indicate a significant bias, with missense mutations more frequent among private SNP (chi 2 , P < 10 -2 ), suggesting selection pressure related to breeding practices.
We used the CORP program to determine the effects, if any, of the 273 missense mutations [35]. Of the 83 OR genes with missense mutation(s), 44 conserved the same Ψ L value, whereas changes < 0.3 were observed for 20 OR and changes > 0.3 for 19 OR. Variations of this type were also associated with higher or lower functionality as defined by the CORP program. As concerns a putative to an amino acid changes affect the 22 most highly conserved positions [9]. In addition, five missense mutations involved the arginine of the MAYDRY conserved motif.

Pseudogene formation
Mammalian OR repertoires contain a large number of pseudogenes -up to 60% for the human repertoire and around 20% for the rodent and dog OR repertoires [4][5][6][7][8].
These pseudogenes are not retrogenes and have resulted from nonsense mutations or short indels occurring during evolution. Of the 109 OR genes analysed in this study, seven were strictly pseudogenes, 86 were intact in all breeds and 16 OR genes had both intact and interrupted ORF (pseudoallele). In each breed, a subset of 10 to 13 of these 16 OR have been identified as having one or more pseudoalleles ( Table 8). The frequency of SNP closing the frame varies across breeds (Table 8). For example, The graph shows that more than 50% of OR genes are highly polymorphic, with an N value even smaller than that for anonymous sequences (see Table 6), whereas ~10% are barely polymorphic (N > 5000) (see additional file 2). Note that six OR genes with a very high N value were off-scale and were not plotted on this graph.    Table 8. More extreme distributions exist, with SNP closing the frame in one or more breeds, but not all, such as the SNP 84 of CfOR0821 or SNP 49 of CfOR0401, which close the frame only in Pekingese and English Springer Spaniel, respectively. Gen-  PAF was calculated independently for each breed and for the whole population. Pseudoallele distribution varies considerably between breeds, with some pseudoalleles present in all breeds and some in only one or a few breeds: 9 are common to all breeds, 5 are found in only one breed and 8 are shared by two to five breeds, giving 10, 12, 13, 10, 10 and 11 OR pseudogenes for GSD, BM, ESS, Grey, LR and Pek, respectively. NS: nonsense, indel: insertion/deletion. otype analysis (data not shown) indicates that the distribution within breeds is not homogeneous, with dogs having zero, one or two alleles with an interrupted ORF.

Variability in OR gene polymorphism level
These results indicate that the status of a gene as active or inactive (pseudogene) does not necessarily apply to the whole dog population, depending instead upon breed or even the individual dog. These observations suggest that pseudogene formation is still an active process, as previously reported [18,36], related to the acceptance of a large proportion of mutational events to the probable continuing diversification of the OR repertoire -the risk attached to deleterious mutations being counter-balanced by the highly combinatory nature of the OR repertoire [37,38], partly accounted for by gene redundancy.

Haplotype structures and distribution
We used the Fast Phase algorithm [27] to identify a total of 809 haplotype structures for all OR genes with more than two SNP (see additional file 4). We found that the mean number of haplotypes per gene and per breed varied between 2.83 for German Shepherd Dog and 3.73 for English Springer Spaniel. Not surprisingly, the number of haplotypes per gene increased with the number of SNP. However this relationship is not simple and many exceptions were noted. We plotted the haplotype/SNP number ratio against the number of SNP ( Figure 5). We calculated the Manhattan distances between the points and generated four groups of OR genes by agglomerative hierarchical clustering, with the two extreme groups having 11 OR and 5 OR genes. As examples of these two extreme groups, CfOR12A07 has 4 SNP and 11 haplotypes and DOPRH07 has 21 SNP and 4 haplotypes (see additional file 4).
The existence of the two extreme groups ( Figure 5) suggests two different evolutionary processes. However, comparisons of gene status (family, subfamily, CFA position, cluster position for OR genes belonging to these two extreme groups) identified no specific feature.
As pointed out above, most of the SNP common to all six breeds had different MAF. Not surprisingly, this leads to very different haplotype patterns in different breeds, with some breed-specific haplotypes, such as the GCAGAGG-TAAT haplotype (CfOR5413), which was found in 11 of the 16 Pekingese haplotypes but was absent from the other breeds (see additional file 4).
In total, we identified 332 breed-specific haplotypes (41%). Many (205) were found only once, but some (38) accounted for 25% or more of the 16 possibilities per OR gene per breed and might even be the most frequent haplotype in the breed concerned ( Table 9). The combination of a small number of haplotypes may result, for each breed, in a haplotype signature. This signature could be used to certify that a given animal does or does not belong to a specific breed, based on the analyses of limited numbers of OR genes. For example, the haplotype structure of CfOR0050 and CfOR16H04, deduced from the analysis of 11 SNP, would be sufficient to identify a dog as a German Shepherd Dog.

Linkage disequilibrium (LD)
Linkage disequilibrium indicates an association between two polymorphic markers, for which pairs of alleles are inherited together. Previous studies have shown that dogs display higher levels of LD than humans. However, LD has also been shown to be heterogeneous, with alternating genomic long and short regions of LD [23]. This pattern of alternating long and short LD regions, which differs between breeds, has been attributed to the history of the dog population, which has been characterised by two bottlenecks and expansion periods [23,26]. We investigated the evolution of the OR gene repertoire by calculating LD both within and between OR genes.

LD within OR genes
All pairs of SNP (MAF > 0.05) within each OR were used to calculate the mean r 2 per breed -range of 0.52 for Pekingese to 0.70 for German Shepherd Dog, with a mean of 0.33 for the whole population (Table 10). These values indicate (1) that the extent of LD for OR genes is one tenth the mean extent of LD previously reported [23]; (2) the lower r 2 value (0.33) obtained for the whole population than for individual breeds is consistent with greater homogeneity within breeds. This low LD value indicates that SNP alleles within individual OR genes are not inherited as a block and suggests an ongoing gene conversion process potentially generating many OR genes with higher levels of polymorphism than the bulk DNA [39,40]. These results indicate that the constraints imposed on OR cluster evolution are not identically distributed in the different breeds. The LD value calculated per breed was also higher than that calculated for the whole cohort (Table  11). This result contrasts with the findings of Sutter et al. [26], showing that the LD value calculated at the whole-Relationship between SNP and haplotype number Figure 5 Relationship between SNP and haplotype number. Distances between points were calculated with R software (maximum distances) [43] and used to cluster OR genes. With k = 4, a group of 5 OR genes (in light blue) with a large number of SNP but a small number of haplotypes was identified, together with a group of 11 OR genes (in green) with a large number of haplotypes and a small number of SNP. We excluded from this last group the 4 OR genes with only one SNP and 2 haplotypes. Note that an individual point may correspond to more than one OR gene.

SNP Number
Haplo Number/SNP Number  There are 11 specific haplotypes, each present once, in GSD and 1 specific haplotype present 11 times in Pek.
population level for regions devoid of OR genes was similar to that obtained for individual breeds. However, our result is consistent with that reported by Menashe et al. [41] for the analysis of a human OR cluster in different populations.

Conclusion
We have shown here that overall OR gene diversity is very high, with a mean distance (N) between SNP of 577 nt, slightly less than that calculated for non-coding sequences and much shorter than the distances calculated for exon sequences. However, this diversity is not uniformly distributed, some OR genes having few or no SNP, whereas others may have as many as 22 SNP in their coding sequence. In addition, individual OR genes may be highly polymorphic in one or a few breeds and devoid of SNP in other breeds. Thus, the overall level of polymorphism was found to differ between breeds, with a mean distance of 628 for the Pekingese and 926 for German Shepherd Dog. An even higher N value was calculated for the Boxer, consistent with previous suggestions of a lower level of genetic diversity in this breed [23].
As the presence of different alleles of specific OR genes has been shown to affect the perception of isovaleric acid and androstenone in humans [16,17], this OR genetic diversity, with 47% of SNP leading to missense mutations, should clearly affect the odorant sensing capabilities of dogs. However, as the ligands of most of these OR are unknown, it is not possible yet to correlate the OR genetic polymorphism with variation in odorant perception. The level of polymorphism for about 50% of the OR genes was found to be higher than that for anonymous sequences, for which all, or almost all mutations arising during DNA replication are probably conserved. As there is no evidence to suggest that replication is itself defective, another mechanism, such as gene conversion, should be considered to account for this higher level of polymorphism, as suggested by the low LD values calculated within OR genes.
This process, which is of great importance in maintaining sequence homogeneity in genes with multiple copies, such as histone genes, has been proposed as a mechanism guiding the evolution of paralogous OR genes [40,42]. We suggest that this mechanism may be involved in the accumulation of SNP, although some of these mutations may lead to a less functional OR or may be nonsense mutations.
The accumulation of mutations diversifying OR aminoacid sequences may have two opposite effects that must be balanced: an increase in odour pattern recognition and the risk of a loss of function. Such losses of function do occur, as indicated by the ongoing pseudogenisation observed. However, the risk of losing the ability to sense a particular odorant is minimized by the highly combinatory code [37,38]. Nonetheless, not all OR genes are polymorphic, and up to 22% of the OR genes in an individual breed may be entirely non-polymorphic. This raises the possibility that these non-polymorphic OR may be involved in recognising odorants of particular importance or may have a unique binding specificity not shared by other OR. Finally, we observed that, for each breed studied, it was possible to define specific haplotypes for a number of OR genes characteristic of the breed, which could be used as a genetic signature to determine whether or not a particular dog belongs to a particular breed.

Authors' contributions
SR carried out molecular genetic studies, interpreted the data and drafted the manuscript. ST carried out molecular genetic studies and participated in sequence alignment. MR carried out molecular genetic studies. AV participated in the statistical treatment of the data. SD determined the nucleotide sequences. CA provided the DNA samples. CH participated in the statistical treatment of the data. FG conceived, designed, coordinated the study and helped to draft the manuscript. All authors read and approved the final manuscript.  These values were identified within 5 clusters of 104 to 182 kb (these clusters are described in additionnal file 5). ND: not determined (too few SNP pairs).