Previously we reported that the DUB/USP17 family of deubiquitinating enzymes had multiple tandemly repeated family members on chromosomes 4 and 8 in humans as well as on chromosome 7 in mice, and that multiple family members were also present in rats. This suggested that members of this family were present within a tandemly repeated sequence which was conserved among these species, and that all the family members had resulted from one common ancestral sequence which had duplicated in each species after their separation. In this study, using the most current human genome assembly, we now show that our original assumptions insufficiently explained DUB/USP17 evolution. In particular, this assembly contains a block of tandemly repeated sequences on chromosome 4 containing a minimum of 23 DUB/USP17 sequences. Furthermore, the 9 DUB/USP17 sequences present on chromosome 8 are found in blocks of 3 repeats embedded within, in at least 2 cases, the copy number variable beta-defensin cluster. In addition, it is evident that whilst some of the family members in mice and rats may lie head to tail, they do not form the same tandem repeat blocks as those seen on human chromosome 4. We have also identified multiple additional sequences within chimpanzee (3 on chromosome 4, 2 on chromosome 8, 3 on chromosome 11, 2 on chromosome 12), rhesus monkey (1 on chromosome 8), orang-utan (unresolved repeat on chromosome 8), horse (1 on chromosome 27), cow (3 on chromosome 4 and 1 on chromosome 3) and dog (5 on chromosome 16 and unannotated sequences on chromosome 21) which also appear to represent members of this family of genes (results summarised in Additional file 9). Again, the relationship between the sequences observed in humans, mice, rats, cows and dogs would suggest the presence of one ancestral sequence which has been duplicated in all of these species independently.
However, it would also appear from comparison of human and chimpanzee sequences that some duplication has taken place prior to their divergence.
The arrangement of the human family members would suggest that the variation in RS447 copy number, and as a result the number of DUB/USP17 genes, may come about via two mechanisms. Firstly, the presence of tandemly repeated sequences on human chromosome 4 would suggest that this sequence is susceptible to tandem duplication, something which is further illustrated by the tandem repeats observed in murine and rat sequences as well as the large unresolved tandem repeat on orang-utan chromosome 8. Secondly, the localisation of a block of three RS447 repeat units within both copies of the beta-defensin cluster, previously shown to be present in the current human genome assembly, and to show copy number variation from 2 copies to 12, may well account for part of the variation in RS447 copy number previously observed. It is therefore likely that both of these blocks contribute to the differences in RS447 copy number. However, the presence of an assembly gap between the two blocks of repeats observed on chromosome 4, indicating that this repeat is not fully resolved, as well as assembly gaps between the beta-defensin clusters on chromosome 8, would suggest that the genome assemblies to date are rather tentative, and this precludes us from determining how much each block contributes to the variation in copy number.
The tentative nature of our observations also makes it difficult to determine which of these blocks may contain the ancestral sequence. It is interesting that all the sequences in the tandemly repeated blocks on chromosome 4 show little divergence and the majority retain an intact ORF. This may well suggest this is a relatively recent evolutionary event, which has resulted from the transfer of DUB/USP17 sequences from chromosome 8. This conclusion is supported by the observation of a small number of beta-defensin and olfactory receptor sequences in close proximity to these tandem blocks, as well as the observation that within all of the beta-defensin clusters, 2 of the 3 DUB/USP17 genes lack either an intact ORF or residues necessary for catalytic activity. It must be noted, however, that the sequences which appear to lack the necessary active residues could also function as ubiquitin binding proteins or competitors for the active family members. In addition, the observation of DUB/USP17 genes in association with beta-defensin, FAM90A and olfactory receptor genes in multiple species including humans, chimpanzee, rhesus monkey, mice, rats, horses and dogs would also support this hypothesis. This would also suggest that the ancestral DUB/USP17 sequence may well have existed in close proximity to beta-defensin and olfactory receptor genes and that this family has evolved along with these sequences in the species outlined. However, no beta-defensin genes are found within the vicinity of the murine and rat DUB/USP17 family members, and syntenic regions to the beta-defensin clusters on human chromosome 8p23.1 exist on murine chromosomes 8qA1.3-A2 and 14qC3 as well as rat chromosomes 16q12.5 and 15p12, regions which are distinct from those containing their DUB/USP17 genes on murine chromosome 7 and rat chromosome 1.
This could be explained by the observation that, across the different mammalian species, a number of beta-defensin blocks have been identified, two of which are specific to rodents, and several of which show no overlap between humans and rodents. Therefore, the absence of the beta-defensin genes from the DUB/USP17 region in rodents may be due to the evolutionary pressures which have prompted them to evolve a distinct beta-defensin repertoire from other mammals.
Alternatively, the DUB/USP17 sequences may be mobile genetic elements which are more readily inserted into areas of the genome which are unstable. Their frequent co-localisation with the olfactory receptor and beta-defensin genes would then be due to the nature of the areas into which they insert and not to any co-evolution. This is supported by the observation that the closest relatives of the DUB/USP17 family, USP36 and USP42, have much more complex genomic structures, suggesting that the DUB/USP17 family may not have resulted from gene duplication, but from the insertion of an mRNA sequence into an unstable genomic region.
The defensins are cationic antimicrobial peptides which are produced by mucosal epithelial cells lining the respiratory, gastrointestinal, and genitourinary tracts and as such are thought to play an important role in immune defence. The defensin genes on chromosome 8p23.1 consist of a block of alpha-defensin genes and a copy number variable block of beta-defensin genes. Only one of the alpha-defensins varies in copy number (DEFA1A3) due to its propensity for tandem duplication [21, 22]. However, the entire beta-defensin cluster shows copy number variation [16, 17, 19] and the presence of multiple and diverging copies of these genes may be important to boost host defence.
The olfactory receptors are often found as tandemly repeated sequences and there are approximately 400 apparently functional genes in humans, as well as an equivalent number of pseudogenes [23, 24]. These genes are also proportionally over-represented at regions that show copy number variation as well as regions that show segmental duplications. There should be positive selection to maintain a highly variable repertoire of olfactory receptors to allow the recognition of a wide array of odorant molecules, although it is apparent that primates have a smaller number of functional receptors in comparison to other mammals, possibly due to their acquisition of tri-chromatic vision [26, 27].
If the DUB/USP17 genes have co-evolved with these genes, it could suggest they are under similar evolutionary pressures to the beta-defensin and olfactory receptor genes. However, the observations that they have not diverged significantly, that their repertoire has not spread beyond chromosomes 4 and 8, and that on chromosome 8 all but one DUB/USP17 gene copy in each repeat block has been inactivated collectively suggest this is not the case. Indeed, it has been observed that in regions of copy number variation, which are inherently unstable, the selection pressures to get rid of unnecessary additional copies of the olfactory receptors and other genes are reduced, and this may well account for their over-representation. It is also interesting to note that each of the beta-defensin clusters maintains a DUB/USP17 gene with an intact ORF, suggesting there may be positive selection to maintain an active member in each repeat of this copy number variable repeat, especially if, as hypothesised, this is the ancestral sequence.
Moreover, unlike the beta-defensins and olfactory receptors, the known function of the DUB/USP17 family genes would suggest there should be no selective pressure for the acquisition of additional family members, and as yet we have no experimental evidence that any of the intact ORFs, other than DUB-3 and USP17, are active. In fact, several lines of evidence suggest that the DUB/USP17 family regulate cell growth and survival, processes which are controlled by delicately balanced systems which can be easily disrupted and would not benefit from additional gene expression. In particular, DUB-1 expression results in cell cycle arrest prior to S-phase, DUB-2 expression markedly inhibits apoptosis induced by cytokine withdrawal, and we have previously reported that constitutive expression of DUB-3 blocks cell proliferation [6, 9, 28] through its regulation of the ubiquitination and activity of the 'CAAX' box protease RCE1 [9, 29]. In addition, it has also been observed that overexpression of USP17 family members can lead to apoptosis, and most recently it has been reported that DUB-3 regulates the ubiquitination and stability of CDC25A and thereby the progression of the cell cycle. Indeed, previous studies of RS447 would suggest mechanisms are present to regulate the transcript levels of DUB/USP17 genes. In particular, previous studies have indicated that cosmid vectors containing significantly different copy numbers of the RS447 repeat may produce similar levels of DUB/USP17 protein due to the production of anti-sense transcripts. In addition, it has been reported that some copies of the RS447 sequence can be methylated. Therefore, it is probable that the expression of the DUB/USP17 genes is tightly controlled and their existence within a highly polymorphic sequence is not so much a reflection of the selective evolutionary pressures on these genes, but a consequence of the unstable and copy number variable region of the genome in which they have evolved.
However, having now established, using the current human genome build, that each copy of the copy number variable beta-defensin cluster contains a block of DUB/USP17 sequences, it would also be interesting to examine the association of their copy number with immune-related diseases. Previously, it has been demonstrated that higher copy numbers of this cluster are associated with protection against Crohn's disease, something which was hypothesised to result from the increased barrier to infection resulting from overproduction of the beta-defensins. In addition, higher copy numbers have also been associated with an increased risk of psoriasis. The DUB/USP17 family members are cytokine induced genes [3–6] which will be expressed in many immune cell types and could modulate the immune response. Therefore, their presence within this cluster could be related to its association with these immune-related diseases. As a result, it may be informative to examine the overall copy number of these genes and see if there is any association with these diseases.