HSFY genes are amplified on pig Yp
Amplified genes and gene families are a common feature of Y chromosomes in mammals, and indeed of sex chromosomes in general. The HSFY genes are an example of this in pigs. We have shown here that there are two forms of HSFY, long and short. Both forms are present at high copy number on the Y chromosome, almost entirely located within a single cytogenetic band on the short arm. Expression analysis reveals that both forms are expressed, though evidence from EST libraries and our sequencing suggests that only the short forms have coding potential. Nonetheless, pseudogenes can acquire biological functions, for example as regulatory long non-coding RNAs, and thus there remains a possibility that a function for the non-coding forms will be identified in future.
The major structural difference between the long and short forms is the presence or absence of a SINE within the intron. This SINE - Pre0_SS - is annotated in Repbase as being a still active pig lineage specific tRNA SINE . Given that we can find long and short forms in all the suiform species in this study, it is probable that the SINE originally inserted when there were a small number of HSFY copies in the ancestral genome, and subsequently both long and short copies underwent amplification. Given that we found two copies that do not cluster with long or short form, it is likely that there are other variants of HSFY not detected by our primer sets.
Estimates of the overall copy number of HSFY (Table 2, Fig. 5) suggest that there are about 100 copies in the domestic pig genome, split between long and short forms, with a bias toward the short form. This number is based on comparison to SRY, which until recently was believed to be a single copy gene in suids (see ). The other four species presented may also have two SRY copies, but the estimates we generate consider both possibilities. The other Sus member, S. celebensis, has 70 or 140 copies; again with a bias towards the short form. This suggests there has been only limited expansion of the HSFYs in either lineage from their common ancestor. The babirusa (B. babirusa) is an outgroup to the other species, and only a small number of copies were detected (1-2 short, 6-12 long); either B. babirusa has significant copy number loss or sequence divergence, or the HSFY amplifications predominantly occurred after the B. babirusa lineage diverged from other suids. The remaining two species tested provide tentative support to the latter scenario: the warthog P. africanus has a low number of copies (20–40), compared to the bushpig P. larvatus (70–140). Based on the phylogeny of these species, the most plausible explanation for this pattern is an amplification within the P. larvatus lineage. Consequently, even with the caveats of SRY copy number and broadness of primer coverage, there is evidence supporting two independent bursts of HSFY amplification within the suids.
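The dependence of these ranges on SRY copy number can be made explicit with a short calculation. The sketch below is illustrative only and is not the analysis pipeline used in this study: the function name is hypothetical, and the HSFY:SRY ratios are simply the lower bounds of the ranges quoted above, scaled by an assumed SRY copy number of one or two.

```python
def copy_number_range(ratio_to_sry, sry_copies=(1, 2)):
    """Scale an HSFY:SRY ratio by each candidate SRY copy number.

    ratio_to_sry: HSFY signal relative to SRY (assumed input, see lead-in).
    sry_copies:   SRY copy numbers to consider; one or two copies here.
    """
    return tuple(ratio_to_sry * n for n in sry_copies)


# Ratios taken as the lower bound of each range quoted in the text
# (an assumption made for illustration, not measured values).
ratios = {
    "S. celebensis": 70,
    "P. africanus": 20,
    "P. larvatus": 70,
    "B. babirusa (short)": 1,
    "B. babirusa (long)": 6,
}

estimates = {name: copy_number_range(r) for name, r in ratios.items()}

for name, (low, high) in estimates.items():
    print(f"{name}: {low}-{high} copies")
```

Doubling the assumed SRY copy number doubles every estimate, so the relative ordering of species is unaffected only if all species carry the same number of SRY copies.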
The study here has focussed on the HSFY genes. However, the FISH analysis has demonstrated that the ~5 Mb HSFY region of the Y chromosome is not solely composed of HSFY copies. The full extent of the other sequences within this region is not known, due to the difficulty of assembling highly repetitive sequences reliably. Still, there are two other identified genes close to HSFY copies that are also amplified, thought to be pseudogenes (RPS2 and XKR3-like; see also  for the complete context of the pig Y).
Amplified genes on the sex chromosomes have been associated with genomic conflicts in mice (e.g. ). These genes generally act by favouring the transmission of the chromosome on which they reside, or by suppressing the transmission of their opposite gametologue . The situation in pig is different to known genomic conflict models, however, in that there are no observed gene family expansions on the X chromosome that might be responding to the expansion on the Y (see ), and we therefore consider that a similar mechanism of genomic conflict is unlikely. The X-chromosome homologue of HSFY, HSFX, was previously predicted (Genbank: XM_005654314.1). As with HSFX/HSFY comparisons in other species (e.g. ), there is little sequence identity between the X and Y copies. Indeed, the only alignable region is the DNA binding domain. It is clear that if there is any biological role for HSFY, it has been distinct from HSFX for the majority of mammalian Y chromosome evolution.
A further possibility is that the expansion is evolutionarily neutral: a concentration of repetitive material provided a substrate for processes such as non-allelic sister chromatid exchange, causing sequence amplification without any selective pressure or biological function associated with the increase. This seems less likely; if there were no functional role for the extra HSFY copies, we would expect to see an accumulation of mutations within both short and long forms, abolishing the open reading frame. However, the short form copies appear to be predominantly translatable. The status of the promoters is not clear, given we have sequences for only a subset of the total HSFY complement: weakened or disrupted promoter activity could ‘normalise’ expression to the level of a single gene copy (and this would be consistent with the apparently lower expression levels we found from the short form; Fig. 3). Further work is required to distinguish between these possibilities.
Our tests for evidence of selection for rapid amino acid change suggested no evidence for such positive selection. However, there was strong evidence for purifying selection amongst the coding copies between Sus scrofa and the other species, and between the Sus scrofa copies themselves. This again supports the idea that the copy number of these genes is functionally relevant, and that this function is maintained amongst the suid species studied here.
Further HSFY variants may be present
Two S. scrofa non-coding HSFY variants lack a SINE, but also do not cluster with the short form copies (OTTSUSG00000005614 and 5682; orange in Fig. 3). These variants have nucleotide differences within the binding regions for the primer sets 1, and could not be detected in expression or copy number studies in S. scrofa or any other species. It is thus possible that these are two representatives of a further diversification of the HSFY family; the sampled fosmids cover only a small portion of the complete ~5 Mb HSFY-block.
One species showed a different organisation to the others: Tayassu pecari, the white-lipped peccary. Neither the consensus short form nor the long form was identified. Instead, three similar variant species-specific forms were seen (purple sequences in Fig. 4). None of these appear to have coding potential, nor is it known what the copy number of HSFY is in any of the peccary species. Both peccary species share a common ancestor after their divergence from the suids approximately 40 million years ago. Since P. tajacu has at least one each of long and short forms, it is most likely that there has been little amplification in the peccary lineage, and species-specific diversification of the HSFY copies in T. pecari. Previous comparative chromosome painting studies have suggested that the peccaries have higher rates of chromosomal rearrangement than suids. Of the two peccary species in this study, the T. pecari karyotype appears the more derived [23, 24], and this may contribute to the differences seen in T. pecari.
A single HSFY pseudocopy lies outside the main block near TSPY
One HSFY copy (OTTSUSG00000005716) in domestic pig lies outside the HSFY-block, close to TSPY. It has a premature stop codon within the DNA binding domain of the first exon, and thus cannot form a valid HSFY product, nor do we have evidence it is expressed. The sequence is similar to the short form, but clusters distinctly outside the other short forms (Fig. 3; S. scrofa lone). Its presence could be attributable to (1) an ancestral HSFY copy (many other species have multiple HSFY copies, and perhaps one of these copies gave rise to the long and short forms while the other remained unamplified); or (2) derivation from another short form copy that relocated from the HSFY block during the evolution of the pig Y chromosome. We reconstructed the series of rearrangements on the Y chromosome from the ancestral mammalian Y as described in our associated X and Y sequencing paper, but see no obvious opportunity for an HSFY copy to be relocated to the vicinity of SRY or RBMY. This does not preclude more complex undocumented rearrangements. Further cross-species cytogenetics will be able to investigate this possibility.
Comparison with cattle suggests independent amplifications
Cattle also have a documented expansion of HSFY [11, 12], also with no apparent corresponding HSFX expansion on the X chromosome. This opened the possibility that the amplification predated the bovine/suid divergence, and was then maintained in each lineage. Recent evidence from sheep has suggested that this is not the case: the cattle HSFY expansion occurred after sheep and cattle diverged about 22 million years ago, with variation in HSFY copy number between different cattle breeds. Accordingly, our alignments of pig HSFY sequences to documented cattle HSFY sequences show no evidence for the intronic SINE that distinguishes the long and short forms, and which must predate the initial amplification of the copies in pig. As a result, there are multiple lines of evidence pointing to independent amplifications of HSFY in these two lineages.
Further to this, our qPCR data provide tentative support for at least two separate amplifications of HSFY within the suids: once within the P. larvatus lineage, and again in the Sus lineage. However, this is subject to uncertainties of SRY copy number in each species and variation in qPCR primer binding sites; full confirmation of the copy numbers will require a more detailed sequencing approach to detect all variants of HSFY in each species.
From an evolutionary perspective, recurrent amplifications are of particular interest; we do not know whether the HSFY expansion is neutral, driven by chance and the genomic landscape within which they occur, or subject to selection for increased copy number, with an important biological role. In humans, the active HSFY genes are expressed in Sertoli cells and spermatogenic cells, potentially with a different role in each; in cattle, HSFY is expressed in spermatogonial and spermatocyte cells. Some evidence from cattle breeds has suggested an inverse correlation between HSFY copy number and testicular size, and a positive correlation with conception rate. It is possible that similar phenotypes will be associated with the HSFY genes in pigs. Testicle size in pigs is correlated with the levels of the hormone androstenone in body fats, which is predominantly genetically determined. High levels of androstenone contribute to an unpleasant odour in male carcasses called boar taint; currently male piglets are often castrated to reduce the risk of this taint developing. Consequently, understanding any associations of HSFY genes with fertility and testis development will be of particular interest to the animal breeding industries. However, it remains to be determined whether HSFY copy number is variable between individuals or breeds of domestic pigs.