Insights into the ancestral organisation of the mammalian MHC class II region from the genome of the pteropid bat, Pteropus alecto

Bats are an extremely successful group of mammals and possess a variety of unique characteristics, including their ability to co-exist with a diverse range of pathogens. The major histocompatibility complex (MHC) is the most gene dense and polymorphic region of the genome and MHC class II (MHC-II) molecules play a vital role in the presentation of antigens derived from extracellular pathogens and activation of the adaptive immune response. Characterisation of the MHC-II region of bats is crucial for understanding the evolution of the MHC and of the role of pathogens in shaping the immune system. Here we describe the relatively contracted MHC-II region of the Australian black flying-fox (Pteropus alecto), providing the first detailed insight into the MHC-II region of any species of bat. Twelve MHC-II genes, including one locus (DRB2) located outside the class II region, were identified on a single scaffold in the bat genome. The presence of a class II locus outside the MHC-II region is atypical and provides evidence for an ancient class II duplication block. Two non-classical loci, DO and DM and two classical, DQ and DR loci, were identified in P. alecto. A putative classical, DPB pseudogene was also identified. The bat’s antigen processing cluster, though contracted, remains highly conserved, thus supporting its importance in antigen presentation and disease resistance. This detailed characterisation of the bat MHC-II region helps to fill a phylogenetic gap in the evolution of the mammalian class II region and is a stepping stone towards better understanding of the immune responses in bats to viral, bacterial, fungal and parasitic infections.


Background
Bats are the second most species-rich group of mammals after rodents, accounting for approximately 20% of all classified living mammals [1]. Bats are in the order Chiroptera and can be further classified into two suborders; Yinpterochiroptera, which includes all of the megabats and four microbat families (Rhinopomatidae, Rhinolophidae, Hipposideridae, and Megadermatidae), and Yangochiroptera which includes all remaining microbat families [2,3]. Relative to other species of similar body size, bats have a longer lifespan [4][5][6][7][8] and are one of the most successful groups of mammals having evolved to fill a variety of ecological niches across all continents, with the exception of the polar regions [1].
In recent years, bats have been increasingly recognised for their role in maintaining numerous pathogens (viruses, parasites and bacteria), with new pathogens being identified each year [9][10][11][12][13][14][15][16][17][18]. In addition to their association with viruses, bats are also host to a variety of other pathogens, including bacteria and parasites of zoonotic potential. A high prevalence of the bacteria Bartonella sp. and Leptospira has been reported in a variety of bats [14][19][20][21]. Intracellular hemosporidian parasites (including Plasmodium, Polychromophilus, Nycteria, and Hepatocystis) have also been identified across both suborders of bats [16]. Despite the number of pathogens that have been linked to bats, they rarely cause any clinical signs of disease in bats, a characteristic that has been hypothesised to be associated with a unique immune system [16][22][23][24][25][26][27][28][29]. Curiously, the ability of bats to control intracellular pathogens may not extend to extracellular pathogens, including some bacteria and fungi [30]. The fungal pathogen Pseudogymnoascus destructans, responsible for white nose syndrome (WNS), is an extreme example of an extracellular pathogen that is capable of causing devastating disease in North American microbats [31,32].
The MHC is the most gene-dense and polymorphic region of the genome and the majority of genes encoded within this region play a role in immune defence. The MHC region of eutherian mammals can be broadly divided into three regions, class I, class II and class III. The class I and II genes are highly polymorphic and evolve through gene duplication and conversion in response to strong selection pressure by pathogens [33,34]. The organisation of the MHC region is highly dynamic and has been reorganised throughout vertebrate evolution as species evolve and adapt to new pathogenic and environmental pressures [35,36]. Understanding the evolution of the MHC region has the potential to provide valuable insights into host-pathogen evolution. In most eutherian mammals studied to date, the MHC-II region is highly conserved, spanning ~0.5 megabases (Mb) in pig to ~1.4 Mb in the horse genome [36][37][38][39][40][41]. The MHC-II region can be further divided into the extended and classical sub-regions, with all MHC-II genes localised in the classical sub-region [42]. Vital antigen-processing (AP) genes for the class I presentation pathway, such as proteasome subunit β types 8 and 9 (PSMB8 and 9), transporter associated with antigen processing 1 and 2 (TAP1 and 2) and Tapasin (TAPBP), are also found within the class II region, forming the AP gene cluster (DOB-TAP2-PSMB8-TAP1-PSMB9-DMB-DMA-BRD2 (bromodomain-containing protein 2)-DOA) [38,42]. Numerous autoimmune diseases have also been associated with genes found within the MHC-II region [43].
MHC-II molecules are heterodimers consisting of non-covalently linked α and β chains encoded by separate genes within the MHC-II region. They are expressed only on the surface of antigen presenting cells, such as B cells, monocytes, macrophages and dendritic cells, and accommodate antigens of 11 to 20 amino acid residues in length [44]. MHC-II genes can be further divided into classical and non-classical class II genes. The polymorphic classical MHC-II molecules (DP, DQ and DR) are responsible for the presentation of extracellular antigens (usually of bacterial, fungal and parasitic origin) to activate CD4+ (cluster of differentiation 4) T helper cells [45]. Activation of CD4+ T helper cells in turn coordinates the antibody- and cell-mediated immune responses. The non-polymorphic, non-classical class II molecules (DM and DO) do not present antigens, but instead play an important role in antigen processing [46].
Despite the importance of the MHC-II region in disease resistance, few studies have described MHC-II genes in bats. The limited work that has been reported to date has focused on the DRB locus of a variety of microbats and one species of megabat (Rousettus aegyptiacus) [47]. Extreme differences in MHC-II allelic polymorphism have been observed between different microbat species, with possible links between variation in population size, environmental and pathogen pressure [48][49][50][51][52]. Correlations between specific DRB alleles, ectoparasite load and reproductive state were also identified in the insectivorous bat, Noctilio albiventris [50,53,54].
Recently we described the class I region of the Australian black flying fox, Pteropus alecto, revealing a relatively condensed, yet conserved MHC-I region [55]. Comparative analysis of the bat MHC-I region provided insights into the evolution of the mammalian MHC and resulted in the identification of MHC-I genes with unique insertions in their peptide-binding groove (PBG). To build on this work and obtain deeper insights into the evolution of the MHC region, we describe the organisation of the MHC-II region of P. alecto, with the aid of its whole genome sequence [56]. As bats are an ancient lineage of mammals, having diverged from other eutherian mammals approximately 88 million years ago (mya) [56], the bat MHC region fills an important phylogenetic gap in the evolution of the mammalian MHC. Furthermore, the identification of MHC-II genes is the first step towards understanding the role of MHC-II genes in defence against extracellular pathogens, including bacteria and parasites. To our knowledge, this is the first detailed analysis and characterisation of the MHC-II region and its content in any species of bat.

Bat genome data and annotation
The recently completed P. alecto genome was interrogated for MHC-II, AP and conserved class II flanking genes using the BLAST algorithm [57]. A single scaffold containing MHC-II related genes was re-annotated manually using GENSCAN [58] for gene prediction and gene identity confirmed using BLAST [57] against the NCBI database. The newly annotated MHC-II region was then visualised and analysed using Clone Manager Professional 9 (Sci-Ed Software, Denver CO USA) and Geneious version R7 (Biomatters Ltd, Auckland NZ).

Comparative analysis of the bat MHC-II region and genes
The human (Homo sapiens), horse (Equus caballus) and pig (Sus scrofa) MHC-II regions were adapted from the Ensembl annotation (versions GRCh37.p11 for human, EquCab2 for horse and Sscrofa10.2 for pig) for comparative analysis with the bat (P. alecto) MHC-II region using EasyFig software [59].

Promoter analysis
The regions 600 bp upstream of human MHC-II genes (HLA-DOA, −DOB, −DPA1, −DPB1, −DQA1, −DQA2, −DQB1, −DQB2, −DRA, −DRB1 and -DRB5) were retrieved from Ensembl (version GRCh37.p11). The corresponding regions were obtained for the bat MHC-II genes. The promoter regions of the bat class II genes were analysed by comparison to the human genes. All sequences upstream from the start codon were analysed using Clone Manager Professional Version 9 software to manually identify putative promoter S-X-Y motifs. Sequences were then collated and aligned, and sequence logos [60] of the S-X-Y motifs were generated using the Geneious version R7 software package (available from http://www.geneious.com/).
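As an illustration of the retrieval step described above, the following minimal Python sketch extracts the bases immediately upstream of a start codon from a scaffold sequence, taking strand into account. It is not the procedure used in this study (sequences were retrieved from Ensembl and examined in Clone Manager); the function names, the 0-based coordinate convention and the toy sequences are assumptions made for the example.

```python
# Illustrative sketch only: function names and coordinate conventions are
# assumptions for this example, not part of the published workflow.

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_region(scaffold: str, start_pos: int, strand: str,
                    length: int = 600) -> str:
    """Return up to `length` bases immediately upstream of a start codon.

    `start_pos` is the 0-based forward-strand index of the base that pairs
    with the first base (A) of the ATG. For '+' genes this is the A itself;
    for '-' genes it is the highest forward-strand coordinate of the codon.
    The result is always written 5'->3' on the gene's own strand.
    """
    if strand == "+":
        return scaffold[max(0, start_pos - length):start_pos]
    if strand == "-":
        return reverse_complement(scaffold[start_pos + 1:start_pos + 1 + length])
    raise ValueError("strand must be '+' or '-'")


# Toy forward-strand gene: ATG begins at index 8.
forward_example = upstream_region("GGGGTTTTATGCCC", 8, "+", length=4)

# Toy reverse-strand gene: CAT (reverse complement of ATG) occupies indices 3-5.
reverse_example = upstream_region("CCCCATAAGG", 5, "-", length=4)
```

The default of 600 bases mirrors the window stated in this section; the window can be changed through the `length` argument.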

Gene and phylogenetic analysis
MEGA software version 5.2.1 [61] was used for all gene and phylogenetic analyses. Bat MHC-II and AP nucleotide sequences were aligned based on the protein alignment to retain codon positions with human HLA sequences as reference using MUSCLE [62]. All putative interaction sites within MHC-II genes were predicted based on Marsh et al. [63] and Bondinas et al. [64]. Corresponding nucleotide alignments were used for phylogenetic analysis using the Maximum Likelihood (ML) model with discrete Gamma distribution and 1000 bootstrap replicates [65][66][67]. The "Find Best Model (ML)" function was used to determine the appropriate substitution models for each dataset. The model with the lowest Bayesian Information Criterion (BIC) score was considered to best describe the substitution pattern for that dataset and was subsequently chosen for phylogenetic analysis. Neighbour Joining (NJ) [68] and Minimum Evolution (ME) [69] trees, with 1000 bootstrap replicates, were also constructed to corroborate with the ML trees. Tree Explorer was used for tree visualisation and illustration. Base-By-Base [70] was used to determine nucleotide and amino acid sequence identity between the bat and other mammalian MHC-II and AP genes.
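The model selection rule described above (choose the substitution model with the lowest BIC score) can be sketched in Python as follows. This uses the standard definition BIC = -2 ln L + k ln(n); it does not reproduce how MEGA counts parameters or sites internally, and the model names, log-likelihoods and parameter counts shown are invented purely for illustration.

```python
import math


def bic(log_likelihood: float, n_params: int, n_sites: int) -> float:
    """Bayesian Information Criterion: -2 ln L + k ln(n)."""
    return -2.0 * log_likelihood + n_params * math.log(n_sites)


def best_model(fits: dict[str, tuple[float, int]], n_sites: int) -> str:
    """Return the name of the model with the lowest BIC.

    `fits` maps a model name to (log-likelihood, number of free parameters).
    """
    return min(fits, key=lambda name: bic(fits[name][0], fits[name][1], n_sites))


# Hypothetical values, not results from this study.
example_fits = {
    "JC": (-1250.0, 10),
    "K2+G": (-1200.0, 12),
    "GTR+G": (-1195.0, 18),
}
chosen = best_model(example_fits, n_sites=270)
```

In this made-up example the richest model has the highest likelihood, but its additional parameters are penalised, so an intermediate model is selected.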

Accession numbers
The GenBank (http://www.ncbi.nlm.nih.gov/Genbank) accession numbers and Ensembl (http://asia.ensembl.org/index.html) transcript ID for the genes and gene products discussed in this paper are P. alecto scaffold202

Results and discussion
The bat MHC region is contracted but conserved in content
The mammalian MHC-II region is divided into a classical region containing all of the MHC-II genes and an extended sub-region containing antigen processing genes [42]. A single scaffold (scaffold202) spanning 5,012,706 basepairs (bp) identified in the P. alecto genome [56] contained a 1,262,706 bp region corresponding to the entire MHC-II region. A total of 47 loci including 12 MHC-II genes, five AP genes and 30 other genes were identified, spanning from ~360 kilobases (kb) upstream of the extended class II sub-region to the end of the classical class II sub-region (Fig. 1). Two open reading frames (ORFs) with no homologues were also predicted. The coordinates and accession numbers of the predicted genes are summarised in Table 1. Genes were annotated based on their similarity to orthologous genes in other species. MHC-II genes were named according to the nomenclature proposed by Klein et al. [71] and their evolutionary relationships with MHC-II families (described in the phylogenetic analysis below).
The bat classical and extended MHC-II sub-regions, bordered by butyrophilin-like protein 2 (BTNL2) and kinesin family member C1 (KIFC1), were highly contracted, spanning ~0.78 Mb compared to ~1.3-2.2 Mb in other mammals [38,39,42,72,73]. The pig (Sus scrofa) is the only other mammalian species known to have a contracted MHC-II region (~0.5 Mb) [41,74]. A similar pattern has previously been described for the MHC-I region, which is contracted in both bats and pigs [55]. The contracted MHC region of bats and pigs is consistent with their smaller genome sizes (~2.3 gigabases (Gb) for bats and ~2.7 Gb for pigs) compared to humans and other mammals, which have an average genome size of ~3.5 Gb [56,75].
The organisation of the bat MHC-II region was compared with corresponding regions from the human, horse and pig genomes. The human MHC region was used as a reference since it is the best characterised, the horse was included due to its close phylogenetic relationship with bats, and the pig was included due to the similarity in size of its MHC to that of bats [56,76]. The organisation of the bat MHC-II region is highly syntenic with the human, horse and pig MHC-II regions (Fig. 2). The mammalian classical class II sub-region, between reference framework genes BTNL2 and collagen, type XI, alpha 2 (COL11A2), is highly conserved in eutherians and contains all of the MHC class II genes [36,37,40]. Although this region is highly contracted in P. alecto, the gene order and content remain conserved, including the presence of the entire AP gene cluster (DOB-TAP2-PSMB8-TAP1-PSMB9-DMB-DMA-BRD2-DOA). Eleven MHC-II genes were found within the bat classical class II sub-region, compared to 16 in human [38,42], seven in horse [39] and 19 in pig [41]. The extended sub-region gene organisation, between reference framework genes COL11A2 and KIFC1, is also highly conserved, with contraction observed in the bat (~200 kb) compared with human (~250 kb), horse (~250 kb) and pig (~400 kb). The essential AP gene, TAPBP, was also highly conserved in terms of location and orientation.
The bat class II region contains all of the known classical class II gene families (DP, DQ and DR) that are responsible for antigen presentation. The DM and DO genes, which encode non-classical class II molecules and play a vital role in peptide loading onto classical class II molecules, were also present in the bat MHC-II region. Bats appear to lack functional DP α and β chains, with only a single DPB1 locus encoding a partial DP β chain sequence, likely corresponding to a pseudogene (Fig. 3). The significance of this finding has yet to be elucidated. However, the overall genetic diversity of the DP loci in mammals is generally lower than that of their DQ and DR counterparts (human, IPD-IMGT/HLA v3.27; pig [74]). The bat DP locus may historically have been under less selective pressure, contributing to lower DP diversity [34]. Alternatively, low recombination within the bat MHC-II region may have resulted in fewer DP loci compared to other mammals [33]. As illustrated in Fig. 3, pairs of MHC-II genes are encoded adjacent to each other, with the exception of DOA and DOB. Horse was omitted from this analysis due to incomplete annotation of its MHC-II genes. The locations of the essential non-classical class II genes, DM and DO, are also well conserved in the bat. An intriguing finding is that P. alecto possesses two potentially functional copies of DQA and DQB. This finding is consistent with previous observations of multiple DQ genes in herbivores, including horses [39], sheep [73] and pandas [77], but not in non-herbivorous mammals, such as humans, pigs and dogs.
Wan et al. [77] speculated a possible link between multiple functional DQ loci and herbivory. Possession of a larger repertoire of functional DQ loci in herbivores could potentially confer disease resistance to bacterial, fungal and parasitic infection, which are encountered more frequently through ingestion of plants than meat [77]. An unusual finding was the identification of an MHC-II gene outside the MHC-II region. A gene designated DRB2 was located ~355 kb upstream of SYNGAP1, which marks the end of the extended class II sub-region in other species, and ~580 kb from the nearest class II gene (DPB) (Fig. 1). Comparative analysis of the composition of class II region of P. alecto with that of other mammals illustrates the unusual nature of the presence of a class II gene (light blue block) in this location in the bat genome (Fig. 3). An MHC-II gene located outside the conserved framework genes COL11A2 and BTNL2 has not been described in any other eutherian mammal to date. However, differences in the organisation of the class II region have been reported in marsupials. The opossum has a combined class I/II region and tammar wallaby class II genes are located in two regions of the genome, thus providing evidence for the presence of class II genes outside the class II region prior to the divergence of marsupials and eutherians [72,78]. The bat DRB2 gene appears to be a pseudogene and may therefore be a remnant of an ancient class II duplication block that became extinct during mammalian evolution.

Table 1 notes: Locus tags refer to annotations in the P. alecto whole genome. ψ represents putative pseudogenes.

Fig. 2 Comparative gene maps of bat MHC-II region (centre) against human, horse and pig MHC-II regions. Red arrows represent classical MHC-II genes, green arrows represent non-classical MHC-II genes, blue arrows represent flanking MHC-II region genes and purple arrows represent essential AP genes [37]. The areas highlighted in purple represent the antigen-processing (AP) cluster. ψ represents putative pseudogenes. The direction and orientation of the MHC-II regions relative to the telomere and centromere are shown. The human, horse and pig gene maps were adapted from the Ensembl annotation.
Bat MHC-II genes
[...], which is within the range of two pseudogenes in pigs to eight in humans. Five MHC-IIA and four MHC-IIB transcripts previously identified in a P. alecto transcriptome dataset [76] corresponded to MHC-II genes identified in this study, thus providing further evidence that these loci encode functional genes (Additional file 1). Alignments of deduced protein sequences of bat MHC-IIA and IIB genes, with sequences from other mammals, revealed the presence of conserved cysteine residues, peptide-binding and CD4 interaction sites in the bat class II proteins (Additional files 2 and 3 respectively). The conserved location of cysteine residues responsible for intra-chain disulphide bonds in the bat class II sequences is consistent with conservation of the 3D structure with human class II molecules [44].

Sequence similarity and phylogenetic analysis of bat MHC-II genes
Overall, the bat MHC-IIA and -IIB genes are highly conserved with those of other mammals (Additional file 2B). The α1 and β1 domains which form the antigen binding region shared consistently lower nucleotide similarity compared to the α2 and β2 domains (76-98 and 61-92% respectively), reflecting the evolution of the antigen binding region in response to pathogen pressure (Additional files 2 and 3). DOB was the only exception, sharing higher sequence similarity in the β1 domain compared to the β2 domain (Additional file 3B). In humans and mice, DO negatively regulates DM by stably associating with DM to inhibit peptide loading, thus affecting the peptide repertoire presented to T cells [79]. The high conservation of the bat DOB β1 domain may reflect a similar role for this molecule in P. alecto. Amino acid similarity followed similar trends to those described above for nucleotide similarity across the α2 and β2 domains.
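The per-domain similarity comparison above rests on pairwise identity between aligned sequences. A minimal Python sketch of such a calculation is given below; it is not the Base-By-Base implementation used in this study, and the choice to skip any column containing a gap, together with the toy sequences, is an assumption made for the example.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity between two aligned sequences of equal length.

    Columns in which either sequence has a gap are skipped (an assumption;
    other tools may count gapped columns as mismatches).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned sequences, not real MHC-II data.
ungapped = percent_identity("ATGCATGC", "ATGAATGC")
gapped = percent_identity("ATG-CA", "ATGTCG")
```

To compare domains separately, the same function could be applied to slices of the alignment corresponding to each exon or domain, provided the domain boundaries in alignment coordinates are known.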
MHC-II genes evolve via the birth-and-death process [34] but their gene turnover is much slower compared to their class I counterparts, resulting in orthologous relationships between class II genes from different species [33]. Phylogenetic analyses of the five bat class IIA genes (α subunit; Fig. 4a) and seven bat class IIB genes (β subunit; Fig. 4b) were performed using exon 3 (α2 or β2 domains) nucleotide sequences to include all functional genes and putative pseudogenes. Since the α2 and β2 domains are not involved in antigen presentation, they evolve in the absence of pathogen selection pressures, thus providing the most accurate representation of the phylogenetic relationships of MHC-II genes. As shown in Fig. 4, the bat class II genes formed orthologous relationships with corresponding class II genes from other species. The bat DMA, DOA, DOB, DRB1 and DRB2 genes were most closely related to those of the horse, as expected, the two species sharing a common ancestor ~88 mya [56]. The remaining bat class II genes cluster with corresponding sequences from the laurasiatherian mammals (Fig. 4). This result concurred with phylogenetic trees in Kupfermann et al. (1999), demonstrating that MHC-II intron sequences from two species of microbats and one megabat cluster in a species-specific manner, similar to other mammals. Interestingly, Ptal-DQA1 is basal to Ptal-DQA2 and other mammalian DQA genes analysed, possibly reflecting its ancestral nature and the evolution of Ptal-DQA2 through a recent duplication event.

Analysis of the bat MHC-II promoter region
The transcription of MHC-II genes is regulated by conserved sequences in the proximal promoter regions (S, X and Y boxes) [80,81]. These motifs are highly conserved among class II loci, both within and across mammalian species, and are necessary for optimal constitutive and cytokine induced gene expression. Transcription is largely regulated by the class II transactivator (CIITA), which interacts with several transcription factors, particularly those that bind to the SXY motif [82]. To identify regulatory motifs in the bat class II sequences, the region 500 bp upstream of the translation start site of the 12 bat MHC-II genes was analysed. Using manual examination and annotation, with reference to other mammalian promoter elements, the S-X-Y motifs were identified in seven of the 12 bat class II genes (Ptal-DMA, −DOA, −DOB, −DQA2, −DQB1, −DQB2 and -DRA), with coordinates within the class II genomic region summarised in Table 2. No S-X-Y motifs could be identified for Ptal-DMB, −DPB1, −DQA1, −DRB1 or -DRB2. No CAAT or TATA boxes were identified in any of the bat MHC-II genes, similar to class II genes from the Tasmanian devil [83]. Although putative CAAT and TATA boxes are present in human MHC-II genes, their functional significance is unknown [81]. Successful identification of putative promoter elements provides further evidence that at least seven of the bat class II genes are likely functional and is consistent with evidence for the presence of two pseudogenes, DRB2 and DPB1.
Using the 11 human MHC-II genes (HLA-DOA, −DOB, −DPA1, −DPB1, −DQA1, −DQA2, −DQB1, −DQB2, −DRA, −DRB1 and -DRB5) for comparison, sequence logo diagrams of the S-X-Y motifs of the bat and human class II genes were generated (Fig. 5). The distance between the S and X motifs ranged from 1 to 25 bp in the bat α and β chains compared to 1 to 16 bp in the human class II genes. In human B cells, class II genes with greater distance between S and X motifs have higher constitutive expression [84]. Whether the greater S-X distance observed in the bat genes is indicative of higher expression in bat cells remains to be determined. The X and Y motifs interact with DNA-binding heterotrimers RFX [85] and NF-Y [86,87] respectively, which are subsequently bound by CIITA acting as a master regulator of MHC-II expression [88]. The distance between the X and Y motifs in the bat, which ranged from 16 to 25 bp, overlaps that of human class II genes (17 to 18 bp), consistent with bats potentially using similar factors (transcriptional enhancers or repressors) to regulate their class II gene expression.
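The inter-motif distances discussed above can be derived from motif coordinates such as those summarised in Table 2. The Python sketch below shows one way to do this, assuming 1-based inclusive coordinates and measuring distance as the number of bases between the end of one motif and the start of the next. The text does not state how distance was measured, so this convention, the function names and the example coordinates are all assumptions.

```python
def spacer_length(upstream_motif: tuple[int, int],
                  downstream_motif: tuple[int, int]) -> int:
    """Number of bases between two motifs given as (start, end),
    1-based inclusive, on the same strand."""
    gap = downstream_motif[0] - upstream_motif[1] - 1
    if gap < 0:
        raise ValueError("motifs overlap or are given out of order")
    return gap


def sxy_spacing(s_box: tuple[int, int], x_box: tuple[int, int],
                y_box: tuple[int, int]) -> dict[str, int]:
    """Return the S-X and X-Y spacer lengths for one promoter."""
    return {
        "S-X": spacer_length(s_box, x_box),
        "X-Y": spacer_length(x_box, y_box),
    }


# Hypothetical coordinates, not taken from Table 2.
example = sxy_spacing(s_box=(100, 106), x_box=(120, 137), y_box=(156, 165))
```

With these invented coordinates the S-X spacer is 13 bp and the X-Y spacer is 18 bp.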

Analysis of the Bat Antigen-Processing (AP) genes
The AP gene cluster (DOB-TAP2-PSMB8-TAP1-PSMB9-DMB-DMA-BRD2-DOA) resides within the bat classical class II sub-region as discussed above (Figs. 1 and 2). Although this sub-region is considered to be highly polymorphic due to its MHC-II gene content, the organisation of the AP cluster is highly conserved across most mammals [36,37,40,42,72,77,89]. The bat AP gene cluster is contracted compared to that of humans (~180 kb vs ~200 kb respectively), but remains highly conserved in synteny (Fig. 2). This conservation is essential in ensuring successful cleavage of peptides by PSMBs and subsequent peptide transportation for processing by TAPs via the class I presentation pathway. Detailed analysis of bat AP genes (PSMB8, PSMB9, TAP1, TAP2 and TAPBP) within the class II region revealed high genetic conservation compared with other mammals. Sequence alignments of AP genes indicated high nucleotide and amino acid similarity (Additional file 4) with sequences from other mammalian species. This would suggest that AP genes in bats are functional, therefore ensuring proper processing and loading of peptides onto bat MHC-I molecules. Together, these results are consistent with the presence of both functionally and structurally conserved bat MHC-II molecules, with similar antigen-presenting capabilities and properties to those found in other mammals.

Conclusion
The bat MHC-II region is condensed in size but highly conserved with that of other eutherian mammals. At least 12 MHC-II genes are present in the bat genome, two of which (DPB1 and DRB2) appear to be pseudogenes, with the DPA locus remaining elusive. All identified bat class II loci have relatively conserved gene structures and were orthologous to other mammalian class II loci. Bats also possess an atypical MHC-II region, with at least one MHC-II gene (DRB2) located outside the class II region. The presence of a class II gene outside the MHC-II region is a first for a eutherian mammal and provides evidence for an ancient class II duplication block. This first resolution of the bat MHC-II region contributes valuable information on the comparative genomic evolution of the mammalian immune system. Detailed analysis of AP genes further suggests highly conserved and functional MHC-I and -II antigen presentation pathways, supporting the importance of this region in disease resistance in general. The characterisation of genes within the MHC-II region provides the first step towards understanding their roles in resistance to disease in response to the high diversity of viruses, bacteria and parasites identified in bats. Further studies to examine the presentation of peptides associated with extracellular pathogens by MHC-II molecules may contribute to our understanding of the response of bats to infection with extracellular pathogens. This study also provides the basis for further characterisation of the diversity of individual MHC-II genes in P. alecto and the role of selective pressures, including pathogens, in shaping their diversity. Detailed characterisation of MHC-II regions from different bat species will also be required to determine whether the architectural pattern observed in P. alecto applies across other bat species.