A complete DNA sequence map of the ovine Major Histocompatibility Complex

Background: The ovine Major Histocompatibility Complex (MHC) harbors clusters of genes involved in the overall resistance/susceptibility of an animal to infectious pathogens. However, only a limited number of ovine MHC genes have been identified, and far less sequence information is available than for swine and cattle. We previously constructed a BAC clone-based physical map that covers the entire class I, class II and class III regions of the ovine MHC. Here we describe the assembly of a complete DNA sequence map for the ovine MHC by shotgun sequencing of 26 overlapping BAC clones.
Results: DNA shotgun sequencing generated approximately 8-fold genome-equivalent data that were successfully assembled into a finished sequence map of the ovine MHC. The sequence map spans approximately 2,434,000 nucleotides, covering almost all of the MHC loci currently known in sheep and cattle. Gene annotation resulted in the identification of 177 protein-coding genes/ORFs, of which 145 were not previously reported in sheep and 10 were ovine-specific, absent in cattle or other mammals. Comparative sequence analysis among human, sheep and cattle revealed high conservation of MHC structure and locus order except for class II, which is divided into IIa and IIb subregions in sheep and cattle, separated by a large piece of non-MHC autosomal sequence of approximately 18.5 Mb. In addition, a total of 18 non-protein-coding microRNAs were predicted in the ovine MHC region for the first time.
Conclusion: An ovine MHC DNA sequence map was successfully assembled by shotgun sequencing of 26 overlapping BAC clones. This makes the sheep the second ruminant species, after cattle, for which complete MHC sequence information is available for evolutionary and functional studies. The results of the comparative analysis support the hypothesis that an inversion of the ancestral chromosome containing the MHC has shaped the MHC structures of ruminants, as currently observed in sheep and cattle. Identification of a relatively large number of microRNAs in the ovine MHC region helps to provide evidence that microRNAs are actively involved in the regulation of MHC gene expression and function.


Background
The sheep is one of the major domestic animal species, providing meat and milk for human consumption and wool as a source of industrial fiber. The Major Histocompatibility Complex (MHC) of the sheep, also designated ovine Lymphocyte Antigen (OLA), harbors clusters of immunological genes involved in the overall resistance/susceptibility of the animal to infectious diseases [1][2][3]. A number of agriculturally important traits, especially those related to resistance to various pathogenic viruses, bacteria and parasites, are closely linked to genes in the MHC [4][5][6]. Furthermore, genetic loci in the MHC are organized into distinct functional clusters, designated class I, class II, and class III, which show a considerable level of conservation among mammalian species [7][8][9][10][11][12][13][14][15][16][17][18][19]. The importance of sheep MHC molecules in disease resistance [6][20][21][22][23] and the associated structural features in artiodactyls have led to increased study of the sheep MHC [5][21][24][25][26]. However, detailed sequence information for the ovine MHC is still lacking, and only a small number of ovine MHC genes have been identified compared with swine and cattle.
Studies of the ovine MHC also provide valuable information on comparative genome evolution in mammals. The extremely high level of polymorphism observed for MHC loci may be an evolutionary consequence of intensive interactions between infectious pathogens and the host defense system [7]. Haplotype differences among breeds add another level of complexity. Previous studies on the OLA have largely focused on the gene content and polymorphisms of the class II region [27][28][29][30][31][32]. Genetic linkage studies suggest that the ovine MHC has a special feature in that class II is divided into two subregions, similar to that of cattle [33][34][35][36][37]. However, with the limited sequence information available for the sheep, such structural features cannot be adequately assessed by comparison with those of cattle.
We previously constructed a BAC-clone-based physical map of the ovine MHC for Chinese merino fine-wool sheep [26], a valued sheep breed predominant in Northwest China, especially in the Xinjiang Uygur autonomous region. The DNA used for BAC library construction was obtained from a heterozygous Chinese merino male, a merino ram that shares less than 1/32 of its blood with a local Chinese sheep breed. The BAC clone resource we established facilitates physical map construction for the sheep MHC and for the whole sheep genome, and serves as a reference framework for subsequent sequencing. To facilitate the DNA sequencing, a BAC clone gap that previously existed between the loci Notch4 and Btnl2 was closed by the addition of two more overlapping BAC clones [38].
Here we describe the shotgun sequencing of the 26 BAC clones covering the entire ovine MHC, the assembly of the sequence data into a finished DNA sequence map as guided by the physical map, and the sequence analysis that resulted in the identification and annotation of 177 genes and 18 microRNAs in the ovine MHC region.

Results and Discussion
DNA shotgun sequencing was successfully performed for 26 overlapping BAC clones, generating approximately 8-fold genome-equivalent coverage. The fully assembled sequences for all of the BAC clones were deposited in GenBank under accession numbers FJ985852-FJ985877 (Table 1). The sequence quality was adequate, with an estimated error rate of less than 0.025% for most of the BAC clones. An average of 1.3 gaps remained per BAC clone, mostly due to highly repetitive sequence. A gap here refers to a stretch of DNA for which the exact nucleotide identity (A, G, T, or C) remains ambiguous after resequencing, represented by a run of "N" characters between the determined sequences.
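The gap definition above (a run of "N" between determined bases) can be made concrete with a short sketch. This is only an illustration of the counting rule, not the procedure used in this study; the function names and the toy clone sequences are hypothetical.

```python
import re

def find_gaps(sequence):
    """Return (start, end) spans of every run of 'N' (0-based, end exclusive)."""
    return [m.span() for m in re.finditer(r"N+", sequence.upper())]

def mean_gaps_per_clone(clone_sequences):
    """Average number of N-runs per clone for a dict {clone_name: sequence}."""
    counts = [len(find_gaps(seq)) for seq in clone_sequences.values()]
    return sum(counts) / len(counts)

# Toy sequences for illustration only (not real BAC clone data).
toy_clones = {
    "clone_A": "ACGTNNNNACGTACGT",  # one gap
    "clone_B": "ACGTACGTACGT",      # no gap
    "clone_C": "NNACGTNNNACGTN",    # three gaps
}

if __name__ == "__main__":
    for name, seq in toy_clones.items():
        print(name, find_gaps(seq))
    print("mean gaps per clone:", mean_gaps_per_clone(toy_clones))
```

Counting runs rather than individual "N" characters matches the definition in the text, where one unresolved stretch is one gap regardless of its length.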
A complete DNA sequence map of the ovine MHC was successfully assembled as guided by the BAC clone physical map (Figure 1). The map spans approximately 2,434,000 nucleotide bases, covering almost all MHC loci currently known for both ovine and bovine species. The finished sequence map is discontinuous, as expected from the physical map. The major sequence segment spans approximately 2,071,000 nucleotide bases, harboring class I, class III, and class IIa of the ovine MHC. The shorter sequence segment spans approximately 363,000 nucleotide bases, harboring loci in the class IIb region and extending into the non-MHC region.
Sequence analysis resulted in the identification and annotation of 177 protein-coding genes/ORFs in the ovine MHC (Figure 1, Additional table 1). Of the 177 ovine genes identified, 131 were homologous to previously annotated genes in cattle, sheep or other mammalian species, 36 matched predicted but not yet annotated genes in cattle, and 10 were ovine-specific, not having been found in human, mouse, cattle or other mammalian sequences. The location, transcriptional orientation, and relative size of the identified genes were determined (Figure 1). A total of 145 of the identified ovine genes are reported for the first time by this study. The ovine-specific genes were provisionally named "OaN" followed by a number, where "Oa" is the abbreviation for Ovis aries and "N" stands for novel (Additional file 1). Preliminary experiments confirmed mRNA transcripts for 4 of the predicted ovine-specific genes (data not shown). The distribution of these novel genes appears to be random throughout the ovine MHC region. Interestingly, multiple DQ loci (a DQ cluster) were identified, each with a different orientation of transcription when compared with those of other sheep breeds [39,40]. Such differences may be due to either breed or haplotype differences, as a consequence of differential gene duplication [41].
An additional 18 genes encoding microRNAs were identified by software prediction, in a search for non-protein-coding genes/components using the Rfam database analysis tools (Table 2). The orientation and distribution of these microRNAs showed a random pattern throughout the MHC region. This is the first time that a relatively large number of microRNAs has been identified in the ovine MHC region. Given the functional importance of microRNAs in regulating gene expression by mRNA cleavage or repression, this preliminary finding helps to provide evidence that microRNAs may be actively involved in the MHC response to pathogens in general.
Sequence alignments among the human, sheep, and cattle MHC showed overall conservation, with the level of homology exceeding 85% for the MHC class I, class III, and part of the class II regions. The major difference in MHC structure was found in the class II region. In human it is a continuous segment with no interruption, while in sheep and cattle it is divided into IIa and IIb subregions by a large non-MHC autosomal insertion. In addition, the gene order of class IIb in both the ovine and bovine regions showed an opposite orientation relative to that of human (Figure 2).
Analysis of the sequence homology between the ovine and bovine MHC regions demonstrated remarkable conservation, with overall homology reaching 86%. The actual level of homology could be higher, because a number of gaps (over 10-40 kb) in the available bovine sequence contributed negatively to the homology score. For virtually every locus currently identified in the bovine MHC, a homologous match could be identified in the ovine MHC, including those in the class IIb region (Figure 2). It is noteworthy that the ovine and bovine MHC class IIa and IIb regions exhibited exactly the same gene order and structural layout. In addition, the non-MHC autosomal insertion between IIa and IIb was estimated to be of the same length (approximately 18.5 Mb) in both species. Furthermore, the order of bovine and ovine genetic loci within the inserted autosomal region was essentially the same, as tested by over 120 SS-PCRs (data not shown). Taken together, these results support the hypothesis that cattle and sheep shared an ancestral chromosome containing the MHC before their evolutionary divergence.
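The remark that gaps in the bovine sequence lower the homology score can be illustrated with a small percent-identity calculation over a pairwise alignment. This is a minimal sketch under the assumption that homology is scored as percent identity per aligned column; it is not the scoring used by the alignment tools in this study, and the function name and toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, skip="-N"):
    """Percent identity over aligned columns.

    Columns where either sequence carries a character listed in `skip`
    (alignment gap '-' or unresolved base 'N' by default) are excluded.
    Pass skip="" to score every column, so unresolved bases count as mismatches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a in skip or b in skip:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared

# Toy alignment for illustration only: the second sequence has an unresolved stretch.
sheep_toy = "ACGTACGTAC"
cattle_toy = "ACGTNNNTAC"

if __name__ == "__main__":
    print(percent_identity(sheep_toy, cattle_toy, skip=""))   # N columns scored as mismatches
    print(percent_identity(sheep_toy, cattle_toy))            # N columns excluded
```

In the toy case the identity rises from 70% to 100% once the unresolved columns are excluded, which is the direction of the effect described in the text.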
The hypothesis that cattle and sheep shared an ancestral chromosome was previously proposed in studies of cattle [42][43][44]. Detailed mapping of BTA23 by radiation hybrid analysis [43,45] suggested that the ancestral MHC was likely disrupted by a large inversion that produced the bovine MHC class IIa and IIb regions. With the availability of detailed sequence information from two ruminant species (bovine and ovine), the hypothesis has now gained additional support from experimental data.
Our sequence analysis also identified a butyrophilin-like (Btnl) cluster at the boundary between ovine class IIa and class III (Figure 3). Btnl is critical for milk secretion and production [46]. Comparison of Btnl locus duplication showed that sheep has a moderate number of Btnl copies, more than seen in platypus but fewer than in mouse, rat or swine, which have larger litter sizes (Figure 3). Btnl is absent in non-mammalian species such as amphioxus, frog, and chicken, appears (as Btnl2) in platypus, and is duplicated extensively in mammals that have larger litter sizes. This might indicate that milk production is closely associated with the function of the MHC in mammals, owing to the apparent need for mammals to protect their offspring from microbial infections via milk ingestion. Taken together, we propose the hypothesis that formation of the Btnl loci is associated not only with the duplication of immunological loci, but also with the emergence of mammals in evolutionary history.
Table 1 footnotes:
a Defined as the ratio between the total number of base pairs sequenced and the total number of base pairs of the insert in a given BAC clone.
b Error probability of a particular base call, corresponding to a quality value as determined by the equation $Q = -10 \log_{10}(P_e)$, where $P_e$ is the error probability.
c The total number of shotgun DNA sequencing reactions performed for a given BAC clone.
d In genomic mapping, a series of contigs that are in the right order but not necessarily connected in one continuous stretch of sequence.
e The number of regions where the exact nucleotide base (G, A, T, or C) could not be determined, represented by a strip of "N" in a given BAC clone.
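Footnotes a and b of Table 1 define coverage and the base-call quality value. The sketch below works through both definitions; the function names and the example clone numbers are hypothetical and chosen only for illustration.

```python
import math

def phred_quality(error_probability):
    """Quality value Q = -10 * log10(Pe) for a base-call error probability Pe."""
    if not 0.0 < error_probability <= 1.0:
        raise ValueError("error probability must be in (0, 1]")
    return -10.0 * math.log10(error_probability)

def error_probability(quality):
    """Inverse of phred_quality: Pe = 10 ** (-Q / 10)."""
    return 10.0 ** (-quality / 10.0)

def fold_coverage(total_bases_sequenced, insert_length):
    """Coverage as total bases sequenced divided by the BAC insert length."""
    return total_bases_sequenced / insert_length

if __name__ == "__main__":
    # An error rate of 0.025% is a probability of 0.00025 per base.
    print(round(phred_quality(0.00025), 2))
    # Hypothetical clone: 1.2 Mb of reads over a 150 kb insert.
    print(fold_coverage(1_200_000, 150_000))
```

Under this definition, the error rate of less than 0.025% reported for most clones corresponds to a quality value of about 36 or higher.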

Conclusion
A complete ovine MHC sequence map was assembled by shotgun sequencing of 26 overlapping BAC clones. This makes the sheep the second ruminant species for which the MHC sequence is available for evolutionary and functional studies. Gene annotation resulted in the identification of 177 genes, of which 145 were identified for the first time and 10 were ovine-specific. In addition, a total of 18 microRNA-coding sequences were predicted in the ovine MHC for the first time. Comparative analysis revealed remarkable conservation of MHC sequence between sheep and cattle, supporting the hypothesis that the two species shared an ancestral chromosome that shaped the ruminant MHC as currently observed. Identification of a relatively large number of microRNAs in the ovine MHC region helps to provide evidence that microRNAs are actively involved in the regulation of MHC gene expression and function.

DNA shotgun sequencing
Shotgun sequencing libraries were constructed individually for each of the 26 BAC clones following modified protocols described by the Celera Genomics Group [47]. Briefly, E. coli stocks containing the target BAC clones were used to prepare the BAC clone DNA, which was sonicated to form random small DNA fragments of 0.5-2.0 kb. After cloning of the small fragments into plasmids, random DNA sequencing was performed with an ABI 3730 automated DNA sequencer (Applied Biosystems, USA) to generate random short DNA sequence reads.

Assembling of BAC clone sequences
The short random DNA sequences generated by the sequencing were assembled into a full-length sequence for each of the BAC clones using the Phrap program (U.W., Seattle, WA, USA). Resequencing was performed when necessary for gaps detected during sequence assembly, including sequencing by primer walking of PCR-amplified fragments for regions showing a low level of accuracy. BLAST alignments [48] of the repeat-masked, assembled sequence against the NCBI EST and non-redundant nucleotide databases were performed to identify expressed sequences and other highly conserved regions likely to contain functional genes.

Sequence analysis
The assembled ovine MHC sequence was analyzed using an automatic Ensembl pipeline [49] with modifications to aid the manual curation process. Simple and interspersed repeats were detected using Tandem Repeats Finder [50] and RepeatMasker, respectively, using the mammalian library along with cow-specific repeats submitted to EMBL/NCBI/DDBJ. The combination of simple and interspersed repeats was used as a filter to mask the sequence during analysis. Novel genes or CDS loci were identified by the presence of an open reading frame (ORF) plus a certain degree of similarity to known genes or proteins. A predicted gene was defined as one having high sequence homology to a predicted gene or ORF in other species. Pseudogenes were identified by sequence homology to known pseudogenes (not shown). Comparative sequence alignments were performed using the wgVISTA pipeline (http://genome.lbl.gov/cgi-bin/WGVistaInput).
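The first criterion for a novel gene above is the presence of an open reading frame. The following is a minimal sketch of an ORF scan over all six reading frames, assuming the standard ATG start codon and the three standard stop codons. It is not the Ensembl pipeline used in this study; the function names and the `min_codons` threshold are hypothetical, and a real annotation would also require the similarity evidence described in the text.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of a DNA string (characters other than ACGT are kept as-is)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

def find_orfs(seq, min_codons=3):
    """Return ORFs as (strand, start, end, orf_sequence) tuples.

    An ORF runs from the first in-frame ATG to the next in-frame stop codon,
    stop included. Coordinates are 0-based, end exclusive, and refer to the
    scanned strand (the reverse complement for strand '-').
    """
    results = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        results.append((strand, start, i + 3, s[start:i + 3]))
                    start = None
    return results

if __name__ == "__main__":
    # Toy sequence for illustration only.
    for orf in find_orfs("CCATGAAATTTTAGCC"):
        print(orf)
```

In practice the scan would be run on the repeat-masked sequence with a much larger minimum length, and each candidate ORF would then be compared against known genes or proteins.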