Characterization of the equine 2'-5' oligoadenylate synthetase 1 (OAS1) and ribonuclease L (RNASEL) innate immunity genes

Background
The mammalian OAS/RNASEL pathway plays an important role in antiviral host defense. A premature stop-codon within the murine Oas1b gene results in the increased susceptibility of mice to a number of flaviviruses, including West Nile virus (WNV). Mutations in either the OAS1 or RNASEL genes may also modulate the outcome of WNV-induced disease or other viral infections in horses. Polymorphisms in the human OAS gene cluster have been previously utilized for case-control analysis of virus-induced disease in humans. No polymorphisms have yet been identified in either the equine OAS1 or RNASEL genes for use in similar case-control studies.

Results
Genomic sequence for equine OAS1 was obtained from a contig assembly generated from a shotgun subclone library of CHORI-241 BAC 100I10. Specific amplification of regions of the OAS1 gene from 13 horses of various breeds identified 33 single nucleotide polymorphisms (SNP) and two microsatellites. RNASEL cDNA sequences were determined for 8 mammals and utilized in a phylogenetic analysis. The chromosomal location of the RNASEL gene was assigned by FISH to ECA5p17-p16 using two selected CHORI-241 BAC clones. The horse genomic RNASEL sequence was assembled. Specific amplification of regions of the RNASEL gene from 13 horses identified 31 SNPs.

Conclusion
In this report, two dinucleotide microsatellites and 64 single nucleotide polymorphisms within the equine OAS1 and RNASEL genes were identified. These polymorphisms are the first to be reported for these genes and will facilitate future case-control studies of horse susceptibility to infectious diseases.

The murine flavivirus resistance gene, Flv, was positionally cloned and identified as Oas1b [12]. A cDNA sequence comparison among susceptible and resistant strains of mice identified a single nucleotide substitution that causes a premature stop codon in the Oas1b transcripts of susceptible mice [12,13].
The human OAS gene cluster, consisting of the OAS1, OAS3 and OAS2 genes, is located on chromosome 12q24.2 [14]. The small synthetases are encoded by the OAS1 gene, while the medium and large synthetases are encoded by the OAS2 and OAS3 genes, respectively. Alternative splicing has previously been reported for both OAS1 and OAS2 transcripts [15,16]. For example, the human OAS1 transcript E16 encodes the p42 protein and is a 1.6 kilobase (kb) mRNA, while the alternatively spliced E18 transcript encoding the p46 protein is about 1.8 kb [17]. The p42 and p46 proteins are identical in their first 346 N-terminal amino acids but differ at the C-terminus [18]. Variations in the human OAS1 gene that may be relevant to the outcome of virus infections have been reported [19][20][21][22][23].
The human RNASEL gene maps to chromosome 1q25 [24]. The 741-amino-acid, 83,539-Da protein is translated from a ~2.8 kb transcript [25,26]. The RNase L protein consists of three domains: 1) an N-terminal domain of ankyrin repeats with P-loop motifs between the seventh and eighth repeats, 2) a serine/threonine protein kinase domain, and 3) a C-terminal ribonuclease domain [27]. RNase L activation requires binding of a single 2-5A molecule to N-terminal ankyrin repeats 2-4 [28,29]. 2-5A binding reverses the naturally repressive state of the RNase L ankyrin repeats, ultimately producing a functional homodimer with ribonuclease activity [27,29-31].
In this report, a subclone library generated from CHORI-241 BAC 100I10 was sequenced and used to construct a contig assembly spanning the OAS1 gene. The equine RNASEL gene was identified in multiple BAC clones of the CHORI-241 library and was FISH mapped on metaphase spreads to ECA5p17-p16. Equine RNASEL genomic sequence was obtained from BAC clone 159N12, and an assembly similar to that for OAS1 was constructed. Full-length RNASEL cDNA sequences from 8 species were determined and compared in a phylogenetic analysis. Resequencing of genomic DNA from multiple horses of different breeds identified a total of 64 SNPs and 2 microsatellites within the OAS1 and RNASEL genes.

BAC 100I10 sequencing and OAS1 contig assembly
A shotgun subclone library was constructed from sheared fragments of CHORI-241 BAC 100I10. Nine hundred subclones were sequenced bidirectionally, yielding 513,390 bases with Phred quality scores > 15 and providing 3.95X coverage. The individual chromatogram files were analyzed with Phred, Phrap and Consed [33][34][35][36][37], and individual contigs were scaffolded against the human genome sequence using BLAST. The scaffold was further validated by adding multiple sequences from TraceDB [38] retrieved via BLAST searches with the full-length equine OAS1 mRNA [GenBank: AY321355]. The scaffold contained four genomic contigs spanning a substantial part of the equine OAS1 gene, including 4.5 kb of promoter sequence upstream of exon 1 and 1.6 kb of sequence downstream of exon 6, and was submitted to GenBank under accession number DQ536887. The genomic assembly also included sequence for the downstream equine OAS3 gene as well as an upstream gene orthologous to human RPH3A (data not shown). This assembly completely overlaps two whole-genome shotgun sequences, AAWR01028567 (55,475 bp) and AAWR01028568 (31,407 bp), recently submitted to GenBank by the Broad Institute.
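The coverage figure above can be unpacked with simple arithmetic. The sketch below divides the 513,390 high-quality bases by the reported 3.95X coverage to back out the implied BAC insert length (a derived number the text does not state), and applies the idealized Lander-Waterman expectation 1 - e^(-c) for the fraction of a target covered at least once. The Lander-Waterman model assumes uniformly random, independent reads; it is an assumption introduced here for illustration, not part of the authors' analysis, and repeats or cloning bias in a real library can leave more gaps than it predicts.

```python
import math

# Figures reported for the BAC 100I10 shotgun library.
HIGH_QUALITY_BASES = 513_390   # bases with quality score > 15
REPORTED_COVERAGE = 3.95       # fold coverage ("3.95X")
SUBCLONES = 900                # each subclone was read from both ends


def implied_target_length(bases: int, coverage: float) -> float:
    """Coverage c = total bases / target length, so length = bases / c."""
    return bases / coverage


def expected_fraction_covered(coverage: float) -> float:
    """Lander-Waterman (Poisson) expectation that a given base is covered
    by at least one read: 1 - exp(-c). An idealized assumption."""
    return 1.0 - math.exp(-coverage)


if __name__ == "__main__":
    length = implied_target_length(HIGH_QUALITY_BASES, REPORTED_COVERAGE)
    reads = 2 * SUBCLONES
    print(f"implied BAC insert length: {length:,.0f} bp")
    print(f"mean high-quality bases per read: {HIGH_QUALITY_BASES / reads:.1f}")
    print(f"expected fraction covered: {expected_fraction_covered(REPORTED_COVERAGE):.3f}")
```

With these inputs the implied insert is roughly 130 kb, each read contributes about 285 high-quality bases on average, and the idealized model expects about 98% of the insert to be covered at least once.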

Identification of OAS1 microsatellites
The genomic sequence assembly identified two microsatellites, one within the promoter and the other downstream of exon 6. The promoter GT microsatellite is located 575 bp upstream of the ATG translation initiation site. A shorter GT microsatellite occupies the same relative position in the human OAS1 promoter, and the flanking regions are well conserved between the two sequences (Figure 1). This microsatellite may affect the function of flanking regulatory elements. Sequencing of the OAS1 promoter regions of 13 horses established that this promoter microsatellite is polymorphic in length. The second polymorphic microsatellite is a GT dinucleotide repeat located 43 bp downstream of exon 6 within the 3' UTR. A microsatellite in the 3' UTR has previously been reported to alter the level of mRNA synthesis [39].
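Perfect (GT)n runs like the two microsatellites described above can be located in an assembled sequence with a regular-expression scan. The sketch below is illustrative only: the function name and the five-unit minimum are assumptions, not the criteria used in this study, and it finds only perfect, uninterrupted repeats on the given strand (dedicated tandem-repeat finders also tolerate imperfect repeats).

```python
import re
from typing import List, Tuple


def find_dinucleotide_repeats(seq: str, motif: str = "GT",
                              min_units: int = 5) -> List[Tuple[int, int]]:
    """Return (0-based start, number of repeat units) for each perfect,
    non-overlapping run of `motif` repeated at least `min_units` times.

    Only the given strand is scanned; a (GT)n repeat appears as (AC)n on the
    reverse complement, so also search for "AC" when the strand is unknown.
    """
    pattern = re.compile(f"(?:{re.escape(motif)}){{{min_units},}}")
    return [(m.start(), len(m.group()) // len(motif))
            for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    toy = "ccat" + "GT" * 9 + "aggc"  # made-up sequence, not OAS1
    print(find_dinucleotide_repeats(toy))  # -> [(4, 9)]
```

Comparing the repeat-unit count returned for each horse's amplicon is one simple way to see the length polymorphism reported above.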

OAS1 SNP identification
The assembled OAS1 scaffold was aligned to the full length, 1.6 kb cDNA equine transcript [GenBank: AY321355] to delineate individual exons and flanking intron sequences from the genomic contigs. Genomic primers were designed within flanking intron sequences as well as for the proximal promoter (Table 1).
Sequence data obtained from the screening population and from CHORI BAC 100I10 were analyzed using the Phred, Phrap and Consed programs [33][34][35][36][37]. Both visual analysis of the chromatogram data to identify heterozygotes and computer analysis using the Consed visualization tool identified 33 single nucleotide substitutions within the proximal promoter and exons of OAS1 (Table 2). Of these, 11 were within coding regions, 9 within non-coding regions and the remaining 13 within the proximal promoter upstream of exon 1. Four of the 9 non-coding polymorphisms were located within the 5' and 3' untranslated regions (UTRs). Of the 11 coding polymorphisms, 4 were synonymous and 7 were non-synonymous. Five of the 7 non-synonymous SNPs resulted in substitutions of amino acids with different properties. Interestingly, the amino acids encoded by the major alleles of 4 of the 7 non-synonymous SNPs were identical to the corresponding amino acids in the human OAS1 protein [UniProtKB: P00973]. The genotypes of each individual were used to infer potential haplotypes within equine OAS1 using PHASE v2.1 software [40,41]. Only SNPs verified in multiple individuals were used for the haplotype analysis (minor allele frequency ≥ 0.08). The best reconstruction produced 15 haplotypes from the 33 diallelic SNPs (Table 3). The polymorphic microsatellites were not included in the analysis.
Figure 1. Local alignment of human and horse OAS1 promoters. BLAST2 alignment of the 1000 bp upstream of the transcription start of the human and equine OAS1 genes, using a mismatch penalty of -1 and a word size of 7, with lowercase masking of repeats. The alignment shows that the sequence from ~-800 bp to ~-350 bp in the horse promoter is similar to a region of the human promoter interrupted by a ~200 bp Alu repeat (~-811 bp to ~-590 bp). The horse microsatellite is shown in underlined bold and corresponds to a smaller dinucleotide repeat in the human sequence. Numbering in the alignments is from the translation ATG start sites.
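Haplotype reconstruction is needed because resequencing diploid horses yields unphased genotypes: each heterozygous site could sit on either chromosome. The sketch below enumerates, for one hypothetical individual, every haplotype pair consistent with its genotype, which shows how quickly the ambiguity grows (2^(h-1) pairs for h ≥ 1 heterozygous sites). It illustrates the combinatorial problem only; it is not PHASE's algorithm, which weighs these alternatives with a statistical model fitted across the whole sample.

```python
from itertools import product
from typing import List, Set, Tuple


def compatible_haplotype_pairs(genotype: List[int]) -> Set[Tuple[str, str]]:
    """Enumerate the unordered haplotype pairs consistent with one
    individual's unphased genotype.

    Each SNP is coded as the number of copies of allele '1' (0, 1 or 2).
    Homozygous sites are fixed; each heterozygous site can be phased two
    ways, giving 2**(h-1) distinct unordered pairs for h >= 1 heterozygotes.
    """
    het_sites = [i for i, g in enumerate(genotype) if g == 1]
    template = ["1" if g == 2 else "0" for g in genotype]
    pairs: Set[Tuple[str, str]] = set()
    for phase in product("01", repeat=len(het_sites)):
        hap_a, hap_b = list(template), list(template)
        for site, allele in zip(het_sites, phase):
            hap_a[site] = allele
            hap_b[site] = "1" if allele == "0" else "0"
        a, b = "".join(hap_a), "".join(hap_b)
        pairs.add((min(a, b), max(a, b)))  # store as an unordered pair
    return pairs


if __name__ == "__main__":
    # Hypothetical genotype at five SNPs with three heterozygous sites.
    for pair in sorted(compatible_haplotype_pairs([1, 0, 1, 2, 1])):
        print(pair)
```

With 33 SNPs and many heterozygous sites per horse, the number of candidate phasings becomes large, which is why a population-level statistical method such as PHASE is used instead of enumeration.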

Assembling full-length RNASEL mRNA sequences of cattle, dog, horse, cat, domestic pig, Guinea pig, elephant and opossum
A limited number of mammalian RNASEL mRNA sequences had previously been deposited in GenBank, and some of these sequences were predicted from whole-genome annotations. However, this GenBank information was not sufficient to identify evolutionarily conserved regions in mammalian RNASEL sequences that could be used to design PCR primers for amplifying equine RNASEL fragments. The predicted cattle [GenBank: XM_597290] and dog [GenBank: XM_547430] RNASEL ORFs were amplified from commercial cDNA (BioChain, Hayward, CA), directly sequenced and extended to full-length cDNA sequences by DNA walking. The full-length cattle and dog RNASEL sequences were submitted to GenBank under accession numbers DQ497162 and DQ497163, respectively. These two sequences and the human full-length RNASEL sequence NM_021133 were aligned, and degenerate primers were designed from conserved regions (Table 4) and used to amplify the middle portion of the equine RNASEL cDNA. This partial sequence was extended to the full-length sequence by DNA walking and submitted to GenBank under accession number DQ497159.
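The text does not say exactly how the degenerate primers in Table 4 were derived from the cattle, dog and human alignment. One common approach, sketched below under that assumption, collapses each alignment column into its IUPAC ambiguity code and reports the primer's degeneracy (the number of distinct sequences it encodes), so that well-conserved stretches with low degeneracy can be chosen. The sequences in the usage example are made up, and the sketch assumes a gap-free alignment.

```python
from typing import Dict, FrozenSet, List

# IUPAC nucleotide ambiguity codes, keyed by the set of bases each represents.
IUPAC: Dict[FrozenSet[str], str] = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}
# Number of bases each code stands for, used to compute primer degeneracy.
CODE_SIZE: Dict[str, int] = {code: len(bases) for bases, code in IUPAC.items()}


def degenerate_consensus(aligned: List[str]) -> str:
    """Collapse gap-free, equal-length aligned sequences into one IUPAC string."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    out = []
    for column in zip(*(s.upper() for s in aligned)):
        key = frozenset(column)
        if key not in IUPAC:
            raise ValueError(f"unsupported characters in column: {sorted(key)}")
        out.append(IUPAC[key])
    return "".join(out)


def degeneracy(primer: str) -> int:
    """Number of distinct sequences encoded by a degenerate primer."""
    total = 1
    for code in primer.upper():
        total *= CODE_SIZE[code]
    return total


if __name__ == "__main__":
    # Toy alignment of three hypothetical sequences (not real RNASEL data).
    primer = degenerate_consensus(["ATGCA", "ATGTA", "ACGCA"])
    print(primer, degeneracy(primer))
```

Keeping degeneracy low, especially near the primer's 3' end, is a common design preference because each added ambiguity halves or further dilutes the concentration of any single exact-match primer in the mix.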
Several additional mammalian RNASEL sequences were also determined and subsequently used in a phylogenetic analysis. The GenBank feline Whole Genome Shotgun (WGS) database was searched with the canine RNASEL sequence [GenBank: DQ497163]. Four genomic contigs, AANG01026257, AANG01026302, AANG01630549 and AANG01026248, were detected. These contigs contain the first, the second and third, the fifth, and the sixth feline RNASEL coding exons, respectively. No contigs containing the fourth coding exon of the feline RNASEL gene were found in GenBank. Two primers were designed based on the 3'-end of the AANG01026302 sequence and the 5'-end of the AANG01630549 sequence (Table 4) and used to amplify and sequence this region from commercial cat genomic DNA (Novagen, Madison, Wisconsin). The sequence of this exon was submitted to GenBank under accession number EF062998. Using this sequence together with the other exon sequences derived from GenBank (see above), the predicted full-length mRNA sequence of the feline RNASEL gene was assembled.
The TIGR porcine database [42] was searched using the cattle sequence [GenBank: DQ497162], and five partial RNASEL sequences were found. The TC212507 and TC212872 sequences correspond to the 5'-end of porcine RNASEL mRNA, while the TC218317, TC237301 and TC236970 sequences represent the 3'-end. An additional 5'-end cDNA sequence, 20060611S-038813, was detected in the Pig EST Data Explorer [43]. A pair of primers was designed based on these partial sequences (Table 4) and used to amplify pooled cDNA (kindly provided by Dr. Jonathan E. Beever, University of Illinois at Urbana-Champaign). The middle portion of the porcine RNASEL cDNA was directly sequenced. The partial sequence was then extended to the full-length sequence by DNA walking and submitted to GenBank under accession number DQ497160.

Phylogenetic analysis of vertebrate RNASEL gene sequences
Previously, only the human and mouse RNASEL gene sequences had been reported [26]. Sequences of the orthologous rat [GenBank: AY262823] and chicken [GenBank: AM0492248] genes were recently submitted to GenBank but have not been described in any publication. In addition, annotation of the chimpanzee, orangutan and rhesus macaque genomes using the GNOMON gene prediction method yielded predicted RNASEL sequences for these three species. Primate, rodent and avian RNASEL sequences were downloaded from GenBank and aligned to the orthologous sequences described above to build a phylogenetic tree (Figure 2). Rodents showed the highest rate of nucleotide substitution, while primates showed the lowest. Evolutionary rates were fairly uniform across the three RNase L domains: the ankyrin repeats, the serine/threonine protein kinase domain and the ribonuclease domain. The percent identity between the RNASEL ORF of the horse and those of the other species is shown in Table 5.
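Percent identity values like those in Table 5 depend on how alignment gaps are treated, and the text does not state the convention used. The sketch below assumes one common choice, counting only columns where neither aligned sequence has a gap; the example fragments are hypothetical and not taken from Table 5.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity between two aligned sequences, counting only
    columns where neither sequence has a gap (one of several conventions)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = identical = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue  # skip gapped columns
        compared += 1
        identical += x == y
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


if __name__ == "__main__":
    # Hypothetical aligned fragments; not taken from Table 5.
    print(f"{percent_identity('ATGC-A', 'ATGTTA'):.1f}")
```

Dividing by the full alignment length instead (so gaps count as mismatches) gives lower values for sequences with indels, which matters for ORFs whose terminal coding exons differ in length.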

Assignment of the RNASEL gene to horse chromosome ECA5p17-p16
The horse CHORI-241 BAC library was screened with a probe derived from the partial equine RNASEL cDNA fragment. Twelve positive clones were identified, and two of them, 108P15 and 189I19, were FISH mapped to assign the RNASEL gene to the horse chromosomal location ECA5p17-p16 (Figure 3).
Figure 2. Phylogenetic tree of RNASEL genes. RNASEL ORF sequences from 15 vertebrate species were aligned and the njtree program was used for tree construction.
[...] SEL gene. This exonic composition is similar to that of a number of other mammalian RNASEL genes. However, two and three 5'-terminal non-coding exons were found in the chicken and mouse RNASEL genes, respectively. The coding exons of vertebrate RNASEL genes were designated A through F. Comparison of the genomic and mRNA sequences of vertebrate RNASEL genes revealed considerable length variation in both the 5'-terminal (1402-1510 bp) and 3'-terminal (130-187 bp) coding exons (Table 5).

SNP identification in the horse RNASEL gene
After the equine RNASEL introns had been identified, exon-specific genomic primers were designed (Table 1). Exon-specific sequencing of DNA from the screening population identified 31 SNPs within the RNASEL gene (Table 6). Of the 10 non-coding polymorphisms, one was within the second intron and the others were located in the 5' and 3' UTRs. Seventeen of the 31 SNPs were located within the ankyrin repeat-encoding exon 2; 13 of these are non-synonymous, and 10 result in substitutions of amino acids with different properties. Three non-synonymous polymorphisms were identified within exons 3 and 5. The remaining exons, including the non-coding exon 1, were invariant among these horses. The amino acids encoded by the major alleles of 11 of the 16 non-synonymous SNPs were identical to the corresponding human RNase L amino acids [UniProtKB: Q05823]. Using MOTIF Search [45] to identify putative transcription factor binding motifs in the TRANSFAC database, the promoter SNP upstream of the first exon was found to lie within a potential cAMP-response element binding site (score: 90). Haplotypes were assembled in the same manner as for the equine OAS1 gene. The best reconstruction from PHASE analysis produced 10 haplotypes from the 31 verified diallelic SNPs with minor allele frequencies ≥ 0.08 (Table 3). As with OAS1, only good-quality, unambiguous resequencing data were used for the haplotype analysis.
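Sorting coding SNPs into synonymous and non-synonymous classes, as done above for both genes, amounts to translating the reference and alternate codons and comparing the amino acids. The sketch below assumes an in-frame coding sequence and the standard genetic code (NCBI translation table 1); the function name and toy ORF are hypothetical. The further grouping of substitutions by amino acid properties used in the text is not reproduced here, because the grouping scheme is not specified.

```python
from itertools import product
from typing import Tuple

# Standard genetic code (NCBI translation table 1); codons enumerated in
# TCAG order for the first, second and third positions.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_coding_snp(cds: str, pos: int, alt: str) -> Tuple[str, str, str]:
    """Classify a single-base substitution at 0-based position `pos` of an
    in-frame coding sequence. Returns (ref_aa, alt_aa, class); a change to a
    stop codon ('*') is reported as non-synonymous."""
    cds = cds.upper()
    start = pos - pos % 3                  # first base of the affected codon
    offset = pos % 3                       # position within the codon
    ref_codon = cds[start:start + 3]
    alt_codon = ref_codon[:offset] + alt.upper() + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    kind = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return ref_aa, alt_aa, kind


if __name__ == "__main__":
    toy_cds = "ATGCGTTGC"  # hypothetical ORF fragment: Met-Arg-Cys
    print(classify_coding_snp(toy_cds, 3, "T"))  # CGT -> TGT (Arg -> Cys)
    print(classify_coding_snp(toy_cds, 5, "C"))  # CGT -> CGC (Arg -> Arg)
```

The first call mirrors the kind of Arg-to-Cys change discussed for equine OAS1; the second shows a third-position change that leaves the amino acid unchanged.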
Identifying single nucleotide polymorphisms by sequencing DNA from multiple individuals increases the risk of artifacts arising from PCR or sequencing errors. The 64 SNPs identified in the equine OAS1 and RNASEL genes were therefore considered valid only if each allele was identified in at least two individuals. Eight additional SNPs were identified but could not be verified in more than one individual (minor allele frequency < 0.08). Within the 3,864 and 5,406 bp resequenced for OAS1 and RNASEL, respectively, equine OAS1 contained an average of one polymorphism per 117 bases, and equine RNASEL one polymorphism per 174 bases.
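The thresholds and densities in the paragraph above follow from simple ratios. The sketch below recomputes the SNP spacing for each gene, compares it with the genome-wide equine estimate of 1 SNP per 1500 bp cited in the Discussion, and shows one possible reading of the 0.08 frequency cutoff: assuming 13 diploid horses (26 chromosomes, with the BAC not counted) and a minor allele seen once in each of two heterozygotes, the lowest possible minor allele frequency is 2/26 ≈ 0.077, which rounds to 0.08. That interpretation is an assumption, not a statement from the authors.

```python
def min_minor_allele_frequency(n_diploid: int, min_carriers: int = 2) -> float:
    """Lowest possible MAF when the minor allele is seen once in each of
    `min_carriers` heterozygous individuals among `n_diploid` diploids."""
    return min_carriers / (2 * n_diploid)


def bases_per_snp(bases_resequenced: int, n_snps: int) -> float:
    """Average spacing between polymorphisms in the resequenced interval."""
    return bases_resequenced / n_snps


GENOME_WIDE_BP_PER_SNP = 1500  # previously estimated equine rate [64]

if __name__ == "__main__":
    for gene, bp, snps in (("OAS1", 3_864, 33), ("RNASEL", 5_406, 31)):
        spacing = bases_per_snp(bp, snps)
        print(f"{gene}: 1 SNP per {spacing:.0f} bp, "
              f"{GENOME_WIDE_BP_PER_SNP / spacing:.1f}x the genome-wide rate")
    print(f"MAF floor for 13 horses: {min_minor_allele_frequency(13):.3f}")
```

On these numbers, the observed SNP densities are roughly 13-fold (OAS1) and 9-fold (RNASEL) higher than the genome-wide estimate.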

Discussion
Sequence characterization of the horse OAS1 gene in CHORI-241 BAC 100I10 enabled a partial genomic sequence assembly [GenBank: DQ536887] and comparison among multiple equine individuals. We identified 2 polymorphic microsatellites and 33 single nucleotide polymorphisms in a group of 13 individuals and CHORI-241 BAC 100I10 (Table 2). To identify potential structural and/or functional consequences of the non-synonymous coding SNPs, each was analyzed using PolyPhen software [46][47][48]. All of the polymorphic variants identified in equine OAS1 were predicted to have benign effects at their respective residue positions. Nevertheless, the Arg209Cys substitution may significantly change OAS1 enzymatic activity: Arg209 in the equine OAS1 protein corresponds to Arg544 in the human OAS2 protein, which is located in the donor binding domain, and substitution of Arg544 with either Ala or Tyr significantly decreased the enzymatic activity of OAS2 [49]. In addition, the equine OAS1 promoter SNP at position 4531 is located in an interferon-stimulated response element (ISRE) [29]. Inactivation of this regulatory element by a single nucleotide substitution may alter expression of the equine OAS1 gene.
RNase L enzymatic activity has previously been reported in reptiles, birds and mammals [50]. However, no RNASEL genes have been found in amphibians or fish to date. Interestingly, these same vertebrate classes also lack OAS genes [51].
Thirty-one SNPs were identified in equine RNASEL (Table 6). Notably, all but three of the 20 coding SNPs are located within exon 2. The RNase L protein contains 9 N-terminal ankyrin repeats responsible for binding the 2-5A molecules that are essential for its activation [27]. Exon 2 of the human RNASEL gene encodes the entire ankyrin repeat region (amino acids 24-329). The high frequency of non-synonymous polymorphisms within exon 2 raises the possibility that a single SNP or haplotype could ablate 2-5A binding and/or other RNase L interactions. In addition, the SNP in the promoter upstream of the first exon is located within a potential cAMP-response element binding site, and mutations within this promoter element have been shown to affect gene expression [58][59][60]. PolyPhen analysis was also conducted on the non-synonymous coding SNPs identified within equine RNASEL. All but 4 of the RNase L SNPs were predicted to have benign effects. However, the SNP at residue 287 was predicted to change hydrophobicity at a buried site within the RNase L protein, and the effect of this change on protein function is unknown. Because PolyPhen predictions are based on functional effects identified using human nsSNPs, they may differ for horse RNase L. Four SNPs within the ankyrin repeat region in exon 2 (residues 414, 463, 467 and 487) were predicted to have a negative effect on function. These data are consistent with our hypothesis that a single SNP or haplotype could affect 2-5A binding within the equine RNase L ankyrin repeats.
A number of SNPs were detected within the 3' UTR of the equine RNASEL gene. Of the eight SNPs found within this region, six are transitions. The 3' UTRs of mRNAs contain regulatory regions that bind proteins and microRNAs (miRNAs) and control mRNA stability, translation and localization. A simple analysis of octamer motifs containing equine 3' UTR SNPs identified SNP 10247 as lying within a human miRNA target site [61]. If this target site is functional in horses, this SNP could significantly affect the synthesis of RNase L. However, this particular octamer motif was not found in human or rodent RNASEL 3' UTRs. Furthermore, cross-species sequence comparison using mVISTA [62,63] revealed no significant longer-range conservation in this region (data not shown).
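The octamer analysis described above can be sketched as follows: collect every 8-mer that overlaps the SNP position for each allele and intersect those sets with a list of candidate target motifs. The motif set, sequence and function names below are placeholders introduced for illustration; the actual miRNA target octamers from reference [61] are not reproduced, and a real analysis would also need to use the transcript (sense) strand.

```python
from typing import List, Set, Tuple


def kmers_spanning(seq: str, pos: int, k: int = 8) -> List[str]:
    """All k-mers of `seq` that include the 0-based position `pos`."""
    seq = seq.upper()
    first = max(0, pos - k + 1)
    last = min(pos, len(seq) - k)
    return [seq[i:i + k] for i in range(first, last + 1)]


def allele_motif_hits(seq: str, pos: int, alt: str, motifs: Set[str],
                      k: int = 8) -> Tuple[Set[str], Set[str]]:
    """Motifs matched by k-mers overlapping `pos` for the reference and the
    alternate allele; a motif present for only one allele is of interest."""
    motifs = {m.upper() for m in motifs}
    alt_seq = seq[:pos] + alt + seq[pos + 1:]
    ref_hits = set(kmers_spanning(seq, pos, k)) & motifs
    alt_hits = set(kmers_spanning(alt_seq, pos, k)) & motifs
    return ref_hits, alt_hits


if __name__ == "__main__":
    toy_utr = "AAAACCCCGGGGTTTT"   # made-up 3' UTR fragment
    toy_motifs = {"CCCGGGGT"}      # placeholder, not a real miRNA site
    print(allele_motif_hits(toy_utr, 8, "A", toy_motifs))
```

A hit that appears for one allele but not the other is the situation described for SNP 10247, where one allele falls within a putative target octamer.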
Genotype analysis using PHASE v2.1 [40,41] identified 15 and 10 haplotypes in the equine OAS1 and RNASEL genes, respectively, and suggested the existence of haplotype blocks spanning most of each gene (Table 3). Even if an association between virus-induced disease susceptibility and OAS1 and/or RNASEL SNPs is demonstrated, it may prove difficult to identify a single causal SNP unambiguously because of potential linkage disequilibrium at these loci. In our screening population, the most frequent haplotype had a frequency of 0.19 in OAS1 and 0.23 in RNASEL (Table 3).
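Linkage disequilibrium between two SNPs is usually summarized with D, the normalized D' and r², all computable from phased haplotypes such as those produced by PHASE. The sketch below implements these standard definitions for biallelic sites; the input haplotypes in the usage example are hypothetical, and no LD values were reported in the text.

```python
from typing import List, Tuple


def pairwise_ld(haplotypes: List[str], i: int, j: int) -> Tuple[float, float, float]:
    """Return (D, D', r^2) between biallelic SNPs i and j, estimated from
    phased haplotypes written as strings of '0'/'1' alleles.

    D' is returned signed (D / D_max); take abs() for the usual |D'|.
    """
    n = len(haplotypes)
    p_a = sum(h[i] == "1" for h in haplotypes) / n
    p_b = sum(h[j] == "1" for h in haplotypes) / n
    p_ab = sum(h[i] == "1" and h[j] == "1" for h in haplotypes) / n
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


if __name__ == "__main__":
    # Hypothetical phased haplotypes over three SNPs (e.g., PHASE output).
    haps = ["110", "110", "001", "000", "111", "000"]
    print(pairwise_ld(haps, 0, 1))
```

When r² between two SNPs is close to 1, either one predicts the other almost perfectly, which is exactly why an association signal could not be pinned to a single causal SNP within such a block.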
The SNP frequency observed in these two equine genes was high compared with the previously estimated equine SNP frequency of 1 per 1500 bp [64]. In dogs, the estimated SNP rate is ~1 per 1600 bp (based on whole-genome resequencing), but a higher frequency of 1 per 900 bp was estimated between breeds [65]. Resequencing of specific genes in several breeds of the domestic dog identified polymorphisms at frequencies comparable to our estimates, with 1 SNP per ~250-330 bp [S. Canterbury, personal communication]. Furthermore, resequencing of a putative elk (Cervus elaphus nelsoni) promoter region that is highly conserved among mule deer, cow and sheep detected an average SNP frequency of 1 per 69 bp [unpublished data].
The microsatellite identified within the promoter region in this study may also alter expression of the equine OAS1 gene. The alleles observed to date indicate that dinucleotide repeat lengths of 9 and 18 may represent the major alleles at this locus. The over-representation of these alleles may be due to the fact that they correspond to one complete rotation of the DNA helix. If this microsatellite separates cis-regulatory elements, changes in its length could affect the binding of transcriptional regulators to these elements and significantly alter gene expression [66][67][68][69][70][71]. Consistent with this hypothesis, the regions flanking the microsatellite are highly conserved between the human and horse OAS1 promoters (Figure 1). In addition, recent microarray data provide evidence of an inverse relationship between gene expression and dinucleotide microsatellite length [66], which is consistent with the significantly higher frequency of the (GT)9 allele among the individuals screened.

Conclusion
We report the genomic sequences of the equine OAS1 and RNASEL genes and identify 64 single nucleotide polymorphisms and 2 polymorphic microsatellites in these genes. On the basis of the allelic variants characterized, we conclude that a number of these are plausible candidates for regulatory or structural mutations which may influence transcription or enzymatic activity of OAS1 and RNase L proteins. Also, RNASEL cDNA sequences were determined for 8 mammals and utilized in a phylogenetic analysis. The chromosomal location of the RNASEL gene was assigned by FISH to ECA5p17-p16.

RNASEL cDNA and FISH
Preparation of horse cDNA was described previously [32]. Partial RNASEL sequences were extended using a DNA Walking SpeedUp Kit (Seegene USA, Del Mar, CA) according to the manufacturer's protocol. Four high-density filters for segment 1 of the CHORI-241 equine genomic BAC library were purchased from the Children's Hospital Oakland Research Institute (CHORI), Oakland, CA. These filters were screened with a ³²P-labeled equine RNASEL cDNA probe according to the supplier's protocol. Two positive equine BAC clones were purchased from CHORI. Each BAC clone was grown individually in 500 mL of LB medium. BAC DNA was isolated using the NucleoBond BAC Maxi Kit (BD Biosciences Clontech, Palo Alto, CA) and used as the template for direct partial sequencing with a BigDye Terminator v1.1 Cycle Sequencing Kit on an ABI 3100 Genetic Analyzer according to the manufacturer's recommendations. DNA from equine BAC clones 108P15 and 189I19 was FISH mapped as described previously [67]. The international cytogenetic nomenclature of the domestic horse [68] was used to identify individual horse chromosomes.
The njtree program was used to construct a phylogenetic tree as described previously [51] and tree topology was inferred by the Neighbor-Joining algorithm. The bootstrap algorithm with 1000 replications was used to estimate the confidence of each node. The njtree program is available upon request.
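Because njtree is available only on request, the following is a generic textbook implementation of Neighbor-Joining (Saitou and Nei), not njtree itself. It assumes a symmetric distance matrix has already been computed from the aligned ORFs (for example, p-distances; the study's distance measure is not specified) and returns an unrooted tree in Newick format. The 1000-replicate bootstrap, which would resample alignment columns and rebuild the tree each time, is not shown, and the example matrix is a toy, not RNASEL data.

```python
from typing import List


def neighbor_joining(names: List[str], dist: List[List[float]]) -> str:
    """Build an unrooted tree with the Neighbor-Joining algorithm and return
    it as a Newick string with branch lengths.

    `dist` is a symmetric matrix of pairwise distances (e.g., p-distances
    between aligned ORFs) with zeros on the diagonal.
    """
    if len(names) < 2:
        raise ValueError("need at least two taxa")
    nodes = list(names)
    d = [list(map(float, row)) for row in dist]
    while len(nodes) > 2:
        n = len(nodes)
        totals = [sum(row) for row in d]
        # Q-criterion: join the pair minimising (n - 2) * d_ij - r_i - r_j.
        best, bi, bj = None, 0, 1
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - totals[i] - totals[j]
                if best is None or q < best:
                    best, bi, bj = q, i, j
        # Branch lengths from the new internal node to the two joined nodes.
        li = 0.5 * d[bi][bj] + (totals[bi] - totals[bj]) / (2 * (n - 2))
        lj = d[bi][bj] - li
        joined = f"({nodes[bi]}:{li:.4f},{nodes[bj]}:{lj:.4f})"
        keep = [k for k in range(n) if k not in (bi, bj)]
        # Distance from the new node to each remaining node.
        new_row = [0.5 * (d[bi][k] + d[bj][k] - d[bi][bj]) for k in keep]
        d = [[d[a][b] for b in keep] + [new_row[x]] for x, a in enumerate(keep)]
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [joined]
    # Connect the last two nodes; the whole edge length goes on the first.
    return f"({nodes[0]}:{d[0][1]:.4f},{nodes[1]}:0.0000);"


if __name__ == "__main__":
    taxa = ["a", "b", "c", "d", "e"]   # toy taxa, not RNASEL sequences
    matrix = [[0, 5, 9, 9, 8],
              [5, 0, 10, 10, 9],
              [9, 10, 0, 8, 7],
              [9, 10, 8, 0, 3],
              [8, 9, 7, 3, 0]]
    print(neighbor_joining(taxa, matrix))
```

Neighbor-Joining does not assume a molecular clock, which suits data like these where rodents and primates evolve at visibly different rates.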

Construction of subclone library
BAC clone 100I10 was isolated from segment 1 of the CHORI-241 equine BAC library at Texas A&M University and confirmed by PCR to contain OAS1. The colony-isolated clone was cultured, and BAC DNA was isolated by a standard alkaline lysis miniprep using Millipore Solutions and treated with Plasmid-Safe ATP-dependent DNase (Epicentre, Madison, WI). BAC DNA was fragmented using a HydroShear® DNA Shearing Device (GeneMachine, San Carlos, CA) at Speed Code 8 for an estimated fragment size of 2.5 kb. The fragmented product was analyzed by electrophoresis on an ethidium bromide-stained agarose gel and gel-extracted using the QIAquick Gel Extraction Kit (Qiagen, Valencia, CA). Extracted DNA was eluted in water according to the manufacturer's protocol. Purified fragments were cloned into the vector pCR®4Blunt-TOPO® using the TOPO® Shotgun Subcloning Kit (Invitrogen, Carlsbad, CA) following the manufacturer's protocol. Ligation reactions were incubated for 30 minutes at room temperature and electroporated into E. coli. Colonies were screened for lack of β-galactosidase activity and selected for ampicillin resistance on LB agar plates containing 50 µg/mL ampicillin. White colonies were cultured and screened for appropriate insert size by PCR using the vector M13 primer sites flanking the cloned insert, prior to sequencing.

Sequencing of clones
Individual OAS1 inserts were amplified directly from individual colonies by PCR using the vector M13 primer sites flanking the cloned insert. Amplification products were purified by centrifugation with the PSI-Clone PCR 96 kit (Princeton Separations, Adelphia, NJ) according to the manufacturer's protocol. Purified products were sequenced in separate reactions with each M13 primer using a cycling program of 96 °C for 10 s, 50 °C for 5 s and 60 °C for 4 min with BigDye® Terminator Mix v1.1 (Applied Biosystems, Foster City, CA). Sequencing reactions were analyzed using an ABI Prism 3100 Genetic Analyzer (Applied Biosystems, Foster City, CA).
Primers were designed to amplify the proximal promoter and exons of the OAS1 and RNASEL genes from 13 individual horses by PCR (Table 1). Sequencing was carried out in the same manner as for the library subclones. The sequences obtained were compared between individuals to identify SNPs within the amplified regions.
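The between-individual comparison described above can be sketched as a column scan over aligned amplicon sequences. This is a simplified, hypothetical stand-in for the visual and Consed-based review used in the study: it assumes base calls are already aligned and that heterozygous positions have been encoded with two-base IUPAC codes (for example, Y for C/T). Detecting heterozygotes from the chromatogram peaks themselves is not shown.

```python
from typing import Dict, List, Tuple

# Two-base IUPAC codes a base caller may emit at a heterozygous position.
HET_CODES: Dict[str, str] = {"R": "AG", "Y": "CT", "S": "CG",
                             "W": "AT", "K": "GT", "M": "AC"}


def call_snps(aligned: Dict[str, str]) -> List[Tuple[int, Dict[str, str]]]:
    """Return (0-based column, {individual: genotype}) for each column of an
    alignment where more than one allele is observed across individuals.

    Heterozygous IUPAC codes are expanded to 'A/G'-style genotypes; columns
    containing a gap, 'N' or any other character are skipped entirely.
    """
    if not aligned:
        return []
    lengths = {len(s) for s in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    snps = []
    for col in range(lengths.pop()):
        genotypes: Dict[str, str] = {}
        for name, seq in aligned.items():
            base = seq[col].upper()
            if base in ("A", "C", "G", "T"):
                genotypes[name] = f"{base}/{base}"
            elif base in HET_CODES:
                a, b = HET_CODES[base]
                genotypes[name] = f"{a}/{b}"
            else:
                break  # uninformative column: skip it
        else:
            alleles = {x for g in genotypes.values() for x in g.split("/")}
            if len(alleles) > 1:
                snps.append((col, genotypes))
    return snps


if __name__ == "__main__":
    toy = {"horse1": "ACGTA", "horse2": "ACGTA", "horse3": "AYGTA"}  # made-up
    print(call_snps(toy))
```

Counting how many individuals carry each allele at the reported columns would then feed directly into the "seen in at least two individuals" validation rule described in the Results.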
Additional sequences were added to the assembly data and reanalyzed with Phrap and BLAST until the consensus sequence spanned the genes from the promoter to the 3' UTR. The equine genomic consensus sequence was confirmed using data from the Equine Genome Sequencing Project (2X coverage) [38], and intron/exon boundaries were assigned by local alignment to the full-length equine OAS1 [GenBank: AY321355] and RNASEL [GenBank: DQ497159] cDNAs. The equine genomic sequences of OAS1 and RNASEL were submitted to GenBank and assigned the accession numbers DQ536887 and EF070193, respectively.

Genotyping population
Blood samples were collected at the Texas A&M University Equestrian Center in accordance with ethical standards. The sampled set used for screening consisted of 13 horses, including 10 geldings/stallions and 3 mares, ranging in age from 21 months to 20 years. Breeds represented include American Quarter Horse (9), Arabian (1), American Paint Horse (1), Appaloosa (1) and Thoroughbred (1).