Single nucleotide polymorphisms (SNPs) distinguish Indian-origin and Chinese-origin rhesus macaques (Macaca mulatta)

Background: Rhesus macaques serve a critical role in human biomedical research. While both Indian and Chinese rhesus macaques are commonly used, genetic differences between these two subspecies affect aspects of their behavior and physiology, including response to simian immunodeficiency virus (SIV) infection. Single nucleotide polymorphisms (SNPs) can play an important role both in establishing ancestry and in identifying genes involved in complex diseases. We sequenced the 3' end of rhesus macaque genes in an effort to identify gene-based SNPs that could distinguish between Indian and Chinese rhesus macaques and aid in association analysis.
Results: We surveyed the 3' end of 94 genes in 20 rhesus macaque animals. The study included 10 animals each of Indian and Chinese ancestry. We identified a total of 661 SNPs, 457 of which appeared exclusively in one or the other population. Seventy-nine additional animals were genotyped at 44 of the population-exclusive SNPs. Of those, 38 SNPs were confirmed as being population-specific.
Conclusion: This study demonstrates that the 3' end of genes is rich in sequence polymorphisms and is suitable for the efficient discovery of gene-linked SNPs. In addition, the results show that the genomic sequences of Indian and Chinese rhesus macaques are remarkably divergent, and include numerous population-specific SNPs. These ancestral SNPs could be used for the rapid scanning of rhesus macaques, both to establish animal ancestry and to identify gene alleles that may contribute to the phenotypic differences observed in these populations.


Background
The rhesus macaque (Macaca mulatta) has served a critical role in the study of human disease for more than half a century. This macaque remains the animal of choice for much of biomedical research and is the primary model for the study of human immunodeficiency virus (HIV) and acquired immune deficiency syndrome (AIDS) [1]. Though Indian-origin rhesus were originally used in most research protocols, the 1978 ban on the export of primates from India resulted in reduced availability of these animals. Because the growing demand for rhesus macaques has exceeded the domestic supply, the U.S. breeding colonies have imported large numbers of these animals from China.
In recent years, a variety of studies have investigated the relationship between Indian and Chinese rhesus macaques. Comparisons of Indian and Chinese rhesus mitochondrial DNA (mtDNA) sequences, including the hypervariable sequence I (HVS1), 12S and 16S rRNA loci, have shown that as much as 90% of the mtDNA genetic heterogeneity is accounted for by country of origin differences [2][3][4]. Studies of chromosomal microsatellite loci have also identified marked differences in allele frequencies between Indian and Chinese rhesus macaque populations [5][6][7][8]. Similarly, population-specific differences in the allele distributions within both Class I and II major histocompatibility complex (MHC) loci support the contention that the two populations have distinct genetic characteristics [9,10]. All of these studies support the conclusion that since their geographic separation, Indian and Chinese rhesus macaques have diverged to become two separate subtypes.
The genetic divergence of Indian and Chinese rhesus macaques is thought to underlie the observed phenotypic differences between the two subtypes in experimental protocols. These differences are observed at many levels, including morphology, behavior and physiology [11,12]. Significant differences in host response and disease progression have been observed in Indian and Chinese rhesus macaques exposed to the same simian immunodeficiency virus (SIV) challenge [13,14]. Both viremic peaks and set points were, in general, lower in the Chinese rhesus macaques than in Indian animals exposed to SIV/DeltaB670, leading to much longer survival in the Chinese animals [14]. The Chinese rhesus macaques are also more resistant to SIVmac239, maintaining both lower acute and chronic viral loads than Indian rhesus macaques infected with the same viral challenge [13].
Because Indian and Chinese rhesus macaques can respond very differently to the same research protocol, consideration of animal ancestry is warranted in both study design and result interpretation. Most breeding facilities in the United States include both Indian and Chinese-origin rhesus. Although efforts are taken not to interbreed the two populations, lack of information or misinformation regarding animal history can result in unrecognized crossing of the two subtypes. The introduction of misclassified animals may compromise research results. Thus, being able to correctly identify rhesus macaque ancestry, including hybrids, would be very valuable to both researchers and animal breeders.
Current approaches for rhesus macaque ancestry determination include: 1. sequence comparison of mtDNA loci [2,7]; 2. microsatellite analysis [6,7]; and 3. SNP analysis [15]. Since mtDNA is inherited only through the maternal line, this analysis cannot identify Indian-Chinese hybrid animals. The microsatellite analysis involves three loci in which the allele frequencies differ significantly between Indian and Chinese populations. The current literature reports only five SNPs in three genes (NDN, H19 and IGF2) that are unique to either Indian or Chinese rhesus macaques, identified by Fujimoto [15]. Unfortunately, though both the reported microsatellite and SNP markers can distinguish purebred Indian and Chinese rhesus, neither offers a sufficient number of markers to reliably detect hybrid animals.
We have developed algorithms for identifying primers to amplify the 3' end of all rhesus macaque genes [16]. Over 5,000 of the sequences we obtained using these primers were used in the design of the Affymetrix rhesus macaque GeneChip [16,17]. We tested whether these primers could also be used to discover gene-based SNPs. The 3' ends of 94 selected genes were analyzed in 20 Indian and Chinese rhesus macaques. This effort identified 661 SNPs. Four hundred and fifty-seven of these SNPs appeared uniquely in either the Indian or Chinese rhesus macaques. A subset of those SNPs was further tested using a SNP genotyping assay against a population of 75 additional Chinese and Indian animals, as well as 4 known Chinese/Indian hybrid rhesus. Eighty-four percent of the SNPs were confirmed to show population-specificity. All of the known hybrids were detected with the Indian/Chinese SNP Assay panel.
The Indian/Chinese SNP Assay panel will be useful for animal ancestry determination, for the identification of hybrid animals and for the genetic analysis of the phenotypic differences that characterize the two populations. Further, this work validates the efficiency of using gene-based, 3' primers to discover SNPs in rhesus macaques.

Rhesus macaque SNP discovery
We selected 20 unrelated rhesus macaques for use in SNP discovery, 10 each of alleged Indian or Chinese ancestry.
All of the Chinese rhesus monkeys were imported directly from primate suppliers in China. However, neither records of capture site nor breeding history were available to more specifically identify their geographic origins within Asia. Two of the Indian animals had been wild-caught in Northern India, while the remaining 8 rhesus macaques came from U.S. breeders who indicated that the animals were of Indian descent. To validate the ancestries of all of the animals, we sequenced the 12S rRNA portion of the mitochondrial DNAs and compared them with previously published sequences from Indian and Chinese rhesus macaques [2,4]. We also used the allele sizes of three microsatellite loci with disparate allele frequencies in Indian and Chinese populations (DX18S537, D1S548, DXS2506) to corroborate the animal ancestries [6]. Only rhesus macaques that satisfied each of these criteria for being of pure Indian or Chinese heritage were included in this study.
The 3' ends of 94 different genes were amplified and sequenced in this study, with the average amplicon length being 712 bp/gene (range: 334-905 bp). The 94 genes included at least one from every chromosome with the exceptions of 17 and Y (see Additional file 1 - Gene summary). In total we identified 661 SNPs in the rhesus macaques (see Additional file 2 - SNP identity and NCBI accession numbers). All but three of the genes (AR, CHRM4, IL7R) contained at least one SNP, with the overall average being 7.3 SNPs/gene (Figure 1). Some genes were very polymorphic: 20 genes contained between 10 and 17 SNPs (Figure 1; see Additional file 1 - Gene summary). The overall minor allele frequency (MAF) of the SNPs was 0.127, with 281 SNPs having a MAF ≥ 0.2 in the sample population.
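To make the MAF figures above concrete, the following is a minimal sketch of how a minor allele frequency could be computed for one biallelic SNP from diploid genotype calls. The function name and the example genotypes are hypothetical and are not taken from the study's own software.

```python
from collections import Counter


def minor_allele_frequency(genotypes):
    """Return the minor allele frequency of one biallelic SNP.

    `genotypes` holds one two-character call per diploid animal
    (for example "AG"); a missing call is given as None.
    """
    counts = Counter()
    for call in genotypes:
        if call is None:
            continue  # skip animals without a genotype call
        counts.update(call)  # counts both alleles of the animal
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no genotype calls available")
    if len(counts) > 2:
        raise ValueError("more than two alleles observed")
    if len(counts) == 1:
        return 0.0  # monomorphic in this sample
    return min(counts.values()) / total


# Invented example: 20 animals, 15 AA, 4 AG and 1 GG.
example = ["AA"] * 15 + ["AG"] * 4 + ["GG"]
maf = minor_allele_frequency(example)  # 6 G alleles out of 40
```

Under this definition a SNP seen as a single heterozygote among 20 fully genotyped animals would have a MAF of 1/40.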
To assess how the SNPs identified in rhesus macaques compared with those found in the same regions of the human orthologs, we identified 10 genes that were also analyzed in a study of 48 individuals of either European or African descent [18]. The rhesus sequences contained more polymorphisms than did the human sequences, averaging 1 SNP/107 bp in the rhesus compared with 1 SNP/179 bp in humans (Table 1). None of the common SNPs in these gene regions were detected in both humans and rhesus macaques.

Population comparison of rhesus macaque SNPs
We analyzed the SNPs discovered within each rhesus macaque population. We identified 555 SNPs in the Chinese group, while the Indian animals had a total of 312 SNPs (Table 2; see Additional file 1 - Gene summary). The distribution of the minor allele frequencies of the SNPs within each population ranged from 0.04 to 0.5 (Figure 2). There were a higher percentage of rare alleles
Figure 1. Number of SNPs identified in the 3' end of 91 rhesus macaque genes. SNPs were identified in 20 Chinese and Indian rhesus macaques. There were no SNPs identified in 3 genes; between one and 18 polymorphisms were found in the other 91 genes.
(Table 2; see Additional file 1 - Gene summary). Almost two thirds of the Chinese-specific SNPs were relatively rare, occurring in less than 10% of the population. However, 61 of the Chinese-specific SNPs and 39 of the Indian-specific SNPs had an allele frequency of ≥ 0.2. In some cases, the population-specific SNP allele was the major allele in that population. (For example, CD40LG:738 - the "G" allele is unique to the Indian population and occurs at 0.65 frequency.) The MAOA gene contained a polymorphism that was fixed in each population.
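The notion of a population-specific allele used in this section can be sketched as a comparison of allele counts between the two groups. The code below is an illustrative assumption about how such a comparison might be written; the function name and all counts are invented, and it is not the study's own Java program.

```python
def private_alleles(indian_counts, chinese_counts):
    """Map each allele seen in only one population to that population.

    Both arguments are dicts of allele -> number of observed copies.
    """
    result = {}
    for allele in set(indian_counts) | set(chinese_counts):
        n_indian = indian_counts.get(allele, 0)
        n_chinese = chinese_counts.get(allele, 0)
        if n_indian > 0 and n_chinese == 0:
            result[allele] = "Indian"
        elif n_chinese > 0 and n_indian == 0:
            result[allele] = "Chinese"
    return result


# Invented counts: a "G" allele that is the major allele in one group
# (13 of 20 copies) and absent from the other.
major_private = private_alleles({"G": 13, "A": 7}, {"A": 20})

# Invented counts for a fixed difference, where each population
# carries a different allele exclusively.
fixed_difference = private_alleles({"C": 20}, {"T": 20})
```

A fixed difference is simply the case in which both alleles of the SNP are private, one to each population. With only 10 animals per group, an allele that appears private may still exist at low frequency in the other population, which is why the larger genotyping assay matters.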

Genotype assay of population-specific SNPs
To further explore the population-specificity of the The 44 putative population-specific SNPs were interrogated using the Sequenom iPLEX assay [19]. Six of the SNPs (14%) were excluded when they were found to have

Discussion
This study identified a striking difference in the genome composition of Indian and Chinese rhesus macaques. There were 76% more SNPs identified in Chinese than in Indian rhesus macaques, a finding consistent with the greater mtDNA sequence and morphological diversity described for Chinese rhesus macaques [3,10,20,21]. The increased genetic heterogeneity in Chinese rhesus could be a consequence of the Chinese animals inhabiting a large geographic range and evolving more regional population substructure [3]. In addition, the reduced diversity of the Indian rhesus macaques suggests this population may have experienced an evolutionary bottleneck or expanded relatively recently from a small founder population.
A surprising finding was that 69% of the SNPs identified were unique to either Chinese or Indian rhesus macaques. If the initial cohort of animals chosen for sequencing included any misclassified or hybrid animals, then the real number of private SNPs would be even higher. However, based upon both the background screening and the genotype analysis of a larger set of animals, the initial cohort appears to have correctly reported ancestries.  [3]. Alternatively, the spurious population exceptions may have resulted from the intermixing of Indian and Chinese rhesus macaques at some breeding colonies. Future studies with native populations may clarify this point.
Based upon the results of an expanded genotype assay, we predict that as many as 85% of the population-specific SNPs identified in this study will remain largely exclusive to one ancestral group when examined in additional studies. We therefore estimate that 80-85 of the SNPs identified in this study are both population-specific and occur at high frequency (MAF ≥ 0.2). These SNPs will be useful for validating rhesus macaque ancestry in an efficient manner. The overall high frequency of population-specific SNPs also suggests that a large-scale genome array of either Indian or Chinese-specific SNPs could be generated. Such an array would enable admixture studies of population-associated phenotypes.
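One way a panel of population-specific SNPs could be applied to ancestry validation is sketched below. The text does not describe a decision rule, so the rule here (count the Indian-specific and Chinese-specific alleles an animal carries and flag animals carrying both) is an assumption, as are the function name, the `min_hits` threshold and the SNP identifiers.

```python
def classify_ancestry(genotype, panel, min_hits=1):
    """Classify one animal from its calls at population-specific SNPs.

    genotype: dict of snp_id -> two-character call, e.g. "AG"
    panel:    dict of snp_id -> (specific_allele, population)
    min_hits: assumed minimum number of specific alleles needed
              before a population is counted as present
    """
    hits = {"Indian": 0, "Chinese": 0}
    for snp_id, (allele, population) in panel.items():
        call = genotype.get(snp_id)
        if call is None:
            continue  # SNP not genotyped in this animal
        hits[population] += call.count(allele)
    has_indian = hits["Indian"] >= min_hits
    has_chinese = hits["Chinese"] >= min_hits
    if has_indian and has_chinese:
        return "putative hybrid"
    if has_indian:
        return "Indian"
    if has_chinese:
        return "Chinese"
    return "undetermined"


# Hypothetical two-SNP panel and animals.
panel = {"SNP_A": ("G", "Indian"), "SNP_B": ("T", "Chinese")}
animal_1 = {"SNP_A": "GG", "SNP_B": "CC"}
animal_2 = {"SNP_A": "AG", "SNP_B": "CT"}
```

Because some SNPs are expected to be only largely exclusive to one group, a practical rule would probably need a `min_hits` value above 1 or a likelihood-based approach; the threshold here is left as a parameter for that reason.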
Some of the alleles queried in the 38 SNP genotyping panel (ADBRK2:109, CD40LG:738, CD4:558) were found to deviate from Hardy-Weinberg equilibrium. The apparent underrepresentation of particular genotypes may reflect the small number of animals sampled, population substructure, or possibly selective pressure against these, or linked, genotypes.
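A deviation from Hardy-Weinberg equilibrium is commonly assessed by comparing observed genotype counts with those expected from the allele frequencies. The sketch below uses a one-degree-of-freedom chi-square test for an autosomal biallelic SNP; the text does not state which test was used, so this choice, the function name and the example counts are assumptions.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test of Hardy-Weinberg proportions (1 d.f.).

    Arguments are the observed counts of the two homozygotes and the
    heterozygote. Returns (chi_square, p_value).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi_square = sum(
        (o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0
    )
    # Survival function of the chi-square distribution with 1 d.f.
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value


# Invented counts: a complete absence of heterozygotes.
chi_sq, p_val = hwe_chi_square(20, 0, 20)
```

The chi-square approximation is unreliable when expected counts are small, and it does not apply directly to X-linked loci in a mixed-sex sample, so an exact test may be preferable for panels of this size.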
Based upon a comparison of 10 gene sequences that were analyzed in similar studies of rhesus macaque and human populations, it appears that the Indian and Chinese rhesus have about 1.6 times as many SNPs as the African and European human populations. Given that only 20 animals were included in this study, that estimate is likely to be conservative. The higher level of heterogeneity is not surprising, since the evolutionary history of the macaques is longer, affording more time for sequence variations to accumulate.
Some of the SNPs identified in this study may be functional, disrupting microRNA target sequences or affecting gene expression through another mechanism. Indeed, 3'UTR polymorphisms in the rhesus macaque DAT [22] and TPH2 genes [23] were shown to be associated with altered gene expression, and in the case of TPH2, with hypothalamic-pituitary-adrenal (HPA) axis activity. A finding from this study was that the MAOA gene contains alleles that significantly differ in Indian and Chinese rhesus macaques. The MAOA gene encodes an enzyme that contributes to the breakdown of serotonin; thus we speculate that disparate MAOA expression or activity may contribute to the different levels of 5-HIAA (serotonin metabolite) and the aggressive behavior that distinguish Chinese and Indian rhesus macaques [12,24].

Conclusion
This work demonstrates that the 3' end of rhesus macaque genes is a rich source for identifying sequence polymorphisms. This study also demonstrates that there is a high frequency of SNPs that are unique to either Indian or Chinese-origin rhesus macaques. These population-specific SNPs can be used to rapidly screen and establish rhesus macaque ancestry and to identify genes that underlie phenotypes characteristic of each population.
Indian animals and hybrid animals being from the ONPRC. Three of the latter animals were captured and imported directly from Northern India in 1976.

DNA extraction and validation of animal ancestry
Ten ml of blood was drawn from each animal in accordance with the Oregon Health & Science University Institutional Animal Care and Use Committee (0492-09), using universal precautions. DNA was extracted from each blood sample (QIAamp Blood Midi Kit, Qiagen, Inc.).
Two genetic analyses were used to corroborate the ancestry of each animal. First, a portion of the mtDNA 12S rRNA locus was PCR amplified (primers F-5'-ACTGGGATTAGATACCCCACT-3' and R-5'-AGGGTGACGGGCGGTGTGT-3'), sequenced and compared to published Indian and Chinese rhesus macaque sequences [4]. Second, the allele sizes of three microsatellite markers (DX18S537, D1S548, DXS2506), which have differing allele frequencies in Chinese and Indian rhesus macaques, were established for each animal (Veterinary Genetics Laboratory, Davis, CA). Only rhesus macaques with mtDNA and microsatellite results supporting their documented ancestry were included in this study.

Amplification and sequencing of DNAs
The genomic DNAs extracted from the 10 Chinese and 10 Indian rhesus macaques were used as templates for the PCR amplifications. PCR was performed as described previously [16]. PCR products were sequenced in both directions with the amplification primers, either at Agencourt Biosciences, Inc. (Beverly, MA), or at the ONPRC using a Genetic Analyzer 3130 (Applied Biosystems, Inc.).

SNP analysis
The workflow for the DNA sequence analysis and SNP identification is outlined in Figure 4. After receipt of raw data from the sequencing source, files were compiled on a Macintosh XServer at the OHSU Advanced Computing Center cluster. The Phred/Phrap/Consed package was used to perform sequence alignment [25,26]. The output sequences were aligned directly against the published reference sequences [27], rather than being assembled into contigs first. Identification of polymorphisms on the aligned sequences was done using PolyPhred [25]. Control of these procedures was implemented in a Perl program designed for this work. Outputs from PolyPhred were then processed and loaded into a local MS-SQL-based genetics database, using a Java-based program developed for this project. In most cases, both the forward and reverse sequence reads for an individual were in agreement; in the rare case that the two sequence reads contained a difference in a single base assignment, the read with the higher PolyPhred quality score was automatically identified and entered into the database as the correct genotype for that individual. In cases where identical quality scores were attributed to a divergent base, the correct genotype was determined manually using Sequencher software (Gene Codes Corporation, Ann Arbor, MI) and visual inspection. A visual review of the sequence quality of each gene contig was also carried out before inclusion of SNPs into the final data set. A second Java program written for this project facilitated the statistical genetic calculations. It was used to calculate allele frequencies for animals in the Indian and Chinese populations, both individually and when combined; this program can also compare individuals to pre-defined populations. The verified SNPs were deposited into dbSNP [28] and the rhesus-specific database, Monkey SNP [29].
Figure 4. SNP analysis workflow. The DNA sequence output was analyzed using PolyPhred software to identify SNPs and to report the relative quality score of the SNP calls. SNPs with low quality scores or with inconsistent reports were reviewed manually. The approved SNP data were deposited into a local database for population, frequency and genotype analysis. Based upon the statistical results, some SNPs were selected for inclusion in the Sequenom iPLEX SNP panel.
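The rule described in this section for reconciling forward and reverse reads can be written as a short function. This is a sketch of the stated logic only; the function name and signature are hypothetical and do not reproduce the Java program written for the project.

```python
def resolve_call(fwd_call, fwd_quality, rev_call, rev_quality):
    """Pick one call from the forward and reverse reads of an animal.

    Returns the agreed call, or the call with the higher quality
    score when the reads disagree. Returns None when the reads
    disagree with identical scores, which marks the position for
    manual review.
    """
    if fwd_call == rev_call:
        return fwd_call
    if fwd_quality > rev_quality:
        return fwd_call
    if rev_quality > fwd_quality:
        return rev_call
    return None  # tie between divergent calls: review by eye


# Invented calls and quality scores.
agreed = resolve_call("AG", 40, "AG", 35)
by_quality = resolve_call("AG", 20, "AA", 45)
needs_review = resolve_call("AG", 30, "AA", 30)
```

Returning None rather than guessing keeps ambiguous positions out of the database until they have been inspected, which mirrors the manual step described above.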

SNP genotyping
Genotyping was performed using iPLEX reagents and protocols for multiplex PCR, single base primer extension (SBE) and generation of mass spectra, as per the manufacturer's instructions (for complete details see iPLEX Application Note, Sequenom, San Diego). Three multiplexed assays contained 28, 17, and 7 SNPs, for a total of 52 SNPs queried, of which 50 successfully generated genotyping data. Briefly, initial multiplexed PCR was performed in 5 μl reactions on 384-well plates containing 5 ng of genomic DNA. Reactions contained 0.5 U HotStar Taq polymerase (QIAGEN), 100 nM primers, 1.25× HotStar Taq buffer, 1.625 mM MgCl2, and 500 μM dNTPs. Following enzyme activation at 94°C for 15 min, DNA was amplified with 45 cycles of 94°C × 20 sec, 56°C × 30 sec, 72°C × 1 min, followed by a 3 min extension at 72°C. Unincorporated dNTPs were removed using shrimp alkaline phosphatase (0.3 U, Sequenom). Single-base extension was carried out by addition of SBE primers at concentrations from 0.625 μM (low MW primers) to 1.25 μM (high MW primers) using iPLEX enzyme and buffers (Sequenom, San Diego) in 9 μl reactions. Reactions were desalted and SBE products measured using the MassARRAY Compact system, and mass spectra were analyzed using TYPER software (Sequenom, San Diego), in order to generate genotype calls and allele frequencies.
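As a rough planning aid, the hold times of the multiplex PCR program described above can be summed. The sketch below counts only the stated hold times and ignores temperature ramping and the later phosphatase and extension steps, so the real run time would be longer; the function name is hypothetical.

```python
def pcr_hold_time_seconds(activation_s, cycles, step_seconds, final_extension_s):
    """Sum the programmed hold times of a PCR run, excluding ramping."""
    return activation_s + cycles * sum(step_seconds) + final_extension_s


# Values from the protocol: 15 min activation, 45 cycles of
# 20 s / 30 s / 60 s, and a 3 min final extension.
total_seconds = pcr_hold_time_seconds(15 * 60, 45, (20, 30, 60), 3 * 60)
total_minutes = total_seconds / 60
```

With these values the programmed hold time comes to 6030 seconds, or about 100 minutes, before any ramping overhead.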