Intercenter reliability and validity of the rhesus macaque GeneChip
© Duan et al; licensee BioMed Central Ltd. 2007
Received: 31 October 2006
Accepted: 28 February 2007
Published: 28 February 2007
The non-human primate (NHP) research community has been intensely interested in obtaining whole-genome expression arrays for their work. Recently, novel approaches were used to generate the DNA sequence information for a rhesus GeneChip. To test the reliability of the rhesus GeneChip across different centers, RNA was isolated from five sources: cerebral cortex, pancreas, thymus, testis, and an immortalized fibroblast cell line. Aliquots of this RNA were sent to each of three centers: Yerkes National Primate Research Center, Oregon National Primate Research Center and the University of Nebraska Medical Center. Each center labeled the samples and hybridized them with two rhesus macaque GeneChips. In addition, rhesus samples were hybridized with human GeneChips to compare with samples hybridized with the rhesus GeneChip.
The results indicate that center effects were minimal and the rhesus GeneChip appears highly reliable. To test the validity of the rhesus GeneChip, five of the most differentially expressed genes among tissues identified in the reliability experiments were chosen for analysis with Quantitative PCR. For all 5 genes, the qPCR and GeneChip results were in agreement with regard to differential expression between tissues. Significantly more probesets were called present when rhesus samples were hybridized with the rhesus GeneChip than when these same samples were hybridized with a human GeneChip.
The rhesus GeneChip is both a reliable and a valid tool for examining gene expression and represents a significant improvement over the use of the human GeneChip for rhesus macaque gene expression studies.
The non-human primate (NHP) research community has been intensely interested in obtaining whole-genome expression arrays for their work. The recent production of a rhesus macaque GeneChip (Affymetrix, Santa Clara, CA) now satisfies this need.
Novel approaches were used to generate the DNA sequence information for the rhesus GeneChip. In 2005, when the rhesus macaque GeneChip was in the design stage, the percentage of genes in the rhesus macaque genome covered by ESTs was quite small. In addition, the rhesus macaque genome sequences were at an early stage of assembly, with limited redundancy. To overcome these limitations, we used a targeted PCR approach to acquire the sequences needed to design probes for over 5,000 genes. All human last exons were identified and aligned with Probe Selection Region (PSR) sequences obtained from Affymetrix. Primers were designed that flanked the PSRs. These primers were used to amplify orthologous PSRs in rhesus macaques from rhesus genomic DNA. The PCR products were cloned, sequenced and deposited in GenBank. In an in silico version of our targeted PCR approach, sequences from an early draft of the Baylor Rhesus Genome assembly were aligned with human PSRs and information for probe design was extracted. Using primarily these two sources of sequence information, a whole-genome rhesus macaque expression GeneChip was created by Affymetrix. The targeted acquisition of the 3' gene sequence information used for the production of the rhesus macaque GeneChip is unique among the Affymetrix GeneChips.
The rhesus macaque GeneChip uses 52,000 probesets to monitor the expression of over 47,000 transcripts, including transcripts with multiple polyadenylation sites. Reliability and reproducibility are major issues that must be addressed to successfully apply microarray technology to biomedical experiments [3, 4]. They are particularly important when researchers want to compare and integrate microarray experiments from multiple laboratories [5–7]. Due to the time and expense associated with NHP experiments, it is critical that the results generated in one center be comparable with results at another.
Given the novel strategies underlying the design of the rhesus macaque GeneChip, we felt it was important to test its reliability and validity. To test the reliability of the rhesus GeneChip across different centers, RNA was isolated from five sources: cerebral cortex, pancreas, thymus, testis, and an immortalized fibroblast cell line. Two aliquots of RNA from each tissue type were sent to each of three centers: Yerkes National Primate Research Center, Oregon National Primate Research Center and the University of Nebraska Medical Center. Each center labeled the two aliquots individually and hybridized them to two separate rhesus macaque GeneChips. The results indicate that center effects were minimal and the rhesus GeneChip appears highly reliable. To test the validity of the rhesus GeneChip, five of the most differentially expressed genes among tissues identified in the reliability experiments were chosen for analysis with Quantitative PCR. The results indicated the rhesus GeneChip provides valid information.
Prior to the production of the rhesus GeneChip, many investigators used various human expression arrays with rhesus samples. Although this approach yielded some useful information, studies indicated that these cross-species hybridizations resulted in the loss of considerable data [9, 10]. We wanted to determine whether more probesets would be called present when rhesus samples were hybridized with the rhesus GeneChip than with the human GeneChip. We found that the rhesus GeneChip was superior to the human GeneChip for use with rhesus samples in all five tested samples.
Scaling factors (SF) of 30 arrays hybridized to Affymetrix GeneChips. Table 1 presents the SF of all 30 arrays, calculated with the Target Intensity (TGT) set to 100.
Reliability of the rhesus GeneChip at different centers
Validation of the rhesus GeneChip
The performance of the Affymetrix rhesus array and the qRT-PCR assay were compared in terms of each gene's relative expression (i.e., log2 fold change) between tissues (Figure 6). For each selected gene, the fold change represents the ratio of the expression of the gene in its highly-expressed tissue to its expression in each of the other tissues. All fold change measurements from the qRT-PCR assay were considerably larger than 0, indicating each of the selected genes was highly expressed in one of five tissues.
For all 5 genes, the qPCR and GeneChip results were in agreement with regard to differential expression between tissues. Across the individual genes, the largest average fold change measured on the Affymetrix rhesus array occurred for PRSS2, which was consistent with the qRT-PCR results. In most cases, the fold change measurements from the qRT-PCR assay were larger than the corresponding ones from the Affymetrix rhesus array, which is often observed [16, 17].
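The fold change comparison described above can be illustrated with a minimal Python sketch. It assumes log2-scale expression values (such as RMA output), so that a ratio becomes a difference of means. The function name `log2_fold_changes` and all numeric values are hypothetical and are not taken from the study.

```python
from statistics import mean


def log2_fold_changes(log2_expr, reference_tissue):
    """Return the log2 fold change of the reference tissue versus each other tissue.

    log2_expr maps a tissue name to a list of replicate log2 intensities.
    A positive value means higher expression in the reference tissue.
    """
    ref_mean = mean(log2_expr[reference_tissue])
    return {
        tissue: ref_mean - mean(values)
        for tissue, values in log2_expr.items()
        if tissue != reference_tissue
    }


# Hypothetical replicate values for one gene that is high in pancreas.
example = {
    "pancreas": [13.0, 13.2],
    "thymus": [6.1, 5.9],
    "testis": [6.5, 6.7],
}
fc = log2_fold_changes(example, "pancreas")
for tissue, value in fc.items():
    print(f"pancreas vs {tissue}: log2 fold change = {value:.2f}")
```

Applying the same function to array-derived and qRT-PCR-derived log2 values for a gene would give the paired fold changes that a comparison like Figure 6 plots.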
Comparison of the rhesus GeneChip with the human GeneChip for studying gene expression in rhesus macaque samples
We assessed the reliability of the Affymetrix Rhesus Macaque GeneChip across three laboratories using five tissues. The reliability of the Rhesus GeneChip was excellent, similar to the reported reliability of the human GeneChip. This is important because macaques are in short supply and quite expensive. The proven reliability of the macaque GeneChip will facilitate large study designs utilizing animals from different centers.
We also assessed the validity of the rhesus GeneChip by comparing the expression patterns of a group of genes that were found to be differentially expressed in the GeneChip experiments with a qRT-PCR assay. The results indicate that the rhesus GeneChip yields accurate and valid data. This is important because the design of this GeneChip was unique. The sequences that were used to choose the probes were almost all genomic. The results from this study validate the idea that, in closely related species, probes from last exons can be successfully used. Exhaustive, expensive EST projects are not necessary to create genome wide gene chips when last exons of the target species can be identified.
The development of a reliable, valid rhesus macaque GeneChip should facilitate many important studies. Our results indicate that using the rhesus GeneChip with rhesus samples represents a significant improvement over use of the human GeneChip. It is expected that researchers in the fields of AIDS, vaccine development, transplant biology, stem cell/reproductive biology and the neurosciences will be able to use the rhesus macaque GeneChip to make important breakthroughs.
The rhesus macaque GeneChip is reliable across centers and is a valid tool for measuring gene expression. It offers significant advantages in sensitivity as compared to the human GeneChip for expression analysis.
RNA isolation and GeneChip hybridization
The RNA samples were extracted from four tissues and one cell line of rhesus macaques: immortalized fibroblasts (Nebraska), cerebral cortex (Yerkes), pancreas (Oregon), testis (Oregon) and thymus (Oregon). Two aliquots of each RNA sample were distributed to each of three centers: Oregon National Primate Research Center, Yerkes National Primate Research Center and the University of Nebraska Medical Center. Each center labeled and hybridized samples using standard Affymetrix protocols. The Rhesus Macaque Genome GeneChip that consists of over 52,000 probe sets (Affymetrix, Santa Clara, CA) was used for all hybridizations. A total of 30 hybridizations were performed (three centers x five samples x two replicates) using Affymetrix reagents (Affymetrix). In addition, two aliquots of each sample were hybridized with the human GeneChip (HGU133plus2.0).
Total RNA was extracted from fibroblast, pancreas, thymus, testis, and cerebral cortex samples using Trizol Reagent (Invitrogen, Carlsbad, CA) as described by the manufacturer. RNA was treated with DNase I (Invitrogen), followed by a cleanup procedure using the Qiagen RNeasy mini kit (Qiagen, Valencia, CA) according to the manufacturer's protocols. The RNA quality was assessed using an Agilent 2100 Bioanalyzer and RNA 6000 Nano LabChips (Agilent, Palo Alto, CA). All RNA samples showed a 260/280 ratio between 1.8 and 2.0 and a 28S:18S ratio of 1.5 or higher. Target RNA labeling, hybridization and post-hybridization processing were performed following the Affymetrix GeneChip Expression Analysis manuals (Affymetrix). For each sample, 5 μg of RNA was first reverse-transcribed using T7-Oligo(dT) Promoter Primer and SuperScript II in the first-strand cDNA synthesis reaction. Following RNase H-mediated second-strand cDNA synthesis, the double-stranded cDNAs were purified with a GeneChip sample clean-up module and served as templates for generating biotinylated complementary RNAs (cRNAs) by in vitro transcription (IVT) in the presence of T7 RNA Polymerase and a biotinylated nucleotide analog/ribonucleotide mix. The biotinylated cRNAs were cleaned up, fragmented, and hybridized to the rhesus macaque expression arrays at 45°C for 16 h with constant rotation at 60 rpm. The microarrays were washed and stained with Affymetrix fluidics stations and scanned on an Affymetrix scanner 3000. The images were processed to collect raw data with GeneChip Operating Software (GCOS).
Affymetrix GeneChip preprocessing
We exported .cel files from Affymetrix GCOS software and preprocessed them with robust multiarray analysis (RMA). Three steps were taken to process the probe-level data of the Affymetrix oligo arrays: background correction, normalization and probe-level summarization at the log2-scale. Instead of the default RMA background correction, we used MAS 5.0 (Affymetrix) to correct for background because, in our experience, probe-level intensities are more normally distributed after MAS 5.0 background correction. Quantile normalization was applied to arrays from each Center separately to control for variation in hybridizations at the different Centers. Intensity values from all 30 arrays were combined and summarized by RMA to extract probeset-level data. Data quality was assessed using the affy Bioconductor package. The raw data (30 .CEL files) have been uploaded to the GEO repository (GEO accession number GSE7094).
Various quality assessment measures were calculated to assess the quality of the experiments. Some of these were output by Affymetrix GCOS/MAS 5.0, such as the background levels, the range of the present percentage, the scaling factor values, the 3'/5' ratios of housekeeping genes (e.g., β-actin, GAPDH), and the signal intensities of spike-in hybridization controls. Two byproducts from RMA, RNA degradation indices and NUSE values, were also used to assess the quality of the experiments.
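As an illustration of the normalization step, the following is a minimal pure-Python sketch of quantile normalization applied to the arrays of one center. It is not the Bioconductor implementation used in the study: ties are broken by position rather than averaged, and the function name `quantile_normalize` and the intensities are hypothetical.

```python
def quantile_normalize(arrays):
    """Quantile-normalize a list of equal-length intensity lists.

    Each value is replaced by the mean, across arrays, of the values that
    share its rank. Ties are broken by position (a simplification).
    """
    n_arrays = len(arrays)
    n_probes = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean of the k-th smallest value across arrays.
    reference = [
        sum(s[k] for s in sorted_arrays) / n_arrays for k in range(n_probes)
    ]
    normalized = []
    for a in arrays:
        # Probe indices ordered from lowest to highest intensity.
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


# Hypothetical intensities for two arrays hybridized at the same center.
center_arrays = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
normalized = quantile_normalize(center_arrays)
print(normalized)
```

After normalization every array in the group has the same empirical distribution, so calling the function once per center corresponds to the per-center normalization described above.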
The scaling factor is the amount of scaling applied to the values for each array to make its intensity mean equal to a pre-specified value, which is called Target intensity value (TGT); the default value is usually 100. In practice, the trimmed (e.g., 0.02) intensity mean is commonly computed for each experiment. The assumption behind SF is that gene expression does not change significantly for the vast majority of transcripts in an experiment. Affymetrix suggests that the SF values for the arrays should be within 3-fold of one another. There are a variety of reasons for an array's SF value to deviate from the others, such as the quality and amount of starting material, issues with RNA labeling and scanning, and issues with array manufacture.
NUSE is calculated by standardizing RMA-estimated standard errors across arrays. In this way, the differences in variability between probesets can be adjusted and the boxplot of these standardized values can be used to compare chip qualities.
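The scaling factor logic can be sketched in a few lines of Python. This is a simplified illustration rather than the exact MAS 5.0 computation: it assumes a list of per-probeset signal values for one array, trims 2% from each tail, and divides the target intensity by the trimmed mean. The function names and the example signals are hypothetical.

```python
def trimmed_mean(values, trim=0.02):
    """Mean after dropping the lowest and highest `trim` fraction of values."""
    ordered = sorted(values)
    k = int(len(ordered) * trim)  # number of values dropped from each tail
    kept = ordered[k:len(ordered) - k] if k else ordered
    return sum(kept) / len(kept)


def scaling_factor(signals, target=100.0, trim=0.02):
    """Factor that brings the array's trimmed mean signal to the target (TGT)."""
    return target / trimmed_mean(signals, trim)


def within_fold(scaling_factors, fold=3.0):
    """Check that all scaling factors lie within `fold` of one another."""
    return max(scaling_factors) / min(scaling_factors) <= fold


# Hypothetical array: 98 ordinary signals plus one very low and one very high value.
signals = [50.0] * 98 + [0.0, 10000.0]
sf = scaling_factor(signals)
print(sf, within_fold([0.8, 1.5, 2.0]))
```

The trimming keeps a handful of extreme probesets from dominating the mean, which is why the outliers in the example do not affect the resulting factor.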
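A minimal sketch of the NUSE standardization, assuming a matrix of RMA-style standard error estimates with one row per probeset and one column per array. The standard errors shown are hypothetical, and fitting the probe-level model that produces them is outside the scope of this sketch.

```python
from statistics import median


def nuse(standard_errors):
    """Divide each probeset's standard errors by their median across arrays.

    standard_errors[i][j] is the standard error for probeset i on array j.
    """
    result = []
    for row in standard_errors:
        row_median = median(row)
        result.append([se / row_median for se in row])
    return result


def array_medians(nuse_values):
    """Median NUSE per array; values well above 1 suggest a lower-quality array."""
    n_arrays = len(nuse_values[0])
    return [median([row[j] for row in nuse_values]) for j in range(n_arrays)]


# Hypothetical standard errors: 3 probesets (rows) on 3 arrays (columns).
se = [
    [0.10, 0.10, 0.20],
    [0.05, 0.05, 0.10],
    [0.20, 0.20, 0.40],
]
medians = array_medians(nuse(se))
print(medians)
```

In this toy example the third array has a median NUSE about twice that of the others, which is the kind of shift a NUSE boxplot is meant to expose.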
The rank-based Spearman correlation coefficient was computed to quantify the degree of similarity between each pair of arrays. The results were visualized with a heat map plot. This plot was generated using the Bioconductor library affyQCReport.
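The pairwise similarity computation can be sketched in pure Python as follows. The study used the Bioconductor library affyQCReport for the heat map; this sketch only shows how a Spearman correlation matrix could be computed (Pearson correlation of average ranks), and the array values are hypothetical.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def correlation_matrix(arrays):
    return [[spearman(a, b) for b in arrays] for a in arrays]


# Hypothetical probeset intensities for three arrays.
arrays = [
    [1.0, 2.0, 3.0, 4.0],
    [10.0, 20.0, 30.0, 45.0],
    [4.0, 3.0, 2.0, 1.0],
]
matrix = correlation_matrix(arrays)
print(matrix)
```

The resulting square matrix is what a heat map would display, with one row and one column per array.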
Identification of differentially expressed genes
To identify genes that are differentially expressed among five tissues, we conducted "significance analysis of microarrays" (SAM) proposed by Tusher, Tibshirani and Chu. SAM is a supervised learning program for cDNA and oligo microarrays. It identifies genes with statistically significant changes in expression by assimilating a set of gene-specific t-tests, while the standard deviation is adjusted by adding a small positive constant to ensure that the variance of the calculated score is independent of gene expression. To account for the multiple comparisons and overcome the difficulties of having few replicates, SAM performs random permutations among experiments and sets a cut-off threshold (e.g., at 5%) to identify the significant genes. For the cut-off threshold, SAM estimates the false discovery rate (FDR) by counting the number of randomly permuted genes that exceed the cut-off threshold in the original setting.
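The idea behind SAM can be illustrated with a deliberately simplified Python sketch. It is not the SAM software and differs from the analysis in this study in several ways that are assumptions of the sketch: it handles only two classes rather than five tissues, uses a fixed constant `s0` instead of estimating it, applies a single threshold on the absolute score, and enumerates all label assignments instead of sampling random permutations. All names and values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def d_statistic(group1, group2, s0):
    """Modified t-statistic: mean difference over (pooled standard error + s0)."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = sum(group1) / n1, sum(group2) / n2
    ss = sum((x - m1) ** 2 for x in group1) + sum((x - m2) ** 2 for x in group2)
    s = sqrt((1 / n1 + 1 / n2) * ss / (n1 + n2 - 2))
    return (m1 - m2) / (s + s0)


def estimate_fdr(genes, n1, threshold, s0=0.1):
    """Estimate the FDR for calling genes with |d| >= threshold.

    genes: list of per-gene value lists; the first n1 values belong to class 1.
    Returns (observed d scores, number of genes called, estimated FDR).
    """
    n = len(genes[0])
    observed = [d_statistic(g[:n1], g[n1:], s0) for g in genes]
    called = sum(abs(d) >= threshold for d in observed)
    false_counts = []
    # Enumerate every way of assigning n1 samples to class 1.
    for idx in combinations(range(n), n1):
        chosen = set(idx)
        count = 0
        for g in genes:
            a = [g[i] for i in range(n) if i in chosen]
            b = [g[i] for i in range(n) if i not in chosen]
            if abs(d_statistic(a, b, s0)) >= threshold:
                count += 1
        false_counts.append(count)
    expected_false = sum(false_counts) / len(false_counts)
    fdr = min(expected_false / called, 1.0) if called else 0.0
    return observed, called, fdr


# Hypothetical log2 values: 3 replicates per class for two genes.
genes = [
    [10.0, 10.2, 9.8, 2.0, 2.1, 1.9],  # strongly differential
    [5.0, 5.1, 4.9, 5.0, 5.2, 4.8],    # essentially unchanged
]
observed, called, fdr = estimate_fdr(genes, n1=3, threshold=5.0)
print(called, round(fdr, 3))
```

Adding `s0` to the denominator keeps genes with very small variance from producing inflated scores, which is the role of the small positive constant mentioned above.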
Quantitative real-time PCR
All of the above primer and probe sequences were generated and prepared by ABI custom gene expression assay service using rhesus macaque mRNA sequences.
Percent present computation
The Affymetrix MAS 5.0 algorithm was used to generate the detection call information for each probe set, where the Affymetrix default values (0.15, 0.04 and 0.06) were used for parameters tau, alpha1 and alpha2, respectively. Percent present for each array was computed as the number of probesets called Present out of the total number of probesets.
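The percent present computation can be sketched as follows. The sketch assumes that a detection p-value has already been computed for each probeset (in MAS 5.0 this comes from a Wilcoxon signed-rank test of the probe-pair discrimination scores against tau, which is not reproduced here) and applies the alpha1 and alpha2 cut-offs quoted above. The function names and p-values are hypothetical.

```python
def detection_call(p_value, alpha1=0.04, alpha2=0.06):
    """Map a detection p-value to Present (P), Marginal (M) or Absent (A)."""
    if p_value < alpha1:
        return "P"
    if p_value < alpha2:
        return "M"
    return "A"


def percent_present(p_values, alpha1=0.04, alpha2=0.06):
    """Percentage of probesets called Present out of all probesets."""
    calls = [detection_call(p, alpha1, alpha2) for p in p_values]
    return 100.0 * calls.count("P") / len(calls)


# Hypothetical detection p-values for four probesets on one array.
p_values = [0.001, 0.03, 0.05, 0.2]
pct = percent_present(p_values)
print(pct)
```

Marginal calls are counted in the denominator but not in the numerator, matching the definition of percent present given above.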
This project was supported by a grant from NIH (RR017444) to RBN. We thank Dr. Mark Pauley for setting up a secure copy site for sharing GeneChip data between centers.
- GeneChip® Rhesus Macaque Genome Array. [http://www.affymetrix.com/products/arrays/specific/rhesus_macaque.affx]
- Spindel ER, Pauley MA, Jia Y, Gravett C, Thompson SL, Boyle NF, Ojeda SR, Norgren RB: Leveraging human genomic information to identify nonhuman primate sequences for expression array development. BMC Genomics. 2005, 6: 160. 10.1186/1471-2164-6-160.
- Draghici S, Khatri P, Eklund AC, Szallasi Z: Reliability and reproducibility issues in DNA microarray measurements. Trends Genet. 2006, 22 (2): 101-109. 10.1016/j.tig.2005.12.005.
- Shi L, Shi L, Reid LH, Jones WD, Shippy R, Warrington JA, Baker SC, Collins PJ, de Longueville F, Kawasaki ES, Lee KY, Luo Y, Sun YA, Willey JC, Setterquist RA, Fischer GM, Tong W, Dragan YP, Dix DJ, Frueh FW, Goodsaid FM, Herman D, Jensen RV, Johnson CD, Lobenhofer EK, Puri RK, Scherf U, Thierry-Mieg J, Wang C, Wilson M, Wolber PK, Zhang L, Amur S, Bao W, Barbacioru CC, Lucas AB, Bertholet V, Boysen C, Bromley B, Brown D, Brunner A, Canales R, Cao XM, Cebula TA, Chen JJ, Cheng J, Chu TM, Chudin E, Corson J, Corton JC, Croner LJ, Davies C, Davison TS, Delenstarr G, Deng X, Dorris D, Eklund AC, Fan XH, Fang H, Fulmer-Smentek S, Fuscoe JC, Gallagher K, Ge W, Guo L, Guo X, Hager J, Haje PK, Han J, Han T, Harbottle HC, Harris SC, Hatchwell E, Hauser CA, Hester S, Hong H, Hurban P, Jackson SA, Ji H, Knight CR, Kuo WP, Leclerc JE, Levy S, Li QZ, Liu C, Liu Y, Lombardi MJ, Ma Y, Magnuson SR, Maqsodi B, McDaniel T, Mei N, Myklebost O, Ning B, Novoradovskaya N, Orr MS, Osborn TW, Papallo A, Patterson TA, Perkins RG, Peters EH, Peterson R, Philips KL, Pine PS, Pusztai L, Qian F, Ren H, Rosen M, Rosenzweig BA, Samaha RR, Schena M, Schroth GP, Shchegrova S, Smith DD, Staedtler F, Su Z, Sun H, Szallasi Z, Tezak Z, Thierry-Mieg D, Thompson KL, Tikhonova I, Turpaz Y, Vallanat B, Van C, Walker SJ, Wang SJ, Wang Y, Wolfinger R, Wong A, Wu J, Xiao C, Xie Q, Xu J, Yang W, Zhang L, Zhong S, Zong Y, Slikker W: The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nat Biotechnol. 2006, 24 (9): 1151-1161. 10.1038/nbt1239.
- Irizarry RA, Warren D, Spencer F, Kim IF, Biswal S, Frank BC, Gabrielson E, Garcia JG, Geoghegan J, Germino G, Griffin C, Hilmer SC, Hoffman E, Jedlicka AE, Kawasaki E, Martinez-Murillo F, Morsberger L, Lee H, Petersen D, Quackenbush J, Scott A, Wilson M, Yang Y, Ye SQ, Yu W: Multiple-laboratory comparison of microarray platforms. Nat Methods. 2005, 2 (5): 345-350. 10.1038/nmeth756.
- Bammler T, Beyer RP, Bhattacharya S, Boorman GA, Boyles A, Bradford BU, Bumgarner RE, Bushel PR, Chaturvedi K, Choi D, Cunningham ML, Deng S, Dressman HK, Fannin RD, Farin FM, Freedman JH, Fry RC, Harper A, Humble MC, Hurban P, Kavanagh TJ, Kaufmann WK, Kerr KF, Jing L, Lapidus JA, Lasarev MR, Li J, Li YJ, Lobenhofer EK, Lu X, Malek RL, Milton S, Nagalla SR, O'Malley JP, Palmer VS, Pattee P, Paules RS, Perou CM, Phillips K, Qin LX, Qiu Y, Quigley SD, Rodland M, Rusyn I, Samson LD, Schwartz DA, Shi Y, Shin JL, Sieber SO, Slifer S, Speer MC, Spencer PS, Sproles DI, Swenberg JA, Suk WA, Sullivan RC, Tian R, Tennant RW, Todd SA, Tucker CJ, Van Houten B, Weis BK, Xuan S, Zarbl H: Standardizing global gene expression analysis between laboratories and across platforms. Nat Methods. 2005, 2 (5): 351-356. 10.1038/nmeth754.
- Larkin JE, Frank BC, Gavras H, Sultana R, Quackenbush J: Independence and reproducibility across microarray platforms. Nat Methods. 2005, 2 (5): 337-344. 10.1038/nmeth757.
- Norgren RB: Expression arrays for macaque monkeys. Transplantation Rev. 2006, 20: 115-120. 10.1016/j.trre.2006.05.006.
- Chismar JD, Mondela T, Fox HS, Roberts E, Langford D, Masliah E, Salomon DR, Head SR: Analysis of result variability from high-density oligonucleotide arrays comparing same-species and cross-species hybridizations. Biotechniques. 2002, 33: 516-524.
- Wang Z, Lewis MG, Nau ME, Arnold A, Vahey MT: Identification and utilization of inter-species conserved (ISC) probesets on Affymetrix human GeneChip platforms for the optimization of the assessment of expression patterns in non human primate (NHP) samples. BMC Bioinformatics. 2004, 5 (1): 165. 10.1186/1471-2105-5-165.
- Parman C, Halling C: affyQCReport: QC Report Generation for Affy Batch objects. R package version 1.8.0. 2005.
- Tusher VG, Tibshirani R, Chu G: Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci U S A. 2001, 98 (9): 5116-5121. 10.1073/pnas.091062498.
- Smyth GK: Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol. 2004, 3 (1): Article3.
- Wang Y, Barbacioru C, Hyland F, Xiao W, Hunkapiller KL, Blake J, Chan F, Gonzalez C, Zhang L, Samaha RR: Large scale real-time PCR validation on gene expression measurements from two commercial long-oligonucleotide microarrays. BMC Genomics. 2006, 7: 59. 10.1186/1471-2164-7-59.
- Qin LX, Beyer RP, Hudson FN, Linford NJ, Morris DE, Kerr KF: Evaluation of methods for oligonucleotide array data via quantitative real-time PCR. BMC Bioinformatics. 2006, 7: 23. 10.1186/1471-2105-7-23.
- Yoshida S, Mears AJ, Friedman JS, Carter T, He S, Oh E, Jing Y, Farjo R, Fleury G, Barlow C, Hero AO, Swaroop A: Expression profiling of the developing and mature Nrl-/- mouse retina: identification of retinal disease candidates and transcriptional regulatory targets of Nrl. Hum Mol Genet. 2004, 13 (14): 1487-1503. 10.1093/hmg/ddh160.
- Jeong JW, Lee KY, Kwak I, White LD, Hilsenbeck SG, Lydon JP, DeMayo FJ: Identification of murine uterine genes regulated in a ligand-dependent manner by the progesterone receptor. Endocrinology. 2005, 146 (8): 3490-3505. 10.1210/en.2005-0016.
- Irizarry RA, Bolstad BM, Collin F, Cope LM, Hobbs B, Speed TP: Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Res. 2003, 31 (4): e15. 10.1093/nar/gng015.
- Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JY, Zhang J: Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 2004, 5 (10): R80. 10.1186/gb-2004-5-10-r80.
- Barrett T, Suzek TO, Troup DB, Wilhite SE, Ngau WC, Ledoux P, Rudnev D, Lash AE, Fujibuchi W, Edgar R: NCBI GEO: mining millions of expression profiles--database and tools. Nucleic Acids Res. 2005, 33 (Database issue): D562-6. 10.1093/nar/gki022.
- Bolstad B: Model based QC Assessment of Affymetrix GeneChips. Bioconductor Vignettes. 2005, [http://www.bioconductor.org/docs/vignettes.html]