Performance evaluation of commercial short-oligonucleotide microarrays and the impact of noise in making cross-platform correlations
© Shippy et al; licensee BioMed Central Ltd. 2004
Received: 19 April 2004
Accepted: 02 September 2004
Published: 02 September 2004
Despite the widespread use of microarrays, much ambiguity regarding data analysis, interpretation and correlation of the different technologies exists. There is a considerable amount of interest in correlating results obtained between different microarray platforms. To date, only a few cross-platform evaluations have been published and unfortunately, no guidelines have been established on the best methods of making such correlations. To address this issue we conducted a thorough evaluation of two commercial microarray platforms to determine an appropriate methodology for making cross-platform correlations.
In this study, expression measurements for 10,763 genes uniquely represented on Affymetrix U133A/B GeneChips® and Amersham CodeLink™ UniSet Human 20 K microarrays were compared. For each microarray platform, five technical replicates, derived from the same total RNA samples, were labeled, hybridized, and quantified according to each manufacturer's standard protocols. The correlation coefficient (r) of differential expression ratios for the entire set of 10,763 overlapping genes was 0.62 between platforms. However, the correlation improved significantly (r = 0.79) when genes within noise were excluded. In addition to levels of inter-platform correlation, we evaluated precision, statistical-significance profiles, power, and noise levels for each microarray platform. Accuracy of differential expression was measured against real-time PCR for 25 genes and both platforms correlated well with r values of 0.92 and 0.79 for CodeLink and GeneChip, respectively.
As a result of this study, we recommend using only genes called 'present' in cross-platform correlations. However, as in this study, a large number of genes may be lost from the correlation due to differing levels of noise between platforms. This is an important consideration given the apparent difference in sensitivity of the two platforms. Data from microarray analysis need to be interpreted cautiously and therefore, we provide guidelines for making cross-platform correlations. In all, this study represents the most comprehensive and specifically designed comparison of short-oligonucleotide microarray platforms to date using the largest set of overlapping genes.
There are several commercial microarray systems currently available on the market for genome-scale gene expression analysis. Different microarray manufacturers provide distinct underlying technologies, protocols and reagents specific to each system. Despite the widespread use of microarrays, much ambiguity regarding data analysis, interpretation and correlation of the different technologies exists. There is a need for standardization that will facilitate comparison of microarray data from different platforms. Comparison and cross-validation between microarray platforms would greatly increase the understanding and value of the wealth of data generated from each microarray experiment. A number of cross-platform comparisons have reported a failure to demonstrate an acceptable level of correlation between different microarray technologies [4–7]. Some of the difficulties in correlating data can be attributed to fundamental differences between cDNA and oligonucleotide based microarray technologies. For example, target preparation differences and single vs. dual labeling techniques complicate the comparisons. Furthermore, cDNA arrays have difficulty in distinguishing between splice variants and highly homologous genes, while oligonucleotide arrays can make these distinctions if designed appropriately. However, when considering oligonucleotide platforms, which have widespread popularity, direct comparisons between different platforms should be less complex and more direct. We assert that differences in platform sensitivity, reproducibility and annotation cross-referencing accuracy account for a majority of the irreconcilable differences previously reported between different platforms [4–7]. When considering these factors we demonstrate a strong correlation between expression ratio data from two different commercially available short oligonucleotide based microarray technologies.
This paper provides a comprehensive guideline for microarray analysis, interpretation and cross-platform correlation.
There are two commercially available high-density microarray platforms that use short oligonucleotides for expression profiling. CodeLink (GE Healthcare, formerly Amersham Biosciences, Chandler, AZ) and GeneChip (Affymetrix, Santa Clara, CA) microarray platforms utilize oligonucleotide gene target probes of 30 and 25 bases, respectively. Some of the notable differences between the GeneChip and CodeLink systems are, respectively, multiple probes vs. one pre-validated probe per gene target, two-dimensional surface vs. three-dimensional array matrix, and in situ synthesized oligonucleotides vs. pre-synthesized, non-contact oligonucleotide deposition. We restricted our comparative analysis to these two platforms because these systems are most similar with respect to oligonucleotide length, target preparation, and single color indirect labeling methodology. Since these commercial assays are similar, and systematic variables were isolated by using the same total RNA starting material for all target preparations, we expected disparity in performance to reflect differences inherent to the microarray platforms. To provide data for comparison of the platforms, five technical replicates of brain and pancreas were processed on each platform and the results were compared for reproducibility, sensitivity, and similarity of results. Standard manufacturer-recommended protocols and settings were employed to obtain the raw data from each platform. In the case of Affymetrix GeneChip, a recent cross-platform microarray evaluation used two additional algorithms [8, 9] for analysis of the GeneChip data and found the same level of discordance across the three analysis algorithms as was observed in the cross-platform microarray comparisons. We therefore restricted our analysis of the GeneChip data to the Affymetrix recommended MAS 5.0 software. This methodology was followed to simulate the results users would achieve by following current protocols supplied with each microarray system.
Two different tissue types were compared in this study to ensure a large number of differentially expressed genes, and provide expression ratios across a wide dynamic range for derivation of the correlation coefficient between the two platforms. The array-to-array precision of each microarray platform was calculated from the five replicates within each tissue sample.
False-change rate for GeneChip and CodeLink microarray platforms. The false-change rate is defined as the percentage of ratios, derived from the population of concordantly 'present' genes, which fall outside 2-fold (i.e. |log2 ratio| > 1). The table above contains the average and standard deviation of the false-change rate, calculated across the 10 pair-wise array combinations within a sample. False-change rate was calculated from signals above noise across the arrays being compared.
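The false-change rate defined in this caption can be sketched in a few lines of Python. This is a minimal illustration rather than the analysis code used in the study (which was run in SAS and Excel). The function names and the data layout (one list of signals and one list of 'present' flags per replicate array) are assumptions made for the example.

```python
import math
import statistics
from itertools import combinations


def false_change_rate(signals_a, signals_b, present_a, present_b, fold=2.0):
    """Percentage of concordantly 'present' probes whose ratio exceeds `fold`."""
    limit = math.log2(fold)
    # Only probes called 'present' on both arrays contribute a ratio.
    ratios = [
        math.log2(a / b)
        for a, b, pa, pb in zip(signals_a, signals_b, present_a, present_b)
        if pa and pb
    ]
    if not ratios:
        return float("nan")
    outside = sum(1 for r in ratios if abs(r) > limit)
    return 100.0 * outside / len(ratios)


def summarize_false_change(signals, calls):
    """Mean and standard deviation over all pair-wise array combinations.

    `signals` and `calls` hold one list per replicate array; five replicates
    give the 10 pair-wise combinations mentioned in the caption.
    """
    rates = [
        false_change_rate(signals[i], signals[j], calls[i], calls[j])
        for i, j in combinations(range(len(signals)), 2)
    ]
    return statistics.mean(rates), statistics.stdev(rates), len(rates)
```

Because both arrays in a pair were hybridized with the same target, any ratio beyond 2-fold counts as a false change.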
Precision ratio summary for GeneChip and CodeLink microarray platforms. Precision measurements were calculated from signals above noise across the arrays being compared. For CodeLink, there were 7,882 and 6,603 ratios, on average, for each pair-wise array-to-array comparison, within brain and pancreas respectively. For GeneChip, there were 6,734 and 5,137 ratios, on average, for each pair-wise array-to-array comparison, within brain and pancreas respectively. For each of the 10 pair-wise combinations, the ratio range within which 95% of the ratios fall was calculated. This table contains the average and standard deviation of that range across all 10 pair-wise array combinations within a sample.
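One way to compute the range within which 95% of ratios fall for a single pair of arrays is sketched below. The caption does not state the exact percentile rule, so the symmetric treatment used here (the 95th percentile of the absolute log2 ratio, reported as a fold bound) is an assumption, as is the function name.

```python
import math


def ratio_range_95(log2_ratios):
    """Fold bound that contains 95% of the ratios for one array pair.

    Assumption: the range is symmetric around a ratio of 1, so it is found
    from the 95th percentile of the absolute log2 ratios.
    """
    if not log2_ratios:
        raise ValueError("at least one ratio is required")
    ordered = sorted(abs(r) for r in log2_ratios)
    # Smallest rank k such that k / n >= 0.95.
    k = math.ceil(len(ordered) * 95 / 100)
    return 2 ** ordered[k - 1]
```

A returned value of 1.3, for example, would mean that 95% of the replicate-to-replicate ratios lie within 1.3-fold of each other. Averaging this bound over the 10 pair-wise combinations gives a per-sample summary of the kind the caption describes.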
List of genes evaluated using qrtPCR. For each gene, the microarray and qrtPCR brain/pancreas log2 ratios are listed. Raw CT values, qrtPCR primer/probe sequences, and corresponding array probe names are available in supplementary material [see additional files 1, 2, and 3, respectively].
cytochrome c oxidase subunit VIIa polypeptide 2 like
collagen, type VI, alpha 3, transcript variant 1
peroxiredoxin 3, nuclear gene encoding mitochondrial protein
cyclin-dependent kinase inhibitor 1A
nuclear transport factor 2
CCAAT/enhancer binding protein, delta
collagen, type IX, alpha 3
transforming growth factor, alpha
estrogen receptor 1
flavin containing monooxygenase 3
v-akt murine thymoma viral oncogene homolog 1
phosphoribosyl pyrophosphate synthetase subunit I
replication protein A3
slit homolog 2 (Drosophila)
HIC protein isoform p40 and HIC protein isoform p32
transcription factor SMIF
transcription factor CP2-like 1
peptidylprolyl isomerase E (cyclophilin E)
hypothetical protein FLJ14800
late upstream transcription factor
xylosylprotein beta 1,4-galactosyltransferase, polypeptide 7
Increased access and utilization of microarray data through core facilities and affordable commercial microarray systems is driving the need for direct comparisons of data between the different available platform technologies. The ability to exchange data across different platforms gives the research community the ability to cross-validate results and extend understanding of biological processes through integration of published data collected with different technologies. The results presented here demonstrate that we are closer to reaching this goal than previously reported [4–7].
We have compared two commercial platforms and in doing so present several steps required for making comparisons between short oligonucleotide microarray data sets. First, one must normalize annotation. Unfortunately, despite the completion of the human, rat and mouse genome sequencing projects, accurate and stable gene annotation information is not available. The existence of inaccurate sequence information, absence of an exact gene count, incomplete understanding of splicing variations, and the complexity of highly homologous gene sequences all contribute to the challenges of generating a controlled vocabulary for uniquely and consistently annotating genes at the present time. In addition, when considering commercially available arrays, the consumer is left to rely on the manufacturer to provide a probe with a one-to-one correlation to the intended gene target. Furthermore, until recently manufacturers have withheld the release of the exact probe sequences to researchers. Now that probe sequences from the major manufacturers are readily available to users under a simple disclosure agreement, discrepancies in some results will be explained by differences in actual probe and probe set targets as defined by sequence homology. Some probes target different or multiple splice variants and some probes are not specific to a single gene, but instead target multiple homologous genes. Since the use of GeneChip probe sequences for deriving inter-platform overlap is currently prohibited by Affymetrix for publication purposes, we needed to rely upon public annotation to determine the overlap between products rather than more informative sequence-based comparisons. We believe that the use of probe sequences will help to further refine the accuracy of the gene overlap set, and increase the already strong correlation between platforms demonstrated here.
In addition, without the use of sequence information, we filtered the data to include only those probes and probe sets that identify a specific gene target or common regions of splice variants of a single gene target. Both manufacturers in some cases carry multiple probes or probe sets per target gene. Trying to determine which probes to compare in this case without the use of sequence information is nearly impossible. Therefore, only gene probes uniquely represented by both manufacturers were used for comparisons. By employing this conservative methodology, we reduce the risk of inappropriately comparing data from probes designed to detect different transcripts or genes despite having a similar annotation. Importantly, we used a common build of UniGene cluster IDs to find unique gene probes which overlap between the two products.
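The conservative overlap rule described above (one probe per UniGene cluster, one cluster per probe, on both platforms) can be illustrated with a short Python sketch. The function names are hypothetical, and each platform's annotation is assumed to be available as a dictionary from probe ID to the set of UniGene clusters it maps to.

```python
from collections import defaultdict


def uniquely_represented(probe_to_clusters):
    """Return {cluster: probe} for clusters targeted by exactly one probe,
    where that probe itself maps to exactly one cluster."""
    cluster_to_probes = defaultdict(set)
    for probe, clusters in probe_to_clusters.items():
        for cluster in clusters:
            cluster_to_probes[cluster].add(probe)

    unique = {}
    for probe, clusters in probe_to_clusters.items():
        if len(clusters) != 1:
            continue  # probe hits several clusters: ambiguous, drop it
        (cluster,) = clusters
        if len(cluster_to_probes[cluster]) == 1:
            unique[cluster] = probe  # no other probe shares this cluster
    return unique


def platform_overlap(codelink_map, genechip_map):
    """Clusters uniquely represented on both platforms, with their probes."""
    codelink = uniquely_represented(codelink_map)
    genechip = uniquely_represented(genechip_map)
    shared = codelink.keys() & genechip.keys()
    return {c: (codelink[c], genechip[c]) for c in shared}
```

For example, a cluster covered by one CodeLink probe but two GeneChip probe sets is dropped, because there is no annotation-only way to decide which probe set to compare.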
When comparing between the two platforms using tissue ratio data without regard for noise, the correlation between platforms is not very strong (r = 0.62, Figure 3A), similar to what was reported by Tan et al. 2003. This brings us to the second step, removing background signals. Considering background noise has random sources and sources that are different in nature for the two platforms, one would not expect to find a strong correlation when using noise values in platform comparisons. Each manufacturer warns users to be critical of confidence in calls that are below the defined threshold or considered 'absent'. Therefore, we removed noise and made correlations based only on calls that were 'present' in both tissue samples and microarray systems. Kuo et al. made a limited but similar attempt to reduce noise by using what they termed a "variance filter". Our process of filtering noise reduced the overlap of 10,763 genes to 3,362, 2,569 or 1,760 genes if one accepts 'present' calls on at least 1, 3 or all 5 of the array replicates, respectively, across both tissues and platforms. Using this methodology, however, we found a stronger ratio correlation between the two platforms (r = 0.70, 0.74 or 0.79, Figure 3B, 3C, 3D). We have found that when limiting the comparison set to those probes which are uniquely represented, specific for their targets of interest, and called 'present' in the samples tested on each platform, the correlation between technologies is very reasonable for data sharing. Supporting this methodology, a recent study found a substantial improvement in the correlation between spotted long-oligo arrays and the Affymetrix platform with data filtering by removing low intensity signals below the median. Interestingly, when Barczak and colleagues removed low intensity signals, the Pearson correlation coefficient improved from 0.60 to 0.80, which is in the same range as in our study.
Rather than removing all low intensity signals below the median, we recommend data filtering by using each manufacturer's standard software package to identify those genes which are within noise. This approach to filtering noise offers great value to microarray users since our recommendation does not require the immediate loss of 50% of the data in making cross-platform comparisons.
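The filtering step recommended here can be sketched as follows. This is an illustrative Python outline, not the authors' SAS/Excel workflow. The record layout, field names and example values are invented for the sketch, and each gene is assumed to carry one brain/pancreas log2 ratio per platform.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def concordantly_present(calls, min_present):
    """True if every platform/tissue group has at least `min_present`
    'present' calls among its replicates."""
    return all(sum(flags) >= min_present for flags in calls.values())


def filtered_correlation(genes, min_present=5):
    """Drop genes within noise, then correlate the platform ratios."""
    kept = [g for g in genes if concordantly_present(g["calls"], min_present)]
    r = pearson_r([g["codelink_log2_ratio"] for g in kept],
                  [g["genechip_log2_ratio"] for g in kept])
    return len(kept), r


def _example_gene(codelink, genechip, genechip_pancreas_present):
    # Hypothetical record: five replicates per platform and tissue.
    calls = {(p, t): [True] * 5
             for p in ("codelink", "genechip") for t in ("brain", "pancreas")}
    k = genechip_pancreas_present
    calls[("genechip", "pancreas")] = [True] * k + [False] * (5 - k)
    return {"codelink_log2_ratio": codelink,
            "genechip_log2_ratio": genechip, "calls": calls}


# Invented values: four well-detected genes and one gene within noise.
example_genes = [
    _example_gene(1.0, 1.1, 5), _example_gene(2.0, 2.1, 5),
    _example_gene(-1.0, -0.9, 5), _example_gene(-2.0, -1.9, 5),
    _example_gene(0.0, 3.0, 2),
]
```

Calling `filtered_correlation(example_genes, min_present=1)` keeps the noisy gene and lowers r, while `min_present=5` removes it. This mirrors the trade-off reported above between the number of genes retained and the strength of the correlation.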
Finally, an alternative expression-profiling technology, qrtPCR, was used to follow up on a smaller subset of the concordantly correlated set to demonstrate that the data generated here was not merely an anomaly specific to oligonucleotide arrays (Figure 7). Both platforms correlated well to this alternative expression-profiling technology with r values of 0.92 and 0.79 for CodeLink and GeneChip, respectively. Previous studies have found agreement between genes screened with microarray technology and subsequent qrtPCR verification of those expression measurements [12, 13]. We are in the experimental process of using qrtPCR with a larger set of genes as an independent method to resolve discordant gene expression results between the two microarray platforms.
The comparison described here parses the data into three sets: (1) concordantly 'present', which was used to calculate the correlation comparisons; (2) concordantly 'absent', where both platforms agree that the transcript is not 'present' in the samples tested; and (3) 'present' on one microarray platform but not the other, which are considered a separate set of discrepant results. In the studies presented here, the CodeLink platform generates a higher percentage of detectable signals above noise (Figures 1, 2, and 5). This finding is consistent for all replicate arrays across both tissues analyzed. Previously, Ramakrishnan et al. 2002 reported detection down to an estimated sensitivity level between 1:750,000 and 1:900,000 for the CodeLink platform. However, the biological validity of these low-level calls has not been confirmed by qrtPCR or another method. In addition, a significant number of signals were detected by the GeneChip platform and were not detected by the CodeLink platform. Therefore, follow-up studies are necessary to definitively determine which of the discordant calls are biologically relevant and which may be potential false positive calls. It would be informative to understand the underlying basis of the discordant calls. Assigning a cause, such as differences in sensitivity, analysis algorithms, or characteristics of the two platforms, would be of great value in furthering comparative studies.
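The three-way parsing described above amounts to a simple classification per gene. A minimal Python sketch follows; the set labels and function names are chosen for the example and do not come from the study.

```python
from collections import Counter


def classify_gene(present_codelink, present_genechip):
    """Assign a gene to one of the three sets used in the comparison."""
    if present_codelink and present_genechip:
        return "concordant_present"   # used for the correlation
    if not present_codelink and not present_genechip:
        return "concordant_absent"    # both platforms report noise
    return "discrepant"               # detected on one platform only


def tally_sets(call_pairs):
    """Count genes per set from (codelink_present, genechip_present) pairs."""
    return Counter(classify_gene(c, g) for c, g in call_pairs)
```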
Discrepant calls between the two platforms may derive from differences in the GeneChip and CodeLink platform technologies. The platforms differ in the oligodeoxyribonucleotide probe length and number of probes per gene. A microarray study, using covalently attached oligodeoxyribonucleotides, found that 30- and 35-mer oligodeoxyribonucleotides generated signals two- to five-fold higher than 25-mers. Relogio et al. suggested that 30-mers offered the best compromise between sensitivity and specificity. However, the GeneChip platform offers multiple probes per gene, potentially offsetting the need for longer probes through multiple hybridization points. The CodeLink platform contains one pre-validated probe per gene that was screened for performance from an original panel of three probes per gene. Previous research has demonstrated that one probe per gene is sufficient to accurately measure differential expression. Having one pre-validated probe per gene rather than a panel of probes per gene on a microarray platform may be advantageous towards improving sensitivity since there is no requirement that many probes within a gene must agree for expression to be detected and called. A single probe must, however, be very accurately designed to cover the range of splice variants feasible, and must reside in an area accessible to the RNA or DNA fragments hybridizing. Variation in signals may also derive from the nature of the substrate for probe attachment. Previous publications have indicated that the use of a three-dimensional matrix coated slide results in a larger number of potential attachment sites than modified glass [17–19]. Stillman and Tonkinson have shown higher specific hybridization signals on a three-dimensional matrix compared with glass. In addition, it has been demonstrated that the CodeLink three-dimensional matrix allows for reduced steric hindrance and increased availability of the entire oligonucleotide for hybridization with its intended target.
Side-by-side comparisons of the performance of the same probe set and analysis technique would be required to confirm any contribution to discrepant results observed in this study.
Discrepant calls between the two platforms may also derive from differences in the GeneChip and CodeLink analysis algorithms. The use of mismatches on the GeneChip platform may limit detection since others have reported that, in general, one third of GeneChip mismatches are higher in signal than their perfect match counterparts [9, 22, 23]. Alternative analysis methodologies that do not utilize the mismatch controls may alter the discordant set, but as described earlier, there is a large potential variation in the different methodologies and a lack of a single majority method. Therefore, we chose to analyze the dataset in this study with the MAS 5.0 algorithms, as recommended by Affymetrix. It is likely that each of the aforementioned factors, in addition to annotation differences, contributes to variable results, and that taken together they account for the set of discrepant calls observed between the GeneChip and CodeLink platforms (Figure 5).
This paper highlights the value of separating signal from noise in order to improve microarray cross-platform correlations. We also demonstrate a stronger correlation between platforms than previously reported based on our data filtering and parsing methodology. We believe there is strong similarity in calls by each system and that differences in sensitivity and levels of noise are largely responsible for lower levels of correlation. Furthermore, as a standardized annotation system develops and free and open access to microarray probe sequences is realized, it will help clear up discrepancies on a case-by-case basis.
Array design and fabrication
CodeLink UniSet Human 20 K Bioarray (Amersham Biosciences, Chandler, AZ) contains a collection of approximately 20,289 probes within a single reaction chamber on each individual slide. All oligonucleotide probes are 30 bases in length. The core of the CodeLink platform is a glass slide coated with a polyacrylamide gel matrix to create a three-dimensional aqueous hybridization environment. Modified 5'-amine-terminated oligonucleotides are deposited onto the polymer using piezoelectric dispensing robots and then covalently attached to activated functional groups within the gel matrix. Oligonucleotides are co-dispensed with a fluorescein-derivative dye, which enables scanning and inspection of every feature element on every slide after the dispensing. Additional sites are then blocked and slides are washed, rinsed and dried prior to attachment of an integrated, proprietary, polypropylene hybridization chamber. All probes appearing on the final product have been pre-validated for performance and screened from an original panel of up to three probes per gene.
The HG-U133 GeneChip Set from Affymetrix (Santa Clara, CA, USA) contains 44,928 probes, on 2 chips, that represent 42,676 unique sequences from the GenBank database corresponding to 28,036 unique UniGene clusters. The GeneChip technology is based on a photolithographic in situ synthesis. Individual probes consist of 25 base DNA sequences.
Target preparation and array hybridization
One lot of human brain and pancreas total RNA (brain lot#033P010402009A and pancreas lot#022P0102B from Ambion) was assessed for quality using the Agilent 2100 Bioanalyzer and split equally between Amersham Biosciences in Chandler, Arizona and the Genomics Shared Service at the Arizona Cancer Center. The Affymetrix target preparations and hybridizations were performed entirely at the Arizona Cancer Center to ensure that these microarrays were run by an independent party with GeneChip expertise. In addition, an aliquot from these lots of total RNA was saved and subsequently used in qrtPCR reactions for verifying the expression profiles obtained by each microarray platform.
For each Affymetrix GeneChip, double-stranded cDNA was synthesized from 5 ug of total RNA with the SuperScript Double-Stranded cDNA Synthesis Kit (Invitrogen) and dT24-T7 primer (Operon) according to the manufacturer's instructions. Biotin-labeled cRNA was prepared by in vitro transcription using the BioArray High Yield RNA Transcript Labeling Kit (Enzo). The dsDNA was mixed with 1× HY reaction buffer, 1× biotin labeled ribonucleotides (NTPs with Bio-UTP and Bio-CTP), 1× DTT, 1× RNase inhibitor mix and 1× T7 RNA polymerase. The mixture was incubated at 37°C for 5 hours. The labeled cRNA was then purified using an RNeasy mini kit (Qiagen) according to the manufacturer's protocol and ethanol precipitated. Fragmentation of cRNA, hybridization, washing, staining, and scanning were performed as described in the Affymetrix GeneChip Expression Analysis Technical Manual. Briefly, the purified cRNA was fragmented in 1× fragmentation buffer (40 mM Tris-acetate, 100 mM KOAc, 30 mM MgOAc) at 94°C for 35 minutes. For hybridization with GeneChip cartridge (Affymetrix), 15 ug of fragmented cRNA was incubated with 50 pM control oligonucleotide B2, 1× eukaryotic hybridization control (1.5 pM BioB, 5 pM BioC, 25 pM BioD, and 100 pM cre), 0.1 mg/ml herring sperm DNA, 0.5 mg/ml acetylated BSA and 1× manufacturer recommended hybridization buffer, and hybridization was performed with a GeneChip Fluidic Station (Affymetrix) using the appropriate antibody amplification, washing and staining protocol. The phycoerythrin-stained array was scanned, resulting in a digital image file. In all, 5 replicates of U133A and U133B were processed for each total RNA sample. Therefore, 10 target preparation reactions were performed for each of the two tissues to generate the necessary cRNA for this study.
For each CodeLink Bioarray, double-stranded cDNA and subsequent cRNA was synthesized from 5 ug of total RNA using the CodeLink Expression Assay Kit (Amersham Biosciences) according to manufacturer's instructions. Briefly, cRNA was prepared by in vitro transcription using a single, labeled nucleotide, biotin-11-UTP in the IVT reaction at a concentration of 1.25 mM. Unlabeled UTP was present at 3.75 mM, while GTP, ATP, and CTP were at 5 mM. The mixture was incubated at 37°C overnight for 14 hours. The labeled cRNA was then purified using an RNeasy® mini kit (Qiagen). Fragmentation of cRNA, hybridization, washing, staining, and scanning were performed as described. Briefly, the purified cRNA was fragmented in 1× fragmentation buffer (40 mM Tris-acetate pH 7.9, 100 mM KOAc, 31.5 mM MgOAc) at 94°C for 20 minutes. For hybridization with CodeLink bioarrays (Amersham Biosciences), 10 ug of fragmented cRNA in 260 ul of hybridization solution was added to each bioarray via the Flex Chamber port and incubated for 18 hours at 37°C, while shaking at 300 r.p.m. in a New Brunswick Innova™ 4080 shaking incubator. The 10 bioarrays in this study were processed in parallel using the CodeLink Shaker Kit and CodeLink Parallel Processing Kit (Amersham Biosciences). Bioarrays were stained with Cy5™-streptavidin (Amersham Biosciences) and scanned using a GenePix® 4000 B scanner (Axon Instruments).
Deriving expression values and classifying probes within noise ('absent') for each platform
For the U133 GeneChip technology, each gene is represented by 11 probe pairs containing both a perfect match probe (PM) and a mismatch probe (MM) where the middle (13th) base of each 25-mer probe is incorrect. The MM probe is designed to give an indication of the degree of nonspecific hybridization. The MAS 5.0 software uses both PM and MM values for the expression calculation, one that avoids the production of negative values. MAS 5.0 employs a scenario-based approach to expression calculations and in general hypothesizes that MM probes should show lower hybridization signal than the corresponding PM probes. A decision process is used when this PM > MM assumption is broken. When all MM values are less than their PM counterparts, an expression value is calculated using a one-step bi-weight estimate of the log(PM – MM) values for each probe pair. However, when the MM value for a probe pair is greater than the PM value, two differing scenarios are applied. 1.) If the values of the PM probes are sufficiently large and separable from the background and MM signals, then the MM value is replaced with a value calculated as typical for the probe set. 2.) If it is difficult to separate the probe signals from background then the MM signal is substituted with a value slightly less than the PM signal. Once an expression value is calculated for each probe set the next step is the calculation of a Detection p-value and the comparison of each Discrimination score to the user-definable threshold (Tau). Tau is a small positive number that can be adjusted to increase or decrease sensitivity and/or specificity of the analysis (default value = 0.015). The One-sided Wilcoxon's Signed Rank test is the statistical method employed to generate the Detection p-value. It assigns each probe pair a rank based on how far the probe pair discrimination score is from Tau.
The user-modifiable Detection p-value cut-offs, Alpha 1 (α1) and Alpha 2 (α2) provide boundaries for defining 'Present', 'Marginal' or 'Absent' calls. At the default settings (α1 = 0.04 and α2 = 0.06), any p-value that falls below α1 is assigned a 'Present' call, and above α2 is assigned an 'Absent' call. 'Marginal' calls are given to probe sets which have p-values between α1 and α2. In our study, the MAS 5.0 default parameters were retained. For a complete description of the MAS 5.0 algorithms and statistical tests please refer to the Affymetrix manuals [10, 27, 28].
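The detection-call logic described in the two paragraphs above can be sketched in Python. This is a simplified illustration, not the MAS 5.0 implementation. The discrimination score formula (PM − MM)/(PM + MM) is taken from Affymetrix's published algorithm description rather than from this paper. The exact enumeration of the signed-rank distribution, the dropping of scores equal to Tau, and the handling of p-values exactly on a cut-off are assumptions made for the example.

```python
from itertools import product


def discrimination_scores(pm, mm):
    """Per-probe-pair discrimination score, (PM - MM) / (PM + MM)."""
    return [(p - m) / (p + m) for p, m in zip(pm, mm)]


def detection_p_value(scores, tau=0.015):
    """One-sided Wilcoxon signed-rank p-value for 'score > tau'.

    Exact enumeration over all sign patterns; practical for the 11 probe
    pairs of a U133 probe set (2**11 patterns).
    """
    diffs = [s - tau for s in scores if s != tau]  # assumption: drop zeros
    n = len(diffs)
    if n == 0:
        return 1.0
    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    observed = sum(r for r, d in zip(ranks, diffs) if d > 0)
    # Null hypothesis: each difference is positive or negative with equal odds.
    hits = sum(
        1 for signs in product((0, 1), repeat=n)
        if sum(r for r, s in zip(ranks, signs) if s) >= observed
    )
    return hits / 2 ** n


def detection_call(p_value, alpha1=0.04, alpha2=0.06):
    """Map a Detection p-value to a call using the default cut-offs."""
    if p_value < alpha1:
        return "Present"
    if p_value > alpha2:
        return "Absent"
    return "Marginal"  # assumption: boundary values count as Marginal
```

With 11 probe pairs in which PM is consistently three times MM, every score exceeds Tau and the p-value is 1/2048, giving a 'Present' call. When PM equals MM throughout, the p-value is 1.0 and the call is 'Absent'.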
For the CodeLink bioarrays, spot signals are quantified using ImaGene 5.5 software (BioDiscovery, Marina del Rey, CA). The mean intensity is taken for each spot and background corrected by subtracting the surrounding median local background intensity. A spot is considered 'absent' (within noise) if the spot's signal mean is less than its corresponding local background mean plus one standard deviation of local background pixels. For each probe the local background is comprised of a circular area of pixels surrounding the segmented signal. The image segmentation and quantification process is outlined in the ImaGene 5.5 user's manual.
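The CodeLink noise rule in this paragraph reduces to a comparison of pixel statistics. The Python sketch below is illustrative only. ImaGene performs the actual segmentation, so the pixel lists are assumed to be given, the function name is hypothetical, and the use of the sample (n − 1) standard deviation is an assumption.

```python
import statistics


def quantify_spot(spot_pixels, background_pixels):
    """Return (background-corrected signal, is_present) for one spot."""
    spot_mean = statistics.mean(spot_pixels)
    bg_mean = statistics.mean(background_pixels)
    bg_median = statistics.median(background_pixels)
    bg_sd = statistics.stdev(background_pixels)  # assumption: sample SD

    # Background correction subtracts the local background median.
    signal = spot_mean - bg_median
    # 'Absent' when the spot mean is below background mean + 1 SD.
    is_present = spot_mean >= bg_mean + bg_sd
    return signal, is_present
```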
Cross-platform comparisons of expression data
To facilitate comparisons between data sets, CodeLink probes and GeneChip probe sets were mapped to specific sequence clusters according to the NCBI Human UniGene build #166 relative to the manufacturer's provided NCBI accession numbers. Multiple probes or probe sets targeting a single UniGene cluster, and single probes or probe sets targeting multiple clusters, were removed from consideration. The overlapping and uniquely represented UniGene clusters were used to identify 10,763 gene probes for comparison between platforms. Gene-expression values were globally and linearly normalized according to the manufacturers' standard normalization procedures [9, 26]. The 96% trim-mean of the entire GeneChip array was used for Affymetrix normalization while CodeLink values were normalized against the array median. The globally normalized data from both platforms were scaled to 1.0 in order to bring both platforms to the same intensity range for comparative purposes. The analysis was performed using SAS statistical software and Microsoft Excel.
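The two global normalizations named above can be sketched as follows. This is a minimal Python illustration with hypothetical function names. It assumes that the 96% trim-mean discards the lowest 2% and highest 2% of signals, and that scaling to 1.0 means dividing each array by its own normalization statistic.

```python
import statistics


def trimmed_mean(values, trim_percent=4):
    """Mean after discarding `trim_percent` of values, split over both tails."""
    ordered = sorted(values)
    n = len(ordered)
    cut = n * trim_percent // 200  # values dropped from each tail
    return statistics.mean(ordered[cut:n - cut])


def normalize_genechip(signals):
    """Scale an array so that its 96% trimmed mean equals 1.0."""
    center = trimmed_mean(signals, trim_percent=4)
    return [s / center for s in signals]


def normalize_codelink(signals):
    """Scale an array so that its median equals 1.0."""
    center = statistics.median(signals)
    return [s / center for s in signals]
```

After this step both platforms report intensities relative to a typical probe on the same array, which is what allows ratios from the two systems to be placed on a common scale.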
Power analysis of CodeLink and GeneChip platforms
A power analysis is a computational tool used to determine the replication needed to achieve a desired level of confidence in results from a particular experiment [30–32]. Determining the number of microarray replicates necessary for classification of expression profiles has been presented as an important issue [33, 34] and should be one of the first things to consider when designing any experiment. For each tissue we hybridized the same target on each of five microarrays; therefore the expected fluorescence values for each independent probe should be the same from each array to array replicate, making the expected fold change equal to 1 (i.e. μ1 = μ2). The power analysis was modeled from log2 transformed ratios derived from all pair-wise array-to-array combinations across the five replicates within the brain sample, since this tissue had the greatest similarity in performance between microarray platforms. Expression profiling of the pancreas sample showed many more genes within noise ('absent') for the GeneChip platform relative to CodeLink. The power analysis was conducted as previously described [35, 36] for the population of all 10,763 genes within each platform and the population of genes above noise ('present').
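To make the idea of a replicate-number calculation concrete, the sketch below uses the standard normal-approximation formula for a two-group comparison of log2 signals. It is a generic textbook calculation and should not be read as the specific procedure of references [35, 36], which is not reproduced in this section. The function name and default significance and power levels are assumptions.

```python
import math
from statistics import NormalDist


def replicates_needed(sd_log2, fold_change, alpha=0.05, power=0.80):
    """Approximate replicates per group needed to detect `fold_change`.

    sd_log2     : standard deviation of replicate log2 signals
    fold_change : smallest fold change to be detected (e.g. 2.0)
    Uses n = 2 * ((z_{1-alpha/2} + z_{power}) * sd / delta) ** 2.
    """
    delta = abs(math.log2(fold_change))
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    n = 2 * ((z_alpha + z_power) * sd_log2 / delta) ** 2
    return math.ceil(n)
```

Under these assumptions, a platform with a replicate standard deviation of 0.5 log2 units needs about 4 arrays per group to detect a 2-fold change, whereas a standard deviation of 1.0 raises that to about 16. This illustrates why array-to-array precision and the exclusion of genes within noise both affect the power of an experiment.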
The TaqMan® One-Step RT-PCR Master Mix Reagent Kit (Applied Biosystems, Foster City, CA, USA) was used with each custom designed, gene-specific primer/probe set to amplify and quantify each transcript of interest. Optimal primer/probe sets were selected using Primer Express software version 1.0 B6 (Applied Biosystems). Reactions (25 uL) contained 100 ng of total RNA, 300 nM forward and reverse primers, 200 nM TaqMan probe, 12.5 uL 2X Master Mix without the enzyme uracil DNA glycosylase (UNG), 0.625 uL MultiScribe™ and RNase Inhibitor Mix, and 6.875 uL RNase-free water. RT-PCR amplification and real-time detection were performed using an ABI PRISM 7700 Sequence Detection System (Applied Biosystems) for 30 min at 48°C (reverse transcription), 10 min at 95°C (AmpliTaq Gold activation), 38 cycles of denaturation (15 s at 95°C), and annealing/extension (60 s at 60°C). Data were analyzed using ABI PRISM Sequence Detection Software version 1.6.3 and then further processed using Microsoft® Excel (Microsoft, Redmond, WA). Cyclophilin (PPIE) served as the endogenous control for the normalization of input target RNA. Raw CT values, qrtPCR primer/probe sequences, and corresponding array probe names are available in supplementary material [see additional files 1, 2, and 3, respectively].
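A brain/pancreas log2 ratio can be derived from raw CT values normalized to the PPIE control with the widely used comparative CT (ΔΔCT) calculation, sketched below in Python. The text does not state which calculation was applied in Excel, so the use of ΔΔCT and its assumption of roughly 100% amplification efficiency (one cycle per 2-fold change) are assumptions, as is the function name.

```python
def log2_ratio_from_ct(ct_target_brain, ct_control_brain,
                       ct_target_pancreas, ct_control_pancreas):
    """log2(brain / pancreas) expression by the comparative CT method.

    Assumes one PCR cycle corresponds to a 2-fold difference in template.
    """
    delta_brain = ct_target_brain - ct_control_brain        # normalize to PPIE
    delta_pancreas = ct_target_pancreas - ct_control_pancreas
    # A lower CT means more template, hence the sign change.
    return -(delta_brain - delta_pancreas)
```

For example, a target with CT 22 in brain and 25 in pancreas, with the control at CT 20 in both tissues, gives a log2 ratio of 3, i.e. about 8-fold higher expression in brain.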
Affymetrix GeneChip analyses were performed by the Genomics Shared Service at the Arizona Cancer Center, supported by grant CA23074.
- Sendera TJ, Dorris D, Ramakrishnan R, Nguyen A, Trakas D, Mazumder A: Expression Profiling with Oligonucleotide Arrays: Technologies and Applications for Neurobiology. Neurochem Res. 2002, 10: 1005-1026. 10.1023/A:1020948603490.
- Brazma A, Hingamp P, Quackenbush J, Sherlock G, Spellman P, Stoeckert C, Aach J, Ansorge W, Ball CA, Causton HC, Gaasterland T, Glenisson P, Holstege FCP, Kim IF, Markowitz V, Matese JC, Parkinson H, Robinson A, Sarkans U, Schulze-Kremer S, Stewart J, Taylor R, Vilo J, Vingron M: Minimum information about a microarray experiment (MIAME)-toward standards for microarray data. Nature Genetics. 2001, 29: 365-371. 10.1038/ng1201-365.
- Edgar R, Domrachev M, Lash AE: Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 2002, 30: 207-210. 10.1093/nar/30.1.207.
- Kuo WP, Jenssen TK, Butte AJ, Ohno-Machado L, Kohane IS: Analysis of matched mRNA measurements from two different microarray technologies. Bioinformatics. 2002, 18: 405-412. 10.1093/bioinformatics/18.3.405.
- Li J, Pankratz M, Johnson JA: Differential gene expression patterns revealed by oligonucleotide versus long cDNA arrays. Toxicol Sci. 2002, 69: 383-390. 10.1093/toxsci/69.2.383.
- Kothapalli R, Yoder SJ, Mane S, Loughran TP: Microarray results: how accurate are they?. BMC Bioinformatics. 2002, 3: 22. 10.1186/1471-2105-3-22.
- Tan PK, Downey TJ, Spitznagel EL, Xu P, Fu D, Dimitrov DS, Lempicki RA, Raaka BM, Cam MC: Evaluation of gene expression measurements from commercial microarray platforms. Nucleic Acids Res. 2003, 31: 5676-5684. 10.1093/nar/gkg763.
- Li C, Wong W: Model-based analysis of oligonucleotide arrays: Expression index computation and outlier detection. Proceedings of the National Academy of Science USA. 2001, 98: 31-36. 10.1073/pnas.011404098.
- Irizarry RA, Bolstad BM, Collin F, Cope LM, Hobbs B, Speed TP: Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Res. 2003, 31: e15. 10.1093/nar/gng015.
- Affymetrix: Microarray Suite User Guide, Version 5. Affymetrix. 2001, [http://www.affymetrix.com/support/technical/manuals.affx]
- Barczak A, Rodriquez MW, Hanspers K, Koth LL, Tai YC, Bolstad BM, Speed TP, Erle DJ: Spotted long oligonucleotide arrays for human gene expression analysis. Genome Res. 2003, 13: 1775-1785. 10.1101/gr.1048803.
- Taniguchi M, Miura K, Iwao H, Yamanaka S: Quantitative assessment of DNA microarrays-comparison with Northern blot analysis. Genomics. 2001, 71: 34-39. 10.1006/geno.2000.6427.
- Yuen T, Wurmbach E, Pfeffer RL, Ebersole BJ, Sealfon SC: Accuracy and calibration of commercial oligonucleotide and custom cDNA microarrays. Nucleic Acids Res. 2002, 30: e48. 10.1093/nar/30.10.e48.
- Ramakrishnan R, Dorris DR, Lublinsky A, Nguyen A, Domanus M, Prokhorova A, Gieser L, Touma E, Lockner R, Tata M, Shippy R, Sendera T, Mazumder A: An assessment of Motorola CodeLink™ microarray performance for gene expression profiling applications. Nucleic Acids Res. 2002, 30: e30. 10.1093/nar/30.7.e30.
- Relogio A, Schwager C, Richter A, Ansorge W, Valcarcel J: Optimization of oligonucleotide-based DNA microarrays. Nucleic Acids Res. 2002, 30: e51. 10.1093/nar/30.11.e51.
- Hughes TR, Mao M, Jones AR, Burchard J, Marton MJ, Shannon KW, Lefkowitz SM, Ziman M, Schelter JM, Meyer MR, Kobayashi S, Davis C, Dai H, He YD, Stephaniatis SB, Cavet G, Walker WL, West A, Coffey E, Shoemaker DD, Stoughton R, Blanchard AP, Friend SH, Linsley PS: Expression profiling using microarrays fabricated by an ink-jet oligonucleotide synthesizer. Nat Biotechnol. 2001, 19: 342-347. 10.1038/86730.
- Livshits MA, Mirzabekov AD: Theoretical analysis of the kinetics of DNA hybridization with gel-immobilized oligonucleotides. Biophys J. 1996, 71: 2795-2801.
- Afanassiev V, Hanemann V, Wolfl S: Preparation of DNA and protein microarrays on glass slides coated with agarose film. Nucleic Acids Res. 2000, 28: e66. 10.1093/nar/28.12.e66.
- Lindroos K, Liljedahl U, Raitio M, Syvanen AC: Minisequencing on oligonucleotide microarrays: comparison of immobilization chemistries. Nucleic Acids Res. 2001, 29: e69. 10.1093/nar/29.13.e69.
- Stillman BA, Tonkinson JL: Expression microarray hybridization kinetics depend on length of the immobilized DNA but are independent of immobilization substrate. Anal Biochem. 2001, 295: 149-157. 10.1006/abio.2001.5212.
- Dorris DR, Nguyen A, Gieser L, Lockner R, Lublinsky A, Patterson M, Touma E, Sendera TJ, Elghanian R, Mazumder A: Oligodeoxyribonucleotide probe accessibility on a three-dimensional DNA microarray surface and the effect of hybridization time on the accuracy of expression ratios. BMC Biotechnology. 2003, 3: 6. 10.1186/1472-6750-3-6.
- Irizarry R, Hobbs B, Collin F, Beazer-Barclay Y, Antonellis K, Scherf U, Speed T: Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics. 2003, 4: 249-264. 10.1093/biostatistics/4.2.249.
- Naef F, Hacker C, Patil N, Magnasco M: Empirical characterization of the expression ratio noise structure in high-density oligonucleotide arrays. Genome Biol. 2002, 3: RESEARCH0018. 10.1186/gb-2002-3-4-research0018.
- Affymetrix: Expression Analysis Technical Manual. Affymetrix. 2002, [http://www.affymetrix.com/support/technical/manuals.affx]
- Amersham Biosciences: CodeLink Gene Expression User Manual. Amersham Biosciences. 2003, [http://www5.amershambiosciences.com//aptrix?upp01077.nsf/Content/codelink_user_protocols]
- Lipshutz RJ, Fodor SP, Gingeras TR, Lockhart DJ: High density synthetic oligonucleotide arrays. Nat Genet. 1999, 21 (1 Suppl): 20-24. 10.1038/4447.
- Affymetrix: Statistical algorithms description document. Affymetrix, Santa Clara, CA. 2002
- Affymetrix: Statistical algorithms reference guide. Affymetrix, Santa Clara, CA. 2002
- BioDiscovery: ImaGene 5.5 User Manual. BioDiscovery Inc. 2003, [http://www.BioDiscovery.com]
- Cohen J: Statistical power analysis for the behavioral sciences. 1988, Erlbaum, Hillsdale, NJ, 2nd edition.
- Kraemer HC, Thiemann S: How Many Subjects? Statistical Power Analysis in Research. Sage, Newbury Park, CA. 1987
- Mace AE: Sample-size determination. Krieger, Huntington, NY. 1974
- Pan W, Lin J, Le C: How many replicates of arrays are required to detect gene expression changes in microarray experiments? A mixture model approach. Biostatistics. 2001, University of MN Technical Report
- Zien A, Fluck J, Zimmer R, Lengauer T: Microarrays: How many do you need?. Proceedings of the Sixth Annual International Conference on Computational Biology. 2002, RECOMB
- Hwang D, Schmitt WA, Stephanopoulos G, Stephanopoulos G: Determination of minimum sample size and discriminatory expression patterns in microarray data. Bioinformatics. 2002, 18: 1184-1193. 10.1093/bioinformatics/18.9.1184.
- Stafford P, Liu P: Microarray Technology Comparison, Statistical Analysis, and Experimental Design. Microarrays Methods and Applications-Nuts & Bolts. 2003, DNA Press, 273-324.
This article is published under license to BioMed Central Ltd. This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.