Comparison of hybridization-based and sequencing-based gene expression technologies on biological replicates
© Liu et al; licensee BioMed Central Ltd. 2007
Received: 23 November 2006
Accepted: 07 June 2007
Published: 07 June 2007
High-throughput systems for gene expression profiling have been developed and have matured rapidly through the past decade. Broadly, these can be divided into two categories: hybridization-based and sequencing-based approaches. With data from different technologies being accumulated, concerns and challenges are raised about the level of agreement across technologies. As part of an ongoing large-scale cross-platform data comparison framework, we report here a comparison based on identical samples between one-dye DNA microarray platforms and MPSS (Massively Parallel Signature Sequencing).
The DNA microarray platforms generally provided highly correlated data, while moderate correlations between microarrays and MPSS were obtained. Disagreements between the two types of technologies can be attributed to limitations inherent to both technologies. The variation found between pooled biological replicates underlines the importance of exercising caution in identification of differential expression, especially for the purposes of biomarker discovery.
Based on different principles, hybridization-based and sequencing-based technologies should be considered complementary to each other, rather than competitive alternatives for measuring gene expression, and currently, both are important tools for transcriptome profiling.
During the last decade, a considerable number of high-throughput technologies for transcriptome profiling have been developed. These include hybridization-based technologies, such as DNA microarrays [1–3], and sequencing-based approaches like SAGE (Serial Analysis of Gene Expression) and MPSS (Massively Parallel Signature Sequencing). The power of DNA microarrays lies in the simultaneous hybridization of mRNA extracted from biological samples to a pre-selected mRNA library, which can contain up to tens of thousands of different mRNA transcripts. The expression level of each transcript is obtained by reading out the intensity of its hybridization signal. Sequencing-based methods follow a substantially different strategy. SAGE and MPSS do not require any pre-compilation of an mRNA library of sequences; instead, they use type IIS restriction endonucleases, i.e. tagging enzymes, to collect short tags (typically 10–22 bases) from each mRNA molecule, provided a relevant recognition site exists for an anchoring enzyme. Then, either by sequencing long concatamers of tags using a conventional sequencer (SAGE), or by performing iterative parallel sequencing using a proprietary technique (MPSS), the identity of a sufficiently large number of tags can be determined efficiently. The abundance of each mRNA transcript is assumed to be proportional to the number of occurrences of its representative tag.
Favorable features of hybridization-based approaches include a significantly lower workload and relatively low cost. However, the probe collection on a chip, which necessarily relies on the coverage and the accuracy of both genomic sequences and clone libraries, presents a hard constraint on its detection power. In contrast, the "tag-and-count"-based methodologies require more advanced instruments that are more cost- and labor-intensive, but their capability of exhaustive transcript sampling allows the potential identification of novel mRNAs.
The present co-existence of various DNA microarray platforms and sequencing-based technologies offers biomedical research increased options for transcriptome profiling. It is, however, important to understand how to compare data generated by these different technologies. Recent studies indicate that the performance of various microarray platforms, as measured by data consistency, is comparable [6, 7]. Several attempts have also been made to compare heterogeneous types of technologies, for instance, between microarray and SAGE [8–15], and between microarray(s) and MPSS [16–20]. The results of these studies demonstrate moderate concordance between technologies.
In the present study, which is part of an ongoing cross-platform comparison framework, we compare gene expression data from MPSS with data from five different commercially available one-dye microarray platforms. The present study extends our previous study in three ways: 1) the inclusion of the Illumina BeadArray® and MPSS data, 2) the inclusion of gene expression data of a second pool of mouse retina (MRP2) for all microarray platforms, and 3) the investigation of variation across biological replicates, as measured over two different pools of mouse retina samples (MRP1 and MRP2). There is a consensus that the use of biological replicates is an important element in ensuring the reliability of microarray results. To our knowledge, this is the first study investigating the differences between microarray and MPSS data on biological replicates. MPSS libraries are usually constructed without technical replication, and data variation across samples is generally estimated by applying a statistical model that simulates the random sampling process during tag selection for sequencing. It remains unclear whether the significant variation across samples detected by MPSS is comparable with that detected by microarrays. As our study included technical replicates for the microarray platforms, it provided a unique opportunity to investigate the sampling model by comparison to microarray data, where technical replicates permitted more robust statistical testing. The microarray data sets for the first mouse retina pool (MRP1) and mouse cortex (MC) were analyzed in our previous study, but some of those results are also included here to allow comparisons between the two mouse retina pools, as we believe the investigation of biological replicates is an important and novel aspect of this study.
In summary, our results showed that there were moderate, yet significant, correlations between microarray data and MPSS data, while the data from microarray platforms, including the recently included Illumina arrays, generally were well correlated. The majority of discrepant measurements between the technologies based on hybridization versus sequencing were genes with low-abundance transcripts. Tag-to-gene mapping ambiguity and the absence of tagging enzyme recognition site could also explain some of the discrepancy between MPSS and microarrays. Using two-way ANOVA and SAM, we examined the microarray data for the magnitude of data variation introduced by biological replicates and technical replicates, and showed that biological variations are smaller than platform variations. The genes we found most susceptible to variations between biological replicates were likely to be associated with metabolic pathways, biosynthesis, and binding related pathways. Due to their vulnerability to sample variation, any changes observed in these pathways and related genes should be interpreted more cautiously in biomarker discovery applications. Furthermore, as demonstrated in our study, the relative affordability of array replicates makes them useful to corroborate MPSS experiments where technical replicates are seldom feasible due to higher costs. For comprehensive transcriptome profiling, we suggest that complementary use of hybridization-based and sequencing-based technologies is likely to provide a better solution than pursuing a single type of technology alone.
Characterization of MPSS data
MPSS libraries for MRP1 and MRP2 were generated using the 'Signature' MPSS protocol. Two alternative sequencing reactions conducted independently, i.e. two-stepper and four-stepper sequencing, provided two read-outs of tag sequences for each sample, which are referred to as MPSS 17-bp and MPSS 20-bp, respectively.
In the 17-bp signature sequencing, a total of 34,341 signatures were detected for MRP1, and 29,509 signatures for MRP2. The total number of signatures obtained in the 20-bp signature sequencing was 34,424 and 30,967 for MRP1 and MRP2, respectively.
The assignment of signatures to genes was performed as described in the Methods section. Only the reliable MPSS signatures were kept in the downstream analysis. In the 17-bp signature libraries, 6,001 and 5,340 unique UniGene identifiers were identified for MRP1 and MRP2, respectively. These correspond to summed transcript copies of 615,944 and 615,771 tpm (transcripts per million). For the 20-bp tag collections, the numbers of unique UniGene identifiers were slightly smaller: 5,793 and 5,125 for MRP1 and MRP2, respectively, corresponding to summed transcript copies of 655,919 and 652,588 tpm.
Characterization of microarray data
All platforms showed generally good consistency among technical replicates for all samples, both in terms of CVs and correlation coefficients (see Additional file 1). These two metrics did not reveal noticeable differences between the two mouse retina mRNA pools within each platform.
In order to compare data from microarray platforms, we calculated both Pearson and Spearman correlation coefficients for absolute expression levels and relative expression changes. To transform expression measurements from diverse platforms onto a common scale, percentile transformation was applied to absolute expressions for each array data set, and then the median of percentile transformed intensities across replicated measurements per gene was used (see Methods for details). Filtering was applied but was not observed to increase the correlations between intensities, while the correlations between log2ratios were considerably improved with filtering.
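As an illustration of the transformation described above, the following is a minimal sketch, assuming that the percentile transformation is a rank-based mapping of each array's intensities onto a 0–100 scale (with ties sharing their average rank). The exact definition used in the study is given in its Methods and in the earlier publication; the function names here are hypothetical.

```python
from statistics import median


def percentile_transform(intensities):
    """Map the intensities of one array to percentile ranks in (0, 100].

    Assumption: percentile = 100 * rank / n, with tied values
    receiving the average of the ranks they span.
    """
    n = len(intensities)
    order = sorted(range(n), key=lambda i: intensities[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Extend j to the end of the current run of tied values.
        j = i
        while j + 1 < n and intensities[order[j + 1]] == intensities[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return [100.0 * r / n for r in ranks]


def median_percentile_per_gene(replicate_arrays):
    """Summarize replicated chips as one median percentile per gene.

    replicate_arrays: list of equal-length intensity lists, one per chip,
    with genes in the same order on every chip.
    """
    transformed = [percentile_transform(a) for a in replicate_arrays]
    return [median(values) for values in zip(*transformed)]
```

Because the transformed values depend only on within-array ranks, arrays with very different intensity scales become directly comparable, which is the purpose the text gives for the transformation.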
The inter-platform data agreement by measuring correlation coefficients is shown in Additional file 1. The correlations between Illumina and the other platforms were generally lower than between other pairs of microarray platforms. This could possibly be due to the lack of technical replicates for Illumina.
Comparison between microarrays and MPSS
Statistics of overlapped genes between microarray and MPSS
Table. Statistics on overlapping genes between microarray platforms and MPSS libraries.
Correspondence between microarray and MPSS
Table. Pearson correlation coefficients across data sets. Column headings: MPSS (17 bp), MPSS (20 bp), MPSS (17 bp), MPSS (20 bp).
Furthermore, we found that 305 genes (286 in MRP1 and 272 in MRP2) were present across five microarray platforms but were not detected by MPSS in either the 17-bp or the 20-bp library. For both libraries, about 64% of these genes were filtered out in the tag-to-gene mapping procedure. A total of 97 genes were not detected by MPSS at all (in any sample or library), among which 11 were labeled "GATC"-negative. Using percentile-transformed data, we found that 69.8% of the remaining genes (60 of 86) were detected as having low expression (percentile less than or equal to 50). This indicates that low-end sensitivity is the major reason for data discrepancy.
Taking the whole MPSS libraries as the gold standard, we defined false negative detections by each microarray as the number of genes that were detected by MPSS and printed on the chip, but were not detected by the microarray. We found that the proportion of false negatives in the five microarray platforms ranged from 2.18% (ABI, MRP1) to 12.99% (Affymetrix, MRP1).
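The false-negative definition above can be expressed as a simple set computation. The sketch below is illustrative only: the function name is hypothetical, and the choice of denominator (genes detected by MPSS that are also printed on the chip) is an assumption, since the text does not state it explicitly.

```python
def false_negative_proportion(mpss_detected, on_chip, array_detected):
    """Proportion of MPSS-detected, on-chip genes missed by a microarray.

    All three arguments are sets of gene identifiers (e.g. UniGene IDs).
    Assumption: the denominator is the set of genes both detected by
    MPSS and represented on the chip.
    """
    reference = mpss_detected & on_chip       # genes the array could have detected
    if not reference:
        return 0.0
    missed = reference - array_detected       # detected by MPSS, not by the array
    return len(missed) / len(reference)


# Toy example with made-up identifiers:
mpss = {"Mm.1", "Mm.2", "Mm.3", "Mm.4"}
chip = {"Mm.1", "Mm.2", "Mm.3", "Mm.5"}
called_present = {"Mm.1", "Mm.5"}
proportion = false_negative_proportion(mpss, chip, called_present)  # 2 of 3 missed
```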
Evaluation of data variations in biological replicates
Using SAM to characterize variation in detection of expression changes across biological replicates
Two-way ANOVA to identify individual genes with significant variation
Two-way ANOVA was performed to identify the contributions of platforms and samples to data variation. We examined filtered log2ratios (MRP1::MC and MRP2::MC) after normalization, matched by both RS (RefSeq) and RSEXON (RefSeq ID and exon), across all four platforms where technical replicates were available. Only those genes that did not have missing values across the 10 measurements were used in this analysis. In RS- and RSEXON-based matching, 563 and 305 genes, respectively, satisfied the criterion. The threshold of statistical significance was chosen as p < 0.001.
Table. Summary of two-way ANOVA results. Rows: total number of genes (i.e. RSs or RSEXONs); number of genes that showed neither sample- nor platform-related variation; number of genes that showed sample-related variation; number of genes that showed platform-related variation; number of genes that showed both sample- and platform-related variation.
Comparison of data variation between microarray and MPSS
From the four microarray platforms with technical replicates, we used one-way ANOVA to identify genes with differential expression in MRP1 versus MRP2. These results were then compared with the Z-test statistic commonly used for detecting differential expression in MPSS data. The microarray data used for this analysis was restricted to genes that had valid measurements for all ten technical replicates. We also excluded measurements from probes with ambiguous UniGene mapping, which was used to match microarray data with MPSS data. For the four microarray platforms, Affymetrix, Amersham, Mergen, and ABI, we obtained 4,783, 5,987, 3,610, and 8,460 UniGene identifiers with corresponding p-values from one-way ANOVA, respectively.
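For readers unfamiliar with the Z-test on tag counts, the following is a minimal sketch, assuming the common two-proportion form with a pooled proportion under the null hypothesis. The exact variant applied to the MPSS data is not spelled out in this section, so the formula, the function name and the example counts are assumptions for illustration.

```python
import math


def z_test_tag_counts(x1, n1, x2, n2):
    """Two-proportion Z-test for one tag observed in two libraries.

    x1, x2: counts of the tag in library 1 and library 2.
    n1, n2: total numbers of tags sequenced in each library.
    Returns (z, two_sided_p) under a normal approximation.
    """
    p1, p2 = x1 / n1, x2 / n2
    p0 = (x1 + x2) / (n1 + n2)                      # pooled proportion under H0
    se = math.sqrt(p0 * (1 - p0) * (1 / n1 + 1 / n2))
    if se == 0:
        return 0.0, 1.0                             # tag absent from both libraries
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))      # two-sided normal tail area
    return z, p_value


# Hypothetical example: 100 vs 50 copies in libraries of one million tags each.
z, p = z_test_tag_counts(100, 1_000_000, 50, 1_000_000)
```

Because the standard error comes from a sampling model rather than from replicate measurements, the test can flag many modest differences as significant when libraries are large, which is relevant to the comparison with replicate-based ANOVA reported here.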
Table. Comparison of differentially expressed gene identification between microarrays and MPSS. Rows: number of genes included in the comparison; number of genes considered as differentially expressed in both microarray and MPSS; number of genes considered as having no differential expression in both microarray and MPSS; number of genes considered as differentially expressed in MPSS but not in the microarray; number of genes considered as differentially expressed in the microarray but not in MPSS.
GSEA to summarize differences between biological replicates using biological themes
For each microarray platform, we applied GSEA to generate hypotheses regarding which biological processes or pathways might be responsible for the differences between the two replicate sample pools. Only the genes (RSs) that had valid values across all 10 chips were used, that is, 2,490 for Affymetrix, 4,899 for Amersham, 3,132 for Mergen, and 7,556 for ABI. In GSEA, the GO hierarchy level was set to 4 and the number of permutations to 1000. GSEA returned several biological themes with enrichment scores, but none were statistically significant. Although different microarray platforms revealed different degrees of variation between MRP1 and MRP2, the GSEA results on the affected gene sets were similar among platforms. When considering enriched themes found in all four platforms, there were 15 GO terms from the "biological process" category and 12 GO terms from the "molecular function" category enriched in MRP1 compared to MRP2. No terms common to all platforms were found to be enriched in MRP2 [see Additional file 3].
With its rapid development over the past decade, microarray technology has become an important tool for studying gene expression patterns. In parallel, the technologies based on the "tag-and-count" principle, such as SAGE and MPSS, are also being used for exploring the full transcriptome. Although some efforts in designing and adopting standards of microarray experiments (e.g. ERCC) and data deposition (e.g. MIAME) are paving the way towards data meta-analyses and integration, it remains a critical challenge to systematically compare cross-sample, cross-platform, and cross-technology data. To this end, we have established a framework which accommodates various platforms and various technologies, using quality-controlled biological samples. Several recently published studies [7, 23, 26] have concluded that, for DNA microarray technology, the reproducibility of technical replicates both within a given platform and across platforms is generally good, especially when the experimental design, protocols, and data analyses are standardized. In this study, we have examined data correspondence and discrepancy between MPSS and one-dye microarray platforms, representing sequencing-based and hybridization-based technologies, respectively. We examined whether and how the behavior of these distinct technologies would vary when challenged with biological replicates.
The main observations made in this study were that, across the microarray platforms, both intra- and inter-platform consistency of data was generally high, but the agreement between MPSS and any microarray platform was moderate, as also reported in previous studies [17, 18]. The differences in signal detection between MPSS and microarray platforms were observed both in terms of lower correlations and in terms of genes that were consistently found as expressed in MPSS but not in microarrays and vice versa. Similar findings were reported in a comparable study, and they may reflect the known limitations of the MPSS technology. Although mapping problems may have contributed to the observed discrepancies, it is more likely that inherent differences between hybridization-based and sequencing-based technologies caused the systematic differences in gene expression detection between MPSS and microarrays. For a given microarray design, the set of genes that can be detected is pre-determined, while for MPSS and similar technologies, the main limitation is that the presence of a recognition site is a requirement for detection. However, when comparing the genes represented on the various microarrays to the genes detected as expressed with MPSS, we found that approximately 80% of the genes on the ABI and Illumina arrays, and about half of the genes on the Affymetrix, Amersham and Mergen arrays, had not been found in the MPSS libraries. This suggests that all the microarray designs were comprehensive with respect to genome coverage, and that the fixed probe sets may not have been a main limitation in this study. Furthermore, in all microarray platforms there were a number of genes that were detected as present but not found in the MPSS libraries. Among the 97 such genes (UniGene IDs) common to all five microarray platforms, there were 11 that were found to be lacking the DpnII recognition site ("GATC") and could not be expected to give any signal in MPSS.
For the remaining 86 genes, the expression measurements on the microarrays varied and included genes classified as consistently highly expressed, as well as genes classified as having consistently low expression. An investigation of the probe sequences of the most highly expressed genes in the microarray platforms did not reveal noticeable differences in the number of possible sequence matches between those found in MPSS and those not found in MPSS. It is nevertheless possible that some of the false positives in microarrays relative to MPSS could be caused by cross-hybridization due to suboptimal probe design. However, without a gold standard, it is not possible to ascertain that the microarrays are overestimating the expression for these genes. For the genes identified as having lower expression in microarrays but not found in MPSS, the cause may have been insufficient sampling depth in the MPSS experiments, resulting in a less representative sampling of tags. There were also a number of genes that can be regarded as "false negatives" on the microarrays relative to MPSS, in the sense that they were represented on the microarrays and detected by MPSS but not detected as expressed by the microarrays. For these genes there was less consistency across the platforms, and only one gene detected by MPSS was not identified as expressed by any of the microarrays on which it was represented. Again, suboptimal probe design due to incomplete sequence knowledge can be a factor. Other possible reasons include MPSS sequencing errors, high complexity in transcriptional activities, heterogeneity in polyadenylation cleavage sites and various sequence-introduced biases [30, 31]. We have, however, not been able to estimate the size of such contributions in this study.
It has also been reported that the existence of SNPs can influence the interpretation of digital experimental data such as MPSS or SAGE, but this is not expected to have been a major contributing factor in this study. The fact that the mapping to UniGene IDs had been based on two different versions of the UniGene database is a possible confounding factor. However, this cannot explain all the discrepancies, as several genes were manually checked against the latest UniGene build and found to have consistent mappings.
Data filtering is commonly applied both to microarray and MPSS data. Both in the present study and in an earlier study, we have shown that data filtering considerably improves data consistency between microarray platforms, in particular for relative expression (log2ratios). The low-intensity signals that generally correspond to low-abundance transcripts are typically filtered out, as their signal-to-noise ratio is expected to be too small. The comparisons between MPSS and microarrays indicate that the MPSS technology also has problems in detecting low-abundance transcripts. It appears that neither technology can reliably detect transcripts expressed at very low frequencies in an environment where the expression levels of all transcripts could span several orders of magnitude. This is a major problem, as many transcripts have low abundance, and there is currently a consensus that many of the corresponding genes are associated with critical regulatory roles in the cells. For microarrays, optimization of probe design and advances in scanning techniques may be key to improving low-end sensitivity. In sequencing-based systems, increases in library size and sequencing accuracy may increase the confidence in the identification of low-abundance transcripts.
Recent studies have demonstrated that factors such as strain [34, 35], gender [36, 37], diet [38, 39] and circadian variation can influence gene expression in various organisms and tissues. In our study we created two retina samples by randomized pooling of samples from a large number of individuals, aiming to cancel out the effects of such factors. We examined the biological variability at several levels: (1) the influence of biological replicates on the measures of intra- and inter-platform data consistency; (2) the overall capability of differentially expressed gene identification by each biological replicate; and (3) the genes or gene sets that were most susceptible to biological variability between MRP1 and MRP2. In general, the two pools showed very similar gene expression patterns, as expected, but for some genes there was a difference between the two mouse retina pools that was consistent across the platforms and technologies. Moreover, MPSS also differed considerably from the microarray platforms in terms of identification of genes differentially expressed in MRP1 versus MRP2. As is commonly done due to high instrumental complexity and cost, MPSS data for each sample were collected without technical replicates, and differentially expressed genes were identified by a statistical approach. The Z-test, found by Man et al. to perform well in terms of specificity, power, and robustness for determining statistical significance in SAGE, detected far more differentially expressed genes than the statistical tests applied to the microarray data with technical replicates. Also in terms of fold-change, MPSS detected far more genes as having two-fold or larger change than the microarray platforms. In this respect, Illumina data, which also did not have technical replicates, behaved similarly to the other microarray platforms; hence, the lack of technical replicates alone cannot explain all of these differences.
In light of the construction of the two retina pools and the good agreement between the microarray platforms, this may be a sign of caution for those who intend to use MPSS for the purpose of biomarker discovery without incorporating technical replicates. Until technological advances make technical replicates in MPSS feasible and MPSS data are further studied and confirmed using independent and complementary technologies, MPSS may not be the optimal choice for identification of novel fingerprints based on differential expression.
The GSEA results across the microarray platforms with technical replicates indicated some common biological themes for the differences between MRP1 and MRP2. Although the results were not statistically significant, they were very consistent across the platforms and indicated metabolism and transcriptional regulation as general biological themes describing the differences. This was also confirmed by GOstat analysis using a list of genes consistently identified as differentially expressed when also including MPSS data (data not shown). Based on the construction of the two retina pools, these results indicate that caution should be exercised when interpreting results of differential expression.
An earlier study, which compared microarray data, EST-based expression data and SAGE using published data sets, reached the conclusion that the agreement between the methods was highly variable from gene to gene, and the authors advocated the need for gene-by-gene validation of important global gene expression measurements using non-global methods. The present results support a similar conclusion, and we emphasize that sequencing-based methods in general as well as hybridization-based methods have inherent technological limitations. Microarrays are limited by pre-selected gene sets and possible cross-hybridization problems, as was indicated in this study. Apart from the obvious limitation of restriction site presence, MPSS is an open-ended system but has problems related to the mapping of tags, which limit the set of genes for which it is possible to obtain reliable measurements. Altogether, this suggests that exploitation of the complementarity of these technologies is a better approach for global transcriptome analysis.
Overall, the agreement between MPSS and microarrays was significant, but lower than between different microarray platforms. Measurements of genes with low expression more often disagreed than highly expressed genes, as expected, but also for genes with high expression there were systematic differences in detection. We found that differences in gene expression measurements between MPSS and microarrays are not only due to increased sensitivity of MPSS to low abundance transcripts and the ability of MPSS to measure new transcripts. Further studies comparing sequencing-based and hybridization-based technologies, including both biological replicates using different types of tissue samples as well as technical replicates are warranted in order to delineate in more detail the shortcomings of these technologies. Future methodological development will be necessary to maximize the information derived from the two complementary types of technology.
RNA samples were isolated from three sources: two pools of C57/B6 adult mouse retina (MRP1 and MRP2, n = 700) and Swiss-Webster post-natal day one (P1) mouse cortex (MC) (n = 19). Retinas were dissected, collected and stored in Trizol (one pair of retinas per eppendorf tube) at -80°C prior to pooling. During the RNA extraction process, two pools of adult mouse retina (MRP1, MRP2) were created (700 retinas per pool) and aliquoted. All samples were stored at -80°C until being used in experiments. The animal experiments were approved by the Institutional Animal Care Facility at Harvard University.
Microarray platforms, data processing and consistency assessment
Whole-genome mouse gene expression arrays (one-dye oligonucleotide microarrays) were investigated in this study, including: Affymetrix GeneChip®, Amersham (now GE Healthcare) CodeLink®, Mergen ExpressChip®, Applied Biosystems (ABI) microarrays, and Illumina BeadArray®. Microarray experiments comprise sample preparation, hybridization, scanning and image quantitation, a series of integrated procedures conducted at a laboratory, generally according to the manufacturer's recommended protocols. To obtain sufficient statistical confidence in the data analysis, five technical replicates were obtained on each platform for each biological replicate (MRP1 and MRP2), with the exception of Illumina. We wanted to include Illumina in this study to examine the magnitude of data variation in MPSS experiments and microarray experiments when no technical replicates are performed. For details of the experimental protocols and laboratories, we refer to Kuo et al., except for Illumina, for which details can be found in Additional file 4.
The raw data sets of 63 chips were collected after image scanning and quantification on each platform. For Illumina data, we set the filtering threshold as "Detections" = 0.9. Filtering for the other microarray platforms is described in Kuo et al. We also performed percentile transformation of intensities, quantile normalization and log2ratio calculation, as described.
Data repeatability and reproducibility are two important aspects of microarray data consistency assessment. In this study, the former refers to the degree of data variation among technical replicates of a platform, and the latter refers to data agreement across different microarray platforms when using the same biological samples. Two widely used metrics, the coefficient of variation (CV) among replicated measurements per gene and the correlation coefficient (Pearson and Spearman correlations) between any pair of replicated experiments, were adopted to assess repeatability and reproducibility. For intra-platform data consistency, the mean and standard deviation of CVs or correlation coefficients were used as summaries of each platform's performance. For inter-platform data agreement, either the mean (for normalized log2ratios) or the median after percentile transformation (for intensities) of repeated measurements on each platform was used in calculating correlation coefficients.
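The two consistency metrics described above can be sketched as follows. This is an illustrative implementation using only the Python standard library; the function names are hypothetical, and the use of the sample standard deviation in the CV is an assumption.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV of replicated measurements for one gene (sample SD / mean)."""
    m = mean(values)
    return stdev(values) / m if m != 0 else float("nan")


def pearson(x, y):
    """Pearson correlation between two equal-length lists.

    Raises ZeroDivisionError if either list has zero variance.
    """
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

In this setting, `coefficient_of_variation` would be applied per gene across the technical replicates of one platform, while `pearson` and `spearman` would be applied to pairs of replicate chips (intra-platform) or to per-platform summaries (inter-platform).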
MPSS experiment and data processing
Total RNA from MRP1 and MRP2, identical to that used in the microarray experiments, was sent to Lynx Therapeutics, Inc. (now Illumina, Hayward, CA) for 'Signature'-based MPSS experiments. Following an RNA quality test on an Agilent 2100 BioAnalyzer (Agilent Technologies, Palo Alto, CA), cDNA libraries were generated according to the Megaclone protocol [5, 43]. Signatures adjacent to poly(A)-proximal DpnII restriction sites ("GATC") were cloned into a Megaclone vector. The resulting library was amplified and yielded about 1.6 million loaded microbeads, which were loaded onto a flow cell. Thereafter, an iterative series of enzymatic reactions decoded the signatures as 17-bp or 20-bp sequences (including the DpnII recognition site "GATC").
The abundance of each signature was converted to transcripts per million (tpm), and the MPSS signatures were mapped to UniGene clusters by Lynx Therapeutics, based on the mouse genome sequence (Release #3, Feb 2003) and the mouse UniGene sequences (UniGene Build #122). Briefly, the mapping procedure included: extraction of 'virtual' signatures from genomic sequences, classification of 'virtual' signatures from genomic sequences, and matching of MPSS expressed signatures to genomic signatures. For the comparison with microarray data, we included only the reliable signatures, i.e. those located closest to the polyadenylation signal or poly(A) tail on an mRNA sequence with known orientation information. If a UniGene cluster was found to correspond to multiple signatures in a given library, all tag counts were pooled to obtain the abundance of the UniGene cluster. If a tag was found to map to multiple UniGene clusters, the corresponding tag count was discarded.
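The pooling and discarding rules described above can be sketched as follows. This is an illustrative example only: the data structures, function names, signature strings and cluster identifiers are hypothetical, and the actual mapping was performed by Lynx Therapeutics using their own pipeline.

```python
from collections import defaultdict


def to_tpm(raw_counts):
    """Convert raw signature counts to transcripts per million (tpm)."""
    total = sum(raw_counts.values())
    return {sig: 1e6 * count / total for sig, count in raw_counts.items()}


def cluster_abundance(signature_tpm, signature_to_clusters):
    """Aggregate signature abundances to UniGene clusters.

    signature_tpm: dict mapping signature -> abundance in tpm.
    signature_to_clusters: dict mapping signature -> set of cluster IDs.
    Signatures mapping to no cluster or to several clusters are discarded;
    multiple signatures of the same cluster are pooled by summation.
    """
    abundance = defaultdict(float)
    for signature, tpm in signature_tpm.items():
        clusters = signature_to_clusters.get(signature, set())
        if len(clusters) != 1:
            continue  # unmapped or ambiguous tag: count is discarded
        (cluster,) = clusters
        abundance[cluster] += tpm
    return dict(abundance)


# Toy example with made-up signatures and cluster IDs:
example = cluster_abundance(
    {"GATCAAAA": 10, "GATCCCCC": 5, "GATCGGGG": 7},
    {"GATCAAAA": {"Mm.1"}, "GATCCCCC": {"Mm.1"}, "GATCGGGG": {"Mm.2", "Mm.3"}},
)
```

Discarding ambiguous and unmapped tags is one reason the summed abundance of the retained UniGene clusters falls below one million tpm.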
Gene mapping among microarray platforms and between microarray and MPSS
Two approaches were used to match probes across different microarray chips: annotation-based and sequence-based probe matching. Briefly, the annotation-based approach yielded matches at the UniGene (UG) and LocusLink (LL) levels, whereas the sequence-based approach used actual probe sequence information to yield matches at the RefSeq (RS) and RefSeq-exon (RSEXON) levels.
MPSS signatures were mapped to UniGene clusters, using an in silico constructed "virtual tags" library, as described above. Thus, the gene expression data measured by microarrays and by MPSS were paired up for comparisons via UniGene clusters.
Biological variations and technical variations
Two separate total mRNA extraction processes were conducted on mouse retina under the same experimental settings and protocols, which generated mouse retina pool 1 (MRP1) and pool 2 (MRP2). Note that Illumina data were not included in this part of analyses due to lack of technical replicates.
Two-way ANOVA to identify individual genes of significant variation
To investigate the effects of biological variation and platform variation, two-way ANOVA (Analysis of Variance) was performed. As concluded in our previous study, sequence-based cross-platform probe matching is more reliable than annotation-based probe matching. Therefore, RS- and RSEXON-based mappings were used for this evaluation. For a given set of transcripts (RSs or RSEXONs) that were reliably detected in all chips of all platforms, the significance of the sample-dependent bias, the platform-dependent bias and the interaction between sample and platform was determined for each transcript. Thus, we were able to observe the gene-specific effects of sample and platform biases.
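As an illustration of the per-transcript test, the sketch below computes the three F statistics of a two-way ANOVA with replication for a single transcript. It assumes a balanced design (the same number of replicates in every sample-by-platform cell) and nonzero within-cell variance; the function name, platform labels and values are hypothetical. Converting each F statistic to a p-value requires the F distribution, which is available in statistical libraries but is omitted here to keep the sketch dependency-free.

```python
from statistics import mean

def two_way_anova_f(cells):
    """F statistics for sample, platform and their interaction.

    `cells` maps (sample, platform) -> list of replicate values
    measured for ONE transcript; every cell must have equal length.
    """
    samples = sorted({s for s, _ in cells})
    platforms = sorted({p for _, p in cells})
    a, b = len(samples), len(platforms)
    n = len(cells[(samples[0], platforms[0])])  # replicates per cell

    grand = mean([v for vals in cells.values() for v in vals])
    s_mean = {s: mean([v for p in platforms for v in cells[(s, p)]])
              for s in samples}
    p_mean = {p: mean([v for s in samples for v in cells[(s, p)]])
              for p in platforms}
    c_mean = {key: mean(vals) for key, vals in cells.items()}

    ss_sample = b * n * sum((s_mean[s] - grand) ** 2 for s in samples)
    ss_platform = a * n * sum((p_mean[p] - grand) ** 2 for p in platforms)
    ss_inter = n * sum(
        (c_mean[(s, p)] - s_mean[s] - p_mean[p] + grand) ** 2
        for s in samples for p in platforms
    )
    ss_error = sum((v - c_mean[key]) ** 2
                   for key, vals in cells.items() for v in vals)

    df_error = a * b * (n - 1)
    ms_error = ss_error / df_error
    return {
        "sample": (ss_sample / (a - 1)) / ms_error,
        "platform": (ss_platform / (b - 1)) / ms_error,
        "interaction": (ss_inter / ((a - 1) * (b - 1))) / ms_error,
        "df_error": df_error,
    }

# Hypothetical transcript: strong platform effect, weak sample effect.
example = {
    ("MRP1", "platformX"): [10.0, 12.0],
    ("MRP1", "platformY"): [20.0, 22.0],
    ("MRP2", "platformX"): [11.0, 13.0],
    ("MRP2", "platformY"): [21.0, 23.0],
}
f_stats = two_way_anova_f(example)
```

In this toy case the platform F statistic dominates and the interaction term is zero, which corresponds to a transcript whose measured level depends on the platform but shifts identically between the two pools on both platforms.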
SAM to characterize biological replicates' behavior in detecting expression changes
A list of differentially expressed genes is the desired result for most microarray users. SAM (Significance Analysis of Microarrays), proposed by Tusher et al., is a method that determines the significance of gene expression changes by permuting replicated measurements, followed by an estimation of the false discovery rate (FDR). SAM assesses both the sensitivity and specificity of a microarray platform.
In our study, for each platform, the five pairs of chips on which MRP1 and MC were hybridized were considered as one experiment group, and the corresponding five pairs for MRP2 and MC formed a second experiment group. Prior to SAM analysis, the normalized log2ratios underwent two sequential filtering steps: (1) filtering according to spot quality flags; and (2) removal of probes that had fewer than three valid measurements among the five replicates. For each microarray platform and each experiment group, the number of genes "called" as differentially expressed and the FDR were recorded for every threshold "delta", which was stepped from 0.1 to 4 at an interval of 0.1.
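The two filtering steps and the grid of delta thresholds can be sketched as follows. The probe names, values and boolean flag encoding are hypothetical; the SAM permutation procedure itself is not reproduced here.

```python
def filter_probes(log2ratios, flags, min_valid=3):
    """Apply the two sequential filters to replicate log2 ratios.

    `log2ratios[probe]` holds the five replicate values and
    `flags[probe]` holds one boolean per replicate (True = good spot).
    """
    kept = {}
    for probe, values in log2ratios.items():
        # Step 1: drop measurements with a bad spot quality flag.
        valid = [v for v, ok in zip(values, flags[probe]) if ok]
        # Step 2: drop probes with fewer than `min_valid` valid values.
        if len(valid) >= min_valid:
            kept[probe] = valid
    return kept

# Delta thresholds 0.1, 0.2, ..., 4.0 (rounded to avoid float drift).
deltas = [round(0.1 * i, 1) for i in range(1, 41)]

# Hypothetical data for three probes, five replicates each.
ratios = {
    "probe1": [0.5, 0.7, 0.4, 0.6, 0.5],
    "probe2": [1.1, 0.9, 1.2, 1.0, 1.3],
    "probe3": [-0.2, -0.1, 0.0, -0.3, -0.2],
}
quality = {
    "probe1": [True, False, True, True, False],    # 3 valid: kept
    "probe2": [False, False, True, False, True],   # 2 valid: removed
    "probe3": [True, True, True, True, True],      # 5 valid: kept
}
filtered = filter_probes(ratios, quality)
```

The filtered values would then be passed to SAM once per delta, recording the number of called genes and the estimated FDR at each threshold.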
GSEA to summarize differences between biological replicates using biological themes
GSEA (Gene Set Enrichment Analysis) [49, 50] combines functional annotations with a statistical procedure to identify sets of genes, each sharing a biological theme, that are over- or under-represented between two data sets representing two different classes. Instead of examining individual genes, GSEA focuses on gene sets pre-defined according to Gene Ontology (GO) category, pathway, localization, etc., so that its results have a more robust and explicit biological interpretation. We used the web-based GSEA application provided by Babelomics to identify which biological process(es) and molecular function(s) were the most susceptible to the random differences between MRP1 and MRP2. The GSEA results of each platform were also compared.
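The idea of scoring a gene set rather than individual genes can be illustrated with an unweighted running-sum enrichment score of the Kolmogorov-Smirnov type. This is a generic sketch under that assumption, not the algorithm implemented in the Babelomics application; the gene names and sets are hypothetical, and the permutation step that assigns significance is omitted.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score for one gene set.

    Walk down the ranked list, stepping up by 1/(set size) at each
    member of the set and down by 1/(non-member count) otherwise;
    the score is the running-sum value of largest magnitude.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, genes")
    running, best = 0.0, 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

# Genes ranked from most up-regulated in MRP1 to most up-regulated in MRP2.
ranking = ["g1", "g2", "g3", "g4", "g5", "g6"]
es_top = enrichment_score(ranking, {"g1", "g2"})     # set at the top
es_bottom = enrichment_score(ranking, {"g5", "g6"})  # set at the bottom
```

A set concentrated at the top of the ranking receives a score near +1, and a set concentrated at the bottom receives a score near -1.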
One-way ANOVA analysis of microarray replicates and Z-test of MPSS runs
One-way ANOVA was conducted for each microarray platform (10 experiments: five for MRP1, five for MRP2). One-way ANOVA uses those genes that have valid measurements (filtered intensities after normalization) across all experiments as input, and assigns a p-value to each gene indicating whether this specific gene displayed significantly varied expression levels between two biological replicates.
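For a single gene, the test statistic underlying this analysis can be sketched as below. The intensity values are hypothetical, and only the F statistic and its degrees of freedom are computed; the p-value follows from the F distribution with those degrees of freedom, which a statistical library would supply.

```python
from statistics import mean

def one_way_f(groups):
    """One-way ANOVA F statistic for a list of groups of measurements.

    Returns (F, df_between, df_within).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between, df_within = k - 1, n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within

# Hypothetical normalized intensities of one gene: five chips per pool.
mrp1 = [8.0, 8.2, 7.9, 8.1, 8.0]
mrp2 = [9.0, 9.1, 8.9, 9.2, 8.8]
f_value, df_between, df_within = one_way_f([mrp1, mrp2])
```

With two groups of five chips, the degrees of freedom are 1 and 8, and the test is equivalent to a two-sample t-test with pooled variance.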
MPSS analyses were also performed independently on the two pools of retina samples. Because of the high cost of MPSS, it was not feasible to conduct repeated MPSS experiments for each biological sample. The common practice among those who use MPSS to identify differentially expressed genes has been to apply statistical tests that can handle non-replicated data under certain sampling assumptions. The Z-statistic test was applied to identify genes that were significantly differentially expressed between MRP1 and MRP2.
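A common form of Z-test for non-replicated tag-count data compares the proportions of a signature in two libraries using a pooled variance estimate. The sketch below assumes that form; whether it matches the exact statistic used here is an assumption, and the counts and library sizes are hypothetical.

```python
import math

def z_test(count1, total1, count2, total2):
    """Two-proportion Z-test for one signature in two libraries.

    Returns (z, two-sided p-value under the normal approximation).
    Raises ZeroDivisionError if the signature is absent from both.
    """
    p1, p2 = count1 / total1, count2 / total2
    pooled = (count1 + count2) / (total1 + total2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total1 + 1 / total2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value

# Hypothetical signature: 150 vs 100 counts in libraries of one million.
z, p_value = z_test(150, 1_000_000, 100, 1_000_000)
```

Because every signature is tested, the resulting p-values would normally be interpreted with a multiple-testing correction in mind.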
The microarray data and MPSS data for this manuscript have been submitted to the Gene Expression Omnibus (GEO). The series record number is GSE6313.
We would like to thank Applied Biosystems, GE Healthcare, Mergen, and Lynx (now Illumina, Inc.) for providing and running the experiments as part of this large-scale study. WPK was supported by the National Institutes of Health (NIH) EY014466 grant, the Bioinformatics Division of the Harvard Center for Neurodegeneration and Repair and the Harvard School of Dental Medicine Dean Scholars award. CLC was supported by the Howard Hughes Medical Institute. FL and EH were supported by the functional genomics program (FUGE) of the Research Council of Norway.
- Lipshutz RJ, Fodor SP, Gingeras TR, Lockhart DJ: High density synthetic oligonucleotide arrays. Nat Genet. 1999, 21 (1 Suppl): 20-24. 10.1038/4447.
- Lockhart DJ, Dong H, Byrne MC, Follettie MT, Gallo MV, Chee MS, Mittmann M, Wang C, Kobayashi M, Horton H, Brown EL: Expression monitoring by hybridization to high-density oligonucleotide arrays. Nat Biotechnol. 1996, 14 (13): 1675-1680. 10.1038/nbt1296-1675.
- Schena M, Shalon D, Davis RW, Brown PO: Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science. 1995, 270 (5235): 467-470. 10.1126/science.270.5235.467.
- Velculescu VE, Zhang L, Vogelstein B, Kinzler KW: Serial analysis of gene expression. Science. 1995, 270 (5235): 484-487. 10.1126/science.270.5235.484.
- Brenner S, Johnson M, Bridgham J, Golda G, Lloyd DH, Johnson D, Luo S, McCurdy S, Foy M, Ewan M, Roth R, George D, Eletr S, Albrecht G, Vermaas E, Williams SR, Moon K, Burcham T, Pallas M, DuBridge RB, Kirchner J, Fearon K, Mao J, Corcoran K: Gene expression analysis by massively parallel signature sequencing (MPSS) on microbead arrays. Nat Biotechnol. 2000, 18 (6): 630-634. 10.1038/76469.
- Kuo WP, Liu F, Trimarchi J, Punzo C, Lombardi M, Sarang J, Whipple ME, Maysuria M, Serikawa K, Lee SY, McCrann D, Kang J, Shearstone JR, Burke J, Park DJ, Wang X, Rector TL, Ricciardi-Castagnoli P, Perrin S, Choi S, Bumgarner R, Kim JH, Short GF, Freeman MW, Seed B, Jensen R, Church GM, Hovig E, Cepko CL, Park P, Ohno-Machado L, Jenssen TK: A sequence-oriented comparison of gene expression measurements across different hybridization-based technologies. Nat Biotechnol. 2006, 24 (7): 832-840. 10.1038/nbt1217.
- Shi L, Reid LH, Jones WD, Shippy R, Warrington JA, Baker SC, Collins PJ, de Longueville F, Kawasaki ES, Lee KY, Luo Y, Sun YA, Willey JC, Setterquist RA, Fischer GM, Tong W, Dragan YP, Dix DJ, Frueh FW, Goodsaid FM, Herman D, Jensen RV, Johnson CD, Lobenhofer EK, Puri RK, Scherf U, Thierry-Mieg J, Wang C, Wilson M, Wolber PK, Zhang L, Amur S, Bao W, Barbacioru CC, Lucas AB, Bertholet V, Boysen C, Bromley B, Brown D, Brunner A, Canales R, Cao XM, Cebula TA, Chen JJ, Cheng J, Chu TM, Chudin E, Corson J, Corton JC, Croner LJ, Davies C, Davison TS, Delenstarr G, Deng X, Dorris D, Eklund AC, Fan XH, Fang H, Fulmer-Smentek S, Fuscoe JC, Gallagher K, Ge W, Guo L, Guo X, Hager J, Haje PK, Han J, Han T, Harbottle HC, Harris SC, Hatchwell E, Hauser CA, Hester S, Hong H, Hurban P, Jackson SA, Ji H, Knight CR, Kuo WP, Leclerc JE, Levy S, Li QZ, Liu C, Liu Y, Lombardi MJ, Ma Y, Magnuson SR, Maqsodi B, McDaniel T, Mei N, Myklebost O, Ning B, Novoradovskaya N, Orr MS, Osborn TW, Papallo A, Patterson TA, Perkins RG, Peters EH, Peterson R, Philips KL, Pine PS, Pusztai L, Qian F, Ren H, Rosen M, Rosenzweig BA, Samaha RR, Schena M, Schroth GP, Shchegrova S, Smith DD, Staedtler F, Su Z, Sun H, Szallasi Z, Tezak Z, Thierry-Mieg D, Thompson KL, Tikhonova I, Turpaz Y, Vallanat B, Van C, Walker SJ, Wang SJ, Wang Y, Wolfinger R, Wong A, Wu J, Xiao C, Xie Q, Xu J, Yang W, Zhang L, Zhong S, Zong Y, Slikker W: The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nat Biotechnol. 2006, 24 (9): 1151-1161. 10.1038/nbt1239.
- Evans SJ, Datson NA, Kabbaj M, Thompson RC, Vreugdenhil E, De Kloet ER, Watson SJ, Akil H: Evaluation of Affymetrix Gene Chip sensitivity in rat hippocampal tissue using SAGE analysis. Serial Analysis of Gene Expression. Eur J Neurosci. 2002, 16 (3): 409-413. 10.1046/j.1460-9568.2002.02097.x.
- Iacobuzio-Donahue CA, Ashfaq R, Maitra A, Adsay NV, Shen-Ong GL, Berg K, Hollingsworth MA, Cameron JL, Yeo CJ, Kern SE, Goggins M, Hruban RH: Highly expressed genes in pancreatic ductal adenocarcinomas: a comprehensive characterization and comparison of the transcription profiles obtained from three major technologies. Cancer Res. 2003, 63 (24): 8614-8622.
- Ibrahim AF, Hedley PE, Cardle L, Kruger W, Marshall DF, Muehlbauer GJ, Waugh R: A comparative analysis of transcript abundance using SAGE and Affymetrix arrays. Funct Integr Genomics. 2005, 5 (3): 163-174. 10.1007/s10142-005-0135-4.
- Ishii M, Hashimoto S, Tsutsumi S, Wada Y, Matsushima K, Kodama T, Aburatani H: Direct comparison of GeneChip and SAGE on the quantitative accuracy in transcript profiling analysis. Genomics. 2000, 68 (2): 136-143. 10.1006/geno.2000.6284.
- Jung SH, Lee JY, Lee DH: Use of SAGE technology to reveal changes in gene expression in Arabidopsis leaves undergoing cold stress. Plant Mol Biol. 2003, 52 (3): 553-567. 10.1023/A:1024866716987.
- Kim HL: Comparison of oligonucleotide-microarray and serial analysis of gene expression (SAGE) in transcript profiling analysis of megakaryocytes derived from CD34+ cells. Exp Mol Med. 2003, 35 (5): 460-466.
- Nacht M, Ferguson AT, Zhang W, Petroziello JM, Cook BP, Gao YH, Maguire S, Riley D, Coppola G, Landes GM, Madden SL, Sukumar S: Combining serial analysis of gene expression and array technologies to identify genes differentially expressed in breast cancer. Cancer Res. 1999, 59 (21): 5464-5470.
- van Ruissen F, Ruijter JM, Schaaf GJ, Asgharnegad L, Zwijnenburg DA, Kool M, Baas F: Evaluation of the similarity of gene expression data estimated with SAGE and Affymetrix GeneChips. BMC Genomics. 2005, 6: 91. 10.1186/1471-2164-6-91.
- Bhattacharya B, Cai J, Luo Y, Miura T, Mejido J, Brimble SN, Zeng X, Schulz TC, Rao MS, Puri RK: Comparison of the gene expression profile of undifferentiated human embryonic stem cell lines and differentiating embryoid bodies. BMC Dev Biol. 2005, 5: 22. 10.1186/1471-213X-5-22.
- Grigoriadis A, Mackay A, Reis-Filho JS, Steele D, Iseli C, Stevenson BJ, Jongeneel CV, Valgeirsson H, Fenwick K, Iravani M, Leao M, Simpson AJ, Strausberg RL, Jat PS, Ashworth A, Neville AM, O'Hare MJ: Establishment of the epithelial-specific transcriptome of normal and malignant human breast cells based on MPSS and array expression data. Breast Cancer Res. 2006, 8 (5): R56. 10.1186/bcr1604.
- Oudes AJ, Roach JC, Walashek LS, Eichner LJ, True LD, Vessella RL, Liu AY: Application of Affymetrix array and Massively Parallel Signature Sequencing for identification of genes involved in prostate cancer progression. BMC Cancer. 2005, 5: 86. 10.1186/1471-2407-5-86.
- Stolovitzky GA, Kundaje A, Held GA, Duggar KH, Haudenschild CD, Zhou D, Vasicek TJ, Smith KD, Aderem A, Roach JC: Statistical analysis of MPSS measurements: application to the study of LPS-activated macrophage gene expression. Proc Natl Acad Sci U S A. 2005, 102 (5): 1402-1407. 10.1073/pnas.0406555102.
- Liu Y, Shin S, Zeng X, Zhan M, Gonzalez R, Mueller FJ, Schwartz CM, Xue H, Li H, Baker SC, Chudin E, Barker DL, McDaniel TK, Oeser S, Loring JF, Mattson MP, Rao MS: Genome wide profiling of human embryonic stem cells (hESCs), their derivatives and embryonal carcinoma cells to develop base profiles of U.S. Federal government approved hESC lines. BMC Dev Biol. 2006, 6: 20. 10.1186/1471-213X-6-20.
- Chuaqui RF, Bonner RF, Best CJ, Gillespie JW, Flaig MJ, Hewitt SM, Phillips JL, Krizman DB, Tangrea MA, Ahram M, Linehan WM, Knezevic V, Emmert-Buck MR: Post-analysis follow-up and validation of microarray experiments. Nat Genet. 2002, 32 Suppl: 509-514. 10.1038/ng1034.
- Man MZ, Wang X, Wang Y: POWER_SAGE: comparing statistical tests for SAGE experiments. Bioinformatics. 2000, 16 (11): 953-959. 10.1093/bioinformatics/16.11.953.
- Irizarry RA, Warren D, Spencer F, Kim IF, Biswal S, Frank BC, Gabrielson E, Garcia JG, Geoghegan J, Germino G, Griffin C, Hilmer SC, Hoffman E, Jedlicka AE, Kawasaki E, Martinez-Murillo F, Morsberger L, Lee H, Petersen D, Quackenbush J, Scott A, Wilson M, Yang Y, Ye SQ, Yu W: Multiple-laboratory comparison of microarray platforms. Nat Methods. 2005, 2 (5): 345-350. 10.1038/nmeth756.
- Baker SC, Bauer SR, Beyer RP, Brenton JD, Bromley B, Burrill J, Causton H, Conley MP, Elespuru R, Fero M, Foy C, Fuscoe J, Gao X, Gerhold DL, Gilles P, Goodsaid F, Guo X, Hackett J, Hockett RD, Ikonomi P, Irizarry RA, Kawasaki ES, Kaysser-Kranich T, Kerr K, Kiser G, Koch WH, Lee KY, Liu C, Liu ZL, Lucas A, Manohar CF, Miyada G, Modrusan Z, Parkes H, Puri RK, Reid L, Ryder TB, Salit M, Samaha RR, Scherf U, Sendera TJ, Setterquist RA, Shi L, Shippy R, Soriano JV, Wagar EA, Warrington JA, Williams M, Wilmer F, Wilson M, Wolber PK, Wu X, Zadro R: The External RNA Controls Consortium: a progress report. Nat Methods. 2005, 2 (10): 731-734. 10.1038/nmeth1005-731.
- Brazma A, Hingamp P, Quackenbush J, Sherlock G, Spellman P, Stoeckert C, Aach J, Ansorge W, Ball CA, Causton HC, Gaasterland T, Glenisson P, Holstege FC, Kim IF, Markowitz V, Matese JC, Parkinson H, Robinson A, Sarkans U, Schulze-Kremer S, Stewart J, Taylor R, Vilo J, Vingron M: Minimum information about a microarray experiment (MIAME)-toward standards for microarray data. Nat Genet. 2001, 29 (4): 365-371. 10.1038/ng1201-365.
- Larkin JE, Frank BC, Gavras H, Sultana R, Quackenbush J: Independence and reproducibility across microarray platforms. Nat Methods. 2005, 2 (5): 337-344. 10.1038/nmeth757.
- Jongeneel CV, Delorenzi M, Iseli C, Zhou D, Haudenschild CD, Khrebtukova I, Kuznetsov D, Stevenson BJ, Strausberg RL, Simpson AJ, Vasicek TJ: An atlas of human gene expression from massively parallel signature sequencing (MPSS). Genome Res. 2005, 15 (7): 1007-1014. 10.1101/gr.4041005.
- Iseli C, Stevenson BJ, de Souza SJ, Samaia HB, Camargo AA, Buetow KH, Strausberg RL, Simpson AJ, Bucher P, Jongeneel CV: Long-range heterogeneity at the 3' ends of human mRNAs. Genome Res. 2002, 12 (7): 1068-1074. 10.1101/gr.62002. Article published online before print in June 2002.
- Pauws E, van Kampen AH, van de Graaf SA, de Vijlder JJ, Ris-Stalpers C: Heterogeneity in polyadenylation cleavage sites in mammalian mRNA sequences: implications for SAGE analysis. Nucleic Acids Res. 2001, 29 (8): 1690-1694. 10.1093/nar/29.8.1690.
- Margulies EH, Kardia SL, Innis JW: Identification and prevention of a GC content bias in SAGE libraries. Nucleic Acids Res. 2001, 29 (12): E60. 10.1093/nar/29.12.e60.
- Siddiqui AS, Delaney AD, Schnerch A, Griffith OL, Jones SJ, Marra MA: Sequence biases in large scale gene expression profiling data. Nucleic Acids Res. 2006, 34 (12): e83. 10.1093/nar/gkl404.
- Silva AP, De Souza JE, Galante PA, Riggins GJ, De Souza SJ, Camargo AA: The impact of SNPs on the interpretation of SAGE and MPSS experimental data. Nucleic Acids Res. 2004, 32 (20): 6104-6110. 10.1093/nar/gkh937.
- Holland MJ: Transcript abundance in yeast varies over six orders of magnitude. J Biol Chem. 2002, 277 (17): 14363-14366. 10.1074/jbc.C200101200.
- Seidel SD, Hung SC, Kan HL, Gollapudi BB: Background gene expression in rat kidney: influence of strain, gender, and diet. Toxicol Sci. 2006, 94: 226-233. 10.1093/toxsci/kfl082.
- Wang YH, Byrne KA, Reverter A, Harper GS, Taniguchi M, McWilliam SM, Mannen H, Oyama K, Lehnert SA: Transcriptional profiling of skeletal muscle tissue from two breeds of cattle. Mamm Genome. 2005, 16 (3): 201-210. 10.1007/s00335-004-2419-8.
- Richards SM, Jensen RV, Liu M, Sullivan BD, Lombardi MJ, Rowley P, Schirra F, Treister NS, Suzuki T, Steagall RJ, Yamagami H, Sullivan DA: Influence of sex on gene expression in the mouse lacrimal gland. Exp Eye Res. 2006, 82 (1): 13-23. 10.1016/j.exer.2005.04.014.
- Richards SM, Yamagami H, Schirra F, Suzuki T, Jensen RV, Sullivan DA: Sex-related effect on gene expression in the mouse meibomian gland. Curr Eye Res. 2006, 31 (2): 119-128. 10.1080/02713680500514644.
- Dhahbi JM, Kim HJ, Mote PL, Beaver RJ, Spindler SR: Temporal linkage between the phenotypic and genomic responses to caloric restriction. Proc Natl Acad Sci U S A. 2004, 101 (15): 5524-5529. 10.1073/pnas.0305300101.
- Sreekumar R, Unnikrishnan J, Fu A, Nygren J, Short KR, Schimke J, Barazzoni R, Nair KS: Effects of caloric restriction on mitochondrial function and gene transcripts in rat muscle. Am J Physiol Endocrinol Metab. 2002, 283 (1): E38-43.
- Tosini G, Chaurasia SS, Michael Iuvone P: Regulation of arylalkylamine N-acetyltransferase (AANAT) in the retina. Chronobiol Int. 2006, 23 (1-2): 381-391. 10.1080/07420520500482066.
- Beissbarth T, Speed TP: GOstat: find statistically overrepresented Gene Ontologies within a group of genes. Bioinformatics. 2004, 20 (9): 1464-1465. 10.1093/bioinformatics/bth088.
- Haverty PM, Hsiao LL, Gullans SR, Hansen U, Weng Z: Limited agreement among three global gene expression methods highlights the requirement for non-global validation. Bioinformatics. 2004, 20 (18): 3431-3441. 10.1093/bioinformatics/bth421.
- Brenner S, Williams SR, Vermaas EH, Storck T, Moon K, McCollum C, Mao JI, Luo S, Kirchner JJ, Eletr S, DuBridge RB, Burcham T, Albrecht G: In vitro cloning of complex mixtures of DNA on microbeads: physical separation of differentially expressed cDNAs. Proc Natl Acad Sci U S A. 2000, 97 (4): 1665-1670. 10.1073/pnas.97.4.1665.
- Meyers BC, Tej SS, Vu TH, Haudenschild CD, Agrawal V, Edberg SB, Ghazal H, Decola S: The use of MPSS for whole-genome transcriptional analysis in Arabidopsis. Genome Res. 2004, 14 (8): 1641-1653. 10.1101/gr.2275604.
- UCSC GoldenPath Genome Browser. [http://hgdownload.cse.ucsc.edu/downloads.html]
- UniGene. [ftp://ftp.ncbi.nih.gov/repository/UniGene/]
- Chen J, Rattray M: Analysis of tag-position bias in MPSS technology. BMC Genomics. 2006, 7: 77. 10.1186/1471-2164-7-77.
- Tusher VG, Tibshirani R, Chu G: Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci U S A. 2001, 98 (9): 5116-5121. 10.1073/pnas.091062498.
- Mootha VK, Lindgren CM, Eriksson KF, Subramanian A, Sihag S, Lehar J, Puigserver P, Carlsson E, Ridderstrale M, Laurila E, Houstis N, Daly MJ, Patterson N, Mesirov JP, Golub TR, Tamayo P, Spiegelman B, Lander ES, Hirschhorn JN, Altshuler D, Groop LC: PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat Genet. 2003, 34 (3): 267-273. 10.1038/ng1180.
- Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, Mesirov JP: Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A. 2005, 102 (43): 15545-15550. 10.1073/pnas.0506580102.
- Al-Shahrour F, Minguez P, Tarraga J, Montaner D, Alloza E, Vaquerizas JM, Conde L, Blaschke C, Vera J, Dopazo J: BABELOMICS: a systems biology perspective in the functional annotation of genome-scale experiments. Nucleic Acids Res. 2006, 34 (Web Server issue): W472-6. 10.1093/nar/gkl172.