The use of TransCount to retrieve absolute units from oligoarray data enabled a quantitative comparison of transcript concentrations across MPSS, SAGE, and spotted oligoarrays. Although several studies have compared the performance of tag-based and hybridization-based gene expression platforms [4, 12–19], the use of a common measurement unit has, to our knowledge, not received detailed attention so far. Previous comparisons involving microarrays have used signal intensities [4, 12–19]. The intensities of in situ-synthesized oligoarrays may reflect transcript concentrations reasonably well, but intensities are not suitable when spotted arrays are used, and they are not directly comparable across experiments and platforms. With our approach, the number of transcripts per cell was calculated genome-wide for all three technologies. These values could be compared directly across the technologies, allowing a thorough validation of oligoarrays for quantitative exploration of the transcriptome.
Our study did not allow for a general evaluation of the transcriptome coverage of each platform, since differences in sampling depth among the technologies would bias the outcome. Given our sampling depths of 1.6 million (MPSS) and 55,000 (SAGE) tags, which are commonly used in experiments with these technologies, only about 60% (MPSS) and 10% (SAGE) of the transcripts present at 1–5 copies per cell are expected to be detected. Hence, sequencing of about three million tags is probably required to identify 90% of the expressed genes, which would significantly increase the cost of these experiments. In contrast, transcriptome coverage in the oligoarray data was more explicitly defined and easier to ensure, here by performing four replicate experiments. Rapid advances in next-generation sequencing technologies may eventually close this gap by allowing substantially greater sampling depth at dramatically reduced cost and time. However, the aim of this study was to validate the quantitative potential of the oligoarrays. We therefore focused primarily on the subset of genes detected by all three technologies, obtained through a stringent mapping of MPSS and SAGE tags to known genes and further restricted to genes also present in the oligoarray design. This explains why the number of unique transcripts detected was lower for the tag-based techniques than for the oligoarrays. Although greater sampling depth would considerably improve the transcriptome coverage of the tag-based technologies, coverage is not a crucial concern in our study.
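The dependence of detection on sampling depth can be illustrated with a simple sketch. It assumes an idealized model in which each sequenced tag is drawn independently from the cell's transcript pool, so that the number of tags observed for a transcript is Poisson-distributed; the default of 5·10⁵ transcripts per cell is the value discussed in this section, and the function names are hypothetical. Because the model ignores losses such as tags that cannot be mapped uniquely and sequencing errors, it yields higher detection rates than the cited estimates and should be read only for the trend, not as the method behind the 60% and 10% figures.

```python
import math

# Idealized model (an assumption, not the method behind the cited estimates):
# tags are sampled independently from the transcript pool, so the tag count
# for a transcript with c copies per cell is Poisson with mean N * c / T.


def detection_probability(copies_per_cell, tags_sequenced, transcripts_per_cell=5e5):
    """Probability of observing at least one tag for a transcript."""
    expected_tags = tags_sequenced * copies_per_cell / transcripts_per_cell
    return 1.0 - math.exp(-expected_tags)


def tags_required(copies_per_cell, target_probability, transcripts_per_cell=5e5):
    """Sequencing depth at which a transcript is detected with the target probability."""
    if not 0.0 < target_probability < 1.0:
        raise ValueError("target_probability must lie strictly between 0 and 1")
    return -math.log(1.0 - target_probability) * transcripts_per_cell / copies_per_cell


if __name__ == "__main__":
    # Sampling depths taken from the text; copy numbers of 1-5 per cell.
    for name, depth in (("MPSS", 1_600_000), ("SAGE", 55_000)):
        probs = [detection_probability(c, depth) for c in range(1, 6)]
        print(name, [round(p, 2) for p in probs])
```

Raising `transcripts_per_cell` lowers the expected tag count per transcript and therefore the detection probability at a fixed depth, which is why a larger total number of transcripts per cell implies weaker coverage for the tag-based methods.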
The oligoarray estimates showed a stronger relationship to the MPSS and SAGE data than previously reported for spotted oligoarrays, possibly because we used absolute transcript concentrations rather than intensities in the analysis. Indeed, we have previously shown that absolute transcript concentrations derived from TransCount are more strongly correlated with qRT-PCR data than are the relative values obtained from traditional microarray analysis, suggesting that they are more reliable measures of transcript abundance. Otherwise, our results, including the particularly poor correlation at low concentrations, agreed with earlier reports [12–19]. A correlation coefficient of about 0.50–0.60 therefore probably reflects the overall consistency across the technologies when genes at all expression levels are included. The correlations involving the oligoarray data were not weaker than, but similar to, the correlation between the MPSS and SAGE data, both over the entire concentration range and at high concentrations. Inherent differences in the detection and quantification processes of the techniques probably caused some inconsistency between the data sets, and technology-specific limitations such as cross-hybridization, sampling variance, and tag annotation ambiguities may have contributed [18, 19, 21]. The discrepancies therefore probably reflect measurement errors in all data sets rather than in any single one.
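Cross-platform correlations of this kind can be computed from paired transcripts-per-cell estimates of the same genes on two platforms. The sketch below is illustrative only: this section does not state which coefficient or transformation was used, so the Pearson-on-log10 and Spearman variants, the function names, and the concentration cutoff in `correlation_above` are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def log_pearson(x, y):
    """Pearson correlation of log10 values; all values must be positive."""
    return pearson([math.log10(v) for v in x], [math.log10(v) for v in y])


def correlation_above(x, y, threshold, method=spearman):
    """Correlation restricted to genes with both estimates >= a hypothetical cutoff."""
    pairs = [(a, b) for a, b in zip(x, y) if a >= threshold and b >= threshold]
    return method([a for a, _ in pairs], [b for _, b in pairs])
```

Rank-based correlation is insensitive to the absolute scale of each platform, whereas the log-scale Pearson variant is affected by nonlinear differences between platforms; zero counts from tag-based data must be removed before taking logarithms.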
The correlations involving the SAGE data may have been influenced by the use of a different RNA pool for these experiments than for the MPSS and oligoarray experiments. However, mouse retina generally shows low variability in gene expression, which minimizes possible confounding effects of differences between RNA pools. Indeed, in a recent study we showed that the variation introduced by biological replicates of the mouse retina is small compared with the variation caused by using different technologies. Moreover, the correlation with the oligoarray data was somewhat stronger for the SAGE than for the MPSS data. The use of a different RNA pool for the SAGE experiments therefore probably had only a minor influence on our results.
The number of transcripts per cell was considerably higher based on oligoarrays and TransCount than based on MPSS and SAGE, at both high and low concentrations. The difference in absolute scale depends on the values assumed for the sum of all transcripts per cell in the MPSS and SAGE calculations and for the total RNA content per cell in the oligoarray calculations. Our oligoarray results suggest that the sum of all transcripts per cell may exceed one million, which is more than two-fold higher than the value of 5·10⁵ reported in a study from 1976 and used in our MPSS and SAGE estimations. Adjusting this number to 1.5·10⁶ transcripts per cell would have led to MPSS and SAGE values more comparable to the absolute oligoarray data, and the calculated transcript detection efficiency would then also have been more similar for the three technologies. Conversely, the transcript detection efficiency appeared considerably higher for the oligoarrays when the value of 5·10⁵ transcripts per cell was used in the MPSS and SAGE estimations. To our knowledge, no more recent study has examined the number of transcripts per cell, except for a microarray study in which spike-in controls were used to define a standard curve relating signal intensity to absolute transcript numbers. The transcript values in that study were two- to three-fold lower than ours, but the apparent discrepancy was due solely to their use of a highly conservative value of 2–3 pg total RNA per cell. Our estimate of 10 pg per cell is within the range of previously reported data [23, 24] and probably closer to the true value. The findings of that microarray study are therefore in agreement with our results.
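The dependence of the MPSS and SAGE estimates on the assumed total can be made explicit with a short sketch. It assumes the usual proportional scaling, in which a gene's share of all sequenced tags is multiplied by an assumed total number of transcripts per cell; the function name and the example gene (3,200 of 1.6 million tags) are hypothetical.

```python
def tags_to_transcripts_per_cell(tag_count, total_tags, transcripts_per_cell=5e5):
    """Convert a gene's tag count to transcripts per cell by proportional scaling.

    transcripts_per_cell is the assumed sum of all transcripts in one cell;
    5e5 is the value discussed in the text.
    """
    if total_tags <= 0:
        raise ValueError("total_tags must be positive")
    return tag_count * transcripts_per_cell / total_tags


if __name__ == "__main__":
    # Hypothetical gene with 3,200 of 1.6 million sequenced tags.
    for total in (5e5, 1e6, 1.5e6):
        estimate = tags_to_transcripts_per_cell(3_200, 1_600_000, total)
        print(f"assumed total {total:.1e} -> {estimate:.0f} transcripts per cell")
```

Because the conversion is linear in the assumed total, replacing 5·10⁵ with 1.5·10⁶ triples every MPSS and SAGE estimate while leaving the relative differences between genes unchanged.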
Although the error range of the oligoarray estimates is relatively large, and the data may be somewhat overestimated because of possible unspecific binding to array probes and experimental uncertainty in the scaling of the absolute values, these findings strongly suggest that the value of 5·10⁵ should be re-examined and probably increased. MPSS and SAGE would then be found to have weaker coverage than previously anticipated, since a given number of sequenced tags would represent a smaller fraction of the transcripts in each cell.
The higher estimate of the sum of all transcripts per cell based on oligoarrays was also reflected in higher numbers of transcripts per cell for individual genes, compared with the MPSS and SAGE results. About 10,000 transcripts per cell were estimated for several genes, and more than 5,000 for 15 genes, when all detected transcripts were considered. In contrast, only three genes exceeded 5,000 transcripts per cell by MPSS, and the highest number by SAGE was 3,240. More than 10,000 transcripts per cell have been reported for individual genes and gene groups in several studies of mouse tissues [20, 26–28], consistent with our oligoarray data. Moreover, TransCount estimates for cervical cancers based on cDNA microarrays gave values in agreement with the present oligoarray results. These observations further question the validity of the total of 5·10⁵ transcripts per cell used in the MPSS and SAGE calculations. In this respect, a recent SAGE study showed that using a total of 1·10⁶ transcripts per cell to convert tag counts gave absolute transcript numbers consistent with the published values mentioned above, supporting our findings.
A thorough evaluation of the genes expressed in adult mouse retina and their putative functions has been presented in previous studies based on the SAGE data [29, 30]. Here, we focused on 40 genes with particularly high or low expression regardless of technology; this consistency suggests that they are truly expressed above or below the average level. The most abundant transcripts are known to be involved in visual perception (Pdc, Rbp3, Guca1a, Unc119, Guca1b, Pde6b) or to play other roles in retinal function (Calm1, Syp). Moreover, high expression of Bsg, Plekhb1, Reep6, and Stxbp1 has been reported in the retina, photoreceptors, and/or eye [31–33]. Our findings are therefore consistent with previous reports and point to additional genes, such as Ubb and Vamp2, that may be explored to increase our understanding of retinal function. The data also support the hypothesis that the most abundant transcripts are tissue specific and involved in specialized functions, whereas the larger number of less abundant transcripts may be involved in housekeeping activities and shared between tissues, as suggested by studies of the mouse liver.
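Selecting genes whose expression is extreme regardless of technology amounts to intersecting per-platform rankings. This section does not specify the selection rule, so the sketch below uses a hypothetical one: among genes quantified on every platform, a gene is kept if it ranks within the k highest (or k lowest) transcripts-per-cell values on each platform. The function name, the cutoff k, and the example data are all assumptions.

```python
def consistent_extremes(expression, k):
    """Genes ranked among the k highest / k lowest on every platform.

    expression maps platform name -> {gene: transcripts per cell}.
    Only genes quantified on all platforms are considered (hypothetical rule).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    common = set.intersection(*(set(values) for values in expression.values()))
    high, low = set(common), set(common)
    for values in expression.values():
        # Sort gene names first so that ties are broken deterministically.
        ranked = sorted(sorted(common), key=lambda gene: values[gene], reverse=True)
        high &= set(ranked[:k])
        low &= set(ranked[-k:])
    return sorted(high), sorted(low)


if __name__ == "__main__":
    # Hypothetical transcripts-per-cell values; gene F is absent from two platforms.
    data = {
        "MPSS": {"A": 900, "B": 500, "C": 20, "D": 5, "E": 300},
        "SAGE": {"A": 800, "B": 100, "C": 30, "D": 2, "E": 400},
        "oligo": {"A": 2000, "B": 1500, "C": 10, "D": 8, "E": 900, "F": 50},
    }
    print(consistent_extremes(data, 2))
```

Requiring agreement on every platform trades sensitivity for robustness: a gene that is extreme on only one platform, possibly because of a technology-specific artifact, is excluded.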