Validation of oligoarrays for quantitative exploration of the transcriptome.

BACKGROUND
Oligoarrays have become an accessible technique for exploring the transcriptome, but it is presently unclear how absolute transcript data from this technique compare to the data obtained with tag-based quantitative techniques, such as massively parallel signature sequencing (MPSS) and serial analysis of gene expression (SAGE). Using the TransCount method, we calculated absolute transcript concentrations from spotted oligoarray intensities, enabling direct comparisons with tag counts obtained with MPSS and SAGE. The tag counts were converted to number of transcripts per cell by assuming that the sum of all transcripts in a single cell was 5·10^5. Our aim was to investigate whether the less resource demanding and more widespread oligoarray technique could provide data that were correlated to and had the same absolute scale as those obtained with MPSS and SAGE.


RESULTS
A total of 1,777 unique transcripts were detected in common by the three technologies and served as the basis for our analyses. The correlations involving the oligoarray data were not weaker than, but similar to, the correlation between the MPSS and SAGE data, both when the entire concentration range was considered and at high concentrations. The data sets were more strongly correlated at high transcript concentrations than at low concentrations. On an absolute scale, the number of transcripts per cell and gene was generally higher based on oligoarrays than on MPSS and SAGE, and ranged from 1.6 to 9,705 for the 1,777 overlapping genes. The MPSS data were on the same scale as the SAGE data, ranging from 0.5 to 3,180 (MPSS) and 9 to 1,268 (SAGE) transcripts per cell and gene. The sum of all transcripts per cell for these genes was 3.8·10^5 (oligoarrays), 1.1·10^5 (MPSS) and 7.6·10^4 (SAGE), whereas the corresponding sum for all detected transcripts was 1.1·10^6 (oligoarrays), 2.8·10^5 (MPSS) and 3.8·10^5 (SAGE).


CONCLUSION
The oligoarrays and TransCount provide quantitative transcript concentrations that are correlated to MPSS and SAGE data, but the absolute scale of the measurements differs across the technologies. The discrepancy raises the question of whether the sum of all transcripts within a single cell might be higher than the 5·10^5 suggested in the literature and used to convert tag counts to transcripts per cell. If so, this may explain the apparently higher transcript detection efficiency of the oligoarrays, and has to be clarified before absolute transcript concentrations can be interchanged across the technologies. The ability to obtain transcript concentrations from oligoarrays opens up the possibility of efficient generation of universal transcript databases with low resource demands.


Background
Genomic advances, particularly in sequencing projects, have fueled the progressive development of high throughput technologies for measurement of transcript abundance. The most frequently used techniques are the gene expression microarrays [1], serial analysis of gene expression (SAGE) [2], and massively parallel signature sequencing (MPSS) [3]. There are weaknesses and strengths associated with each of the technologies, and the choice of method depends on the problem to be solved. MPSS and SAGE rely on open-based sampling of transcripts, allowing for the identification of novel transcribed sequences. The complexity of these methods has, however, limited their utility. The less resource demanding and more routinely used microarray platform is a hybridization-based, closed system where the transcript information is restricted to pre-selected probes immobilized on the array [1]. The technologies complement each other and are useful for different purposes, implying that the ability to interchange data across them can be of high value [4]. Hence, large amounts of data that are generated with these techniques and accumulated in publicly available repositories could potentially be merged to create transcript databases of various tissues and used for validation and meta-study purposes. However, before the repositories can be fully utilized in this way, the consistency in the measurements on an absolute scale has to be verified. To our knowledge this has not been done so far, probably due to the lack of a common measurement unit that enables direct comparisons across the technologies. SAGE and MPSS provide absolute transcript abundance through transcript sampling, sequencing, and identification. They both identify and quantify the transcripts through the generation of short sequence tags (10-22 bp) from the mRNA molecules and present the data as tag counts, facilitating comparisons across these techniques. 
Tags generated by SAGE are concatemerized and cloned into vectors for conventional dideoxy-sequencing [2]. In MPSS the tags are amplified, loaded onto a microbead library, and immobilized in a flow cell for automated highly-parallel sequencing [3]. Quantification is obtained by counting the frequency of each of the tag sequences in the library, followed by a mapping procedure, which annotates tag to gene. An advantage of MPSS compared to SAGE is the larger library size obtainable when using the same number of sequencing runs [5]. Moreover, the MPSS tags are generally longer, conferring higher specificity with respect to tag annotation.
The microarray technique uses the signal intensity of each array probe as a measure of the mRNA level [1]. There are three major platforms, spotted cDNA arrays, spotted oligoarrays, and in situ-synthesized oligoarrays [6]. The data from the spotted arrays are generally presented as the intensity ratio between two samples hybridized together, whereas for in situ-synthesized oligoarrays the intensities per se have also been used. A fairly good concordance between the data achieved from the different platforms has been demonstrated [7,8]. However, the relative quantification format makes a direct comparison of microarray data with results from other techniques difficult.
Methods to estimate absolute transcript concentrations from microarray data that may be useful for comparisons across technologies have been proposed [9,10]. The TransCount method developed by Frigessi et al. [10] is based on Bayesian statistical modeling and utilizes covariates of the microarray experiment to calculate the concentration from the signal intensity of each probe on spotted arrays. The concentration estimate seems to be a more reliable measure of the transcript abundance than the expression ratio usually derived [10]. We have previously applied the method to determine the number of transcripts needed prior to mRNA amplification to obtain reliable expression data from a limited sample quantity [11]. The use of TransCount and the absolute transcript concentrations provides a unique opportunity to explore the consistency in the data achieved with spotted microarrays, MPSS, and SAGE on an absolute scale.
In a recent study we explored the correlations between microarray and MPSS data on a relative scale [4]. The limitations inherent to these technologies were discussed as reasons for the reduced consistency across the technologies compared to within the different hybridization-based techniques. In the present work, we have applied TransCount and estimated transcript concentrations from oligoarray intensities of adult mouse retina, in order to investigate whether the spotted oligoarray technique could provide data that correlated with and had the same scale as those obtained with MPSS and SAGE. We performed a direct comparison between the data sets by converting the transcript concentrations and tag counts to the number of transcripts per cell for each individual gene. Our data suggest that the oligoarrays and TransCount can be used as a substitute for the tag-based techniques for a quantitative exploration of the transcriptome, provided that a discrepancy in the absolute scale across the technologies is clarified.

Transcript concentrations in adult mouse retina
Oligoarray transcript data were obtained for 14,045 of the 14,076 array probes, showing signal intensities within the whole detection range. The probes were mapped to 9,786 unique UniGene identification numbers (IDs). Transcripts were detected for all UniGene IDs, with concentrations ranging over five orders of magnitude (Table 1, Figure 1A). The total transcript concentration for the 9,786 genes was estimated to be 1.1·10^11 transcripts per μg total RNA, corresponding to 1.1·10^6 transcripts per cell and an average value of 112 (range 0.3 to 14,387) transcripts per cell and gene (Table 1).
MPSS had 6,572 signatures that were reliably mapped to UniGene IDs, out of a total of 34,341 unique tags detected in our library. Among the tags filtered out, 647 were suspected to be in repeated regions, 6,044 had hits on the genomic sequence, but not within transcripts, 5,415 hit the reverse strand, and 14,432 hit the transcripts either without known orientation or without annotated poly(A) tail or polyadenylation signal. There was also a remaining small fraction (4.4%) of signatures that produced no sequence match, most likely attributable to sequencing errors. The 6,572 reliable signatures were mapped to 6,088 unique UniGene IDs with a tag count ranging over four orders of magnitude (Table 1, Figure 1B). The sum of tag counts for all genes was 5.6·10^5 tpm, leading to 2.8·10^5 transcripts per cell and on average 46 (range 0.5 to 11,004) transcripts per cell and gene (Table 1).
Our SAGE library contained 12,588 unique tags, which were mapped to 4,827 unique UniGene IDs with a tag count ranging over less than three orders of magnitude (Table 1, Figure 1C). A total of 999 of these were UniGene clusters and included more than one tag. The sum of tag counts for 4,827 genes was 7.6·10^5 tpm, the number of transcripts per cell was 3.8·10^5, and the average number of transcripts per cell and gene was 38 (range 9 to 3,240) (Table 1).

Table 1 footnotes: (1) Mapped to UniGene identifiers. (2) Sum for all genes. (3) 10 pg total RNA per cell was assumed for oligoarrays, and a total number of 5·10^5 transcripts per cell was applied for SAGE and MPSS in the calculation. (4) Average and range for all genes.

Cross-platform correlations
The three data sets were matched pair-wise according to the UniGene IDs. The number of genes in common for oligoarrays and MPSS was 3,192, while 2,536 and 3,328 genes overlapped between oligoarrays and SAGE, and between MPSS and SAGE, respectively (Figure 2). A subset of 1,777 genes was identified in all three data sets, showing transcript concentrations in the range of 1.6·10^5 to 9.7·10^8 transcripts per μg total RNA (oligoarrays), 1 to 6,360 tpm (MPSS), and 18 to 2,536 tpm (SAGE) (Table 1, Additional file 1). Comparison of the oligoarray and MPSS data, oligoarray and SAGE data, and MPSS and SAGE data of the 1,777 overlapping genes showed similar relationships, with correlation coefficients within the range of 0.54 to 0.60 (Figure 3). The corresponding coefficients based on log transformed data were somewhat lower and ranged from 0.46 to 0.48, showing that the highest transcript concentrations contributed considerably to the correlations of the untransformed data.
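The pair-wise matching and correlation step can be illustrated with a minimal sketch. It assumes a Pearson coefficient (the text does not name the coefficient used) and uses hypothetical gene IDs and values; the function names `pearson` and `matched_correlation` are illustrative only.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def matched_correlation(data_a, data_b, log_transform=False):
    """Match two {UniGene ID: value} data sets on shared IDs and correlate them."""
    genes = sorted(set(data_a) & set(data_b))
    x = [data_a[g] for g in genes]
    y = [data_b[g] for g in genes]
    if log_transform:
        # Log transformation reduces the leverage of the most abundant transcripts.
        x = [math.log10(v) for v in x]
        y = [math.log10(v) for v in y]
    return pearson(x, y), len(genes)

# Hypothetical values, not data from the study.
oligo = {"Mm.1": 10.0, "Mm.2": 100.0, "Mm.3": 1000.0, "Mm.4": 5.0}
mpss = {"Mm.1": 2.0, "Mm.2": 20.0, "Mm.3": 200.0, "Mm.5": 7.0}

r_raw, n_common = matched_correlation(oligo, mpss)
r_log, _ = matched_correlation(oligo, mpss, log_transform=True)
```

Running both variants on the same matched gene set shows how much a few highly expressed genes drive the untransformed coefficient.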
To further explore the consistency in the data at different transcript concentrations, for each technology we considered two subsets of 100 genes each, one with the highest and another with the lowest concentrations, selected from the data sets of the 1,777 overlapping genes. The expression level of the poorly and highly expressed genes was confirmed by qRT-PCR analysis (Table 2). At the highest transcript concentration, 35 of 100 genes were in common to all technologies, whereas only 5 genes overlapped at the lowest concentration (Figure 4). Similar patterns of intersection were found when more genes were considered (data not shown), showing increased consistency at

Absolute scale comparisons
Transforming the data to numbers of transcripts per cell and gene allowed us to compare the absolute scale of the measurements for each individual gene across the technologies, applying the three data sets of the 1,777 overlapping genes. The oligoarray values ranged from 1.6 to 9,705 transcripts per cell and gene and were significantly higher than the MPSS and SAGE values (p < 0.001, Friedman test in ANOVA on ranks), which ranged from 0.5 to 3,180 (MPSS) and from 9 to 1,268 (SAGE) transcripts per cell and gene (Table 2). The median detection efficiency based on all overlapping genes was 21% (MPSS) and 23% (SAGE), as compared to the oligoarray data. The oligoarrays therefore seemed to be more sensitive in detecting known transcripts.
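As an illustration of the detection efficiency figure, the sketch below assumes that efficiency is defined per gene as the ratio of the tag-based estimate to the oligoarray estimate, with the median taken over the overlapping genes. That definition is an assumption, and the gene IDs and values are hypothetical.

```python
from statistics import median

def median_detection_efficiency(tag_per_cell, array_per_cell):
    """Median per-gene ratio of tag-based to oligoarray transcripts per cell.

    Assumes efficiency = tag-based estimate / oligoarray estimate,
    computed only for genes present in both data sets.
    """
    shared = set(tag_per_cell) & set(array_per_cell)
    ratios = [tag_per_cell[g] / array_per_cell[g]
              for g in shared if array_per_cell[g] > 0]
    return median(ratios)

# Hypothetical transcripts per cell and gene, not data from the study.
array_values = {"Mm.1": 100.0, "Mm.2": 200.0, "Mm.3": 50.0}
mpss_values = {"Mm.1": 20.0, "Mm.2": 50.0, "Mm.3": 10.0}

efficiency = median_detection_efficiency(mpss_values, array_values)
```

The median is used rather than the mean so that a few genes with extreme ratios do not dominate the summary.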

Discussion
The use of TransCount to retrieve absolute units from oligoarray data in our study enabled a quantitative comparison of transcript concentrations across MPSS, SAGE, and spotted oligoarrays. Although several studies have compared the performance of tag-based and hybridization-based gene expression platforms [4,12-19], our focus on a common measurement unit has to our knowledge not received detailed attention so far. Previous comparisons involving microarrays have utilized the signal intensities [4,12-19].  Table 2. bly well, but intensities are not suitable when spotted arrays are used and not directly comparable across experiments and platforms. By our approach, the numbers of transcripts per cell were calculated genome-wide for all three technologies. These values could be compared directly across the technologies, and a thorough validation of oligoarrays for quantitative exploration of the transcriptome could be performed.
Our study did not allow for a general evaluation of the transcriptome coverage of each platform, since differences in the sampling depth among the technologies would bias the outcome. Given our sampling depths of 1.6 million and 55,000, which are commonly used in MPSS and SAGE experiments respectively, only about 60% (MPSS) and 10% (SAGE) of the transcripts with 1-5 copies per cell are expected to be detected [20]. Hence, to identify 90% of the expressed genes, sequencing of about three million tags is probably required [20], leading to a significant increase in the costs of these experiments. In contrast, transcriptome coverage in the oligoarray data was more explicitly defined and easier to be ensured by performing four replicate experiments. Rapid advances in the development of next generation sequencing technologies may eventually fill this gap by allowing for significant improvements in the sampling depth at dramatically reduced cost and time. On the other hand, the aim of this study was to validate the quantitative potential of the oligoarrays. We therefore focused primarily on the subset of genes detected by all three technologies, through a stringent mapping of MPSS and SAGE tags to known genes, with a further limit to those also present in the oligoarray design. This explains why the number of unique transcripts detected was lower for the tag-based techniques than for the oligoarrays. Although the increase in sampling depth of tag-based technologies may considerably facilitate better transcriptome coverage, it is not a crucial concern in our study.
The oligoarray estimates showed a stronger relationship to the MPSS and SAGE data than that previously reported for spotted oligoarrays [13], possibly because we used absolute transcript concentrations and not intensities in the analysis. Indeed, we have previously shown that the absolute transcript concentrations derived from TransCount are more strongly correlated to qRT-PCR data than are the relative values achieved from traditional microarray analysis, suggesting that they are more reliable measures of the transcript abundance [10]. Otherwise, our results, including the particularly poor correlation at low concentrations, were in agreement with earlier reports [12-19]. A correlation coefficient of about 0.50-0.60 therefore probably reflects the overall consistency across the technologies when genes at all expression levels are included. The correlations involving the oligoarray data were not weaker than, but similar to, the correlation between the MPSS and SAGE data, both when the entire concentration range was considered and at high concentrations. Inherent technological differences in the detection and quantification processes between the techniques probably caused some inconsistency between the data sets. Disadvantages related to the respective technologies, such as cross-hybridization, sampling variances, and tag annotation ambiguities, may have contributed [18,19,21]. The discrepancy was therefore probably caused by erroneous measurements in all data sets.
The correlations involving the SAGE data may have been influenced by the use of another RNA pool in these experiments than in the MPSS and oligoarray experiments. Mouse retina generally shows low variability in gene expression, minimizing possible confounding effects caused by differences in the RNA pools. Hence, in a recent study we showed that data variation introduced by biological replicates of the mouse retina is small compared to the variation caused by using different technologies [4]. Moreover, the correlation to oligoarray data was somewhat stronger for the SAGE than the MPSS data. The use  The number of transcripts per cell was considerably higher based on oligoarrays and TransCount than based on MPSS and SAGE, both at high and low concentrations.
The difference in the absolute scale of the measurements depends on the values used for the sum of all transcripts and the total RNA content per cell in the MPSS/SAGE and oligoarray calculations, respectively. Our oligoarray results suggest that the maximum number of transcripts may exceed one million, which is more than two-fold higher than the 5·10^5 reported in a study from 1976 [22] that was used in our MPSS and SAGE estimations. Adjusting this number to 1.5·10^6 transcripts per cell would have led to MPSS and SAGE values more comparable to the absolute oligoarray data. Consequently, the calculated transcript detection efficiency would also be more similar for the three technologies. Hence, the transcript detection efficiency seemed to be considerably higher for the oligoarrays when the value of 5·10^5 transcripts per cell was used in the MPSS and SAGE estimations. More recent studies exploring the number of transcripts in cells have not been performed, except for a microarray study where spike-in controls were used to define a standard curve, which related signal intensity to the absolute transcript numbers [9]. The transcript values were two- to three-fold lower than ours, but the apparent discrepancy was solely due to the use of a highly conservative value of 2-3 pg total RNA per cell in their calculations. Our estimate of 10 pg per cell is within the range of previously reported data [23,24], and probably closer to the true value. The findings reported in the other microarray study [9] are therefore in agreement with our results. Although the error range of the oligoarray estimates is relatively large [10] and the data may be somewhat overestimated, due to possible unspecific binding to array probes [25] and experimental uncertainty in the scaling of the absolute values [10], these findings strongly suggest that the value of 5·10^5 should be re-examined and probably elevated.
MPSS and SAGE would then be found to have a weaker coverage than previously anticipated.
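The dependence of the tag-based sums on the assumed total can be checked with a short calculation. The sketch below rescales the summed tag counts reported in Table 1 (5.6·10^5 tpm for MPSS, 7.6·10^5 tpm for SAGE) under three assumed totals; the function name `per_cell_sum` is illustrative.

```python
def per_cell_sum(sum_tpm, total_per_cell):
    """Scale a summed tag count (tpm) to transcripts per cell.

    tpm is a count per million sampled tags, so multiplying by the assumed
    total number of transcripts per cell and dividing by 1e6 gives the
    number of transcripts per cell.
    """
    return sum_tpm * total_per_cell / 1e6

# Summed tag counts for all mapped genes, as reported in the text (tpm).
sums_tpm = {"MPSS": 5.6e5, "SAGE": 7.6e5}

# Assumed totals: the literature value, and two higher alternatives.
assumed_totals = (5e5, 1e6, 1.5e6)

rescaled = {
    total: {tech: per_cell_sum(tpm, total) for tech, tpm in sums_tpm.items()}
    for total in assumed_totals
}

for total, values in rescaled.items():
    print(f"total={total:.1e}: MPSS={values['MPSS']:.2e}, SAGE={values['SAGE']:.2e}")
```

With a total of 1.5·10^6, the sums become about 8.4·10^5 (MPSS) and 1.14·10^6 (SAGE), which is on the same scale as the 1.1·10^6 estimated from the oligoarrays.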
The increased estimate for the sum of all transcripts per cell based on oligoarrays was also reflected in a higher number of transcripts per cell and gene, as compared to the MPSS and SAGE results. About 10,000 transcripts were estimated for several genes, and numbers above 5,000 were found for 15 genes, when considering all the detected transcripts. In contrast, only three genes had a transcript number above 5,000 by MPSS, whereas by SAGE the highest number was 3,240. More than 10,000 transcripts per cell have been reported for individual genes and gene groups in several studies on mouse tissues [20,26-28], consistent with our oligoarray data.
Moreover, TransCount estimations for cervical cancers based on cDNA microarrays led to values in agreement with the present oligoarray results [10]. These observations further question the validity of the total transcript number of 5·10^5 per cell that was used in the MPSS and SAGE calculations. In that respect, a recent SAGE study showed that using a total number of 1·10^6 transcripts per cell to convert the tag counts led to absolute transcript numbers consistent with the published values mentioned above [20], supporting our findings.
A thorough evaluation of genes expressed in adult mouse retina and their putative function has been presented in previous studies based on the SAGE data [29,30]. Here, we focused on 40 genes with particularly high or low expression regardless of technology, suggesting that these are truly up- or downregulated compared to the average expression level. The most abundant transcripts are known to be involved in visual perception (Pdc, Rbp3, Guca1a, Unc119, Guca1b, Pde6b) or play another role in retinal function (Calm1, Syp) [30]. Moreover, high expression of Bsg, Plekhb1, Reep6, and Stxbp1 has been reported in the retina, photoreceptors, and/or eye [31-33]. Our findings are therefore consistent with previous reports and point to more genes that may be explored to increase our understanding of retinal function, like Ubb and Vamp2. The data also support the hypothesis that the most abundant transcripts are tissue specific and involved in specialized functions, whereas the larger number of less abundant transcripts may be involved in housekeeping activities and shared between tissues, as suggested from studies on the mouse liver [22].

Conclusion
The transcript concentrations estimated from spotted oligoarrays by use of TransCount are correlated to those obtained with MPSS and SAGE. Oligoarrays and TransCount may therefore play a role in the efficient building of transcript repositories at low cost and labor demands. Such quantitative data may also enable insight into new aspects of the transcriptome and a better understanding of gene networks [34]. Clarification of the discrepancy in the absolute scale of the measurements would imply that data may be interchanged across hybridization- and tag-based technologies.

Tissue sample
Total RNA from B6 adult mouse retina was used throughout the study. The oligoarray and MPSS experiments were based on the same RNA pool, whereas a different pool was used in the SAGE experiments [29]. Details of the sample collection and RNA extraction can be found in Kuo et al. [7]. Quality assessment was performed on a Bioanalyzer (Agilent Technologies, Inc., Santa Clara, CA) to ensure that high quality RNA was used.

Microarray experiments
Spotted mouse 70-mer oligoarrays produced at the microarray facility at the Norwegian University of Science and Technology were used. The arrays contained 32,448 spots with 14,076 unique probes printed from an oligonucleotide set originating from the Operon mouse oligo collection v3.0 (Operon Biotechnologies, Inc, Huntsville, AL). Control probes from the Spot Report Alien Oligo Array Validation System (Stratagene, La Jolla, CA) were printed 48 times each across the array. The control spots were used by TransCount to find the absolute scale of the transcript concentrations [10]. A self-self hybridization design with 4 array replicates was used.
Cy3 and Cy5 labeled cDNA was synthesized from 13.5-15 μg total RNA, as described previously [10]. The quality of the labeled cDNA was assessed from the ratio of absorbance at 260 nm and 280 nm, as measured by use of a NanoDrop spectrophotometer (NanoDrop Technologies, Wilmington, DE). The quality was found to be satisfactory. To target the control probes, 9 control mRNA spikes from the Spot Report system were added to the reaction mixtures at well-defined concentrations, ranging from 3.3 × 10^7 to 2.7 × 10^9 mRNA molecules. A 25% formamide-based hybridization buffer was added to the labeled target mixture, and the mixture was applied to the array for overnight hybridization at 42°C in a water bath. The slides were scanned by an Agilent G2566AA scanner (Agilent Technologies, Inc., Santa Clara, CA) at two PMT settings of 100 and 50, enabling correction of saturated spot intensities [35] and estimation of the scanner amplification factors needed in TransCount to calculate the transcript concentrations [10].
The TransCount method, originally developed for cDNA microarrays [10], was applied with small modifications for oligoarrays. The sequence length of all probes was 70 bp, and the intensities from a slide stained with SYTO nucleic acid staining dye (Molecular Probes, Inc, Eugene, OR) were used as probe quantities. Since oligoarrays are affected by less experimental variation than cDNA microarrays, TransCount could be directly applied with these modifications [10]. The transcript concentrations were estimated from the saturation and background corrected intensities of each probe and oligoarray. The intensities of the control spots covered the whole detection range from near background values to saturation. The estimated concentrations of these spots showed a highly linear relationship to the true concentrations, suggesting reliable scaling of the concentrations of the other spots. The mean concentrations of the four data sets were used in the further analyses. The average number of transcripts per cell was calculated by assuming a total RNA content of 10 pg per cell [23]. Each probe was assigned a UniGene ID by searching the best sequence match in the mouse UniGene build 151 to the probe sequence. If more than one probe was mapped to the same UniGene ID, their transcript estimates were averaged.
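The last two steps of this paragraph, averaging probes that share a UniGene ID and converting concentrations to transcripts per cell, can be sketched as follows. The probe names, gene IDs, and concentrations are hypothetical, and the helper names are illustrative; only the 10 pg total RNA per cell comes from the text.

```python
from collections import defaultdict

PG_PER_UG = 1e6  # picograms per microgram

def average_by_gene(probe_conc, probe_to_gene):
    """Average the concentration estimates of probes mapped to the same UniGene ID."""
    grouped = defaultdict(list)
    for probe, conc in probe_conc.items():
        grouped[probe_to_gene[probe]].append(conc)
    return {gene: sum(values) / len(values) for gene, values in grouped.items()}

def transcripts_per_cell(conc_per_ug, rna_pg_per_cell=10.0):
    """Convert transcripts per ug total RNA to transcripts per cell.

    Assumes a total RNA content of 10 pg per cell, as stated in the text.
    """
    return conc_per_ug * rna_pg_per_cell / PG_PER_UG

# Hypothetical probe-level estimates (transcripts per ug total RNA).
probe_conc = {"probe_a": 2.0e7, "probe_b": 4.0e7, "probe_c": 1.0e6}
probe_to_gene = {"probe_a": "Mm.0001", "probe_b": "Mm.0001", "probe_c": "Mm.0002"}

gene_conc = average_by_gene(probe_conc, probe_to_gene)
per_cell = {gene: transcripts_per_cell(c) for gene, c in gene_conc.items()}

# Consistency check against the reported total: 1.1e11 per ug -> 1.1e6 per cell.
total_per_cell = transcripts_per_cell(1.1e11)
```

The same conversion factor reproduces the reported total: 1.1·10^11 transcripts per μg total RNA corresponds to 1.1·10^6 transcripts per cell.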

MPSS
Total RNA samples were sent to Lynx Therapeutics (now Illumina, Hayward, CA) for processing. The MPSS library was generated according to the Megaclone protocol [3]. Signatures adjacent to poly(A) proximal DpnII restriction sites, comprising 20 nucleotides each, including the DpnII recognition sequence "GATC", were cloned into a Megaclone vector. The resulting library was amplified and loaded onto microbeads. About 1.6 million microbeads were loaded into a flow cell, and the signature sequences of 17 bases were read out by a series of enzymatic reactions. The abundance of each signature was converted to transcripts per million (tpm).
The mapping of signatures to genes was based on the mouse genome sequence (UCSC GoldenPath genome database, Release 3, Feb 2003) and the UniGene build 151, using the Automatic Correspondence of Tags and Genes (ACTG) tool [36]. A complete set of possible "virtual signatures" was extracted from the sequence database to generate a comprehensive mouse signature collection, and all signatures were ranked and classified according to the likelihood of being a true and detectable signature. If a signature had been located close to a polyadenylation signal or a poly(A) tail on mRNA sequences with known orientation, the credibility of the tag-to-gene assignment was the highest, and the signature was included. In contrast, if a signature was extracted from mRNA sequences whose transcriptional orientation, polyadenylation features, or position information was unknown, or had been found only in non-coding regions or repeated structures, it was filtered out. Such hits may have been generated due to the currently incomplete annotation of the murine genome, or due to sequencing errors in the MPSS experiments. For genes with more than one representative tag sequence, all the corresponding tag counts were summed. To calculate the average number of transcripts per cell, the number of tags per million was divided by two, assuming that the total number of transcripts per cell was 5·10^5 [22].
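The pooling and conversion described at the end of this paragraph can be sketched in a few lines. The tag sequences, gene IDs, and the helper names below are hypothetical; the division by two follows from the assumed total of 5·10^5 transcripts per cell.

```python
from collections import defaultdict

def pool_tags(tag_tpm, tag_to_gene):
    """Sum tag counts (tpm) over all tags mapped to the same gene.

    Tags without a gene assignment are dropped, mirroring the filtering
    of unreliable signatures.
    """
    gene_tpm = defaultdict(float)
    for tag, tpm in tag_tpm.items():
        gene = tag_to_gene.get(tag)
        if gene is not None:
            gene_tpm[gene] += tpm
    return dict(gene_tpm)

def tpm_to_transcripts_per_cell(tpm, total_per_cell=5e5):
    """Convert tags per million to transcripts per cell.

    With the assumed total of 5e5 transcripts per cell this equals tpm / 2.
    """
    return tpm * total_per_cell / 1e6

# Hypothetical signatures and counts, not data from the study.
tag_tpm = {"GATCAAAAAAAAAAAAA": 6000.0, "GATCCCCCCCCCCCCCC": 360.0, "GATCGGGGGGGGGGGGG": 1.0}
tag_to_gene = {"GATCAAAAAAAAAAAAA": "Mm.0001", "GATCCCCCCCCCCCCCC": "Mm.0001"}

gene_tpm = pool_tags(tag_tpm, tag_to_gene)
per_cell = {g: tpm_to_transcripts_per_cell(v) for g, v in gene_tpm.items()}
```

In this example the two tags of the hypothetical gene pool to 6,360 tpm, which converts to 3,180 transcripts per cell, the same relationship as between the MPSS tpm and per-cell ranges reported for the overlapping genes.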

SAGE
The SAGE data from a previously published study were used [29]. Generation of the SAGE transcript library and the data processing to extract tags and eliminate duplicates have been described earlier [29]. The total number of sequenced tags in the library was about 55,000. The tag counts, originally normalized to 55,000, were converted to tags per million (tpm). The tag-to-gene mapping was performed using the ACTG tool based on a SAGEmap data release with UniGene build 151 [37]. Only the tags that had a reliable tag-to-UniGene match in SAGEmap were included for further analyses. For cases where more than one tag was mapped to one gene, the tag counts were pooled in the same manner as for the MPSS data. The average number of transcripts per cell was calculated as for the MPSS data.

Quantitative real-time PCR
We used qRT-PCR to confirm the mRNA levels of 28 genes listed in Table 2. Our criteria for designing the primers included that they were intron-spanning. This was not the case for 7 of the 35 genes in Table 2, and these were therefore not analysed. qRT-PCR was performed on a Roche LightCycler 480. Mouse Universal ProbeLibrary probes and target-specific PCR primers (Additional file 2) were selected using the ProbeFinder assay design software [38]. All assays were prepared using standard conditions in a master mix solution (Roche Applied Sciences). cDNA was synthesized from 10 μg of total RNA for each sample using Roche reverse-transcriptase. The reactions were run in triplicate for each gene, using 20 μl reaction volumes and the following conditions: 95°C for 5 minutes, followed by 45 cycles of 95°C for 10 seconds, 60°C for 15 seconds, and 72°C for one second. Dilution curves were made to ensure appreciable amplification efficiency (Additional file 3). The transcript concentrations were calculated relative to the endogenous control β-actin (Actb) as $2^{-(Ct_{Gene} - Ct_{Actb})}$, where $Ct_{Gene}$ and $Ct_{Actb}$ correspond to the mean cycle thresholds for the test gene and β-actin, respectively [39].
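A minimal sketch of the relative quantification is given below. It assumes the commonly used 2^-ΔCt form with ΔCt = Ct(gene) - Ct(Actb) and an amplification efficiency of two per cycle; the cycle threshold values are hypothetical, and the function name is illustrative.

```python
from statistics import mean

def relative_expression(ct_gene, ct_actb):
    """Expression of a test gene relative to Actb from replicate Ct values.

    Assumes the 2^-(delta Ct) form, i.e. a doubling of product per cycle,
    with delta Ct taken between the mean cycle thresholds.
    """
    delta_ct = mean(ct_gene) - mean(ct_actb)
    return 2.0 ** (-delta_ct)

# Hypothetical triplicate cycle thresholds, not data from the study.
ct_test_gene = [22.0, 22.5, 21.5]
ct_reference = [18.0, 18.0, 18.0]

rel = relative_expression(ct_test_gene, ct_reference)
```

A test gene that crosses the threshold four cycles later than β-actin is thus estimated at 1/16 of the β-actin level.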

Array express accession
The raw data from the oligoarray platform have been deposited in the ArrayExpress repository (E-TABM-422).

Authors' contributions
VN, FL, MH, AF, IKG, EH, and HL conceived and designed the study. VN and FL performed the microarray experiments and matched the data from the different techniques. VN, FL and HL wrote the article. MH, AF, IKG, MAvdW, and HL contributed to the TransCount analysis. WPK, JT, LO-M, and CLC provided the SAGE, MPSS, and qRT-PCR data. All authors helped to draft the manuscript and read and approved the final version.