Microarray analysis is a powerful methodology for high-throughput gene expression studies and contributes to the understanding of complex events and biological systems. In the present paper we describe the design and benchmarking of a custom-made oligonucleotide microarray named Actichip as a tool to study the actin cytoskeleton in normal or pathological situations.
We designed, produced and evaluated Actichip using optimised and standardised experimental procedures and a data evaluation pipeline that we established according to the guidelines developed by the Microarray Gene Expression Data (MGED) Society. Actichip hybridisation signals obtained with our optimised experimental settings were of high quality (Figure 1), leading to accurate and highly reproducible quantification of gene expression levels (Table 2, Figure 2). Importantly, our data indicated that two or three replicates would be sufficient for reliable measurements when applying the standardised procedures we established. Consistent with recent studies [37–40], our results show that a thorough standardisation of the array and experiment design, protocols and data analysis procedures can greatly improve microarray data quality and comparability. This is crucial for the generation of a meaningful universal gene expression index based on the exchange and integration of data between microarray platforms and laboratories.
The reliability and sensitivity of gene expression measurements are other important issues when using microarrays. In this study, we analysed two well-contrasted RNA samples, each characterised by a specific organisation of its actin cytoskeleton and by known marker genes. Many of these genes were found to be significantly expressed using Actichip (Table 3), underlining the reliability of this array as a transcriptome analysis platform and its value for the characterisation and classification of biological samples based on their transcriptome profiles. Our data further showed that Actichip not only reliably detects qualitative gene expression changes, but also has the potential to accurately measure the amplitude of these variations (Figure 3). In addition, we determined that Actichip has the potential to identify transcripts over a biologically meaningful range including high, intermediate and rare abundance classes of RNAs.
The fraction of probes on an array that yield a significant hybridisation signal can be used as a measure of platform sensitivity. We found that the fraction of detectable genes ranged from 53 to 60 % with the Actichip, Affymetrix and Operon microarrays (Figure 5), indicating that the sensitivity of the three platforms is similar. These results are in good agreement with data from similar studies [16,41], and suggest that a significant fraction of cytoskeletal genes were not expressed, or were expressed at very low levels, in our samples, consistent with the concept that only part of the genome is usually expressed in a given differentiated cell line or tissue.
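The sensitivity measure described above can be illustrated with a minimal sketch. The function name, the example signal values and the detection threshold are all hypothetical; the criterion actually used to call a signal significant is not restated in this section, so a simple fixed cutoff is assumed here.

```python
def detected_fraction(signals, threshold):
    """Return the fraction of probes whose signal exceeds `threshold`.

    `threshold` is an assumed fixed cutoff, used only for illustration.
    """
    if not signals:
        raise ValueError("signals must not be empty")
    detected = sum(1 for s in signals if s > threshold)
    return detected / len(signals)


# Made-up background-corrected intensities for ten probes (not real data).
example_signals = [12.0, 250.0, 8.5, 430.0, 95.0, 3.0, 610.0, 40.0, 180.0, 20.0]
fraction = detected_fraction(example_signals, threshold=50.0)
print(f"Detected fraction: {fraction:.0%}")
```

Applying the same function to each platform, with a detection criterion suited to that platform, would give per-array fractions that can be compared directly.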
Comparison of the expression profiles obtained from the three platforms revealed a moderate concordance between the datasets, the best score (49 %) being observed between the Actichip and Affymetrix arrays. Nevertheless, we found good correlations between the relative expression data from the different arrays when considering the subset of concordant genes. The correlation in gene expression levels between the Actichip and Affymetrix arrays was particularly strong and was comparable to those reported in similar studies for the best-performing arrays [16,43–46]. Identifying the source of variability between the different microarray platforms was not straightforward since many factors could have influenced the expression data. Indeed, microarray platforms differ in numerous technological aspects including array format and fabrication, protocols and instrumentation, as well as computational and statistical tools. It has been shown that these differences could account, at least in part, for discrepancies in the data generated by different array technologies [33,47–50]. Although we carefully standardised our protocols, we could not avoid some differences in the procedures specific to each platform. Biases in our data may partly result from dissimilarities between the methods we used to generate and label the samples or from differences in sensitivity between the procedures we applied to acquire and analyse the data.
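A cross-platform comparison of this kind can be sketched as follows. This is only an illustration under stated assumptions: the definition of concordance (same up/down/unchanged call on both platforms), the log-ratio cutoff, the gene identifiers and the values are all hypothetical and are not taken from the study.

```python
import math


def call(log_ratio, cutoff=1.0):
    """Classify a log2 ratio as 'up', 'down' or 'unchanged' (assumed cutoff)."""
    if log_ratio >= cutoff:
        return "up"
    if log_ratio <= -cutoff:
        return "down"
    return "unchanged"


def concordant_genes(platform_a, platform_b, cutoff=1.0):
    """Return (shared genes, genes with the same call on both platforms)."""
    shared = sorted(set(platform_a) & set(platform_b))
    same = [g for g in shared
            if call(platform_a[g], cutoff) == call(platform_b[g], cutoff)]
    return shared, same


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up log2 ratios keyed by placeholder gene identifiers (not real data).
array_one = {"gene_a": 2.0, "gene_b": -1.5, "gene_c": 0.2,
             "gene_d": 1.2, "gene_e": 3.0, "gene_g": 0.1}
array_two = {"gene_a": 1.8, "gene_b": -1.1, "gene_c": 1.4,
             "gene_d": 0.3, "gene_f": 2.0, "gene_g": -0.2}

shared, same = concordant_genes(array_one, array_two)
concordance = len(same) / len(shared)
r = pearson([array_one[g] for g in same], [array_two[g] for g in same])
print(f"Concordance: {concordance:.0%}, correlation on concordant genes: {r:.2f}")
```

Restricting the correlation to concordant genes, as in the sketch, separates two questions: whether the platforms agree on which genes change, and whether they agree on the amplitude of the change once they do.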
We found that 7.0 % and 10.1 % of the Actichip targets were not represented in the Affymetrix GeneChip and Operon array, respectively [see Additional file 2]. This result is not surprising considering that the three array platforms were implemented using different databases or different releases of the same database (Table 4), which differ in transcript sequences, identifiers or annotations. However, our data call into question the reliability of the high-throughput design of pangenomic probe libraries. Focusing on a limited, easy-to-handle set of genes constitutes a more careful and robust approach. In line with this, several focused microarrays were recently described as powerful alternatives to whole-genome arrays to study complex biological systems [45,51,52].
On the other hand, many of the genes represented on Actichip are highly similar and are not easy to discriminate using long oligonucleotide microarrays. When considering the actin gene family, only very limited regions of the transcript sequences can be used for the design of probes with suitable physical properties and specificity. To design high quality probes, we developed the CADO4MI program, which allows a validation of oligonucleotides by cross-comparison of their sequences with data from several reference databases. For 219 of the 327 target genes represented on Actichip, combining information available from the UniGene and RefSeq databases allowed us to select probes with an enhanced specificity compared to those obtained using only one database. The fact that Actichip was able to differentiate the highly similar actin isoforms confirms that CADO4MI generates highly specific probes (Figure 4, Table 5). By contrast, some probes specific for the actin isoforms in the Affymetrix GeneChip and in the Operon set target regions having a high degree of similarity with several unrelated transcripts. As a consequence, these probes may generate false positive data due to cross-reactivity. This could explain the erroneous detections of some actin isoforms we observed with the Affymetrix or Operon platforms. In line with this, probe sequence alignment showed that the ACTA2 Operon probe has the potential to cross-hybridise with several transcripts [see Additional file 3]. By using the probe match tool at the NetAffx analysis center, we also found that the ACTA2 and ACTG1 probe sets from the U133A GeneChip both perfectly match the ACTA2 mRNA. However, our data showed that the specificity of a probe cannot be simply inferred from its design characteristics. Although giving false positives in our study, the ACTA1 Operon probe appeared to be specific as judged by sequence alignment [see Additional file 3], and the ACTG2 Affymetrix probe set perfectly matched the corresponding transcript sequence.
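The idea of validating a probe by cross-comparison against several reference databases can be illustrated with a minimal sketch. This is not the CADO4MI algorithm, whose internals are not described in this section: the ungapped sliding-window identity, the 80 % identity cutoff, the database names and the sequences are all assumptions made for illustration.

```python
def best_window_identity(probe, transcript):
    """Highest fraction of matching bases over all ungapped placements
    of `probe` along `transcript` (a crude stand-in for a real aligner)."""
    n = len(probe)
    if n == 0 or len(transcript) < n:
        return 0.0
    best = 0.0
    for start in range(len(transcript) - n + 1):
        window = transcript[start:start + n]
        matches = sum(1 for p, t in zip(probe, window) if p == t)
        best = max(best, matches / n)
    return best


def cross_reactive_hits(probe, target_id, databases, max_identity=0.8):
    """List non-target transcripts, from any database, that the probe
    resembles at or above `max_identity` (an assumed cutoff)."""
    hits = []
    for db_name, transcripts in databases.items():
        for transcript_id, sequence in transcripts.items():
            if transcript_id == target_id:
                continue  # the intended target is expected to match
            identity = best_window_identity(probe, sequence)
            if identity >= max_identity:
                hits.append((db_name, transcript_id, identity))
    return hits


# Toy sequences and hypothetical database names (not real transcripts).
probe = "ACGTACGTAC"
databases = {
    "db_one": {"target": "TTACGTACGTACTT", "other_1": "GGGGGGGGGGGGGG"},
    "db_two": {"other_2": "TTACGTACGTAGTT"},
}
hits = cross_reactive_hits(probe, "target", databases)
print(hits)
```

In this toy case the off-target match is found only in the second database, which mirrors the point made above that combining databases can reveal cross-reactivity missed when a single database is used. As the text notes, passing such a sequence-based check does not guarantee specificity in practice.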
It is conceivable that using the latest versions of commercial arrays, based on better-quality genome assembly and annotations or on new design concepts, may improve measurement accuracy and sensitivity. As an illustration, the GeneChip Exon array recently designed by Affymetrix, with over six million probes targeting all annotated and predicted exons in the human genome, appears to be a promising tool to investigate both gene expression and alternative splicing with a high resolution. Data from the literature show that this chip may provide more accurate gene expression measurements than traditional microarrays [53,54], but requires a more complex strategy for the analysis of expression data [53,55]. Complex and time-consuming analysis is a typical trait of high-density microarrays and often represents the bottleneck of pangenomic expression studies. In the particular context of studies focusing on a limited number of genes, thematic arrays offer the possibility to overcome these limitations.