RNA is extremely vulnerable to degradation and as such has the potential to introduce systematic bias into gene expression measures. Reliable measures of sample and data quality are therefore essential to evaluate the effects of RNA integrity on the accuracy, sensitivity and specificity of gene expression results. From previous studies as well as our own, it is now clear that the level of acceptable RNA degradation within an experiment depends largely on the experimental design, platform and application. Multiple studies have demonstrated an improvement in microarray and qRT-PCR performance by using random priming when RNA integrity is in doubt. Here we observed a direct association between RINs and array quality in the majority of cases. To gauge the consequences of using these arrays in downstream analysis, we compared quality-flagged to quality-passed arrays and found a relatively small subset of genes, 1172/20019, to be significantly affected (p ≤ 0.05, |fold change| ≥ 2) in our samples on the Gene 1.0 ST platform. The exact identity and proportion of the affected genes may of course differ between studies on Gene 1.0 ST arrays but, based on our data, we suggest that the overall proportion of affected genes is unlikely to differ substantially from that observed here. Depending on the application, this may or may not affect the study outcome. However, the most common microarray applications, such as finding differentially expressed genes between two conditions, pathway analysis, and clustering, do not rely on interrogating specific genes and appear to be largely robust to the effects of RNA degradation on this platform (Table
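As a minimal sketch of how genes "significantly affected" by array quality can be flagged under the thresholds above (p ≤ 0.05 and an absolute fold change of at least 2, i.e. |log2 fold change| ≥ 1), the following uses simulated expression values; the group sizes and effect sizes are illustrative and are not taken from the study:

```python
# Flag genes differing between quality-passed and quality-flagged arrays.
# All data are simulated; thresholds follow the text (p <= 0.05, |FC| >= 2).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_genes = 500
passed = rng.normal(8.0, 1.0, size=(n_genes, 12))   # log2, quality-passed
flagged = rng.normal(8.0, 1.0, size=(n_genes, 6))   # log2, quality-flagged
flagged[:40] += 1.5                                 # simulate a true shift

t_stat, p_val = stats.ttest_ind(passed, flagged, axis=1)
log2_fc = flagged.mean(axis=1) - passed.mean(axis=1)
affected = (p_val <= 0.05) & (np.abs(log2_fc) >= 1)  # |FC| >= 2 in log2 units
print(f"{affected.sum()}/{n_genes} genes flagged as affected")
```

In the study the analogous comparison yielded 1172/20019 affected genes; the point here is only the filtering logic, not the numbers.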
Using within- and between-array quality measures, we investigated the relationship between RNA integrity and array quality on Affymetrix Gene 1.0 ST arrays. We found a combination of within- and between-array quality measures useful for ranking samples by array quality. However, the single most useful array quality measure appears to be GNUSE, since it provides a more general measure of array quality relative to a large set of publicly available arrays. We found that 86% of samples with RINs ≤ 3.3 were flagged by at least two of our three quality control measures. One sample with a RIN score < 3 passed all three quality measures, although it did have a relatively low array quality weight. Furthermore, 10 out of 17 samples with RIN scores ≤ 7 passed at least two of the three quality measures, suggesting that the widely used RIN cutoff of 7 is too stringent for Gene 1.0 ST arrays.
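The pass/fail logic described above (a sample is flagged when at least two of the three quality measures fail) can be sketched as follows; the sample names and which measures fail are purely illustrative:

```python
# Combine three QC measures into a single flag: a sample is flagged
# when it fails at least two of the three. Names and values are made up.
gnuse_fail  = {"s1": True,  "s2": False, "s3": True}
weight_fail = {"s1": True,  "s2": False, "s3": False}
rle_fail    = {"s1": False, "s2": True,  "s3": True}

def is_flagged(sample):
    fails = sum(m[sample] for m in (gnuse_fail, weight_fail, rle_fail))
    return fails >= 2

flagged = [s for s in gnuse_fail if is_flagged(s)]
print(flagged)  # s1 and s3 each fail two measures
```

A majority-vote rule like this is less brittle than any single threshold, which is consistent with the observation that no one measure, including RIN, was sufficient on its own.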
We then examined the genes most affected by RNA degradation and demonstrated a relationship between accuracy and the length of the original transcript: longer-than-average transcripts were underrepresented and very short transcripts overrepresented in quality-flagged samples. This contrasts with the findings of Opitz et al., who found that short transcripts were more vulnerable to the perceived effects of degradation, whereas long transcripts were more stable relative to transcripts of average length. Interestingly, of the genes that were overrepresented in quality-flagged samples, 70% were small non-protein-coding RNAs, including 94 small nucleolar RNAs and 4 microRNAs, consistent with reports that microRNAs are more robust to RNA degradation than mRNAs, perhaps because they are more thermodynamically stable.
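A length analysis of this kind can be sketched by binning genes on transcript length and summarising the flagged-versus-passed log-ratio per bin. All data below are simulated, with the direction of the simulated effect chosen to mirror the text (long transcripts under-, very short transcripts over-represented); the numbers carry no biological meaning:

```python
# Relate apparent expression change in quality-flagged samples to
# transcript length, using length quartiles. Entirely simulated data.
import numpy as np

rng = np.random.default_rng(1)
length = rng.lognormal(mean=7.5, sigma=1.0, size=2000)   # transcript bp
# simulated log2(flagged / passed): positive for short, negative for long
log_ratio = -0.3 * (np.log(length) - 7.5) + rng.normal(0, 0.2, 2000)

bins = np.quantile(length, [0, 0.25, 0.5, 0.75, 1.0])
idx = np.digitize(length, bins[1:-1])                    # quartile index 0-3
for b in range(4):
    sel = idx == b
    print(f"length quartile {b}: mean log-ratio {log_ratio[sel].mean():+.2f}")
```

With this setup the shortest quartile shows a positive mean log-ratio (overrepresentation) and the longest a negative one, the pattern reported above.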
Without excluding any genes, we then compared two orthogonal approaches: excluding quality-flagged arrays, or retaining all arrays and compensating for RNA degradation at different steps in the analysis. Sample clustering showed that, after ComBat adjustment, quality-flagged samples no longer clustered together. Furthermore, samples tended to segregate more clearly by disease status following adjustment, which suggests that the algorithm is not introducing artifacts. It is worth noting that patients 13, 4 and 18 were diagnosed with a hereditary form of CRC (HNPCC); it is therefore not surprising that the ‘normal’ samples from these patients form a separate cluster.
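The core idea of a ComBat-style adjustment, removing a per-gene location effect associated with a batch (here, quality-flagged versus passed), can be illustrated as below. This is a simple mean-centring sketch, not ComBat's empirical-Bayes model, which additionally shrinks the batch parameters and can protect covariates of interest; all data are simulated:

```python
# Remove a per-gene batch location effect by re-centring each gene,
# within each batch, on the gene's overall mean. Simulated data only.
import numpy as np

rng = np.random.default_rng(2)
expr = rng.normal(8, 1, size=(100, 10))      # genes x samples
batch = np.array([0]*6 + [1]*4)              # 1 = quality-flagged
expr[:, batch == 1] += 0.8                   # simulated degradation shift

adjusted = expr.copy()
for b in np.unique(batch):
    cols = batch == b
    adjusted[:, cols] += (expr.mean(axis=1, keepdims=True)
                          - expr[:, cols].mean(axis=1, keepdims=True))

shift = abs(adjusted[:, batch == 1].mean() - adjusted[:, batch == 0].mean())
print(f"residual batch shift after adjustment: {shift:.3f}")
```

After adjustment the per-gene means of the two batches coincide, which is why, in the clustering described above, quality-flagged samples no longer group together once the quality effect has been removed.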
Irrespective of sample/array quality, applying compensatory measures for RNA degradation performed at least as well as excluding arrays that were flagged during quality assessment, as judged by gene expression analysis and IPA. At a p-value of 0.01, SVA and ComBat detected the highest numbers of differentially expressed genes between tumour and normal samples, and the top four methods applied here had 1117 differentially expressed genes in common. To evaluate the biological plausibility of the genes deemed significantly differentially expressed between tumour and normal samples, we used the IPA results to show that, in terms of the top-scoring biological functions and upstream regulators, there is considerable overlap in the identity and direction of biological activation between analysis methods that excluded and those that included quality-flagged arrays. These results suggest that our analysis strategies are biologically sound and not biased by non-biological variance.
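The cross-method comparison above amounts to intersecting the gene lists returned by each analysis strategy. The sketch below shows the operation on toy lists; the method names echo the text, but the gene identifiers are illustrative CRC-associated genes, not results from the study:

```python
# Intersect differentially-expressed gene lists from several analysis
# strategies. Method names follow the text; gene IDs are illustrative.
de_lists = {
    "exclude_flagged": {"TP53", "MYC", "APC", "KRAS"},
    "combat":          {"TP53", "MYC", "APC", "SMAD4"},
    "sva":             {"TP53", "MYC", "KRAS", "APC"},
    "array_weights":   {"TP53", "APC", "MYC", "CDKN2A"},
}
common = set.intersection(*de_lists.values())
print(sorted(common))  # genes called DE by every method
```

In the study the corresponding intersection across the top four methods contained 1117 genes, which is the overlap argued above to speak against a simple expansion of false positives.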
The relevance of each method will depend on the downstream application and the proportion of quality-flagged arrays: if only a small percentage of arrays is flagged, there may be little benefit in including them in downstream analysis. However, if a large proportion of the arrays is affected by RNA quality, as is often the case when the RNA is derived from irreplaceable historical clinical samples, the ability to retain all arrays and to account for these effects in the analysis will be valuable. Here, ComBat may be useful if direct data adjustment is required, e.g. for sample/gene clustering. On the other hand, for analysis of differential expression, especially when the source of non-biological variance is not immediately apparent, SVA may be most useful since it does not require supervision; notably, in our hands SVA identified two surrogate variables that closely corresponded to “batch” and “quality” factors, judging by the grouping of samples. Establishing whether the measures used here to compensate for quality effects are superior to excluding the affected arrays will require a controlled study with known true and false positives, in which the discriminatory power of each method can be objectively investigated. However, the significant overlap between the differentially expressed genes identified by the different approaches used here, combined with the considerable overlap in both the biological functions and the upstream regulators identified by pathway analysis of the resultant data, argues against a simple expansion of false positives when lower-quality array data are included in the analyses. The quality assessment and data analysis methods discussed here should in principle be equally applicable to Affymetrix Exon ST arrays.
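The unsupervised behaviour of SVA described above rests on a simple principle: regress the known variable of interest out of the expression matrix, then take leading singular vectors of the residuals as candidate surrogate variables that may capture unmodelled factors such as batch or RNA quality. The sketch below illustrates that principle on simulated data; it is not the full SVA algorithm, which also estimates how many factors to keep and their significance:

```python
# Recover a hidden "quality" factor from residuals after removing the
# known tumour/normal effect. Simulated data; illustrates SVA's core idea.
import numpy as np

rng = np.random.default_rng(3)
n_genes, n_samples = 200, 12
group = np.array([0]*6 + [1]*6, dtype=float)     # known: tumour vs normal
quality = np.array([0.0, 1.0] * 6)               # hidden quality factor

expr = rng.normal(0, 1, size=(n_genes, n_samples))
expr += np.outer(rng.normal(0, 1, n_genes), group)     # biological signal
expr += np.outer(rng.normal(0, 2, n_genes), quality)   # hidden quality effect

# residuals after projecting out the known design (intercept + group)
X = np.column_stack([np.ones(n_samples), group])
beta, *_ = np.linalg.lstsq(X, expr.T, rcond=None)
resid = expr.T - X @ beta                        # samples x genes

# leading left-singular vector over samples = candidate surrogate variable
u, s, vt = np.linalg.svd(resid, full_matrices=False)
sv1 = u[:, 0]
corr = abs(np.corrcoef(sv1, quality)[0, 1])
print(f"|corr(SV1, hidden quality factor)| = {corr:.2f}")
```

Because the simulated quality effect dominates the residual variance, the first surrogate variable tracks the hidden factor closely, mirroring the observation that SVA's surrogate variables corresponded to batch and quality groupings in our samples.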