Estimating RNA-quality using GeneChip microarrays
© Fasold and Binder; licensee BioMed Central Ltd. 2012
Received: 24 November 2011
Accepted: 13 April 2012
Published: 14 May 2012
Skip to main content
© Fasold and Binder; licensee BioMed Central Ltd. 2012
Received: 24 November 2011
Accepted: 13 April 2012
Published: 14 May 2012
Microarrays are a powerful tool for transcriptome analysis. Best results are obtained using high-quality RNA samples for preparation and hybridization. Issues with RNA integrity can lead to low data quality and failure of the microarray experiment.
Microarray intensity data contains information to estimate the RNA quality of the sample. We here study the interplay of the characteristics of RNA surface hybridization with the effects of partly truncated transcripts on probe intensity. The 3′/5′ intensity gradient, the basis of microarray RNA quality measures, is shown to depend on the degree of competitive binding of specific and of non-specific targets to a particular probe, on the degree of saturation of the probes with bound transcripts and on the distance of the probe from the 3′-end of the transcript. Increasing degrees of non-specific hybridization or of saturation reduce the 3′/5′ intensity gradient and if not taken into account, this leads to biased results in common quality measures for GeneChip arrays such as affyslope or the control probe intensity ratio. We also found that short probe sets near the 3′-end of the transcripts are prone to non-specific hybridization presumable because of inaccurate positional assignment and the existence of transcript isoforms with variable 3′ UTRs. Poor RNA quality is associated with a decreased amount of RNA material hybridized on the array paralleled by a decreased total signal level. Additionally, it causes a gene-specific loss of signal due to the positional bias of transcript abundance which requires an individual, gene-specific correction. We propose a new RNA quality measure that considers the hybridization mode. Graphical characteristics are introduced allowing assessment of RNA quality of each single array (‘tongs plot’ and ‘degradation hook’). Furthermore, we suggest a method to correct for effects of RNA degradation on microarray intensities.
The presented RNA degradation measure has best correlation with the independent RNA integrity measure RIN, and therefore presents itself as a valuable tool for quality control and even for the study of RNA degradation. When RNA degradation effects are detected in microarray experiments, a correction of the induced bias in probe intensities is advised.
Measurement of gene expression is based on the assumption that an analyzed RNA sample closely represents the amount of transcripts in vivo. Several effects can distort the abundance of RNA transcripts during extraction and preparation before RNA analytics using, e.g., microarrays: The first problem concerns the degradation of the RNA in vitro [1–4]: The quality of purified RNA is variable and after the extraction during storage rather unstable (see  and references cited therein). Especially long mRNA fragments up to 10 kb are very sensitive to degradation through cleavage of RNAses introduced by handling with RNA samples. Moreover, transcripts show stability differences of up to two orders of magnitude in vivo, raising the possibility that partial degradation during cell lysis could cause a variable extent of bias in quantification of different transcripts . The second problem concerns amplification of RNA in samples analyzed on microarrays giving rise to the decrease in the length of products that are reverse transcribed and amplified using T7 polymerase [7, 8]. The multiple rounds of in vitro transcription that are used to generate samples from small amounts of RNA thus induce a decrease in transcript yield and length.
The screening of nearly three thousand public available GeneChip array data suggests that there is noticeable degradation effect in the majority data files and that 2% of the files were even so severely degraded that their worth was questionable . Working with low-quality RNA may strongly compromise the experimental results and lead to erroneous biological conclusions. It is therefore recommended that the highest quality RNA be used for analyses. However, in some cases, such as human autopsy samples or paraffin embedded tissues, high quality RNA samples may not be available [10–12]. It is therefore important to understand how RNA quality affects the interpretation of the results and also how reliable current quality measures are at indicating RNA quality issues. The assessment of RNA integrity is a critical first step in obtaining meaningful gene expression data. A second step comprises developing methods to quantify degradation and, most importantly, to correct the induced degradation bias in the data and thereby provide more coherent expression measures.
Several RNA quality measures are established based on conventional wet lab techniques such as gel optical density measurement or denaturating agarose gel-electrophoresis (see refs. [2, 5] for a review). More novel lab-on-chip gel electrophoresis techniques like Agilents Bioanalyzer became now state of the art. In combination with sophisticated analysis algorithms processing the shape of the electropherogram (and, particularly, the 28 S/18 S rRNA ratio) they provide accepted integrity measures such as the DegFac-RQS (degradation factor RNA quality scale)  or the RIN (RNA integrity number)  which have been validated independently using qRT-PCR .
Importantly, microarray intensity data itself contains information about the RNA quality used for hybridization due to the 3′/5′-gradient of transcript abundance . On microarrays of the GeneChip-type this gradient is typically measured using either specially-designed control probes or exploiting the specifics of the Affymetrix probe design which is based on a set of about one dozen, surface-attached 25-mers interrogating different positions along each transcript. Both options estimate transcript abundance at close and more distant positions towards the 3′-end based on the hybridization signal [15, 16].
Although proven in many applications, these measures are based on probe intensities which, in general, are non-linear functions of transcript abundance [17–20]. The signals can be strongly distorted by effects not related to transcript concentration such as saturation and non-specific background hybridization. Intensity-based RNA quality measures are therefore potentially prone to systematic errors which, in worst case, can provide diametrically opposed information in assessing apparently good RNA quality in samples with largely degraded RNA (see below). Moreover, the important task of correcting microarray signals for RNA degradation effects remained unsolved at least in single chip applications to our best knowledge. A linear correction model requiring both expression and RNA quality data from a series of arrays has recently been published .
This publication addresses the following tasks to overcome these problems: Firstly, we adapt non-linear hybridization theory based on a physico-chemical model of probe/target binding to the special case of truncated transcripts due to RNA degradation. We will show that our approach consistently explains previous observations such as the effect of RNA quality on transcript intensity level  and correlations between probe intensity and probe position along the transcripts and their effect on expression measures . Analysis of the probe signals in terms of this model enables us to define unbiased (in the frame of the hybridization model used) measures of RNA integrity. Secondly, we compare these new measures with established ones. We demonstrate that methods such as affyslope or the RNA-integrity control probes can provide systematically false information on RNA quality. Thirdly, we propose a simple correction method which aims at removing the degradation bias from the probe intensities and which can be integrated into standard preprocessing pipelines. Details of the methods used are summarized in the Methods section at the end of the paper or shifted into the supplementary text provided as Additional file 1.
Affymetrix expression microarrays typically use a 3′-biased probe location which is motivated by the specifics of target preparation prior to hybridization. The preparation step applies in vitro transcription (IVT) protocols according to the Eberwein method . It starts with first-strand cDNA synthesis from source mRNA using T7 oligo(dA) primers followed by second strand cDNA synthesis [7, 8]. The double-stranded cDNA fragments are subsequently transcribed into amplified antisense RNA (aRNA) which, after labeling, is finally hybridized on the arrays.
Importantly, the decaying coverage of the probes is expected to apply to specific (S) but not to non-specific (N) hybridization. In the S-hybridization mode the probes bind the aRNA fragments of complementary sequence transcribed from mRNA transcripts which they intend to detect. In the N-hybridization mode the probes bind aRNA fragments of partly complementary sequence originating however from mRNA transcripts not referring to the interrogated gene. Trivially, these non-specific transcripts lack a common start position with respect to the intended target and, as a consequence, they, on the average, uniformly cover the probes of each probe set (see Figure 1a). Specific hybridization competes with non-specific one. Both hybridization modes contribute to the measured probe intensities. The consequences of different probe coverages for the measured signal will be discussed below.
Also degradation of mRNA, e.g. upon storage, can produce 3′-biased probe coverages of fragmented aRNA by endonuclease activity that cuts RNA internally, or by means of exonucleases . In the first case, the poly(A) tail is removed by a deadenylase activity, followed by two mechanisms that degrade the mRNA: either decapping followed by a 5′-to-3′ decay or a 3′-to-5′ decay. Once the mRNA poly(A) tail is removed, reverse transcription reaction will not proceed, resulting in low concentrations of truncated transcripts (see Figure 1b). Several studies have identified RNA degradation to be a major cause of microarray expression measure variability [4, 6, 9–12].
With only 25 base-pairs, Affymetrix has considerably shorter probe sequence lengths than competing manufacturers of high-density microarrays such as Illumina or Agilent, who employ probe lengths between 50 and 80 bp. Small probe lengths reduce the specificity for probe/target binding with negative consequences for the signal to noise ratio of the probe intensities. Affymetrix compensates the decrease in single probe signal quality by targeting each transcript with several probes forming a so-called probe set. We here focus on Affymetrix 3′-expression arrays which use typically 11 to 16 probes per set to interrogate transcripts preferentially near their 3′-end (see Figure 1 for illustration).
<… > array denotes averaging over all probes of the array. The approximation in the right part assumes a vanishing intercept in good agreement with the data (see Figure 2c). Additional file 1 provides an overview over selected probe design characteristics of different GeneChip types. It shows that the mean position of the first and of the last probe in the probe sets can strongly vary between the different chip types giving rise to a wide range of < ΔL > −values which can change between about 25 and about 60 nucleotides per index increment. These differences refer in first instance to arrays of older and newer generations (e.g., the human genome HGU95a and HG133a arrays and the mouse genome MG74a and MOE430a arrays, respectively). On the other hand, the average span covered by the probe sets is relatively constant for all chip types considered.
Affymetrix GeneChip arrays include a small number of control probe sets designed to estimate the RNA quality in terms of the 3′/5′ bias. They target the 3′ end, the 5′-end and the middle (m) of relatively long transcripts coding, e.g., beta-actin and glyceraldehyde-3-phosphate dehydrogenase (GADPH) using 20 probes per set. Figure 2a and b shows that the 3′- and the 5′ probe sets of the controls together cover the range of about 700 nucleotides between L20 = 281 and L20 = 378 for the 3′-probe sets and L20 = 942 and L20 = 1104 for 5′-probesets of GADPD and beta-actin, respectively.
respectively, which characterize the decrease of the transcript concentration due to the degradation effect. [Sg]target is the (true) expression degree of a selected gene g given as the total concentration of the target transcripts in the hybridization solution independent of their length. It refers to the target concentration in the absence of degradation and presumes that RNA processing proceeds without 3′/5′ bias. Contrarily, [Sg,k] denotes the apparent expression degree reported by probe with index k = 1,…,Npset designed to interrogate target g. It is given as the concentration of the RNA fragments which specifically bind to this probe. It consequently refers to the probe coverage which decays with increasing distance of the probe to the 3′-end of the target. Angular brackets < … > all probes denote averaging over all probes of the array. One expects [S]g,k ≤ [Sg]target and thus dg,k ≤ 1 owing to the 3′-enrichment after incomplete amplification and degradation of the fragments. The probe specific degradation index, dg,k, thus characterizes the loss of mRNA material at a given probe position along the transcribed region of the gene. The mean degradation index d averages the single probe effects over all probes. It estimates the total loss of RNA probed by the microarray in a given preparation prior to hybridization.
respectively. Kp P,S and Kp P,N are the respective equilibrium constants for target/probe binding. The degradation factors in Eq. (4) consider the reduction of the concentrations [Sg target and [N]chip after incomplete amplification and/or degradation (see Eq. (2)). Non-specific hybridization is related to the total amount of RNA used for hybridization . [N]chip is consequently reduced by a factor given by the mean degradation factor d.
respectively. The lower case x defines the hybridization strengths at ‘ideal’ transcript concentrations (see Eq. (4) with d = dp = 1: and denotes the ‘true’ relative degradation index between 5′ and 3′ probes, respectively. Eq. (6) shows that the apparent degradation index is proportional to the true one in the special situation of dominating specific hybridization (xS> > xN) far from saturation only. It however scales with the ratio of the specific binding and washing constants of the 3′- and 5′-probes, which might be larger or smaller than unity depending on the sequences of the particular probes (see  for details). At dominating non-specific binding or saturation one gets apparent degradation indices which are completely independent of the true one. Their values again depend on the probe sequences and can be larger or smaller than unity. Hence, the use of intensity-based degradation metrics raises problems because they reflect the degradation bias of transcript abundance in special situations only.
On the other hand, two intensity-based degradation measures are well established for quality control of GeneChip arrays: (i) The slope of a linear function fitted to the so-called ‘RNA degradation plot’, r5’/3’ slope. This RNA degradation plot displays the mean logged intensity averaged over all probes with the same index k, taken from one array, as a function of k . (ii) The intensity ratio r5′/3′ control of special control probe sets targeting the 5′- and the 3′-end of relatively long transcripts such as beta-actin and GADPH. A threshold of the 3′/5′-signal intensity ratio of the GADPH controls less than 3 (in logarithmic scale log10 3 = 0.48) is recommended for good quality RNA [31, 32].
In view of the discussed problems of intensity-based degradation measures we will revise these estimates and judge their suitability for determining RNA quality. Large values of r5′/3′ slope and/or r5′/3′ control near unity are generally thought to indicate small degradation bias and thus good RNA quality. Note that reciprocal values of these measures are often used in practice estimating the respective 3′/5′-ratios. Here we consequently use 5′/3′-ratios to ensure direct comparability between the various measures.
In summary, probes located nearer to the 3′-end of the interrogated transcripts potentially shine brighter than more distant probes due to the 3′-enrichment of probe coverage giving rise to expected ‘true’ intensity ratios r5′/3′ < 1. However, this rule applies only to conditions of specific hybridization far from saturation. RNA quality measures based on the 5′/3′-intensity ratio consequently require consideration and evaluation of the hybridization mode of the chosen probes. Moreover, the potential dependence of the probe intensities on the degree of degradation gives rise to systematic errors of the estimated expression degree of the transcripts which requires appropriate correction.
as a function of sigma, . These plots use the same abscissa as the hook curve and they also smooth the noisy data using a running window of 500–1000 probes. The subscript s = s3′, s5′ denotes a subset of three consecutive probes within the probe set of size Npset nearest to (s3′, i = 1) or most distant from (s5′, i = Npset-2) the 3′-end of the transcript, or centred about its middle probe (s = m).
Figure 3 shows the hook curves for two selected microarray hybridizations of differently degraded RNA together with the respective degradation hook (panel above) and tongs plot (panel below). The degradation hook shows essentially the same shape as the standard hook. The curves reflect however different effects: The standard hook plots the mean logged intensity difference between paired PM and MM probes. It consequently estimates the intensity penalty of one mismatched base pairing in the respective probe/target-duplexes. Contrarily, the degradation hook judges the logged mean intensity difference between probes located nearer and farther to the 3′end of the transcripts, and thus the 3′/5′-bias of the probe intensities in the probe sets due to the degradation effect.
Interestingly, the different hybridization modes analogously affect the intensity differences in the standard and the degradation hook as well. For example, upon non-specific hybridization both, the PM/MM difference and the 3′-bias essentially disappear because both effects, the MM-penalty and the 3′/5′-bias, require duplexing of the probes with the intended targets. Non-specific binding doesn’t meet this criterion because the binding of non-specific transcripts is indifferent with respect to the mismatched pairing of the middle base of the MM probes and with respect to the degradation bias as well (see also Figure 1 for illustration). Vice versa, both hook-versions show their maximum in the S-range because specific binding is associated with the intended intensity penalty of the MM-probes and of probes located more distant from the 3′-end, respectively.
Note that the two standard hook plots shown in panel a and b of Figure 3 are of virtually equal height owing to the similar MM-penalty (α = 0.83 – 0.85) whereas the respective degradation hooks markedly differ in this respect (Δγ3′/5′ = 0.57 and 0.29, respectively) revealing marked differences in the degradation level between both samples. Comparison of the heights of both hook-types shows that strong degradation can affect the probe intensities nearly by the same order of magnitude as one mismatched base pairing.
The tongs plots explicitly estimate the intensity bias at three positions of the probe sets and thus it illustrates the progression of degradation with increasing probe index. The ΔΣs curves of all three subsets (s = 3′, 5′ and m) degenerate in the N-hybridzation range indicating the absence of the 3′-bias for non-specific binding as discussed above (see also Eq. (6) for xS < <xN). In the mix-range the ΔΣs-curves split into three branches which progressively diverge with increasing sigma and thus with increasing contribution of specific hybridization. The ‘opening of the tongs’, i.e. the split between the 3′- and 5′-branches, reaches its maximum in the S-range of hybridization in parallel with the maximum of the hook curve and of the degradation hook. Subsequently, the different branches start to converge as predicted for the range of saturation (see Eq. (6)). Both, the experimental degradation hook and the tongs plot are well described by theoretical curves based on the Langmuir-model of array hybridization (see Materials and Methods section, Eq. (21)). The split parameter Δγ3′/5′ characterizes the height of the degradation hook, or equivalently, the ‘tongs opening’ serving as a measure of the maximum vertical difference between the 5′- and the 3′-branches of the tongs, respectively. Δγ3′/5′ estimates the 5′-depletion of probe coverage in terms of the logged concentration increment between the targets covering the 5′- and 3′-probes (Eq. (24)). The examples shown in Figure 3a and b refer to relatively strong and weak depletion of targets with 5′/3′-concentration ratios of dtongs = 10-0.57 = 0.27 and 10-0.29 = 0.51, respectively (see Eq. (24)). This analysis shows that degradation can reduce the transcript concentration to less than one third of the initial transcript abundance.
In summary, the 3′/5′-bias of probe intensities essentially disappears for probes which hybridize predominantly non-specifically and it markedly decreases for probes which are strongly saturated with specific transcripts. The 3′/5′-bias consequently provides a suited metrics for RNA-quality only in the linear range of specific hybridization in agreement with the theoretical predictions made in the previous subsection.
In the next step we selected the probe sets from the non-specific and specific hybridization ranges of the hook curve and calculated their frequency histograms as a function of L1 and Lmax, the position of the nearest and of the most distant probe from the 3′-end in each probe set.
We suspect that the increased fraction of non specific hybridization towards the 3′ end of the transcripts is caused by inaccurate assignment of the 3′-transcript end upon probe design and/or by variations of the 3′-end of the transcripts, e.g. due to effects such as alternative polyadenylation as discussed previously [35, 36]. Alternative polyadenylation leads to transcript isoforms with differences in the 3′ UTR length. In these situations the ‘true’ 3′-end of the transcript can be located at L3′ > 0 and all probes at positions closer to the apparent transcript end, L3′ > L > 1, will hybridize exclusively non-specifically owing to the absence of specific transcripts. In consequence, the mean fraction of non-specific hybridization of probes at small L will exceed that of specific hybridization on relative scale, as observed. A very similar plot as shown in Figure 5 for the rat genome array RG230A was obtained for alternative array types such the Human Genome U133 Plus 2.0 (see Additional file 1).
An alternative option that potentially explains the increase of the relative contribution of N-hybridization near the 3′-end can be sought in depletion of the respective targets in solution due to the high number of probes with partly overlapping probe sequences in this L-range. In consequence, a larger number of probes can be thought to compete for each transcript-fragment than at larger distances L. This competition for the same target can deplete its concentration in solution. Such target depletion effectively reduces the binding affinity of the respective probes for specific binding . In consequence this change can increase the relative contribution of non-specific hybridization as observed in this L-range. On the other hand, it has been shown that depletion is clearly governed by the binding affinity of the probes which exponentially affects the abundance of targets whereas the accumulation of partly overlapping probes near the 3′-end can be assumed to affect target concentrations in a linear and thus much weaker fashion. We therefore suspect that target depletion is, if at all, of secondary importance for explaining the high relative contribution of non-specific hybridization at small L.
In summary, we found a biased distribution of specific and non-specific hybridization along the targeted transcripts: Non-specific binding is more heavily weighted near the 3′-end, presumably owing to its inaccurate assignment and to transcript isoforms with variable 3′ UTR lengths.
respectively. The angular brackets denote averaging either over all probes with the same index k or over all probes with the same absolute position within a window L-δL < Lp ≤ L + δL.
This approach attributes the flattening of the decay near the 3′-end to the saturation of the probes with bound transcripts. However this effect becomes relevant usually at large intensity values only (log I(3′)~ log M > 4; see Figure 3). The observed mean initial intensity values of the decays are however much smaller (log I(3′) ~ 3). We conclude that another effect and not saturation causes the flattening of the decays at small L-values. The decrease of the relative contribution of specific hybridization near the 3′-end discussed in the previous subsection well explains the observed trend: Non-specific hybridization still adds a small residual contribution to the specific decays due to imperfect decomposition of the different hybridization modes. The decrease of the contribution of specific binding presumably due to inaccurate assignment and transcript isoforms then effectively increases the relative weight of non-specific binding and adds a constant component to the decays at small distances from the 3′end which in consequence flattens the initial decay.
Panel b and c of Figure 6 show selected examples taken from the rat-QC and the RNeasy cleanup data sets which refer to different array types (RAE 230A and HG-U133A, respectively). With decreasing RNA quality the decays become steeper paralleled by increasing absolute values of the limiting intensity levels but almost constant initial shift parameters L0 ≈ 150 – 200 and k0 = 1- 2. Index- and nucleotide-based length scales give rise to similar trends (compare the right and the left parts in Figure 6b and c). The L-scale in units of nucleotides is associated with a slightly more flat and smaller asymptotic level than the relative k-scale using the probe indices as argument. Note that about 95% of the probes of the arrays are positioned with similar frequencies in the range 100 < L < 600 whereas only less than 5% of them are found at larger distances, however with a broad distribution over the range 600 < L <2800 (Figure 2). Most of the more distant probes refer to the probe indices k = 10, 11. This assignment effectively compresses the asymptotic region to the last two probes with indices k = 10 and 11. As a consequence the decays in relative k-scale can be described with sufficient accuracy using a ‘single exponential’ decays (Eq. (10) with d∞ k = 0) where the values d(10) and d(11) roughly refer to the limiting decay level obtained in the fits using the L-scale, d∞ L.
The L-decays of specifically hybridized probes obviously behave differently for L > 600 showing a less pronounced loss of intensity than for L < 600. The origin of this difference is unknown. The standard error of the experimental decays roughly agrees with the symbol size (k-dependencies, left part of Figure 6b and c) or it slightly exceeds line thickness (L-dependencies, right part of Figure 6b and c). The small oscillations in the decays and the relative increase at L > 600 thus reflect systematic effects presumably due to differences of the probe properties in the different subensembles of probes referring to each data point such as their binding affinity and also their degradation degree. Recall that the number of probes drastically decreases at L > 600 which makes this range less relevant for correcting purposes of the majority of probes. We exclude this range therefore from curve fitting.
Our fits show that the values of the decay parameters systematically depend on the chosen decay function and strongly correlate each with another. To illustrate this correlation we show fits with variable d∞ L but constant λL = 150 in Figure 6b (right part) and fits with constant d∞ L = 0 but variable λL in Figure 6c (right part). The values of the variable parameters d∞ L and λL systematically decrease with progressive degradation. Both options equally well describe the decaying part of d(L) in the range 100 < L < 600.
This logged degradation ratio characterizes the intensity decay in the index-range kstart- kend =2-10, or equivalently, in the positional range Lstart-Lend = <L > 1,2 - < L > 10,11 ~ 150–550 which comprises the majority of more than 95% of all probes. The degradation ratio can be transformed into estimates of the decay length of the exponential decays: and .
The degradation ratio (Eq. (12)) consequently estimates the mean fold change of transcript abundance reported by the probes positioned near the 5′- and 3′-ends of the probed range. It represents an alternative estimate of the tongs opening parameter introduced above, (Eq. (22)). Figure 4c shows the time course of RNA degradation in the Rat QC experiment using the tongs opening (panel c) and the dk (panel d) parameters. Both measures strongly correlate (see insertion in Figure 4d) and essentially reflect the same degradation behaviour of the samples studied.
In summary, the effect of degradation can be described as a function of the probe position in terms of a ‘shifted exponential decay plus constant’-function using either the probe index or the ‘absolute’ probe position as argument. This information can be further condensed into a single degradation ratio parameter characterizing the fold change of transcript abundance over the length of the DNA region interrogated by the probes.
It was previously shown that the 3′/5′ intensity ratios of special control probe sets interrogating long transcripts such as GADPH and beta-actin might not represent a sufficient measure of the degradation bias at small expression degrees because non-specific binding leads to an underestimation of the 3′/5′-bias . Here we show that the controls are often prone to saturation which also leads to the systematic underestimation of the 3′/5′-bias (see also Eq. (6)).
To judge the hybridization mode we also depict the sigma coordinates of the non-specific background intensity (N, red dots) and of the asymptotic saturation level (as, black dots) obtained from the standard hook analysis of each of the arrays. Recall that the sigma-values of the N- and the as-mode limit the range of possible probe intensities. They consequently constitute an intrinsic metric allowing to assign the probes to one of the five possible hybridization modes as indicated in the figure (see also Figure 3 and ref. [33, 34]). It turned out, that small values of the degradation index (Δ3′/5′ control < 0.2) are often associated with sigma-values near the asymptotic saturation limit of the intensities especially in the tissue data set. We argued above that intensity-based degradation measures are not suited to exactly estimate the degradation level in the saturation limit of the probes. In best case they underestimate the true degradation level; in worst case the intensity ratios become meaningless.
It has been recommended that good-quality samples should have a 3′/5′ signal ratio for GADPH and beta-actin of no more than three, or in our notation of Δ3′/5′ control < log(3) = 0.47 . We display this threshold as the vertical orange line in both parts of Figure 7. It consequently divides the data points of each data set into (apparently) bad and good ones for Δ3′/5′ control > threshold and Δ3′/5′ control < threshold, respectively.
The 3′/5′-intensity ratio of the probe sets is however not a constant for a given RNA-quality level. Instead it depends on the hybridization mode (see above and Eq. (6)). Particularly, the 3′/5′-intensity ratio referring to a constant RNA-quality level follows the degradation hook shown in Figure 3: It is maximal in the S-hybridization range and vanishes near the N- and as-ranges of hybridization. We plot representative degradation hook curves in Figure 7 (see the orange curves; note that the x- and y-axes are exchanged in comparison with Figure 3) which are calculated using the threshold value of good RNA-quality (Δγ3′/5′ = 0.47, see Eqs. (14) and (24)) the mean sigma-levels in the N- and as-ranges of the respective data sets. Hence, the degradation hook illustrates that the threshold value of the 3′/5′-intensity ratio strongly decreases in the mix- and sat-ranges due to the progressive effects of non-specific hybridization and of saturation, respectively. It consequently defines a variable, sigma-dependent threshold-curve which allows to differentiate between bad and good RNA quality data independent of the particular hybridization mode of the respective probes. In other words, it is more appropriate to apply this variable 3′/5′-‘threshold hook’ for quality assessment beyond the linear hybridization range instead of using a constant threshold value of the 3′/5′-intensity ratio.
For example, a large fraction of the GADPH- and beta-actin intensity ratios of the tissue data set meet the constant quality criterion, Δ3′/5′ control < threshold = 0.47, indicating apparently good RNA quality (Figure 7, right part). Consideration of the hybridization-dependent ‘threshold hook’ divides this region further into true positive estimates (Δ3′/5′ control < hookthreshold) and false positives (hookthreshold < Δ3′/5′ control < threshold), where the latter data are located between the curved and linear thresholds as shown in Figure 7. We estimated a positive predictive value for GADPH controls of about 0.48 which reflects overestimation of RNA-quality for about 50% of all 677 arrays of the tissue data set (see Methods section). Note also that strong saturation of the probes can completely prevent detection of poor RNA-quality samples because the respective intensity ratio levels off to Δ3′/5′ control = 0.
The mean sigma coordinates (Σ3′+5′ control) of the Rat-QC data set are found approximately halfway between the respective N- and as- levels indicating that the controls are predominantly hybridized in the S-range (Figure 7, left part). Application of a constant quality threshold seems appropriate for this data.
The sigma (i.e. logged mean intensity values) values of both data sets studied clearly indicate the decrease of the mean intensity of the controls with decreasing RNA quality due to the loss of material assumed, e.g. in Eq. (13). In consequence, the hybridization regime of the controls can shift with changing RNA-quality. Note also that GADPH is associated with slightly larger probe signals than beta-actin in both data sets. Beta-actin controls are consequently less prone to saturation than GADPH controls.
In summary, control probes can overestimate RNA-quality if one uses a constant threshold criterion because the true threshold level strongly decays for saturated probes. The problem can be fixed either by using an intensity dependent ‘threshold’ hook or by using alternative RNA-quality estimates such as the degradation ratio (see above).
A widely applied metric for RNA quality is the ‘RNA degradation plot’ provided with the R package affy[15, 16]. The RNA degradation plot displays the mean log intensity averaged over all probes with the same index k of one microarray as a function of the probe index, k = 1,…,Npset. The slope of the regression line then provides a summary measure to characterize the mean degree of RNA-degradation in a chip-specific fashion. Note that the affy-slope parameter originally does not intend to serve as an absolute RNA quality measure per se but instead, represents a relative measure for comparing RNA quality between different chips in a particular series of measurements.
However, the affy-slope degradation plot is virtually identical with the reciprocal positional dependent degradation index introduced above in Eq. (9) (dh(k)-1) except the fact that it considers all probes of the array whereas our approach separately averages over the N- and S-subensembles referring either to the S- or N-hybridization regimes, respectively. The affy-slope estimates are expected to underestimate the degradation level owing to the inclusion of predominantly non-specifically hybridized probes (so-called absent probes) which do not respond to RNA quality as shown above. More importantly, the chip-to-chip variability of the fraction of absent probes (%N; as determined by methods such as MAS5 or hook; see ref.  for comparison) is expected to affect the affy-slope measures by factors which are not or only weakly related to RNA quality.
To generalize this result we studied the mean 5′/3′-intensity ratios of different subsets of probes taken from the 677 array hybridizations of the human tissue data set ( Additional file 1). We found that the affyslope-measure systematically increases with the amount of specific hybridization and with decreasing amount of absent probes per array. The more detailed evaluation of the functional relation between the ‘true’ degradation index estimated in the S-hybridization regime and the apparent one obtained from all probe sets predicts a linear dependence between the apparent and the true degradation ratios where the slope is expected to decrease with increasing %N ( Additional file 1). Theory also predicts that the apparent 5′/3′-ratio directly varies with the fraction of specifically hybridized probe sets in agreement with the tissue data set.
Hence, the apparent degradation ratio derived from the simple affy-slope intensity measures is strongly modulated by the fraction of non-specifically hybridized ‘absent’ probes leading potentially to the systematic overestimation of RNA quality. Contrarily, the proposed use of specifically hybridized probes largely removes this bias from the data and provides a reliable measure of the degradation degree which can be consistently compared between different arrays.
In panel b of Figure 9 we re-plot the dk degradation parameter as a function of the mean transcript length measured independently using the Bioanalyzer . The two branches of the dk -vs-RIN plot merge into one within the error limits. This result confirms that our microarray-based degradation measure more directly relates to the mean transcript length and thus to the state of the mRNAs that the microarray experiment intends to quantify. The RIN however represents an alternative integrity measure capturing a series of electropherogram features that are indicative also for additional properties of the RNA solution such as the ratio of larger to smaller molecules and how far the degradation process has proceeded . The correlation of the 5′/3′ ratios of the degradation controls with the RIN-numbers reveals subtle differences compared with the behavior of the dk parameter (Figure 9c). Particularly, the hybridization control-measures taken from the ‘fresh’ samples are virtually independent of degradation at RIN < 7 whereas for RIN > 7 both treatments give rise to similar behavior of the controls. Recall that the GADPH and beta-actin controls cover a slightly wider range of the transcripts (from about 200 to about 1000 nt, see Figure 2b) than the dk degradation ratio which probes the range from about 100 to about 550, on the average (see Figure 2c). This difference presumably explains the smaller values of the control ratios at larger expression degrees. More importantly, the dk parameter is calculated as the average over a large number of probes. Presumably both types of parameters respond differently to changes of the length distribution of the transcripts due to degradation. Below we address this issue more in detail.
The regression line between the dk parameter and the 5′/3′-intensity ratio of the controls allows to transform the quality threshold of the latter ratio into a dk threshold (Figure 9d, see orange lines). We replot these thresholds into panel a and c of Figure 9: The RIN threshold is clearly more restrictive assigning more samples to bad RNA-quality than the threshold of the microarray-based control probes.
We so far estimated RNA quality in relative units using suited 5′/3′-intensity metrics which reflect the decrease of transcript abundance with increasing distance from the 3′-end. Trivially, this effect is expected to reduce the total amount of mRNA used for hybridization. The decrease of the mean intensity of the control probe sets with increasing degradation ratio as shown in Figure 7 confirms the decrease of the total amount of the respective specific transcripts with progressive degradation. Figure 7 also shows the non-specific intensity level of each of the arrays studied (red dots), which tends to decrease with increasing degradation.
The degree of specific binding drops upon degradation, however to a considerable smaller degree than the amount of non-specific binding (Figure 10b). This discrepancy surprises because naively one expects that the loss of material similarly affects specific and non-specific binding on the average, i.e. dN ≈ dS. The mean hybridization levels of specific and non-specific binding are however directly related also to the respective mean binding constants, <KP,S > and < KN>, respectively (Eq (4)). We have previously shown, that the decrease of RNA-material used for hybridization increases the specific binding constant due to weaker bulk hybridization and vice versa . In consequence, this so-called up-down effect will partly compensate the decrease of the concentration of specific transcripts giving rise to the smaller decrease of the specific hybridization strength upon RNA degradation.
Part c of Figure 10 depicts the percentage of absent probes detected on each of the arrays. It remains essentially unaffected by RNA degradation. This result shows that the loss of material does not affect the detection threshold of the array experiment for specific binding.
It has been previously found that, although moderate levels of RNA degradation are tolerated by differential expression analysis, beyond a threshold especially long targets provide erroneous expression results [10, 38]. Systematic large-scale microarray analyses reveal that the expression values of up to 30% of all genes probed on an array significantly correlate with degradation quality measures such as the 3′/5′-ratios of control genes [4, 21]. The observed correlations can be well explained on the basis of the results presented here: For example, it is found that the expression values of weakly expressed genes negatively correlate with the quality of their transcripts . The authors explain this ‘…the worse the quality the stronger the signal…’-effect by either the enrichment of low quality RNA in the low signal range due to nonspecific hybridization or by compensating effects due to chip-to-chip normalization. The former interpretation disagrees with our results presented in the previous subsection. We found that progressive degradation dilutes the sample and this way decreases the amount of nonspecific hybridization. On the other hand, the observed negative correlations also mean ‘…the better the (apparent) quality the weaker the signal…’ in agreement with our results: For low intensity signals the 3′/5′-ratio indeed improves with decreasing intensity suggesting better RNA-quality. We demonstrated that this trend is however caused by the increasing amount of nonspecific hybridization and not by improved RNA-integrity.
Considering also correlations between 3′/5′-quality measures and signal values (called LEV, ‘labeling extension values’), Lee et al.  found that LEV are typically small at low expression values but step-wisely increase beyond a certain expression threshold. The authors hypothesized that the positional 3′/5′-bias is less notable for low abundant transcripts due to inefficient reverse transcription. However, according to our results, the observed trend can be explained by the dominance of non-specific hybridization lacking positional 3′-bias at small expression levels. These two examples demonstrate advantages of model-based expression analysis using physico-chemical hybridization theory compared with simple correlation analysis.
The final aim is therefore to use the degradation model for correcting the 3′-probe intensity bias to provide (largely) unbiased probe signals for downstream analysis. One expects that the loss of RNA material in general and particularly, RNA-fragments probed far away from the 3′-end, systematically decreases the apparent expression degree extracted from microarray probe intensities. In the Methods-section we propose a simple algorithm which corrects the raw probe intensities for the positional-dependent 3′/5′-bias. It makes use of the mean intensity decay-function of specifically hybridized probes and of the degree of specific hybridization of the particular probe which it intends to correct. Our approach corrects the 3′/5′-bias on the level of raw probe intensities.
In the supplementary text ( Additional file 1) we compare two correction metrics based either on the absolute probe position (‘L-correction’) or on the relative probe position (‘k-correction’) relative to the 3′-transcript end. The k-correction applies the same factor to all probe sets. Their mean intensity is effectively scaled solely by the degree of specific hybridization (see the Methods section). Contrarily, the L-correction applies a specific factor to each probe-set depending on the mean absolute position of the respective probes. The comparison of both correction methods shows that probe sets located on the average nearer to 3′-end of the transcript are corrected to a less degree using their absolute position than probe sets located more distant from the 3′-transcript end. Hence, the L-correction is more specific with respect to each particular probe set. On the other hand, the k-correction is more robust with respect to outliers.
Amplification of RNA-material using primed in-vitro transcription protocols and degradation of RNA during extraction, storage and processing of the samples affects RNA-quality in microarray experiments with consequences for expression estimates and their interpretation. We systematically analysed the effect of varying RNA quality on microarray probe intensities using a physico-chemical hybridization model and propose (i) new measures to assess RNA quality and (ii), a simple method to correct probe intensities for the degradation bias.
poor RNA quality is associated with a 3′-bias of transcript abundance which affects only the probe signal due to specific hybridization;
estimation and correction of the signal bias of each particular probe requires consideration of its hybridization mode (specific, non-specific or a superposition of both) and of the positional effect of probe intensity along the respective gene due to truncated transcripts. The former issue is solved by applying a modified ‘hook’-approach of data analysis based on Langmuir hybridization theory. The latter effect is taken into account by estimating the mean positional intensity decay on each array as a function of either the probe index or the probe’s distance to the 3′ end of its target transcript.
RNA quality is estimated in terms of the 3′/5′-intensity gradient of specifically hybridized probes. In addition to appropriate quality numbers (such as ‘tongs opening’-parameter and the degradation ratio) we introduce graphical characteristics allowing assessment of RNA quality of each single array (‘tongs plot’ and ‘degradation hook’). The parameters have a well defined physical meaning related to the fold change of transcript abundance along the genes. ‘Poor’ RNA quality is characterized roughly by a decay of the mean specific signal by a factor of less than 0.5 between probes near the 3′- end and probes located about 600 nt away.
our approach improves established RNA-integrity measures such as ‘affyslope’ and the 3′/5′-intensity ratio of degradation control probe sets. Both methods are prone to overestimate RNA quality if the signals are dominated by non-specific hybridization (affyslope) and/or saturation (controls). Our microarray-based quality estimate correlates well with the RNA integrity number (RIN) which, in addition, is affected by more complex properties of RNA degradation not uniquely related to transcript length.
short probe sets near the 3′-end are prone to non-specific hybridization presumable because of uncertainties in 3′UTR length owing to inaccurate assignment of the 3′-end and transcript isoforms.
poor RNA quality is associated with a decreased amount of RNA material hybridized on the array paralleled by a decreased total signal level. Additionally, it causes a gene-specific loss of signal due to the positional bias of transcript abundance which requires an individual, gene-specific correction. The former total effect can decrease the overall signal level of an array by the factor of 0.5 -0.7 in the case of poor RNA quality (RIN < 7). The latter local effect can be more pronounced with a penalty in expression measures by a factor of 0.3-0.4 or even less in worst cases.
Our study proposes new degradation measures to judge RNA quality. The basic difference to previous ones is the explicit consideration of the hybridization mode to largely reduce the influence of non-specific hybridization. Based on the physico-chemical mechanisms of probe hybridization we propose a correction method which aims at removing the degradation bias and thus enables to use the full ensemble of probes in downstream analysis without loss of information.
The tongs visualizations and the degradation measures are implemented in the Bioconductor package AffyRNADegradation. Probe distance files are available under the website http://www.izbi.uni-leipzig.de/downloads_links/programs/rna_integrity.php.
Affymetrix microarray raw intensity data (CEL-file format) were downloaded from the public repositories Gene expression omnibus ( http://www.ncbi.nlm.nih.gov/geo) or Array Express ( http://www.ebi.ac.uk/microarray-as/ae/). We studied the following data sets:
The Human tissue dataset (GSE7307, see supplementary text for the detailed list of samples used) comprises 677 samples taken from over 90 distinct tissue types hybridized to Affymetrix HG-U133 plus 2.0 arrays.
The RatQC (rat quality control) dataset (E-MEXP-1069) from ref.  was generated to systemically explore how RNA quality affects microarray results. It consists of 36 rat liver RNA samples hybridized to Affymetrix RAE230A expression arrays. The progressive change in RNA quality was generated either by thawing frozen tissue or by ex vivo incubation of fresh tissue. Each sample was characterized by the RNA integrity number (RIN) and mean transcript length in ref. .
The RNeasy data set consists of five pairs of HG-U133A GeneChips which were hybridized with RNA extracted from ovarian cancer samples and processed in two different ways namely with and without a cleanup step using RNeasy reagents . The RNeasy cleanup should lead to good-quality RNA whereas lack of the cleanup step should yield poorer-quality RNA. The RNeasy data set was used in previous work aiming at judging RNA-quality from microarray data [39, 40].
Probe set definitions were applied as proposed by Affymetrix by using standard CDF-files. Absolute probe positions with respect to the intended 3′-end of the transcript, Lp, (as used, e.g. in Eq. (8)) were determined from the target sequences provided by Affymetrix for each transcript (see http://www.affymetrix.com) by aligning the probe sequences to the respective transcript sequence. The position of each probe Lp (p = 1, 2…) is then defined as the number of nucleotides counted between the 3′-end of the transcript and the first (i.e. nearest) base of the 25meric probe sequence.
<… > pset denotes averaging over each probe set of usually 11 PM/MM probe pairs addressing the same transcript (log ≡ log10 is the decadic logarithm) and < … > mov is the moving average over a sliding window of about 1000 probes for smoothing. Plotting the data into Δ-versus-Σ coordinates provides the hook curve which enables decomposition of the probe signals into contributions due to specific and non-specific hybridization by simple visual inspection.
where < … > s denotes averaging over subsets of three probes from the respective probe set located either nearer its 3′-end (s = 3′), 5′-end (s = 5′) or in the middle (s = m), and subsequent smoothing using a sliding window of appropriate size. The so-called tongs-plot shows the three positional-dependent values ΔΣ3′, ΔΣ5′ and optionally ΔΣm as a function of the hook-abscissa Σ whereas the ‘degradation hook’ plots ΔΣ3′/5′-versus- Σ.
The two-species Langmuir hybridization isotherm predicts the theoretical hook-curve which was previously used to fit the experimental curves and to extract characteristic chip-related parameters. It provides a ‘mean’ hybridization isotherm implicitly characterizing the concentration dependence of the probe signals. Here we modify the hook formalism to take into account the effect of incomplete transcript amplification and degradation in terms of the degradation ratios defined in Eq. (2). For the detailed derivation of the hook equation in the absence of degradation and the detailed discussion of the used parameters see refs.  and . The theoretical expressions for the ‘degradation’ hook and the tongs-plot are obtained analogously.
It scales with /d, the probe specific 3′-bias of the actual transcript abundance averaged over the subset s and divided by the mean degradation index of the selected chip, d. The standard version of the hook (Eq. (14)) is described by Eq. (16) with because averages are calculated over all probes of each probeset (s = pset).
respectively. Note that the width and the start coordinate of the hook curve, β and Σstart, change with the mean degradation index d (see also Eq. (2)) whereas the height of the hook α doesn’t depend on degradation.
Hence, the mean amount of RNA (Eq. (2)) is directly related to the 3′/5′-difference of transcript abundance.
The threshold hook represents a special version of the degradation hook described in the previous subsection. It defines a threshold of the 3′/5-intensity ratio of the probe sets used to assess RNA-quality such a GADPH or beta-actin. The threshold hook accounts for the fact that the probe signals are affected by non-specific binding and by saturation. Both effects give rise to an intensity-dependent threshold for estimating good RNA-quality.
In contrast to the standard hook (Eq. (14)) we here use only the intensities of PM-probes.
In the second step, one calculates the threshold hook as a Δ3′/5′-vs-Σ3′/5′-plot under the condition that both, the 5′- and 3′-probes hybridize according to the hyperbolic Langmuir isotherm (see Eq. (3) with wp P = 1), however using different specific binding activities due to the different degradation indices of the respective transcripts, d5′ < d3′ (Eq. (4)). The delta-value is expressed as a function of sigma using the degradation hook-formalism described in the previous subsection after neglecting the MM-probes (use Eq. (16) with α = −∞). The start and end point of the threshold hook are taken from the standard hook analysis which provides these data with relatively high accuracy.
The application of a constant threshold instead of the variable one will cause false quality estimates. We estimated the error of the 3′/5′-intensity ratio of the GADPH-control taken from the tissue data set as example: Figure 12b shows the hook-coordinates of the GADPH control probe sets of the 677 samples of the tissue data set (dots, see Eq. (25)). The threshold hook and the horizontal line provide the true and the apparent (false) thresholds for good RNA quality in terms of the logged 3′/5′-intensity ratios, Δ3′/5′ < Δthreshold. The constant threshold is assumed to agree with that of the hook curve in the range of specific hybridization. It consequently forms the tangent of the hook-curve at its maximum referring to the S-range of hybridization. The hook curve describes Δthreshold under the realistic assumption of saturation whereas the constant threshold neglects this effect. As a consequence, data located between both thresholds (coloured in blue) define false positives (FP) with respect to the constant threshold whereas data below the hook and above the line are true positives (TP, red) and true negatives (TN, orange), respectively. The number of false negatives (FN) is zero because the hook threshold remains below the constant one. The positive predictive value (PPV = TP/(TP + FP)) and the specificity (SP = TN/(FP + TN)) are 0.48 and 0.79, respectively, meaning that less than 50% of the 3′/5′ controls properly estimate the quality of RNA in terms of good and degraded one. This particular example assumes that the 3′/5′ signal ratio for GADPH for good RNA is of no more than 3, or in our notation Δ3′/5′ < log(3) ≈ 0.5.
RNA quality affects the probe intensities of each probe by two factors: (i) the probe position with respect to the nominal 3′-end of the transcript, L (or, alternatively, the probe index in the probe set, k) and (ii), the hybridization regime. The specific hybridization regime below saturation is particularly prone to biased intensities as opposed to non-specific hybridization and specific hybridization in the asymptotic saturation range. Our approach aims at correcting raw probe intensities which subsequently can be preprocessed using any method available.
Firstly, we estimate the degree of specific, unsaturated hybridization in terms of the relative amplitude of the degradation hook with respect to its maximum reached in the S-range of hybridization, using the logged PM-probe intensity averaged over each probe set as argument, i.e. . Secondly, we estimate the decay function of the probe intensity for specifically hybridized probes, dS(x) (x = k, L), defined in Eqs. (9) and (10), for the respective array.
is defined as weighted sum of the decay functions due to specific and non-specific hybridization where the latter is set to unity, dN(x) = 1. Note that for saturated probes beyond the maximum of the degredation hook dN(x) = 1 can be interpreted as the apparently absent 3′/5′-bias in the asymptotic range of hybridization.
Hence, each probe intensity is rescaled according to its position (x = k or L) and the hybridization mode of its probe set according to y. Consequently, probe intensities taken from the non-specific hybridization range remain uncorrected. With increasing degree of specific hybridiztaion the probes are progressively scaled up with increasing distance from the 3′-end of the transcript. The maximum correction applies to probe sets in the S-hybridzation range. MM probe intensities are scaled using the mean logged MM-intensity of the probe set as argument. In the supplementary material ( Additional file 1) we compare and discuss both options of using relative and absolute probe positions.
The 3′/5′ correction algorithm is implemented in R and builds upon the popular affy package . The respective program and data are available under http://www.izbi.uni-leipzig.de/downloads_links/programs/rna_integrity.php and as Bioconductor package AffyRNADegradation.
This publication is supported by LIFE - Leipzig Research Center for Civilization Diseases, Universität Leipzig. This project was funded by means of the European Social Fund and the Free State of Saxony.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.