To better understand the strengths of the different ribosomal depletion methodologies available, we utilized a set of controlled samples that could be broadly distributed in order to provide a consistent biological background for each kit and site (Fig. 1). All experiments utilized the well characterized Universal Human Reference RNA (UHR) from Agilent, either in its intact state, or following heat degradation (Additional file 1: Figure S1). Two control spike-ins were added to this sample, the Lexogen Spike-In RNA Variant Controls (SIRVs) which were added before degradation and co-degraded with the sample, and the External RNA Controls Consortium (ERCC) from Ambion, which was added after degradation and thus remained intact and served as an additional control. It should be noted that while heat degraded RNA is a proxy for difficult to isolate samples (eg, brain microdissections or tumors), the samples are not formalin fixed and so lack the base modifications often observed in FFPE samples [4]. Consequently, the degraded sample in this study should be considered a best case scenario for how the kits would perform with challenging sample types that yield poor quality RNA. For each experiment, 100 ng of RNA input was used with the exception of the Takara/Clontech SMARTer pico kit which used 1 ng input as recommended by the manufacturer.
The study tested seven rRNA depletion kits (Fig. 1), each tested at four sites. The kits tested include Illumina RiboZero Gold (RZ), Lexogen RiboCop (LX), Qiagen GeneRead rRNA Depletion (Q), all of which use bead capture for ribosomal depletion, the New England Biolabs NEBNext rRNA Depletion (NE), Kapa RiboErase (K), and Takara/Clontech Ribogone (CR) kits that are based on RNAseH degradation of the rRNA, and SMARTer Pico (CZ) which uses the ZapR enzyme to remove rRNA after library prep. The Lexogen, Qiagen, Ribogone, and NEBNext kits all utilized the NEB Next Ultra II directional library generation kit to convert the RNA to Illumina libraries while the other kits used RNA library generation kits from the manufacturer of the depletion chemistry. A total of 11 sites participated in the study with each site handling no more than four kits. Sites were selected from genomic core facilities that are members of the Association of Biomedical Research Facilities (ABRF) who routinely preform RNA library preparation for academic labs. For each vendor, a consultation conference call was held between the vendor and the participating sites to review the protocol in detail before the experiment was performed with the goal of standardizing and clarifying the protocol, thereby minimizing the chance for confusion about the methodology. Full details of alterations to the standard protocols agreed on by the vendor and the participating sites can be found in the Additional file 2. Technical duplicates of both the degraded and intact RNA were run at each site for each kit. The total 106 samples after dropouts were pooled and sequenced on three NextSeq500 runs at a single site to eliminate bias due to sequencing.
The different methodologies were first evaluated for their ability to perform their primary objective, the removal of rRNA from the samples before sequencing. rRNA reads in each sample were identified by aligning to known rRNA sequences using BWA. A cutoff of 50% nuclear rRNA was chosen to indicate ribosomal depletion failure. Illumina’s RiboZero Gold kit showed ~ 5% rRNA with the intact sample at all sites (excluding a single point failure) but slightly higher rRNA fractions for the degraded sample. This kit was used as a baseline for the other kits due to its long-standing reputation in the sequencing community (Fig. 2a). The other kits that used the capture method for depletion were less consistent than RiboZero Gold with one failed site for Lexogen RiboCop and three failed sites for the Qiagen GeneRead rRNA depletion kit. For both the Lexogen and Qiagen kits, the intact samples performed significantly better than the degraded sample and caution should be used when using these kits on highly degraded RNA.
By comparison, the kits that degraded the rRNAs by either RNase H treatment or using ZapR showed more consistent results. Excluding single sites that failed with the NEBNext Ultra rRNA and SMARTer Pico kits, those two kits as well as the Takara/Clontech RiboGone and Kapa RiboErase kits performed very well with no differences observed between intact and degraded RNA. The RNaseH methods all showed very low rRNA fractions overall with the noted exception to the NEB kit. The SMARTer Pico kit had a slightly higher rRNA level, similar to that observed with Illumina Ribozero Gold degraded samples.
For those samples with successful rRNA depletion, we next ascertained the quality of the RNA sequencing data. Samples with greater than 50% rRNA were excluded from further analysis to eliminate artifacts that may be caused by improper implementation of the protocol. All the kits showed strong strand bias as expected by the protocols (Fig. 2b). Notably, SMARTer Pico reads mapped to the sense strand, which is the opposite strand from the other methods. While this is expected, care should be taken in adapting existing informatics pipelines to this kit. Differences were observed among the kits in how they handled mtRNAs. These were a major contaminant in the Clontech kits, particularly the RiboGone method that only targets the 12 s mtRNA and not the 16 s mtRNA. The other methods that addressed all mtRNA reads, such as Illumina Ribozero Gold, significantly reduced the fraction of reads from mtRNAs (Fig. 2c). This is especially noticeable in the RiboZero samples where site3 utilized a standard Human/Mouse/Rat kit instead of the RiboZero Gold.
Looking at the non-rRNA reads in each sample, the vast majority align to protein coding genes based on the ENSEMBL annotation. All samples show > 60% protein coding with most over 80% and the Clontech RiboGone kit having the largest fraction mapping (Fig. 3a). Most of the samples identified ~ 14,000 genes expressed at greater than one RPKM and ~ 16,000 at over 0.1 RPKM (Fig. 3b). A single site using the Takara/Clontech SMARTer Pico kit did show a somewhat reduced number of genes, which was associated with a lower library complexity observed from that site. Antisense mapped reads and reads mapping to the signal recognition particle RNAs (SRPs) were the most variable aspect of each sample though the source of these differences were unclear as they varied widely from site to site.
While the total number of protein coding genes detected was quite similar, many genes appear to be detected at significantly different rates. To better understand this observation, we performed differential gene expression analysis on the intact RNA samples that passed our QC metrics, comparing each preparation back to Illumina RiboZero Gold (Fig. 3c). The Qiagen rRNA depletion kit was excluded as only two replicates passed these criteria. Hierarchical clustering of the differentially detected genes, clusters the kits first by their RNAseq library prep methodology, with all three kits prepared using the NEB Ultra II Directional RNA Library kit clustering together, followed by the Kapa RiboErase kit and finally the low input Takara/Clontech SMARTer pico. Generally, several hundred genes could be easily observed as differentially detected between each of the kits and Illumina’s RiboZero (fold changes > 2, Benjamini corrected p-values < 0.001, Fig. 3d). Testing the physical properties of these differentially detected genes found that gene length appears to be a large contributor to the direction of the bias with shorter transcripts better detected by RiboZero and longer transcripts better detected by the other kits (Fig. 3e). Many of the most variably detected genes across the data set are quite small, such as mitochondrial proteins and ribosomal proteins (Additional file 3: Figure S2). The libraries themselves did not display any particular size bias with the RiboZero samples having an average length distribution similar to the other kits (Additional file 4: Figure S3). Bias in GC percentage was also observed in a few samples, with the Kapa RiboErase kit having the strongest bias against high GC transcripts (though the underlying gene list is quite small, Fig. 3f).
Ribosomal depletion is a key methodology used in studying noncoding RNAs as many of these transcripts lack polyadenylation sites [5]. We focused on lincRNAs (long intervening noncoding RNAs), one type of noncoding RNAs, since they do not overlap with any protein coding or other long non-coding RNA genes. Using the ENSEMBL annotation, approximately 4% of the non rRNA reads map to lincRNAs using RiboZero (Fig. 4a). Similar numbers are observed with the SMARTer Pico kit and Kapa RiboErase. The other kits, all of which were prepared with NEB Ultra II directional RNA library kit, show less than 3% of reads mapping to lincRNAs. This global decrease in number of lincRNA reads appears to reflect a general decrease in the number of mapping reads rather than a specific bias against a subset of lincRNAs. Two lines of evidence support this conclusion. First, the number of lincRNAs detected at > = 0.01 RPM remains ~ 3500 for all of the different kits tested and no bias is seen against the NEB prepped kits (Fig. 4b). Second, while the majority of lincRNAs detected can be assigned to 4 specific lincRNAs (MALAT1, SNORD3A, RNRP and NEAT1), the remaining fraction remains constant among the different kits, suggesting a global decrease in mapping (Fig. 4c). The precise set of lincRNAs detected varies somewhat but the core of 3200 lincRNAs are detected by all of the kits (Fig. 4d).
While comparison of different detection rates of mRNAs in UHR can point to possible differences between the kits’ chemistries, the spike-in controls provide an absolute metric to evaluate their effectiveness. Both SIRV (co-degraded with the RNA) and ERCC (not degraded) control spike-ins were added to the sample before library preparation and should give an unbiased look at the behavior of the different kits. As an initial test, we examined the ratio between the two spike-in types to model the impact of degradation on efficiency of library formation. Importantly, the kits do treat degraded RNA differently in their protocols and the not degraded ERCCs in the degraded samples were processed along the path proscribed for the bulk RNA, suggesting they are under-fragmented relative to the bulk population. Examining the ratios between the ERCC and SIRVs, we find that all the intact samples show ~ 60% SIRV reads (Fig. 5a). By comparison, the degraded samples show significant bias between the ERCCs and SIRV, generally favoring the intact ERCC. The RiboZero Gold and Kapa RiboErase kits show the least bias based on degradation, while the kits using the NEB stranded RNAseq kits showed a bias against shorter RNA fragments, which is similar to what was observed for protein coding genes (Fig. 3e). The SMARTer Pico kit is biased against the intact ERCC spike-ins in the degraded sample, which is likely due to not pre-degrading the ERCCs in the context of the degraded total RNA, emphasizing the importance of this step. Inserting not degraded spike-in controls into variably degraded experimental samples may confound the ability of these spike-ins to serve as a normalization tool, as previously observed [6, 7].
Distinct transcripts are present at defined ratios in the spike-in control allowing direct visualization of over and under-representation of transcripts. Within the SIRV spike-ins, the distinct transcripts were largely at equal ratios between kits and sites, though deviation from the expected values is observed (Fig. 5b). The SMARTer Pico kit was particularly susceptible to variation, possibly due to the low total input (1:100th of the other kits), and some transcripts (e.g. purple at 1/4×) show loss of signal in the degraded sample. By comparison, the ERCC spike-ins showed significantly more variability across sites, even within the intact RNA samples (Additional file 5: Figure S4). Overall, the Lexogen and Takara/Clontech RiboGone kits generally had the most consistent and even performance on both the ERCC and SIRV spike-in controls across their test sites for intact samples.
Using the transcript diversity within the SIRV spike-in control, we further examined the general trends observed with respect to transcript length, GC content, and abundance. The 50 highest expressed SIRV transcripts from the intact samples were individually quantified using STAR/RSEM and the TPMs normalized to the expected frequency of the transcripts. We then compared the normalized frequencies for all of the transcripts highest and lowest quintile for different parameters to look for differences. Randomly selected transcripts show minimal differences (Fig. 5c), nor do we observe biases when the transcripts are sorted by GC percentage or amount of expected transcript (Fig. 5e, f). Transcript length does appear to show a bias, with the longest 10 transcripts (len > 2200 nt) showing lower levels of detection than the shortest transcripts (len < 480 nt, Fig. 5d) and three of the poorest detected transcripts in the SIRVs are in the set of longest transcripts. However, the overall distribution remains very broad and size alone is not predictive of the behavior of the transcript.