Cross-site comparison of ribosomal depletion kits for Illumina RNAseq library construction

Background Ribosomal RNA (rRNA) comprises at least 90% of total RNA extracted from mammalian tissue or cell line samples. Informative transcriptional profiling using massively parallel sequencing technologies requires either enrichment of mature poly-adenylated transcripts or targeted depletion of the rRNA fraction. The latter method is of particular interest because it is compatible with degraded samples such as those extracted from FFPE and also captures transcripts that are not poly-adenylated such as some non-coding RNAs. Here we provide a cross-site study that evaluates the performance of ribosomal RNA removal kits from Illumina, Takara/Clontech, Kapa Biosystems, Lexogen, New England Biolabs and Qiagen on intact and degraded RNA samples. Results We find that all of the kits are capable of performing significant ribosomal depletion, though there are differences in their ease of use. All kits were able to remove ribosomal RNA to below 20% with intact RNA and identify ~ 14,000 protein coding genes from the Universal Human Reference RNA sample at >1FPKM. Analysis of differentially detected genes between kits suggests that transcript length may be a key factor in library production efficiency. Conclusions These results provide a roadmap for labs on the strengths of each of these methods and how best to utilize them. Electronic supplementary material The online version of this article (10.1186/s12864-018-4585-1) contains supplementary material, which is available to authorized users.


Background
Ribosomal depletion is a critical method in transcriptomics that allows for efficient detection of functionally relevant coding as well as non-coding transcripts through removal of highly abundant rRNA species. Use of oligo dT primer to capture the polyadenylated 3′ end of the transcripts and isolate mRNA is routine in many RNA sequencing preparations; however this method lacks the ability to handle degraded samples where most of the RNA is separated from the 3′ tail, or to isolate non-polyadenylated transcripts such as lncRNAs. Ribosomal removal methods address these issues by directly depleting the rRNA while leaving other transcripts intact. This technique is widely utilized and is a basic component of many large datasets [1][2][3].
The current generation of rRNA removal kits employs three distinct strategies to deplete these transcripts. In the first method, rRNA is captured by complimentary oligonucleotides that are coupled to paramagnetic beads, after which the bound rRNA is precipitated and removed from the reaction. Kits utilizing this method include Illumina's RiboZero, Qiagen GeneRead rRNA depletion, and Lexogen RiboCop. The second method uses an alternative strategy, hybridizing the rRNA to DNA oligos and degrading the RNA:DNA hybrids using RNAseH. These kits include NEBNext rRNA depletion, Kapa RiboErase, and Takara/Clontech's RiboGone. A third method that is specifically aimed at low-input samples using the Takara/Clontech SMARTer Pico kit, targets the ribosomal RNA sequences after conversion to cDNA and library prep using the ZapR enzyme. Additional high abundance transcripts such as globin and mitochondrial RNA (mtRNA) can be targeted by each method. Despite the availability of many different kits utilizing these methods, the efficiency of rRNA removal and possible off-target effects of these different methodologies on the resulting RNAseq data remains unclear.
With an increasing number of ribosomal RNA depletion kits available, understanding the relative strengths of these methods is critical for improving experimental design. To address this challenge, we have conducted a cross-site study comparing seven rRNA depletion kits against a standard sample both as intact and degraded RNA. We find that about half of the kits are likely to require significant care in implementation and note that the Lexogen RiboCop and Qiagen GeneRead kits worked poorly with heavily degraded samples. The different kits also appear to be affected by relative lengths of the transcripts as well as the degradation of the input RNA. These results suggest that different methodologies may be appropriate depending on the experimental question and quality of input material.

Results
To better understand the strengths of the different ribosomal depletion methodologies available, we utilized a set of controlled samples that could be broadly distributed in order to provide a consistent biological background for each kit and site (Fig. 1). All experiments utilized the well characterized Universal Human Reference RNA (UHR) from Agilent, either in its intact state, or following heat degradation (Additional file 1: Figure S1). Two control spike-ins were added to this sample, the Lexogen Spike-In RNA Variant Controls (SIRVs) which were added before degradation and codegraded with the sample, and the External RNA Controls Consortium (ERCC) from Ambion, which was added after degradation and thus remained intact and served as an additional control. It should be noted that while heat degraded RNA is a proxy for difficult to isolate samples (eg, brain microdissections or tumors), the samples are not formalin fixed and so lack the base modifications often observed in FFPE samples [4]. Consequently, the degraded sample in this study should be considered a best case scenario for how the kits would perform with challenging sample types that yield poor quality RNA. For each experiment, 100 ng of RNA input was used with the exception of the Takara/Clontech SMARTer pico kit which used 1 ng input as recommended by the manufacturer.
The study tested seven rRNA depletion kits ( Fig. 1), each tested at four sites. The kits tested include Illumina RiboZero Gold (RZ), Lexogen RiboCop (LX), Qiagen GeneRead rRNA Depletion (Q), all of which use bead capture for ribosomal depletion, the New England Biolabs NEBNext rRNA Depletion (NE), Kapa RiboErase (K), and Takara/Clontech Ribogone (CR) kits that are based on RNAseH degradation of the rRNA, and SMARTer Pico (CZ) which uses the ZapR enzyme to remove rRNA after library prep. The Lexogen, Qiagen, Ribogone, and NEBNext kits all utilized the NEB Next Ultra II directional library generation kit to convert the RNA to Illumina libraries while the other kits used RNA library generation kits from the manufacturer of the Fig. 1 Design of the ribosomal depletion study: Schematic of the sample processing is shown. A single sample of UHR RNA with SIRV spike-ins was kept intact or heat degraded followed by addition of the ERCC spike-in. The two samples were then distributed to the participating sites where they were run as technical duplicates for each kit. All graphics were either produced by the authors or are public domain images that are no longer under copyright depletion chemistry. A total of 11 sites participated in the study with each site handling no more than four kits. Sites were selected from genomic core facilities that are members of the Association of Biomedical Research Facilities (ABRF) who routinely preform RNA library preparation for academic labs. For each vendor, a consultation conference call was held between the vendor and the participating sites to review the protocol in detail before the experiment was performed with the goal of standardizing and clarifying the protocol, thereby minimizing the chance for confusion about the methodology. Full details of alterations to the standard protocols agreed on by the vendor and the participating sites can be found in the Additional file 2. Technical duplicates of both the degraded and intact RNA were run at each site for each kit. The total 106 samples after dropouts were pooled and sequenced on three NextSeq500 runs at a single site to eliminate bias due to sequencing.
The different methodologies were first evaluated for their ability to perform their primary objective, the removal of rRNA from the samples before sequencing. rRNA reads in each sample were identified by aligning to known rRNA sequences using BWA. A cutoff of 50% nuclear rRNA was chosen to indicate ribosomal depletion failure. Illumina's RiboZero Gold kit showed~5% rRNA with the intact sample at all sites (excluding a single point failure) but slightly higher rRNA fractions for the degraded sample. This kit was used as a baseline for the other kits due to its long-standing reputation in the sequencing community (Fig. 2a). The other kits that used the capture method for depletion were less consistent than RiboZero Gold with one failed site for Lexogen RiboCop and three failed sites for the Qiagen GeneRead rRNA depletion kit. For both the Lexogen and Qiagen kits, the intact samples performed significantly better than the degraded sample and caution should be used when using these kits on highly degraded RNA.
By comparison, the kits that degraded the rRNAs by either RNase H treatment or using ZapR showed more consistent results. Excluding single sites that failed with the NEBNext Ultra rRNA and SMARTer Pico kits, those two kits as well as the Takara/Clontech RiboGone and Kapa RiboErase kits performed very well with no differences observed between intact and degraded RNA. The RNaseH methods all showed very low rRNA fractions overall with the noted exception to the NEB kit. The SMARTer Pico kit had a slightly higher rRNA level, similar to that observed with Illumina Ribozero Gold degraded samples.
For those samples with successful rRNA depletion, we next ascertained the quality of the RNA sequencing data. Samples with greater than 50% rRNA were excluded from further analysis to eliminate artifacts that may be caused by improper implementation of the protocol. All the kits showed strong strand bias as expected by the protocols (Fig. 2b). Notably, SMARTer Pico reads mapped to the sense strand, which is the opposite strand from the other methods. While this is expected, care should be taken in adapting existing informatics pipelines to this kit. Differences were observed among the kits in how they handled mtRNAs. These were a major contaminant in the Clontech kits, particularly the RiboGone method that only targets the 12 s mtRNA and not the 16 s mtRNA. The other methods that addressed all mtRNA reads, such as Illumina Ribozero Gold, significantly reduced the fraction of reads from mtRNAs ( Fig. 2c). This is especially noticeable in the RiboZero samples where site3 utilized a standard Human/Mouse/ Rat kit instead of the RiboZero Gold.
Looking at the non-rRNA reads in each sample, the vast majority align to protein coding genes based on the ENSEMBL annotation. All samples show > 60% protein coding with most over 80% and the Clontech RiboGone kit having the largest fraction mapping (Fig. 3a). Most of the samples identified~14,000 genes expressed at greater than one RPKM and~16,000 at over 0.1 RPKM (Fig. 3b). A single site using the Takara/Clontech SMAR-Ter Pico kit did show a somewhat reduced number of genes, which was associated with a lower library complexity observed from that site. Antisense mapped reads and reads mapping to the signal recognition particle RNAs (SRPs) were the most variable aspect of each sample though the source of these differences were unclear as they varied widely from site to site.
While the total number of protein coding genes detected was quite similar, many genes appear to be detected at significantly different rates. To better understand this observation, we performed differential gene expression analysis on the intact RNA samples that passed our QC metrics, comparing each preparation back to Illumina RiboZero Gold (Fig. 3c). The Qiagen rRNA depletion kit was excluded as only two replicates passed these criteria. Hierarchical clustering of the differentially detected genes, clusters the kits first by their RNAseq library prep methodology, with all three kits prepared using the NEB Ultra II Directional RNA Library kit clustering together, followed by the Kapa RiboErase kit and finally the low input Takara/Clontech SMARTer pico. Generally, several hundred genes could be easily observed as differentially detected between each of the kits and Illumina's RiboZero (fold changes > 2, Benjamini corrected p-values < 0.001, Fig. 3d). Testing the physical properties of these differentially detected genes found that gene length appears to be a large contributor to the direction of the bias with shorter transcripts better detected by RiboZero and  . 3e). Many of the most variably detected genes across the data set are quite small, such as mitochondrial proteins and ribosomal proteins (Additional file 3: Figure S2). The libraries themselves did not display any particular size bias with the RiboZero samples having an average length distribution similar to the other kits (Additional file 4: Figure S3). Bias in GC percentage was also observed in a few samples, with the Kapa RiboErase kit having the strongest bias against high GC transcripts (though the underlying gene list is quite small, Fig. 3f ). Ribosomal depletion is a key methodology used in studying noncoding RNAs as many of these transcripts lack polyadenylation sites [5]. We focused on lincRNAs (long intervening noncoding RNAs), one type of noncoding RNAs, since they do not overlap with any protein coding or other long non-coding RNA genes. Using the ENSEMBL annotation, approximately 4% of the non rRNA reads map to lincRNAs using RiboZero (Fig. 4a). Similar numbers are observed with the SMARTer Pico kit and Kapa RiboErase. The other kits, all of which were prepared with NEB Ultra II directional RNA library kit, show less than 3% of reads mapping to lincRNAs. This global decrease in number of lincRNA reads appears to reflect a general decrease in the number of mapping reads rather than a specific bias against a subset of lincRNAs. Two lines of evidence support this conclusion. First, the number of lincRNAs detected at > = 0.01 RPM remains~3500 for all of the different kits tested and no bias is seen against the NEB prepped kits (Fig. 4b). Second, while the majority of lincRNAs detected can be assigned to 4 specific lincRNAs (MALAT1, SNORD3A, RNRP and NEAT1), the remaining fraction remains constant among the different kits, suggesting a global decrease in mapping (Fig. 4c). The precise set of lincRNAs detected varies somewhat but the core of 3200 lincRNAs are detected by all of the kits (Fig. 4d).
While comparison of different detection rates of mRNAs in UHR can point to possible differences between the kits' chemistries, the spike-in controls provide an absolute metric to evaluate their effectiveness. Both SIRV (co-degraded with the RNA) and ERCC (not degraded) control spike-ins were added to the sample before library preparation and should give an unbiased look at the behavior of the different kits. As an initial a b c d test, we examined the ratio between the two spike-in types to model the impact of degradation on efficiency of library formation. Importantly, the kits do treat degraded RNA differently in their protocols and the not degraded ERCCs in the degraded samples were processed along the path proscribed for the bulk RNA, suggesting they are under-fragmented relative to the bulk population. Examining the ratios between the ERCC and SIRVs, we find that all the intact samples show~60% SIRV reads (Fig. 5a). By comparison, the degraded samples show significant bias between the ERCCs and SIRV, generally favoring the intact ERCC. The RiboZero Gold and Kapa RiboErase kits show the least bias based on degradation, while the kits using the NEB stranded RNAseq kits showed a bias against shorter RNA fragments, which is similar to what was observed for protein coding genes (Fig. 3e). The SMARTer Pico kit is biased against the intact ERCC spike-ins in the degraded sample, which is likely due to not pre-degrading the ERCCs in the context of the degraded total RNA, emphasizing the importance of this step. Inserting not degraded spike-in controls into variably degraded experimental samples may confound the ability of these spike-ins to serve as a normalization tool, as previously observed [6,7].
Distinct transcripts are present at defined ratios in the spike-in control allowing direct visualization of over and under-representation of transcripts. Within the SIRV spike-ins, the distinct transcripts were largely at equal ratios between kits and sites, though deviation from the expected values is observed (Fig. 5b). The SMARTer Pico kit was particularly susceptible to variation, possibly due to the low total input (1:100 th of the other kits), and some transcripts (e.g. purple at 1/4×) show loss of signal in the degraded sample. By comparison, the ERCC spike-ins showed significantly more variability across sites, even within the intact RNA samples (Additional file 5: Figure S4). Overall, the Lexogen and Takara/Clontech RiboGone kits generally had the most consistent and even performance on both the ERCC and SIRV spike-in controls across their test sites for intact samples.
Using the transcript diversity within the SIRV spike-in control, we further examined the general trends observed with respect to transcript length, GC content, and abundance. The 50 highest expressed SIRV transcripts from the intact samples were individually quantified using STAR/RSEM and the TPMs normalized to the expected frequency of the transcripts. We then compared the normalized frequencies for all of the a b c d e f Fig. 5 Effect of rRNA Depletion Chemistry on Spike-In Controls: a) Effect of degradation of SIRVs on the ratio of SIRV reads to ERCC reads in each replicate. Percent of reads mapping to SIRVs out of total reads mapping to Spike-ins in shown. Data sets ordered by intact/degraded status followed by site within each kit left to right. b) Relative ratio of reads mapping to SIRV1. Fraction of reads mapping to each isoform of SIRV1 are shown for each replicate. Expected fractions shown as dark horizontal lines. Light horizontal lines show 2-fold changes in fraction observed (log scale). Each SIRV1 isoform is shown in a different color. Replicates ordered as in a. c) Boxplot of normalized transcripts per million (TPM) for subsets of SIRVs. SIRVs present at 4×, 1× and 1/4× in the pool were randomized and normalized TPM scores for all sites for two sets of 10 SIRVs were plotted across different chemistries using intact samples. 5 th , 25 th ,Median, 75 th and 95 th percentile are shown. d) Boxplot as in c except red boxplot highlights shortest quintile (len < 480 nt) and green includes longest quintile (len > 2200). e) Boxplot as in c except red boxplot highlights lowest GC quintile (< 36% GC) and green includes highest quintile (> 44.5% GC). f) Boxplot as in c except red highlights random set of 10 transcripts included at 4× while green includes set of 10 transcripts at 1/4× transcripts highest and lowest quintile for different parameters to look for differences. Randomly selected transcripts show minimal differences (Fig. 5c), nor do we observe biases when the transcripts are sorted by GC percentage or amount of expected transcript (Fig. 5e, f). Transcript length does appear to show a bias, with the longest 10 transcripts (len > 2200 nt) showing lower levels of detection than the shortest transcripts (len < 480 nt, Fig. 5d) and three of the poorest detected transcripts in the SIRVs are in the set of longest transcripts. However, the overall distribution remains very broad and size alone is not predictive of the behavior of the transcript.
Discussion rRNA depletion methodologies have expanded significantly in the last year. While differences exist between the rRNA depletion chemistries tested, all of the kits tested were able to successfully remove a significant amount of the rRNA in library preparations. The bead depletion chemistries were the most challenging to consistently implement successfully and struggled to remove rRNAs in the degraded RNA samples, possibly due to inefficiencies in hybridization with the degraded material. All of the kits, including the low-input ZapR based kit from Takara/Clontech, detected~14,000 transcripts at > 1 RPKM.
With the broad success of the different chemistries, other aspects of library preparation, such as cost and ease of use, can be considered. Participating sites were surveyed to collect feedback regarding ease of use and previous experience with each kit that was tested (Table 1).
While all sites were familiar with the RiboZero depletion kit, the majority of sites had no prior experience with the other kits. The participating sites generally reported high comfort levels with all the kits. Interestingly, the comfort level with the different kits did not correlate with success as the sites with the lowest comfort levels for the NEB kit and ZapR based kit from Takara/Clontech both showed very good performance. This may be due to the robustness of the specific methods or the quality of the written protocols.
While we did observe differences in the efficiency of library preparation, analytics remains a key caveat. For this study we used STAR/RSEM/DESEQ [8][9][10] for the analysis of the transcript levels, but different informatics tools may have more or less ability to handle the variations between the different chemistries and to model the spike-in controls. The combination of defined control samples with single transcript and spliced spike-ins provides an opportunity to use this data in the evaluation of different algorithmic approaches without overfitting to a single site or single type of chemistry. The number of RNAseq algorithms continues to multiply and finding the most appropriate methodology remains challenging. We believe this data set will provide a unique opportunity to better characterize the strengths and challenges of not only the depletion chemistries, but the RNAseq analysis algorithms as well.

Conclusions
We evaluated the performance of seven different commercially available rRNA depletion kits. While all kits were able to successfully remove rRNA in at least in some instances, there were marked differences in crosssite reproducibility, ease of use, and biases associate with sample quality. These results will provide important guidance for researchers when selecting the most appropriate rRNA depletion kit for their specific project aims.

Input RNA preparation
Intact and degraded input RNA was prepared and aliquoted at a single site. The Universal Human Reference RNA (Agilent) was diluted to 500 ng/ μL in 200 μl of RNase-free water and 3.94 μl of the Spike-in RNA Variant Control E2 Mix (Lexogen) were added. The sample was split into two aliquots, one of which was then heated at 94°C on an Eppendorf™Thermomixer for 1 h and 27 min. 1 μl of ERCC RNA Spike-In Mix 1 (Ther-moFisher Scientific) was added to both the intact and degraded samples before running on a Bioanalyzer 2100 RNA Nano chip (Agilent) (Additional file 1: Figure S1). The final intact and degraded RNA samples were then diluted to 25 ng/μl and were distributed to each site on dry ice for rRNA-depletion and library preparation.

Site selection and index allocation
Eleven genomics core facilities were selected from among the members of the Association of Biomedical Research Facilities membership who volunteered to take part in this study. All of the participating cores routinely perform RNAseq library preparation for laboratories at their institutes and each site prepared between one and four library types. Kits were assigned to each site to minimize overlapping sets with the exception of the Takara/Clontech and Illumina kits. All four Takara/ Clontech sites preformed both the Ribogone (CR) and SMARTer Pico (CZ) kit to minimize shipping costs. Similarly, Illumina RiboZero (RZ) sites were selected to minimize shipping of the donated reagents. Sites were not selected based on prior experience with specific chemistries. Each chemistry was tested by four sites, named 1-4. The same library creation site will not necessary have the same site number for different chemistries. Indices were assigned by the group to prevent overlapping among libraries. Note the Qiagen chemistry is no longer available for purchase.

Protocol normalization
For each chemistry, a conference call was arranged for each of the participating sites and a vendor representative to review the protocol in detail prior to library preparation. During the course of the phone call, the protocol was reviewed in detail and any required modifications or variables (eg., cycle number) were discussed and agreed on by all users. These details are reported in the Additional file 2.

rRNA depletion and library construction
Each site performed rRNA depletion and subsequent library prep following the vendor protocols (Additional file 2: Table S1). Kits, excluding Illumina, were supplied from a single manufacturer lot. Input RNA concentrations, fragmentation conditions and PCR cycles for intact and degraded RNA samples for each kit were discussed with the vendors and can be found in Additional file 2: Table S2 for each kit. Completed libraries were quantified by Qubit or equivalent and run on a Bioanalyzer or equivalent for size determination. Libraries were pooled and sent to a single site for final quantification by Qubit fluorometer (ThermoFisher Scientific), TapeStation 2200 (Agilent), and RT-qPCR using the Kapa Biosystems Illumina library quantification kit. Libraries were pooled for each run on a NextSeq 500.

Sequencing
Sites were instructed to make an equimolar pool of libraries from each kit using site-specific quantification and pooling SOPs and return each pool along with individual un-pooled libraries to the designated sequencing site. The sequencing site quantified each pool by Qubit fluorometer, Agilent TapeStation, and qPCR using the Kapa Illumina quant assay. Library pools were multiplexed and sequenced over three high output paired-end 75 bp runs on the Illumina NextSeq 500 to achieve sufficient read depth for analysis (See Additional file 2).

Alignment and quality control
Reads were aligned against hg19 using bwa-mem v. 0.7.12-r1039 with flags -t 16 -f [11]. Mapping rates, fraction of multiply-mapping reads, strandedness (Fig. 2b), number of unique 20-mers at the 5′ end of the reads, insert size distributions (Additional file 4: Figure S3) and fraction of nuclear ribosomal RNAs (Fig. 2a) were calculated using dedicated perl scripts and bedtools v. 2.25.0 [12]. In addition, each resulting bam file was randomly down-sampled to one million aligned reads and read density across genomic features were estimated for RNA-Seq-specific quality control metrics. Samples with < 10 reads or > 50% rRNA were eliminated from additional analysis and were not included in later sequencing pools. 50% rRNA was selected as the threshold as for samples with less than 50% rRNA, achieving sufficient read depth is faster and less expensive by providing additional sequencing depth rather than repreparation of the library. This number, however, is significantly higher than most participating sites would consider successful. Sample reads from all runs were concatenated before final RNA analysis.

Differential representation analysis
Libraries and kits were compared against RiboZero samples using a standard differential expression framework to identify genes with differential representation by the various chemistries. Briefly, intact samples from different sites were treated as replicates and differential expression was performed in the R statistical environment (R v. 3.2.3) using Bioconductor's DESeq2 package on the protein-coding genes only [9]. Dataset parameters were estimated using the estimateSizeFactors(), and estimateDispersions() functions, and differential expression based on a negative binomial distribution/Wald test was performed using nbinomWaldtest() (all packaged into the DESeq() function), using the kit type as a contrast. Fold-changes, p-values and Benjamin-Hochberg-adjusted p-values (BH) were reported for each gene. Genes with BH < 0.001 and absolute foldchanges greater than 2 were considered for downstream analyses (Fig. 3c-d).

lincRNA analysis
After removing lincRNAs that were not assigned to chromosomes, 7251 lincRNAs were identified from the ENSEMBL annotation for further analysis. The lincR-NAs with mean RPM > =0.01 were retained for each kit. To examine the distribution of different lincRNAs in each of the samples, the percentage RPM was calculated. Except the top 4 lincRNAs, rest of the lincRNAs were classified as 'Others' category. The figures were generated using a custom python script. Based on the ribo-depletion methods, the seven kits were grouped into three sets: RLQ (includes RZ, LX, and Q), NCK (includes NE, CR, K) and CZ. The lincRNAs detected in three sets were compared using an Euler diagram created using Venny (http://bioinfogp.cnb.csic.es/tools/venny/).

Genomic features analysis and visualization
Gene lengths were retrieved from RSEM outputs. GC content was calculated on the longest annotated isoform of each protein-coding gene. Box-plots were generated using Spotfire (Tibco) and TreeView (Fig. 3e-f ).
-Relative ratio of reads mapping to ERCCs. Fraction of reads mapping to each ERCC mRNA is shown for each replicate. Light horizontal lines show 2-fold changes in fraction observed (log scale). Each expected concentration is shown in a different color. Data sets ordered by intact/ degraded status followed by site within each kit left to right. Table S1. Catalog numbers and manual versions for protocols used. Table S2.