Efficient depletion of ribosomal RNA for RNA sequencing in planarians

Background The astounding regenerative abilities of planarian flatworms prompt steadily growing interest in examining their molecular foundation. Planarian regeneration was found to require hundreds of genes and is hence a complex process. Thus, RNA interference followed by transcriptome-wide gene expression analysis by RNA-seq is a popular technique to study the impact of any particular planarian gene on regeneration. Typically, the removal of ribosomal RNA (rRNA) is the first step of all RNA-seq library preparation protocols. To date, rRNA removal in planarians was primarily achieved by the enrichment of polyadenylated (poly(A)) transcripts. However, to better reflect transcriptome dynamics and to cover also non-poly(A) transcripts, a procedure for the targeted removal of rRNA in planarians is needed. Results In this study, we describe a workflow for the efficient depletion of rRNA in the planarian model species S. mediterranea. Our protocol is based on subtractive hybridization using organism-specific probes. Importantly, the designed probes also deplete rRNA of other freshwater triclad families, a fact that considerably broadens the applicability of our protocol. We tested our approach on total RNA isolated from stem cells (termed neoblasts) of S. mediterranea and compared ribodepleted libraries with publicly available poly(A)-enriched ones. Overall, mRNA levels after ribodepletion were consistent with poly(A) libraries. However, ribodepleted libraries revealed higher transcript levels for transposable elements and histone mRNAs that remained underrepresented in poly(A) libraries. As neoblasts experience high transposon activity this suggests that ribodepleted libraries better reflect the transcriptional dynamics of planarian stem cells. Furthermore, the presented ribodepletion procedure was successfully expanded to the removal of ribosomal RNA from the gram-negative bacterium Salmonella typhimurium. Conclusions The ribodepletion protocol presented here ensures the efficient rRNA removal from low input total planarian RNA, which can be further processed for RNA-seq applications. Resulting libraries contain less than 2% rRNA. Moreover, for a cost-effective and efficient removal of rRNA prior to sequencing applications our procedure might be adapted to any prokaryotic or eukaryotic species of choice.


4
Background 69 Freshwater planarians of the species Schmidtea mediterranea are well known for their 70 extraordinary ability to regenerate. This ability is supported by the presence of a large population of 71 adult pluripotent stem cells, termed neoblasts [1]. Neoblasts are capable of producing all planarian 72 cell types [2]. Moreover, they preserve their potency over the whole lifespan of the animal, which 73 seems to be infinite [3]. Therefore, planarians embody an excellent model to study regeneration, 74 aging and stem cell-based diseases. The phylum Platyhelminthes, to which S. mediterranea belongs 75 to, includes multiple other members showing varying degrees of regenerative abilities. While some 76 freshwater species (e.g. Dugesia japonica and Polycelis nigra) are capable to restore their body from 77 any tiny piece [4,5], others (Procotyla fluviatilis) have limited anterior regeneration abilities [6]. 78 Altogether, the ability to regenerate is not solely based on the presence of pluripotent stem cells, but 79 represents a complex interplay between different signaling pathways. The underlying changes in gene 80 expression therefore need to be studied using transcriptome-wide techniques like RNA sequencing. 81 For any informative RNA-seq library preparation, ribosomal RNA, comprising >80% of total 82 RNA, has to be removed. To achieve this goal two strategies can be pursued: either polyadenylated 83 (poly(A)) RNA transcripts are enriched or rRNA is removed. Both approaches have advantages and 84 limitations. On the one hand, the enrichment of poly(A) transcripts ensures better coverage of coding 85 genes compared to ribodepleted samples, when sequenced to similar depth [7]. However, this 86 advantage is outweighed by the loss of transcripts lacking poly(A) tails, which include preprocessed 87 RNAs, a large share of all non-coding RNAs, such as enhancer RNAs and other long non-coding RNAs. 88 In addition, long terminal repeat (LTR) retrotransposons and various intermediates of endonucleotic 89 RNA degradation are lost during poly(A) selection [8][9][10][11][12][13]. Furthermore, most prokaryotic RNAs lack 90 poly(A) tails, making rRNA depletion crucial for the study of bacterial transcriptome [14]. 91 Here, we describe a probe-based subtractive hybridization workflow for rRNA depletion that 92 efficiently removes planarian rRNA from total RNA. The protocol can be applied to input as low as 100 93 ng total RNA, which corresponds to 100,000 FACS-sorted planarian stem cells (X1 population) [15,16]. 94 Moreover, the DNA probes developed for S. mediterranea were successfully used for the removal of 95 ribosomal RNA in related planarian species of the order Tricladida. The rRNA removal workflow 96 presented here is also easily adapted to other organisms, as demonstrated by the removal of rRNA 97 from Salmonella typhimurium total RNA using organism-specific probes. To deplete ribosomal RNA from planarian total RNA, we chose to develop a protocol based on 118 the hybridization of rRNA-specific biotinylated DNA probes to ribosomal RNA and the capture of the 119 resulting biotinylated rRNA-DNA hybrids by use of streptavidin-coated magnetic beads ( Figure 1A). To 120 that end, we synthesized a pool of 88 3'-biotinylated 40-nt long DNA oligonucleotide probes (siTOOLs 121 Biotech, Martinsried, Germany). We chose probes with a length of 40 nucleotides since their melting 122 temperature in RNA-DNA hybrids was shown to be 80±6.4 °C in the presence of 500 mM sodium ions 123 [17]. This would allow probe annealing at 68°C in agreement with general probe hybridization 124 temperatures [18]. The probes were devised in antisense orientation to the following planarian rRNA 125 species: 28S, 18S type I and type II, 16S, 12S, 5S, 5.8S, internal transcribed spacer (ITS) 1 and ITS 2 126 To assess RNA quality and the efficiency of rRNA removal we used capillary electrophoresis 128 (Fragment Analyzer, Agilent). The separation profile of total planarian RNA only shows a single rRNA 129 peak at about 1500 nucleotides (nts) ( Figure 1B). This single rRNA peak is the result of the 28S rRNA 130 being processed into two fragments that co-migrate with the peak for 18S rRNA [19]. Planarian 28S 131 rRNA processing usually entails the removal of a short sequence located in the D7a expansion 132 segment of 28S rRNA. The length of the removed fragment thereby varies from 4 nts to 350 nts 133 between species (e.g. for Dugesia japonica 42 nts are removed) [19]. Intriguingly, a similar rRNA 134 maturation process was observed in particular protostomes, in insects such as D. melanogaster and in 135 other platyhelminthes [19][20][21]. In addition to this 28S rRNA maturation phenomenon, S. mediterranea 7 possesses two 18S rDNA copies that differ in about 8% or their sequence. However, only 18S rRNA 137 type I was reported to be functional and predominantly transcribed [22,23]. 138 As a first step during rRNA removal all 88 DNA probes were annealed to total planarian RNA. 139 Since RNA molecules are negatively charged, the presence of cations facilitates the annealing of     polyadenylation sites were detected on 16S rRNA. Therefore, we speculate that upon folding of the 235 16S rRNA stretches of A nucleotides become exposed and facilitate the interaction with oligo-dT 236 beads during transcript poly(A) selection. 237 We next assigned the analyzed datasets to the planarian genome. In ribodepleted libraries 238 more than 13% of all mapped reads were assigned to intergenic regions, compared to 7% -10.5% for 239 poly(A)-enriched ones ( Figure 3D). In addition, the percentage of unmapped reads was higher in 240 ribodepleted libraries and constituted about 17.6%, which is on average 2.4% more than in poly(A) 241 datasets. We speculate that for ribodepleted libraries the proportion of reads mapping to intergenic 242 regions will increase in future once complete assemblies of the planarian genome are available. which varied considerably from 13 to 64 million of mapped reads ( Figure 3F). 250 Next, to estimate the correlation between ribodepleted and poly(A) libraries, we calculated 251 their Pearson correlation coefficients ( Figure 3G). We found the highest Pearson correlation between 252 ribodepleted libraries and polyA B2 samples (R=0.94, p < 2.2e-16) ( Figure 3F). This could be due to 253 their similar sequencing depth compared to the other polyA libraries. The transcripts whose 254 abundance was most significantly affected by poly(A) selection were found to be histone mRNAs that 255 are known to lack polyA tails ( Figure 3G

Non-specific depletion of coding transcripts in ribodepleted libraries 269
In using custom ribodepletion probes, our major concern was that the utilized probes would 270 lead to unspecific co-depletion of planarian coding transcripts. To exclude this possibility, we first 271 mapped our pool of 88 DNA probes in antisense orientation to the planarian transcriptome allowing 272 up to 8 mismatches and gaps of up to 3 nts. This mapping strategy requires at least 75% of a DNA 273 probe to anneal to its RNA target. It resulted in only 11 planarian genes to be potentially recognized 274 by 20 DNA probes from our oligonucleotide pool. Next, we carried out a differential expression 275 analysis of these 11 potentially targeted transcripts between the ribodepleted libraries and poly(A)-276 selected ones. The analysis revealed that 9 out of 11 potential targets were downregulated at least 1-277 fold in at least two poly(A) experiments ( Figure 4A). As the abundance of three transcripts 278 respectively) the probes were predicted to map to loci that do not exhibit RNA-seq coverage ( Figure  288 4C, Supplemental Figs. S1C, S1D). The likely reason for this is inaccurate gene annotation. 289 Alternatively, target regions might represent repetitive, multimapping sequences, which we excluded 290 during read mapping. Taken together, our off-target analysis revealed that a maximum of 11 genes 291 might be affected by our rRNA removal procedure -a very low number that underscores the 292 specificity and efficiency of our depletion protocol. 293 294

Applicability of the described ribodepletion method to other organisms 295
To demonstrate the applicability of the developed rRNA workflow to other organisms, we 296 employed our protocol to the depletion of ribosomal RNA from Salmonella typhimurium using a pool 297 of organism-specific DNA probes (riboPOOL) developed by siTOOLs Biotech (Martinsried, Germany) 298 ( Figure 5A). We compared the libraries resulting from the application of our newly developed 299 procedure to the established rRNA depletion workflow that utilizes the Ribo-Zero rRNA Removal Kit 300 (Bacteria) from Illumina. Removal of rRNA from a S. typhimurium sample using riboPOOL probes was 301 comparably successful as a depletion reaction using Ribo-Zero, leaving as low as 3.4% rRNA in the final 302 library ( Figure 5A). Moreover, an overall comparison of gene expression levels showed a high 303

correlation (Pearson correlation > 0.98) between riboPOOL depleted libraries and libraries prepared 304
with the Ribo-Zero kit ( Figure 5B). Taken together, the rRNA depletion workflow described in this 305 manuscript is robust and easily applicable to any bacterial and eukaryotic species of choice utilizing 306 organism-specific probes. 307 308

Discussion 309
For samples from typical model organisms, such as human, mouse and rat, there are 310 numerous commercial kits available for the removal of rRNA, e.g. NEBNext from New England Biolabs, 311 RiboGone from Takara and RiboCop from Lexogen. This also applies to typical gram-positive and 312 gram-negative bacteria (MICROBExpress from Thermofisher and Ribominus from Invitrogen). 313 Moreover, these kits can be utilized with a certain degree of compatibility for the depletion of rRNA in 314 organisms of distant phylogenetic groups (e.g. RiboMinus Eukaryote Kit for RNA-Seq, Invitrogen). Over and above, we found ribodepleted libraries to contain additional information on histone mRNAs, 326 transposable elements (of which many do not have polyA tails) and transcriptional intermediates. This 327 highlights that ribodepletion better retains the dynamic information of transcriptomes. As more and 328 more co-translational decay pathways are being discovered [48], understanding the dynamic of RNA 329 degradation will become more important in the near future. Moreover, by successfully depleting 330 rRNA from other freshwater triclad species, we could demonstrate the versatility of the DNA probes 331 designed for S. mediterranea. Last, we validated the efficiency of the developed workflow by removal 332 of rRNA in the gram-negative bacterium S. typhimurium. Therefore, the proposed workflow serves as 333 an efficient and cost-effective method for rRNA depletion in any organism of interest. 334 335

Conclusions 336
This study describes an rRNA depletion workflow for the planarian model system S. 337 mediterranea and related freshwater triclads. It is based on the hybridization of 40-mer biotinylated 338 DNA oligos to ribosomal RNA followed by the subtraction of formed DNA-RNA hybrids. The protocol is 339 very robust and ensures the efficient removal of rRNA even from low input total RNA. Moreover, we 340 suggest the general applicability of the presented workflow to any prokaryotic or eukaryotic 341 organisms by using organism-specific pools of probes.