- Methodology article
- Open Access
Comprehensive assessment of multiple biases in small RNA sequencing reveals significant differences in the performance of widely used methods
BMC Genomics volume 20, Article number: 513 (2019)
RNA sequencing offers advantages over other quantification methods for microRNA (miRNA), yet numerous biases make reliable quantification challenging. Previous evaluations of these biases have focused on adapter ligation bias with limited evaluation of reverse transcription bias or amplification bias. Furthermore, evaluations of the quantification of isomiRs (miRNA isoforms) or the influence of starting amount on performance have been very limited. No study had yet evaluated the quantification of isomiRs of altered length or compared the consistency of results derived from multiple moderate starting inputs. We therefore evaluated quantifications of miRNA and isomiRs using four library preparation kits, with various starting amounts, as well as quantifications following removal of duplicate reads using unique molecular identifiers (UMIs) to mitigate reverse transcription and amplification biases.
All methods resulted in false isomiR detection; however, the adapter-free method tested was especially prone to false isomiR detection. We demonstrate that using UMIs improves accuracy and we provide a guide for input amounts to improve consistency.
Our data show differences and limitations of current methods, thus raising concerns about the validity of quantification of miRNA and isomiRs across studies. We advocate for the use of UMIs to improve accuracy and reliability of miRNA quantifications.
Research of miRNA expression has been instrumental in identifying miRNAs involved in development and diseases , and identifying expression-signatures for use as biomarkers [2,3,4]. Small RNA sequencing (sRNA-seq) allows for detection of novel miRNAs and altered canonical miRNA sequences, termed isomiRs [5,6,7]. These miRNA isoforms are produced by many mechanisms, including shifts in Drosha and Dicer cleavage sites of the pri- and pre- miRNA sequence, as well as trimming by exoribonucleases, additions of bases by nucleotidyl transferases, and RNA editing by adenosine deaminase acting on RNA (ADAR) enzymes . isomiRs show differences in stability, localization, and functionality. Some isoforms can even regulate alternative repertoires of mRNAs as compared to the canonical sequence [6, 8,9,10,11,12]. Recent research indicates that they are of clinical importance for many diseases and conditions, including cancer , diabetes , and Huntington’s disease . Despite the enhanced capability to detect such sequences, there has been little assessment of the quantification of isomiRs using sRNA-seq; and miRNA quantifications using this method are often inconsistent across studies . This is likely in part due to differences between methods and/or variation in the detection by individual methods  (from library preparation to preprocessing to normalization, etc.). Furthermore, several aspects of sRNA-seq can lead to the preferential quantification of some miRNAs, while other miRNAs may be poorly detected or not detected at all, thus introducing biases that lead to misrepresentations of true miRNA expression levels . Evaluations and comparisons of the accuracy (how close measurements are to the truth) and consistency (how close measurements are across replicates) associated with current methods are critical for proper cross-study interpretation and for guiding methodological improvement.
Evidence suggests that biases and inconsistencies in sRNA-seq based quantifications and group comparisons are largely based on study design and library preparation methods . The details of these issues have been reviewed elsewhere [16, 19,20,21,22,23,24,25]. Some of these issues are avoidable with proper study design. However, bias and inconsistency related to adapter ligation, cDNA synthesis, and amplification may principally be dependent on library preparation and preprocessing methods, which are less readily controlled.
A considerable number of studies have evaluated adapter ligation bias in quantifications from several commercially available kits [24, 26,27,28]; however, to our knowledge, only one study has directly compared the performance of randomized adapter methods and adapter ligation-free methods . Furthermore, limited studies have investigated the influence of reverse transcription or amplification bias in sRNA-seq [29,30,31,32] and no study to date has evaluated the use of unique molecular identifiers (UMIs) in order to identify and remove duplicate reads to mitigate such biases in sRNA-seq biological samples. While amplification bias is associated with variations in sequence length and GC content , and while the use of UMIs has proven to be useful in mitigating amplification bias for traditional RNA sequencing [34, 35] and has become increasingly popular, the exact implementation of this method is still unresolved in the RNA sequencing field at large . It remains unclear if UMIs would also be useful for small RNA sequencing as this has not been explored to date. The consensus in the field about PCR amplification bias in small RNA has been divided. One view holds that amplification bias appears to be minimal in sRNA-seq. They argue this because small RNA sequences are very similar in size, studies show that bias in sRNA-seq largely appears to be due to adapter ligation bias [18, 37, 38], and studies show no distortion of quantification results with excessive numbers of PCR cycles compared to more reasonable numbers of cycles [31, 32]. However, others suggest that amplification bias in sRNA-seq could also introduce bias as it does in traditional RNA sequencing (10), especially given the large range of GC content among miRNAs. While a couple of studies have used UMIs in sRNA-seq [29, 30], only one has evaluated the reproducibility of sRNA-seq quantifications obtained from utilizing UMIs to those without , in which the authors concluded that biological technical replicates had less variation when UMIs were used to remove duplicate reads compared to when either all or no duplicates were removed. However, no statistical tests were performed in this assessment, and no evaluation of the influence of the UMI deduplication on the accuracy of the quantifications was performed. In the sRNA-seq literature there has also been little assessment of the influence of starting amount on the consistency of quantifications. While RNA editing detection has been evaluated , other aspects of isomiR quantification have not yet been performed.
To complete the gap left by previous studies, we comprehensively evaluated bias among miRNA and isomiR quantifications from four commercially available library preparation methods, as well as those obtained following the removal of duplicate reads using UMIs. We also evaluated the consistency of the results using a variety of starting amounts. We assessed the similarity of the quantifications from each method, the diversity of the detection of different types of small RNAs by each method, as well as the accuracy and the consistency of the results obtained from each method within and across batch. Such evaluations are critical for optimizing sRNA-seq methods to obtain both reliably consistent and accurate results across batches and studies, and to therefore allow for more accurate and reproducible miRNA and isomiR quantifications in disease states and conditions. Based on these results, we offer suggestions for future study designs.
In this study we evaluated the influence of several potential sources of bias and inconsistency on miRNA quantifications (Fig. 1a) by comparing the performance of four commercially available kits including two that are designed to mitigate adapter ligation bias in different ways (Fig. 1b) and two preprocessing methods including one to mitigate reverse transcription and adapter ligation bias and a control for comparison (Fig. 1c), as well as various starting amounts (100 ng to 2000 ng) for each method to determine the reliability of results achieved with smaller inputs (Fig. 1d). The following library preparation kits were compared: 1) the Clontech SMARTer smRNA-Seq Kit for Illumina, now owned by Takara Bio (Clontech), which incorporates adapter and index sequences during reverse transcription and amplification and is therefore ligase-free to avoid adapter ligation bias; 2) the Bioo Scientific NEXTflex Illumina Small RNA Sequencing Kit v3 (NEXTflex), now owned by Perkin Elmer and called NEXTFLEX, which utilizes adapter sequences with random nucleotide sequences adjacent to the miRNA binding location giving each miRNA a variety of adapter sequences to bind to avoid adapter ligation bias; 3) the Illumina TruSeq Small RNA Library Prep Kit (Illumina); and 4) the New England BioLabs Next Multiplex small RNA kit (NEB). Based on our literature search Illumina and NEB sRNA-seq kits appear to be the first and second most widely used kits to date, respectively. The NEB and the NEXTflex kits include polyethylene glycol in an effort to reduce adapter ligation bias by improving overall ligation efficiency.
We also evaluated the influence of reverse transcription and amplification bias by utilizing the random sequences within the adapters of the NEXTflex kit (that are added prior to the cDNA synthesis and PCR amplification steps), as UMIs. These UMIs allow for the removal of duplicate reads introduced during amplification and possible mitigation for sequences that may have been preferentially reverse transcribed (Fig. 1c). We will hereinafter refer to these data as “Deduped”. To determine if differences identified between the Deduped data and the NEXTflex data were simply due to a reduction in the number of reads (as the UMI-based deduplication process reduces the data down to 5% of the original), we also included a random 5% subset of the NEXTflex reads, hereinafter referred to as “Fivepercent,” for comparison.
We evaluated two types of samples (Fig. 1d-e) and processed the data following the methods outlined in Fig. 1e. See the methods for more details of our experimental approach. We then evaluated several questions shown in Fig. 1f about the similarity of the quantifications obtained from the 6 tested methods (Fig. 1b), the accuracy of those quantifications using synthetic miRNAs in equimolar concentration, the ability of each method to detect a variety of miRNAs and isomiRs, and the consistency of the quantifications by each method of technical replicates within the same batch and across two batches.
Overall quantifications are similar, yet results for individual miRNAs are quite divergent across methods
We first performed a general evaluation of the similarity of the resulting miRNA quantifications from each method (Fig. 1b) and of major contributors to overall variability using the data derived from the same human brain sample across technical replicates (Fig. 1f.Similarity). Hierarchical cluster analysis indicated that the samples generally clustered by method and starting amount (Fig. 2a). Differential analysis of the miRNA expression estimates revealed that the methods (Fig. 1b) produce overall relatively similar results, however some individual miRNAs showed very different quantifications with intensity ratios ranging as extreme as − 9 to 6 (Fig. 2b).
Evaluating the top 20 abundant miRNAs from each method (Additional file 1: Table S1), only 6 miRNAs (30%) overlapped across all methods (however the top 20 for Fivepercent were identical to the top 20 from the raw full NEXTflex data). Thus, emphasizing only the most abundant miRNAs for further study may be problematic. The overlap between the most abundant miRNAs detected by Clontech and the other methods was lower (45 to 55%) than the overlap between Illumina, NEB, and NEXTflex (60–65%). The Deduped method resulted in an 85% overlap with the raw NEXTflex data.
Sum of squares analysis revealed that method choice was the largest contributor to miRNA count variability (on average 82% variance explained for individual miRNAs (Fig. 2c) when evaluating the data from all methods (excluding the Fivepercent control). This further exemplifies the lack of consistency in quantifications that may occur when different methods are utilized.
The reduction of numerous biases improves accuracy
To assess the accuracy of each method (Fig. 1f.Accuracy), we investigated how consistently each kit detected 962 equimolar synthetic miRNA sequences. We calculated the difference of each miRNA count from the mean count for all miRNAs for each method, which we called “accuracy error”. The six methods showed significant differences in accuracy (F = 40.00, p < 2.2e-16). The Deduped data had significantly less accuracy error compared to all other methods (up to ≈ 8% less error), followed by comparable accuracy for Clontech, NEXTflex, and Fivepercent methods (which did not significantly differ from one another in post hoc analysis), and worse accuracy for the NEB and Illumina methods. This suggests that the Illumina and NEB methods detect different sequences with less validity than the other methods. This was expected, given the known adapter ligation bias associated with these methods. Our results suggest that the methods utilized by the Clontech and NEXTflex kits both diminish bias – we speculate that this is due to a reduction in adapter ligation bias as compared to the Illumina and NEB methods. This is based on findings of previous evaluations of adapter ligation bias  demonstrating that randomized adapters, such as those used in the NEXTflex protocol, reduce adapter ligation bias and the fact that the Clontech method is adapter ligation free. However, mitigation of other forms of bias may also contribute to the improved accuracy. Using UMI sequences for deduping resulted in additional error reduction (the raw NEXTflex data had 2.81% more error) (Fig. 3a and Additional file 2: Table S2), which may be due to a reduction in reverse transcription and/or amplification bias. This is consistent with our analysis of the overall variance of the counts for these synthetic sequences (Fig. 3b). The concordance of the rank of the sequences with higher accuracy error across the methods was poor (data not shown), suggesting that different sequences were prone to bias for each of the methods.
We thus analyzed the overall contribution of different sequence characteristics to the variance of the count estimates of the synthetic miRNAs and found that indeed different characteristics were associated with variability for the different methods (Fig. 3c, Fig. 3d). The secondary structure free energy was highly influential for Clontech (explaining ≈ 7% of the variance), and the NEXTflex-based methods (explaining ≈ 10% of the variance for each). The identity of the last 2 bases was influential for all methods but in particular for the NEB and Illumina methods (explaining ≈ 6% of the variance for each), suggesting that adapter ligation of the 3′ end particularly introduces bias of miRNA quantifications, in agreement with previous work . The identity of the first 2 bases (5′) was most influential for Clontech and explained ≈ 8% of the variance suggesting that the SMART template-switching of the 5′ end may introduce more bias. The number of Cs within a sequence also accounted for a relatively large percentage of the variance (2.5–5% for all methods except for Clontech). Interestingly, GC content only accounted for ≈ 1% of the variance for each method. See Additional file 3: Figure S1 for more detailed information about these sequence characteristics and their influence on quantification estimates. See Additional file 4: Figure S2 for secondary structures of specific miRNAs detected above and below average for all methods, as an illustration of how secondary structure may influence quantification.
Detection of RNA classes
Libraries generated using the Clontech kit had very low miRNA mapping rates
We next assessed the percentage of reads that mapped to miRNA or other small RNA species for each of our brain-derived samples using bowtie  (Fig. 1f. Detection Diversity). We excluded the Deduped data and its control, as alignment was required to produce these data. There was a significant difference in the miRNA mapping rate of the 1000 ng starting input data across the kits (F = 108.9, p-value = 5.73e-09). The NEXTflex and NEB methods had the highest rates, while the Clontech method had a significantly the lower mapping rate, with only 1–2% of all reads mapping to miRNAs (Fig. 4a-b), as previously described  (Additional file 5: Table S3). There was a significant difference for all the tested types of RNA across the methods except for small Cajal body-specific RNAs (scaRNA) after multiple testing correction. The Clontech reads largely aligned to ribosomal RNA (rRNA) and had significantly higher rates of small nucleolar RNAs (snoRNA) and small nuclear (snRNA) mapping than the other methods, while the NEXTflex method resulted in the largest number of P-element induced wimpy testis (PIWI)-interacting RNA (piRNA) reads (Additional file 6: Table S4). All of the kits had quite consistent mapping rates across the various starting amounts (Fig. 4b). Mapping rates of the synthetic RNA were much more comparable among the methods, suggesting that the differences seen with the biological samples are largely due to differences in detection of other biological RNAs (Additional file 5: Table S3).
Detection of unique miRNAs
The deduped data and Clontech data had better detection rates
To discern if any of the methods have an advantage in detecting a diversity of unique miRNAs, we compared the detection rate of miRNA sequences (Fig. 1f. Detection Diversity). Here we define a miRNA as detected if the miRNA was present with at least 10 normalized reads in the quantifications for each of the triplicates of the 1000 ng batch 1 brain data. The number of detected unique miRNAs was highest in the Deduped data, and lowest in the Fivepercent and Illumina data (Fig. 4c), which was consistent when including the second batch (Additional file 7: Table S5). Despite the low mapping rate of the Clontech samples, the miRNA diversity detected by this kit was relatively comparable to that of the other methods tested. Since both the Deduped and the Fivepercent data also included only 5% of the total raw NEXTflex reads, both of these methods also resulted in a much lower number of reads that could map to miRNA. The similarity of the detection rates of all the methods despite the large difference in miRNA mapping rates is due to the DESeq2  normalization strategy utilized, which accounts for differences in library composition, and the high sequencing depth. An analysis of subsamples containing only 10 million, 5 million or 1 million reads of the Clontech data resulted in lower detection diversity (Additional file 8: Table S6).
Using the data from all starting amounts, there was a significant difference in the number of detected miRNAs across methods (F = 7.69, p-value = 0.00017), however pairwise comparisons were largely nonsignificant (except between Clontech and Fivepercent, p = 0.0024). There was a weak but significant positive relationship (r = 0.4, p-value = 0.027) between detection diversity and input amount (Additional file 9: Figure S3) using all methods. Thus, as anticipated, larger inputs resulted in a more diverse pool of detected unique miRNAs; however, the pool size did not differ greatly. When evaluating each kit individually, only the Deduped method had a significant (p-value = 0.009) and strongly positive relationship between starting amount and the number of unique detected miRNAs (r = 0.92).
The Clontech method was significantly worse than the others
To determine how well each method consistently detected the same miRNAs (Fig. 1f. Detection Diversity), we calculated the proportion of miRNAs detected for each sample that were not detected by the other two samples within the triplicates as a measure of detection inconsistency using the 1000 ng input data (Fig. 4d). There was a significant difference in the inconsistency of detection overall between the methods (F = 12.27, p-value = 0.0002), and although no individual contrasts between pairs of methods were significant in post-hoc analysis, the Clontech data resulted in the highest level of inconsistency, and NEB performed the best, with the lowest level of inconsistency.
Analysis of the full set of data including all starting amounts (Additional file 9: Figure S3) demonstrated a significant difference in the inconsistency across methods (F = 14.83, p-value = 1.65e-10) and starting amounts (t = − 3, p-value = 0.00257, Pearson r = − 0.31). There was significantly more inconsistency for the Clontech method compared to all other methods (up to 240%) except the Fivepercent control method (Additional file 10: Table S7). This suggests that although the Clontech level of detection may have been rescued by the high depth of sequencing, the low mapping rate may still result in much poorer consistency of detection.
Detection of isomiRs
The methods detected significantly different numbers of isomiRs – the Clontech method detected the most
We next evaluated the isomiR detection rate of each method (Fig. 1f. Detection Diversity). We define an isomiR as detected if it had greater than 100 normalized reads in all triplicates for each method of the 1000 ng input data. We observed the largest number of unique isomiR sequences in the Clontech data and the lowest in the NEXTflex data (Fig. 4e). When evaluating detection across both batches, the Clontech data remained the most diverse (with the greatest number of isomiRs consistently detected in both batches), while the Illumina method detected the lowest number of unique isomiRs (Additional file 7: Table S5). Using all the data derived from all the starting amounts, we determined that there was a significant difference in the number of isomiRs detected across the methods (F = 83.5, p-value = 8.89e-15), but not across starting amounts (Additional file 9: Figure S3). Clontech detected the largest number (up to 250% more), followed by NEB (up to 169% more) and Illumina (up to 147% more), while the NEXTflex based methods similarly detected the least (Additional file 11: Table S8).
When evaluating the consistency of isomiR detection (Fig. 4f, Additional file 9: Figure S3), there was a significant difference in the triplicate consistency of detection (F = 5.9, p-value = 0.006), but again no individual contrasts between pairs of methods were significant. For the 1000 ng input data, the Illumina data had the highest inconsistency, while the NEB data had the least.
Despite different miRNA mapping rates, all the methods capture overlapping miRNAs but very few overlapping isomiRs
We next characterized the overlap of unique miRNA sequences captured by each method (Fig. 1f. Detection Diversity). Evaluating the miRNAs consistently detected by all 1000 ng triplicates of the first batch, we determined that in general a large proportion of the miRNAs were commonly detected by all of the methods (on average 74%), and only a small fraction of miRNAs was uniquely detected by a single method (4.7% on average) (Fig. 4g). In contrast, on average only 5% of the detected isomiRs by each method overlapped those detected by all the other methods (Fig. 4h).
Detection of false positive isomiRs
All methods detected false isomiRs, but especially Clontech and NEB
To assess the possibility that the methods may result in false isomiR detections, we evaluated the presence of isomiRs in the synthetic miRNA data (which has no designed isomiRs) (Fig. 1f. Detection Diversity). False isomiRs were detected by all of the tested methods. We compared the methods based on: 1) the number of overall unique detected isomiR sequences, 2) the number of unique isomiR sequences detected for each individual synthetic sequence, and 3) the quantifications of the individual false isomiRs. There was a significant difference in the number of isomiRs detected for each synthetic sequence by the methods (F = 176.37, p-value = < 2.2e-16). The Clontech method detected more unique isomiRs overall than all the other tested methods (on average 401% more false isomiRs); 14,130 were observed for Clontech, 5021 for Illumina, 9074 for NEB, 2221 for NEXTflex, 2213 for Deduped, and 1779 for Fivepercent. The number of unique isomiRs observed for each synthetic sequence was also significantly higher. On average the Clontech method resulted in 14 isomiRs per synthetic sequence, while the NEXTflex-based methods (the raw NEXTflex data, the Deduped, and the Fivepercent) resulted in roughly 2 isomiRs per synthetic sequence (Fig. 4i, Additional file 12: Table S9). The counts observed for the individual isomiRs detected were significantly higher for NEB than all the other methods (with 60% higher expression than the isomiRs detected by the NEXTflex based methods) (Fig. 4j). The NEXTflex methods (raw, Deduped, and Fivepercent) resulted in the fewest isomiRs detected, with the fewest isomiR counts per synthetic sequences, and with the lowest expression. There was no difference between the Deduped and the raw NEXTflex data for the expression of the isomiRs or in the number detected per synthetic sequence (Additional file 12: Table S9).
Sequence feature analysis revealed that the identity of the first two bases (5′) accounted for most of the variance in the number of isomiRs detected for each synthetic sequence for the Clontech kit (accounting for nearly 9% of the variance) (Fig. 4k). Therefore, false positive isomiRs may be generated during the reverse transcription step of the library preparation for this method. This is consistent with other studies that suggest that the template-switching reverse transcription method utilized by Clontech of the 5′ end can lead to shortened miRNA transcripts in a process called strand invasion  and potentially longer miRNA transcripts due to concatamers of the template-switching oligo . In contrast, the last 2 bases (3′) accounted for the largest amount of variance of the other methods (on average 5.3%).
Consistency across batch
Illumina had the lowest consistency, while the other methods performed similarly
To determine the consistency of results obtained across batches for each method, we compared the mean of the triplicates in one batch to a second batch of a single sample of the same human brain (Fig. 1f.Consistency). Using the normalized and log transformed reads for miRNAs that were found to be detected by each kit when filtering for greater than 10 reads across all 4 samples for each kit, we calculated the distance from the mean of the two batches for each detected miRNA individually. Overall, method choice had a weak significant association with error across batch (F = 2.39, p-value =0.036). This association was driven by the batch error of the Illumina method which was significantly higher than the other methods, with up to 74% more error than other methods (Fig. 5a, Additional file 13: Table S10). NEB, NEXTflex, and the Deduped data were the most consistent across batch, with no significant difference in the performance of these methods (p > 0.05). The top miRNAs showed some level of concordance across the methods (data not shown).
Consistency across triplicates
Clontech and Illumina had the lowest consistency
We then evaluated the consistency of the triplicates (Fig. 1f.Consistency) within the 1000 ng data, by calculating the distance of each triplicate from the mean of all three triplicates. We determined that there was a significant difference across the methods (F = 36.7, p-value = < 2.2e-16). Consistency was significantly higher for the NEB and Deduped methods, while Clontech, Illumina, and the random Fivepercent had the lowest consistency (with ≈ 20–40% more error, Fig. 5b, Additional file 14: Table S11). Deduping of the NEXTflex data improved consistency. The raw data had 14% more error between triplicates.
We then calculated the triplicate consistency for each starting amount. We determined that using all starting amount data, there was still a significant difference in triplicate consistency between the methods (F = 79.7, p-value = < 2.2e-16), but there was no relationship with starting amount (Pearson r = − 0.11) (Fig. 5c). All pairs of methods were significantly different, except for the contrasts between NEB and Deduped and between Clontech and Illumina (Fig. 5c, Additional file 14: Table S11). NEB and Deduped again had the greatest consistency (up to 23% less inconsistency) and Clontech and Illumina had the least (≈17% more inconsistency).
Factors affecting consistency
The most abundant miRNAs were the most inconsistently detected for each method
To determine if any aspect of the miRNA sequences was associated with more or less consistency across batch (Fig. 1f.Consistency), we evaluated the association of various sequence factors with the batch error estimate. For each method, the expression of each individual miRNA was the largest contributor to variance of batch error (Fig. 5d-e). All methods showed a significant and positive relationship between expression and inconsistency across batch (Pearson r > = 0.83 for all methods), Fig. 5f.
Evaluating sequence characteristic associations with triplicate consistency, again, expression was the largest contributor to variance of error estimates (Fig. 5g-h) and again all methods showed a significant positive association with expression and inconsistency across triplicates (Pearson r > 0.82 for all methods), Fig. 5i.
To determine if the same miRNAs showed high error across the starting amounts or methods, we ranked the triplicate consistency error estimates and compared the ranks between the starting amounts of a given method and between methods (data not shown). The concordance of the ranks between starting amounts and methods was highest among the sequences with the highest error with roughly 40% concordance.
Consistency and its relationship to starting amount
There was no improvement in consistency beyond 500 ng of total RNA for most methods
Using data normalized and filtered for greater than 10 normalized reads for each method individually, we further evaluated the influence of starting amount on consistency across triplicates (Fig. 1f.Consistency). Overall, starting amount was significantly associated (p < 0.05) with triplicate consistency error for each method except for Illumina, which is likely due to the fact that fewer starting amounts were assessed for this method, Fig. 5c, Additional file 15: Table S12. The results suggest that a larger starting amount will generally improve consistency, see Additional file 15: Table S12 for specific guidance for each kit. For most methods the highest consistency with the lowest starting amount was achieved with 500 ng, however, 1000 ng improved the consistency of the Deduped data. The consistency was relatively similar for all the Clontech kit samples regardless of starting amount.
We report an extensive evaluation of commonly used sRNA-seq kits for their performance in identifying and quantifying miRNAs and isomiRs, as well as the results obtained with the use of a UMI and a UMI control. Our detailed analyses identify critical factors that influence their performance. Prior performance evaluations of current sRNA-seq methods have been very limited and adapter ligation bias has largely been the focus of earlier reports [26, 31, 38, 43]. Several studies have compared the NEXTflex kit with the Illumina and NEB kits [24, 26,27,28, 44], and most suggest that the NEXTflex kit offers advantages due to reduced adapter ligation bias by including randomized adapters. We compared the NEXTflex kit with the Clontech kit which was also designed to mitigate adapter ligation bias, but by using an adapter ligation-free method. Only one prior study has compared the performance of these two kits using a previous and now discontinued version of the NEXTflex kit , which demonstrated that the Clontech kit resulted in less bias, however, only 6 synthetic miRNAs were utilized in their accuracy assessment. A recent study that coincided with ours also found that these two kits perform similarly for accuracy . A similar UMI method is utilized by a recent library preparation kit by Qiagen. However, this kit was released after the data collection of our analysis. In addition, this kit, similarly to the NEB and Illumina kits, does not include methods to reduce adapter ligation bias, and the UMI is added after reverse transcription, which therefore does not allow for any reduction in bias associated with this step. The results of a recent study, which performed a similar analysis as ours, further suggest that the Qiagen kit has more bias and is less accurate than the Clontech and NEXTflex Kits .
To better assess the contributions of bias in the quantifications resulting from various library preparation designs, we have evaluated the quantifications from each method using a variety of metrics including: 1) Similarity – how similar are the quantifications across methods (Fig. 1f.Similarity); 2) Accuracy – how well does each method equally quantify different equimolar miRNAs (Fig. 1f.Accuracy); 3) Detection diversity – what capacity does each method have to capture a diverse range of unique small RNAs (Fig. 1f.Detection Diversity); and 4) Consistency – how similar are results across batches, triplicates, and different starting inputs (Fig. 1f.Consistency). Our analysis of individual sequences using the metric tests provide information about potential bias due to adapter ligation, reverse transcription, and amplification. Table 1 summarizes our results. Overall, there are clear and important differences between the methods tested and all show performance limitations in real world sRNA-seq. Based on our results, we propose a number of suggestions for future studies.
First, cross-study comparisons using different methods should be viewed with skepticism, because although the kits resulted in fairly similar results overall, quantifications of individual miRNAs, including the most abundant miRNAs, varied widely across methods (Additional file 1: Table S1, Fig. 2b). More research is required to determine how to best utilize data derived from different sRNA-seq methods for mega- and meta-analyses. We also advise against further study of only the top expressed miRNAs from a single sRNA-seq study, particularly when a more biased method is utilized, as the top observed miRNAs may not be truly among the most abundant or influential, but instead those that are preferentially observed by the method. This issue has previously been discussed at length16. It is important to note that it remains unclear how relatively abundant a miRNA needs to be to exert biological importance in different contexts.
We suggest the use of a degenerate base method, such as NEXTflex or a ligation-free method, to improve accuracy. These methods appeared to equally improve accuracy, likely due to a reduction of adapter ligation bias (Fig. 3a-b). We suggest that future small RNA studies utilize a UMI strategy, as our NEXTflex data preprocessed to account for UMI duplicates was even more accurate, reducing the overall variance of the log2 transformed and normalized quantifications by 68%, or on average the difference from the mean for each miRNA by nearly 3% (Fig. 3a-b, Additional file 2: Table S2). We speculate that our deduplication method led to such improvements due to reduced reverse transcription and/or amplification bias. Our sequence-specific analysis further indicated that secondary structure of miRNAs was one of the largest contributors to error of the Clontech and NEXTflex kits for the accuracy assessment, which appeared to be mitigated in the UMI deduped data for the NEXTflex kit (Fig. 3c-d). This suggests that the secondary structure of miRNA sequences may be particularly influential for reverse transcription and/or amplification bias, in agreement with previous work that indicates that secondary structure can indeed influence reverse transcription . More work is required to determine the extent that amplification or reverse transcription are particularly contributing to bias, and to what extent each are mitigated by the use of UMIs. Furthermore, it is unclear if the use of deduplication would improve other methods beyond the performance level of the current NEXTflex kit. However, the UMIs are inherently already included in the NEXTflex adapters, making this one of the best current options to mitigate bias in sRNA-seq.
All of the methods tested were capable of detecting a diverse range of miRNA sequences and there was a high degree of overlap in the identity of the miRNAs detected by each method (Fig. 4c-e). Therefore, any of the tested methods may be appropriate for assessments about general miRNA diversity. However, the identity of miRNAs detected by Illumina varied greatly across batch, Additional file 7: Table S5). We observed greater resolution for detection of a larger variety of miRNAs with greater sequencing depth. We did not evaluate depths above 20 million reads, so it remains unknown if even greater resolution can be achieved beyond this depth, however subsets of our 20 million depth data resulted in a reduction of diversity. We also observed that a more diverse pool was detected with larger input amounts; therefore, for the best diversity of detection, we recommend using the largest input possible.
The Clontech kit resulted in the largest percentage of reads mapping to snoRNAs and snRNAs, while the NEXTflex kit resulted in the largest percentage of piRNA mapping reads (nearly 4.2 times higher than Clontech) (Fig. 4a-b, Additional file 6: Table S4). Therefore, if these particular small RNAs are of interest, we would suggest the use of these two kits respectively. We did not evaluate the diversity of these other classes of small RNAs; however, given the results of our miRNA analysis, we predict that deeper sequencing will result in greater resolution of diversity.
We especially suggest using randomized adapter methods, such as NEXTflex, for studies involving isomiR analysis. We suggest that all isomiR studies utilize an additional method for validation such as hairpin probe based RT-qPCR methods like dumbbell PCR  or the two-tailed qPCR method  or northern blot methods , as all the utilized RNA sequencing methods resulted in the observation of false isomiRs, this is particularly important for the methods that resulted in larger numbers of false isomiRs. We acknowledge that some observed isomiRs in the synthetic data may be due to errors in the synthesis of the synthetic miRNA pool, however the differences between the methods suggests that some methods may detect more false isomiR sequences due to improper adapters or other features of the library preparation. In particular, the Clontech method resulted in the highest level (on average 401% more than the other methods), thus we do not suggest that others utilize this method for studies that aim to evaluate isomiR expression (Fig. 4h-j). Furthermore, because this method utilizes polyadenylation of the 3′ end, it is impossible to truly distinguish isomiRs that terminate with 3′ adenine bases. We speculate that some of the false isomiRs detected in the other methods may also result from PCR stutter, in which sequences with repeats or low entropy may experience deletions or expansions during PCR amplification . In all, the Deduped method resulted in the highest number of detected miRNAs with the lowest false isomiR detection (Fig. 4h-j). Therefore, of the tested methods, we suggest that the Deduped method may be the best for detecting the most diverse and reliable set of miRNAs. Further work is necessary to determine the potential sources that result in the detection of these false isomiRs and how this may influence the quantification of isomiRs in biological samples.
The Deduped method was also the most consistent for individual miRNA quantifications across triplicates within the same batch (Fig. 5b) Therefore, we suggest the use of this method for optimal consistency. In general, we particularly caution against the use of Illumina when multiple batches of sequencing will be involved in a study, as this method resulted in significantly more inconsistent results across batches relative to all the other tested methods (Fig. 5a, Additional file 7: Table S5).
An earlier analysis determined that detection consistency was poorer with much smaller starting amounts (10 ng) . Agreeing with this, our results indicate that larger starting amounts for some methods may mitigate inconsistent quantifications of miRNAs and isomiRs.
Overall, we observed the most consistent results across triplicates when utilizing 500 ng or greater of starting input. In most cases, 500 ng was sufficient, and no improvement was achieved with higher input amounts. However, the Deduped method performed best with at least 1000 ng and the Clontech method resulted in similar levels of consistency despite the use of smaller inputs (Additional file 15: Table S12). Thus, if differing starting amounts or smaller starting amounts are required, and interest in isomiRs is limited, the Clontech method may be the best choice.
Additional studies of UMI use for other library preparation methods and across biological samples are necessary to further understand the ability of UMIs to improve the consistency and reproducibility of sRNA-seq studies. Further work is also necessary to optimize the length of the UMI. In some cases, all UMIs will become saturated if a given small RNA is very highly expressed. Our calculations suggest that this UMI length is sufficient for the brain (using our current methods), in which miRNA make up a very small fraction of the total RNA and in which our data suggested that the most abundant miRNA represented only 11% of the miRNA reads. However longer UMIs may be required for tissues with greater enrichment of miRNAs or greater enrichment of other small RNAs of interest, where single RNAs may have more than 65,536 individual copies before amplification (see Additional file 16: Note S1, which refers to Additional file 17: Table S13).
In conclusion, we observed significant differences in the accuracy, detection, and consistency of the various sRNA-seq methods tested suggesting that the methods differ in terms of bias contributed by adapter ligation, reverse transcription, and amplification. Our results underscore the importance of the library preparation methods and suggest that with moderately large starting amounts, the NEXTflex kit with deduplication may produce the least biased and most consistent results within and across studies. Our results suggest that bias is introduced in sRNA-seq due to reverse transcription and/or amplification and that the use of UMIs should be considered for further optimization to mitigate these biases in future sRNA-seq studies. Additional work is needed to decipher the role of these biases in sRNA-seq in order to guide more accurate sequencing methods. Further, and perhaps most noteworthy, our results indicate that all methods may result in false isomiR detection, stressing the importance of validation assessments for isomiR studies, and that the Clontech template switching method results in the detection of substantially more false isomiRs. Therefore, we advise caution with isomiR quantifications particularly when using this method and we suggest the use of RT-qPCR [47, 48] or northern blot  analysis for validation. Ultimately, additional standardization of sRNA-seq data generation and analysis will improve our ability to understand the expression and regulatory role of these small but important RNAs in conditions and disease.
Library preparation and sequencing
Two sample types were used to evaluate the performance of the sRNA-seq methods, (Fig. 1d). To evaluate detection and consistency we used various starting amounts in triplicate of total RNA from a homogenate human brain sample, purchased from Ambion and derived from a 74-year-old Caucasian female. The cause of death of this individual was respiratory failure. To evaluate accuracy, we used 300 ng of the Miltenyi Biotec miRXplore Universal Reference equimolar pool of 962 synthetic sequences corresponding to human, rat, mouse, and virus miRNA.
Each library preparation was performed by the same two lab scientists using the same equipment. Each protocol was followed exactly as provided by the vendor for each kit. The number of PCR cycles for each sample was determined based on the recommendations of each kit for the various starting input amounts (Table 2).
Size selection using PAGE gels was recommended by three of the manufacturers (Illumina, NEXTflex, and NEB kits) and was performed for these kits for better comparisons. We used AMPure XP beads for size selection for the Clontech samples, as the vendor does not recommend PAGE gel size selection.
A Qubit Fluorometer was used to determine the concentration of the final libraries. The library preparations were sequenced using single-end sequencing on the Illumina HiSeq 3000 with the Illumina Real Time Analysis (RTA) module and the bcl2fastq2 v2.17 to generate 51 base pair reads.
Unique molecular identifier deduplication
In order to test the use of UMIs to mitigate reverse transcription and amplification bias, reads derived from the NEXTflex kit were collapsed based on random sequences of 4 bases in length contained within both the 3′ and 5′ adapter sequences as a UMI using UMI-tools  (Fig. 1c). In this method, the adapters are ligated before reverse transcription and PCR amplification, therefore allowing for the estimation of the abundance of the sequences present in the sample before these steps. In the collapsed NEXTflex data referred to as “Deduped”, only reads that contained the same pair of a unique transcript with a unique UMI were maintained, while duplicate pairs were discarded. Therefore, each unique sequence had the opportunity to bind up to 65,536 different UMIs. As a control, we compared the performance of the Deduped data to a random 5% subset of the reads, referred to as “Fivepercent”. This was necessary as only 5% of the total reads remained following the collapsing method which required a preliminary alignment step. Thus this data was also produced with the preliminary alignment step, all preprocessing was the same except for the use of UMI-tools .
We utilized an in-house script to extract the degenerate bases from the adapters to determine the UMI sequence for each read and to add it to the identifier line of the FASTQ files for each sample. In this script we also removed reads which contained any unknown bases within the UMI. We then used bowtie  (v1.2.2) with a seed length of 15 allowing for 2 mismatches to produce a liberally aligned bam file to be used with UMI-tools  for deduping in order to retain miRNA isoforms. With this liberal alignment, various isoforms are aligned to reference miRNAs and thus if any are paired with the same UMI as the reference or other isoforms, they are considered as a duplicate, thus greatly reducing the concern brought up by Sena et al., 2018  about similar inserts being considered different sequences despite pairing with the same UMI instead of being collapsed together. We utilized the directional method in UMI-tools to remove duplicate reads from the bam file, which is particularly stringent for correcting sequencing errors in the UMI sequences themselves . We then converted the bam file to a FASTQ file for alignment with miRge  with the other method samples. Our script to prepare NEXTflex samples for UMI-tools  is available on GitHub at https://github.com/LieberInstitute/miRNA_Kit_Comparison .
Adapter and degenerate base trimming and alignment
For the NEXTflex (and therefore the Deduped and Fivepercent), NEB, and Illumina FASTQ files the 3′ adapter sequences and all bases 3′ of the adapter were trimmed from the ends of the reads using cutadapt  version 1.8.3. For the NEXTflex samples the first and last four bases, which correspond to the random bases included in the adapter sequences, were also trimmed. In the case of the Deduped samples these sequences were added to the identifier line prior to trimming. These bases correspond to the random adapter sequences because sequencing begins at the location of the 4 random bases in the 5′ adapter for this kit.
Unlike the other kits, the Clontech kit is stranded. Read 1 corresponds to the sense strand of the input RNA and the first three bases correspond to the nucleotides added during the SMART template-switching method. Then 10 Adenine bases were removed from the 3′ end, as well as all potential bases 3′ of this stretch of bases.
When trimming the synthetic sample FASTQ files, a lower length limit of 16 bases was used (as this was the shortest synthetic RNA), while a lower length limit of 18 bases was used for the brain samples (as human miRNAs are generally longer than 16 bases), to reduce the inclusion of reads that were too short in the data.
To perform the hierarchical cluster analysis, we used the miRNA quantifications from all brain libraries with all starting amounts using both batches (total of 99 libraries, 19 for the Clontech, NEXTflex, Deduped and Fivepercent methods and 10 for Illumina and 13 for NEB). We normalized the data using the DESeq2  method with the method as the design, using the DESeqDataSetFromMatrix(), estimateSizeFactors(), and counts() functions of the Bioconductor package DESeq2 (v 1.18.1). Normalization for small RNA sequencing is a debated topic and further studies are needed to confirm the best method for different small RNA sources [54,55,56]. However, we chose the widely used DESeq2  method for normalization as this method assumes very few differences between samples and we expected little difference between the individual synthetic miRNAs, the replicates across batches, and the triplicates within a given batch given that the samples are biologically the same. Furthermore, we chose this method because plotting the raw data suggested that the various methods resulted in different miRNA compositions, with some miRNAs showing much higher abundance in some samples relative to others and this method not only normalizes for overall library size, but also normalizes for differences in composition or overall count dispersion of the samples by determining the ratio of the counts for a given gene/small RNA to the overall geometric mean of that transcript in all of the samples tested. The median of these ratios is then used to scale the count data for each sample [40, 57]. Using this normalization method, we then determined which normalized expression estimates were greater than one for all 99 samples. This resulted in 151 common miRNAs above the threshold. We then log2 transformed these estimates. We determined the distance between the samples using the hclust() function of the stats package (v 3.4.0). We also used these quantification estimates in a sum of squares analysis to determine the percent of variance explained by method, starting amount, batch, and the number of reads that mapped to miRNA. To do this we used the Anova() type II function of the CRAN car package(v 3.0–0). To create the MA plots we used only the 1000 ng brain samples from both batches (a total of 24 samples, 4 for each method). We again normalized this subset of samples using DESeq2  and the sRNA-seq method as the design. We again restricted our analysis to miRNAs with greater than one normalized count in all 24 samples. This resulted in 174 common miRNAs above the threshold. We then manually created the MA plots. We then ranked the log2 normalized quantifications and determined the overlap of the most abundant miRNAs.
To perform the accuracy analyses, we used equimolar pools of 962 synthetic miRNAs purchased from Miltenyi Biotec. The minimum free energy of the secondary structures for each synthetic miRNA was determined using RNAfold as part of the ViennaRNA package 2 (version 2.3.5) [58, 59] GC content was determined using the letterFrequency() function of the Bioconductor package Biostrings  (v 2.46.0). Alignments were performed using the miRge program. The miRge raw count estimates were normalized using DESeq2  (v 1.18.1) with method as the design. The difference of each miRNA count estimate was then calculated from the mean of all synthetic sequences. The absolute of this difference was then log2 transformed for statistical comparisons and is referred to as “accuracy error”.
A linear model was used to evaluate the influence of method on accuracy error, using the lm() function of the stats package, and paired t-tests using the t.test() function of the stats package (v 3.4.0) were used for pairwise comparisons of each method. The Bonferroni method was used to for multiple testing correction. The omega squared value was calculated using the anova_stats() function of the CRAN package sjstats  (v 0.16.0). Hedge’s g was calculated using the tes() function of the CRAN package compute.es (v 0.2–4). Catplots to evaluate concordance of error rank were created using the CATplot() function of the Bioconductor package ffpe (v 1.22.0).
Detection diversity assessment
To assess mapping rates to various classes of RNAs, we collected FASTA files for a variety of RNAs: miRNA, piRNA, rRNA, scaRNA, snoRNA, snRNA, and transfer RNA (tRNA) and then created a merged FASTA file from each of the smaller FASTA files. We used the miRNA FASTA file included in miRge. The piRNA data was acquired from piRNAQuest  ( http://bicresources.jcbose.ac.in/zhumur/pirnaquest/ ). The rRNA, tRNA, and snRNA data came from the hg19 assembly from the UCSC genome table browser  ( http://genome.ucsc.edu ). The snoRNA and scaRNA data came from snoRNABase  ( https://www-snorna.biotoul.fr/browse.php ). Only the C/D box snoRNAs were included as all the H/ACA box snoRNAs overlapped with the snRNA data from UCSC. Six of the C/D snoRNA sequences and the snRNA overlapped in our merged FASTA file. Additionally, all of the scaRNAs overlapped the snoRNA C/D box sequences, but we maintained them in order to analyze scaRNA. Exact matches of miRNA sequences and piRNA were removed from the piRNA portion of the FASTA file. Bowtie  was used for alignment to all the sequences simultaneously allowing for zero mismatches within the default seed length of 28 bases to better distinguish similar sequences of different RNA classes. We then determined the count of reads that mapped to each RNA class.
miRNAs were considered detected if they were observed with > 10 normalized reads in all triplicates of a given starting amount. Raw counts from miRge for the brain batch 1 samples (93 in total) were normalized with DESeq2  but were not log transformed. Another analysis was performed using both batches and normalizing with DESeq2  using all brain samples and a threshold of > 10 normalized reads for all samples of a given starting amount. The percent of detected miRNAs that were inconsistently detected was calculated as follows:
Where UX = number of unique miRNAs detected with > 10 normalized reads in a given triplicate and where U1, 2, 3 = number of unique miRNAs detected with > 10 normalized reads in all triplicates.
The same methods were used for the isomiR analysis, however a threshold of 100 normalized reads was used instead of 10.
Statistical analyses of the detection and detection inconsistency were performed as in the Accuracy assessment with lm() and t.test() of stats package (v 3.4.0) and the tes() function of the compute.es package (v 0.2–4) and the anova_stats() function of the sjstats package (v 0.16.0) to calculate effect sizes. Percent variance explained analyses of sequence characteristics were performed using the Anova() function of the CRAN package car (v 3.0–0). Concordance was evaluated using the CATplot() function of the Bioconductor package ffpe (v 1.22.0). The evaluation of false isomiRs used the synthetic miRNA data. An isomiR was considered as detected if over 100 normalized reads were observed.
Consistency across batch was determined using all 1000 ng brain samples (24 samples). DESeq2  (v 1.18.1) was used to normalize these samples with method as the design. Quantifications were filtered for those with > 10 normalized reads in all samples of a given method. The mean of the quantifications from the first batch triplicates was compared with that of the second batch quantifications. The log2 transformed value of the absolute difference between these two quantifications was used to compare the batch consistency of the methods. Again the lm() of the stats package (v 3.4.0) was used for global analyses, while the t.test() function with Bonferroni correction was used to compare pairs of methods. Evaluating the intersection of all miRNAs detected across both batches for each method (total of 162 miRNAs), we determined the percent of variance in triplicate error for sequence characteristics.
To evaluate the consistency of triplicates, we used all 93 brain samples of the first batch. This data was normalized using DESeq2  using method as the design. The quantification estimates were filtered for those with > 10 normalized counts in all samples for a given starting amount. “Triplicate error” was determined as the difference of the value of each triplicate relative to the mean of all triplicates. The absolute value of these differences was then log2 transformed and the mean error value of triplicates was determined for each miRNA detected by each method for statistical comparisons. Evaluating just the intersection of all miRNAs detected for each starting amount and method (total of 228 miRNAs), we determined the percent of variance in triplicate error for sequence characteristics. The consistency of triplicates was then used to compare the various starting amounts. The Bonferroni method was used for multiple testing correction.
Availability of data and materials
The code for all of the analyses performed in this manuscript is publicly available at https://github.com/LieberInstitute/miRNA_Kit_Comparison. The data supporting the conclusions of this article are available at the National Institutes of Health (NIH) Sequence Read Archive (SRA), SRA BioProject accession: PRJNA498326. FASTQ files can be obtained from: https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?study=SRP199350.
the Clontech SMARTer smRNA-Seq Kit for Illumina, now owned by Takara Bio
NEXTflex data with the removal of duplicate reads based on UMI
Random 5% subset of the NEXTflex data
the Illumina TruSeq Small RNA Library Prep Kit
isoforms of miRNAs
the New England BioLabs Next Multiplex small RNA kit
the Bioo Scientific NEXTflex Illumina Small RNA Sequencing Kit v3, now owned by Perkin Elmer and called NEXTFLEX
P-element induced wimpy testis (PIWI) interacting RNA
small Cajal body-specific RNAs
small nucleolar RNA
small nuclear RNA
small ribonucleic acid sequencing
unique molecular identifiers
Bandiera S, Hatem E, Lyonnet S, Henrion-Caude A. microRNAs in diseases: from candidate to modifier genes. Clin Genet. 2010;77:306–13.
Miller BH, Wahlestedt C. MicroRNA dysregulation in psychiatric disease. Brain Res. 2010;1338:89–99.
Basak I, Patil KS, Alves G, Larsen JP, Møller SG. microRNAs as neuroregulators, biomarkers and therapeutic agents in neurodegenerative diseases. Cell Mol Life Sci. 2016;73:811–27.
Reid G, Kirschner MB, van Zandwijk N. Circulating microRNAs: association with disease and potential use as biomarkers. Crit Rev Oncol Hematol. 2011;80:193–208.
Eminaga S, Christodoulou DC, Vigneault F, Church GM, Seidman JG. Quantification of microRNA expression with next-generation sequencing. In: Ausubel FM, Brent R, Kingston RE, Moore DD, Seidman JG, Smith JA, et al., editors. Current protocols in molecular biology. Hoboken: Wiley; 2013. https://doi.org/10.1002/0471142727.mb0417s103.
Neilsen CT, Goodall GJ, Bracken CP. IsomiRs – the overlooked repertoire in the dynamic microRNAome. Trends Genet. 2012;28:544–9.
Morin RD, O’Connor MD, Griffith M, Kuchenbauer F, Delaney A, Prabhu A-L, et al. Application of massively parallel sequencing to microRNA profiling and discovery in human embryonic stem cells. Genome Res. 2008;18:610–21.
Cloonan N, Wani S, Xu Q, Gu J, Lea K, Heater S, et al. MicroRNAs and their isomiRs function cooperatively to target common biological pathways. Genome Biol. 2011;12:1.
Nejad C, Pillman KA, Siddle KJ, Pépin G, Änkö M-L, McCoy CE, et al. miR-222 isoforms are differentially regulated by type-I interferon. RNA. 2018;24:332–41.
Yu F, Pillman KA, Neilsen CT, Toubia J, Lawrence DM, Tsykin A, et al. Naturally existing isoforms of miR-222 have distinct functions. Nucleic Acids Res. 2017;45:11371–85.
Burroughs AM, Ando Y, de Hoon MJL, Tomaru Y, Nishibu T, Ukekawa R, et al. A comprehensive survey of 3′ animal miRNA modification events and a possible role for 3′ adenylation in modulating miRNA targeting effectiveness. Genome Res. 2010;20:1398–410.
Tan GC, Dibb N. IsomiRs have functional importance. Malaysian J Pathol. 2015;37:73–81.
Telonis AG, Magee R, Loher P, Chervoneva I, Londin E, Rigoutsos I. Knowledge about the presence or absence of miRNA isoforms (isomiRs) can successfully discriminate amongst 32 TCGA cancer types. Nucleic Acids Res. 2017;45:2973–85.
Baran-Gale J, Fannin EE, Kurtz CL, Sethupathy P. Beta cell 5′-shifted isomiRs are candidate regulatory hubs in type 2 diabetes. PLoS One. 2013;8:e73240.
Martí E, Pantano L, Bañez-Coronel M, Llorens F, Miñones-Moyano E, Porta S, et al. A myriad of miRNA variants in control and Huntington’s disease brain regions detected by massively parallel sequencing. Nucleic Acids Res. 2010;38:7219–35.
Witwer KW, Halushka MK. Toward the promise of microRNAs – enhancing reproducibility and rigor in microRNA research. RNA Biol. 2016;13:1103–16.
Linsen SEV, de Wit E, Janssens G, Heater S, Chapman L, Parkin RK, et al. Limitations and possibilities of small RNA digital gene expression profiling. Nat Methods. 2009;6:4734–476.
Raabe CA, Tang T-H, Brosius J, Rozhdestvensky TS. Biases in small RNA deep sequencing data. Nucleic Acids Res. 2014;42:1414–26.
Buschmann D, Haberberger A, Kirchner B, Spornraft M, Riedmaier I, Schelling G, et al. Toward reliable biomarker signatures in the age of liquid biopsies - how to standardize the small RNA-Seq workflow. Nucleic Acids Res. 2016;44:5995–6018.
Lopez JP, Diallo A, Cruceanu C, Fiori LM, Laboissiere S, Guillet I, et al. Biomarker discovery: quantification of microRNAs and other small non-coding RNAs using next generation sequencing. BMC Med Genet. 2015;8. https://doi.org/10.1186/s12920-015-0109-x.
Head SR, Komori HK, LaMere SA, Whisenant T, Van Nieuwerburgh F, Salomon DR, et al. Library construction for next-generation sequencing: overviews and challenges. BioTechniques. 2014;56. https://doi.org/10.2144/000114133.
Pritchard CC, Cheng HH, Tewari M. MicroRNA profiling: approaches and considerations. Nat Rev Genet. 2012;13:358–69.
Kim Y-K, Yeo J, Kim B, Ha M, Kim VN. Short structured RNAs with low GC content are selectively lost during extraction from a small number of cells. Mol Cell. 2012;46:893–5.
Dard-Dascot C, Naquin D, d’Aubenton-Carafa Y, Alix K, Thermes C, van Dijk E. Systematic comparison of small RNA library preparation protocols for next-generation sequencing. BMC Genomics. 2018;19. https://doi.org/10.1186/s12864-018-4491-6.
Sorefan K, Pais H, Hall AE, Kozomara A, Griffiths-Jones S, Moulton V, et al. Reducing ligation bias of small RNAs in libraries for next generation sequencing. Silence. 2012;3:4.
Giraldez MD, Spengler RM, Etheridge A, Godoy PM, Barczak AJ, Srinivasan S, et al. Comprehensive multi-center assessment of small RNA-seq methods for quantitative miRNA profiling. Nat Biotechnol. 2018. https://doi.org/10.1038/nbt.4183.
Yeri A, Courtright A, Danielson K, Hutchins E, Alsop E, Carlson E, et al. Evaluation of commercially available small RNASeq library preparation kits using low input RNA. BMC Genomics. 2018;19. https://doi.org/10.1186/s12864-018-4726-6.
Baran-Gale J, Kurtz CL, Erdos MR, Sison C, Young A, Fannin EE, et al. Addressing Bias in small RNA library preparation for sequencing: a new protocol recovers MicroRNAs that evade capture by current methods. Front Genet. 2015;6. https://doi.org/10.3389/fgene.2015.00352.
Fu Y, Wu P-H, Beane T, Zamore PD, Weng Z. Elimination of PCR duplicates in RNA-seq and small RNA-seq using unique molecular identifiers. BMC Genomics. 2018;19. https://doi.org/10.1186/s12864-018-4933-1.
Faridani OR, Abdullayev I, Hagemann-Jensen M, Schell JP, Lanner F, Sandberg R. Single-cell sequencing of the small-RNA transcriptome. Nat Biotechnol. 2016;34:1264–6.
Hafner M, Renwick N, Brown M, Mihailovic A, Holoch D, Lin C, et al. RNA-ligase-dependent biases in miRNA representation in deep-sequenced small RNA cDNA libraries. RNA. 2011;17:1697–712.
Jayaprakash AD, Jabado O, Brown BD, Sachidanandam R. Identification and remediation of biases in the activity of RNA ligases in small-RNA deep sequencing. Nucleic Acids Res. 2011;39:e141.
Dabney J, Meyer M. Length and GC-biases during sequencing library amplification: a comparison of various polymerase-buffer systems with ancient and modern DNA sequencing libraries. BioTechniques. 2012;52. https://doi.org/10.2144/000113809.
Hong J, Gresham D. Incorporation of unique molecular identifiers in TruSeq adapters improves the accuracy of quantitative sequencing. BioTechniques. 2017;63. https://doi.org/10.2144/000114608.
Kivioja T, Vähärautio A, Karlsson K, Bonke M, Enge M, Linnarsson S, et al. Counting absolute numbers of molecules using unique molecular identifiers. Nat Methods. 2012;9:72–4.
Sena JA, Galotto G, Devitt NP, Connick MC, Jacobi JL, Umale PE, et al. Unique molecular identifiers reveal a novel sequencing artefact with implications for RNA-Seq based gene expression analysis. Sci Rep. 2018;8. https://doi.org/10.1038/s41598-018-31064-7.
Alon S, Vigneault F, Eminaga S, Christodoulou DC, Seidman JG, Church GM, et al. Barcoding bias in high-throughput multiplex sequencing of miRNA. Genome Res. 2011;21:1506–11.
Fuchs RT, Sun Z, Zhuang F, Robb GB. Bias in ligation-based small RNA sequencing library construction is determined by adaptor and RNA structure. PLoS One. 2015;10:e0126049.
Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10:R25.
Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15. https://doi.org/10.1186/s13059-014-0550-8.
Tang DTP, Plessy C, Salimullah M, Suzuki AM, Calligaris R, Gustincich S, et al. Suppression of artifacts and barcode bias in high-throughput transcriptome analyses utilizing template switching. Nucleic Acids Res. 2013;41:e44.
Kapteyn J, He R, McDowell ET, Gang DR. Incorporation of non-natural nucleotides into template-switching oligonucleotides reduces background and improves cDNA synthesis from very small RNA samples. BMC Genomics. 2010;11:413.
Zhuang F, Fuchs RT, Sun Z, Zheng Y, Robb GB. Structural bias in T4 RNA ligase-mediated 3′-adapter ligation. Nucleic Acids Res. 2012;40:e54.
Huang X, Yuan T, Tschannen M, Sun Z, Jacob H, Du M, et al. Characterization of human plasma-derived exosomal RNAs by deep sequencing. BMC Genomics. 2013;14:319.
Barberán-Soler S, Vo JM, Hogans RE, Dallas A, Johnston BH, Kazakov SA. Decreasing miRNA sequencing bias using a single adapter and circularization approach. Genome Biol. 2018;19. https://doi.org/10.1186/s13059-018-1488-z.
Stahlberg A. Properties of the reverse transcription reaction in mRNA quantification. Clin Chem. 2004;50:509–15.
Honda S, Kirino Y. Dumbbell-PCR: a method to quantify specific small RNA variants with a single nucleotide resolution at terminal sequences. Nucleic Acids Res. 2015;43:e77.
Androvic P, Valihrach L, Elling J, Sjoback R, Kubista M. Two-tailed RT-qPCR: a novel method for highly accurate miRNA quantification. Nucleic Acids Res. 2017;45:e144.
Choi YS, Edwards LO, DiBello A, Jose AM. Removing bias against short sequences enables northern blotting to better complement RNA-seq for the study of small RNAs. Nucleic Acids Res. 2017;45:e87.
Smith TS, Heger A, Sudbery I. UMI-tools: modelling sequencing errors in unique molecular identifiers to improve quantification accuracy. Genome Res. 2017;27:491–9.
Baras AS, Mitchell CJ, Myers JR, Gupta S, Weng L-C, Ashton JM, et al. miRge - a multiplexed method of processing small RNA-Seq data to determine MicroRNA entropy. PLoS One. 2015;10:e0143066.
Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J. 2011;17:10–2.
Griffiths-Jones S. miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids Res. 2006;34:D140–4.
Garmire LX, Subramaniam S. The poor performance of TMM on microRNA-Seq. RNA. 2013;19:735–6.
Garmire LX, Subramaniam S. Evaluation of normalization methods in mammalian microRNA-Seq data. RNA. 2012;18:1279–88.
Dillies M-A, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, et al. A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform. 2013;14:671–83.
Evans C, Hardin J, Stoebel DM. Selecting between-sample RNA-Seq normalization methods from the perspective of their assumptions. Brief Bioinform. 2018;19:776–92.
Hofacker IL, Fontana W, Stadler PF, Bonhoeffer LS, Tacker M, Schuster P. Fast folding and comparison of RNA secondary structures. Monatshefte fur Chemie Chemical Monthly. 1994;125:167–88.
Lorenz R, Bernhart SH, Höner zu Siederdissen C, Tafer H, Flamm C, Stadler PF, et al. ViennaRNA Package 2.0. Algorithms Mol Biol. 2011;6:26.
Pagès H, Aboyoun P, Gentleman R, DebRoy S. Biostrings: efficient manipulation of biological strings. R package; 2018.
Lüdecke D. sjstats: Statistical functions for regression models. R package. 2018. https://CRAN.R-project.org/package=sjstats.
Sarkar A, Maji R, Saha S, Ghosh Z. piRNAQuest: searching the piRNAome for silencers. BMC Genomics. 2014;15:555.
Karolchik D. The UCSC table browser data retrieval tool. Nucleic Acids Res. 2004;32:493D–496.
Lestrade L. snoRNA-LBME-db, a comprehensive database of human H/ACA and C/D box snoRNAs. Nucleic Acids Res. 2006;34:D158–62.
We are grateful for the generosity of the Lieber and Maltz families in establishing an institute dedicated to understanding the basis of developmental brain disorders. We are grateful to the Johns Hopkins Program for microRNA Biology Journal Club for feedback about our study, particularly from Marc Halushka.
This work was supported by the Lieber Institute for Brain Development (LIBD), a nonprofit research center in Baltimore, Maryland and the AstraZeneca postdoctoral fellowship program, run by AstraZeneca Limited (AZ), a company in Cambridge Massachusetts. The data used in this study were obtained through funding by the AZ postdoctoral fellowship program to first author Carrie Wright and LIBD during the data collection phase of this project. The AZ program contributed salary support for Carrie Wright and support towards library preparation and sequencing costs. LIBD provided facilities and resources, including those required to perform the library preparation and sequencing, as well as data storage and computational resources. LIBD also supported the authors in the analysis phase of this work, as well as during the writing of this manuscript.
Ethics approval and consent to participate
All human tissue was acquired from Ambion (Thermo Fisher Scientific) Catalog No. AM7962, human brain total RNA, and therefore no approval or consent was required specifically for this study. Ambion certifies that all human derived materials have been prepared from tissue obtained with consent from a fully informed donor or a member of the donor’s family. All human tissues used for RNA isolation at Ambion are obtained through accredited organ/tissue procurement organizations that operate with Institutional Review Board approval, and in accordance with regulations set forth by the United States Department of Health and Human Services.
Consent for publication
At the time of this work, AJC and NJB were full-time employees and shareholders of AstraZeneca PLC and CW (Carrie Wright) was a postdoctoral fellow in the AstraZeneca postdoctoral studentship program. The remaining authors declare no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Table S1. Table shows the top 20 most abundant miRNAs for each method. The 6 miRNAs that were commonly among the top 20 for all of the methods are highlighted in different colors. (XLSX 43 kb)
Table S2. Statistics for paired t-tests of “accuracy error” of each synthetic miRNA from the mean for each method. Percent difference is calculated as the percent difference of the first method in the comparison relative to the second, thus Clontech had 4.55% less error than Illumina. Percent difference was calculated using more precise means than the rounded means shown in the table. Significant findings are highlighted in yellow. (XLSX 41 kb)
Figure S1. Boxplots and heatmaps of the influence of various synthetic sequence aspects on accuracy error. Boxplots show sequences grouped by various sequence aspects, including: Gibb’s free energy secondary structure estimates, the identity of the last 2 bases, the identity of the first 2 bases, and the number of Cs in the sequence. The percent expression relative to the mean of all synthetic sequences is plotted for each of the 962 synthetic sequences. Heatmaps also depict the percent expression of synthetic sequences grouped by the sequence aspect of interest relative to the mean expression of all the synthetic sequences. Overall, most methods showed a positive quantification relationship with higher or less negative Gibb’s free energy estimates. Most methods showed consistent quantification despite the identity of the last two bases, however NEB and Illumina showed more inconsistent quantification. All of the methods showed fairly consistent quantification despite the identity of the first two bases, however the Deduped method showed more consistent quantification. Most of the methods showed decreased quantification with increasing numbers of Cs, however the Clontech method showed the opposite relationship, and the Deduped method was quite consistent. (PDF 663 kb)
Figure S2. This figure depicts the secondary structure of example of synthetic miRNAs with differences in detection that may be due to secondary structure. Mmu-miR-540-5p/rno-miR-540-5p and hsa-miR-614 were detected less than the average synthetic sequence by all tested methods, while mmu-miR-654-5p and rno-miR-743b were detected more than the average synthetic sequence by all tested methods. (PDF 36 kb)
Table S3. Percentage of reads mapping to various types of small RNA. Columns E-K show the number of reads mapping to each type of RNA, while L shows the total number of reads. Columns N-T show the percentage of reads of the total mapped reads that map to each type of small RNA. Column U shows the percentage of the total reads that did not map to any off the evaluated RNAs. Column V through AB show the percentage of reads of all reads that map to each type of small RNA. miRNA = microRNA, piRNA = PIWI interacting RNA, rRNA = ribosomal RNA, snoRNA = small nucleolar RNA, snRNA = small nuclear RNA, tRNA = transfer RNA. Synthetic samples from are highlighted in red. These samples were only aligned to the synthetic sequences. (XLSX 77 kb)
Table S4..Statistics on percentages of reads mapping to different types of RNAs. Statistics are shown for the t-test comparisons of the percentages of reads mapping to each type of RNA for the data derived from the same human brain using 1000 ng of total RNA for each library preparation kit, using both batches. Percent difference is calculated as the percent difference of the first method in the comparison relative to the second, thus Clontech had 199% more snRNA mapped reads than Illumina. Percent difference was calculated using more precise means than the rounded means shown in the table. miRNA = microRNA, piRNA = PIWI interacting RNA, rRNA = ribosomal RNA, snoRNA = small nucleolar RNA, snRNA = small nuclear RNA, tRNA = transfer RNA. Significant findings are highlighted in yellow. (XLSX 42 kb)
Table S5. Number of miRNA and isomiRs detected by each method. The top half of the table shows the number of miRNAs detected above 10 normalized counts for each method in all 3 triplicates, the number detected above the threshold in both batches, and the percentage of miRNAs detected in the first batch that were not detected in the second batch. The bottom half of the table shows the number of non-canonical miRNA sequences, called isomiRs, detected above 100 normalized counts for each method in all 3 triplicates, the number detected above the threshold in both batches, and the percentage of isomiRs detected in the first batch that were not detected in the second batch. (XLSX 33 kb)
Table S6. Clontech subsample analysis. The original number of reads was retained for all methods other than Clontech. The variation in the number of detected miRNAs (number of those with > 10 normalized reads) for the other methods demonstrates the influence of the inclusion of the different Clontech subsets on the DESeq2 normalization. The normalization in this case only included these samples, therefore there may be differences compared to the other tables. (XLSX 34 kb)
Figure S3. Detection of miRNAs and isomiRs and detection consistency of miRNA and isomiRs across various starting amounts. For the detection plots, the number of miRNAs detected above 10 normalized counts and the number of isomiRs detected above 100 normalized counts in all triplicates of batch 1 for each method is plotted on the y-axis. The starting total RNA amount is indicated on the x-axis in nanograms. Only starting amounts in the range of suggested inputs were tested for each method. For the consistency of detection plots, the percentage of miRNAs or isomiRs detected above the threshold by a single triplicate that were not detected above the threshold by the other two triplicates is plotted on the y-axis. The starting total RNA amount is indicated on the x-axis in nanograms. The relationship between the percentage of inconsistently detected miRNAs or isomiRs and starting amount is plotted as a line using a locally estimated scatterplot smoothing regression (LOESS). (PDF 2586 kb)
Table S7. Pairwise differences between methods for miRNA detection inconsistency estimates. Percent difference is calculated as the percent difference of the first method in the comparison relative to the second, thus Clontech had 61% more inconsistency than Illumina. Percent difference was calculated using more precise means than the rounded means shown in the table. Significant findings are highlighted in yellow. (XLSX 40 kb)
Table S8. Pairwise differences between methods for number of isomiRs detected by each method. Percent difference is calculated as the percent difference of the first method in the comparison relative to the second, thus Clontech detected 41.39% more isomiRs than Illumina. Percent difference was calculated using more precise means than the rounded means shown in the table. Significant findings are highlighted in yellow. (XLSX 43 kb)
Table S9. Method comparison of the number of falsely detected isomiRs and the expression of those false isomiRs. Percent difference is calculated as the percent difference of the first method in the comparison relative to the second, thus Clontech had 181% more false isomiRs per sequence than Illumina and 26% less expression than those of Illumina. Percent difference was calculated using more precise means than the rounded means shown in the table. Significant findings are highlighted in yellow. (XLSX 39 kb)
Table S10. Pairwise comparisons of batch error. Percent difference is calculated as the percent difference of the first method in the comparison relative to the second, thus Clontech had 33% less error than Illumina. Percent difference was calculated using more precise means than the rounded means shown in the table. Significant findings are highlighted in yellow. (XLSX 43 kb)
Table S11. Pairwise comparisons of triplicate inconsistency of brain samples. Percent difference is calculated as the percent difference of the first method in the comparison relative to the second, thus Clontech had 29.56% more error than NEB in the comparison of triplicate consistency of the 1000 ng samples. Percent difference was calculated using more precise means than the rounded means shown in the table. Significant findings are shown in yellow. (XLSX 42 kb)
Table S12. Pairwise comparisons of triplicate inconsistency error bewteen starting amounts. Statistics for t-tests of triplicate inconsistency error between different starting amounts for each tested method are shown. Percent difference is calculated as the percentage that the first amount in the comparison is relative to the second, thus Fivepercent 100 ng data had 13.84% more error than the 250 ng data. Percent difference was calculated using more precise means than the rounded means shown in the table. Significant findings are highlighted in yellow. Illumina did not show a significant difference in consistency between the tested starting amounts, (F = 1.44, p = 0.23). (XLSX 43 kb)
Note S1. Analysis to determine if the length of our UMI sequence was adequate. Also see Additional file 17: Table S13. (PDF 26 kb)
Table S13. Evaluation of the most abundant miRNA in the NEXTflex samples. To determine if our UMI length was adequate, we estimated the number of starting sequences for the most abundant miRNA (miR-9-5p) in the NEXTflex data. For comparison sake, we also show the estimated number of starting molecules for a more moderately abundant miRNA (miR-137). The possible number of starting sequences for each miRNA was estimated assuming a PCR amplification efficiency of only 62.5%, all samples, thus enabling more conservative estimates for the number of starting sequences. PCR efficiencies of miRNA generally range from 75 to 95%, and previous studies demonstrate that the PCR efficiency of miR-9-5p has been 98%. Therefore, our estimates of the number of starting miR-9-5p sequences are likely higher than the true number of starting sequences. (XLSX 44 kb)
About this article
Cite this article
Wright, C., Rajpurohit, A., Burke, E.E. et al. Comprehensive assessment of multiple biases in small RNA sequencing reveals significant differences in the performance of widely used methods. BMC Genomics 20, 513 (2019) doi:10.1186/s12864-019-5870-3
- Small RNA sequencing
- Library preparation
- Unique molecular identifiers