- Research article
- Open Access
False negative rates in Drosophila cell-based RNAi screens: a case study
BMC Genomicsvolume 12, Article number: 50 (2011)
High-throughput screening using RNAi is a powerful gene discovery method but is often complicated by false positive and false negative results. Whereas false positive results associated with RNAi reagents has been a matter of extensive study, the issue of false negatives has received less attention.
We performed a meta-analysis of several genome-wide, cell-based Drosophila RNAi screens, together with a more focused RNAi screen, and conclude that the rate of false negative results is at least 8%. Further, we demonstrate how knowledge of the cell transcriptome can be used to resolve ambiguous results and how the number of false negative results can be reduced by using multiple, independently-tested RNAi reagents per gene.
RNAi reagents that target the same gene do not always yield consistent results due to false positives and weak or ineffective reagents. False positive results can be partially minimized by filtering with transcriptome data. RNAi libraries with multiple reagents per gene also reduce false positive and false negative outcomes when inconsistent results are disambiguated carefully.
The success of RNAi high throughput screening (HTS) relies on low experimental rates of false negative and false positive results, which in turn depend on the efficacy and specificity of the RNAi reagents, respectively (reviewed in [1, 2]). False positive results can arise from at least the following causes: experimental noise inherent to large-scale studies, bias associated with a particular screen assay, incorrect gene models, and arguably most importantly, reagent-specific off-target effects (OTEs) (reviewed in ). Similarly, false negative results can arise as the result of experimental noise [4, 5], aspects of screen assay design, and incorrect gene models, protein stability, gene redundancy, but most importantly, the rate of false negative results depends on the efficacy of the RNAi reagents used in the screen.
The issue of false positive results associated with RNAi reagents has been a matter of extensive study in recent years for screens in both Drosophila and mammalian cells [6–11]. In Drosophila cell-based RNAi screens, the focus of this study, cultured cells are treated with long double-stranded RNAs (dsRNAs) as the reagent for knockdown. Sequence-associated false positive results have been observed and characterized to a significant extent [10, 11]; however, the full cause of the phenomenon remains to be elucidated. There are a number of ways to identify false positives in a screen, for example using 'gold standard' rescue methods [12, 13]. By contrast, the identification of false negatives is not as straightforward, as identification of a false negative result requires previous knowledge that a gene is involved in the process under analysis. Thus, rates of false negative results have been estimated for screens that investigated well-characterized pathways. For example, in a screen for Hedgehog (Hh) signaling factors, only nine of fourteen known components of the pathway were identified  and only seven of these passed additional validation , suggesting a rate of false negative results of nearly 48%. Similarly, in a screen for Wingless (Wg)/Wnt signaling, only 16 of 21 canonical components expressed in the cell line used were identified in the screen . Interestingly, when the "hits" (positive results) from the Wg screen were re-tested using three independent dsRNAs, 70 of 204 genes tested scored with three independent dsRNAs but 68 scored with only two out of three, suggesting a false negative rate of 16% . Altogether, these analyses have suggested that false negative rates may be in the order of 16% to 50% in RNAi HTS.
One caveat to the studies that to date have looked at false negative rates in RNAi HTS is that the sample sizes were small. In order to get a more global view of false negative rates in Drosophila cell-based RNAi HTS, we decided to perform a number of analyses on a larger set of screens. The data sets we analyzed were from RNAi screens performed at the Drosophila RNAi Screening Center (DRSC)  where a standardized screening platform enables both local and visiting scientists to perform high-throughput screens with dsRNAs in Drosophila cell tissue culture. Each of the screens we analyzed used essentially the same dsRNA library (DRSC "2.0") and a standard cell line (S2, S2R+ or Kc167), such that variability due to equipment and reagents should be minimal. We also used data from DRSC screens in conjunction with an analysis of the transcriptome of cell lines  to estimate an overall false positive rate among long dsRNAs of roughly 1% and a false negative rate due to ineffective or weak dsRNAs of at least 8%. Furthermore, we find that the presence of multiple RNAi reagents per gene in a screening library can be a statistically powerful means of reducing false positive and negative results, although careful consideration must be made regarding the disambiguation of inconsistent results obtained with multiple reagents directed against the same target gene.
Estimation of false negative rates using data from RNAi reagents directed against ribosome and proteasome components
The proteasome and ribosome are two well-characterized complexes in the cell that perform the essential functions of protein degradation and protein assembly, respectively. Because of the broad functionality of the ribosome and proteasome in basic cell metabolism, we reasoned that dsRNAs targeting components of these complexes might affect the output of a wide range of RNAi screens. Indeed, we find that dsRNAs targeting ribosomal or proteasome components frequently score as "hits" (positive results) in many screens thus making them particularly useful for analysis of false negative results. We used the Gene Ontology (GO) annotations at FlyBase  to select 185 genes with the GO:0005840: Ribosome annotation, and 58 genes with GO:0000502: Proteasome Complex (Additional file 1). Of the 185 ribosomal genes, a sub-set of 94 genes are also annotated with GO:0022626: Cytosolic Ribosome.
We next selected 16 screens performed at the DRSC (see Materials and Methods) using version 2 of the DRSC genome-wide library, which was designed to minimize OTEs , and determined the scoring pattern of the ribosome and proteasome set for these screens. Two prominent clusters with strongly scoring dsRNAs clearly emerge (Figure 1A; Additional file 1): a "cytosolic ribosome cluster" that consists of 79 genes enriched for GO:0022626: Cytosolic Ribosome and a "proteasome complex cluster" that consists of 36 genes enriched for GO:0000502: Proteasome Complex. For each cluster, a "screen signature" was calculated by determining for each screen the mean Z-score of the dsRNAs in the cluster. The screen signatures for the proteasome complex and cytosolic ribosome clusters are shown in Figures 1B and 1C, respectively. Outside of these two clusters, the majority of dsRNAs are those that target components of mitochondrial ribosome. Unlike the cytosolic ribosome components, these do not appear to show strong phenotypes across multiple RNAi screens.
Some cytosolic ribosome and proteasome complex genes are absent from their respective clusters and lack functionally typical screen signatures. Overall, 22 of 94 cytosolic ribosome genes and 29 of 58 proteasome complex genes failed to yield the appropriate screen signature (Additional file 2). Possibilities for these failures include functional mis-annotation of genes or protection from loss-of-function phenotypes due to gene redundancy. Additionally, some of the non-clustering cytosolic ribosome and proteasome complex genes may represent false negatives due to insufficient knockdown.
When there is only a single dsRNA targeting the gene that did not result in the predicted screen signature, it is not possible to distinguish among potential causes of negative results. Fortunately, however, DRSC library version 2 has two or more dsRNAs per gene for many genes represented in the library. In principle, because dsRNAs that target the same gene should yield similar screen signatures, we can ask if this is the case when two such dsRNAs against the same gene exist in the collection. Within the proteasome complex and cytosolic ribosome clusters, there are 51 genes represented by two dsRNAs in the dsRNA library for a total of 103 dsRNAs (in one case three dsRNAs targeted a single gene). Of these 51 genes, 42 have dsRNAs that exhibit the appropriate screen signatures. In 9 cases however, only one dsRNA appears either in the cytosolic ribosome cluster (Figure 2A) or the proteasome complex cluster (Figure 2B). In 8 of these 9 cases, the gene target is well known and functionally consistent with the cluster in which one of its dsRNAs appears (DRSC03201, which targets Pomp, clusters in the periphery of the cytosolic ribosome cluster and is most likely a false positive). Because for 8 genes, one dsRNA gave the expected screen signature but the other did not, the non-signature dsRNAs are likely false negatives. Thus, we conclude that 8 out of 103 dsRNAs failed to cluster as expected, yielding to a false negative rate due to ineffective RNAi reagents at 8%. Because this estimate is derived from a meta-analysis of multiple screens, the most likely explanation for these false negative is weak or ineffective dsRNAs rather than statistical noise from an individual screen.
Note that the 8% rate is likely under-estimated since we did not take into account the false negative rate present in the initial screen. Our reasoning for not including them is that we do not know whether the genes that did not score initially should have scored in the assays. Regardless, if we do include those, the false negative rate is higher and reaches 34% [(22+29)/(94+58)]. Test of additional dsRNAs will be necessary to address whether these are genuine false negatives or not.
Use of focused RNAi libraries with multiple reagents per gene as a strategy to minimize the rate of false negative results
The past and current DRSC genome-wide Drosophila libraries included redundant dsRNAs for only a subset of the genome, thus limiting our ability to fully assess rates of false negative results using full-genome screen datasets. To address this issue, we generated sub-libraries containing multiple RNAi reagents (2 to 4) for several specific gene families (see Materials and Methods), such that analysis of results from a sub-library should supplement the results reported from genome-wide screens. Similar to the cluster analysis presented above, the sub-library sets allow for comparison of the behavior of multiple reagents per gene. Additionally, the layout of the sub-library assay plates was designed with an outer perimeter of wells that lack dsRNAs to reduce the possible influence edge effects that occur in many screens . Currently, four sub-libraries have been generated: a kinases and phosphatases sub-library (K/P), a transcription factor and DNA binding sub-library (TRXN), a transmembrane domain-containing protein sub-library (TM), and a library which covers genes involved in ubiquitination and related processes (UBIQ) (Table 1). Like version 2 of the DRSC genome-wide library, these sub-libraries were designed with SnapDragon  to avoid sequences known to cause OTEs (see Materials and Methods).
K/P screen for JAK/STAT signaling pathway components: a case study in identification of false discovery rates
To demonstrate the utility of focused libraries with multiple amplicons per gene, we screened the K/P set for factors involved in the JAK/STAT pathway. S2R+ cells were transfected with dsRNA and both 10xSTAT-firefly luciferase and actin-renilla luciferase constructs as previously described (; Materials and Methods). The pathway was stimulated three days later by the addition of S2NP cells transfected with a plasmid expressing the Unpaired ligand , and JAK/STAT pathway activity quantified by measuring firefly luciferase activity. Renilla luciferase activity was used for normalization. The redundant coverage of genes in the K/P library provides an opportunity to compare the behavior of dsRNAs that target the same gene. The K/P set contains two canonical positive regulators of JAK/STAT signaling with three dsRNAs each: domeless (dome), which was initially annotated as a phosphatase , and the kinase hopscotch (hop) (reviewed in ). All dsRNAs targeting dome and hop were strong hits in the screen, with Z-scores less than -4. The K/P set also contains one canonical negative regulator of JAK/STAT signaling, Ptp61F. The three dsRNAs for this gene did not score, most likely because over-stimulation with the act-upd construct makes it difficult to detect negative regulators of JAK/STAT signaling in S2R+ cells .
For further analysis, we selected those dsRNAs with Z-scores with an absolute value of 2 or greater across both replicates, which in this case included dsRNAs targeting 24 genes (Figure 3, Table 2). We then compared these to the Z-scores of the other dsRNAs in the K/P set that target the same gene and transcripts. In some cases, the scores obtained with all dsRNAs directed against a particular gene were consistent, whereas in other cases, some dsRNAs directed against a single gene were phenotypically inconsistent. We categorized the results of dsRNAs into three categories: In category 1, all dsRNAs directed against a given gene were hits. In category 2, at least 2 dsRNAs were hits but there was at least one which did not score significantly. In category 3, only 1 dsRNA directed against a gene yielded a significant result. Out of 24 genes, 5 had positive results for all dsRNAs (category 1), 4 were in category 2, and 15 were in category 3 (Figure 3, Table 2).
Using transcriptome analysis to preferentially filter false positives
In those cases where we observed discrepancies (categories 2 and 3), we determined whether the targeted gene was expressed in S2R+ cells using expression datasets . In principle, this information could be extremely useful for data curation, as dsRNAs that score but for which there is no evidence that the gene is expressed in the cell line tested are likely false positives. Importantly, transcriptome information may not only help to resolve many ambiguous false positive cases but also help identify false negatives, as the inconsistent dsRNAs that have been ruled out to be due to false positives should be enriched for false negatives.
Analysis of the transcriptional activity in S2R+ cultured cells provides evidence for expression of 7,069 genes (see Materials and Methods). Of these, 6,223 (or 45%) of annotated protein-coding genes are expressed at elevated levels (FPKM >= 5). Of the genes in the K/P sub-library, 70% are expressed in S2R+ cells. Importantly, we found evidence that all of the core components of the JAK/STAT pathway required for signal transduction are expressed in S2R+ cells (Figure 4). Interestingly, the Upd ligands are either not expressed or expressed at low levels, suggesting that the JAK/STAT pathway is either not active or active at low levels in cultured cell lines, which is consistent with the fact that stimulation with act-upd was necessary to activate the pathway for our RNAi screen (see Materials and Methods).
Of the 24 genes found in the K/P screen, 16 are expressed in S2R+ cells (Figure 3, Table 2). All category 1 and category 2 genes are expressed and are represented by multiple scoring dsRNAs, suggesting that the few dsRNAs that did not score are most likely false negatives. Categories 1 and 2 represent results from 37 dsRNAs of which 5 did not score. Therefore, we estimate a false negative rate of ~13%, which is roughly consistent with the ~8% estimate from the ribosome and proteosome cluster analysis described above. All 8 of the unexpressed genes are limited to the 15 category 3 genes for which only a single dsRNA scored (Figure 3, Table 2). Therefore, these 8 genes should be considered false positive results and should be viewed as low priority for selection for additional validation.
Since only 7 of the 15 category 3 genes are expressed (47%), category 3 genes show no enrichment for expressed genes. This suggests that few, if any of category 3 genes, for which a single dsRNA scored, represent true positives. Thus, assuming that all 15 category 3 dsRNAs are false positives, the overall rate of false positives for this K/P screen is 1% since we screened 1,545 dsRNAs in total. It is important to note that although 1% appears to be an acceptable low rate, when the same false positive rate is shown as a percentage of the genes identified as positives in the screen, the figure is 62% (15 out of 24; Figure 3, Table 2), thus, underscoring the need for further validation of primary hit lists.
Knowledge of the transcriptome of the cell line used in our K/P JAK/STAT screen allowed us to estimate the false positive rate, as few unexpressed genes are expected to be legitimate hits. Likewise, in any screen, failure to uncover some expected hits can sometimes be explained by the finding that those genes are simply not expressed in the specific cell line tested. In turn, this allows an estimate of false negatives in conjunction with multiple reagents per gene. To assist such analyses, we have analyzed gene expression based on deep-sequencing data obtained by the modENCODE consortium  for five Drosophila cell lines commonly used in RNAi HTS (Figure 5) and have made this data available on a website (, see Materials and Methods). Each of the cell lines expresses about 53% of protein-coding genes in the genome but the specific sub-set of genes that are expressed differs somewhat among the cell lines. We identified 6,230 genes expressed in all five cell lines, representing 46% of annotated protein-coding genes in release 5.22 of the Drosophila genome. False positives and false negatives can also potentially be filtered using tools based on protein interaction networks such as RNAiCut  and NePhe .
This study has focused on false negative rates among long dsRNAs used in Drosophila RNAi screens in cultured cells. Although the exact rates will vary depending on the reagent library, assay design, and the level of statistical noise, our analysis provides a detailed example of the issues that need to be considered carefully in the data analysis of an RNAi screen. Importantly, other RNAi reagents, such as siRNAs, shRNAs, and siRNA pools used in mammalian RNAi screens, have their own false positive and false negative rates and these are not necessarily the same as what we observed with Drosophila long dsRNAs. Regardless of the reagent used, however, any false negative rate significantly above zero will cause genes to be missed in an RNAi screen. Likewise, as shown in the K/P screen, even a very low false positive rate among the set of reagents can yield a very high proportion of false positives when expressed as a percentage of the hits obtained in an individual screen. Finally, our study illustrates how transcriptome data from the cell lines can be included as part of the data analysis to eliminate false positives.
The existence of false negatives due to ineffective RNAi reagents necessitates strategies for reducing their effects on the outcomes of RNAi screens. One obvious approach to minimize false negatives in screens is to use multiple, independently screened reagents per gene, as done in some recent RNAi screens [29, 30]. In principle, use of multiple reagents per gene should reduce the number of false negatives, as a single ineffective RNAi reagent would be compensated by those that are effective. An obvious caveat to this, however, is that simply by including more reagents, the number of false positive results will also increase.
To explore how multiple RNAi reagents per gene could affect the outcome of a screen and to determine the best strategy for disambiguating results when different reagents yield inconsistent results, we devised a simple model of one, two, and three reagents per gene (Table 3). Furthermore, we examined three simple generalized disambiguation approaches and modeled how these approaches would affect the outcome of a screen. These disambiguation approaches are as follows: a lenient approach wherein a gene is considered a hit if any RNAi reagent directed against that gene scores above some threshold (Table 3, Rule A); a stringent approach that requires all reagents directed against the same gene to score (Table 3, Rule B); and an intermediate approach that requires more than half of the reagents directed against the same gene to score (Table 3, Rule C). For the purpose of this model, an RNAi "mini-pool" of reagents, such as is sometimes used for mammalian siRNA knockdown, or combinatorial knockdown with multiple dsRNAs, counts as a single RNAi reagent unless the individual components are tested separately.
To illustrate the model, we chose as an example three hypothetical Drosophila genome-wide dsRNA libraries with false negative and false positive rates of 10% and 1% respectively (Figure 6). The model shows that the strategy used to disambiguate results from multiple reagents is critical when interpreting results from a library with more than one independently tested reagent per gene. In a hypothetical library with three reagents per gene, a lenient interpretation (requiring one or more of three reagents to score) results in few false negatives but an extremely high number of false positives in the outcome of a screen (Table 3 and Figure 6, Rule A). In this scenario, the presence of multiple reagents per gene virtually eliminates false negatives but at the cost of a high number of false positives as illustrated by our K/P JAK/STAT screen which would have a 62% final false positive rate (in terms of the percentage of hits) if interpreted this way. A stringent disambiguation (requiring all three reagents to score) results in few false positives but a high number of false negatives (Tables 3 and Figure 6, Rule B).
A third possible strategy for libraries with three reagents per gene (Tables 3 and Figure 6, Rule C) requires two out of three RNAi reagents to score. This disambiguation method achieved a balance of false negatives and false positives, resulting in low numbers of each relative to what would be achieved by screening a single dsRNA per gene. Thus, adding additional reagents per gene can greatly reduce false negative rates in screens but can also greatly increase the number of false positives in the absence of careful disambiguation.
For Drosophila cell-based RNAi screens, a library with three dsRNAs per gene, wherein discrepancies are disambiguated by requiring two of three dsRNAs to score, achieves a good balance between false negatives and false positives. For RNAi reagents with significantly different reagent-level false positive and false negatives rates, a different number of reagents with a different disambiguation strategy may be more appropriate. Indeed, several groups have proposed using four or more siRNAs per gene in mammalian siRNA screens [4, 31]. Moreover, our model and disambiguation strategy is based on a simple binary interpretation of hits, but other more quantitative approaches have been proposed that do not require a screener to designate individual reagents as hits or non-hits. A recently described approach for disambiguating image-based RNAi screens, quantitative multiparametric image analysis (QMPIA), can be applied to complex screens with a very large number of read-outs . A more broadly applicable quantitative disambiguation approach, the redundant siRNA activity (RSA) method , requires only one read-out per RNAi experiment. Regardless of the disambiguation approach used, screeners must carefully interpret results obtained with multiple reagents per gene in order to reduce false negative results without increasing the number of false positive results to an unacceptably high level.
RNAi reagents that target the same gene do not always yield consistent results. Some of these inconsistencies can be explained by false positives and off-target effects, but some RNAi reagents are weak or ineffective and cause false negative results. False positive results and off-target effects can be partially filtered by using cell-line transcriptome expression data, and we have presented a web-tool to enable Drosophila cell-based RNAi screeners to filter screen results. RNAi libraries with multiple reagents per gene enable a reduction in false positive and false negative outcomes so long as care is taken when disambiguating inconsistent results to prevent an unintentional increase in false positive or false negative results.
Construction of the RNAi sub-libraries
RNAi "sub-libraries" were constructed by selecting genes based on known and predicted function as determined by FlyBase  supplemented with curation of the lists by experts. For each gene, two to four dsRNAs were selected from existing libraries or designed de novo using SnapDragon . SnapDragon is a dsRNA design tool that selects gene regions common to splice forms and avoids sequences known to cause OTE [10, 11]. RNAi reagents were constructed based on previously described protocols , dsRNAs were normalized to a dilution of 50 ng/ul, and 5 ul of this was aliquoted into each well of 384-well plates.
A kinases and phosphatases sub-library screen was performed as described previously with minor modifications . Briefly, S2R+ cells were transfected with dsRNA and two reporter constructs (10xSTAT-fire fly luciferase and actin-renilla luciferase). Three days later, the JAK/STAT pathway was stimulated via the addition of S2NP cells transfected with a plasmid expressing the JAK/STAT pathway ligand Unpaired (actin-Unpaired/act-upd) . JAK/STAT signaling activity was quantified by measuring firefly luciferase activity, as the expression of firefly luciferase is under the control of 10 repeats of a STAT binding sequence. We used ubiquitously expressed Renilla luciferase activity to normalize for transfection efficiency and cell viability. The normalized luciferase values were used to calculate Z-scores. A Z-score for a well is calculated using the formula: (x - μ)/σ where x is the value of the well, μ is the mean value across all wells of the plate, and σ is the standard deviation of the well values of the plate.
RNAi screens performed using the DRSC "2.0" genome-wide library (i.e. the most updated genome-wide library) were selected for analysis. Raw data from these screens were normalized using a standard plate-based Z-score analysis. The screens included are diverse in terms of assay read-outs and the subject under investigation; they include image-based screen assays, fluorimeter and luminometer (i.e. plate-reader) assays and investigated topics such as cell signaling pathways, pathogen infection, ion transport, cell viability, cellular and sub-cellular morphology, and RNA processing.
The 243 genes that target ribosome and proteasome components were selected based on Gene Ontology annotations in FlyBase . A complete list of the dsRNAs analyzed in the study, which correspond to this set of 243 genes, can be found in Supplementary Table S1. The screen results obtained with dsRNAs targeting these genes were clustered based on their Z-scores across the screens using Cluster 3.0  using Pearson's correlation and average linkage hierarchical clustering.
To characterize gene expression levels, we used deep sequencing data obtained by the modENCODE consortium and available online  for the BG3, Cl.8, Kc167, S2-DRSC and S2R+ cultured cell lines. The first four cell lines were sequenced by modENCODE using 37 nt paired-end reads on the Illumina GAIIx platform (GEO Accession GSE15596) . In addition, we analyzed samples obtained from the S2R+ cell line that were sequenced with the same platform in a strand-specific manner using a combination of single and paired-end reads of different lengths (76 nt and 108 nt, respectively. The reads were aligned the genome (FlyBase release 5.22) using TopHat  with up to two mismatches allowed and a mapping limit of 40 potential locations. Cufflinks  was used to estimate the level of expression of the annotated protein-coding genes. An FPKM (Fragments per Kilobase of gene/transcript model per million fragments mapped) value of 1 was set as a threshold for expressed genes. The expression of any gene in each cell line can be searched using the DRSC Cell Lines Expression Levels web-tool .
High throughput screening
Drosophila RNAi Screening Center
Kinase/Phosphatase RNAi sub-library
Transcription factors RNAi sub-library
Ubiquitin-related genes RNAi sub-library
Transmembrane proteins RNAi sub-library.
Echeverri CJ, Perrimon N: High-throughput RNAi screening in cultured cells: a user's guide. Nat Rev Genet. 2006, 7: 373-84. 10.1038/nrg1836.
Echeverri CJ, Beachy PA, Baum B, Boutros M, Buchholz F, Chanda SK, Downward J, Ellenberg J, Fraser AG, Hacohen N, Hahn WC, Jackson AL, Kiger A, Linsley PS, Lum L, Ma Y, Mathey-Prévôt B, Root DE, Sabatini DM, Taipale J, Perrimon N, Bernards R: Minimizing the risk of reporting false positives in large-scale RNAi screens. Nat Methods. 2006, 3: 777-9. 10.1038/nmeth1006-777.
Mohr S, Bakal C, Perrimon N: Genomic screening with RNAi: Results and challenges. Annu Rev Biochem. 2010, 79: 37-64. 10.1146/annurev-biochem-060408-092949.
Zhang XD, Heyse JF: Determination of sample size in genome-scale RNAi screens. Bioinformatics. 2009, 25: 841-4. 10.1093/bioinformatics/btp082.
Zhang XD: A new method with flexible and balanced control of false negatives and false positives for hit selection in RNA interference high-throughput screening assays. J Biomol Screen. 2007, 12: 645-55. 10.1177/1087057107300645.
Haley B, Foys B, Levine M: Vectors and parameters that enhance the efficacy of RNAi-mediated gene disruption in transgenic Drosophila. Proc Natl Acad Sci USA. 2010
Tschuch C, Schulz A, Pscherer A, Werft W, Benner A, Hotz-Wagenblatt A, Barrionuevo LS, Lichter P, Mertens D: Off-target effects of siRNA specific for GFP. BMC Mol Biol. 2008, 9: 60-10.1186/1471-2199-9-60.
Jackson AL, Burchard J, Schelter J, Chau BN, Cleary M, Lim L, Linsley PS: Widespread siRNA "off-target" transcript silencing mediated by seed region sequence complementarity. RNA. 2006, 12: 1179-87. 10.1261/rna.25706.
Birmingham A, Anderson EM, Reynolds A, Ilsley-Tyree D, Leake D, Fedorov Y, Baskerville S, Maksimova E, Robinson K, Karpilow J, Marshall WS, Khvorova A: 3' UTR seed matches, but not overall identity, are associated with RNAi off-targets. Nat Methods. 2006, 3: 199-204. 10.1038/nmeth854.
Kulkarni MM, Booker M, Silver SJ, Friedman A, Hong P, Perrimon N, Mathey-Prévôt B: Evidence of off-target effects associated with long dsRNAs in Drosophila melanogaster cell-based assays. Nat Methods. 2006, 3: 833-8.
Ma Y, Creanga A, Lum L, Beachy PA: Prevalence of off-target effects in Drosophila RNA interference screens. Nature. 2006, 443: 359-63. 10.1038/nature05179.
Kondo S, Booker M, Perrimon N: Cross-species RNAi rescue platform in Drosophila melanogaster. Genetics. 2009, 183: 1165-73. 10.1534/genetics.109.106567.
Langer CCH, Ejsmont RK, Schönbauer C, Schnorrer F, Tomancak P: In vivo RNAi rescue in Drosophila melanogaster with genomic transgenes from Drosophila pseudoobscura. PLoS ONE. 2010, 5: e8928-10.1371/journal.pone.0008928.
Nybakken K, Vokes SA, Lin TY, McMahon AP, Perrimon N: A genome-wide RNA interference screen in Drosophila melanogaster cells for new components of the Hh signaling pathway. Nat Genet. 2005, 37: 1323-32. 10.1038/ng1682.
DasGupta R, Nybakken K, Booker M, Mathey-Prevot B, Gonsalves F, Changkakoty B, Perrimon N: A case study of the reproducibility of transcriptional reporter cell-based RNAi screens in Drosophila. Genome Biol. 2007, 8: R203-10.1186/gb-2007-8-9-r203.
DasGupta R, Kaykas A, Moon RT, Perrimon N: Functional genomic analysis of the Wnt-wingless signaling pathway. Science. 2005, 308: 826-33. 10.1126/science.1109374.
Flockhart I, Booker M, Kiger A, Boutros M, Armknecht S, Ramadan N, Richardson K, Xu A, Perrimon N, Mathey-Prevot B: FlyRNAi: the Drosophila RNAi screening center database. Nucleic Acids Res. 2006, 34 (Database issue): D489-94. 10.1093/nar/gkj114.
Cherbas L, Willingham A, Zhang D, Yang L, Zou Y, Eads BD, Carlson JW, Landolin JM, Kapranov P, Dumais J, Samsonova A, Choi JH, Roberts J, Davis CA, Tang H, van Baren MJ, Ghosh S, Dobin A, Bell K, Lin W, Langton L, Duff MO, Tenney AE, Zaleski C, Brent MR, Hoskins RA, Kaufman TC, Andrews J, Graveley BR, Perrimon N, Celniker SE, Gingeras TR, Cherbas P: The transcriptional diversity of 25 Drosophila cell lines. Genome Research.
Tweedie S, Ashburner M, Falls K, Leyland P, McQuilton P, Marygold S, Millburn G, Osumi-Sutherland D, Schroeder A, Seal R, Zhang H, The FlyBase Consortium: FlyBase: enhancing Drosophila Gene Ontology annotations. Nucleic Acids Research. 2009, 37: D555-D559. 10.1093/nar/gkn788.
Birmingham A, Selfors LM, Forster T, Wrobel D, Kennedy CJ, Shanks E, Santoyo-Lopez J, Dunican DJ, Long A, Kelleher D, Smith Q, Beijersbergen RL, Ghazal P, Shamu CE: Statistical methods for analysis of high-throughput RNA interference screens. Nat Methods. 2009, 6: 569-75. 10.1038/nmeth.1351.
Baeg GH, Zhou R, Perrimon N: Genome-wide RNAi analysis of JAK/STAT signaling components in Drosophila. Genes Dev. 2005, 19: 1861-70. 10.1101/gad.1320705.
Morrison DK, Murakami MS, Cleghon V: Protein kinases and phosphatases in the Drosophila genome. J Cell Biol. 2000, 150: F57-62. 10.1083/jcb.150.2.F57.
Arbouzova NI, Zeidler MP: JAK/STAT signalling in Drosophila: insights into conserved regulatory and cellular functions. Development. 2006, 133: 2605-16. 10.1242/dev.02411.
Müller P, Boutros M, Zeidler MP: Identification of JAK/STAT pathway regulators--insights from RNAi screens. Semin Cell Dev Biol. 2008, 19: 360-9.
Gene Expression Levels by Cell Line. [http://www.flyrnai.org/cgi-bin/RNAi_expression_levels.pl]
Kaplow IM, Singh R, Friedman A, Bakal C, Perrimon N, Berger B: RNAiCut: automated detection of significant genes from functional genomic screens. Nat Methods. 2009, 6: 476-7. 10.1038/nmeth0709-476.
Wang L, Tu Z, Sun F: A network-based integrative approach to prioritize reliable hits from multiple genome-wide RNAi screens in Drosophila. BMC Genomics. 2009, 10: 220-10.1186/1471-2164-10-220.
Collinet C, Stöter M, Bradshaw CR, Samusik N, Rink JC, Kenski D, Habermann B, Buchholz F, Henschel R, Mueller MS, Nagel WE, Fava E, Kalaidzidis Y, Zerial M: Systems survey of endocytosis by multiparametric image analysis. Nature. 2010, 464: 243-9. 10.1038/nature08779.
König R, Zhou Y, Elleder D, Diamond TL, Bonamy GM, Irelan JT, Chiang CY, Tu BP, De Jesus PD, Lilley CE, Seidel S, Opaluch AM, Caldwell JS, Weitzman MD, Kuhen KL, Bandyopadhyay S, Ideker T, Orth AP, Miraglia LJ, Bushman FD, Young JA, Chanda SK: Global analysis of host-pathogen interactions that regulate early-stage HIV-1 replication. Cell. 2008, 135: 49-60.
König R, Chiang CY, Tu BP, Yan SF, DeJesus PD, Romero A, Bergauer T, Orth A, Krueger U, Zhou Y, Chanda SK: A probability-based approach for the analysis of large-scale RNAi screens. Nat Methods. 2007, 4: 847-9.
Ramadan N, Flockhart I, Booker M, Perrimon N, Mathey-Prévôt B: Design and implementation of high-throughput RNAi screens in cultured Drosophila cells. Nat Protoc. 2007, 2: 2245-64. 10.1038/nprot.2007.250.
de Hoon MJ, Imoto S, Nolan J, Miyano S: Open source clustering software. Bioinformatics. 2004, 20: 1453-4. 10.1093/bioinformatics/bth078.
Trapnell C, Pachter L, Salzberg SL: TopHat: discovering splice junctions with RNA-Seq. Bioinformatics. 2009, 25: 1105-11. 10.1093/bioinformatics/btp120.
Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, van Baren MJ, Salzberg SL, Wold BJ, Pachter L: Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol. 2010, 28: 511-5. 10.1038/nbt.1621.
Jiang D, Zhao L, Clapham DE: Genome-wide RNAi screen identifies Letm1 as a mitochondrial Ca2+/H+ antiporter. Science. 2009, 326: 57-8. 10.1126/science.1180482.
Sessions OM, Barrows NJ, Souza-Neto JA, Robinson TJ, Hershey CL, Rodgers MA, Ramirez JL, Dimopoulos G, Yang PL, Pearson JL, Garcia-Blanco MA: Discovery of insect and human dengue virus host factors. Nature. 2009, 458: 1047-50. 10.1038/nature07967.
Akimana C, Al-Khodor S, Abu Kwaik Y: Host factors required for modulation of phagosome biogenesis and proliferation of Francisella tularensis within the cytosol. PLoS One. 2010, 5: e11025-10.1371/journal.pone.0011025.
DRSC RNAi Sub-Libraries: [http://www.flyrnai.org/DRSC-SUB.html]
RNAi Core Facility at New York University (NYU), [http://rnainy.med.nyu.edu/]
We would like to thank Chi Yun, Shauna Katz, and Ramanuj Dasgupta at New York University for assisting with the construction of some of the RNAi sub-libraries. We would also like to thank the modENCODE consortium and in particular, Brenton Graveley for help working with the Illumina sequencing data. We are grateful to Bernard Mathey-Prevot for comments on the manuscript and insightful discussions. This work was supported by R01 GM067761 (NIGMS/NIH) and U01 HG004271 (NHGRI/NIH). In addition, Y.K. is supported by the Damon Runyon Cancer Research Foundation, S.E.M. is supported in part by the Dana Farber/Harvard Cancer Center, and N.P. is an investigator of the Howard Hughes Medical Institute.
The authors declare that they have no competing interests.
MB performed statistical analysis of DRSC RNAi screens and clustered dsRNAs that target ribosome and proteasome components, performed statistical analysis of the K/P JAK/STAT screen and interpreted its results, created the model of RNAi libraries with multiple reagents per gene and drafted the manuscript. AAS analyzed the deep-sequencing data for five Drosophila cell-line transcriptomes. YK performed the K/P JAK/STAT RNAi screen. IF developed the web-tool to query Drosophila cell-line gene expression data. SEM runs the DRSC, participated in screen design, and helped draft the manuscript. NP conceived of the study and helped draft the manuscript. All authors read and approved the final manuscript.