The impact of RNA sequence library construction protocols on transcriptomic profiling of leukemia
© The Author(s). 2017
Received: 23 March 2017
Accepted: 8 August 2017
Published: 17 August 2017
RNA sequencing (RNA-seq) has become an indispensable tool to identify disease-associated transcriptional profiles and determine the molecular underpinnings of diseases. However, the broad adoption of the methodology in the clinic is still hampered by inconsistent results from different RNA-seq protocols and requires further evaluation of its analytical reliability using patient samples. Here, we applied two commonly used RNA-seq library preparation protocols to samples from acute leukemia patients to understand how poly-A-tailed mRNA selection (PA) and ribo-depletion (RD) based RNA-seq library preparation protocols affect gene fusion detection, variant calling, and gene expression profiling.
Overall, the protocols produced similar results with consistent outcomes. Nevertheless, the PA protocol was more efficient in quantifying expression of leukemia marker genes and showed better performance in the expression-based classification of leukemia. Independent qRT-PCR experiments verified that the PA protocol better represented total RNA compared to the RD protocol. In contrast, the RD protocol detected a higher number of non-coding RNA features and had better alignment efficiency. The RD protocol also recovered more known fusion-gene events, although variability was seen in fusion gene predictions.
The overall findings provide a framework for the use of RNA-seq in a precision medicine setting with a limited number of samples and suggest that selection of the library preparation protocol should be based on the objectives of the analysis.
RNA sequencing (RNA-seq) has become an important technology in the comprehensive analysis of disease transcriptomes and holds great promise for clinical applications including disease diagnosis, therapeutic selection, and precision medicine strategies [1–4]. The technique has been particularly insightful in understanding the pathogenesis and classification of leukemia [5, 6]. For example, it has enabled identification of a wide variety of clinically relevant predictive expression biomarkers [4, 7], fusion genes and recurrent mutations [8, 9], expressed variants, and alternative splicing events in different leukemia types. However, because RNA-seq is a relatively new technology, sample preparation protocols and data analysis methods are still in their infancy and require further testing before RNA-seq can be translated to standard clinical practice.
Generation of a sequencing library for RNA-seq analysis is a complex, multi-step process and a potential source of significant variation [11, 12]. This process is most commonly carried out using poly-A-tailed mRNA selection (PA) or rRNA depletion (RD) to eliminate rRNAs that are naturally abundant in the sample and which would otherwise dominate the sequence data [13, 14]. However, both of these mainstream methods have their own advantages and limitations. For example, recent studies have noted that the RD protocol captures a wide repertoire of transcripts [15, 16] and works efficiently with degraded RNA [12, 15]. The high number of intron-mapping reads in RD datasets may also be advantageous in understanding pre-mRNA dynamics and the post-transcriptional impact of microRNAs. In contrast, Li et al. have reported that PA libraries contain fewer intronic reads than RD libraries, thereby offering a more cost-effective solution for gene expression studies. The PA method also appears to outperform the RD protocol in detecting differentially expressed genes [15, 18]. However, the assessments of different RNA-seq library preparation protocols have mostly relied on non-clinical samples [16, 18–20], emphasizing the need for systematic comparison of library preparation protocols using patient samples. To address this need, our comparative analysis provides recommendations for the application of RNA-seq in clinical or pre-clinical settings with a limited number of samples.
In this study, we tested the performance of PA and RD protocols on samples from acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL) patients in a personalized medicine setting. For each patient sample we generated two RNA-seq libraries using the PA and/or RD protocols. We then assessed the effects of the two different protocols on i) expression of protein coding and non-coding RNAs, ii) differential gene expression analysis, iii) pathway analysis, iv) fusion gene detection, and v) expressed variant calling. In addition, we measured the variability introduced by library preparation in technical replicates along with biological replicates and developed metrics applicable in routine medical practice and other small-n settings by integrating RNA-seq data with variant, biomarker, and ex vivo drug sensitivity and resistance testing (DSRT) data. Our analyses showed that PA and RD protocols produced consistent results and that patient heterogeneity represented the largest source of variation. However, the RD method captured more transcriptome features whereas PA outperformed the RD protocol in detecting differentially expressed genes and leukemic markers. Importantly, some of the observed discrepancies were clinically relevant and therefore selection of the protocol is a crucial step in clinical decision-making. Our results are directly relevant for researchers and healthcare professionals aiming to apply RNA-seq in a precision medicine setting to examine transcriptomes of hematological diseases for clinical assessment and indicate that the selection of the library preparation protocol should be guided by the study objectives.
The Helsinki University Hospital Ethics Committee has approved the study and collection of samples (permit numbers 239/13/03/00/2010, 303/13/03/01/2011). Bone marrow (BM) aspirates from two AML and two ALL patients were collected after signed informed consent and with protocols in accordance with the Declaration of Helsinki. Mononuclear cells (MNCs) were isolated by density gradient separation from the BM of the patients (Ficoll-Paque PREMIUM; GE Healthcare; Little Chalfont, Buckinghamshire, UK).
mRNA purification and library construction
Total RNA was extracted from MNCs using the Qiagen miRNeasy kit (Qiagen, Hilden, Germany). The kit is capable of isolating all types of RNA from a minimal amount of starting material. Short RNAs (< 200 nt) present in the total RNA were removed prior to preparation of the RNA-seq libraries. Next, RNA was quantified using the Qubit fluorometer (Thermo Fisher, Carlsbad, CA, USA), while the quality of the RNA samples was measured using a Bioanalyzer instrument and RNA nano chips (Agilent, Santa Clara, CA, USA). For the PA protocol, 2.5–5 μg of total RNA from the ALL 542 and AML 800 samples was then subjected to oligo(dT) selection using the Dynabeads® mRNA Purification Kit (Thermo Fisher) as per the manufacturer’s instructions. The RD protocol was carried out on 2.5–5 μg of total RNA from the ALL 542, ALL 668, AML 800 and AML 1867 samples using the Ribo-Zero™ rRNA Removal Kit (Epicentre, Madison, WI, USA) as per the manufacturer’s instructions. After PA or RD selection, the samples were purified using Agencourt AMPure XP SPRI beads (Beckman Coulter, Brea, CA, USA) to remove chemical contaminants and short RNAs less than 200 nt in length.
PA and RD samples were further reverse transcribed to double-stranded cDNA using the SuperScript Double-Stranded cDNA Synthesis Kit (Thermo Fisher). Random hexamers (New England BioLabs, Ipswich, MA, USA) were used for priming the first strand synthesis reaction. Samples were prepared for RNA-seq using Illumina-compatible Epicentre Nextera™ technology. After limited-cycle PCR, the RNA-seq libraries were size selected (350–700 bp fragments) in a 2% agarose gel followed by purification with the QIAquick gel extraction kit (Qiagen).
Each transcriptome was loaded to occupy one third of the lane capacity in the flow cell. The cBot-2 system and TruSeq PE Cluster Kit v3 (Illumina, San Diego, CA, USA) were used for cluster generation, and the TruSeq SBS Kit v3-HS reagent kit and HiSeq2000 instrument (Illumina, San Diego, CA, USA) were used to generate paired 100-bp reads according to the manufacturer’s instructions. Nextera Read Primers 1 and 2 as well as Nextera Index Read Primer (Illumina) were used for paired-end sequencing and index read sequencing, respectively.
Detailed descriptions of the data analysis methods, tools and information of used published data are provided in Additional file 1.
Real-time quantitative reverse transcription-PCR (qRT-PCR)
Total RNA was extracted from two patient samples (ALL 542 and AML 800) and four breast cancer cell lines (BT-474, MCF-7, KPL-4 and SKBR3) with on-column DNase treatment. The RNA was quantified using the Qubit fluorometer. For each sample, the RNA was divided into three fractions for total RNA, PA and RD processing. PA capture was carried out using the Dynabeads mRNA Purification Kit (Thermo Fisher). RD was performed with the Ribo-Zero Magnetic Gold Kit (Epicentre). The cDNA was synthesized using SuperScript III Reverse Transcriptase (Thermo Fisher). The qRT-PCR reactions were prepared using 10 ng of cDNA from each cell line or patient sample plus the iQ SYBR Green Super Mix (Bio-Rad, Hercules, CA, USA), and reactions were run on the CFX96 Real Time System instrument (Bio-Rad). Normalized fold expression values were calculated by the ΔΔCt method using B2M, GAPDH, PGK1, and RPLP0 as reference genes and total RNA as control. The primer sequences are listed in Additional file 2: Table S8.
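The ΔΔCt calculation described above can be illustrated with a minimal Python sketch. It assumes 100% amplification efficiency (a factor of 2 per cycle) and that the four reference genes are combined by averaging their Ct values; neither detail is stated in the text, and all Ct values below are hypothetical numbers chosen for illustration, not measurements from this study.

```python
from statistics import mean

def delta_ct(ct_target, ct_references):
    """Delta-Ct: target Ct minus the mean Ct of the reference genes."""
    return ct_target - mean(ct_references.values())

def fold_expression(ct_target_sample, refs_sample, ct_target_control, refs_control):
    """Normalized fold expression by the delta-delta-Ct method.

    Assumes perfect doubling per PCR cycle (efficiency = 2).
    """
    ddct = delta_ct(ct_target_sample, refs_sample) - delta_ct(ct_target_control, refs_control)
    return 2.0 ** (-ddct)

# Hypothetical Ct values; the control fraction is total RNA, as in the text.
refs_total = {"B2M": 18.0, "GAPDH": 19.0, "PGK1": 20.0, "RPLP0": 19.0}
refs_pa = {"B2M": 18.0, "GAPDH": 19.0, "PGK1": 20.0, "RPLP0": 19.0}

fold = fold_expression(
    ct_target_sample=24.0, refs_sample=refs_pa,        # e.g. a PA fraction
    ct_target_control=25.0, refs_control=refs_total,   # total RNA control
)
print(fold)
```

With these made-up inputs the target crosses the threshold one cycle earlier in the sample than in the control after normalization, which corresponds to a two-fold higher normalized expression.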
PCR and Sanger sequencing
To validate the suspected fusion genes, standard PCR was performed on cDNA from the ALL 542 and AML 800 samples. The cDNA was synthesized from total RNA using SuperScript III Reverse Transcriptase (Thermo Fisher). Primers were designed for ST3GAL1-NDRG1, MCM4-PRKDC, HBB-B2M, PQLC1-CTDP1, and NCL-NR4A1 (Additional file 2: Table S8). The cDNA was amplified with Taq polymerase using the T Professional thermocycler (Biometra, Göttingen, Germany). No-template and GAPDH reactions were included as negative and positive controls, respectively. PCR products were run on a 3% agarose gel, stained with GelRed Nucleic Acid Stain (Biotium, Fremont, CA, USA) and visualized on a standard UV transilluminator. The DNA fragment for the HBB-B2M fusion gene was excised from the gel, cleaned using the NucleoSpin Gel and PCR Clean-up kit (Macherey-Nagel, Düren, Germany), and quantified using the Qubit dsDNA HS kit (Thermo Fisher). A total of 4.5 ng of the fragment was used for Sanger sequencing with both forward and reverse primers of the HBB-B2M fusion gene and standard sequencing protocols.
Statistical analyses were performed with R version 3.3.1 (2016-06-21) and Prism software version 6.0 (GraphPad Software, San Diego, CA, USA). In the qRT-PCR analysis, a two-tailed Student’s t-test was used to analyze gene expression, and P-values <0.05 were considered statistically significant. Statistical dependence between two variables was assessed by Spearman’s rank correlation, Pearson’s correlation, and the hypergeometric distribution as implemented in R.
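As an illustration of the hypergeometric test mentioned above, the following Python sketch computes the upper-tail probability of observing at least k category members in a draw of n genes. The analyses in this study were run in R; this is only an equivalent standard-library sketch, and the function name and the example numbers are hypothetical.

```python
from math import comb

def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws).

    population: total number of genes considered
    successes:  genes in the category of interest within the population
    draws:      size of the selected gene set
    k:          category genes observed in the selected set
    """
    total = comb(population, draws)
    upper = min(successes, draws)
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return tail / total

# Toy example: 10 genes, 5 in the category, 5 drawn, all 5 drawn are in the category.
p_value = hypergeom_upper_tail(k=5, population=10, successes=5, draws=5)
print(p_value)
```

A small tail probability indicates that the category is over-represented in the selected set relative to chance.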
Overview of the study design
Reads and read mapping statistics
We observed in total 30,205 features in the PA libraries as compared to 32,830 features detected in their matched RD counterparts using an RPKM (reads per kilobase per million) value of 0.125 as a threshold for minimum expression (Additional file 3: Figure S2). While rather similar sets of features were captured, some differences were observed in specific transcript classes, such as processed pseudogene (7.82% to 7.03%), lincRNA (5.35% to 4.67%), and snRNA (6.99% to 6.54%) elements, as presented in a biodetection plot (Fig. 2b, Additional file 2: Table S1). Overall, the RD protocol detected 20.8 to 26.3% more of these features compared to PA. Antisense and miRNA features were also called at a greater level by the RD protocol, while rRNA elements were enriched in the PA protocol. Notably, discrepancies in the calling of lincRNA, miRNA, and antisense genes resulted mainly from protocol differences, given the similar levels of these types of calls across the technical replicates (Fig. 2b). We also observed subtle but biologically intriguing variations in the detection of protein-coding gene transcripts. For example, a total of 1380 protein-coding genes were called discordantly between the matched PA and RD libraries at an RPKM threshold of 0.125 (Additional file 3: Figure S3), which is close to twice the number observed between technical replicates (patients 668 and 1867; Additional file 3: Figure S4). Further analysis of discordantly identified protein-coding transcripts revealed most of these to be attributable to RNA preparation (Additional file 2: Table S2). On one hand, 55 of a total of 71 histone genes (hypergeometric distribution p-value <0.05) were overlooked by the PA library protocol at an RPKM value of 0.125.
On the other hand, protein-coding genes relevant to cancer, such as TGF-β1, which mediates the activation of the TGF-β/SMAD signaling pathway in ALL cells, BCL3, a proto-oncogene candidate associated with B-cell leukemia, and BRD4, which is associated with transcriptional deregulation in leukemia, were overlooked by the RD protocol at this threshold.
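To make the detection threshold concrete, the sketch below computes RPKM from raw counts using the standard definition (reads per kilobase of feature length per million mapped reads) and applies the 0.125 cut-off. Treating the threshold as inclusive is an assumption, and the read counts, gene length, and library size are hypothetical values.

```python
RPKM_THRESHOLD = 0.125  # minimum-expression threshold used in the text

def rpkm(read_count, feature_length_bp, total_mapped_reads):
    """Reads per kilobase of feature per million mapped reads."""
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)

def is_detected(read_count, feature_length_bp, total_mapped_reads,
                threshold=RPKM_THRESHOLD):
    """True if the feature reaches the threshold (inclusive: an assumption)."""
    return rpkm(read_count, feature_length_bp, total_mapped_reads) >= threshold

# Hypothetical feature: 2 kb long, in a library of 20 million mapped reads.
print(rpkm(50, 2000, 20_000_000))        # 50 reads
print(is_detected(50, 2000, 20_000_000))
print(is_detected(4, 2000, 20_000_000))  # 4 reads falls below the threshold
```

Because RPKM scales with library size and feature length, a gene sitting near the threshold can be called in one library and missed in its matched counterpart, which is one way discordant calls between protocols can arise.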
rRNA removal efficiency
To compare the rRNA depletion efficiency of the RD and PA protocols, the fraction of reads aligned to known human rRNA sequences in each library was quantified by aligning reads to the rRNA precursor sequences with the RNA-SeQC software. Rather unexpectedly, the PA libraries exhibited higher rRNA mapping read rates than RD libraries (1.8% vs. 0.6%; Fig. 2d). However, the rRNA mapping rates varied highly (2.24% to 0.04% and 0.67% to 0.48%) even between technical replicate libraries.
Reproducibility of transcript abundances
Assessment of the concordance of protein-coding transcript abundances was made by measuring the correlation of RPKMs between different datasets (Additional file 3: Figure S6). We found a high level of concordance between the RPKMs of matched PA and RD samples (Spearman rho >0.95) and technical replicates (Spearman rho >0.98) in agreement with previously published results. Since RPKM values between protocol-matched biological replicates (Spearman rho >0.91) and between AML and ALL samples (Spearman rho >0.88) were less correlated, patient heterogeneity appeared to represent the largest source of variation in these data. The hierarchical clustering of protein-coding transcripts with RPKM >4 and coefficient of variation >20 also corroborated the above findings and revealed that clustering was driven by disease type and patient rather than library type (Fig. 2e).
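The concordance measure used above, Spearman's rho, is the Pearson correlation of the rank-transformed values. The following standard-library Python sketch shows the computation with average ranks for ties; it is an illustration only (the study used R), and the RPKM vectors are hypothetical.

```python
from statistics import mean

def average_ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tie_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tie_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical RPKM values for the same genes in a PA and an RD library.
rpkm_pa = [0.5, 12.0, 3.1, 150.0, 7.4]
rpkm_rd = [0.4, 9.5, 3.6, 98.0, 3.2]
print(spearman(rpkm_pa, rpkm_rd))
```

Because only ranks enter the calculation, rho is insensitive to global scale differences between libraries, which makes it a reasonable choice when comparing RPKMs across protocols.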
Accuracy in expression-based leukemia classification
Independent validation of RD and PA expression estimates
Fusion gene detection
To confirm some of the low confidence fusion genes, we tested the presence of five fusion partners with discordant fusion detections in two samples (AML 800 and ALL 542) using a standard PCR assay. PCR amplification of ST3GAL1-NDRG1 (5 spanning reads in ALL 542-PA), MCM4-PRKDC (3 spanning reads in ALL 542-RD), PQLC1-CTDP1 (6 spanning reads in AML 800-RD), NCL-NR4A1 (3 spanning reads in AML 800-PA) and HBB-B2M (8 spanning reads in ALL 542-RD) failed to detect the fusion gene, except HBB-B2M in ALL 542-RD. However, Sanger sequencing did not confirm the HBB-B2M fusion, which could be due to low abundance of the fusion in the sample or false positive prediction by the FusionCatcher tool (Additional file 2: Table S9).
In addition to the capture efficiency analysis, we performed variant effect annotation and filtering analysis on discovered variants. On average, 131,623 variants and 293 filtered variants were discovered in the process (Additional file 2: Table S5). Among these were several affecting one of the top 20 mutated genes for AML or ALL (Additional file 2: Table S6). For example, ALL 542-PA and ALL 542-RD had mutations in CREBBP (p.L161fs) and ABL1 (p.T315I; present in 73 samples in COSMIC) genes, while AML 800-PA and AML 800-RD supported a mutation in the TP53 gene (p.R273C; present in 142 COSMIC samples). In addition to these three variants supported by both library preparation protocols, AML 800-PA revealed the presence of a mutation in the TET2 gene (p.V1949fs). The library replicates detected mutations in DNMT3A, NRAS, TET2, BCOR, and CREBBP genes, indicating that variant discovery had high technical reproducibility and concordance.
Effect of library preparation on pathway enrichment analysis
With the unique ability to comprehensively characterize transcriptomes, RNA-seq has the potential to revolutionize clinical testing for a wide range of diseases. However, before the broader translation of RNA-seq into clinical practice, additional knowledge is needed to guide the selection of the library preparation protocol, as different alternatives can have a significant effect on downstream analysis and interpretation of RNA-seq outputs [12, 15, 18]. The performance of RNA-seq library preparation protocols has mostly been tested on non-clinical samples [16, 18–20] and the impact of protocols on clinical decision-making has not yet been addressed systematically. Some studies have even reported marked inconsistencies between RNA-seq data originating from differently processed libraries [19, 38], indicating that RNA-seq library preparation could also influence clinical decision-making.
To fill the gap in knowledge and assess the role of library preparation protocols in the detection of clinically important molecular characteristics, we applied two mainstream library preparation protocols to samples from leukemia patients and systematically compared their performance in a precision medicine setting. Although the small number of patient samples in our study is a limitation, especially for differential gene expression and pathway analysis, it closely mimics the clinical scenario and provides a fair equivalent to current personalized medicine practices. Also, it is important to study the behavior of RNA-seq in a small-n setting to understand how RNA-seq performs in situations where only a few samples are available with no possibility of multiple replicates. Otherwise, there is a risk that RNA-seq protocols are evaluated using metrics that are non-optimal for the goals of precision medicine and that the wrong protocol is translated to standard clinical practice. Importantly, our study highlighted differences between RNA-seq protocols, some of which were clinically relevant, and indicated that library preparation protocols differ in their suitability for differential gene expression analysis, transcriptome characterization, fusion gene detection, and variant discovery.
Read and read mapping statistics demonstrated that PA and RD libraries were largely comparable. All libraries were constructed from high-quality RNA, had approximately similar insert sizes, and contained roughly the same amount of different types of reads. In line with recently published studies [12, 18], the PA protocol captured more transcripts emanating from exonic regions than the RD protocol. Given that these exon-mapping reads positively affect differential expression analysis, the PA protocol is the preferred method for understanding differential expression. In contrast, the higher counts of intergenic or intronic mapped reads in the RD protocol can be advantageous in understanding pre-mRNA dynamics and identifying previously uncharacterized transcripts. Moreover, the RD protocol captured 6.3% more non-coding RNAs compared to the PA protocol and depleted fewer non-coding RNAs in our qRT-PCR results, suggesting its superiority at characterizing the non-coding RNA landscapes of the leukemia samples. In agreement with Sultan et al., we found more rRNA mapping reads in the PA libraries than RD libraries, suggesting higher efficiency of the RD method in removing rRNAs compared to the PA method. This finding was further supported by expression estimation of rRNA genes in two patient samples by qRT-PCR. However, the rRNA mapping rates varied highly between technical replicate libraries, indicating that this step was greatly affected by experiment- and preparation-dependent factors.
Regarding protein-coding gene identification, the RD protocol performed slightly better and captured a wider repertoire. This method, for example, detected many non-polyadenylated protein-coding genes missed by the PA protocol, corroborating results from an earlier study. Despite the better capture efficiency of the RD protocol, several known oncogenes were missed by this method. For example, TGF-β1, BCL3 and BRD4, which have all been linked with leukemia development [23–25], were overlooked by the RD protocol. This suggests that the PA protocol may be better suited for characterization of leukemia transcriptomes and indicates that selection of the RNA-seq library protocol should be guided by the objectives of the study.
Protein-coding transcript abundances were largely in agreement and a high level of concordance was found between the RPKMs of matched PA and RD samples. If extrapolated to other leukemia studies, our results indicate that biological features have a greater impact on RNA-seq data reproducibility than library preparation. Moreover, disease heterogeneity rather than the protocol limits the comparison of RNA-seq data across leukemia studies. Nevertheless, the PA protocol quantified expression differences between AML and ALL better and provided consistently higher RPKM values for leukemia marker genes. This implies that the suitability of the PA protocol for differential expression analysis is partly attributable to the higher number of exon-mapping reads in PA libraries, indicating that read depth should be a key consideration in the adoption of RNA-seq for leukemia samples.
Validation by qRT-PCR also highlighted the suitability of the PA protocol for gene expression analysis. In particular, the PA protocol detected mRNAs efficiently and more accurately reproduced expression differences in clinical samples. Moreover, the PA protocol performed better in the analysis of breast cancer cell lines, suggesting that the effect of the library preparation is consistent irrespective of the source of RNA material and type of cancer. In contrast, the RD protocol depleted rRNA molecules more efficiently than the PA protocol and was better suited for non-coding RNA detection. Overall, the qRT-PCR results suggested that PA better mimics total RNA when analyzing mRNA transcripts, which could be explained by the higher efficiency of the qRT-PCR reaction in PA libraries and the presence of mature mRNAs in this RNA preparation.
Abnormal fusion genes caused by chromosomal rearrangement are important genomic events in leukemia and characterize a substantial proportion of leukemia cases. For example, the BCR-ABL1 fusion gene is detected in 25–30% of young adult ALL cases and is a clinical marker for treatment with targeted drugs. Notably, both RNA-seq protocols successfully identified all known clinical diagnostic fusion genes BCR-ABL1, MLLT4-MLL and TCF3-PBX1 in the patient material with high numbers of supporting reads. This indicates that RNA-seq and fusion gene analysis can sensitively detect fusions despite some previous claims to the contrary. In addition, many potentially false positive predictions were made and fusion genes were called rather discordantly even between protocol and technical replicates. However, most of the discordantly detected gene fusions were supported only by a small number of spanning read pairs, indicating that precision could be improved significantly using stricter filtering parameters. Results from PCR amplification of low-confidence fusion genes supported by <10 reads also suggest that fusion genes supported by few spanning reads may be false positives and should be validated by other methods if these fusions are of interest. However, validation methods such as PCR amplification followed by Sanger sequencing may not be sensitive enough to detect fusions expressed at low levels.
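The stricter filtering suggested above can be sketched as a simple cut-off on the number of spanning reads. The Python example below uses the five low-confidence candidates and their spanning-read counts reported in the Results; the function name and the default cut-off of 10 reads are illustrative assumptions, not a parameter of FusionCatcher or a validated clinical threshold.

```python
# Spanning-read counts for the five PCR-tested candidates reported in the Results.
low_confidence_calls = {
    "ST3GAL1-NDRG1": 5,  # ALL 542-PA
    "MCM4-PRKDC": 3,     # ALL 542-RD
    "PQLC1-CTDP1": 6,    # AML 800-RD
    "NCL-NR4A1": 3,      # AML 800-PA
    "HBB-B2M": 8,        # ALL 542-RD
}

def split_by_support(calls, min_spanning_reads=10):
    """Split fusion calls into (kept, flagged) by spanning-read support.

    The cut-off is a hypothetical parameter for illustration.
    """
    kept = {name: n for name, n in calls.items() if n >= min_spanning_reads}
    flagged = {name: n for name, n in calls.items() if n < min_spanning_reads}
    return kept, flagged

kept, flagged = split_by_support(low_confidence_calls)
print(sorted(kept))     # candidates passing the cut-off
print(sorted(flagged))  # candidates needing orthogonal validation
```

With a cut-off of 10 spanning reads, all five candidates are flagged for validation, which is consistent with none of them being confirmed by PCR followed by Sanger sequencing.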
Overall, this comparative study provides preliminary guidelines for the use of RNA-seq in personalized medicine and other small-n settings, especially for hematological malignancies. In general, it showed that both PA and RD protocols produced consistent measures and were largely of similar usability. However, the PA protocol outperformed the RD protocol in more tests and showed improved performance in gene expression analysis, classification of leukemia patients, quantification of leukemic marker genes, and variant analysis, which are all important for clinical sample assessment. Given that the study included only a limited number of replicates, it would be beneficial to validate the results using a larger cohort. Additionally, the effect of cDNA synthesis on library composition should be evaluated.
The authors would like to thank the patients who contributed samples to the study and personnel at the FIMM Technology Center including Pekka Ellonen, Aino Palva, Anne Vaittinen, Minna Suvela, and Henrikki Almusa for their technical assistance. Mika Kontro provided the clinical details of the patients.
The study was supported by funds from Finnish Funding Agency for Technology and Innovation (DNro 1488/31/09 and DNro 3137/31/13) and the Cancer Society of Finland (Syöpäjärjestöt). AK was supported by grants from the University of Helsinki Research Foundation and the Doctoral Programme in Biomedicine (DPBM), University of Helsinki.
Availability of data and materials
The data sets supporting the results of this article are included within the article and its additional files.
AK, MK and CAH designed the study. AK and MK performed the bioinformatics data analyses. AK and AP performed the cell culture work and qRT-PCR experiments. AK wrote the first draft. MK and CAH guided the analysis and supervised the work. AK, MK, PM, OK and CAH all contributed to the preparation of the manuscript. All authors approved the final version.
Ethics approval and consent to participate
The Helsinki University Hospital Ethics Committee approved the study and collection of samples (permit numbers 239/13/03/00/2010, 303/13/03/01/2011).
Consent for publication
Bone marrow (BM) aspirates from two AML and two ALL patients were collected after signed informed consent and with protocols in accordance with the Declaration of Helsinki.
CAH has received research funding from Celgene, Orion, Pfizer, and Novartis unrelated to this study. OK has research funding from Bayer and Roche unrelated to this study.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Van Keuren-Jensen K, Keats JJ, Craig DW. Bringing RNA-seq closer to the clinic. Nat Biotechnol. 2014;32(9):884–5.View ArticlePubMedGoogle Scholar
- Byron SA, Van Keuren-Jensen KR, Engelthaler DM, Carpten JD, Craig DW. Translating RNA sequencing into clinical diagnostics: opportunities and challenges. Nat Rev Genet. 2016;17(5):257–71.View ArticlePubMedGoogle Scholar
- Xu J, Gong B, Wu L, Thakkar S, Hong H, Tong W. Comprehensive assessments of RNA-seq by the SEQC consortium: FDA-led efforts advance precision medicine. Pharmaceutics. 2016;8(1):8.View ArticlePubMed CentralGoogle Scholar
- Kontro M, Kumar A, Majumder MM, Eldfors S, Parsons A, Pemovska T, Saarela J, Yadav B, Malani D, Fløisand Y. HOX gene expression predicts response to BCL-2 inhibition in acute myeloid leukemia. Leukemia. 2017;31(2):301–9.View ArticlePubMedGoogle Scholar
- Cancer Genome Atlas Research Network. Genomic and epigenomic landscapes of adult de novo acute myeloid leukemia. N Engl J Med. 2013;2013(368):2059–74.View ArticleGoogle Scholar
- Andersson AK, Ma J, Wang J, Chen X, Gedman AL, Dang J, Nakitandwe J, Holmfeldt L, Parker M, Easton J. The landscape of somatic mutations in infant MLL-rearranged acute lymphoblastic leukemias. Nat Genet. 2015;47(4):330–7.View ArticlePubMedPubMed CentralGoogle Scholar
- Lavallee VP, Krosl J, Lemieux S, Boucher G, Gendron P, Pabst C, Boivin I, Marinier A, Guidos CJ, Meloche S, Hebert J, Sauvageau G. Chemo-genomic interrogation of CEBPA mutated AML reveals recurrent CSF3R mutations and subgroup sensitivity to JAK inhibitors. Blood. 2016;127(24):3054–61.View ArticlePubMedGoogle Scholar
- Lilljebjorn H, Agerstam H, Orsmark-Pietras C, Rissler M, Ehrencrona H, Nilsson L, Richter J, Fioretos T. RNA-seq identifies clinically relevant fusion genes in leukemia including a novel MEF2D/CSF1R fusion responsive to imatinib. Leukemia. 2014;28(4):977–9.View ArticlePubMedGoogle Scholar
- Lavallee VP, Lemieux S, Boucher G, Gendron P, Boivin I, Armstrong RN, Sauvageau G, Hebert J. RNA-sequencing analysis of core binding factor AML identifies recurrent ZBTB7A mutations and defines RUNX1-CBFA2T3 fusion signature. Blood. 2016;127(20):2498–501.View ArticlePubMedGoogle Scholar
- Gianfelici V, Chiaretti S, Demeyer S, Di Giacomo F, Messina M, La Starza R, Peragine N, Paoloni F, Geerdens E, Pierini V, Elia L, Mancini M, De Propris MS, Apicella V, Gaidano G, Testi AM, Vitale A, Vignetti M, Mecucci C, Guarini A, Cools J, Foa R. RNA sequencing unravels the genetics of refractory/relapsed T-cell acute lymphoblastic leukemia. Prognostic and therapeutic implications. Haematologica. 2016;101(8):941–50.View ArticlePubMedPubMed CentralGoogle Scholar
- Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 2008;5(7):621–8.View ArticlePubMedGoogle Scholar
- Li S, Tighe SW, Nicolet CM, Grove D, Levy S, Farmerie W, Viale A, Wright C, Schweitzer PA, Gao Y, Kim D, Boland J, Hicks B, Kim R, Chhangawala S, Jafari N, Raghavachari N, Gandara J, Garcia-Reyero N, Hendrickson C, Roberson D, Rosenfeld J, Smith T, Underwood JG, Wang M, Zumbo P, Baldwin DA, Grills GS, Mason CE. Multi-platform assessment of transcriptome profiling using RNA-seq in the ABRF next-generation sequencing study. Nat Biotechnol. 2014;32(9):915–25.View ArticlePubMedGoogle Scholar
- Lindberg J, Lundeberg J. The plasticity of the mammalian transcriptome. Genomics. 2010;95(1):1–6.View ArticlePubMedGoogle Scholar
- O'Neil D, Glowatz H, Schlumpberger M: Ribosomal RNA depletion for efficient use of RNA-seq capacity. Curr Protoc Mol Biol 2013, Chapter 4;Unit 4.19.Google Scholar
- Sultan M, Amstislavskiy V, Risch T, Schuette M, Dokel S, Ralser M, Balzereit D, Lehrach H, Yaspo ML: Influence of RNA extraction methods and library selection schemes on RNA-seq data. BMC Genomics 2014, 15;675-2164-15-675.Google Scholar
- Cui P, Lin Q, Ding F, Xin C, Gong W, Zhang L, Geng J, Zhang B, Yu X, Yang J, Hu S, Yu J. A comparison between ribo-minus RNA-sequencing and polyA-selected RNA-sequencing. Genomics. 2010;96(5):259–65.View ArticlePubMedGoogle Scholar
- Gaidatzis D, Burger L, Florescu M, Stadler MB. Analysis of intronic and exonic reads in RNA-seq data characterizes transcriptional and post-transcriptional regulation. Nat Biotechnol. 2015;33(7):722–9.View ArticlePubMedGoogle Scholar
- Zhao W, He X, Hoadley KA, Parker JS, Hayes DN, Perou CM: Comparison of RNA-Seq by poly (A) capture, ribosomal RNA depletion, and DNA microarray for expression profiling. BMC Genomics 2014, 15;419-2164-15-419.Google Scholar
- Sun Z, Asmann YW, Nair A, Zhang Y, Wang L, Kalari KR, Bhagwate AV, Baker TR, Carr JM, Kocher JP, Perez EA, Thompson EA. Impact of library preparation on downstream analysis and interpretation of RNA-Seq data: comparison between Illumina PolyA and NuGEN ovation protocol. PLoS One. 2013;8(8):e71745.
- Tariq MA, Kim HJ, Jejelowo O, Pourmand N. Whole-transcriptome RNAseq analysis from minute amount of total RNA. Nucleic Acids Res. 2011;39(18):e120.
- Schmittgen TD, Livak KJ. Analyzing real-time PCR data by the comparative C(T) method. Nat Protoc. 2008;3(6):1101–8.
- Hackett NR, Butler MW, Shaykhiev R, Salit J, Omberg L, Rodriguez-Flores JL, Mezey JG, Strulovici-Barel Y, Wang G, Didon L. RNA-Seq quantification of the human small airway epithelium transcriptome. BMC Genomics. 2012;13(1):82.
- Rouce RH, Shaim H, Sekine T, Weber G, Ballard B, Ku S, Barese C, Murali V, Wu M, Liu H. The TGF-β/SMAD pathway is an important mechanism for NK cell immune evasion in childhood B-acute lymphoblastic leukemia. Leukemia. 2016;30(4):800–11.
- Chen CY, Lee DS, Yan YT, Shen CN, Hwang SM, Lee ST, Hsieh PC. Bcl3 bridges LIF-STAT3 to Oct4 signaling in the maintenance of naive pluripotency. Stem Cells. 2015;33(12):3468–80.
- Gilan O, Lam EY, Becher I, Lugo D, Cannizzaro E, Joberty G, Ward A, Wiese M, Fong CY, Ftouni S, Tyler D, Stanley K, MacPherson L, Weng CF, Chan YC, Ghisi M, Smil D, Carpenter C, Brown P, Garton N, Blewitt ME, Bannister AJ, Kouzarides T, Huntly BJ, Johnstone RW, Drewes G, Dawson SJ, Arrowsmith CH, Grandi P, Prinjha RK, Dawson MA. Functional interdependence of BRD4 and DOT1L in MLL leukemia. Nat Struct Mol Biol. 2016;23(7):673–81.
- DeLuca DS, Levin JZ, Sivachenko A, Fennell T, Nazaire MD, Williams C, Reich M, Winckler W, Getz G. RNA-SeQC: RNA-seq metrics for quality control and process optimization. Bioinformatics. 2012;28(11):1530–2.
- Haferlach T, Kohlmann A, Wieczorek L, Basso G, Kronnie GT, Bene MC, De Vos J, Hernandez JM, Hofmann WK, Mills KI, Gilkes A, Chiaretti S, Shurtleff SA, Kipps TJ, Rassenti LZ, Yeoh AE, Papenhausen PR, Liu WM, Williams PM, Foa R. Clinical utility of microarray-based gene expression profiling in the diagnosis and subclassification of leukemia: report from the international microarray innovations in leukemia study group. J Clin Oncol. 2010;28(15):2529–37.
- Armstrong SA, Staunton JE, Silverman LB, Pieters R, den Boer ML, Minden MD, Sallan SE, Lander ES, Golub TR, Korsmeyer SJ. MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia. Nat Genet. 2002;30(1):41–7.
- Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA, Bloomfield CD, Lander ES. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science. 1999;286(5439):531–7.
- Kohlmann A, Schoch C, Schnittger S, Dugas M, Hiddemann W, Kern W, Haferlach T. Molecular characterization of acute leukemias by use of microarray technology. Genes Chromosomes Cancer. 2003;37(4):396–405.
- Thomas JG, Olson JM, Tapscott SJ, Zhao LP. An efficient and robust statistical modeling approach to discover differentially expressed genes using genomic expression profiles. Genome Res. 2001;11(7):1227–36.
- Pemovska T, Kontro M, Yadav B, Edgren H, Eldfors S, Szwajda A, Almusa H, Bespalov MM, Ellonen P, Elonen E, Gjertsen BT, Karjalainen R, Kulesskiy E, Lagstrom S, Lehto A, Lepisto M, Lundan T, Majumder MM, Marti JM, Mattila P, Murumagi A, Mustjoki S, Palva A, Parsons A, Pirttinen T, Ramet ME, Suvela M, Turunen L, Vastrik I, Wolf M, Knowles J, Aittokallio T, Heckman CA, Porkka K, Kallioniemi O, Wennerberg K. Individualized systems medicine strategy to tailor treatments for patients with chemorefractory acute myeloid leukemia. Cancer Discov. 2013;3(12):1416–29.
- Yadav B, Pemovska T, Szwajda A, Kulesskiy E, Kontro M, Karjalainen R, Majumder MM, Malani D, Murumagi A, Knowles J, Porkka K, Heckman C, Kallioniemi O, Wennerberg K, Aittokallio T. Quantitative scoring of differential drug sensitivity for individually optimized anticancer therapies. Sci Rep. 2014;4:5193.
- Edgren H, Murumagi A, Kangaspeska S, Nicorici D, Hongisto V, Kleivi K, Rye IH, Nyberg S, Wolf M, Borresen-Dale AL, Kallioniemi O. Identification of fusion genes in breast cancer by paired-end RNA-sequencing. Genome Biol. 2011;12(1):R6.
- Nicorici D, Satalan M, Edgren H, Kangaspeska S, Murumagi A, Kallioniemi O, Virtanen S, Kilkku O. FusionCatcher - a tool for finding somatic fusion genes in paired-end RNA-sequencing data. bioRxiv. 2014:011650.
- Greif PA, Eck SH, Konstandin NP, Benet-Pages A, Ksienzyk B, Dufour A, Vetter AT, Popp HD, Lorenz-Depiereux B, Meitinger T, Bohlander SK, Strom TM. Identification of recurring tumor-specific somatic mutations in acute myeloid leukemia by transcriptome sequencing. Leukemia. 2011;25(5):821–7.
- Forbes SA, Beare D, Gunasekaran P, Leung K, Bindal N, Boutselakis H, Ding M, Bamford S, Cole C, Ward S, Kok CY, Jia M, De T, Teague JW, Stratton MR, McDermott U, Campbell PJ. COSMIC: exploring the world's knowledge of somatic mutations in human cancer. Nucleic Acids Res. 2015;43(Database issue):D805–11.
- Eden E, Navon R, Steinfeld I, Lipson D, Yakhini Z. GOrilla: a tool for discovery and visualization of enriched GO terms in ranked gene lists. BMC Bioinformatics. 2009;10:48.
- Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT. Gene ontology: tool for the unification of biology. Nat Genet. 2000;25(1):25–9.
- Tarazona S, Garcia-Alcalde F, Dopazo J, Ferrer A, Conesa A. Differential expression in RNA-seq: a matter of depth. Genome Res. 2011;21(12):2213–23.
- Hoelzer D. Personalized medicine in adult acute lymphoblastic leukemia. Haematologica. 2015;100(7):855–8.
- Kumar S, Vo AD, Qin F, Li H. Comparative assessment of methods for the fusion transcripts detection from RNA-Seq data. Sci Rep. 2016;6:21597.