- Methodology article
- Open Access
Comprehensive kinome NGS targeted expression profiling by KING-REX
BMC Genomics volume 20, Article number: 307 (2019)
Protein kinases are enzymes controlling different cellular functions. Genetic alterations often result in kinase dysregulation, making kinases a very attractive class of druggable targets in several human diseases. Existing approved drugs still target a very limited portion of the human ‘kinome’, demanding a broader functional knowledge of individual and co-expressed kinase patterns in physiologic and pathologic settings. The development of novel rapid and cost-effective methods for kinome screening is therefore highly desirable, potentially leading to the identification of novel kinase drug targets.
In this work, we describe the development of KING-REX (KINase Gene RNA EXpression), a comprehensive kinome RNA targeted custom assay-based panel designed for Next Generation Sequencing analysis, coupled with a dedicated data analysis pipeline. We have conceived KING-REX for the gene expression analysis of 512 human kinases; for 319 kinases, paired assays and custom analysis pipeline features allow the evaluation of 3′- and 5′-end transcript imbalances as readout for the prediction of gene rearrangements. Validation tests on cell line models harboring known gene fusions demonstrated a comparable accuracy of KING-REX gene expression assessment as in whole transcriptome analyses, together with a robust detection of transcript portion imbalances in rearranged kinases, even in complex RNA mixtures or in degraded RNA.
These results support the use of KING-REX as a rapid and cost effective kinome investigation tool in the field of kinase target identification for applications in cancer biology and other human diseases.
Protein kinases constitute one of the largest families of enzymes that share a highly homologous catalytic domain (the kinase domain), which transfers the gamma phosphate from nucleoside triphosphates (ATP) to protein substrates, activating signal cascades and regulating multiple complex cellular processes. In several human diseases, such as cancer, kinases are often deregulated by gene alterations, leading to their anomalous expression and activation . ‘Druggability’ by small molecule inhibitors, binding the conserved ATP-pocket, makes kinases therapeutically very attractive : more than 500 kinases (the “kinome”) are encoded in the human genome, and kinase inhibitors now account for a quarter of all current drug discovery research and development efforts [3,4,5,6]. The clinical success of tyrosine kinase inhibitors is proven by a number of examples, such as imatinib in BCR–ABL1 fusion-positive leukaemia patients , or the more recent crizotinib and ceritinib in patients with lung carcinomas and mesenchymal tumors harboring anaplastic lymphoma kinase (ALK) fusions [8, 9]. Approved drugs, though, target a very limited portion of the human kinome, leaving much of the kinase therapeutic potential unexplored.
In cancer, besides the investigation of individual kinase genetic alterations , ‘kinomics’ approaches are emerging in the definition of co-expressed kinase functional roles in health and disease , as well as in integrative ‘polypharmacology’ approaches exploring synergizing effects of highly promiscuous kinase inhibitors . Based on these considerations, the quest for novel kinase targets effective in oncogene-defined tumor types  is strongly encouraged to investigate tumor biology and to identify new candidate targets in specific disease contexts, also through the continuous generation of molecular data and the development of novel methods for kinome screening.
While ‘omics’ analysis approaches produce huge amounts of molecular information, requiring substantial computational power for data storage and management, next-generation sequencing (NGS) targeted RNA approaches enable the analysis of focused portions of the transcriptome. Currently, several companies supplying NGS solutions offer custom based RNA assays targeting gene transcripts, isoforms, splice junctions, noncoding RNAs, mutations and expressed fusion genes, with advantages over whole transcriptome sequencing in terms of reduced costs and simplicity of execution. Indeed, a number of commercial and custom approaches have been developed targeting small kinase panels (Illumina TruSight RNA Pan-Cancer panel , Illumina TruSight RNA Fusion Panel , Archer® FusionPlex® NGS assays [14,15,16,17]), allowing the detection of specific kinase gene and isoform expression, mutation and gene fusion events. However, to our knowledge, none of the reported NGS targeted RNA applications allow a comprehensive whole kinome expression analysis.
In this work, we describe the development of KING-REX (KINase Gene RNA EXpression), a kinome RNA targeted custom panel based on the TruSeq Targeted RNA expression Illumina kit (TREx, ) and coupled with a custom dedicated bioinformatics pipeline. We have conceived KING-REX as a solution for human kinome gene expression analysis on small/medium scale Illumina sequencers, requiring reduced computational resources in terms of storage space and data processing, with an additional feature in the analysis pipeline allowing the evaluation of 3′- and 5′-end transcript imbalances as a readout for the prediction of gene rearrangements.
Design of a targeted RNA panel for the profiling of human kinome gene expression
Next Generation Sequencing (NGS) technologies currently offer the possibility to design panels of custom based assays for user defined sequences of interest. The focus of our work was the custom set-up of a targeted RNA procedure for a comprehensive gene expression analysis of the entire human kinome, intended for small/medium scale sequencers. Based on the Illumina TruSeq Targeted RNA Expression (TREx) approach , enabling a custom definition of up to 1000 assay panels, we assembled the KING-REX (KINome Gene RNA EXpression) panel, by selecting pre-designed assays with a specific targeting strategy, to combine the maximum capacity of the custom panel composition with the highest possible kinome coverage.
We started from compiling a comprehensive list of human protein kinases by integrating information from the currently available kinase resources [6, 19,20,21,22,23]. For 514 unique genes, clearly annotated as protein kinases, we retrieved genomic coordinates for 2230 kinase isoforms from the UCSC database, providing an acceptable confidence level for transcript annotation . Kinase domain coordinates were then obtained for 1716 protein kinase isoforms harboring the catalytic domain, as reported in the Superfamily database (, superfamily ID number 56112, containing the ‘Protein kinases, catalytic subunit’ subfamily of interest), and directly mapped onto UCSC transcripts. We visualized all this information via the Integrated Genome Viewer (IGV)  to drive the assembly of a panel of 876 pre-designed amplicon-based Illumina TREx assays, selected to specifically target 512 human kinases, according to the schema depicted in Fig. 1. Briefly, in the panel design we selected a first assay for each kinase (ASSAY IN), targeting the kinase domain common to most of the reported kinase isoforms, thus prioritizing the expressed druggable portion of the target sequences (Fig. 1a). Only for CDK3 and TNNI3K no pre-designed TREx assays were available within their respective kinase domains, so they were excluded from the panel. A second assay (ASSAY OUT) was then selected at the maximum sequence distance from ASSAY IN, targeting a region outside the kinase domain and covering the same isoforms encompassed by ASSAY IN (Fig. 1b). An ASSAY OUT fulfilling these criteria could be initially identified for 274 kinases. For other 45 kinases, it was not possible to cover all the isoforms targeted by ASSAY IN with a unique ASSAY OUT: for these, the ASSAY IN was reselected based on a restricted number of isoforms, in order to allow the selection of an ASSAY OUT according to the above criteria. In these latter cases, the initial assay encompassing the kinase domain covering the maximum number of isoforms was retained (ASSAY ADD) in addition to the restricted ASSAY IN and respective ASSAY OUT (Fig. 1b). In this way, we could balance isoform coverage between ASSAY IN and ASSAY OUT without penalizing the number of isoforms detectable for gene expression (ASSAY ADD). In total, an ASSAY OUT could be included for 319 kinases.
This experimental design enables a more robust evaluation of gene expression for those kinases that are probed with more than one independent assay; at the same time, the detection of uneven expression of kinases through the ASSAY IN and the corresponding ASSAY OUT levels can be exploited as readout to suggest the presence of potential gene rearrangements.
Evaluation of KING-REX performance in detecting kinase gene expression
We evaluated the performance of KING-REX in gene expression quantification by sequencing a panel of 10 colorectal cancer (CRC) cell lines, using KING-REX and whole transcriptome approaches. The same bioinformatics data analysis pipeline was used for both datasets as described in M&M. We calculated the R squared correlation between kinase expression levels measured in the two different protocols in all the analyzed cell lines and observed an average R2 > 0.8 of KING-REX with whole transcriptome measurements.
To test the reproducibility of KING-REX, we processed the UHRR control sample in 6 different experiments and the KM12 cell line in 4 different experiments. The R2 correlation was calculated in both cases, obtaining an average R2 = 0.983 for UHRR and R2 = 0.976 for KM12, respectively.
We then evaluated recall, precision and F-measure metrics of KING-REX in detecting kinase signals with varying gene expression level thresholds, considering in-house whole transcriptome data on the same CRC cell lines as reference, as detailed in M&M. Recall and precision values ranged from over 80 to 96%, regardless of the selected threshold (Table 1). The same analysis was also performed using Cancer Cell Line Encyclopedia (CCLE) RNAseq data as reference , obtaining comparable results (Table 1).
Next, we extended the comparative analysis to a more heterogeneous sample panel by selecting cell lines of different tissue origins (BT-474, breast; HPAC, pancreas; K-562, leukemia; KARPAS 299, lymphoma; NCI-H716, large intestine; SNU-1079, biliary tract; U-118 MG, nervous system) for KING-REX analysis and using publicly available CCLE whole transcriptome RNAseq data as reference . Recall, precision and F-measure values were again comparable to the ones observed with the CRC cell line panel (Table 2).
Evaluation of KING-REX performance in predicting kinase fusion events
In the KING-REX panel assembly, the paired ASSAY IN and corresponding ASSAY OUT, located in opposite 5′ and 3′ transcript ends, support the identification of potential truncated isoforms or gene rearrangements for 319 kinases. For this specific purpose, we have implemented a dedicated pipeline evaluating imbalances between the kinase ASSAY IN and ASSAY OUT signals. We tested the ability of KING-REX to identify imbalanced 5′ vs. 3′ signals in five human cancer cell lines, harboring well known kinase gene fusions: KARPAS 299, KM-12, LC-2/ad, U-118 MG and NCI-H716. KARPAS 299 is a T-cell lymphoma cell line carrying the NPM-ALK gene fusion ; KM-12 is a colorectal cancer cell line we had previously reported to express the chimeric TPM3-TRKA protein ; LC-2/ad is a lung adenocarcinoma cell line harboring a CCD6-RET fusion ; U-118 MG is a glioblastoma cell line characterized by the presence of a FIG(GOPC)-ROS1 rearrangement . NCI-H716 colorectal cancer cell line was included as a control, harboring an amplified full length FGFR2 kinase, whose hyper-activation is due to gene amplification and not to the presence of a concomitant fusion at the FGFR2 C-terminal with COL14A1 gene, conserving an intact kinase sequence . All these cell lines were sequenced in duplicate using KING-REX and analyzed with the kinase fusion event detection pipeline using ASSAY IN and ASSAY OUT measurements (Fig. 2). In the pipeline, after the standard data normalization step (Fig. 2a), results are further processed with a second normalization step, introduced to balance for possible technical differences between ASSAY IN and ASSAY OUT signals, related to primer efficiency, differential end degradation and/or RNA reverse transcription performance. This resulted in the expected clustering of IN and OUT assays belonging to the same cell line (Fig. 2b).
Imbalanced ASSAY IN and ASSAY OUT signals in all the kinases involved in known gene rearrangements were clearly detected by differential gene expression analysis (Table 3). Gene fusions were confirmed by RT-qPCR (Additional file 1: Figure S1). No imbalanced signals could be detected in the negative control cell line NCI-H716, harboring a nearly full length FGFR2 kinase, covered by both ASSAY IN and OUT for FGFR2 (Table 3).
KING-REX limits of detection in gene expression
We explored the limits of detection of KING-REX, both in terms of gene expression measurement and of gene fusion prediction, using 7 samples derived from KARPAS 299 and U-118 MG, both harboring a kinase gene fusion (NPM-ALK in KARPAS 299 and FIG(GOPC)-ROS1 in U-118 MG). We mixed the RNA from the two cell lines in different proportions: 100–0%, 87.5–12.5%; 75–25%, 50–50%, 25–75%, 12.5–87.5%, 0–100%, to simulate different tissue heterogeneity levels as found in clinical cancer samples. Duplicate samples for each mix were then subjected to KING-REX analysis; sequencing results for technical duplicates clearly clustered according to the relative dilution proportions (Additional file 2: Figure S2).
2We next focused on kinases expressed exclusively in U-118 MG or KARPAS 299, i.e. not detected in the other cell line (average gene expression value < 1), and evaluated the disappearance of gene expression signals with the variable increments of KARPAS 299 or U-118MG background, respectively. A clear signal could be observed after incremental dilutions for all the U-118 MG or KARPAS 299 unique kinases, down to the lowest tested concentration (12.5%), regardless of their basal gene expression level (Additional file 3: Table S1).
To evaluate the linearity of kinase gene expression variation within the serially diluted sample set, we compared the measured expression values for each kinase in all the mixed samples versus a ‘theoretical’ value calculated for each dilution as described in M&M, inferred starting from the measured levels in the 100% samples. Despite the complexity of the assay panel composition, theoretical and measured kinase gene expression levels showed high concordance at all dilution levels, as demonstrated by the observed linearity of the reported scatter plot, with an R2 = 0.98 (Fig. 3), supporting the robustness of the KING-REX kinase expression profiling approach.
KING-REX limits of detection in gene fusion prediction
We next investigated the limits of detection of fusion events, by evaluating the ability of KING-REX to correctly predict gene fusions in the serially diluted samples described above, based on ASSAY IN and ASSAY OUT imbalances for the rearranged kinases present in the two cell lines. In U-118 MG, the 3′ portion of the ROS1 kinase is detected at high level due to the presence of the FIG-ROS1 gene fusion, while both 5′ and 3′ ROS1 signals are undetectable in KARPAS 299, thus representing an ideal background to determine ROS1 detection limits without confounding factors (Fig. 4a). We observed that the ability of KING-REX to detect the ROS1 5′ vs. 3′ imbalance in a null background is maintained throughout all the dilutions, down to the experimentally tested limit of 12.5% (Table 4).
KARPAS 299, expressing an ALK gene fusion (NPM-ALK), was diluted in the U-118 MG cell line background, in this case expressing a full length ALK, thus representing a more frequent real-life scenario, simulating the case of a tumor mixed with normal adjacent tissue and/or infiltrating lymphocytes, which might express full length ALK (Fig. 4b). In this case, KING-REX could clearly detect the presence of imbalanced ALK gene expression in KARPAS 299 mixtures above 50% proportion (Table 4).
The described cases are only two of many possible experimental scenarios, indicating that the limit of detection of the system is higher when the fusion gene is expressed at lower level or when the background full length WT kinase is highly expressed. Along with these experimental results, we simulated the ‘theoretical’ lower limits of detection of the system at different expression levels of a kinase gene fusion (GE1) in a background with varying expression of the full length kinase (GE2), by generating a synthetic dataset with different combinations of expression levels, as detailed in M&M. Results in Table 5 show the minimum percentage of sample containing a kinase gene fusion (‘sample 1’) theoretically required by the KING-REX system to successfully detect a gene fusion event with varying background levels of full length kinase in a contaminating sample (‘sample 2’). This simulated dataset was then compared to the available experimental results to verify the theoretical predictions, at least for the tested conditions.
Indeed, in the case of FIG-ROS1, expressed in U-118 MG cell line and diluted in a null ROS1 background (KARPAS 299), the simulation showed that a ‘theoretical’ limit of detection of 6.25% might be reached (Table 5, highlighted in bold), i.e. below the lowest experimentally tested 12.5% dilution factor (Table 4). Similarly, in the case of KARPAS 299, expressing the NPM-ALK gene fusion diluted in the ALK full length U-118 MG background, a ‘theoretical’ detection limit of 31.25% might be reached (Table 5, highlighted in bold), i.e. between the experimentally tested 50% (positive for fusion detection) and 25% (negative for fusion detection) dilution factors (Table 4), supporting experimental validation conclusions and suggesting an even higher sensitivity of the system.
KING-REX performance with degraded RNA
We then evaluated the experimental performance of KING-REX on degraded RNA. We extracted RNA from KM-12 cell line and artificially heat degraded it by treating the RNA in aqueous solution at 90 °C for different times (0, 5, 10, 20, 40, 60 min or 5 h). RNA quality and integrity was evaluated by assessing RIN and DV200% parameters for all samples with an Agilent Bioanalyzer. As expected, RNA quality was inversely proportional to the time of exposure to high temperature (Table 6). Heat degraded samples obtained at four representative time points were selected (time = 0, 10, 20 and 60 min) and subjected to KING-REX library preparation in duplicate, followed by sequencing on Illumina MiSeq.
Library performance was evaluated in relation to the RNA degradation level. We observed a trend in the reduction of the library concentration with the decrease in RIN and DV200% parameters. KING-REX sequencing performance appeared acceptable in samples with low RIN but still high DV200% values, since an adequate number of reads was still obtained (PF, passing filter reads in Table 6); sequencing analysis of the 60-min sample, characterized by a DV200% of 10%, indicating extreme degradation, yielded a poor number of PF, in line with Illumina recommendations for the TREx protocol, suggesting to process samples preferably with a DV200% > 30% .
In terms of gene fusion detection, an imbalanced ASSAY IN/OUT NTRK1 signal in the KM-12 cell line could be appreciated with significant p-value at the early time points (time = 0, 10 and 20 min), while this was not possible in sample duplicates with 60 min exposure to heating conditions, due to the generally poor quality of these highly degraded samples (DV200 < 10%). These results demonstrated that KING-REX analysis can be applied also to partially degraded samples.
High biological relevance in different diseases and druggability make kinases a very attractive class of pharmacological targets. Despite the about 40 kinase inhibitor drugs approved in Oncology , and the hundreds of compounds in clinical development , much of the kinase therapeutic potential remains untapped. Targeted NGS analysis approaches can exploit selected ‘omics’ observations to enable more rapid, cost-effective and focused molecular research screens or diagnostic approaches and are increasingly used. In general, in the implementation of RNAseq targeted panels for the detection of kinase gene expression and/or gene fusions, a compromise must be reached between optimal assay performance and limitations imposed by small scale approaches, by maximizing either sequence coverage or target number. For the detection of gene fusions, assays spanning all the exons of the gene and all exon-exon boundaries of widely characterized kinase diagnostic targets are available (Archer® FusionPlex® NGS assays ; QuantideX® NGS RNA Lung Cancer Kit ; Ovation® Fusion Panel Target Enrichment System V2 ; Ion AmpliSeq RNA Fusion Lung Cancer Research Panel [15, 17, 36]). In our work, we have described the set-up and the performance of KING-REX, a custom targeted RNA approach suitable for the gene expression screening of a comprehensive human kinome panel on Illumina MiSeq or NextSeq platforms. Our panel was intended to maximize kinome coverage (512 kinases) by minimizing the number of per-kinase assays, while retaining the possibility to infer the presence of gene fusions for a wide number of kinases (319), using paired assays located within and outside the catalytic domain. The application of a similar strategy, based on measuring the imbalanced expression between 5′ and 3′ transcript ends, has been restricted so far to the analysis of a limited number of kinases [16, 17]. We implemented an ad hoc data analysis pipeline to streamline both the gene expression analysis workflow and the detection of kinase imbalances as a readout for potential truncated isoform expression or gene fusion events. This was reached by introducing a scaling factor to balance for possible different performances of IN and OUT assays, possibly deriving from technical artifacts, such as primer efficiency, RNA degradation and/or reverse transcription effects. In our work, KING-REX demonstrated a high detection sensitivity and R squared correlation of 0.8 with whole transcriptome results performed in parallel.
An added value of KING-REX system over transcriptome is its focus on kinases, which allows reaching increased read depth for the evaluation of gene expression and for the estimation of the imbalance between gene portions. In the data presented in our manuscript, the average coverage per gene obtained with KING-REX and transcriptome was comparable (about 2000 reads per gene). However, the average coverage per base was over 700 for KING-REX and about 200 for transcriptome analysis, respectively. This allows a higher confidence in estimating the 3′/5′ imbalances used to predict kinase rearrangements; in order to achieve the same coverage per base with a transcriptome analysis, a minimum of 180 M reads per sample should be achieved.
We also extended the analysis to show that KING-REX gene expression detection accuracy was maintained even in heterogeneous RNA mixtures, mimicking the condition of tumor clinical samples, where contamination with adjacent normal/stromal tissue or inflammatory infiltration represents a common scenario. In addition, using cell line models harboring known gene fusions, we have shown that kinase rearrangements can be correctly and robustly detected by the system, distinct from full length sequences, even in complex background mixtures or in heat-degraded RNA, based on the evaluation of the imbalanced measured expression ratio for paired kinase assays.
To our knowledge, KING-REX is currently the largest targeted approach for expression analysis of kinases which could be used as a rapid and cost effective investigation tool in cancer biology. It represents a useful setup for the comprehensive analysis of the kinome in cancer or other diseases, for applications in the field of the identification of novel, putative kinase targets.
KING-REX panel design
The KING-REX panel was assembled by selecting 876 pre-designed amplicon-based assays from the Illumina DesignStudio Custom Assay Design Tool for use with the TruSeq Targeted RNA Expression (TREx) kit (Illumina, San Diego, CA, USA). Genomics coordinates of all the protein kinases and respective kinase isoforms from UCSC database (, assembly hg19 and table knownGene), together with coordinates of the kinase catalytic domains from Superfamily database (, superfamily ID number 56112, hg19 genome version) were used to select the assays for the kinome, as described in Results and Fig. 1. In particular, 193 kinases are covered by an ASSAY IN, 274 kinases by an ASSAY IN and an ASSAY OUT, and 45 kinases by an ASSAY IN, an ASSAY OUT and an ASSAY ADD. See Additional file 4: Table S2 for more details.
Cell cultures and RNA preparation
Human cancer cell lines were maintained as recommended by the suppliers (BT-474, COLO 205, HCT 116, HCT-15, HPAC, NCI-H716, RKO and U-118 MG from ATCC; COLO-678, KARPAS-299, SW 480, SW 948 from ECACC; K-562 and LC-2/ad from DSMZ; LS-180 and SW1417 from ICLC; SNU-1079 from KCLB; KM-12 from NCI). The identity of all cell lines was verified by STR analysis as described in . RNA was extracted from cancer cell line pellets using the RNeasy Mini Kit (Qiagen, Venlo, Netherlands). RNA from KM-12 cell line was heated at 90 °C for 0, 5, 10, 20, 40, 60 min or 5 h in a Hybex thermo-block (SciGene) and immediately chilled and stored at − 20 °C. RNA quality was evaluated by measuring RIN (RNA Integrity Number) and DV200% (% Distribution value of fragments ≥200 nucleotides) parameters using Bioanalyzer 2100 System (Agilent Technologies). Stratagene’s Universal Human Reference RNA (UHRR – Agilent Technologies), an RNA mixture from 10 human cancer cell lines, was used as control in all experiments.
Library preparation and sequencing
KING-REX library preparation was performed according to the manufacturer’s protocol . Libraries were sequenced in single-end on a MiSeq platform (Illumina, San Diego, CA, USA). Whole transcriptome sequencing library preparation was performed using the TruSeq RNA Access Library Kit (Illumina, San Diego, CA, USA) according to the manufacturer’s protocol and sequenced on a HiSeq1000 (Illumina, San Diego, CA, USA).
Gene expression analysis
Gene expression quantification from KING-REX and from whole transcriptome data were performed using the same pipeline. Fastq files were aligned to the human reference genome (hg19) using STAR (v. 2.5.1b) . Raw Count (RC) quantification was performed using RSEM tool (v. 1.2.30) . Normalization was performed using DESeq2 (v. 1.12.4)  with default parameters and log2 transformed. Gene expression levels were reported as Log2 of Normalized Count (NC) for each kinase.
Pipeline for potential kinase fusion event detection
Raw Count (RC) quantification was performed independently for ASSAY IN and ASSAY OUT using Bedtools Coverage tool (v. 2.22.0) . A first normalization was performed using DESeq2 (v. 1.12.4)  with default parameters. A further normalization step was applied to balance possible ASSAY IN and ASSAY OUT expression detection differences due to technical artifacts (i.e. primer efficiency, degradation, RNA reverse transcription effects). In this step, the normalized counts (NC) of ASSAY OUT are corrected with a scaling factor, calculated as the median value of the ratio between ‘ASSAY IN’ NC and the ‘ASSAY OUT’ NC along all samples n:
EdgeR (v. 3.14.0)  was applied for the detection of differential expression between the ASSAY IN and ASSAY OUT for each kinase, used as an indicator of potential kinase fusion events.
Reverse-transcription quantitative PCR
Reverse-Transcription (RT) quantitative PCR (RT-qPCR) was carried out using SYBR green technology; specific primers were designed for the genes of interest using the freely available Primer3 software  and synthesized using an Applied Biosystems 3900 Synthesizer. RNA was reverse transcribed into complementary DNA (cDNA) using iScript cDNA Synthesis Kit (Bio-Rad), according to manufacturer’s instructions. Real Time quantitative PCR (qPCR) was carried out on a C1000 Touch ThermalCycler with CFX96 Touch Real-Time PCR Detection System (Bio-Rad), using reagents and materials from Bio-Rad (SsoAdvanced™ Universal SYBR® Green Supermix), according to manufacturer’s instructions, in a volume of 10 μl per reaction, each containing approximately 10–12 ng cDNA, 600 nM primers for 5′ and 3′ regions of the targeted kinases or for the endogenous reference control peptidylprolyl isomerase A (PPIA), as detailed in Additional file 5: Table S3. Each sample was assayed in duplicate qPCR reactions. Quantification of expression levels relative to the UHRR sample was calculated for each targeted sequence following the ΔΔCt method . For comparison with RT-qPCR normalized results, for each IN / OUT assay in each tested cell line, KING-REX Normalized Counts (NC) were processed according to the following formula:
Recall, precision and F-measure calculations
Recall, precision and F-measure were evaluated after extracting individual kinase raw counts for each cell line from: i) KING-REX data; ii) in-house transcriptome data, and iii) Cancer Cell Line Encyclopedia (CCLE; ) transcriptome data.
Data were normalized using Upper Quartile Normalization (setting the 75th percentile to 1000) . For each kinase in each cell line (cl), the presence (P) or absence (N) of the kinase in the reference data was calculated with variable thresholds (thrs), ranging from 0.5 to 7:
For each kinase in each cell line (cl) the concordance of KING-REX with reference control was calculated as follows:
From these data, the total number of true positive (TP), true negative (TN), false positive (FP) and false negative (FN) values in all tested cell lines was calculated for each kinase. Recall, precision and F-measure were calculated for each kinase as follows:
Tables 1 and 2 report values for recall, precision and F-measure averaged for all kinases in the panel. For more details see Additional file 6: Table S4, Additional file 7: Table S5 and Additional file 8: Table S6.
Evaluation of measured vs ‘theoretical’ gene expression variation in serially diluted samples
Seven sample mixtures of RNA from KARPAS 299 and U-118 MG cell lines were prepared with the following proportions: 100–0%, 87.5–12.5%; 75–25%, 50–50%, 25–75%, 12.5–87.5%, 0–100%, and subjected to KING-REX library preparation and sequencing. Gene expression values measured for each kinase in the 100% KARPAS 299 or U-118 MG samples were used to infer the ‘theoretical’ expression values expected in each dilution mix, according to the following formula:
where NC is the Normalized Counts for each kinase and P corresponds to the serial dilution factor (0.875, 0.75, 0.5, 0.25, or 0.125, respectively). A scatter plot between the measured and ‘theoretical’ sets of data was generated, and R2 correlation coefficient was calculated.
In silico dilutions for the calculation of fusion detection limits
A ‘theoretical’ dilution matrix was created with virtual kinase gene expression levels (GE), to simulate variable tissue mixture conditions in which cells containing a fusion kinase (sample 1, S1) are diluted, or ‘contaminated’, with variable proportions of cells containing a full length kinase (sample 2, S2).
We assumed that, for fusion kinase genes in S1:
where GE1 = gene expression of virtual gene fusion in S1, ranging from 15 to 7;
and for full length kinase genes in S2:
where GE2 = gene expression of the full length gene in S2 ranging from GE1 value to 0. The two artificial S1 and S2 datasets were subjected to KING-REX analysis for potential kinase fusion detection, after generating ‘virtually mixed’ samples with different dilution proportions of S1 and S2, using the following formula:
P is the percentage of the sample S1 and 1-P is the percentage of sample S2; P ranges from 100 to 0% in steps of 6.25%;
NC is the Normalized Counts for each kinase ASSAY IN and ASSAY OUT (including the virtual kinase) obtained with the KING-REX pipeline for potential kinase fusion event detection. For each combination of fixed GE1 and GE2 values and for variable P, we established Pn-1 as the minimum allowed P, if at Pn the gene fusion was not detected by KING-REX pipeline analysis. The matrix in Table 5 shows the minimum allowed P of sample S1, containing the fusion kinase gene, required in a S1/S2 sample mixture for successful gene fusion detection by KING-REX analysis for each combination of virtual GE1 and GE2 values.
American Type Culture Collection
Cancer Cell Line Encyclopedia
Colorectal cell lines
Deutsche Sammlung von Mikroorganismen und Zellkulturen GmbH
% Distrubution value of fragment ≥200 nucleotides
Interlab Cell Line Collection
Integrated Genome Viewer
Korean Cell Line Bank
KINase Gene RNA Expression
National Cancer Institute
Next Generation Sequencing
Passing filter reads
RNA Integrity Number
Short tandem repeat
Targeted RNA expression
Stransky N, et al. The landscape of kinase fusions in cancer. Nat Commun. 2014;5:4846.
Zhang J, et al. Targeting cancer with small molecule kinase inhibitors. Nat Rev Cancer. 2009;9(1):28–39.
Bhullar KS, et al. Kinase-targeted cancer therapies: progress, challenges and future directions. Mol Cancer. 2018;17(1):48.
Klaeger S, et al. The target landscape of clinical kinase drugs. Science. 2017;358(6367):eaan4368.
Ferguson FM, Gray NS. Kinase inhibitors: the road ahead. Nat Rev Drug Discov. 2018. https://doi.org/10.1038/nrd.2018.21.
Manning G, et al. The protein kinase complement of the human genome. Science. 2002;298:1912–34.
Jabbour E, et al. Tyrosine kinase inhibition: a therapeutic target for the management of chronic-phase chronic myeloid leukemia. Expert Rev Anticancer Ther. 2013;13(12):1433–52.
Shaw AT, et al. Crizotinib versus chemotherapy in advanced ALK-positive lung cancer. New Engl J Med. 2013;368:2385–94.
Shaw AT, et al. Ceritinib in ALK-rearranged non-small-cell lung cancer. New Engl J Med. 2014;370:1189–97.
Kilpinen S, et al. Analysis of kinase gene expression patterns across 5681 human tissue samples reveals functional genomic taxonomy of the kinome. PLoS One. 2010;5(12):e15068.
Ursu O, et al. Network modeling of kinase inhibitor polypharmacology reveals pathways targeted in chemical screens. PLoS One. 2017;12(10):e0185650.
TruSight RNA Pan-Cancer Panel | Cancer gene fusions and expression. https://emea.illumina.com/products/by-type/clinical-research-products/trusight-rna-pan-cancer.html?langsel=/it/. Accessed 20 July 2018.
TruSight RNA Fusion Panel | Fusion detection in cancer research samples. https://emea.illumina.com/products/by-type/clinical-research-products/trusight-rna-fusion.html?langsel=/it/. Accessed 20 July 2018.
FusionPlex Assays | ArcherDX. http://archerdx.com/fusionplex-assays/. Accessed 20 July 2018.
Reeser JW, et al. Validation of a Targeted RNA sequencing assay for kinase fusion detection in solid tumors. J Mol Diagn. 2017;19(5):682–96.
Beadling C, et al. A multiplexed amplicon approach for detecting gene fusions by next-generation sequencing. J Mol Diagn. 2016;18(2):165–75.
Rogers TM, et al. Multiplexed transcriptome analysis to detect ALK, ROS and RET rearrangements in lung cancer. Sci Rep. 2017;7:42259–66.
TruSeq Targeted RNA Expression Kits | Customizable gene expression studies. https://www.illumina.com/products/by-type/sequencing-kits/library-prep-kits/truseq-targeted-rna.html. Accessed 20 July 2018.
KinBase: Kinase Database at Manning’s Group. http://kinase.com/web/current/kinbase/. Accessed 20 July 2018.
Kinweb: kinase database. https://www.itb.cnr.it/kinweb/. Accessed 20 July 2018.
UniProt. https://www.uniprot.org/docs/pkinfam. Accessed 20 July 2018.
Kinomer. http://www.compbio.dundee.ac.uk/kinomer/bin/kinomes.pl. Accessed 20 July 2018.
KinG, Kinases ecncoded in Genomes. http://king.mbu.iisc.ernet.in/. Accessed 20 July 2018.
Zhao S, et al. A comprehensive evaluation of ensembl, RefSeq, and UCSC annotations in the context of RNA-seq read mapping and gene quantification. BMC Genomics. 2015;16(1):97.
Superfamily. http://supfam.org/SUPERFAMILY. Accessed 20 July 2018.
Thorvaldsdóttir H, et al. Integrative genomics viewer (IGV): high-performance genomics data visualization and exploration. Brief Bioinform. 2013;14:178–92.
Barretina J, et al. The Cancer cell line encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature. 2012;483(7391):603–7.
Hübinger G, et al. The tyrosine kinase NPM-ALK, associated with anaplastic large cell lymphoma, binds the intracellular domain of the surface receptor CD30 but is not activated by CD30 stimulation. Exp Hematol. 1999;27(12):1796–805.
Ardini E, et al. Entrectinib, a pan-TRK, ROS1, and ALK inhibitor with activity in multiple molecularly defined Cancer indications. Mol Cancer Ther. 2016;15(4):628–39.
Matsubara D, et al. Identification of CCDC6-RET fusion in the human lung adenocarcinoma cell line, LC-2/ad. J Thorac Oncol. 2012;7(12):1872–6.
Kurtis DD, Robert CD. Molecular pathways - ROS1 fusion proteins in cancer. Clin Cancer Res. 2013;19(15):4040–5.
Nicorici D. Novel FGFR2 fusion genes in NCI-H716 colorectal cancer cell line. Figshare; 2014.
TruSeq® Targeted RNA Expression Reference Guide. https://support.illumina.com/content/dam/illumina-support/documents/documentation/chemistry_documentation/samplepreps_truseq/truseqtargetedrna/truseq-targeted-rna-expression-reference-guide-15034665-01.pdf Accessed 24 July 2018.
QuantideX® NGS RNA Lung Cancer Kit. https://asuragen.com/portfolio/oncology/quantidex-ngs-rna-lung-cancer-kit/. Accessed 20 July 2018.
Ovation® Fusion Panel Target Enrichment System V2 | NuGEN. https://www.nugen.com/products/ovation-fusion-panel-target-enrichment-system-v2. Accessed 20 July 2018.
Simple, Sensitive RNA Fusion Detection With Only 10 ng FFPE RNA | Thermo Fisher Scientific – IT. https://www.thermofisher.com/it/en/home/life-science/cancer-research/cancer-genomics/targeted-sequencing-cancer-mutation-detection/rna-fusion-detection.html. Accessed 20 July 2018.
UCSC Table Browser. https://genome.ucsc.edu/cgi-bin/hgTables. Accessed 20 July 2018.
Somaschini A, et al. Cell line identity finding by fingerprinting, an optimized resource for short tandem repeat profile authentication. Genet Test Mol Biomarkers. 2013;17(3):254–9.
Dobin A, et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29(1):15–21.
Li B, Dewey CN. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics. 2011;12:323.
Love MI, et al. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15:550.
Aaron RQ, Ira MH. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26(6):841–2.
Robinson MD, et al. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26(1):139–40.
Primer3. http://biotools.umassmed.edu/bioapps/primer3_www.cgi. Accessed 6 Feb 2019.
Livak KJ, et al. Analysis of relative gene expression data using real-time quantitative PCR and the 2(−Delta DeltaC(T)) method. Methods. 2001;25(4):402–8.
Bullard JH, et al. Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments. BMC Bioinformatics. 2010;11:94.
We acknowledge the Cell Bank team for cell growth.
The fees for NGS sequencing, real time polymerase chain reaction detection and data processing were funded by the Italian Ministry of Education, University and Research (MIUR) under the PON R&C 2007–2013 (PON project PON 01_02418) and then by Fondazione Regionale per la Ricerca Biomedica (Regione Lombardia), project NR.2015–0010. The funding bodies did not have any role in study design or execution or any other aspect of the manuscript preparation and submission.
Availability of data and materials
Data generated for the analysis are available in Additional files 9, 10 and 11. The source code and all the scripts used in this published article have been deposited in GitHub, repository name KING-REX (https://github.com/GiovanniCarapezza/KING-REX).
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Additional file 1:
Figure S1. Kinase fusion detection by KING-REX vs. RT-qPCR. Relative quantification data as assessed by KING-REX (left) and RT-qPCR (right) analyses for Assay_OUT and Assay_IN regions of ALK in KARPAS299, NTRK1 in KM12, RET in LC2AD and ROS1 in U118MG. Assay_IN and ASSAY_OUT data are reported in blue and red, respectively. Data were normalized as described in M&M section vs. UHRR control sample. (PNG 184 kb)
Additional file 2:
Figure S2. Distance matrix analysis of KING-REX mixed samples. Distance matrix of KING-REX analysis results for technical duplicates of RNA from two cell lines (KARPAS 299 and U-118 MG), mixed in different percent dilution proportions as indicated on the right side of the graph. The blue shading indicates the Euclidean distance between the expression values of two samples (cell line mixtures), ranging from dark blue (high similarity) to light blue (low similarity). (PNG 220 kb)
Additional file 3:
Table S1. KING-REX gene expression values of U-118 MG or KARPAS 299 unique kinases in serially diluted cell line RNA mixtures. List of kinases above a mean signal of 5 (Log2NC) in only one of the two tested cell lines (U-118 MG, left panel; or KARPAS 299, right panel) and below a mean signal of 1 (Log2NC) in the other one, with respective gene expression values measured in technical duplicates of RNA mixtures from the two cell lines (KARPAS 299 and U-118 MG) in different proportions (100–0%; 87.5%-12,5%; 75–25%; 50–50%; 25–75%; 12.5–87.5%; 0–100%). The blue shading reflects variations of gene expression values (Log2NC), ranging from 0 (white) to 14 (dark blue). (XLSX 16 kb)
Additional file 4:
Table S2. Kinase isoforms detected by KING-REX assays. For each kinase, the number of selected assays and the number of isoforms detected by each assay are reported. (XLSX 71 kb)
Additional file 5:
Table S3. Sequence of primers used in RT-qPCR validation experiments. The sequence of the primers used for the detection of Assay_OUT and Assay_IN portions of ALK, RET, NTRK1, ROS1 and the reference control in RT-qPCR experiment are reported. (XLSX 10 kb)
Additional file 6:
Table S4. Additional information in support to Table 1, left panel (KING-REX vs. In-house transcriptome data). For each threshold and for each kinase in the KING-REX panel, the following information is reported: (i) the number of cell lines in which the kinase is considered present (P) or absent (N) in the reference dataset; (ii) the total number of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN); (iii) the recall, precision and F-measure metrics used to obtain data in Table 1, left panel (KING-REX vs. In-house transcriptome data. (XLSX 204 kb)
Additional file 7:
Table S5. Additional information in support to Table 1, right panel (KING-REX vs. CCLE transcriptome data). For each threshold and for each kinase in the KING-REX panel, the following information is reported: (i) the number of cell lines in which the kinase is considered present (P) or absent (N) in the reference dataset; (ii) the total number of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN); (iii) the recall, precision and F-measure metrics used to obtain data in Table 1, right panel (KING-REX vs. CCLE transcriptome data). (XLSX 200 kb)
Additional file 8:
Table S6. Additional information in support to Table 2. For each threshold and for each kinase in the KING-REX panel, the following information is reported: (i) the number of cell lines in which the kinase is considered present (P) or absent (N) in the reference dataset; (ii) the total number of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN); (iii) the recall, precision and F-measure metrics used to obtain data in Table 2. (XLSX 204 kb)
Additional file 9:
KING-REX gene expression data. Normalized counts (Log2transformed) of KING-REX analysis in each sample, as calculated using the pipeline for gene expression analysis described in M&M. (TXT 375 kb)
Additional file 10:
Transcriptome Gene expression data. Normalized counts (Log2transformed) of transcriptome analysis in each sample, obtained using the pipeline for gene expression analysis described in M&M. (TXT 3067 kb)
Additional file 11:
KING-REX ASSAY IN and ASSAY OUT expression data. KING-REX Normalized counts (Log2transformed) for each ASSAY IN and ASSAY OUT for the 319 kinases in each analyzed sample, obtained using the pipeline for kinase fusion event detection described in M&M. (TXT 460 kb)
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Carapezza, G., Cusi, C., Rizzo, E. et al. Comprehensive kinome NGS targeted expression profiling by KING-REX. BMC Genomics 20, 307 (2019). https://doi.org/10.1186/s12864-019-5676-3
- Kinome gene expression
- NGS RNA targeted panel