Skip to main content

Unbiased RNA-Seq-driven identification and validation of reference genes for quantitative RT-PCR analyses of pooled cancer exosomes

Abstract

Background

Exosomes are extracellular vesicles (EVs) derived from endocytic compartments of eukaryotic cells which contain various biomolecules like mRNAs or miRNAs. Exosomes influence the biologic behaviour and progression of malignancies and are promising candidates as non-invasive diagnostic biomarkers or as targets for therapeutic interventions. Usually, quantitative real-time polymerase chain reaction (qRT-PCR) is used to assess gene expression in cancer exosomes, however, the ideal reference genes for normalization yet remain to be identified.

Results

In this study, we performed an unbiased analysis of high-throughput mRNA and miRNA-sequencing data from exosomes of patients with various cancer types and identify candidate reference genes and miRNAs in cancer exosomes. The expression stability of these candidate reference genes was evaluated by the coefficient of variation “CV” and the average expression stability value “M”. We subsequently validated these candidate reference genes in exosomes from an independent cohort of ovarian cancer patients and healthy control individuals by qRT-PCR.

Conclusions

Our study identifies OAZ1 and hsa-miR-6835-3p as the most reliable individual reference genes for mRNA and miRNA quantification, respectively. For superior accuracy, we recommend the use of a combination of reference genes - OAZ1/SERF2/MPP1 for mRNA and hsa-miR-6835-3p/hsa-miR-4468-3p for miRNA analyses.

Background

Exosomes are a class of extracellular vesicles (EVs) which are secreted by eukaryotic cells. Exosomes contain biomolecules, such as DNA, RNA, miRNA or proteins and are considered important mediators of intercellular communication [1,2,3,4,5,6,7]. Cancer cell-derived exosomes play a pivotal role in tumorigenesis and cancer progression as they modulate cancer cell biology, the tumor microenvironment and the immune response [7,8,9,10,11,12,13]. Tumor-derived exosomes can also be harnessed as non-invasive diagnostic biomarkers due to their abundance in biological fluids and the enrichment of tumor-relevant biomolecules such as mRNAs or miRNAs within [4, 14,15,16,17]. In the past, various exosome-based liquid biopsies studies have suggested clinical feasibility for cancer diagnosis [18,19,20].

To accurately explore exosomes as non-invasive biomarkers and to better understand their impact on cancer progression, the precise quantification of biomolecule abundance within exosomes is of utmost importance. Quantitative real-time polymerase chain reaction (qRT-PCR) is the most widely used laboratory technique to evaluate cell-intrinsic and exosomal gene expression patterns [21]. qRT-PCR offers the advantage of high sensitivity and specificity combined with reproducibility and low template input requirements [22, 23]. However, technical or experimental factors inherent to qRT-PCR, such as variable template integrity or efficiency of reverse transcription, can reduce the diagnostic accuracy [23,24,25,26]. In addition, the numbers, sizes, and compositions of exosomes are usually affected by many factors including the methodologies for exosome isolation, intracellular biological processes, cell culture parameters and the treatments of the parental cells, which introduce the difficulty for the characterization of the composition in exosomes [27,28,29]. To account for this, reference genes with stable expression across different conditions or cancer subtypes are used to normalize gene expression [22, 30, 31]. Currently, the reference genes used for expression analyses in exosomes are most frequently those which are also used for tissue or cell lines, such as ACTB, 18S rRNA and GAPDH [5, 32, 33]. Notwithstanding their broad use, expression levels of these housekeeping genes are not universally stable, thus decreasing the quantitative accuracy in exosome studies [22, 31, 34,35,36,37]. For example, the small nucleolar RNA RNU6 is frequently used as a reference gene for miRNA expression analyses within cells [38,39,40], but the molecule is only expressed in the cell nucleus and not detected in exosomes [41,42,43]. Whereas some studies reported RNU6 to be detectable in exosomes, this is most likely due to contamination of the exosome fraction with intact cells or cell debris [44, 45]. Therefore, the unvalidated use of classical housekeeping genes suitable for cell lines or tissues needs to be critically considered for the analysis of exosomes.

To address this unmet need of an unbiased identification and validation of reference genes or miRNAs for exosome studies, here, we performed a sequencing-driven analysis with high-throughput mRNA- and miRNA-Seq datasets from serum exosomes of patients with frequent cancer types and of healthy control individuals and subsequently validate these candidates by qRT-PCR in serum exosomes of an independent cohort of ovarian cancer patients and of healthy control individuals.

Results

Identification of candidate reference genes by an unbiased integrative analysis of pooled cancer mRNA-Seq datasets

To identify reference genes with stable expression in serum exosomes, we interrogated RNA-Seq data from 47 serum exosome samples of patients with PAAD, CRC and HCC as well as of 32 healthy control individuals (HC) and applied Deseq2 to evaluate expression levels across samples. Only genes with high expression in both, serum exosomes of cancer patients and of healthy individuals (measured as transcripts per million (TPM)) compared to the average gene expression level (pooled-transcriptome) were considered as potential reference candidates. Our analysis firstly identified 112, 117, and 85 stably expressed genes respectively in serum exosomes of PAAD, CRC and HCC (p value > 0.1), by comparing their patients with healthy control individuals using Deseq2 analysis. Then 48 genes were found to be universally stably expressed in serum exosomes of all cancers. By sorting these genes by their expression level, we identified ten highly expressed candidate reference genes (ADP-ribosylation factor 1 (ARF1), beta-2-microglobulin (B2M), H3 histone pseudogene 6 (H3F3AP4), integral membrane protein 2B (ITM2B), membrane palmitoylated protein 1 (MPP1), ornithine decarboxylase antizyme 1 (OAZ1), protein-L-isoaspartate (D-aspartate) O-methyltransferase domain containing 1 (PCMTD1), superoxide dismutase 2 (SOD2), small EDRK-rich factor 2 (SERF2), and WAS/WASL Interacting Protein Family Member 1 (WIPF1) (Fig. 1a, indicated by red dots and Table 1). The diagonal scatterplot distribution of candidate reference genes indicates consistent expression abundance between exosomes of cancer patients and of healthy control individuals (Fig. 1a), with a correlation coefficient of R = 0.995. Furthermore, expression patterns of candidate reference genes identified by the pooled cancer analysis (including PAAD, CRC and HCC) were recapitulated in each cancer subtype as well (Fig. 1b-d).

Fig. 1
figure1

Scatterplots of predicted candidate reference genes for serum exosomes using RNA-Seq data. Expression levels of candidate reference genes in serum exosomes are depicted for pooled cancer samples (PAAD, CRC, HCC) (a), for pancreatic adenocarcinoma (PAAD) (b), colorectal cancer (CRC) (c) and hepatocellular carcinoma (HCC) (d) samples and compared to serum exosomes of healthy control individuals. Expression values are shown as the logarithm of transcripts per million (TPM) (log2(TPM + 1)). Red dots represent candidate reference genes and grey dots genome-wide genes

Table 1 Candidate reference genes (n = 10) ranked in order of their expression level and expression stability

Evaluation of expression levels and stability of candidate reference genes

To further validate our predicted candidate reference genes for exosomes, we compared their respective expression levels and stabilities with those of nine classical housekeeping genes: beta-actin (ACTB), beta-2-microglobulin (B2M), ribosomal protein L13A (RPL13A), tyrosine 3-monooxygenase/tryptophane 5-monooxygenase activation protein zeta (YWHAZ), glyceraldehyde-3-phosphate dehydrogenase (GAPDH), vimentin (VIM), peptidylprolyl isomerase A (PPIA), aldolase A (ALDOA), and ubiquitin C (UBC). Overall, abundance of exosomal candidate reference genes (Fig. 2a) was similar to those of classical housekeeping genes (Fig. 2b). B2M had by far the highest overall expression abundance of all candidate reference genes (Fig. 2a) which was only surpassed by the classical housekeeping gene ACTB (Fig. 2b). We then assessed the expression stability across samples and tumor types by two measures: 1) the coefficient of variation “CV” as the standard deviation divided by the mean of the expression levels (transcripts per million - TPM), and 2) the average expression stability “M” determined by the geNorm algorithm. “CV” values for the exosomal candidate reference genes (0.405 to 0.723) (Fig. 2c) were significantly lower than those for classical housekeeping genes (p = 8.10e-04, Wilcoxon rank-sum test) (Fig. 2d) with “M” values below 1.0, thus indicating higher expression stability across samples and tumor types (Fig. 2e). The “M” values were also significantly lower in candidate reference genes compared to those for classical housekeeping genes (p = 0.0279, Wilcoxon rank-sum test) (Fig. 2f). The candidate reference genes were then sorted according to their expression stability from highest to lowest, and both, the “CV” and “M” criteria achieved similar ranks for most candidates. OAZ1 was identified as the gene with the highest expression stability across samples and tumor types (Table 1). We also identified and validated ten candidate reference genes respectively for each cancer subtype including PAAD (FTL, OAZ1, FYB1, SERF2, SOD2, PCMTD1, ARPC2, NCOA4, HCLS1 and TYROBP), CRC (B2M, RPL41, SNCA, RPS9, BTF3, ADIPOR1, HEMGN, SOD2, PCMTD1 and NCOA4), and HCC (FTL, OAZ1, CD74, DDX5, PCMTD1, HCLS1, LSP1, RPL9, WIPF1 and H3F3AP4) as well (Suppl. Fig. 1).

Fig. 2
figure2

Gene expression levels and stability of candidate reference genes for exosomes predicted with RNA-Seq data. Expression levels of ten candidate genes sorted by their respective expression levels (a). Expression levels of ten candidate reference genes (blue bars) compared with those of nine commonly used housekeeping genes (green bars) (b). Expression stability of candidate reference genes as measured by the coefficient of variation (“CV”) (c). Comparison of “CV” values between candidate reference genes and classical housekeeping genes (p = 8.10e-04, Wilcoxon rank-sum test) (d). Expression stability of candidate reference genes as measured by the average expression stability value (“M”) (e). Comparison of “M” values between candidate reference genes and classical housekeeping genes (p = 0.0279, Wilcoxon rank-sum test) (f)

Validation of candidate reference genes in exosomes of an independent cohort of ovarian cancer patients

Based on the promising results from the pooled analysis of serum exosomes of patients with different tumour types, we expected our predicted candidate reference genes to be applicable to serum exosomes from patients with various other cancer types as well. Therefore, we next sought to validate the candidate reference genes in a “real-life setting” in samples of serum exosomes of ten ovarian cancer patients and of ten healthy control individuals. The qRT-PCR results showed that as expected from the RNA-Seq data, B2M had the highest expression abundance among all candidates (Fig. 3a). Moreover, absolute abundance of SOD2, H3F3AP4, OAZ1, and SERF2 were comparable to the expression level of 18S rRNA, whereas the abundance of the remaining five genes (ITM2B, ARF1, PCMTD1, WIPF1, MPP1) was lower (Fig. 3a). Interestingly, the abundance of the reference candidate genes in serum exosomes of healthy control individuals and of ovarian cancer patients were highly consistent (Fig. 3a). Most candidate genes also exhibited high expression stability in ovarian cancer and healthy control individuals with “M” and “CV” values lower than 1.0 (Fig. 3b-e), even though some variation occurred regarding the gene order between both stability indicators. Whereas MPP1, WIPF1, SOD2 and OAZ1 exhibited lower “CV” values in exosomes of healthy individuals (Fig. 3c), in both exosome groups, OAZ1 had the lowest “M” value (Fig. 3d-e). The “M” values for OAZ1, ITM2B, SERF2, MPP1, H3F3AP4, and ARF1 were advantageous over 18S rRNA, whereas WIPF1, B2M, SOD2 and PCMTD1 in part had clearly higher “M” values indicating reduced expression stability (Fig. 3d). The expression stability of 18S rRNA was lower (indicated by a higher “M” value”) compared to many of the identified candidate reference genes especially in exosomes of healthy control individuals (Fig. 3d-e).

Fig. 3
figure3

Experimental validation of candidate reference genes in exosomes of patients with ovarian cancer and healthy control individuals. Expression levels (Ct values) of candidate reference genes in exosomes of ovarian cancer patients (red bars) and healthy control individuals (blue bars) relative to 18S rRNA (a). Expression stability of the candidate reference genes in serum exosomes of ovarian cancer patients (b) and healthy control individuals (c) as measured by the “CV” indicator. Expression stability of the candidate reference genes in serum exosomes of ovarian cancer patients (d) and healthy control individuals (e) as measured by the “M” indicator

To quantify gene expression levels more accurately, multiple reference genes can be used [46]. Therefore, we also determined the expression stability of respective combinations of candidate reference genes by determining the average gene-specific variation with the geNorm algorithm for RNA-Seq datasets in exosomes of the pooled cancer populations and for qRT-PCR data of exosomes from ovarian cancer patients. Overall, three combinations according to their expression stability ranks (Table 1) were evaluated: 1) genes 1–3 (OAZ1, SERF2, MPP1); 2) genes 4–6 (H3F3AP4, WIPF1, PCMTD1); and 3) genes 8–10 (SOD2, B2M, ITM2B). The first group with a combination of OAZ1, SERF2 and MPP1 had the lowest average gene-specific variations in exosomes of the pooled patient group including PAAD, HCC and CRC (RNA-Seq, Suppl. Fig. 2A) as well as in ovarian cancer patients (qRT-PCR, Suppl. Fig. 2B) indicating the highest expression stability.

Identification and validation of candidate reference miRNAs in cancer exosomes

In addition to mRNA, exosomes also contain miRNA. To identify reliable miRNAs for normalization in exosomes, we analyzed miRNA-Seq data of 72 serum exosome samples of patients with HCC, HNSCC, LCA, NBL, OVA, and THCA and 31 serum exosome samples of healthy control individuals. We identified six candidate reference miRNAs with high and stable expression: hsa-miR-125-5p, hsa-miR-192-3p, hsa-miR-4468, hsa-miR-4469, hsa-miR-6731-5p, and hsa-miR-6835-3p (Fig. 4a). Expression levels and stability of the candidate reference miRNAs were evaluated in the exosomes of pooled cancer and further validated in the exosomes of ovarian cancer and healthy control individuals (Fig. 4b-j). Across the pooled exosomes of six cancer types, but also for each individual cancer type, these candidate miRNAs show high expression and similar abundance compared to exosomes of healthy control individuals (depicted as counts per million (CPM)) (Fig. 4b, Suppl. Fig. 3). Among all candidate miRNAs, hsa-miR-6835-3p had the highest expression level across samples and tumor types (Table 2). And hsa-miR-4468 had the highest and hsa-miR-6731-5p the lowest expression stability across samples and cancer types as indicated by low and high “CV” and “M” values, respectively (Fig. 4e, h). Overall, “M” values for all candidate miRNAs were low (< 1.5), indicating their general expression stability and potential utility as candidate reference miRNAs for exosomes. By integrating both stability indicators “CV” and “M”, candidate reference miRNAs were ranked and hsa-miR-4468 showed the highest overall expression stability across samples and tumor types (Table 2). Finally, hsa-miR-6835-3p with high expression level and stability was identified as the best reference miRNA.

Fig. 4
figure4

Identification and validation of candidate reference miRNAs predicted in exosomes of ovarian cancer patients. Scatterplot of candidate reference miRNA expression levels in pooled cancer samples (HCC, HNSCC, LCA, NBL, OVA, and THCA) and healthy control individuals. Expression values are shown as the logarithm of counts per million (CPM) (log2(CPM + 1)). The red dots represent candidate reference miRNAs, grey dots genome-wide miRNAs (a). Expression levels of six candidate reference miRNAs in exosomes of pooled cancer (b), ovarian cancer patients (relative to ce-miR-39-1, n = 10) (c) and healthy control individuals (relative to ce-miR-39-1, n = 10) (d). Expression stability of candidate reference miRNAs in exosomes of pooled cancer (e), ovarian cancer patients (f) and healthy control individuals (g) as measured by the “CV”. Expression stability of six candidate reference miRNAs in exosomes of pooled cancer (h), ovarian cancer patients (i) and healthy control individuals (j) as measured by the “M” indicator

Table 2 Candidate reference miRNAs (n = 6) ranked in order of their expression level and expression stability

To further validate the predicted reference miRNA candidates, we measured their expression levels by qRT-PCR in serum exosomes of patients with ovarian cancer (n = 10) and of healthy control individuals (n = 10). miRNA abundance was calculated as cycle threshold numbers (Ct) relative to ce-miR-39-1. ce-miR-39-1 is a frequently used miRNA for normalization (Fig. 4c-d). These results showed the highest expression for hsa-miR-4469 in exosomes of ovarian cancer patients even though all miRNAs were less abundant than ce-miR-39-1 (Fig. 4c-d). In exosomes of ovarian cancer patients, hsa-miR-4469 and hsa-miR-4468 were the miRNAs with the highest and lowest expression levels, reproducing the results for exosomes of healthy control individuals (Fig. 4c-d). Compared to the miRNA-Seq analysis (Fig. 4e, h), hsa-miR-6731-5p, hsa-miR-4468, hsa-miR-192-3p and hsa-miR-6835-3p exhibited lower “CV” and “M” values indicating even higher expression stability in a “real-life” setting (Fig. 4f, g, i, j). Overall, all candidate reference miRNAs in exosome of ovarian cancer and healthy control individuals exhibited “M” values smaller than 1.5 indicating high expression stability (Fig. 4i-j).

Furthermore, the expression stability of combinations of multiple reference miRNAs was determined by the average gene-specific variation. We generated three combinations of two candidate reference miRNAs each according to their expression stability ranks (Table 2): 1) miRNAs 1–2 (hsa-miR-4468 and hsa-miR-6835-3p), 2) miRNAs 3–4 (hsa-miR-192-3p and hsa-miR-125a-5p), and 3) miRNAs 5–6 (hsa-miR-4469 and hsa-miR-6731-5p). The combination of hsa-miR-6835-3p and hsa-miR-4468 had the highest expression stability in exosomes of pooled groups of patients affected by PAAD, HCC and CRC (miRNA-Seq data, Suppl. Fig. 4A) or by ovarian cancer (qRT-PCR data, Suppl. Fig. 4B).

Discussion

Exosomes are nano-sized (< 200 nm in diameter) biovesicles which are released into the surrounding body fluids upon fusion of endocytic compartments with the plasma membrane [47] . Exosomes transfer various types of cargo from donor to acceptor cells among them nucleic acids, mRNAs and miRNAs were the first nucleic acids to be identified in exosomes [3]. Interestingly, some mRNAs and miRNAs are even specifically enriched in cancer exosomes implying a critical role for cancer biology and progression. Therefore, exosomes can be harnessed as diagnostic biomarkers or as targets for therapeutic interventions [3, 5, 48,49,50]. To characterize the composition of exosomes, the accurate quantification of mRNA and miRNA expression within the exosome fraction is critical. qRT-PCR combines high sensitivity and specificity with high reproducibility and low template input requirements and is therefore an ideal technology for exosome studies [22, 23]. qRT-PCR analyses, however, require the selection of appropriate reference genes to avoid variation in gene expression results under different experimental conditions (e.g. tumor cell vs. exosome) [22, 30, 31, 51] and currently, the ideal reference genes for the analysis of exosomes across cancers or for comparison of expression with cancer cells or tissues remain largely unknown [52, 53]. Often, classical housekeeping genes used for gene expression analyses in tissues or cell lines are used for exosome studies as well, but the expression stability of these genes is not unconditionally guaranteed for exosome samples thereby limiting the analytical accuracy. In this context, previous studies have confirmed that there is no universal reference gene for normalization under different conditions [35, 36, 54, 55].

Therefore, here, we sought to perform an unbiased and sequencing-driven analysis of publicly available high-throughput RNA- and miRNA-Seq datasets to identify and experimentally validate reference genes and combinations of these genes to generate accurate RNA- and miRNA-expression data from serum exosomes of cancer patients. From the pooled RNA- and miRNA-Seq datasets we identify multiple potential candidate reference genes in cancer exosomes (ARF1, B2M, H3F3AP4, ITM2B, MPP1, OAZ1, PCMTD1, SOD2, SERF2, WIPF1 for mRNA analyses and hsa-miR-125-5p, hsa-miR-192-3p, hsa-miR-4468, hsa-miR-4469, hsa-miR-6731-5p, and hsa-miR-6835-3p for miRNA analyses) (Tables 1 and 2 and Suppl. Table 1) and subsequently validate their expression stability in exosomes isolated from sera of patients with ovarian cancer and of healthy control individuals. All ten identified candidate reference genes provide better accuracy in terms of stability and variation of expression compared to classical housekeeping genes (Table 1, Fig. 2d and f). Interestingly, if we applied our algorithm to exosome data of each individual cancer type, the predicted candidate reference genes were different from those in the pooled cancer analysis and also varied among cancer subtypes (Suppl Fig. 1).

By employing two different indicators, we define mRNA and miRNA expression stability from two different perspectives: 1) the coefficient of variation “CV” and 2) the “M” value which are both based on the expression abundance of a gene (measured as transcripts per million – TPM) or miRNA (measured as counts per million – CPM). Whereas “CV” measures the variation of a reference gene across all samples, “M” represents the average pairwise variation of a reference gene versus all other reference genes across all samples. Low “CV” and “M” values suggest stable gene expression, and in general, genes with “M” < 1.5 can be accepted as reference genes [54]. By requiring a reference gene to have ideally both, low “CV” and “M” values, we identify OAZ1 and hsa-miR-6835-3p as the most accurate single reference genes based on RNA-Seq datasets (Tables 1 and 2). We confirm the utility of all candidate reference genes and miRNAs but especially of OAZ1 and hsa-miR-6835-3p by measuring their expression abundance (Fig. 3a and Fig. 4c, d) and stability (Fig. 3b, c, d, e and Fig. 4f, g, i, j) in serum exosomes of an independent cohort of ovarian cancer patients and of healthy control individuals (n = 10 each group, Fig. 3). NormFinder [56] is another popular method without requiring priori knowledge of control gene as calibrator to enhance the accuracy, by calculating intra and intergroup variations to evaluate the stability of expression. Here we used NormFinder to evaluate the candidates predicted by geNorm. The results (Suppl. Fig. 5) showed that the most reliable genes OAZ1 and hsa-miR-6835-3p still presented a robust stability measured by NormFinder, with OAZ1 ranking the 1st in healthy controls and the 2nd in ovarian cancer patients (Suppl. Fig. 5A, B), and with hsa-miR-6835-3p ranking the 1st in healthy controls and the 3rd in ovarian cancer (Suppl. Fig. 5C, D). To increase the diagnostic accuracy, some studies suggest the use of a combination of multiple reference genes or miRNAs [52, 54, 57]. We therefore tested various combinations of reference gene candidates, and identified the combinations of OAZ1, SERF2 and MPP1 for mRNA (Suppl. Fig. 2) and hsa-miR-6835-3p and hsa-miR-4468 for miRNA (Suppl. Fig. 4) providing the highest expression stability across samples and tumor types.

Unlike many previous studies which used an approach to narrow down the number of reference genes from a panel of previously reported candidate genes - usually a list of classical housekeeping genes [37, 45, 58, 59] - our study has the clear advantage of an unbiased and sequencing data-driven approach thus preventing bias from artificial selection. Due to the scarcity of publicly available RNA- and miRNA-Seq exosome datasets with high sequencing quality, the exosome datasets used herein include the RNA-Seq exosome samples with PAAD, CRC and HCC as well as miRNA-Seq exosome samples with HCC, HNSCC, LCA, NBL, OVA, and THCA. Accounting for the limitation that the samples in our current analysis may not fully capture the dynamic expression of genes in exosomes of pan-cancers, our analysis will be continuously updated with the emergence of additional sequencing datasets to refine the robustness of identified candidate reference genes. In this regard, it will be interesting to determine if the reference genes and miRNAs identified here will also proof utility in a true pooled cancer analysis including most or even all of the different cancer types affecting patients.

In our analysis, pooled cancer RNA- or miRNA-Seq datasets are from the extracellular vesicles (including exosomes) isolated from the different labs. However, different sized vesicles are derived from different intracellular processes, that affects the numbers and biomolecule contents. Other technical factors (such as exosome isolation and quantification procedures) and biological factors (such as cancer type) can also impact the numbers and composition of exosomes [60,61,62]. Exosomes isolated from the different sources or by the different isolation methodologies introduce variations in the concentration, purity and size [28, 29]. Moreover, due to the limited knowledge of exosome specific molecular machineries of biogenesis and release, there are the challenges in confirming the biogenesis mechanisms of exosomes. Therefore, future efforts should definitely include standardization (number, volume, etc) and verification of the exosome or extracellular vesicle characteristics, prior to sequencing.

We finally confirm previous study results indicating, that housekeeping genes in cancer cell lines and tumor tissue cannot be transferred to the analysis of exosomes and vice versa without further validation [22, 31, 35, 36]. For this, we predicted the top ten candidate reference genes (ACTB, RPS27, RPS11, RPL13A, RPL41, RPS14, RPL41P1, RPS29, RPL10 and NACA), in cancer tissues with simultaneous high expression abundance (log2(TPM + 1)) and stable expression (“CV” and “M”) (Suppl. Fig. 6A), and compared their ranks with those of the top ten predicted reference genes in exosomes and vice versa (Suppl. Fig. 6B). The top ten genes with the high expression stability in tissues as indicated by low “CV” and “M” values were ranked the 120th to 600th in exosomes (Suppl. Fig. 6B, C, D). The results clearly show that there is no overlap between candidate reference genes in exosomes and cancer cell lines/tissue if both, exosome and cancer cell line datasets are not considered together in the initial step of candidate reference prediction (Suppl. Fig. 6B). Although derived from tissue, exosomes deliver specific cargo of protein, miRNA, and small molecules, which is heterogenous between exosome and tissue. The use of tissue-specific reference genes causes quantitative inaccuracy due to the instability of them in exosome studies. Therefore, it is great practical necessity for use exosome-specific reference genes to enhance the quantity accuracy.

Conclusions

Our study provides reference genes and miRNAs with high abundance and expression stability across samples and cancer types for more accurate future qRT-PCR analyses of cancer exosomes. OAZ1 and hsa-miR-6835-3p were identified as the most reliable individual reference genes for mRNA and miRNA quantification, respectively. For superior accuracy, we recommend the use of a combination of reference genes - OAZ1/SERF2/MPP1 for mRNA and hsa-miR-6835-3p/hsa-miR-4468-3p for miRNA analyses. The use of the ideal reference genes is favorable to the accurate quantification of mRNA and miRNA expression fraction within the exosome.

Methods

Data collection

We manually curated RNA- and miRNA-Seq exosome data from the NCBI Gene Expression Omnibus (GEO; https://www.ncbi.nlm.nih.gov/geo/) [63]. The RNA-Seq datasets included 79 serum exosome samples from patients with pancreatic adenocarcinoma (PAAD), colorectal cancer (CRC), hepatocellular carcinoma (HCC) and healthy control individuals (HC) (Suppl. Table 2). The miRNA-Seq datasets included 72 serum exosome samples from patients with seven different cancer types (HCC, lung cancer (LCA), ovarian cancer (OVA), head and neck squamous cell carcinoma (HNSCC), neuroblastoma (NBL) and thyroid cancer (THCA)) as well as healthy control individuals (n = 31) (Suppl. Table 2). For comparison of expression patterns between exosomes and cancer tissues or cell lines, we furthermore analyzed RNA-Seq data from 28 datasets including three cancer types (PAAD, CRC, and HCC) with overall 1038 tissue and cancer cell line samples (Suppl. Tab. 2). To validate the reference genes identified by mRNA- and miRNA-Seq analyses, the serum samples were collected from 10 patients diagnosed with ovarian cancer stage I and II according to TNM staging without any treatment and 10 healthy volunteers were recruited in the study.

Processing of raw RNA- and miRNA-Seq data

Raw reads of the RNA-Seq data were trimmed by removing adaptors and low-quality bases using Trimmomatic (version 0.36) [64]. The clean reads were then mapped onto a human reference genome (GRCh38) using HASAT2 software (version 2.1.0) [65]. StringTie (version 1.3.0) [66] was applied to quantify the number of reads that were aligned to the regions of protein-coding RNAs (mRNAs) and annotations of mRNAs in the human genome were retrieved from GENCODE (v29) [67]. For the identification of reference miRNAs, sequencing data were additionally discarded of low-quality reads, adaptor dimers and sequences with lengths < 18 and > 35 nucleotides. The filtered reads were mapped to the human genome by bowtie (version 1.2.1.1) [68] and quantified by featureCounts (version 1.5.3) [69], miRNA annotations were retrieved from miRBase (v22.1) [70]. Expression levels were depicted as transcripts per million for mRNA (TPM) [71] and counts per million (CPM) for miRNA.

Strategy to identify candidate reference genes and miRNAs in cancer exosomes

For individual cancer subtypes, we selected genes or miRNAs as reference candidates if the respective expression level was greater than the average genome-wide expression. Therefore, genes or miRNAs were analyzed using DESeq2 (version 1.22.1) in serum exosomes of cancer patients and of healthy control individuals [72]. DESeq2 provides algorithms for determining differential expression within digital expression datasets using a negative binomial distribution model. In the present study, genes with a p-value > 0.1 were considered to be not differentially expressed.

To determine the expression stability of candidate reference genes and of miRNAs across samples and cancer types, we used two measures: 1) “CV” - the variation of a candidate reference gene or miRNA across all samples - and 2)” M” - the average expression stability value which represents the average pairwise variation between a reference gene and other reference genes across samples calculated by geNorm [54]. Lower values of “CV” or “M” indicate higher expression stability.

We assumed that there were m samples and n reference genes. For a given reference gene j in the ith sample, the gene expression level aij is the normalization expression value (TPM for mRNA, CPM for miRNA) of RNA−/miRNA-Seq data or of the transformed value of cycle threshold numbers (Ct) in qRT-PCR data (Eq. 1). Based on the expression levels of the reference gene j across m samples (Eq. 2), we defined the coefficient of variation “CV” as the ratio of the standard deviation to the mean (Eq. 3).

$$ \left(\forall i\in \left[1,m\right],\forall j\in \left[1,n\right]\right): $$
$$ {a}_{ij}={2}^{-{C}_{t_{ij}}} $$
(1)
$$ {A}_j=\left({a}_{1j},{a}_{2j},\dots {a}_{mj}\right)={\left({a}_{ij}\right)}_{i=1\to m} $$
(2)
$$ {CV}_j=\frac{st. dev\left({A}_j\right)}{mean\left({A}_j\right)} $$
(3)

The second expression stability measure - the average expression stability value “M” - was developed by Vandesompele J et al. in the tool geNorm [54]. For any two reference genes j and k, the logarithm of the expression ratio \( \raisebox{1ex}{${a}_{ij}$}\!\left/ \!\raisebox{-1ex}{${a}_{ik}$}\right. \) for the sample i (i = 1 → m) forms an array Ajk (Eq. 4). Based on the pairwise variation Vjk defined by the standard deviation of Ajk (Eq. 5), the average expression stability value Mj for the gene j is the arithmetic mean of all pairwise variation Vjk (Eq. 6). Usually, a gene with a M less than 1.5 is acceptable as a reference gene.

$$ \left(\forall j,k\in \left[1,n\right]\ and\ j\ne k\right): $$
$$ {A}_{jk}=\left\{{\log}_2\left(\frac{a_{1j}}{a_{1k}}\right),{\log}_2\left(\frac{a_{2j}}{a_{2k}}\right),\dots {\log}_2\left(\frac{a_{mj}}{a_{mk}}\right)\right\}={\left\{{\log}_2\left(\frac{a_{ij}}{a_{ik}}\right)\right\}}_{i=1\to m} $$
(4)
$$ {V}_{jk}= st. dev\left({A}_{jk}\right) $$
(5)
$$ {M}_j=\frac{\sum \limits_{k=1}^n{V}_{jk}}{n-1} $$
(6)

Furthermore, multiple genes were combined as candidate references and their stability was measured as the average gene-specific variation AV according to geNorm. For any three reference genes j, k and l, the average gene-specific variation AVj, k, l was calculated as the geometric mean of the three-gene stability value “M" (Eq. 7).

$$ \left(\forall j,k,l\in \left[1,n\right]\ and\ j\ne k\ne l\right): $$
$$ {AV}_{j,k,l}=\sqrt[3]{M_j\bullet {M}_k\bullet {M}_l} $$
(7)

Isolation of the exosome fraction and sample storage

To validate the reference genes identified by mRNA- and miRNA-Seq analyses, serum exosomes were obtained from ten ovarian cancer patients and ten healthy control individuals. Therefore, blood was drawn and kept at room temperature (15–25 °C) for 10 min to 1 h before further processing. The tubes were centrifuged for 10 min at 1900 x g (3,000 rpm) and 4 °C using a swinging bucket rotor. The upper serum phase was transferred into a new canonical tube without disturbing the cell pellet. Subsequently, sera were centrifuged for 15 min at 3000 x g and 4 °C in conical tubes, passed through a 0.8-μm filter and the supernatants were carefully removed without disturbing the pellet and transferred into new tubes. Samples were stored at 2–8 °C for immediate processing or kept frozen in aliquots at − 65 °C to − 90 °C for long-term storage [29].

Quantitative reverse transcription-PCR (qRT-PCR)

Total RNA (containing the mRNA and miRNA fractions) was isolated from 1 ml serum using the exoRNeasy Serum/Plasma Maxi Kit (Qiagen, Wetzlar, Germany). cDNA was generated with the PrimeScriptRT Master Mix (Perfect Real Time; Takara Bio, Kusatsu, Japan) according to the manufacturer’s protocol. Quantitative RT-PCR was performed with ChamQTM Universal SYBR qPCR Master Mix (Vazyme, Nanjing, China) in a final reaction volume of 10 μl using an Applied Biosystems QuantStudio 5 Real-Time PCR Instrument (Thermo Fisher Scientific, Rockford, USA). 18S rRNA and ce-miR-39-1 served as internal controls for mRNA and miRNA, respectively. Expression levels are depicted as cycle threshold (Ct) value of the candidate gene relative to the Ct value of the housekeeping gene. Data were analyzed with the QuantStudio TM Design & Analysis software. The primer sequences for B2M were 5′-TGTCTTTCAGCAAGGACTGGT-3′ and 5′-TGCTTACATGTCTCGATCCCAC-3′, for OAZ1 were 5′-GCCAAACGCATTAACTGGCG-3′ and 5′-TGTCCTCGCGGTTCTTGTG-3′, for ITM2B were 5′-TTGCCTCAGTCCTATCTGATTCA-3′ and 5′-TCTGCGTTGCAGTTTGTAAGT-3′, for SOD2 were 5′-GGAAGCCATCAAACGTGACTT-3′ and 5′-CCCGTTCCTTATTGAAACCAAGC-3′, for PCMTD1 were 5′-TGCATTTGTTGTTGGTAATTGCC-3′ and 5′-GTCCAGTTCGCATAATCTGTGT-3′, for ARF1 were 5′-ATGGGGAACATCTTCGCCAAC-3′ and 5′-GTGGTCACGATCTCACCCAG-3′, for MPP1 were 5′-GTCAGCTCCTAGCGAAGCC-3′ and 5′-GCCGAACGACTTCCTCGTAG-3′, for WIPF1 were 5′-AGCCGCTGCGCGATTTAT-3′ and 5′-TCCCAGCCTGCTCTGTCTTA-3′, for SERF2 were 5′-CCGCAAGCAGAGGGACTC-3′ and 5′-AGCACTACAGGAGGAAACGC-3′, for H3F3AP4 were 5′-CAGCTATCGGTGCTTTGCAG-3′ and 5′-AGCACGTTCTCCACGTATGC-3′, for 18 s were 5′-CTTCCACAGGAGGCCTACAC-3′ and 5′-CTTCGGCCCACACCCTTAAT-3′.

Statistical analysis

The Wilcoxon rank-sum test was used to compare expression stability between classical housekeeping and exosomal candidate reference genes as measured by “CV” and “M”. Candidate reference genes were sorted according to their “CV” and “M” values from low (higher expression stability across samples) to high (lower expression stability across samples) and assigned a rank, and the best candidate gene or miRNA for validation was determined as the one with the lowest sum of these two ranks. All statistical analyses were executed in R.

Availability of data and materials

The datasets analysed during the current study are available in the Gene Expression Omnibus (GEO), [https://www.ncbi.nlm.nih.gov/geo/]. The datasets information is included in the supplementary materials.

Abbreviations

EVs:

Extracellular vesicles

mRNA:

Messenger RNA

miRNA:

MicroRNA

RNA:

Ribonucleic acid

RNA-Seq:

RNA-sequencing

miRNA-Seq:

MicroRNA-sequencing

qRT-PCR:

Quantitative real-time polymerase chain reaction

TPM:

Transcripts per million

CPM:

Counts per million

Ct:

Cycle threshold numbers

CV:

Coefficient of variation

PAAD:

Pancreatic adenocarcinoma

CRC:

Colorectal cancer

HCC:

Hepatocellular carcinoma

HC:

Healthy control individuals

LCA:

Lung cancer

OVA:

Ovarian cancer

HNSCC:

Head and neck squamous cell carcinoma

NBL:

Neuroblastoma

THCA:

Thyroid cancer

B2M :

Beta-2-microglobulin

ARF1 :

ADP-ribosylation factor 1

H3F3AP4 :

H3 histone pseudogene 6

ITM2B :

Integral membrane protein 2B

MPP1 :

Membrane palmitoylated protein 1

OAZ1 :

Ornithine decarboxylase antizyme 1

PCMTD1 :

Protein-L-isoaspartate (D-aspartate) O-methyltransferase domain containing 1

SOD2 :

Superoxide dismutase 2

SERF2 :

Small EDRK-rich factor 2

WIPF1 :

WAS/WASL Interacting Protein Family Member 1

ACTB :

Beta-actin

RPL13A :

ribosomal protein L13A

YWHAZ :

Tyrosine 3-monooxygenase/tryptophane 5-monooxygenase activation protein zeta GAPDH

glyceraldehyde-3-phosphate dehydrogenase

VIM :

Vimentin

PPIA :

Peptidylprolyl isomerase A

ALDOA :

Aldolase A

UBC :

Ubiquitin C

FTL :

Ferritin light chain,

FYB1 :

FYN binding protein 1

ARPC2 :

Actin related protein 2/3 complex subunit 2

NCOA4 :

Nuclear receptor coactivator 4

HCLS1 :

Hematopoietic cell-specific Lyn substrate 1

TYROBP :

Transmembrane immune signaling adaptor TYROBP

RPL41 :

Ribosomal protein L41

SNCA :

Synuclein alpha

RPS9 :

Ribosomal protein S9

BTF3 :

Basic transcription factor 3

ADIPOR1 :

Adiponectin receptor 1

HEMGN :

Hemogen

CD74 :

CD74 molecule

DDX5 :

DEAD-box helicase 5

LSP :

Lymphocyte specific protein 1

RPL9 :

Ribosomal protein L9

References

  1. 1.

    Thery C, Zitvogel L, Amigorena S. Exosomes: composition, biogenesis and function. Nat Rev Immunol. 2002;2(8):569–79.

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  2. 2.

    Ruivo CF, Adem B, Silva M, Melo SA. The biology of cancer Exosomes: insights and new perspectives. Cancer Res. 2017;77(23):6480–8.

    CAS  Article  Google Scholar 

  3. 3.

    Valadi H, Ekstrom K, Bossios A, Sjostrand M, Lee JJ, Lotvall JO. Exosome-mediated transfer of mRNAs and microRNAs is a novel mechanism of genetic exchange between cells. Nat Cell Biol. 2007;9(6):654.

    CAS  Article  Google Scholar 

  4. 4.

    Yáñez-Mó M, Siljander PRM, Andreu Z, Bedina Zavec A, Borràs FE, Buzas EI, et al. Biological properties of extracellular vesicles and their physiological functions. J Extracellular Vesicles. 2015;4(1):27066.

    Article  Google Scholar 

  5. 5.

    Jeppesen DK, Fenix AM, Franklin JL, Higginbotham JN, Zhang Q, Zimmerman LJ, et al. Reassessment of exosome composition. Cell. 2019;177(2):428–445.e418.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  6. 6.

    Redis RS, Calin S, Yang Y, You MJ, Calin GA. Cell-to-cell miRNA transfer: from body homeostasis to therapy. Pharmacol Ther. 2012;136(2):169–74.

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  7. 7.

    Tkach M, Thery C. Communication by extracellular vesicles: where we are and where we need to go. Cell. 2016;164(6):1226–32.

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  8. 8.

    Kalluri R. The biology and function of exosomes in cancer. J Clin Invest. 2016;126(4):1208–15.

    PubMed  PubMed Central  Article  Google Scholar 

  9. 9.

    Zhao H, Yang L, Baddour J, Achreja A, Bernard V, Moss T, et al. Tumor microenvironment derived exosomes pleiotropically modulate cancer cell metabolism. Elife. 2016;5:e10250.

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  10. 10.

    Alipoor SD, Mortaz E, Varahram M, Movassaghi M, Kraneveld AD, Garssen J, et al. The potential biomarkers and immunological effects of tumor-derived Exosomes in lung cancer. Front Immunol. 2018;9:819.

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  11. 11.

    Maia J, Caja S, Strano Moraes MC, Couto N, Costa-Silva B. Exosome-based cell-cell communication in the tumor microenvironment. Front Cell Dev Biol. 2018;6:18.

    PubMed  PubMed Central  Article  Google Scholar 

  12. 12.

    Xu R, Rai A, Chen M, Suwakulsiri W, Greening DW, Simpson RJ. Extracellular vesicles in cancer — implications for future improvements in cancer care. Nat Rev Clin Oncol. 2018;15(Suppl):617–38.

    CAS  PubMed  Article  Google Scholar 

  13. 13.

    Salehi M, Sharifi M. Exosomal miRNAs as novel cancer biomarkers: challenges and opportunities. J Cell Physiol. 2018;233(9):6370–80.

    CAS  PubMed  Article  Google Scholar 

  14. 14.

    Vaidyanathan R, Soon RH, Zhang P, Jiang K, Lim CT. Cancer diagnosis: from tumor to liquid biopsy and beyond. Lab Chip. 2018;19(1):11–34.

    PubMed  PubMed Central  Google Scholar 

  15. 15.

    Bernard V, Kim DU, San Lucas FA, Castillo J, Allenson K, Mulu FC, et al. Circulating nucleic acids are associated with outcomes of patients with pancreatic cancer. Gastroenterology. 2019;156(1):108–18 e104.

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  16. 16.

    Kalishwaralal K, Kwon WY, Park KS. Exosomes for non-invasive cancer monitoring. Biotechnol J. 2019;14(1):e1800430.

    PubMed  Article  CAS  PubMed Central  Google Scholar 

  17. 17.

    Kalluri R, Le Bleu VS. The biology, function, and biomedical applications of exosomes. Science. 2020;367(6478):eaau6977.

  18. 18.

    Sheridan C. Exosome cancer diagnostic reaches market. Nat Biotechnol. 2016;34(4):359–60.

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  19. 19.

    McKiernan J, Donovan MJ, O’Neill V, Bentink S, Noerholm M, Belzer S, et al. A novel urine exosome gene expression assay to predict high-grade prostate cancer at initial biopsy. JAMA Oncol. 2016;2(7):882–9.

    PubMed  Article  PubMed Central  Google Scholar 

  20. 20.

    Hoshino A, Kim HS, Bojmar L, Gyan KE, Cioffi M, Hernandez J, et al. Extracellular vesicle and particle biomarkers define multiple human cancers. Cell. 2020;182(4):1044–61 e1018.

    CAS  PubMed  Article  Google Scholar 

  21. 21.

    VanGuilder HD, Vrana KE, Freeman WM. Twenty-five years of quantitative PCR for gene expression analysis. Biotechniques. 2008;44(5):619–26.

    CAS  Article  Google Scholar 

  22. 22.

    Huggett J, Dheda K, Bustin S, Zumla A. Real-time RT-PCR normalisation; strategies and considerations. Genes Immun. 2005;6(4):279–84.

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  23. 23.

    Derveaux S, Vandesompele J, Hellemans J. How to do successful gene expression analysis using real-time PCR. Methods. 2010;50(4):227–30.

    CAS  PubMed  Article  Google Scholar 

  24. 24.

    S AB. Absolute quantification of mRNA using real-time reverse transcription polymerase chain reaction assays. J Mol Endocrinol. 2000;2(25):169–93.

    Google Scholar 

  25. 25.

    Ruijter JM, Ramakers C, Hoogaars WMH, Karlen Y, Bakker O, van den Hoff MJB, et al. Amplification efficiency: linking baseline and bias in the analysis of quantitative PCR data. Nucleic Acids Res. 2009;37(6):e45.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  26. 26.

    Schmittgen TD, Zakrajsek BA. Effect of experimental treatment on housekeeping gene expression: validation by real-time, quantitative RT-PCR. J Biochem Biophys Methods. 2000;46(1–2):69–81.

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  27. 27.

    Ludwig N, Whiteside TL, Reichert TE. Challenges in exosome isolation and analysis in health and disease. Int J Mol Sci. 2019;20(19):4684.

    CAS  PubMed Central  Article  PubMed  Google Scholar 

  28. 28.

    Konoshenko MY, Lekchnov EA, Vlassov AV, Laktionov PP. Isolation of extracellular vesicles: general methodologies and latest trends. Biomed Res Int. 2018;2018:8545347.

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  29. 29.

    Tang YT, Huang YY, Zheng L, Qin SH, Xu XP, An TX, et al. Comparison of isolation methods of exosomes and exosomal RNA from cell culture medium and serum. Int J Mol Med. 2017;40(3):834–44.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  30. 30.

    Bustin SA, Benes V, Garson JA, Hellemans J, Huggett J, Kubista M, et al. The MIQE guidelines: minimum information for publication of quantitative real-time PCR experiments. Clin Chem. 2009;55(4):611–22.

    CAS  Article  Google Scholar 

  31. 31.

    Kozera B, Rapacz M. Reference genes in real-time PCR. J Appl Genet. 2013;54(4):391–406.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  32. 32.

    Eisenberg E, Levanon EY. Human housekeeping genes are compact. Trends Genet. 2003;19(7):362–5.

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  33. 33.

    Noerholm M, Balaj L, Limperg T, Salehi A, Zhu LD, Hochberg FH, et al. RNA expression patterns in serum microvesicles from patients with glioblastoma multiforme and controls. BMC Cancer. 2012;12(1):22.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  34. 34.

    Thind A, Wilson C. Exosomal miRNAs as cancer biomarkers and therapeutic targets. J Extracell Vesicles. 2016;5:31292.

    PubMed  Article  CAS  PubMed Central  Google Scholar 

  35. 35.

    Ohl F, Jung M, Xu C, Stephan C, Rabien A, Burkhardt M, et al. Gene expression studies in prostate cancer tissue: which reference gene should be selected for normalization? J Mol Med. 2005;83(12):1014–24.

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  36. 36.

    Goidin D, Mamessier A, Staquet M-J, Schmitt D, Berthier-Vergnes O. Ribosomal 18S RNA prevails over Glyceraldehyde-3-phosphate dehydrogenase and β-actin genes as internal standard for quantitative comparison of mRNA levels in invasive and noninvasive human melanoma cell subpopulations. Anal Biochem. 2001;295(1):17–21.

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  37. 37.

    Lamba V, Ghodke-Puranik Y, Guan W, Lamba JK. Identification of suitable reference genes for hepatic microRNA quantitation. BMC Res Notes. 2014;7(1):129.

    PubMed  PubMed Central  Article  Google Scholar 

  38. 38.

    Ramezani A, Devaney JM, Cohen S, Wing MR, Scott R, Knoblach S, et al. Circulating and urinary microRNA profile in focal segmental glomerulosclerosis: a pilot study. Eur J Clin Invest. 2015;45(4):394–404.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  39. 39.

    Solé C, Cortés-Hernández J, Felip ML, Vidal M, Ordi-Ros J. miR-29c in urinary exosomes as predictor of early renal fibrosis in lupus nephritis. Nephrol Dial Transplant. 2015;30(9):1488–96.

    PubMed  Article  CAS  PubMed Central  Google Scholar 

  40. 40.

    Zheng G, Wang H, Zhang X, Yang Y, Wang L, Du L, et al. Identification and validation of reference genes for qPCR detection of serum microRNAs in colorectal adenocarcinoma patients. PLoS One. 2013;8(12):e83025.

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  41. 41.

    Wang K, Yuan Y, Cho J-H, McClarty S, Baxter D, Galas DJ. Comparing the MicroRNA Spectrum between serum and plasma. PLoS One. 2012;7(7):e41561.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  42. 42.

    Benz F, Roderburg C, Vargas Cardenas D, Vucur M, Gautheron J, Koch A, et al. U6 is unsuitable for normalization of serum miRNA levels in patients with sepsis or liver fibrosis. Exp Mol Med. 2013;45(9):e42.

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  43. 43.

    Xiang M, Zeng Y, Yang R, Xu H, Chen Z, Zhong J, et al. U6 is not a suitable endogenous control for the quantification of circulating microRNAs. Biochem Biophys Res Commun. 2014;454(1):210–4.

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  44. 44.

    Lange T, Stracke S, Rettig R, Lendeckel U, Kuhn J, Schlüter R, et al. Identification of miR-16 as an endogenous reference gene for the normalization of urinary exosomal miRNA expression data from CKD patients. PLoS One. 2017;12(8):e0183435.

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  45. 45.

    Ragni E, Perucca Orfei C, De Luca P, Colombini A, Viganò M, Lugano G, et al. Identification of miRNA reference genes in extracellular vesicles from adipose derived Mesenchymal stem cells for studying osteoarthritis. Int J Mol Sci. 2019;20(5):1108.

    CAS  PubMed Central  Article  Google Scholar 

  46. 46.

    Francis J, Rea G, Stephanie N, Sheri N, André F, Hacker NF, et al. Careful selection of reference genes is required for reliable performance of RT-qPCR in human normal and cancer cell lines. PLoS One. 2013;8(3):e59180.

    Article  CAS  Google Scholar 

  47. 47.

    Mathieu M, Martin-Jaular L, Lavieu G, Thery C. Specificities of secretion and uptake of exosomes and other extracellular vesicles for cell-to-cell communication. Nat Cell Biol. 2019;21(1):9–17.

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  48. 48.

    Okoye Isobel S, Coomes Stephanie M, Pelly Victoria S, Czieso S, Papayannopoulos V, Tolmachova T, et al. MicroRNA-containing T-regulatory-cell-derived Exosomes suppress pathogenic T helper 1 cells. Immunity. 2014;41(1):89–103.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  49. 49.

    Skog J, Würdinger T, van Rijn S, Meijer DH, Gainche L, Curry WT, et al. Glioblastoma microvesicles transport RNA and proteins that promote tumour growth and provide diagnostic biomarkers. Nat Cell Biol. 2008;10(12):1470–6.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  50. 50.

    Yan W, Wu X, Zhou W, Fong MY, Cao M, Liu J, et al. Cancer-cell-secreted exosomal miR-105 promotes tumour growth through the MYC-dependent metabolic reprogramming of stromal cells. Nat Cell Biol. 2018;20(5):597–609.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  51. 51.

    Huang X, Yuan T, Tschannen M, Sun Z, Jacob H, Du M, et al. Characterization of human plasma-derived exosomal RNAs by deep sequencing. BMC Genomics. 2013;14(1):319.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  52. 52.

    Madadi S, Schwarzenbach H, Lorenzen J, Soleimani M. MicroRNA expression studies: challenge of selecting reliable reference controls for data normalization. Cell Mol Life Sci. 2019;76(18):3497–514.

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  53. 53.

    Geeurickx E, Hendrix A. Targets, pitfalls and reference materials for liquid biopsy tests in cancer diagnostics. Mol Aspects Med. 2020;72:100828. https://doi.org/10.1016/j.mam.2019.10.005.

    CAS  Article  PubMed  Google Scholar 

  54. 54.

    Vandesompele J, Preter KD, Pattyn F, Poppe B, Roy NV, Paepe AD, et al. Accurate normalization of real-time quantitative RT-PCR data by geometric averaging of multiple internal control genes. Genome Biol. 2002;3(7):research0034.0031.

    Article  Google Scholar 

  55. 55.

    Haller F, Kulle B, Schwager S, Gunawan B, Heydebreck AV, Sültmann H, et al. Equivalence test in quantitative reverse transcription polymerase chain reaction: confirmation of reference genes suitable for normalization. Anal Biochem. 2004;335(1):1–9.

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  56. 56.

    Andersen CL, Jensen JL, Orntoft TF. Normalization of real-time quantitative reverse transcription-PCR data: a model-based variance estimation approach to identify genes suited for normalization, applied to bladder and colon cancer data sets. Cancer Res. 2004;64(15):5245–50.

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  57. 57.

    Nicot N, Hausman J-F, Hoffmann L, Evers D. Housekeeping gene selection for real-time RT-PCR normalization in potato during biotic and abiotic stress. J Exp Bot. 2005;56(421):2907–14.

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  58. 58.

    Li Y, Xiang GM, Liu LL, Liu C, Liu F, Jiang DN, et al. Assessment of endogenous reference gene suitability for serum exosomal microRNA expression analysis in liver carcinoma resection studies. Mol Med Rep. 2015;12(3):4683–91.

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  59. 59.

    Song J, Bai Z, Han W, Zhang J, Meng H, Bi J, et al. Identification of suitable reference genes for qPCR analysis of serum microRNA in gastric cancer patients. Dig Dis Sci. 2012;57(4):897–904.

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  60. 60.

    Thery C, Witwer KW, Aikawa E, Alcaraz MJ, Anderson JD, Andriantsitohaina R, et al. Minimal information for studies of extracellular vesicles 2018 (MISEV2018): a position statement of the International Society for Extracellular Vesicles and update of the MISEV2014 guidelines. J Extracell Vesicles. 2018;7(1):1535750.

    PubMed  PubMed Central  Article  Google Scholar 

  61. 61.

    Royo F, Thery C, Falcon-Perez JM, Nieuwland R, Witwer KW. Methods for separation and characterization of extracellular vesicles: results of a worldwide survey performed by the ISEV rigor and standardization subcommittee. Cells. 2020;9(9):1955.

    CAS  PubMed Central  Article  PubMed  Google Scholar 

  62. 62.

    Gurunathan S, Kang MH, Jeyaraj M, Qasim M, Kim JH. Review of the isolation, characterization, biological function, and multifarious therapeutic approaches of Exosomes. Cells. 2019;8(4):307.

    CAS  PubMed Central  Article  PubMed  Google Scholar 

  63. 63.

    Tanya B, Troup DB, Wilhite SE, Pierre L, Carlos E, Kim IF, et al. NCBI GEO: archive for functional genomics data sets--10 years on. Nucleic Acids Res. 2013;39(Database issue):1005–10.

    Google Scholar 

  64. 64.

    Bolger AM, Marc L, Bjoern U. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30(15):2114–20.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  65. 65.

    Kim D, Langmead B, Salzberg SL. HISAT: a fast spliced aligner with low memory requirements. Nat Methods. 2015;12(4):357–60.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  66. 66.

    Pertea M, Pertea GM, Antonescu CM, Chang TC, Mendell JT, Salzberg SL. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat Biotechnol. 2015;33(3):290–5.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  67. 67.

    Harrow J, Frankish A, Gonzalez JM, Tapanari E, Diekhans M, Kokocinski F, et al. GENCODE: the reference human genome annotation for the ENCODE project. Genome Res. 2012;22(9):1760–74.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  68. 68.

    Langmead B, Pop M. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10(3):R25.

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  69. 69.

    Yang L, Smyth GK, Wei S. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics. 2014;30(7):923–30.

    Article  CAS  Google Scholar 

  70. 70.

    Ana K, Sam GJ. miRBase: integrating microRNA annotation and deep-sequencing data. Nucleic Acids Res. 2011;39(Database issue):D152.

    Google Scholar 

  71. 71.

    Li B, Ruotti V, Stewart RM, Thomson JA, Dewey CN. RNA-Seq gene expression estimation with read mapping uncertainty. Bioinformatics. 2010;26(4):493–500.

    PubMed  Article  CAS  PubMed Central  Google Scholar 

  72. 72.

    Anders S, Huber W. Differential expression analysis for sequence count data. Genome Biol. 2010;11(10):R106.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

Download references

Acknowledgements

We thank all individuals who participated in this study.

Funding

This work was supported by grants from the National Key Research and Development Program (2017YFC0908500 to HW), the National Natural Science Foundation of China (31771469 and 31571363 to HW, and 81772762 to XS), the Shanghai Natural Science Foundation of China (19ZR1461600 to AL) and a Mildred-Scheel post-doctoral fellowship of the German Cancer Aid Foundation (70111755 to JK). The funding bodies played no role in the design of the study and collection, analysis, and interpretation of data and in writing the manuscript.

Author information

Affiliations

Authors

Contributions

HW conceived the hypothesis. YD and HW designed and performed the data analysis. YC and SX performed the experimental validation. HW, YD, SX and JK interpreted the results and wrote the manuscript. AL provided helpful suggestions for the data analysis. The authors read and approved the final manuscript.

Corresponding authors

Correspondence to Shaohua Xu or Haiyun Wang.

Ethics declarations

Ethics approval and consent to participate

This study has been approved by the ethics committee of Shanghai First Maternity and Infant Hospital, Tongji University School of Medicine (Ethics code: KS1748). All of the participants signed an informed consent and the investigators obtained the written informed consent prior to entry into the study.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Figure S1.

Gene expression level and stability of candidate reference genes in PAAD, HCC and CRC exosomes predicted using RNA-Seq data A, Expression levels of ten candidate genes in PAAD (A), HCC (B) and CRC (C) exosomes sorted by their expression. Expression stability of ten candidate reference genes in exosomes of patients with PAAD (D), HCC (E) and CRC (F) measured by the “CV”. Expression stability of ten candidate reference genes in exosomes of PAAD (G), HCC (H) and CRC (I) patients as measured by the “M” indicator. Expression levels are given as the log2(TPM + 1) (TPM – transcripts per million). The TPMs of each candidate gene were used to determine “CV” and “M” indicators.

Additional file 2: Figure S2.

Evaluation of expression stability of combinations of three reference genes in pooled exosomes of cancer patients with PAAD, CRC and HCC (A, RNA-Seq data) and with ovarian cancer (B, qRT-PCR data) The expression stability of respective combinations was measured as the average gene-specific variation calculated with the geNorm algorithm based on transcripts per million (TPM) (A) or cycle threshold (Ct) values (B). Three combinations according to their expression stability ranking from Table 1 were evaluated: 1) genes 1–3 (OAZ1, SERF2, MPP1); 2) genes 4–6 (H3F3AP4, WIPF1, PCMTD1); and 3) genes 8–10 (SOD2, B2M, ITM2B)

Additional file 3: Figure S3.

Scatterplots of expression levels of six candidate reference miRNAs (red dots) in serum exosomes of patients with HCC (A), HNSCC (B), LCA (C), NBL (D), OVA (E) and THCA (F) compared to exosomes of healthy control individuals The expression values are depicted as: log2(CPM + 1) (CPM – counts per million). Grey dots indicate genome-wide miRNAs

Additional file 4: Figure S4.

Evaluation of expression stability of combinations of two reference miRNA candidates in pooled exosomes of cancer patients with different tumor types (A, miRNA-Seq data) or with ovarian cancer (B, qPCR data) The expression stability of miRNA combinations was measured as the average miRNA-specific variation, which was calculated by the geNorm algorithm based on counts per million (CPM) (A) or cycle threshold (Ct) values (B). Three combinations were considered according to the miRNA expression stability ranks shown in Table 2: miRNAs 1–2 (miR-4468 and miR-6835-3p), miRNAs 3–4 (miR-192-3p and miR-125a-5p), and miRNAs 5–6 (miR-4469 and miR-6731-5p)

Additional file 5: Figure S5.

Validation of candidate reference genes and miRNAs predicted in exosomes of ovarian cancer patients and healthy controls by NormFinder Expression stability of candidate reference genes as measured by the NormFinder stability value in exosomes of ovarian cancer patients (A) and healthy control individuals (B). Expression stability of six candidate reference miRNAs in exosomes of ovarian cancer patients (C) and healthy control individuals (D) as measured by the NormFinder stability value

Additional file 6: Figure S6.

Analysis of candidate reference genes predicted in cancer tissues Expression levels of the ten top candidates in pooled cancer tissue samples calculated using RNA-Seq data. Expression levels are given as log2(TPM + 1; TPM = transcript per million) (A). The top ten predicted candidate reference genes for exosomes were compared with the respective ranking in cancer tissues and vice versa (B). Expression stability of the ten top candidate reference genes in tumor tissues as measured by “CV” (C) and “M” indicators (D)

Additional file 7: Table S1.

List of candidate reference genes (n = 10) and miRNAs (n = 6) identified by RNA-Seq and quantitative real-time PCR analyses.

Additional file 8: Table S2.

Detailed information for the RNA-Seq datasets in the GEO database.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Dai, Y., Cao, Y., Köhler, J. et al. Unbiased RNA-Seq-driven identification and validation of reference genes for quantitative RT-PCR analyses of pooled cancer exosomes. BMC Genomics 22, 27 (2021). https://doi.org/10.1186/s12864-020-07318-y

Download citation

Keywords

  • Reference gene
  • qRT-PCR
  • Cancer exosome
  • RNA-Seq
  • miRNA-Seq