A qualitative transcriptional signature to reclassify histological grade of ER-positive breast cancer patients

Li, Jing; Jiang, Wenbin; Liang, Qirui; Liu, Guanghao; Dai, Yupeng; Zheng, Hailong; Yang, Jing; Cai, Hao; Zheng, Guo

doi:10.1186/s12864-020-6659-0

Research article
Open access
Published: 06 April 2020

A qualitative transcriptional signature to reclassify histological grade of ER-positive breast cancer patients

Jing Li¹^na1,
Wenbin Jiang¹^na1,
Qirui Liang¹,
Guanghao Liu¹,
Yupeng Dai¹,
Hailong Zheng¹,
Jing Yang¹,
Hao Cai² &
…
Guo Zheng¹

BMC Genomics volume 21, Article number: 283 (2020) Cite this article

1750 Accesses
2 Citations
Metrics details

Abstract

Background

Histological grade (HG) is commonly adopted as a prognostic factor for ER-positive breast cancer patients. However, HG evaluation methods, such as the pathological Nottingham grading system, are highly subjective with only 50–85% inter-observer agreements. Specifically, the subjectivity in the pathological assignment of the intermediate grade (HG2) breast cancers, comprising of about half of breast cancer cases, results in uncertain disease outcomes prediction. Here, we developed a qualitative transcriptional signature, based on within-sample relative expression orderings (REOs) of gene pairs, to define HG1 and HG3 and reclassify pathologically-determined HG2 (denoted as pHG2) breast cancer patients.

Results

From the gene pairs with significantly stable REOs in pathologically-determined HG1 (denoted as pHG1) samples and reversely stable REOs in pathologically-determined HG3 (denoted as pHG3) samples, concordantly identified from seven datasets, we extracted a signature which could determine the HG state of samples through evaluating whether the within-sample REOs match with the patterns of the pHG1 REOs or pHG3 REOs. A sample was classified into the HG3 group if at least a half of the REOs of the 10 gene pairs signature within this sample voted for HG3; otherwise, HG1. Using four datasets including samples of early stage (I–II) ER-positive breast cancer patients who accepted surgery only, we validated that this signature was able to reclassify pHG2 patients into HG1 and HG3 groups with significantly different survival time. For the original pHG1 and pHG3 patients, the signature could also more accurately and objectively stratify them into distinct prognostic groups. And the up-regulated and down down-regulated genes in HG1 compared with HG3 involved in cell proliferation and extracellular signal transduction pathways respectively. By comparing with existing signatures, 10-GPS was with prognostic significance and was more aligned with survival of patients especially for pHG2 samples.

Conclusions

The transcriptional qualitative signature can provide an objective assessment of HG states of ER-positive breast cancer patients, especially for reclassifying patients with pHG2, to assist decision making on clinical therapy.

Background

Breast cancer has the highest incidence and mortality among females [1]. The microscopic morphological assessment of the degree of tumor cell differentiation, represented as tumor histological grades (HGs), has powerful prognostic prediction capability in breast cancer [2,3,4,5] and has been incorporated into the eighth edition of American Joint Commission of Cancer staging system [6]. According to the Nottingham grading system, after assessing tubule formation (tubularity), nuclear pleomorphism (nuclearity) and mitotic count, each patient can be assigned to histologic grade 1 (HG1, well-differentiated, slow-growing tumor), histologic grade 2 (HG2, moderately differentiated, slightly faster growing tumor) or histologic grade 3 (HG3, poorly differentiated, highly proliferative tumor) [5]. The higher grade is associated with lower survival rate [3, 4, 7]: the 5-year survival rates of untreated HG1, HG2 and HG3 patients are 95, 75 and 50%, respectively [5, 8, 9]. Considering the excellent prognoses, HG1 patients are amenable for a mild and less harmful anti-cancer therapy. On the contrary, HG3 patients require a more powerful anti-cancer therapy. Genomics analysis indicates that HG1 and HG3 breast carcinomas develop independently along different genetic pathways [10, 11], while HG2 patients (comprising ~ 50% of breast cancer cases) contain a blend of histological features, some of which are common to both HG1 and HG3 tumors, and exhibit a mixed gene expression profiles of HG1 and HG3 [12, 13]. Thus, HG2 breast carcinomas should not be classified as individual HG, but represent clinical and molecular hybrids between HG1 and HG3 diseases [14, 15]. The heterogeneity of the HG2 breast cancers resulted in uncertain disease outcome prediction and there is no standard treatment protocol for clinical decision making [7, 16].

However, the pathological Nottingham grading system, the most employed HG evaluation method, is dependent on adequately prepared hematoxylin-eosin-stained tumor tissue sections to be assessed by an appropriately trained pathologist, which is highly subjective with only 50–85% inter-observer agreements [17,18,19,20]. The consensus was even lower for HG2 samples (comprising ~ 50% of breast cancer cases) [15, 21, 22], and it primarily resulted in the unappealing inter-observer agreements among pathologists during evaluating HG. Therefore, many studies have tried to identify transcriptional signatures to reclassify pathologically-determined HG states especially HG2 (pHG2) status of patients in order to improve the therapeutic planning for breast cancer patients [13, 16, 23,24,25]. However, most of the previously proposed signatures for classifying samples were based on summarized expression measurements of the signature genes, which lack robustness for clinical applications due to widespread batch effects and quality uncertainties of clinical samples [26,27,28,29]. The data normalization of samples collected in advance also hinders the feasibility of these signatures in routine clinical practice [16]. In contrast, the qualitative transcriptional signatures based on the within-sample relative expression orderings (REOs) of gene pairs are robust against experimental batch effects [27,28,29], varied proportions of the tumor epithelial cell in tumor tissues sampled from different tumor locations of the same patient [30], partial RNA degradation during specimen preparation and storage [31] and amplification bias for minimum specimens with about 15–25 cancer cells [32], which are common factors that can lead to failures of quantitative transcriptional signatures in clinical applications. Besides, the qualitative signatures can be applied at the individual level [33]. Based on the within-sample relative expression orderings (REOs) of gene pairs, we have developed prognostic signatures for many cancer types [28, 34,35,36] and demonstrated their robustness in both inter-laboratories and across-platforms tests [29, 37].

Approximately 70% breast cancer patient express estrogen receptor (ER) according to American Cancer Society [1], adjuvant endocrine therapy is the routine regimen, and only for ER-positive patients with high HG, combined chemotherapy is suggested. It has been reported that there is a significant transcriptional difference between ER-positive and ER-negative cohorts [38]. ER-positive status is associated with a heterogeneous mixture of histologic grades, whereas ER-negative status is generally associated with HG3 [39].

In this study, we aimed to develop a qualitative transcriptional signature to identify HG states objectively in ER-positive breast cancers. Using gene expression profiles of 932 ER-positive early stage breast cancer patients, we developed a qualitative signature to allocate each patient into the pathologically-determined HG1 (denoted as pHG1) or HG3 (denoted as pHG3) group. Using four independent validation datasets including a total 524 samples of ER-positive breast cancer patients who accepted surgery only, the signature could find out a certain percentage of pHG1 patients as HG3 patients with worse prognoses and some pHG3 patients identified as HG1 patients with better prognoses. Especially, we adopted an objective approach to validate the signature through evaluating whether the pHG2 patients reclassified as HG1 had better prognoses than the pHG2 patients reclassified as HG3.

Results

Development of the REO-based grade signature

The flowchart of this study was described in Fig. 1. For each of the seven training datasets (Table 1), with FDR < 0.1, we firstly identified gene pairs with significantly stable REOs in the pHG1 and pHG3 groups, respectively, and then identified gene pairs with reversal REOs between the two groups. Then, 437 gene pairs were commonly identified from the seven datasets and they consistently showed the same reversal REO patterns between the pHG1 and pHG3 groups in the seven datasets. Next, we performed a forward-stepwise selection procedure to search a set of gene pairs that achieved the highest F-score according to the classification rule as follows: a sample was classified into the HG3 group if at least a half of the REOs of the set of gene pairs within the sample voted for HG3; otherwise, into the HG1 group. Finally, we obtained 10 gene pairs, denoted as 10-GPS (Table 2), to distinguish different histological grades with the highest F-score (0.8884). In the training data, the apparent specificity for HG1 samples was 90.77% and the apparent sensitivity for HG3 samples was 86.99%. The performance of the transcriptional grade signature in each training dataset can be found in Additional file 1: Table S1. Notably, the apparently imperfect performance should be reasonable because HG evaluation based on the pathological Nottingham grading system is highly subjective with only 50–85% inter-observer agreements [17,18,19,20]. We speculated that the 10-GPS could provide a more objective and clinically relevant measure of tumor grade with prognostic significance.

Table 1 Description of the datasets used in this study

Full size table

Table 2 The REO-based transcriptomic grade signature

Full size table

We validated the above speculation based on the knowledge that HG3 patients were with lower survival rate than HG1 patients [3, 7]. Here, we collected another four independent datasets (Table 1) including samples with RFS or OS data of early stage ER-positive breast cancer patients who accepted surgery only. When the 10-GPS was applied to these datasets, the averaged apparent sensitivity for all HG3 samples was 83.1% and the average apparent specificity for all HG1 samples was 78.4%. In a merged dataset from the three validation datasets with the RFS information, the 12 pHG3 patients reclassified as HG1 by the signature had significantly higher 10-years RFS rate than that of the 46 HG3 patients confirmed by the signature (p = 0.0143; HR = 8.17, 95% CI: 1.10–60.60; C-index = 0.61, Fig. 2a). And, we also compared 10-years RFS rates between the 13 pHG1 patients reclassified as HG3 by the 10-GPS and the 93 HG1 patients confirmed by the 10-GPS. Despite no statistical difference was, there was trend to be different between the two groups (p = 0.3583; HR = 1.65, 95% CI: 0.56–4.86; C-index = 0.54, Fig. 2b). There was also trend of difference between the RFS rate of 12 pHG3 patients reclassified as HG1 by the signature and that of 13 pHG1 patients reclassified as HG3 by the 10-GPS (p = 0.1847; HR = 3.95, 95% CI: 0.44–35.35; C-index = 0.65, Fig. 2c).

In each of the three validation datasets with the RFS information, we also compared the survival between the pHG1 and pHG3 patients diagnosed by the pathological Nottingham grading system and the survival between the HG1 and HG3 patients reclassified by the 10-GPS from the pHG1 and pHG3 patients. Significant difference of RFS between the pHG1 and pHG3 patients was observed only in GSE6532 and GSE4922 dataset (Additional file 2: Fig. S1). However, the HG1 patients showed significantly better RFS than those of HG3 patients in all the three datasets (Fig. 3). No significant difference of prognosis was observed between pHG1 and HG1 cohorts for each of the four validation datasets. Similar for pHG3 and HG3 cohorts for each of the four datasets (Additional file 3 Fig. S2). The majority histological grade labels of pHG1 and pHG3 sample were correct, which resulting no significant difference of survival mentioned above. These results demonstrated that the 10-GPS can more accurately and objectively stratify samples into distinct prognostic groups.

Application of the signature to reclassification of HG2 samples

Then, we used the 10-GPS to reclassify the pHG2 samples of the above four validation datasets with RFS or OS information (Table 1) into the HG1 and HG3 groups, respectively, and evaluated their prognostic differences.

Firstly, for the 68 pHG2 samples of the GSE7390 dataset, the 10-GPS signature allocated 38 and 30 patients into the HG1 and HG3 groups, respectively. And, the former ones had significantly higher RFS rate than the latter ones (p = 5.55E-03; HR = 2.53, 95% CI: 1.28–4.97; C-index = 0.64; Fig. 4a). Then, in the 91 pHG2 patients combined from the datasets of GSE6532 and GSE4922 with small sample sizes, the RFS rate of the 65 patients stratified into the HG1 group was significantly higher than that of the 26 patients stratified into the HG3 group (p = 9.06E-03; HR = 2.64, 95% CI: 1.24–5.62; C-index = 0.61; Fig. 4b). In the EGA dataset, the 71 HG1 patients classified by the 10-GPS also displayed significant higher OS rate than that of the 49 HG3 patients classified by the 10-GPS (p = 6.92E-03; HR = 2.10, 95% CI: 1.21–3.64; C-index = 0.61; Fig. 4c).

Transcriptional characteristics of the low-HG and high-HG samples recognized by the 10-GPS

In the TCGA-BRCA dataset, we used the limma algorithm and found 6194 differentially expressed genes (DEGs) between the 58 pHG1 and 170 pHG3 samples diagnosed by the pathological Nottingham grading system (FDR < 0.05, Additional file 4: Table S2). Applying the 10-GPS to these samples, 94 samples were allocated into the HG1 group and the other 134 samples were allocated into the HG3 group. We identified 8087 DEGs between the two reclassified groups with the same FDR control (Additional file 4: Table S3). And of these genes, up-regulated genes were significantly associated with proliferation and down-regulated were significantly associated with extracellular signal transduction (Fig. 5). When comparing the two DEG lists, we found that 5472 (88.34%) of the 6194 DEGs between the original HG1-HG3 groups were also included in the DEGs identified after sample reclassification and the dysregulation directions of the overlapped genes reached up to 100% (binomial test, p < 1.10E–16). We also identified 3126 DEGs between 145 HG1 (denoted as LHG2) samples and 71 HG3 (denoted as HHG2) samples recognized from the pHG2 samples with the aid of 10-GPS (Additional file 4: Table S4). About 31.14% of the 2519 DEGs were also included in the 8087 DEGs. The concordance score of the 1164 overlapped DEGs was 99.92%, which was unlikely to happen by chance (binomial test, p < 1.10E–16). Moreover, after reclassifying HG status of samples for TCGA dataset, we identified differential expressed genes identified from pHG1 and pHG3 samples, from HG1 and HG3 samples reclassified from pHG2 samples, and HG1 and HG3 samples reclassified from overall samples respectively (Fig. 6). In all of the three two-way clustering heatmaps in Fig. 6, each of the two reclassified histological grade sample subclasses (two child nodes under the root node in sample clustering tree) contained both HG1 and HG3 samples although HG1 (or HG3) samples were in the majority. This may be caused by the difference between quantitative and qualitative expression relationship essentially. After identified enriched pathway lists by differential expressed genes for all the four datasets, the most common pathways were associated with proliferation and extracellular signal transduction (Additional file 5: Fig. S3) The clearer transcriptional differences between the two reclassified groups indicated that the 10-GPS could more accurately and objectively stratify samples into distinct histological grade groups.

Comparison of 10-GPS prognostic significance with GGI, Oncotype DX and PAM50

For each of the four validation datasets, we obtained relapse risk by using existing signatures such as Gene expression Grade Index (GGI) [41], PAM50 [42] and Oncotype DX [43]. Chi-square test were conducted, revealing that there were significant differences in relapse risks generated by almost all the three existing signatures between HG1 and HG3 cohorts (Fig. 7, Additional file 1: Table S5). This indicating that histological grade status reclassified by 10-GPS were with prognostic significance. Meanwhile, no prognostic differences between low-risk group identified by GGI and HG1 group identified by 10-GPS were observed (Fig. 8a-d). It was similar for high-risk group identified by GGI and HG3 group identified by 10-GPS were observed (Fig. 8e-h). Significant prognostic difference between HG1 samples who were identified as low-risk by GGI and HG1 samples who were identified as high-risk by GGI was observed only in GSE4922, which might result from unbalanced samples (Additional file 6 Fig. S4 a-d). No significant prognostic difference between HG3 samples who were identified as low-risk by GGI and HG3 samples who were identified as high-risk by GGI was observed (Additional file 6 Fig. S4 e-h). It showed that the prognostic significance of 10-GPS was more aligned with the survival of patients. Moreover, HG1 samples who were identified as high-risk and HG3 samples who were identified as low-risk were mainly from pHG2 cohort. The consistency between the result and prior knowledge that pHG2 are with low inter-observer agreements was reasonable. It implied that 10-GPS was feasible for reclassifying histological grade status, especially for pHG2 samples. All these comparisons indicated that 10-GPS could effectively reclassify into distinct histological grade groups with significantly prognostic difference.

Discussion

In this study, we developed a histological grade signature consisting of 10 gene pairs (10-GPS) to reclassify the ER-positive breast cancer patients to distinct prognostic groups. This transcriptional qualitative signature, which is based on REOs in an individual sample, was highly robust against experimental batch effects, varied proportions of the tumor epithelial cell in tumor tissues [30], RNA degradation [31], and amplification bias for minimum specimens [32]. All of these merits make it possible to apply the 10-GPS into clinical practices. The 10-GPS could not only objectively and accurately allocate HG1 and HG3 patients but also reclassify HG2 patients into two groups with significantly different survival rates. For clinical application, the patients allocated into the HG3 group should receive adjuvant chemotherapy followed by endocrine therapy; and the patients allocated into the HG1 group were recommended the endocrine therapy only.

Fortunately, based on the working assumption that the majority labels of the pHG1 and pHG3 samples were right, thus we employed a supervised learning method to develop the signature. Imperfect F-score of 0.8884 just suggested that the 10-GPS did not over-fit the training dataset. There’s no surprise that the apparent sensitivity for all HG3 samples was 83.10% and the apparent specificity for all HG1 samples was 78.40% in the validation datasets. In this study, we adopted a more objective approach to validate the signature through evaluating whether the reclassified HG1 patients could have better prognosis than that of HG3 patients. In four independent validation datasets, the reclassified HG1 and HG3 groups recognized by the 10-GPS from the original HG2 patients or from the original HG1 and HG3 patients had significantly different survival.

We expected that the 10-GPS can replace or serve as auxiliary reference of the pathological Nottingham grading system to stratify ER-positive patients into two distinct groups in clinical practices. When applying the 10-GPS to all 132 samples of the GSE7390 dataset, the grade signature classified 66 patients into the HG1 group and 66 patients into the HG3 group. The RFS rate of the former group was significantly higher than that of the latter group (p = 7.74E-04; HR = 2.49, 95% CI: 1.44–4.33; C-index = 0.64; Additional file 7: Fig. S5 a). Significantly different survival time between HG1 and HG3 groups reclassified by the 10-GPS were also observed in another three independent validation datasets (Additional file 7: Fig. S5 2b-d). By comparing with existing signatures, 10-GPS was with prognostic significance and was more aligned with survival of patients especially for pHG2 samples.

Some signatures had been developed to re-classify the HG status of BC samples, for example, during the development of GGI, samples with ≥100 ng and a RIN ≥ 7 were considered as qualified, and the quantitative threshold has been adopted during the re-classification process which might be affected by constituent ratios of samples [41], which results lacking of reproducibility for datasets generated by different labs or platforms and limitation of individual application. Meanwhile, the qualitative signature 10-GPS is with wider application to trace samples [32], samples with lower RIN [31] and samples with low tumor-purity [30].

A limitation of this study is that we were unable to directly evaluate the signature in RNA-sequencing, such as those archived in the TCGA database, where no patients accepted surgery only and sizes of samples who accepted same therapy regimen were too small to perform the validation. Here, we only indirectly validate the 10-GPS in RNA-seq data of the TCGA-BRCA dataset through analysis of DEGs between the HG1 and HG3 groups. Another limitation of our study is no evidence could prove that application of the signature to patients who have undergo chemotherapy or radiation therapy is feasible (Additional file 8 Fig. S6). In the future, we will evaluate the performance of the signature developed in this study for expression data produced by RNA-sequencing or PCR platforms and evaluate applying the signature to patients who have undergo chemotherapy or radiation therapy.

Conclusions

Pathological histological grade evaluation methods are with high subjectivity, especially for the evaluation of HG2 breast cancer specimens. The transcriptional qualitative signature is objective for the evaluation and is robust for application of microscale samples, samples with lower RIN and samples with low tumor-purity, and can assist making on clinical therapy especially for patients with pHG2.

Methods

Data collection and pre-processing

We collected gene expression profiles of 932 ER-positive breast cancer samples with pHG1 or pHG3 diagnosed by the pathological Nottingham grading system. To evaluate whether the reclassified two groups have significantly different survival time, we also collected independent expression data of 524 early stage (I–II) ER-positive breast cancer patients who accepted surgery only. All the breast cancer datasets used in this study were summarized in Table 1. The overall pathological histologic grades of TCGA samples were obtained from the study of Zheng Ping et al. [44].

For Affymetrix array data, raw intensity files (.cel), downloaded from the Gene Expression Omnibus database (GEO, https://www.ncbi.nlm.nih.gov/geo), were processed with the Robust Multichip Average algorithm (RMA) algorithm for background adjustment without quantile normalization [45]. For Illumina beadchip data, the normalized expression data under accession number EGAD00010000210 and EGAD00010000211 [40] were downloaded from the European Genome-Phenome Archive (http://www.ebi.ac.uk/ega/). When processing the data of the two platforms, each probe set ID was mapped to Gene ID according the corresponding annotation files, and then probe sets that mapped to multiple Gene IDs or did not map to any Gene ID were removed. The expression measurements of all probe sets corresponding to the same Gene ID were averaged to obtain a single measurement (on the log2 scale). For RNA-Seq data downloaded from The Cancer Genome Atlas database (TCGA, https://www.cancer.gov/tcga), the level 3 Fragments per Kilobase of transcript per Million mapped reads (FPKM) [46] values were downloaded from The Cancer Genome Atlas (TCGA) database. After removing genes with a count of 0 in more than 75% of samples, other zero values were filled with the smallest count in this expression data. The Ensembl gene IDs corresponding to the unique Entrez gene IDs were used. From the seven training datasets, we extracted expression profiles of 11,587 genes commonly measured by the three platforms (Affymetrix array, Illumina beadchip and Illumina HiSeq 2000) for subsequent analysis.

Development of the transcriptional signature for histological grade

Firstly, we identified the significantly stable REOs in pHG1 groups of each training dataset. For a given gene pair (G_i, G_j), if the REO pattern (G_i > G_j or G_i < G_j) was kept in more samples than expected by random chance, we defined the REO pattern of this gene pair, G_i > G_j in pHG1 group (or equally G_i < G_j in pHG3 group) as a stable REO characterizing pHG1 samples. The significance of the REO pattern is determined by a binomial test [47] as follows,

$$ P=1-\sum \limits_{i=0}^{s-1}\left(\begin{array}{c}n\\ {}i\end{array}\right){\left({p}_0\right)}^i{\left(1-{p}_0\right)}^{n-1} $$

(1)

where s is the number of samples in which gene i has a higher (or lower) expression level than gene j in a total of n samples, p₀ is the probability of observing a certain REO pattern (G_i > G_j or G_i < G_j) in a sample by chance (p₀ = 0.5). The Benjamini-Hochberg multiple testing correction was used to estimate the false discovery rate (FDR) [48]. Then, we identified the gene pairs with stable REOs in pHG3 group but reversal REO patterns between the pHG1 and pHG3 groups in each training dataset.

After selecting gene pairs with concordant reversal REOs among the seven training datasets, a forward-stepwise selection algorithm was performed to search for optimal subset of these gene pairs that resulted in the highest F-score. The F-score, harmonic mean of sensitivity and specificity, was calculated as follows,

$$ \mathrm{F}-\mathrm{score}=\frac{2\times \mathrm{sensitivity}\times \mathrm{specificity}}{\mathrm{sensitivity}+\mathrm{specificity}} $$

(2)

where sensitivity was defined as the proportion of correctly identified HG3 samples among all pHG3 samples, and specificity was defined as the proportion of correctly identified HG1 samples among all pHG1 samples.

Survival analysis

Recurrence-free survival (RFS) and overall survival (OS) served as the prognosis endpoint. Kaplan-Meier survival plots and log-rank tests [49] were used to evaluate the differences in RFS and OS of distinct groups. The Cox proportional-hazards model was also performed to calculate the hazard ratios (HRs) and their 95% confidence intervals (CIs) [50]. To evaluate the predictive performance of a signature we also adopted the concordance index (C-index), which is a measure of overall concordance between predicted risk scores and observed survival [51, 52].

Differential expression and functional enrichment analysis

After using limma package in R, the expression values of all tumor samples of TCGA-BRCA dataset were log-transformed by voom and the batch effects such as plate was corrected by removeBatchEffect. Then differential expressed genes were identified. The Fisher [53] was used to determine the significance of biological pathways enriched with a set of interested genes by hypergeometric distribution test.

Availability of data and materials

Previous data analyzed in this study should be requested from the authors of the original publications. Please see methods cohort description (Table 1), for references to these publications.

The Affymetrix array data (GSE19615, GSE21653, GSE1456, GSE3494, GSE7390, GSE6532 and GSE4922) that support the findings of this study were downloaded from Gene Expression Omnibus database (GEO, https://www.ncbi.nlm.nih.gov/geo). The Illumina beadchip datasets (EGAD00010000210 and EGAD00010000211) that support the findings of this study were downloaded from the European Genome-Phenome Archive (http://www.ebi.ac.uk/ega/) after permitted by the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) for non-commercial use. The RNA-Seq data (TCGA) that support the findings of this study were downloaded from The Cancer Genome Atlas database (TCGA, https://www.cancer.gov/tcga).

Abbreviations

HG:: histological grade
pHG:: pathologically-determined histological grade
C-index:: concordance index
CIs:: confidence intervals
HRs:: hazard ratios
FDR:: false discovery rate
REOs:: relative expression orderings
RMA:: Robust Multi-Array Average

References

Bray F, Ferlay J, Soerjomataram I, Siegel RL, Torre LA, Jemal A. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2018;68(6):394–424.
Article PubMed Google Scholar
Buerger H, Mommers EC, Littmann R, Simon R, Diallo R, Poremba C, Dockhorn-Dworniczak B, van Diest PJ, Boecker W. Ductal invasive G2 and G3 carcinomas of the breast are the end stages of at least two different lines of genetic evolution. J Pathol. 2001;194(2):165–70.
Article CAS PubMed Google Scholar
Rakha EA, El-Sayed ME, Menon S, Green AR, Lee AH, Ellis IO. Histologic grading is an independent prognostic factor in invasive lobular carcinoma of the breast. Breast Cancer Res Treat. 2008;111(1):121–7.
Article PubMed Google Scholar
Rakha EA, El-Sayed ME, Lee AH, Elston CW, Grainge MJ, Hodi Z, Blamey RW, Ellis IO. Prognostic significance of Nottingham histologic grade in invasive breast carcinoma. J Clin Oncol. 2008;26(19):3153–8.
Article PubMed Google Scholar
Elston CW, Ellis IO. Pathological prognostic factors in breast cancer. I. the value of histological grade in breast cancer: experience from a large study with long-term follow-up. Histopathology. 1991;19(5):403–10.
Article CAS PubMed Google Scholar
Giuliano AE, Connolly JL, Edge SB, Mittendorf EA, Rugo HS, Solin LJ, Weaver DL, Winchester DJ, Hortobagyi GN. Breast Cancer-major changes in the American joint committee on Cancer eighth edition cancer staging manual. CA Cancer J Clin. 2017;67(4):290–303.
Article PubMed Google Scholar
Rakha EA, Reis-Filho JS, Baehner F, Dabbs DJ, Decker T, Eusebi V, Fox SB, Ichihara S, Jacquemier J, Lakhani SR, et al. Breast cancer prognostic classification in the molecular era: the role of histological grade. Breast Cancer Res. 2010;12(4):207.
Article PubMed PubMed Central Google Scholar
Elston CW, Ellis IO: Pathological prognostic factors in breast cancer. I. the value of histological grade in breast cancer: experience from a large study with long-term follow-up. Histopathology 2002;41(3A):151–152, discussion 152-153.
Elston CW, Ellis IO. Pathological prognostic factors in breast cancer. I. the value of histological grade in breast cancer: experience from a large study with long-term follow-up. Histopathology. 2002;41(3A):154–61.
Article CAS PubMed Google Scholar
Roylance R, Gorman P, Harris W, Liebmann R, Barnes D, Hanby A, Sheer D. Comparative genomic hybridization of breast tumors stratified by histological grade reveals new insights into the biological progression of breast cancer. Cancer Res. 1999;59(7):1433–6.
CAS PubMed Google Scholar
Buerger H, Otterbach F, Simon R, Schafer KL, Poremba C, Diallo R, Brinkschmidt C, Dockhorn-Dworniczak B, Boecker W. Different genetic pathways in the evolution of invasive breast cancer are associated with distinct morphological subtypes. J Pathol. 1999;189(4):521–6.
Article CAS PubMed Google Scholar
Ma XJ, Salunga R, Tuggle JT, Gaudet J, Enright E, McQuary P, Payette T, Pistone M, Stecker K, Zhang BM, et al. Gene expression profiles of human breast cancer progression. Proc Natl Acad Sci U S A. 2003;100(10):5974–9.
Article CAS PubMed PubMed Central Google Scholar
Ivshina AV, George J, Senko O, Mow B, Putti TC, Smeds J, Lindahl T, Pawitan Y, Hall P, Nordgren H, et al. Genetic reclassification of histologic grade delineates new clinical subtypes of breast cancer. Cancer Res. 2006;66(21):10292–301.
Article CAS PubMed Google Scholar
Ellsworth RE, Hooke JA, Love B, Kane JL, Patney HL, Ellsworth DL, Shriver CD. Correlation of levels and patterns of genomic instability with histological grading of invasive breast tumors. Breast Cancer Res Treat. 2008;107(2):259–65.
Article PubMed Google Scholar
Fanshawe TR, Lynch AG, Ellis IO, Green AR, Hanka R. Assessing agreement between multiple raters with missing rating information, applied to breast cancer tumour grading. PLoS One. 2008;3(8):e2925.
Article PubMed PubMed Central CAS Google Scholar
Sotiriou C, Wirapati P, Loi S, Harris A, Fox S, Smeds J, Nordgren H, Farmer P, Praz V, Haibe-Kains B, et al. Gene expression profiling in breast cancer: understanding the molecular basis of histologic grade to improve prognosis. J Natl Cancer Inst. 2006;98(4):262–72.
Article CAS PubMed Google Scholar
Harvey JM, de Klerk NH, Sterrett GF. Histological grading in breast cancer: interobserver agreement, and relation to other prognostic factors including ploidy. Pathology. 1992;24(2):63–8.
Article CAS PubMed Google Scholar
Robbins P, Pinder S, de Klerk N, Dawkins H, Harvey J, Sterrett G, Ellis I, Elston C. Histological grading of breast carcinomas: a study of interobserver agreement. Hum Pathol. 1995;26(8):873–9.
Article CAS PubMed Google Scholar
Frierson HF Jr, Wolber RA, Berean KW, Franquemont DW, Gaffey MJ, Boyd JC, Wilbur DC. Interobserver reproducibility of the Nottingham modification of the bloom and Richardson histologic grading scheme for infiltrating ductal carcinoma. Am J Clin Pathol. 1995;103(2):195–8.
Article PubMed Google Scholar
Dalton LW, Pinder SE, Elston CE, Ellis IO, Page DL, Dupont WD, Blamey RW. Histologic grading of breast cancer: linkage of patient outcome with level of pathologist agreement. Mod Pathol. 2000;13(7):730–5.
Article CAS PubMed Google Scholar
Italian Network for Quality Assurance of Tumour Biomarkers G. Quality control for histological grading in breast cancer: an Italian experience. Pathologica. 2005;97(1):1–6.
Google Scholar
Ellis IO, Coleman D, Wells C, Kodikara S, Paish EM, Moss S, Al-Sam S, Anderson N, Bobrow L, Buley I, et al. Impact of a national external quality assessment scheme for breast pathology in the UK. J Clin Pathol. 2006;59(2):138–45.
Article CAS PubMed PubMed Central Google Scholar
Cava C, Bertoli G, Ripamonti M, Mauri G, Zoppis I, Della Rosa PA, Gilardi MC, Castiglioni I. Integration of mRNA expression profile, copy number alterations, and microRNA expression levels in breast cancer to improve grade definition. PLoS One. 2014;9(5):e97681.
Article PubMed PubMed Central CAS Google Scholar
Aswad L, Yenamandra SP, Ow GS, Grinchuk O, Ivshina AV, Kuznetsov VA. Genome and transcriptome delineation of two major oncogenic pathways governing invasive ductal breast cancer development. Oncotarget. 2015;6(34):36652–74.
Article PubMed PubMed Central Google Scholar
Wang M, Klevebring D, Lindberg J, Czene K, Gronberg H, Rantalainen M. Determining breast cancer histological grade from RNA-sequencing data. Breast Cancer Res. 2016;18(1):48.
Article PubMed PubMed Central CAS Google Scholar
Leek JT, Scharpf RB, Bravo HC, Simcha D, Langmead B, Johnson WE, Geman D, Baggerly K, Irizarry RA. Tackling the widespread and critical impact of batch effects in high-throughput data. Nat Rev Genet. 2010;11(10):733–9.
Article CAS PubMed Google Scholar
Patil P, Bachant-Winner PO, Haibe-Kains B, Leek JT. Test set bias affects reproducibility of gene signatures. Bioinformatics. 2015;31(14):2318–23.
Article CAS PubMed PubMed Central Google Scholar
Qi L, Chen L, Li Y, Qin Y, Pan R, Zhao W, Gu Y, Wang H, Wang R, Chen X, et al. Critical limitations of prognostic signatures based on risk scores summarized from gene expression levels: a case study for resected stage I non-small-cell lung cancer. Brief Bioinform. 2016;17(2):233–42.
Article PubMed Google Scholar
Guan Q, Chen R, Yan H, Cai H, Guo Y, Li M, Li X, Tong M, Ao L, Li H, et al. Differential expression analysis for individual cancer samples based on robust within-sample relative gene expression orderings across multiple profiling platforms. Oncotarget. 2016;7(42):68909–20.
Article PubMed PubMed Central Google Scholar
Cheng J, Guo Y, Gao Q, Li H, Yan H, Li M, Cai H, Zheng W, Li X, Jiang W, et al. Circumvent the uncertainty in the applications of transcriptional signatures to tumor tissues sampled from different tumor sites. Oncotarget. 2017;8(18):30265–75.
PubMed PubMed Central Google Scholar
Chen R, Guan Q, Cheng J, He J, Liu H, Cai H, Hong G, Zhang J, Li N, Ao L, et al. Robust transcriptional tumor signatures applicable to both formalin-fixed paraffin-embedded and fresh-frozen samples. Oncotarget. 2017;8(4):6652–62.
PubMed Google Scholar
Liu H, Li Y, He J, Guan Q, Chen R, Yan H, Zheng W, Song K, Cai H, Guo Y, et al. Robust transcriptional signatures for low-input RNA samples based on relative expression orderings. BMC Genomics. 2017;18(1):913.
Article PubMed PubMed Central CAS Google Scholar
Eddy JA, Sung J, Geman D, Price ND. Relative expression analysis for molecular cancer diagnosis and prognosis. Technol Cancer Res Treat. 2010;9(2):149–59.
Article CAS PubMed Google Scholar
Cai H, Li X, Li J, Ao L, Yan H, Tong M, Guan Q, Li M, Guo Z. Tamoxifen therapy benefit predictive signature coupled with prognostic signature of post-operative recurrent risk for early stage ER+ breast cancer. Oncotarget. 2015;6(42):44593–608.
Article PubMed PubMed Central Google Scholar
Li X, Cai H, Zheng W, Tong M, Li H, Ao L, Li J, Hong G, Li M, Guan Q, et al. An individualized prognostic signature for gastric cancer patients treated with 5-fluorouracil-based chemotherapy and distinct multi-omics characteristics of prognostic groups. Oncotarget. 2016;7(8):8743–55.
PubMed PubMed Central Google Scholar
Ao L, Song X, Li X, Tong M, Guo Y, Li J, Li H, Cai H, Li M, Guan Q, et al. An individualized prognostic signature and multiomics distinction for early stage hepatocellular carcinoma patients with surgical resection. Oncotarget. 2016;7(17):24097–110.
Article PubMed PubMed Central Google Scholar
Qi L, Li T, Shi G, Wang J, Li X, Zhang S, Chen L, Qin Y, Gu Y, Zhao W, et al. An individualized gene expression signature for prediction of lung adenocarcinoma metastases. Mol Oncol. 2017;11(11):1630–45.
Article CAS PubMed PubMed Central Google Scholar
Zhou X, Shi T, Li B, Zhang Y, Shen X, Li H, Hong G, Liu C, Guo Z. Genes dysregulated to different extent or oppositely in estrogen receptor-positive and estrogen receptor-negative breast cancers. PLoS One. 2013;8(7):e70017.
Article CAS PubMed PubMed Central Google Scholar
Chen ST, Lai HW, Tseng HS, Chen LS, Kuo SJ, Chen DR. Correlation of histologic grade with other clinicopathological parameters, intrinsic subtype, and patients' clinical outcome in Taiwanese women. Jpn J Clin Oncol. 2011;41(12):1327–35.
Article PubMed Google Scholar
Curtis C, Shah SP, Chin SF, Turashvili G, Rueda OM, Dunning MJ, Speed D, Lynch AG, Samarajiwa S, Yuan Y, et al. The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature. 2012;486(7403):346–52.
Article CAS PubMed PubMed Central Google Scholar
Metzger-Filho O, Catteau A, Michiels S, Buyse M, Ignatiadis M, Saini KS, de Azambuja E, Fasolo V, Naji S, Canon JL, et al. Genomic grade index (GGI): feasibility in routine practice and impact on treatment decisions in early breast cancer. PLoS One. 2013;8(8):e66848.
Article CAS PubMed PubMed Central Google Scholar
Parker JS, Mullins M, Cheang MC, Leung S, Voduc D, Vickery T, Davies S, Fauron C, He X, Hu Z, et al. Supervised risk predictor of breast cancer based on intrinsic subtypes. J Clin Oncol. 2009;27(8):1160–7.
Article PubMed PubMed Central Google Scholar
Paik S, Shak S, Tang G, Kim C, Baker J, Cronin M, Baehner FL, Walker MG, Watson D, Park T, et al. A multigene assay to predict recurrence of tamoxifen-treated, node-negative breast cancer. N Engl J Med. 2004;351(27):2817–26.
Article CAS PubMed Google Scholar
Ping Z, Xia Y, Shen T, Parekh V, Siegal GP, Eltoum IE, He J, Chen D, Deng M, Xi R, et al. A microscopic landscape of the invasive breast cancer genome. Sci Rep. 2016;6:27545.
Article CAS PubMed PubMed Central Google Scholar
Irizarry RA, Hobbs B, Collin F, Beazer-Barclay YD, Antonellis KJ, Scherf U, Speed TP. Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics. 2003;4(2):249–64.
Article PubMed Google Scholar
Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, van Baren MJ, Salzberg SL, Wold BJ, Pachter L. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol. 2010;28(5):511–5.
Article CAS PubMed PubMed Central Google Scholar
Bahn AK. Application of binomial distribution to medicine: comparison of one sample proportion to an expected proportion (for small samples). Evaluation of a new treatment. Evaluation of a risk factor. J Am Med Womens Assoc. 1969;24(12):957–66.
CAS PubMed Google Scholar
Benjamini Y, Hochberg Y. Controlling the false discovery rate—a practical and powerful approach to multiple testing. J R Stat Soc. 1995;57(1):289–300.
Google Scholar
T S, E S. A class of rank test procedures for censored survival data. Biometrika. 1982;69(3):553–66.
Article Google Scholar
Boschini C, Andersen KK, Scheike TH. Excess risk estimation for matched cohort survival data. Stat Methods Med Res. 2018. https://doi.org/10.1177/0962280218804269.
Harrell FE Jr, Lee KL, Califf RM, Pryor DB, Rosati RA. Regression modelling strategies for improved prognostic prediction. Stat Med. 1984;3(2):143–52.
Article PubMed Google Scholar
Pencina MJ, D'Agostino RB Sr, Song L. Quantifying discrimination of Framingham risk functions with different survival C statistics. Stat Med. 2012;31(15):1543–53.
Article PubMed Google Scholar
Wang J, Zhou X, Zhu J, Gu Y, Zhao W, Zou J, Guo Z. GO-function: deriving biologically relevant functions from statistically significant functions. Brief Bioinform. 2012;13(2):216–27.
Article PubMed Google Scholar

Download references

Acknowledgments

The authors would like to acknowledge the resources at GEO and TCGA that facilitated this research and thank the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) for providing the data sets used in this study.

Funding

This study was funded by the National Natural Science Foundation of China (Grant No. 61602119, 81872396, 81903186 and 81602738) and the Joint Technology Innovation Fund of Fujian Province (Grant No. 2017Y9109 and 2016Y9044).

Author information

Jing Li and Wenbin Jiang contributed equally to this work.

Authors and Affiliations

Fujian Key Laboratory of Medical Bioinformatics, Department of Bioinformatics, The School of Basic Medical Sciences, Fujian Medical University, Fuzhou, 350122, China
Jing Li, Wenbin Jiang, Qirui Liang, Guanghao Liu, Yupeng Dai, Hailong Zheng, Jing Yang & Guo Zheng
Medical Big Data and Bioinformatics Research Center, First Affiliated Hospital of Gannan Medical University, Ganzhou, Jiangxi, China
Hao Cai

Authors

Jing Li
View author publications
You can also search for this author in PubMed Google Scholar
Wenbin Jiang
View author publications
You can also search for this author in PubMed Google Scholar
Qirui Liang
View author publications
You can also search for this author in PubMed Google Scholar
Guanghao Liu
View author publications
You can also search for this author in PubMed Google Scholar
Yupeng Dai
View author publications
You can also search for this author in PubMed Google Scholar
Hailong Zheng
View author publications
You can also search for this author in PubMed Google Scholar
Jing Yang
View author publications
You can also search for this author in PubMed Google Scholar
Hao Cai
View author publications
You can also search for this author in PubMed Google Scholar
Guo Zheng
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

ZG, HC and JL conceived the project. JL, WBJ, QRL, HLZ and YPD performed computational experiments. JL, QRL and WBJ designed data analyses. JL, GHL, HLZ, and JY interpreted data. JL, HC and ZG wrote the manuscript. All authors contributed to the preparation of the manuscript. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Hao Cai or Guo Zheng.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Additional file 1: Table S1.

The performance of the transcriptional grade signature in each training dataset, shown with apparent specificity, sensitivity and F-score. Table S5. Comparison of prognostic risks between 10-GPS and PAM50 (or Oncotype DX, GGI).

Additional file 2: Fig. S1.

Kaplan–Meier estimates of survival. (a-c) Relapse-free survival curves for pHG1 and pHG3 patients in dataset GSE7390, GSE6532 and GSE4922. (d) Overall survival curves for pHG1 and pHG3 patients reclassified from all breast cancer patients in dataset EGA.

Additional file 3: Fig. S2.

Kaplan–Meier estimates of survival. (a-c) Relapse-free survival curves for pHG1 and HG1 patients reclassified from all breast cancer patients in dataset GSE7390, GSE6532 and GSE4922 (d) Overall survival curves for pHG1 and HG1 patients reclassified from all breast cancer patients in dataset EGA. (e-g) Relapse-free survival curves for pHG3 and HG3 patients reclassified from all breast cancer patients in dataset GSE7390, GSE6532 and GSE4922 (h) Overall survival curves for pHG3 and HG3 patients reclassified from all breast cancer patients in dataset EGA.

Additional file 4: Table S2–4

Differential expressed gene identified in TCGA dataset. Table S2. Differential expressed gene identified in pHG1 and pHG3 samples. Table S3. Differential expressed gene identified in HG1 and HG3 samples reclassified from pHG1 and pHG3 samples. Table S4. Differential expressed gene identified in HG1 and HG3 samples reclassified from all samples.

Additional file 5: Fig. S3.

Reproducibility of pathways enriched by differential expressed genes identified in TCGA dataset. The number of times each pathway enriched in the four validation datasets was ranged from 0 to 4. The more positive the number, the deeper the red color of horizontal bar.

Additional file 6: Fig. S4.

Kaplan–Meier estimates of survival (a-c) Relapse-free survival curves for low-risk and high -risk group identified by GGI in HG1 group identified by 10-GPS in GSE7390, GSE6532, GSE4922 (d) Overall survival curves for low-risk and high -risk group identified by GGI in HG1 group identified by 10-GPS in METABRIC (e-g) Relapse-free survival curves for low-risk and high -risk group identified by GGI in HG3 group identified by 10-GPS in GSE7390, GSE6532, GSE4922 (h) Overall survival curves for low-risk and high -risk group identified by GGI in HG3 group identified by 10-GPS in METABRIC.

Additional file 7: Fig. S5.

Kaplan–Meier estimates of survival. (a) Relapse-free survival curves for HG1 and HG3 patients reclassified from all breast cancer patients in dataset GSE7390. (b) Relapse-free survival curves for HG1 and HG3 patients reclassified from all breast cancer patients in dataset GSE6532. (c) Relapse-free survival curves for HG1 and HG3 patients reclassified from all breast cancer patients in dataset GSE4922. (d) Overall survival curves for HG1 and HG3 patients reclassified from all breast cancer patients in dataset EGA.

Additional file 8: Fig. S6.

Kaplan–Meier estimates of survival. Relapse-free survival curves for HG1 and HG3 patients reclassified from all breast cancer patients in dataset GSE16391.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Cite this article

Li, J., Jiang, W., Liang, Q. et al. A qualitative transcriptional signature to reclassify histological grade of ER-positive breast cancer patients. BMC Genomics 21, 283 (2020). https://doi.org/10.1186/s12864-020-6659-0

Download citation

Received: 10 June 2019
Accepted: 09 March 2020
Published: 06 April 2020
DOI: https://doi.org/10.1186/s12864-020-6659-0

A qualitative transcriptional signature to reclassify histological grade of ER-positive breast cancer patients

Abstract

Background

Results

Conclusions

Background

Results

Development of the REO-based grade signature

Application of the signature to reclassification of HG2 samples

Transcriptional characteristics of the low-HG and high-HG samples recognized by the 10-GPS

Comparison of 10-GPS prognostic significance with GGI, Oncotype DX and PAM50

Discussion

Conclusions

Methods

Data collection and pre-processing

Development of the transcriptional signature for histological grade

Survival analysis

Differential expression and functional enrichment analysis

Availability of data and materials

Abbreviations

References

Acknowledgments

Funding

Author information

Authors and Affiliations

Contributions

Corresponding authors

Ethics declarations

Ethics approval and consent to participate

Consent for publication

Competing interests

Additional information

Publisher’s Note

Supplementary information

Rights and permissions

About this article

Cite this article

Share this article

Keywords

BMC Genomics

Contact us