
Bioinformatic screening of human ESTs for differentially expressed genes in normal and tumor tissues



Owing to the explosion of information generated by human genomics, analysis of publicly available databases can help identify potential candidate genes relevant to the cancerous phenotype. The aim of this study was to scan for such genes by whole-genome in silico subtraction using Expressed Sequence Tag (EST) data.


Genes differentially expressed in normal versus tumor tissues were identified using a computer-based differential display strategy. Bcl-xL, an anti-apoptotic member of the Bcl-2 family, was selected for confirmation by western blot analysis.


Our genome-wide expression analysis identified a set of genes whose differential expression may be attributed to the genetic alterations associated with tumor formation and malignant growth. We propose complete lists of genes that may serve as targets for projects seeking novel candidates for cancer diagnosis and therapy. Our validation result showed increased protein levels of Bcl-xL in two different liver cancer specimens compared to normal liver. Notably, our EST-based data mining procedure indicated that most of the changes in gene expression observed in cancer cells corresponded to gene inactivation patterns. Chromosomes and chromosomal regions most frequently associated with aberrant expression changes in cancer libraries were also determined.


Through the description of several candidates (including genes encoding extracellular matrix and ribosomal components, cytoskeletal proteins, apoptotic regulators, and novel tissue-specific biomarkers), our study illustrates the utility of in silico transcriptomics to identify tumor cell signatures, tumor-related genes and chromosomal regions frequently associated with aberrant expression in cancer.


Large-scale transcriptome analysis of genes that are differentially expressed in tumor tissues compared to their normal counterparts is an important route to the identification of candidates that could play a role in human malignancies. A number of techniques, ranging from differential display and nucleic acid subtraction to serial analysis of gene expression, expression microarrays and gene chips, have been used for the discovery of such aberrantly expressed cancer-related genes [1]. The well-established differential screening technology, which allows the simultaneous comparison of multiple gene expression levels between two samples differing in tissue type and pathological state, has been the most extensively applied. This simple and powerful method can be performed either experimentally or, since late 1999, digitally using expression databases. The computer-based differential display methodology, also referred to as 'in silico subtraction' or 'electronic northern' [2–7], can identify transcripts preferentially expressed or repressed in the tumor context by comparing cancerous libraries (present in publicly available databases) against the remaining libraries. Strikingly, only a few attempts have been made to apply in silico transcriptomics to genome-wide and multi-tissue screening of cancer genes [8–10]. Thus, given the continuous expansion of the EST databases, both in terms of sequence and source diversity, updated and independent transcriptomic analyses are continually needed.

In this study, we mined EST libraries for genes differentially expressed in normal and tumor tissues by using a novel computational approach, with the assumption that both the up- and down-regulated pools might contain genes involved in tumorigenesis. This strategy identified differential expression profiles and cancer candidate genes which may be useful in future cancer research. Higher expression of the anti-apoptotic protein Bcl-xL in liver cancer specimens compared to normal liver was confirmed by immunoblot analysis. Strikingly, we found that most cancer-associated changes in gene expression corresponded to genes that were actually downregulated or repressed. The chromosomes and chromosomal regions most frequently associated with aberrant expression changes in tumor versus normal cells were also determined. This analysis suggests that, although genes differentially expressed in cancerous libraries are distributed throughout the genome, chromosomal 'hot spots' of candidate genes could be identified.


Identification of differentially expressed genes between normal and cancer tissues

Genes differentially expressed in tumor libraries compared to their normal counterparts are likely to play important roles in cancer etiology or could constitute relevant genetic markers for cancer diagnosis. Here, we have performed in silico differential display to identify novel and known cancer-associated genes by comparing all the libraries representing tumors to the corresponding normal libraries for each tissue type. Details about the data mining procedures are presented in Table 1. In order to compare expression levels between the normal and tumor states, we compared EST counts from non-normalized, non-subtracted cDNA libraries. To control the false positive rate, we applied the highly conservative Bonferroni correction. Using this procedure, a total of 673 genes showed differential expression in tumor versus normal libraries by a factor of 10 or higher (Additional File 1: 'Upregulated candidates complete list', and Additional File 2: 'Downregulated candidates complete list'), with about one third being up-regulated (299) and the remaining being down-regulated (539). The in silico subtraction also resulted in the identification of 181 and 336 genes predicted to be present or absent in the tumor types compared to normal tissues, respectively. Because these EST clusters were identified either in normal or in tumor libraries only, it was not possible to derive their expression ratio, so we present them in separate tables (Additional File 3: 'Tumor specific candidates complete list', and Additional File 4: 'Normal specific candidates complete list'). However, these two groups of genes were merged with the 'up-regulated' and 'down-regulated' pools in the subsequent analyses. In total, 112 novel transcripts were also found (i.e. sequences for which no description was available at the time of the study).
Notably, 14.5% (154/1060) of the genes identified by in silico subtraction were previously studied genes involved in oncogenesis, based on a list of ~2500 genes compiled as previously described [11]. Since the fraction of such reference genes in our initial data set was 7.5% (2401/31800), our data mining protocol led, as expected, to a significant enrichment in cancer genes (p-value = 2.2 × 10^-16; Fisher exact test). These previously characterized and well-studied genes include the p57KIP2 and p19INK4d cyclin inhibitors, and the ras-GAP, c-fos, ret and myc oncogenes. Last, in order to independently verify the validity of the EST-based tissue profiles, SAGE data were used to give an indication of the tissue distribution of our transcripts in normal tissues. SAGE results specified by tissue type converged with the analysis of ESTs for 65.4% (197/301), 53.9% (91/169), and 53.2% (93/171) of the examined hits in the 'down', 'up' and 'normal-specific' groups respectively, i.e. precisely in the classes where expression in the normal condition was expected, whereas this percentage decreased to 37.7% (46/122) for the 'tumor-specific' group of transcripts.
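The enrichment figures quoted above can be re-derived with a short calculation. The sketch below is illustrative only: it assumes that the 1060 candidates are a subset of the 31800 genes of the initial data set and that enrichment is assessed with a one-sided hypergeometric test (equivalent to a one-sided Fisher exact test on the corresponding 2 × 2 table). How the original contingency table was built is not stated in the text, and the function name is hypothetical.

```python
from math import comb

def hypergeom_upper_tail(k_obs, n_draws, n_success, n_total):
    """P(X >= k_obs) when n_draws genes are drawn without replacement from
    n_total genes, n_success of which are reference cancer genes."""
    denom = comb(n_total, n_draws)
    k_max = min(n_draws, n_success)
    numer = sum(
        comb(n_success, k) * comb(n_total - n_success, n_draws - k)
        for k in range(k_obs, k_max + 1)
    )
    # Exact big-integer arithmetic; only the final ratio is a float.
    return numer / denom

# Counts quoted in the text.
candidates, candidates_ref = 1060, 154
background, background_ref = 31800, 2401

frac_candidates = candidates_ref / candidates
frac_background = background_ref / background
# ASSUMPTION: candidates are a subset of the background gene set.
p_enrich = hypergeom_upper_tail(candidates_ref, candidates,
                                background_ref, background)

print(f"candidates: {frac_candidates:.2%}, background: {frac_background:.2%}")
print(f"one-sided enrichment p-value: {p_enrich:.2e}")
```

Exact integer arithmetic is used here so that no statistics library is required; a library routine for the Fisher exact test would serve equally well.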

Table 1 Overview of the EST-based data mining strategy. Screening for differentially expressed genes between normal and cancer tissues. EST counts in each analytical step. Total number of EST clusters in each class (upregulated, downregulated, tumor-specific or absent in tumors) was determined after Bonferroni corrected exact Fisher test.

Cancer candidate gene analysis

The first general observation from our results is that the same gene could be either up-regulated or repressed depending on the tumor cell type, allowing identification of tissue-specific gene expression profiles in tumor versus normal cells. For instance, among the set of candidates with differential expression in cancer, we observed a massive down-regulation of several collagen alpha chain genes (but not beta chain genes) in various tumor tissues, including decreased expression of collagen alpha 2(I) (also termed col1A2) in skin, placenta, testis, eye and bone (see Figure 1). Interestingly, col1A2 has been reported as a tumor suppressor gene that could inhibit ras-induced oncogenic transformation [12, 13]. Apart from collagens, other types of proteins that could serve as biomarkers include cytokeratins (CK). CK are particularly interesting epithelium-specific intermediate filaments because their degradation gives rise to soluble fragments that are measurable in the blood of patients and can be used for cancer monitoring [14]. Our results show that a total of 13 CK genes were differentially expressed between normal and malignant cells in 9 different tissues (Figure 1), allowing tissue-specific expression profiling (e.g. specific expression of CK 5, 13 and 16 in brain tumors). Additionally, in line with previous microarray data [15], we found that hair-specific type II keratin was overexpressed in breast tumors compared to normal breast. We further determined that, of the 190 genes which displayed aberrant expression in more than one tissue, 131 were "deregulated" in the same direction (either up or down, Figure 2 and Additional File 5: 'Consistent candidates in multiple tissues'). Included in this list of 'consistent' candidates are 13 transcripts encoding different ribosomal components, in accordance with the increasing body of evidence from the literature that correlates changes in the protein synthesis machinery with cancer [16–18].
Specific signatures for ribosomal genes could be determined, e.g. downregulation of the genes encoding 60S ribosomal L37, L38 and L44 in libraries prepared from tumor skin and tumor blood, whereas placental cancer libraries appear to be specifically enriched in transcripts encoding 40S ribosomal S2, S3 and S17. As depicted in Figure 3 (for the full data set, see Additional File 6: 'Tissue specific candidates'), some genes display a tissue-specific pattern of differential expression in tumor types, thus making them candidates for specific diagnostic markers. Among these 114 genes differentially expressed in only one tissue are 14-3-3 sigma in brain tumors and Bnip3L in blood. The latter gene, which belongs to the Bcl-2 family of apoptotic regulators, has been described as a potential tumor suppressor [19, 20]. Last, it is worth noting that a novel member of the methyltransferase enzyme family (ENST00000270172), a family that includes established transcriptional repressors [21, 22], was found to be specifically overexpressed in placental tumors.

Figure 1

Patterns of differential expression for collagen and cytokeratin genes in multiple normal and tumor tissues. The data are shown in a table format, in which rows represent individual genes and columns represent individual tissues. The color in each cell reflects the differential expression level of the corresponding gene in a particular tissue. A four-color code was used to represent gene induction and repression in cancer libraries (dark green: 'normal-specific', i.e. not expressed in tumor libraries; light green: downregulated in tumor libraries; orange: upregulated in tumor libraries; red: 'tumor-specific'). If there was no significant change in gene expression between normal and tumor libraries, or in case of missing/excluded data, the cell is shown in black. The number inside the colored cells indicates the statistical significance (p-value < 0.01 after Bonferroni correction). See additional information for the full data.

Figure 2

Genes whose transcripts varied significantly and consistently in abundance in at least two different tissues. Thirty genes were selected in each class of differential expression (upregulated, downregulated, tumor-specific or absent in tumors). The results are shown for twelve tissues. The legend is the same as in Figure 1. See additional information for the full data.

Figure 3

Genes whose transcripts exhibited tissue-specific differential expression in normal versus tumor libraries. This figure is a compilation of genes that appear to be differentially expressed in only one of the 15 studied tissues. The results are shown for fourteen tissues. The color code is the same as in Figure 1. See additional information for the full data.

Taken together, these results suggest that EST data could be successfully mined to provide digital profiles of differential gene expression at the full genome level between normal and cancerous tissues. Our lists of transcriptional signatures might help to select candidate markers in cancer genetics or potential targets for therapy.

Increased expression of Bcl-xL in liver tumors

The Bcl-2 family member Bcl-xL (Bcl-2-like protein 1, long isoform) was selected for confirmation by immunoblotting because of its plausible biological role in cancer susceptibility. Moreover, both EST and SAGE results indicated that Bcl-xL was poorly expressed in normal liver, while abundant in other tissues (both normal and cancerous, data not shown), suggesting that this apoptotic regulator could constitute a good marker for liver cancer progression. As depicted in Figure 4, western blot analysis confirmed overexpression of Bcl-xL in a subset of human liver cancer specimens (hepatocellular carcinoma and adenocarcinoma, but not cholangiocellular carcinoma) compared to normal liver (and placenta).

Figure 4

Western blot analysis of Bcl-xL expression in human normal and tumoral liver. Lane 1: hepatocellular carcinoma (male, age 65); lane 2: adenocarcinoma (male, age 52); lane 3: cholangiocellular carcinoma (male, age 46); lane 4: normal liver (male, age 24); lane 5: normal placenta (female, age 24). Bcl-xL immunoreactivity (26 kD) was observed in two out of three liver cancer samples (lanes 1–3). Normal samples (lanes 4–5) showed no signal for Bcl-xL expression. Note that GAPDH protein levels varied between normal tissues and liver cancer specimens and did not correlate with the mRNA levels predicted by the computer-based screen. The expression of tubulin was used as a control for equal protein loading.

Identification of chromosome locations of differential gene expression in cancer

We next sought to analyze the chromosomal distribution of the genes which were over-expressed or repressed in tumor tissues. To this end, we mapped the previously identified genes showing significant differential expression between normal and tumor tissues along human chromosomes according to their banding, in order to build cancer-oriented transcriptome maps. To avoid possible biases due to chromosome length (e.g. chromosome Y as an obvious case) or different chromosomal gene densities, we computed the percentage of candidate genes against the total number of genes present on a particular chromosome or banding (see Table 2).
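The normalization described above amounts to dividing the number of candidate genes in a region by the number of annotated genes in that region. The following is a minimal sketch of that step; the function name and the counts are hypothetical placeholders, not values from Table 2.

```python
def percent_hits(hits_per_region, genes_per_region):
    """Return {region: percentage of annotated genes that are candidates}.

    Regions without annotated genes are skipped to avoid division by zero.
    """
    result = {}
    for region, total_genes in genes_per_region.items():
        if total_genes == 0:
            continue
        hits = hits_per_region.get(region, 0)
        result[region] = 100.0 * hits / total_genes
    return result

# HYPOTHETICAL counts for illustration only (not taken from the study).
genes_per_chromosome = {"chrA": 2000, "chrB": 500, "chrY": 50}
hits_per_chromosome = {"chrA": 60, "chrB": 30}

pct = percent_hits(hits_per_chromosome, genes_per_chromosome)
for region, value in sorted(pct.items()):
    print(f"{region}: {value:.1f}% of annotated genes are candidates")
```

In this toy example the larger chromosome carries more hits in absolute terms, yet the smaller one has the higher percentage, which is the bias the normalization is meant to remove. The same function applies unchanged to chromosomal bands.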

Table 2 Chromosomal regions of differential gene expression in cancer. (A) Number of hits, i.e. number of genes with differential expression per chromosome, is depicted. 'Up' and 'down' mean chromosomal regions with increased and decreased tumor expression, respectively. '%' represents the percentage of candidate genes against the total number of genes present on the chromosome. (B) Chromosomal regions found to be associated with at least 5 hits in the digital subtraction analysis are shown ('banding'). '%' represents the percentage of hits for a particular chromosomal banding against the total number of genes present in the same banding. Chromosomal bandings marked in bold correspond to previously identified regions associated with either tumor amplicon ('Up' column) or deleted ('Down' column) regions in tumors.

First, our results show that some chromosomes appear to be more active than others (Table 2A), with, for instance, chromosomes 15, 19 and Y being rarely involved in cancer-related gene expression changes compared to chromosomes 4 and 6. As expected from the results of Table 1, most chromosomal regions associated with changes in expression levels actually correspond to gene inactivation patterns in cancer cells (373 up-regulated versus 744 down-regulated hits), with chromosome 17 and chromosome 3 being striking examples of cancer-associated inactivation of gene expression. While most tissues (13/15) were clearly subject to these cancer-associated gene inactivation patterns (especially lung, eye, colon, prostate and stomach), two tissues (tumor blood and liver) did not follow this trend. Chromosomal regions displaying at least five hits were further listed, and this rough analysis was sufficient to detect 11 and 29 regions of clustering of up- and down-regulated genes, respectively (Table 2B). Among them, we found chromosomal regions previously identified as either amplified (e.g. 12q13) or deleted (e.g. 11p15) in tumors [23–26]. Interestingly, some of the chromosomal locations identified show tissue specificity, e.g. 12q13.3 in muscle. Moreover, in some cases, candidate genes were contiguous or clustered in limited banding intervals. For instance, 19q13 is associated in tumor tissues of placental origin with complete extinction of eight clustered genes, namely pregnancy-specific beta-1-glycoproteins PSG-1, -2, -3, -4, -5, -6, -9 and -11. These genes, which belong to the carcinoembryonic antigen family, encode the major placental proteins found in maternal circulation during pregnancy [27, 28].

In conclusion, in addition to providing differential expression profiles for individual genes, our EST-based procedure identified discrete regions on specific chromosomes that are enriched in genes deregulated in cancer libraries.


Owing to advances in biotechnology and bioinformatics, researchers can now capture "molecular portraits" of particular cancers using gene chips or SAGE data. These methods provide information on tens of thousands of genes simultaneously, and some variations in genes might be directly related to the cancer phenotype [1, 29]. As multi-dimensional analysis of EST data is analogous to microarray experiments, we used the virtual differential display methodology to identify genes differentially expressed in normal versus tumor tissues. Our comprehensive approach gives an overview of numerous candidate genes which may be useful as improved biomarkers for diagnosis or as targets for developing novel treatment methods. For instance, EST-based formulation of collagen, integrin or cytokeratin expression profiles may have potential as a diagnostic aid for the detection of both tumor formation and development. Notably, for the discovery of tumor-associated molecules, it may be beneficial to use a combination of various digital differential display procedures and experimental data on gene expression. This is illustrated by the identification of prostate-specific Ets factor as a novel marker for breast cancer both computationally [8, 30] (and this study) and experimentally [30–32].

General limitations of EST-based strategies, which have been abundantly discussed elsewhere [4, 33, 34], include poor sequencing depth of the libraries, uncertainty concerning the origin of the samples, and differences in library sizes. In addition, analysis of tumor-related differential expression patterns of individual transcripts may have specific drawbacks. For example, cancer cells often proliferate more rapidly than adjacent normal cells and it is possible that, in some cases, the observed changes in transcript abundance reflect a response to increased proliferation rather than transformation per se. One related problem is that many cell types are often pooled together during the preparation of EST libraries. Given that most cancers start as growths of single cells, the lack of cell-type specific libraries is a major limiting factor of the method. Lastly, the determined variations in transcript expression may not correlate with similar variations in the abundance of the encoded protein, highlighting the need to experimentally test the computer-based predictions either by western blotting or immunohistochemistry. Our validation result showing that Bcl-xL protein expression was markedly increased in hepatocellular carcinoma and liver adenocarcinoma suggests that this Bcl-2 family member represents a potential marker for progression of a subset of liver cancers. Analysis of Bcl-xL immunoreactivity in more liver cancer specimens is needed to strengthen this finding. However, as it agrees with previous results [35–37], and in view of the pro-survival effect of Bcl-xL, we hypothesize that Bcl-xL overexpression could confer specific protection from death to several types of liver cancer cells compared to their healthy counterparts.
If true, modulation of Bcl-xL expression level and/or activity might represent an interesting strategy to optimize the efficacy of chemotherapeutic agents in this particular tissue, as liver cancer represents a significant source of morbidity and mortality worldwide [38].

Aside from the proposal of potential diagnostic markers and targets for future cancer research, a more theoretical perspective of our study is the identification of critical factors that could influence differential gene expression levels in normal versus cancer cells, including genomic landscape features, e.g. levels of polymorphisms, chromosome breakpoints, gene density, GC content and chromatin methylation status. In this regard, although we cannot rule out the possibility of unidentified biases in our data mining procedure, our result showing a higher frequency of gene inactivation patterns in tumor tissues is intriguing, and highlights the importance of understanding the molecular mechanisms of negative gene regulation in cancer.


The final outcomes of the present work are identification of chromosomal regions frequently associated with aberrant expression in cancer libraries, description of differential expression profiles, and listing of cancer candidate genes (e.g. Bcl-xL) which may be useful as tissue-specific biomarkers for cancer diagnosis or as targets for anticancer research.


Data preparation

We used an EST-based pipeline to scan for differential gene expression levels between normal and tumor states. Human ESTs from dbEST [39] (October 2004 release) were first extracted using the ACNUC sequence retrieval system [40]. ESTs were classified according to their UNIGENE library features [41] (October 2004). For each EST in dbEST, we extracted the accession code of the EST and the tissue or organ from which the EST library had been made. The tissue type was retrieved from the line containing 'Tissue_type', 'Tissue description', 'Organ' or 'Keyword'. This parsing approach stored no data when the tissue information did not appear in these fields, or in case of typographical errors or ambiguous aliases. ESTs that were labeled as coming from an unspecified tissue (e.g. 'mixed', 'pooled organs', 'cell line') or from a mixture of specified tissues were discarded. The eVOC ontology [42] (October 2004) for anatomical sites and pathology types was then used to classify the libraries through a number of criteria such as tissue origin and pathological context, including tumor state. This well-accepted hierarchical vocabulary provided us with a means to determine when a specific tissue was part of an organ and when a specific label was part of the 'tumoral' state. A total of 5135 'tumor' and 2503 'normal' (i.e. non-pathological) libraries were catalogued. Our approach to EST clustering used the human genome as a reliable guide. ENSEMBL RNAs [43] annotated on the human genome assembly (release 16.3) were used as a backbone for the clustering of dbEST sequences using MEGABLAST (alignment length = 100 bp and similarity = 95%) [44]. In order to avoid false-positive assignments to paralogs, only the best hit for each EST was subsequently selected. RNA clustering of ESTs in both normal and tumor tissues was the starting point for digital differential analysis of gene expression.
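The best-hit selection described above can be sketched as follows. This is an illustration under stated assumptions, not the original pipeline: the quoted alignment length (100 bp) and similarity (95%) are treated as minimum thresholds, the "best" hit is taken to be the one with the highest bit score, and the `Hit` record, the function name and the example identifiers are all hypothetical.

```python
from collections import Counter
from typing import NamedTuple

class Hit(NamedTuple):
    """One EST-to-transcript alignment (hypothetical record layout)."""
    est_id: str
    transcript_id: str
    align_len: int     # aligned length in bp
    identity: float    # percent identity
    bit_score: float

def assign_best_hits(hits, min_len=100, min_identity=95.0):
    """Keep, for each EST, the single best-scoring hit passing both thresholds.

    ASSUMPTION: thresholds are minimums and ranking is by bit score.
    """
    best = {}
    for hit in hits:
        if hit.align_len < min_len or hit.identity < min_identity:
            continue
        current = best.get(hit.est_id)
        if current is None or hit.bit_score > current.bit_score:
            best[hit.est_id] = hit
    return {est_id: hit.transcript_id for est_id, hit in best.items()}

# HYPOTHETICAL alignments for illustration only.
example_hits = [
    Hit("EST_1", "TX_A", 350, 99.1, 640.0),
    Hit("EST_1", "TX_B", 340, 96.0, 560.0),  # weaker, paralog-like match
    Hit("EST_2", "TX_B", 80, 100.0, 150.0),  # too short, discarded
    Hit("EST_3", "TX_B", 210, 97.5, 380.0),
    Hit("EST_4", "TX_A", 400, 93.0, 500.0),  # identity too low, discarded
]

assignments = assign_best_hits(example_hits)
# EST counts per transcript are the raw material for the differential screen.
ests_per_transcript = Counter(assignments.values())
print(assignments)
print(ests_per_transcript)
```

Keeping a single hit per EST means each EST contributes to exactly one transcript count, so an EST matching several paralogs is not counted more than once.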

Computer-based differential display procedure

The cDNA libraries were categorized into non-normalized or normalized/subtracted libraries by screening for the appropriate keywords in the original annotation of the respective dbEST entries (in the 'Keyword' and 'Library treatment' fields). All libraries for which none of the keywords were found were defined as being non-normalized. After removal of normalized and subtracted EST libraries, we created pools of equivalent EST libraries, i.e. libraries derived from the same tissue type and state (normal or tumor). Differential screening analysis was carried out for a given tissue when both the normal and tumor pools contained at least 10,000 ESTs. A total of 15 distinct paired tissue pools (blood, bone, brain, colon, eye, liver, lung, lymph, mammary gland, muscle, placenta, prostate, skin, stomach, testis) representing approximately 1.5 million ESTs were therefore retained for the whole genome screening. Differential screening was performed for each tissue type individually using EST counts from tumors and corresponding normal counterparts. The relative expression of one particular gene in a tissue was characterized by the ratio of the number of ESTs matching this gene to the total number of ESTs sequenced in the respective tissue. As such 'gene expression profiles' were derived from 'normal' and 'tumor' libraries, it was possible to build 2 × 2 contingency tables and then to apply the Fisher exact test against the null hypothesis that there was no association between a particular gene and the tumoral state. A p-value was determined for statistical significance and, because multiple tests were performed, a Bonferroni correction was applied to each tissue pair in order to reduce the false positive rate and to perform candidate gene sorting. Statistically significant hits showing at least 10-fold differences were compiled.
Four classes of genes were defined, namely (i) genes displaying significantly higher expression levels in tumor tissues ('up-regulated' genes); (ii) genes displaying significantly lower expression levels in tumor tissues ('down-regulated' genes); (iii) genes expressed in tumor but not in normal tissues ('tumor-specific' genes); (iv) genes absent in the tumor types compared to normal tissues. Apart from the genes displaying absolute differences between normal and tumor condition, a ratio based on EST abundance in both conditions was computed to estimate the expression fold change for up- and down-regulated genes. Cytogenetic map position of the hits was inferred using ENSEMBL data (release 16.3). The pattern of expression of the differentially expressed transcripts (n = 1190, as determined by the EST analysis) in normal tissues was independently assessed by comparison to SAGE results obtained on the SAGE Genie website [45] and processed as previously described [46]. A total of 141 (non-tumoral) libraries containing more than 20,000 tags were partitioned into 19 normal tissues. The expression pattern of 13,435 transcripts was determined. Eight tissues (blood, brain, colon, liver, lung, mammary gland, placenta and prostate) were unambiguously mapped to the tissue terms used in the EST data mining procedure. From this sample, we asked which candidate transcripts associated with differential expression in a particular tissue (on the basis of the EST predictions) were expressed in the corresponding normal tissue (according to the SAGE data). Information on differential expression was also gained from reference to primary literature. As this manual task is ill-suited to the large number of candidate genes presented here, we limited the analysis to the "up-regulated" and "down-regulated" lists related to the liver and breast tissues.
We found that 54.2% (for liver) and 41.7% (for breast) of the annotated candidates identified through our computer-based screen were consistent with previously published data (see Additional files 1 and 3). The differential display procedure and other analytical steps were developed with R [47]. Expression and genomic data were stored in a local PostgreSQL database (GeMCore) [48] using PERL and Java script.
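The statistical core of the screen (2 × 2 contingency table, Fisher exact test, Bonferroni correction, 10-fold filter, four classes) can be sketched in a few lines. The original analysis was developed in R; the Python version below is a stand-in for illustration. Several details are assumptions: the test is taken to be two-sided, the Bonferroni factor is taken to be the number of genes tested in the tissue, the 0.01 threshold is borrowed from the legend of Figure 1, and all function names and counts are hypothetical.

```python
from math import exp, lgamma

def _log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

def fisher_two_sided(gene_t, total_t, gene_n, total_n):
    """Two-sided Fisher exact p-value for the table
    [[gene_t, total_t - gene_t], [gene_n, total_n - gene_n]]."""
    n_all = total_t + total_n
    k_gene = gene_t + gene_n
    lo = max(0, total_t - (n_all - k_gene))
    hi = min(k_gene, total_t)

    def log_pmf(k):
        return (_log_comb(k_gene, k)
                + _log_comb(n_all - k_gene, total_t - k)
                - _log_comb(n_all, total_t))

    observed = log_pmf(gene_t)
    # Sum all tables that are at most as probable as the observed one.
    p = sum(exp(lp) for lp in map(log_pmf, range(lo, hi + 1))
            if lp <= observed + 1e-7)
    return min(1.0, p)

def classify(gene_t, total_t, gene_n, total_n, n_tests,
             alpha=0.01, min_fold=10.0):
    """Return (class label or None, Bonferroni-adjusted p-value).

    ASSUMPTIONS: alpha = 0.01 and Bonferroni factor = n_tests.
    """
    p_adj = min(1.0, fisher_two_sided(gene_t, total_t, gene_n, total_n) * n_tests)
    if p_adj >= alpha:
        return None, p_adj
    if gene_n == 0:
        return "tumor-specific", p_adj
    if gene_t == 0:
        return "normal-specific", p_adj
    # Fold change of relative expression (ESTs for the gene / all ESTs).
    fold = (gene_t / total_t) / (gene_n / total_n)
    if fold >= min_fold:
        return "up-regulated", p_adj
    if fold <= 1.0 / min_fold:
        return "down-regulated", p_adj
    return None, p_adj

# HYPOTHETICAL counts: (ESTs in tumor pool, ESTs in normal pool) per gene,
# with 20,000 ESTs in each pool and 1,000 genes tested in the tissue.
examples = {"geneA": (50, 2), "geneB": (2, 50), "geneC": (40, 0),
            "geneD": (0, 40), "geneE": (5, 4)}
labels = {g: classify(t, 20000, n, 20000, n_tests=1000)[0]
          for g, (t, n) in examples.items()}
print(labels)
```

The hypergeometric probabilities are computed in log space so that pools of tens of thousands of ESTs do not overflow, and the loop only runs over the possible counts for the gene of interest, which keeps each test cheap.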

Western blot analysis

Nitrocellulose membrane was from Euromedex (Souffelweyersheim, France). The membrane was immunoblotted with anti-human Bcl-xL antibody (1:1 000 dilution, BD Pharmingen), and then with anti-mouse IgG antibody conjugated to horseradish peroxidase (1:5 000 dilution, Dako). Protein bands were revealed using enhanced chemiluminescence kit (ECL, Amersham). The membrane was stripped according to manufacturer's instructions and reprobed with anti-glyceraldehyde-3-phosphate dehydrogenase (GAPDH) monoclonal antibody (1:1 000 dilution) and with anti-alpha-tubulin (1:1000 dilution, Sigma) to correct for differences in protein loading.


  1.

    Gray JW, Collins C: Genome changes and gene expression in human solid tumors. Carcinogenesis. 2000, 21 (3): 443-452. 10.1093/carcin/21.3.443.

  2.

    Rajkovic A, Yan MSC, Klysik M, Matzuk M: Discovery of germ cell-specific transcripts by expressed sequence tag database analysis. Fertil Steril. 2001, 76 (3): 550-554. 10.1016/S0015-0282(01)01966-5.

  3.

    Wang J, Liang P: DigiNorthern, digital expression analysis of query genes based on ESTs. Bioinformatics. 2003, 19 (5): 653-654. 10.1093/bioinformatics/btg044.

  4.

    Schmitt AO, Specht T, Beckmann G, Dahl E, Pilarsky CP, Hinzmann B, Rosenthal A: Exhaustive mining of EST libraries for genes differentially expressed in normal and tumour tissues. Nucleic Acids Res. 1999, 27 (21): 4251-4260. 10.1093/nar/27.21.4251.

  5.

    Lal A, Lash AE, Altschul SF, Velculescu V, Zhang L, McLendon RE, Marra MA, Prange C, Morin PJ, Polyak K, Papadopoulos N, Vogelstein B, Kinzler KW, Strausberg RL, Riggins GJ: A public database for gene expression in human cancers. Cancer Res. 1999, 59 (21): 5403-5407.

  6.

    Dahl E, Sadr-Nabavi A, Klopocki E, Betz B, Grube S, Kreutzfeld R, Himmelfarb M, An HX, Gelling S, Klaman I, Hinzmann B, Kristiansen G, Grutzmann R, Kuner R, Petschke B, Rhiem K, Wiechen K, Sers C, Wiestler O, Schneider A, Hofler H, Nahrig J, Dietel M, Schafer R, Rosenthal A, Schmutzler R, Durst M, Meindl A, Niederacher D: Systematic identification and molecular characterization of genes differentially expressed in breast and ovarian cancer. J Pathol. 2005, 205 (1): 21-28. 10.1002/path.1687.

  7.

    Grutzmann R, Pilarsky C, Staub E, Schmitt AO, Foerder M, Specht T, Hinzmann B, Dahl E, Alldinger I, Rosenthal A, Ockert D, Saeger HD: Systematic isolation of genes differentially expressed in normal and cancerous tissue of the pancreas. Pancreatology. 2003, 3 (2): 169-178. 10.1159/000070087.

  8.

    Scheurle D, DeYoung MP, Binninger DM, Page H, Jahanzeb M, Narayanan R: Cancer gene discovery using digital differential display. Cancer Res. 2000, 60 (15): 4037-4043.

  9.

    Baranova AV, Lobashev AV, Ivanov DV, Krukovskaya LL, Yankovsky NK, Kozlov AP: In silico screening for tumour-specific expressed sequences in human genome. FEBS Lett. 2001, 508 (1): 143-148. 10.1016/S0014-5793(01)03028-9.

  10.

    Brentani H, Caballero OL, Camargo AA, da Silva AM, da Silva WA, Dias Neto E, Grivet M, Gruber A, Guimaraes PE, Hide W, Iseli C, Jongeneel CV, Kelso J, Nagai MA, Ojopi EP, Osorio EC, Reis EM, Riggins GJ, Simpson AJ, de Souza S, Stevenson BJ, Strausberg RL, Tajara EH, Verjovski-Almeida S, Acencio ML, Bengtson MH, Bettoni F, Bodmer WF, Briones MR, Camargo LP, Cavenee W, Cerutti JM, Coelho Andrade LE, Costa dos Santos PC, Ramos Costa MC, da Silva IT, Estecio MR, Sa Ferreira K, Furnari FB, Faria M, Galante PA, Guimaraes GS, Holanda AJ, Kimura ET, Leerkes MR, Lu X, Maciel RM, Martins EA, Massirer KB, Melo AS, Mestriner CA, Miracca EC, Miranda LL, Nobrega FG, Oliveira PS, Paquola AC, Pandolfi JR, Campos Pardini MI, Passetti F, Quackenbush J, Schnabel B, Sogayar MC, Souza JE, Valentini SR, Zaiats AC, Amaral EJ, Arnaldi LA, de Araujo AG, de Bessa SA, Bicknell DC, Ribeiro de Camaro ME, Carraro DM, Carrer H, Carvalho AF, Colin C, Costa F, Curcio C, Guerreiro da Silva ID, Pereira da Silva N, Dellamano M, El-Dorry H, Espreafico EM, Scattone Ferreira AJ, Ayres Ferreira C, Fortes MA, Gama AH, Giannella-Neto D, Giannella ML, Giorgi RR, Goldman GH, Goldman MH, Hackel C, Ho PL, Kimura EM, Kowalski LP, Krieger JE, Leite LC, Lopes A, Luna AM, Mackay A, Mari SK, Marques AA, Martins WK, Montagnini A, Mourao Neto M, Nascimento AL, Neville AM, Nobrega MP, O'Hare MJ, Otsuka AY, Ruas de Melo AI, Paco-Larson ML, Guimaraes Pereira G, Pesquero JB, Pessoa JG, Rahal P, Rainho CA, Rodrigues V, Rogatto SR, Romano CM, Romeiro JG, Rossi BM, Rusticci M, Guerra de Sa R, Sant' Anna SC, Sarmazo ML, Silva TC, Soares FA, Sonati Mde F, de Freitas Sousa J, Queiroz D, Valente V, Vettore AL, Villanova FE, Zago MA, Zalcberg H: The generation and utilization of a cancer-oriented representation of the human transcriptome by using expressed sequence tags. Proc Natl Acad Sci U S A. 2003, 100 (23): 13418-13423. 10.1073/pnas.1233632100.

  11.

    Aouacheria A, Navratil V, Wen W, Jiang M, Mouchiroud D, Gautier C, Gouy M, Zhang M: In silico whole-genome scanning of cancer-associated nonsynonymous SNPs and molecular characterization of a dynein light chain tumour variant. Oncogene. 2005, 24 (40): 6133-6142. 10.1038/sj.onc.1208745.

  12.

    Du W, Lebowitz PF, Prendergast GC: Elevation of alpha2(I) collagen, a suppressor of Ras transformation, is required for stable phenotypic reversion by farnesyltransferase inhibitors. Cancer Res. 1999, 59 (9): 2059-2063.

  13.

    Andreu T, Beckers T, Thoenes E, Hilgard P, von Melchner H: Gene trapping identifies inhibitors of oncogenic transformation. The tissue inhibitor of metalloproteinases-3 (TIMP3) and collagen type I alpha2 (COL1A2) are epidermal growth factor-regulated growth repressors. J Biol Chem. 1998, 273 (22): 13848-13854. 10.1074/jbc.273.22.13848.

  14.

    Moll R: Cytokeratins in the histological diagnosis of malignant tumors. Int J Biol Markers. 1994, 9 (2): 63-69.

  15.

    Jiang Y, Harlocker SL, Molesh DA, Dillon DC, Stolk JA, Houghton RL, Repasky EA, Badaro R, Reed SG, Xu J: Discovery of differentially expressed genes in human breast cancer using subtracted cDNA libraries and cDNA microarrays. Oncogene. 2002, 21 (14): 2270-2282. 10.1038/sj.onc.1205278.

  16.

    Holland EC, Sonenberg N, Pandolfi PP, Thomas G: Signaling control of mRNA translation in cancer pathogenesis. Oncogene. 2004, 23 (18): 3138-3144. 10.1038/sj.onc.1207590.

  17.

    Ruggero D, Grisendi S, Piazza F, Rego E, Mari F, Rao PH, Cordon-Cardo C, Pandolfi PP: Dyskeratosis congenita and cancer in mice deficient in ribosomal RNA modification. Science. 2003, 299 (5604): 259-262. 10.1126/science.1079447.

  18.

    Bader AG, Vogt PK: An essential role for protein synthesis in oncogenic cellular transformation. Oncogene. 2004, 23 (18): 3145-3150. 10.1038/sj.onc.1207550.

  19.

    Lai J, Flanagan J, Phillips WA, Chenevix-Trench G, Arnold J: Analysis of the candidate 8p21 tumour suppressor, BNIP3L, in breast and ovarian cancer. Br J Cancer. 2003, 88 (2): 270-276. 10.1038/sj.bjc.6600674.

  20.

    Matsushima M, Fujiwara T, Takahashi E, Minaguchi T, Eguchi Y, Tsujimoto Y, Suzumori K, Nakamura Y: Isolation, mapping, and functional analysis of a novel human cDNA (BNIP3L) encoding a protein homologous to human NIP3. Genes Chromosomes Cancer. 1998, 21 (3): 230-235. 10.1002/(SICI)1098-2264(199803)21:3<230::AID-GCC7>3.0.CO;2-0.

  21.

    Rountree MR, Bachman KE, Herman JG, Baylin SB: DNA methylation, chromatin inheritance, and cancer. Oncogene. 2001, 20 (24): 3156-3165. 10.1038/sj.onc.1204339.

  22.

    Robertson KD: DNA methylation and chromatin – unraveling the tangled web. Oncogene. 2002, 21 (35): 5361-5379. 10.1038/sj.onc.1205609.

  23.

    Wikman H, Nymark P, Vayrynen A, Jarmalaite S, Kallioniemi A, Salmenkivi K, Vainio-Siukola K, Husgafvel-Pursiainen K, Knuutila S, Wolf M, Anttila S: CDK4 is a probable target gene in a novel amplicon at 12q13.3-q14.1 in lung cancer. Genes Chromosomes Cancer. 2005, 42 (2): 193-199. 10.1002/gcc.20122.

  24.

    Elkahloun AG, Krizman DB, Wang Z, Hofmann TA, Roe B, Meltzer PS: Transcript mapping in a 46-kb sequenced region at the core of 12q13.3 amplification in human cancers. Genomics. 1997, 42 (2): 295-301. 10.1006/geno.1997.4727.

  25.

    Moskaluk CA, Rumpel CA: Allelic deletion in 11p15 is a common occurrence in esophageal and gastric adenocarcinoma. Cancer. 1998, 83 (2): 232-239. 10.1002/(SICI)1097-0142(19980715)83:2<232::AID-CNCR5>3.0.CO;2-S.

  26.

    Scelfo RA, Schwienbacher C, Veronese A, Gramantieri L, Bolondi L, Querzoli P, Nenci I, Calin GA, Angioni A, Barbanti-Brodano G, Negrini M: Loss of methylation at chromosome 11p15.5 is common in human adult tumors. Oncogene. 2002, 21 (16): 2564-2572. 10.1038/sj.onc.1205336.

  27.

    Hammarstrom S: The carcinoembryonic antigen (CEA) family: structures, suggested functions and expression in normal and malignant tissues. Semin Cancer Biol. 1999, 9 (2): 67-81. 10.1006/scbi.1998.0119.

  28.

    Beckers JF, Zarrouk A, Batalha ES, Garbayo JM, Mester L, Szenci O: Endocrinology of pregnancy: chorionic somatomammotropins and pregnancy-associated glycoproteins: review. Acta Vet Hung. 1998, 46 (2): 175-189.

  29.

    Strausberg RL, Simpson AJ, Wooster R: Sequence-based cancer genomics: progress, lessons and opportunities. Nat Rev Genet. 2003, 4 (6): 409-418. 10.1038/nrg1085.

  30.

    Ghadersohi A, Sood AK: Prostate epithelium-derived Ets transcription factor mRNA is overexpressed in human breast tumors and is a candidate breast tumor marker and a breast tumor antigen. Clin Cancer Res. 2001, 7 (9): 2731-2738.

  31.

    Mitas M, Mikhitarian K, Hoover L, Lockett MA, Kelley L, Hill A, Gillanders WE, Cole DJ: Prostate-Specific Ets (PSE) factor: a novel marker for detection of metastatic breast cancer in axillary lymph nodes. Br J Cancer. 2002, 86 (6): 899-904. 10.1038/sj.bjc.6600190.

  32.

    Katayama S, Nakayama T, Ito M, Naito S, Sekine I: Expression of the ets-1 proto-oncogene in human breast carcinoma: differential expression with histological grading and growth pattern. Histol Histopathol. 2005, 20 (1): 119-126.

  33.

    Imyanitov EN, Togo AV, Hanson KP: Searching for cancer-associated gene polymorphisms: promises and obstacles. Cancer Lett. 2004, 204 (1): 3-14. 10.1016/j.canlet.2003.09.026.

  34.

    Qiu P, Wang L, Kostich M, Ding W, Simon JS, Greene JR: Genome wide in silico SNP-tumor association analysis. BMC Cancer. 2004, 4 (1): 4. 10.1186/1471-2407-4-4.

  35.

    Garcia EJ, Lawson D, Cotsonis G, Cohen C: Hepatocellular carcinoma and markers of apoptosis (bcl-2, bax, bcl-x): prognostic significance. Appl Immunohistochem Mol Morphol. 2002, 10 (3): 210-217. 10.1097/00022744-200209000-00004.

  36.

    Takehara T, Liu X, Fujimoto J, Friedman SL, Takahashi H: Expression and role of Bcl-xL in human hepatocellular carcinomas. Hepatology. 2001, 34 (1): 55-61. 10.1053/jhep.2001.25387.

  37.

    Watanabe J, Kushihata F, Honda K, Sugita A, Tateishi N, Mominoki K, Matsuda S, Kobayashi N: Prognostic significance of Bcl-xL in human hepatocellular carcinoma. Surgery. 2004, 135 (6): 604-612. 10.1016/j.surg.2003.11.015.

  38.

    Guyton KZ, Kensler TW: Prevention of liver cancer. Curr Oncol Rep. 2002, 4 (6): 464-470.

  39.

    DbEST: Expressed Sequence Tags database. []

  40.

    Gouy M, Gautier C, Attimonelli M, Lanave C, di Paola G: ACNUC – a portable retrieval system for nucleic acid sequence databases: logical and physical designs and usage. Comput Appl Biosci. 1985, 1 (3): 167-172.

  41.

    Unigene: organized view of the transcriptome. []

  42.

    Evoke: expression ontology toolkit. []

  43.

    Ensembl database. []

  44.

    Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25 (17): 3389-3402. 10.1093/nar/25.17.3389.

  45.

    Liang P: SAGE Genie: a suite with panoramic view of gene expression. Proc Natl Acad Sci U S A. 2002, 99 (18): 11547-11548. 10.1073/pnas.192436299.

  46.

    Semon M, Mouchiroud D, Duret L: Relationship between gene expression and GC-content in mammals: statistical significance and biological relevance. Hum Mol Genet. 2005, 14 (3): 421-427. 10.1093/hmg/ddi038.

  47.

    The Comprehensive R Archive Network. []

  48.

    GeM (Genomic Mapping) Website. []



Acknowledgements

VN is supported by a grant from INRA. AA is the recipient of a fellowship from the ARC. The authors wish to thank Sandy Jacquier for critical reading of the manuscript, Dr. Pierre Colas at Aptanomics for providing the anti-Bcl-xL antibody and Dr. Cyrile Lamigeon for the anti-GAPDH antibody. We are grateful to Dr. Marie Semon for sharing the SAGE data.

Author information



Corresponding author

Correspondence to Abdel Aouacheria.

Additional information

Authors' contributions

AA designed the study, performed the immunoblot assays, analyzed the data and drafted the manuscript. VN developed the algorithm for the differential display procedure, processed the SAGE data, participated in the data analyses and reviewed the manuscript. AB provided the antibodies and participated in the western blotting experiments. DM and CG provided funding and supervision for the work. All authors read and approved the final manuscript.

Abdel Aouacheria and Vincent Navratil contributed equally to this work.

Electronic supplementary material

Additional File 1: Upregulated candidates complete list. Upregulated genes in tumor tissues (complete list). Hits displaying at least a 10-fold increase in tumor-derived libraries compared to their normal tissue counterparts are shown. Chromosomal locations for each hit were inferred from the Ensembl cytogenetic map. Hits were sorted by p value (Fisher's exact test; p < 0.05, Bonferroni corrected), ranked by expression ratio and ordered by tissue. Both known and novel ('NULL') transcripts are listed. 'Y': 'Yes'; 'ND': not determined. The PubMed ID (PMID) is given for annotated candidate transcripts whose differential expression was documented in previously published data. '*': in silico studies. (XLS 102 KB)

Additional File 2: Downregulated candidates complete list. Downregulated genes in tumor tissues (complete list). Hits displaying at least a 10-fold decrease in tumor-derived libraries compared to their normal tissue counterparts are shown. Chromosomal locations for each hit were inferred from the Ensembl cytogenetic map. Hits were sorted by p value (Fisher's exact test; p < 0.05, Bonferroni corrected), ranked by expression ratio and ordered by tissue. Both known and novel ('NULL') transcripts are listed. 'Y': 'Yes'; 'ND': not determined. The PubMed ID (PMID) is given for annotated candidate transcripts whose differential expression was documented in previously published data. '*': in silico studies. (XLS 172 KB)

Additional File 3: Tumor specific candidates complete list. Complete list of genes absent from normal tissues and present in tumor types. Tumor-specific hits are shown. Chromosomal locations for each hit were inferred from the Ensembl cytogenetic map. Hits were sorted by p value (Fisher's exact test; p < 0.05, Bonferroni corrected) and ranked by tissue origin. Both known and novel ('NULL') transcripts are listed. 'Y': 'Yes'; 'ND': not determined. (XLS 68 KB)

Additional File 4: Normal specific candidates complete list. Summary of genes absent from tumor types and present in normal tissues. Genes absent from the tumor types but present in normal tissues are shown. Chromosomal locations for each hit were inferred from the Ensembl cytogenetic map. Hits were sorted by p value (Fisher's exact test; p < 0.05, Bonferroni corrected) and ranked by tissue origin. Both known and novel ('NULL') transcripts are listed. 'Y': 'Yes'; 'ND': not determined. (XLS 104 KB)

Additional File 5: Consistent candidates in multiple tissues. Genes whose transcripts varied significantly and consistently in abundance in at least two different tissues. (XLS 42 KB)

Additional File 6: Tissue specific candidates. Genes whose transcripts exhibited tissue-specific differential expression in normal versus tumor libraries. (XLS 32 KB)
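
The selection criteria stated in the captions of Additional Files 1 to 4 (Fisher's exact test, Bonferroni-corrected p < 0.05, and a 10-fold threshold) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the use of a two-sided test, the choice of the number of tested genes as the Bonferroni multiplier, and the toy EST counts are all assumptions made for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more probable than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def screen(counts, tumor_total, normal_total, alpha=0.05, min_fold=10.0):
    """Classify genes from EST counts (hypothetical helper).

    counts maps a gene name to (ESTs in tumor libraries, ESTs in normal libraries).
    Returns (gene, label, corrected p value) tuples sorted by corrected p value.
    """
    n_tests = len(counts)  # assumed Bonferroni multiplier: number of genes tested
    hits = []
    for gene, (t, n) in counts.items():
        p = fisher_exact_two_sided(t, tumor_total - t, n, normal_total - n)
        p_adj = min(1.0, p * n_tests)  # Bonferroni correction
        if p_adj >= alpha:
            continue
        t_freq, n_freq = t / tumor_total, n / normal_total
        if n == 0:
            label = "tumor-specific"
        elif t == 0:
            label = "normal-specific"
        elif t_freq / n_freq >= min_fold:
            label = "upregulated"
        elif n_freq / t_freq >= min_fold:
            label = "downregulated"
        else:
            continue  # significant, but below the fold-change threshold
        hits.append((gene, label, p_adj))
    return sorted(hits, key=lambda h: h[2])


# Toy counts (invented for illustration only).
toy_counts = {"geneA": (40, 2), "geneB": (0, 30), "geneC": (10, 9)}
toy_hits = screen(toy_counts, tumor_total=1000, normal_total=1000)
```

In this toy example, geneA passes as upregulated and geneB as normal-specific, while geneC is dropped because its counts do not differ significantly. For real library sizes, an established implementation of the test would be preferable to this pure-Python version.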


About this article

Cite this article

Aouacheria, A., Navratil, V., Barthelaix, A. et al. Bioinformatic screening of human ESTs for differentially expressed genes in normal and tumor tissues. BMC Genomics 7, 94 (2006).



  • Differential Expression Profile
  • SAGE Data
  • Cancer Candidate Gene
  • Digital Differential Display
  • Data Mining Procedure