Characterization of the global profile of genes expressed in cervical epithelium by Serial Analysis of Gene Expression (SAGE)

Background Serial Analysis of Gene Expression (SAGE) is a technique that provides detailed quantitative and qualitative knowledge of gene expression profiles without prior knowledge of the sequences of the analyzed genes. We applied a modification of the SAGE methodology (microSAGE), suited to the analysis of limited quantities of tissue, to normal human cervical tissue obtained from a donor without histopathological lesions. Cervical epithelium consists mainly of cervical keratinocytes, which are the targets of human papillomavirus (HPV); persistent HPV infection of cervical epithelium is associated with an increased risk of developing cervical carcinoma (CC). Results We report here a transcriptome analysis of cervical tissue by SAGE, derived from 30,418 sequenced tags, that provides a wealth of information about the gene products involved in normal cervical epithelium physiology, as well as genes involved in epidermal differentiation that had not previously been found in uterine cervix tissue. Conclusion This first comprehensive analysis of the uterine cervix transcriptome should be useful for identifying genes involved in normal uterine cervix function and candidate genes associated with cervical carcinoma.


Background
Uterine cervical carcinoma (CC) is one of the most frequent malignancies in women worldwide, in both incidence and mortality, and is the leading cause of death among the Mexican female population [1]. Persistent infection with high-risk human papillomavirus (HPV) is considered the most important risk factor associated with the development of this tumor [2,3]. Although HPV is a necessary cause of CC, it is not sufficient to trigger all the changes required for its development [4].
A number of recent studies of gene expression profiles in HPV-infected keratinocytes cultured in vitro and in CC clinical samples have provided an initial picture of the changes in gene expression induced by HPV and occurring in early CC [5][6][7][8][9][10]. Moreover, some studies have compared gene expression in normal versus tumor cervical samples with the aim of identifying potential tumor markers of clinical value [11][12][13].
At present, there are SAGE-based reports of genes expressed by keratinocytes derived from normal human epidermis and by mouse uterus [14][15][16][17]. However, no such study exists for human cervix. Therefore, the aim of our study was to describe the first compendium of genes expressed in normal cervical epithelium, which is composed mainly of keratinocytes strongly influenced by hormones. To achieve this we used SAGE as the main methodology, as it is capable of producing an accurate molecular picture of cervical tissue based on expressed genes. Because SAGE does not depend on preexisting databases of expressed genes, it provides an unbiased view of gene expression profiles within mRNA populations [18]. SAGE allows the simultaneous quantitative and qualitative analysis of thousands of gene transcripts based on two principles: first, 14-mers are sufficient to uniquely identify 95% of cell transcripts [19]; and second, throughput is considerably increased by cloning these 14 bp tags serially, with a restriction enzyme recognition sequence inserted as an anchor. To obtain a catalog of expressed genes and their relative frequencies, we performed database analysis to relate each tag to its corresponding gene [20]. An important drawback of SAGE is that it requires a large amount of messenger RNA (2.5-5 µg polyA RNA); because our tissue supply was limited (a punch biopsy), we employed the MicroSAGE protocol [21]. The present report describes a partial transcriptome of a sample derived from normal cervical epithelium, which was used to construct a SAGE library with 30,418 sequenced tags.

SAGE library derived from one normal uterine ectocervical sample
Our SAGE library was obtained from ectocervical tissue from a 38-year-old healthy, sexually active woman who was not taking hormonal therapy or any other drug that could potentially alter cervical physiology; we designated this library SAGE_cervix_normal_B_1. Histological analysis of this sample by H&E staining revealed normal ectocervical tissue, approximately 80% epithelium and 20% stroma, without evidence of glands. There were minimal inflammatory infiltrates at the periphery of the sample, considered normal for this type of tissue.
The SAGE library yielded 30,418 sequenced tags, which were used to generate a table representing the genes expressed in normal human cervix. For a complete list of the expressed genes, please visit the SAGEmap website [22,23]. The derived catalog of expressed genes represents the first attempt to generate a comprehensive analysis of the cervical epithelium expression profile. The wealth of information obtained allows detection of genes involved in normal epithelium physiology, as well as possible target genes of HPV infection. In general, tag frequency in a typical SAGE experiment follows a normal distribution [24,25]. Table 1 summarizes the general statistics of this library. As seen there, the distribution is normal: only a limited number of tags were either highly expressed or present at an extremely low frequency (4.4 and 4.9%, respectively). Tags with a frequency of 1 were not considered for quantitative purposes, because they are likely to represent artifacts of sequencing or of the SAGE procedure [26].
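The rule of setting aside tags observed only once can be illustrated with a minimal Python sketch. The tag sequences and counts are invented toy data, and `high_cutoff` is a hypothetical threshold, since the cutoffs behind Table 1 are not stated in the text; this is not the software used in the study.

```python
def summarize_library(tag_counts, high_cutoff=50):
    """Summarize a SAGE library given a mapping of tag -> observed count.

    high_cutoff is a hypothetical threshold for "highly expressed" tags.
    """
    total = sum(tag_counts.values())
    # Tags seen only once are set aside as possible sequencing/SAGE artifacts.
    singletons = [t for t, c in tag_counts.items() if c == 1]
    quantifiable = {t: c for t, c in tag_counts.items() if c > 1}
    high = [t for t, c in quantifiable.items() if c >= high_cutoff]
    return {
        "total_tags": total,
        "unique_tags": len(tag_counts),
        "singleton_tags": len(singletons),
        "quantifiable_tags": len(quantifiable),
        "high_abundance_tags": len(high),
    }


# Toy library (invented tags and counts, for illustration only).
toy = {"AAACCCGGGT": 60, "TTTGACGTAC": 5, "GGGTTTAAAC": 2,
       "ACACACGTGT": 1, "CCCCAAAATT": 1}
print(summarize_library(toy))
```

Keeping the singleton count in the summary, rather than silently discarding those tags, makes it easy to report how much of the library was excluded from quantitative comparisons.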

Representativity of the data
According to Zhang et al. [27], whose SAGE data mining study analyzed 300,000 tags, 75% of mRNA consists of transcripts expressed at more than five copies per cell and, in general, transcripts are expressed over a range from one to 5,300 copies per cell. With this in mind, our ~30,000-tag library represents 10% of the total tags analyzed in that study. The most frequently represented tag in the current report had a frequency of 515 (16,930 tags per million, TPM). From these data we estimate that this tag has an expression level of ~5,150 copies per cell, similar to what is observed in digital northerns of other top expressed tags in SAGE libraries (Figure 1A). We have to keep in mind, however, that in certain tissues some genes are expressed at much higher levels, such as growth hormone, with 149,630 TPM in pituitary gland [28]. Because SAGE analysis represents a qualitative and quantitative assay of messenger RNA abundance not biased by cloning or polymerase chain reaction efficiency [29], our data provide an estimate of the genes normally expressed by normal uterine cervix.
Among the most frequently expressed tags in our library (Table 2), some corresponded to ubiquitously expressed transcripts (GRIN2C, FTH1, GNS, RPLP2, RPL21). The presence of this type of gene is a common result in SAGE experiments, with an expected heterogeneity in expression levels [14,15,17,19], indicating a possible role as housekeeping genes (Figure 1B). Velculescu et al., by means of database analysis of SAGE libraries, found that ~1,000 genes are present at over five copies per cell in all normal or tumor tissues analyzed [30]. This list of genes identified by data mining is termed the minimal transcriptome (i.e., the set of genes expressed by every cell) and represents constitutively expressed genes. A search in our library for the minimal transcriptome listed in the supplementary information of Velculescu's work [30] showed that >95% of these housekeeping genes were present (data not shown), further validating the cervical library.
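The arithmetic behind these figures can be sketched in a few lines of Python. The function names are illustrative, and the value of 300,000 transcripts per cell is an assumption made here for the example (chosen to match the 300,000-tag scale mentioned above); the text does not state how the copies-per-cell estimate was derived.

```python
def tags_per_million(count, library_size):
    """Normalize a raw tag count to tags per million (TPM)."""
    return count / library_size * 1_000_000


def copies_per_cell(count, library_size, transcripts_per_cell=300_000):
    """Scale a tag count to copies per cell.

    transcripts_per_cell is an assumed total mRNA content per cell.
    """
    return count / library_size * transcripts_per_cell


top_count = 515        # most frequent tag in the library
library_size = 30_418  # total sequenced tags

print(round(tags_per_million(top_count, library_size)))
print(round(copies_per_cell(top_count, library_size)))
print(round(copies_per_cell(top_count, 30_000)))  # rounded library size
```

With the exact library size the sketch gives roughly 16,931 TPM and about 5,080 copies per cell. With the library size rounded to 30,000 tags it gives 5,150, which suggests, although the text does not say so, that the quoted estimate was obtained with the rounded figure.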

Spectrum of genes expressed by normal cervical tissue
To better understand the functional categories represented in the global gene expression profile, we employed the FatiGO data mining website [31,32]. Figure 2 shows the distribution of expressed genes by functional categories defined by the Gene Ontology Consortium. As seen, the most frequent individual transcripts correspond to genes involved in maintenance and basic metabolism. On the other hand, genes corresponding to other processes such as cell growth regulation, morphogenesis, cell differentiation, or death were not as frequently expressed.

Top expressed non-ubiquitous genes in normal cervical tissue mainly correspond to epithelial growth and differentiation
It was important to distinguish which non-ubiquitous genes were predominantly expressed in normal cervix. As seen in Table 2, genes related to epithelial differentiation and squamous architectural maintenance are abundantly represented in our library. These include S100A8, S100A9, and SPRR3, which belong to a complex of genes that are subject to coordinate regulation during keratinocyte differentiation. This complex, called the epidermal differentiation complex (EDC), is located on chromosome 1q21 [33,34]. These genes share spatial and temporal expression and interrelated functions and are grouped in three related gene families: cornified envelope precursor proteins (involucrin, loricrin, and the small proline-rich proteins [SPRRs]); intermediate filament-associated proteins (profilaggrin and trichohyalin); and calcium binding proteins (the S100As) [reviewed in [35]]. Approximately 30 genes belonging to the EDC are clustered together in a 2 Mb region, of which 20 are expressed in the cervical SAGE library (Table 3).

End point RT-PCR analysis confirms expression of genes detected by SAGE
It was important to confirm the expression of some EDC representative genes in different normal cervical tissues by a different technique. For this, we chose end point reverse transcriptase polymerase chain reaction (RT-PCR) analysis. Figure 3 shows the expression of five EDC genes in HPV negative tissue samples with no histopathologic lesion. As expected, the majority of cases expressed these genes. However, there were some differences in the level of expression among the different normal samples. This could be due to the fact that samples were taken on different days of the menstrual cycle (hormonal influence) or to unknown physiological differences among biological systems.

Minor expression of fibroblast-related genes in cervical tissue
The gene expression catalog reported here was obtained from a heterogeneous population of cells composed mainly of epithelial keratinocytes at different stages of differentiation (basale, spinosum and granulosum strata). Nevertheless, these tissues also contain fibroblasts associated with connective tissue, besides other minor cell populations. To identify which genes are related to fibroblasts, we compared our library with a SAGE library derived from neonatal foreskin primary fibroblasts (Agnes Baross, British Columbia Genome Sciences Centre). We found 923 gene tags shared by both libraries, which could be due to the presence of fibroblasts in the cervix SAGE library (supplementary information). Shared genes with known biological function are mainly involved in processes such as signal transduction, regulation of transcription and cell adhesion. We consider it important to identify minor contributions to the global gene expression profile in a heterogeneous cell population; however, it is important to note that unknown differences between cervical and neonatal foreskin fibroblasts could exist.
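A comparison of this kind amounts to intersecting the tag sets of two libraries. The following Python sketch shows the idea under stated assumptions: the libraries are invented toy data, `shared_tags` is an illustrative name, and `min_count` is a hypothetical filter, since the text does not say whether a minimum tag count was applied.

```python
def shared_tags(library_a, library_b, min_count=1):
    """Return tags present in both libraries with at least min_count copies.

    Each library is a mapping of tag sequence -> observed count.
    min_count is a hypothetical filter, not taken from the study.
    """
    kept_a = {t for t, c in library_a.items() if c >= min_count}
    kept_b = {t for t, c in library_b.items() if c >= min_count}
    return kept_a & kept_b


# Toy libraries (invented tags and counts, for illustration only).
cervix = {"AAACCCGGGT": 60, "TTTGACGTAC": 5, "GGGTTTAAAC": 2}
fibroblast = {"TTTGACGTAC": 12, "GGGTTTAAAC": 1, "ACACACGTGT": 7}

print(sorted(shared_tags(cervix, fibroblast)))
print(sorted(shared_tags(cervix, fibroblast, min_count=2)))
```

A shared tag only shows that the transcript is detectable in both libraries; it does not by itself establish that the cervical signal comes from fibroblasts, which is the caveat the paragraph above already raises.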

Conclusion
To our knowledge, this is the first effort to obtain a global profile of gene expression in normal cervical tissue. This was accomplished by means of a methodology that produced an accurate catalog of expressed genes in this tissue. Analysis of gene expression revealed genes involved in keratinocyte differentiation. These genes have not been detected in cervical epithelium by traditional methodologies such as RT-PCR or in situ hybridization. Although our SAGE library was derived from a single donor, the majority of samples analyzed expressed the genes selected, indicating reproducibility in human samples. SAGE is a complex and expensive methodology, mainly because of the large sequencing effort required to generate SAGE libraries. Nevertheless, the wealth of information derived from these libraries justifies the effort and provides better knowledge of cervical biology and physiology. In the near future, it could also provide insight into cervical physiology, HPV infection and other pathologies affecting cervical tissue.

Tissues
Normal cervices were obtained from women with negative Pap smears, confirmed by histopathological analysis, who attended the Dysplasia Clinic at the General Hospital of Mexico, SS, and had undergone hysterectomy due to uterine myomatosis. All patients were of reproductive age and none of them received hormonal therapy or contraceptives. All the described procedures were evaluated and approved by the local ethics committee of the Mexican Institute of Social Security. Written informed consent was obtained from all the patients. All tissue samples were divided longitudinally into three sections: the central part was snap-frozen in liquid nitrogen and stored at -70°C until nucleic acid extraction, and the other two were fixed overnight in 70% ethanol and paraffin embedded at the Department of Pathology, Oncology Hospital, National Medical Center SXXI, Mexico. Serial sections from these fractions, stained with hematoxylin/eosin, were inspected to verify that the tissue was representative.
Figure 2 Functional categories assigned to individual genes identified in normal cervical SAGE library. Genes can be assigned to different functional categories. a The percentage was calculated with 3,764 initial genes, of which 2,720 had Gene Ontology classification.

HPV detection and typing
Genomic DNA was extracted from the phenol phase left by the TRIzol reagent (Gibco BRL, USA) RNA isolation protocol and amplified by PCR with MY11/MY09 primers [36] (Table 4). PCR products were separated by electrophoresis on 1% agarose gel. Only HPV negative samples were included in this study.
Figure 3 Expression of genes clustered in 1q21, in normal cervical tissues. One hundred nanograms of total RNA purified from each sample was used in one RT-PCR reaction with gene-specific primers; then one tenth of each RT-PCR reaction was subjected to agarose gel electrophoresis. MW: molecular weight marker; C1-C6: six different normal cervical samples.
Table 4
Gene | Sense primer | Antisense primer | Tm (°C) a | Product size (bp) | Reference
S100 A8 | [missing] | ACGCCCATCTTTATCACCAG | 58 | 160 | This paper
S100 A9 | TCAGCTGGAACGCAACATAGA | TCAGCTGCTTGTCTGCATT | 56 | 205 | This paper
SPRR3 | TTCCACAACCTGGAAACACA | TTCAGGGACCTTGGTGTAGC | 55 | 174 | This paper
NICE-3 | ACGGCTATGAAACAGCCCGCTA | GCACATTGCAACTGACTGGCTT | 57 | 330 | This paper
NICE-4 | ACGGAATCCAATGAGGAAGGCA | TCAGTATTGGCTGGCTCTGCAT | 57 | 294 | This paper
GAPDH | CATCTCTGCCCCCTCTGCTGA | GGATGACCTTGCCCACAGCCT | 60 | 205 | [38]
a Tm was calculated using the PrimerQuest program from [39]; however, it was necessary to adjust Tm in some cases.

Micro SAGE protocol
Micro SAGE was performed according to Datson et al. [21] with minor modifications, using the I-SAGE kit (Invitrogen, San Diego, CA, USA). RNA was isolated with TRIzol according to the manufacturer's instructions. Five µg of total RNA was used as input material. A heating step at 65°C for 10 minutes, followed by 2 minutes on ice, was introduced to allow better separation of concatemers [37]. Products larger than 300 bp and smaller than 2,000 bp were excised, extracted and cloned into the SphI site of the pZero vector. Clones were selected and screened for inserts by PCR. The cervix library was sequenced by Agencourt through the SAGE sequencing service (CGAP collaboration, GR). Sequence files were analyzed with the SAGE300 software [18,20], which identifies the anchoring enzyme sites and extracts the two tags flanked by NlaIII sites. Gene identity and UniGene cluster assignment of each SAGE tag were obtained using the tag-to-gene "reliable" map from the SAGEmap NCBI site [22,23]. The extracted tags were uploaded to SAGEmap and the corresponding accession numbers were retrieved using the H. sapiens NCBI-GenBank database.
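The tag extraction step can be illustrated with a minimal Python sketch. It is not the SAGE300 software; it only shows the general idea of locating NlaIII sites (CATG) in a concatemer read and taking one 10 bp tag from each end of the ditag between consecutive sites. The accepted ditag length range (20-26 bp) and all function names are assumptions made for this example, and the sequences are invented.

```python
import re

ANCHOR = "CATG"  # NlaIII recognition site used as the anchoring enzyme site
TAG_LEN = 10     # tag length excluding the CATG anchor

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_tags(concatemer, min_ditag=20, max_ditag=26):
    """Extract SAGE tags from a concatemer of ditags separated by CATG.

    min_ditag and max_ditag are assumed bounds on acceptable ditag length.
    """
    seq = concatemer.upper()
    sites = [m.start() for m in re.finditer(ANCHOR, seq)]
    tags = []
    for left, right in zip(sites, sites[1:]):
        ditag = seq[left + len(ANCHOR):right]
        if not (min_ditag <= len(ditag) <= max_ditag):
            continue  # skip fragments too short or too long to be a ditag
        # The left tag is read directly; the right tag lies on the opposite
        # strand, so it is reverse-complemented.
        tags.append(ditag[:TAG_LEN])
        tags.append(reverse_complement(ditag[-TAG_LEN:]))
    return tags


# Toy concatemer built from two invented ditags.
ditag1 = "AAACCCGGGT" + reverse_complement("TTTGACGTAC")
ditag2 = "GGGTTTAAAC" + reverse_complement("ACACACGTGT")
read = ANCHOR + ditag1 + ANCHOR + ditag2 + ANCHOR
print(extract_tags(read))
```

The extracted tags can then be counted (for example with `collections.Counter`) to obtain the tag frequencies that are matched against a tag-to-gene map.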

Reverse Transcription-Polymerase Chain Reaction (RT-PCR) analysis
Total RNA was extracted from six normal cervical tissues using TRIzol, quantified by densitometric analysis, and its quality evaluated by denaturing gel electrophoresis. Contaminating DNA was digested and removed with RNase-free DNase (Promega). Expression analysis was performed using 100 ng of total RNA in one RT-PCR reaction (Access RT-PCR System, Promega). The mRNA was reverse-transcribed at 48°C for 45 min. After an initial denaturation at 94°C for 2 minutes, the double-stranded cDNA synthesized was amplified for 40 cycles of denaturation at 94°C for 30 seconds, annealing at 54-60°C for 1 minute and extension at 70°C for 2 minutes with specific oligonucleotides (Table 4) in a Perkin Elmer 480 Thermocycler.
Sense and antisense oligonucleotide sequences for the S100 A8, S100 A9, SPRR3, NICE-3 and NICE-4 genes were designed with the PrimerQuest program [38]. GAPDH gene expression was used as an internal control.