The Retinome - Defining a reference transcriptome of the adult mammalian retina/retinal pigment epithelium
© Schulz et al. 2004
Received: 07 April 2004
Accepted: 29 July 2004
Published: 29 July 2004
The mammalian retina is a valuable model system to study neuronal biology in health and disease. To obtain insight into intrinsic processes of the retina, great efforts are directed towards the identification and characterization of transcripts with functional relevance to this tissue.
With the goal of assembling a first genome-wide reference transcriptome of the adult mammalian retina, referred to as the retinome, we have extracted 13,037 non-redundant annotated genes from nearly 500,000 published datasets on redundant retina/retinal pigment epithelium (RPE) transcripts. The data were generated from 27 independent studies employing a wide range of molecular and biocomputational approaches. Comparison to known retina-/RPE-specific pathways and established retinal gene networks suggests that the reference retinome may represent up to 90% of the retinal transcripts. We show that the distribution of retinal genes along the chromosomes is not random but exhibits a higher order organization closely following the previously observed clustering of genes with increased expression.
The genome wide retinome map offers a rational basis for selecting suggestive candidate genes for hereditary as well as complex retinal diseases facilitating elaborate studies into normal and pathological pathways. To make this unique resource freely available we have built a database providing a query interface to the reference retinome .
The mammalian retina is a highly structured tissue developmentally originating from neuroectodermal evagination of the diencephalon and subsequent invagination processes resulting in the formation of two cellular layers which ultimately give rise to the inner neural retina and the outer retinal pigment epithelium (RPE) monolayer . In the adult, the neural retina consists of approximately 55 distinct cell types histologically structured into three layers of cells (photoreceptors, intermediate neurons and ganglion cells) and two layers of neuronal interconnections (outer and inner plexiform layers) . The RPE is differentiated into polarized cells with an apical and a basal orientation separating the neural retina from the underlying choroidal blood supply. With its apical microvilli-like processes, the RPE establishes an intimate contact with the photoreceptor outer segments to sustain their metabolic support and maintain photoreceptor integrity . Together, the neural retina and the RPE provide the structural and functional basis for light perception by ensuring the capture of photons, the conversion of light stimuli into complex patterns of neuronal impulses and the transmission of the initially processed signals to the higher visual centers of the brain.
Recent progress in retinal research has greatly enhanced our current understanding of basic functional processes in the adult retina (e.g. [4, 5]). A great deal of effort has focused on the molecular dissection of the phototransduction pathway and the retinoid cycle (e.g. ref. ). Besides elucidating physiological mechanisms in normal tissue, the identification of genes involved in hereditary retinal disease has provided another valuable source of insight into functional pathways of the retina and the RPE (reviewed in [7, 8]).
Despite these advances, a remaining challenge is to obtain a reference genome-wide expression map of the retina/RPE transcriptome, further facilitating the identification of retinal susceptibility genes, but most importantly, offering an invaluable resource for functional genomics studies. Initial analyses of human [9, 10] and mouse  whole genome sequences and the use of more recent comparative gene prediction algorithms [12, 13] suggest an overall number of mammalian gene loci in the range of 35,000 to 45,000. These estimates have largely been validated by experimental data on gene transcription [14, 15] although alternative promoter usage, differential exon splicing during mRNA maturation, alternative usage of polyadenylation sites and other post-transcriptional modifications may further increase the genetic diversity required to encode the full complement of cellular transcripts [16, 17]. In addition, there may be a considerable number of non-coding genes unaccounted for by current annotations .
In recent years, a number of approaches and technologies were adopted to identify genes expressed in the retina/RPE of human, cow, dog and mouse including data-mining and assembly of publicly available expressed sequence tag (EST) information [19–23], sequencing of cDNA libraries generated via conventional methods [24–29] or via normalization techniques [30, 31], hybridization to gene arrays of various formats [32, 33] and serial analysis of gene expression (SAGE) [34, 35]. Suppression subtractive hybridization (SSH) has been shown to be an efficient technique with which differentially expressed genes can be normalized and enriched over 1000-fold in a single round of hybridization . Subsequently, applications of SSH to identify retina and RPE-enriched genes have been reported [37–39].
Based on a comprehensive survey of data available from 27 independent studies applying a wide spectrum of gene identification approaches we have now assembled a first genome-wide reference transcriptome of the adult mammalian retina/RPE. This reference transcriptome comprises 13,037 non-redundant transcripts and likely reflects up to 90% of the mammalian retinome.
Studies identifying adult mammalian retina / RPE transcripts and details on gene data retrieval
Source / method a)
No. of genes retrieved c) (non-redundant)
No. of genes identified in ≥ 2 studies
Retina, TIGR (ID: version 3.3)
Retina, TIGR (ID: version 3.3)
Retina, UniGene (ID: build 118)
Retina, UniGene (ID: build 113)
Retina, EST (dbEST)
UniGene (Build 162) d)
UniGene (Build 162) d)
UniGene (Build 162) d)
Retina & RPE
cDNA library sequencing
Retina & RPE, SSH
Retina, PC and subtracted
RPE, primary and amplified, conventional
 and online e)
 and online f)
Retina, PC (ID: MRA)
Retina, Affymetrix (ID: Mu11K subB)
Retina, custom array
Serial Analysis of Gene Expression
Retina (ID: HMAC2)
Retina (ID: HPR1)
Retina (ID: HPR2)
RPE (ID: HRPE1)
Total (with inter-study redundancy)
Total (without redundancy)
Frequency of unique genes in studies
No. of studies
No. of unique genes
Representation of retinome and partial assemblies of heart, liver, and prostate transcriptomes in defined retina/RPE gene groups
No. of non-redundant genes identified in ≥ 2 studies a)
Retina / RPE-specific genes b) (n = 43)
Vitamin A / phototransduction pathway c) (n = 57)
Retina / RPE genes (verified by immunohistochemistry d)) (n = 260)
Retinal disease genes e) (n = 102)
Source / Reference
UniGene Build 166, SAGE library GSM1499
UniGene Build 166, SAGE library GSM785
UniGene Build 166, SAGE libraries GSM685, GSM739, GSM764
A comparison of the 13K retinome with partial transcriptomes of heart, liver, and prostate suggests a high degree of overlapping expression between retina/RPE and heart (3,496/3,660), liver (5,343/5,780) and prostate (6,471/7,018). A total of 2,330 genes are expressed in all tissues and represent putative "housekeeping" genes (see additional File 12). It should be noted that the low number of ubiquitously expressed genes is largely due to the fragmentary nature of the heart, liver, and prostate transcriptomes. With increasing transcriptome complexities this number is likely to increase. Analysis of the least complete transcriptome, the heart, reveals that 2,330/3,660 (64%) transcripts can be classified as ubiquitously expressed (see additional Files 9 and 12) while a maximum of 1,330/3,660 (36%) genes may display tissue-restricted or tissue-specific expression. A comparison of more complete transcriptomes may significantly reduce the latter estimate. So far 5,051 genes are only found in the retinome representing a collection of "retinome-enriched" transcripts, while 7,986 are also present in at least one of the partial transcriptomes of the heart, liver or prostate. Thirty-two genes were found to be expressed in heart, liver and prostate but not in the retinome (see additional File 13).
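The proportions quoted above follow directly from the reported gene counts. The short sketch below only re-derives them as a check of the arithmetic; the variable names are illustrative and not part of the original analysis.

```python
# Counts as reported in the text:
# tissue -> (genes shared with the retinome, size of the partial transcriptome)
reported = {
    "heart": (3496, 3660),
    "liver": (5343, 5780),
    "prostate": (6471, 7018),
}


def percent(part, whole):
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


# Share of each partial transcriptome that is also found in the retinome
overlap_pct = {
    tissue: percent(shared, size) for tissue, (shared, size) in reported.items()
}

# Heart transcriptome: ubiquitously expressed vs. potentially tissue-restricted
housekeeping = 2330
heart_size = 3660
ubiquitous_pct = percent(housekeeping, heart_size)
restricted_pct = percent(heart_size - housekeeping, heart_size)

# Retinome split: genes found only in the retinome vs. shared with any other tissue
retinome_only = 5051
shared_with_any = 7986
retinome_total = retinome_only + shared_with_any

for tissue, pct in overlap_pct.items():
    print(f"{tissue}: {pct:.1f}% of the partial transcriptome is in the retinome")
print(f"ubiquitous: {ubiquitous_pct:.1f}%, restricted: {restricted_pct:.1f}%")
print(f"retinome total: {retinome_total}")
```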
Number of genes mapped to retinal disease loci
Flanking DNA marker
Size of locus (Mb)
No. of SGPs (n = 43,109)
No. of retinome transcripts a) (n = 13,037)
No. of "retinome-enriched" transcripts (n = 5,051)
marker position not available
marker position not available
marker position not available
Compiling the transcriptome of a cell or tissue is arguably more demanding than establishing the number of gene loci encoded by a given genome sequence . This may mainly be explained by the dynamic nature of mRNA itself which frequently produces alternative transcripts from a single gene locus by usage of tissue-specific promoters, cryptic splice sites or variable polyadenylation signals [44, 45]. In addition, variation in gene expression is known to occur within and between populations [46, 47] and allele-specific expression, even from non-imprinted genes, appears to be common . Further complicating transcriptome definition are effects of gender and age on RNA expression  as well as agonal and postmortem factors which greatly affect RNA integrity and thus frequently influence subsequent analyses . Finally, differences in experimental technologies and data post-processing add a further level of variability. Taken together, the complexities in mRNA metabolism and experimental data handling strongly suggest that there is not a single transcriptome for a given cell or tissue but rather an arbitrary number of individual transcriptomes, each of which needs to be defined by a series of parameters such as age, gender, ethnicity, and cause and time of death of the tissue donor, among many others. It is therefore advisable to initially aim for a reference transcriptome providing a blueprint of an expression profile within a broadly defined time-frame. Following this line of reasoning, we here present a framework of a first reference transcriptome of the retina/RPE consisting of 13,037 unique transcripts which broadly characterize the mature state of expression in this tissue.
The present meta-analysis has integrated information from 27 studies employing diverse technologies to identify retinal/RPE transcripts. Among these, SAGE represents a sensitive tool to detect low level transcription  while the PCR-based SSH method is well suited to enrich for differentially expressed genes . The combined use of these approaches together with conventional cDNA library sequencing and microarray-based techniques provides a more solid assessment of gene expression than each method would alone. For example, SAGE is based on sequencing of hundreds of thousands of short (10, 14, or 21 bp) tags, ideally derived from a unique location of a single transcript. Rare tags could originate from infrequently expressed transcripts but could also reflect minor genomic contamination or minor sequencing errors. For the assembly of the reference retinome we have addressed these concerns by including only those transcripts that have independently been confirmed in a second unrelated study. This has led to a conservative assembly of the 13K retinome. It should be kept in mind however that this procedure likely excludes a number of authentic transcripts. This is illustrated by the finding that the 15K retinome, which comprises 15,645 transcripts including those which were solely found in a single study (Table 2), contains an additional five of the 102 known retinal disease genes (RHOK, MTATP6, CHM, LRAT, RIMS1) not included in the 13K retinome. Similarly, an additional three genes (RHOK, LRAT, GPRK7) involved in the vitamin A/phototransduction pathway are part of the 15K but not the 13K retinome. With additional transcription data on the retina/RPE becoming available, a second generation retinome map will need to address this issue.
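The inclusion rule described above, keeping only genes reported by at least two independent studies, can be illustrated with a minimal sketch. The study names and gene identifiers below are hypothetical placeholders, and the code is an illustration of the stated criterion rather than the pipeline actually used.

```python
from collections import Counter


def confirmed_genes(studies, min_studies=2):
    """Return the genes reported by at least `min_studies` studies.

    studies: dict mapping a study name to an iterable of gene identifiers.
    """
    counts = Counter()
    for genes in studies.values():
        # A gene counts once per study, however often that study lists it
        counts.update(set(genes))
    return {gene for gene, n in counts.items() if n >= min_studies}


# Hypothetical toy input (identifiers are placeholders, not real data)
studies = {
    "sage_library_a": ["g1", "g2", "g2"],
    "ssh_library_b": ["g2", "g3"],
    "array_study_c": ["g3", "g4"],
}

all_genes = set().union(*studies.values())   # analogous to the 15K set
conservative = confirmed_genes(studies)      # analogous to the 13K set
single_study_only = all_genes - conservative

print(sorted(conservative), sorted(single_study_only))
```

Applied to the numbers in the text, the same rule separates 15,645 transcripts into 13,037 confirmed ones and 2,608 that were reported in a single study only.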
The estimation of transcriptome size represents one of the fundamental questions in molecular biology. Early studies using reassociation kinetics have calculated the number of distinct mRNA transcripts present in various mouse tissues to be between 11,500 and 12,500 . Initial SAGE analyses have led to the conclusion that the number of different transcripts observed in normal and tumorous tissue may lie between 14,247 and 20,471 . Recent data from comprehensive EST sequencing of a number of tissues including brain, breast, colon, head/neck, kidney, lung, ovary, prostate, and uterus suggest expression of between 7,500 and 13,500 distinct genes for each tissue . Although the size of the reference retinome is consistent with these estimates, the question of adequate transcript representation by the current compilation remains open. We have addressed this by defining a number of gene groups with known expression in retina/RPE and comparing these to the reference retinome. Genes exclusively expressed in retina/RPE are highly represented in the retinome (100%), as are mainly tissue-specific genes known to play a role in the vitamin A/phototransduction pathway (93%) (Table 3). A partial list of 260 genes whose encoded proteins were shown by immunohistochemistry to be expressed in the retina/RPE (but may also be present in other tissues) was represented in the reference retinome at a rate of approximately 79%. Similar numbers were obtained for the retinome coverage of retinal disease genes (85%). From these data we conclude that the 13K reference retinome is highly representative of retina/RPE-expressed genes and may describe as much as 90% of the transcript complement in the adult state.
Another point of interest concerns the proportion of retinome transcripts which is uniquely expressed in this tissue. Brentani et al.  estimate that any two tissues may share between 73% and 84% of their transcriptomes. Comparing transcription in three tissues (breast, colon, head/neck) the authors found overlapping expression in 47% of transcripts. To investigate this in more detail, we have compiled three partial transcriptomes from heart (n = 3,660), liver (n = 5,780) and prostate (n = 7,018) by applying the same stringent criteria as defined for the retinome. Limited by the size of the partial heart transcriptome, we determined 2,330 transcripts (termed "housekeeping" genes) to be expressed in all four tissues (i.e. 64% of the heart transcriptome). Comparing the retinome to any of the partial transcriptomes revealed overlapping gene profiles between 92% and 95%. This would suggest that only a minor proportion of retinome transcripts is indeed unique to the retina/RPE. Thus far, we have identified a group of so-called "retinome-enriched" genes comprising 5,051 transcripts which are not present in the partial transcriptomes of heart, liver and prostate. This group most likely contains additional "housekeeping" or tissue-restricted transcripts and needs further adjustment by more refined in-silico normalization to comprehensive reference transcriptomes of other tissues.
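The three gene groups discussed here ("housekeeping", "retinome-enriched", and genes found in the other tissues but absent from the retinome) are plain set operations on the four gene lists. The following sketch uses hypothetical integer identifiers in place of LocusLink IDs and is meant only to make the definitions explicit.

```python
def classify(retinome, others):
    """Split gene sets into the three groups used in the text.

    retinome: set of gene IDs in the reference retinome.
    others:   dict mapping tissue name -> set of gene IDs (must be non-empty).
    """
    other_sets = list(others.values())
    # Expressed in the retinome and in every other tissue
    housekeeping = retinome.intersection(*other_sets)
    # Present in the retinome but in none of the other tissues
    enriched = retinome - set().union(*other_sets)
    # Present in all other tissues but missing from the retinome
    absent_from_retinome = set.intersection(*other_sets) - retinome
    return housekeeping, enriched, absent_from_retinome


# Hypothetical toy data; integers stand in for gene identifiers
retinome = {1, 2, 3, 4, 5}
partial = {
    "heart": {1, 2, 6},
    "liver": {1, 2, 3, 6},
    "prostate": {1, 4, 6},
}

housekeeping, enriched, absent = classify(retinome, partial)
print(housekeeping, enriched, absent)
```

Because the "housekeeping" set is an intersection, its size can never exceed that of the smallest input, which is why the 3,660-gene heart transcriptome limits the estimate in the text.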
Highly expressed genes, including those with a ubiquitous or a tissue-specific transcription profile, have been shown to cluster in chromosomal regions of increased gene expression (termed RIDGEs) [55, 56]. Functionally, this higher order structure has been related to transcriptional regulation [56, 57]. To search for a possible correlation, we have determined the chromosomal distribution of the reference retinome independent of gene density. Our data show good agreement with the previously established regional expression map defining approximately 30 RIDGEs within the human genome. Overlaps are most evident for chromosomes 6, 9, 11, 17, and 19. From this we conclude that the majority of transcripts assembled in the reference retinome share characteristics of the RIDGEs including moderate to high level expression. This finding may be ascribed to the stringent selection criteria we have applied to assemble the reference retinome by excluding all transcripts (n = 2,608) that were reported in only a single study. Conversely, the RIDGE-like pattern of the reference retinome could be an indication that missing transcripts may have features compatible with chromosomal domains defined as anti-RIDGEs . As opposed to RIDGEs, clustering of genes in anti-RIDGEs seems associated with significantly decreased expression . In contrast to their fractional occurrence in transcriptomes, the identification of such low-abundance transcripts is likely to require significant resources in order to compile more complete transcriptomes.
To provide positional candidates for retinal disease genes, we have mapped the transcripts representing the reference retinome to the minimal regions defined for 42 retinal disease loci with as yet undefined gene mutations. To further limit the number of candidate genes, in particular for loosely defined disease loci such as RP28 or VRNI, we have similarly integrated the "retinome-enriched" transcripts. This also accounts for the fact that approximately 50% of retinal disease genes are retina/RPE-specific . For 41 of 42 unknown disease genes we have now identified strong candidates, although for some disease loci including AIED, COD4, CORD8, CRV, LCA5, RP28, and USH2B, the number of candidates may still exceed the capacities of most laboratories for direct analysis. For other disease loci (e.g. BCD, BBS3, COD2, CYMD, MCDR4, OPA4, PRD, RNANC, RP24, RP29, RP6, WFS2 and WGN1), a restricted number of candidates is now available (see additional File 14).
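Selecting positional candidates amounts to an interval lookup: a transcript qualifies if it lies on the same chromosome as the disease locus and between the two flanking markers. The sketch below assumes each transcript is represented by a single map position; the gene names, coordinates and the locus interval are hypothetical.

```python
def candidates_in_locus(transcripts, chrom, start, end):
    """Return sorted gene names located within [start, end] on `chrom`.

    transcripts: dict mapping gene name -> (chromosome, position in bp).
    """
    return sorted(
        gene
        for gene, (gene_chrom, pos) in transcripts.items()
        if gene_chrom == chrom and start <= pos <= end
    )


# Hypothetical mapped retinome transcripts (names and positions are placeholders)
retinome_positions = {
    "geneA": ("chr2", 11_500_000),
    "geneB": ("chr2", 14_900_000),
    "geneC": ("chr2", 30_000_000),
    "geneD": ("chr6", 12_000_000),
}
# Hypothetical "retinome-enriched" subset
retinome_enriched = {"geneB", "geneD"}

# Hypothetical disease locus delimited by two flanking markers on chr2
locus = ("chr2", 10_000_000, 15_000_000)

all_candidates = candidates_in_locus(retinome_positions, *locus)
# Restricting to the enriched subset narrows loosely defined loci further
enriched_candidates = [g for g in all_candidates if g in retinome_enriched]

print(all_candidates, enriched_candidates)
```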
We here present a first near-complete transcriptome of a defined tissue, the retinome, which may serve as a reference for further efforts to establish spatial, i.e. cell-specific, and developmental transcriptomes of the retina/RPE. A fundamental aspect of the current study was to integrate the available information on gene identification generated by a wide range of techniques. This ensures robustness and reliability of transcript data providing a stringent framework for further expression studies in systems biology. A similar approach for other tissues/cells would be advisable as this may greatly facilitate in-silico identification of tissue-specific genes to elucidate functional pathways vital for a defined cell population. In addition, the reference retinome may prove valuable for providing strong candidates for hereditary as well as genetically complex diseases and thus may help to further our understanding of retinal biology in health and disease.
To assemble a list of genes expressed in the adult mammalian retina and RPE we reviewed 27 studies reporting raw or processed transcript data derived from several mammalian species including H. sapiens, B. taurus, C. familiaris and M. musculus (Table 1). The data were generated by cDNA library sequencing, microarray studies, and SAGE. Publicly available data analysing transcripts from adult mammalian retina/RPE tissues published until December 2003 were included. Excluded were studies investigating transcription in retina/RPE by using RNA sources such as fetal tissues, cell lines or non-mammalian species. Gene identifiers such as GenBank accession number, gene nomenclature symbol, gene description, UniGene cluster ID, cDNA sequences or tags were available from sources as detailed in additional File 15 and were used to retrieve the unique human LocusLink ID for each gene (as of December, 2003). Only genes with an established LocusLink ID were included in the present study. For SAGE data, tag-to-gene assignment was done by querying the SAGEmap_tag_ug-rel dataset [59, 60]. Tags assigned to multiple genes were excluded from further analysis. Human orthologous genes were established via the NCBI-curated homology database or by BLAST sequence comparison .
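The exclusion of SAGE tags that map to more than one gene can be sketched as follows. The input is assumed to be a list of (tag, gene) pairs such as might be derived from a tag-to-UniGene mapping table; the tags and gene identifiers shown are hypothetical, and the actual file format of the SAGEmap dataset is not reproduced here.

```python
def unambiguous_tag_map(tag_gene_pairs):
    """Map each SAGE tag to its gene, dropping tags assigned to several genes.

    tag_gene_pairs: iterable of (tag, gene_id) tuples; duplicates are allowed.
    """
    tag_to_genes = {}
    for tag, gene in tag_gene_pairs:
        tag_to_genes.setdefault(tag, set()).add(gene)
    # Keep only tags that point to exactly one gene
    return {
        tag: next(iter(genes))
        for tag, genes in tag_to_genes.items()
        if len(genes) == 1
    }


# Hypothetical tag assignments (10-bp tags, placeholder gene IDs)
pairs = [
    ("AAACCCGGGT", "gene1"),
    ("AAACCCGGGT", "gene1"),   # repeated entry, still unambiguous
    ("TTTGGGCCCA", "gene2"),
    ("TTTGGGCCCA", "gene3"),   # same tag, second gene: ambiguous, excluded
    ("GGGTTTAAAC", "gene4"),
]

tag_map = unambiguous_tag_map(pairs)
print(tag_map)
```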
To assemble partial transcriptomes of heart, liver and prostate, for each tissue data were mined from at least one SAGE library, in addition to expressed sequence tag (EST) sources (see additional File 16). Similar to the criteria for the assembly of the retinome, genes identified in only one study were disregarded. EST retrieval was facilitated by use of the Gene Library Summarizer  which retrieves the known genes represented by at least one EST and generated from a tissue sample with normal histology.
Partial lists of genes known to play a role in the retina and/or the RPE were assembled from the literature (see additional Files 5, 6, 7 and 8). Additional File 5 summarizes genes known to be exclusively expressed in retina and/or RPE, while additional File 6 includes genes involved in the phototransduction cascade and the vitamin A cycle. Additional File 7 is a partial compilation of genes/proteins verified by immunohistochemistry to be present in adult mammalian retina and/or RPE. A list of 102 genes involved in retinal diseases was retrieved from the RetNet database, January 2004  (see additional File 8).
A total of 43,109 human non-redundant syntenic gene predictions (SGP) were retrieved (as of December 2003) and chromosomally mapped to the reference sequence of the human genome (July 2003) utilizing the UCSC Genome Table Browser . Based on the position of their putative transcription start sites, the SGPs were assigned to 5 Mb bins along the human chromosomes. In addition, one-megabase bins were defined for refined analysis of chromosomes 6 and 19 (Fig. 1b). Similarly, the chromosomal map positions of the retinome transcripts were determined by querying the UCSC Genome Table Browser with the respective LocusLink, UniGene or RefSeq IDs.
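The assignment of genes to fixed-width chromosomal bins can be expressed with integer division of the transcription start site by the bin size. The sketch below assumes positions are given in base pairs and that a bin index of 0 covers the first 5 Mb of a chromosome; the coordinates are hypothetical.

```python
from collections import Counter

BIN_SIZE = 5_000_000  # 5 Mb bins, as used for the genome-wide analysis


def bin_counts(start_sites, bin_size=BIN_SIZE):
    """Count genes per (chromosome, bin index).

    start_sites: iterable of (chromosome, transcription start position in bp).
    """
    return Counter((chrom, pos // bin_size) for chrom, pos in start_sites)


# Hypothetical transcription start sites
sites = [
    ("chr6", 1_200_000),
    ("chr6", 4_999_999),
    ("chr6", 5_000_000),
    ("chr19", 12_000_000),
]

counts_5mb = bin_counts(sites)
# Finer 1 Mb bins, as described for the refined analysis of chromosomes 6 and 19
counts_1mb = bin_counts(sites, bin_size=1_000_000)

print(counts_5mb)
print(counts_1mb)
```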
Mapped loci of retinal dystrophies with unknown genetic basis (n = 45) were taken from RetNet, January 2004  and placed on the human genome sequence by querying the UCSC Genome Table Browser with DNA marker sequences shown to flank the minimal candidate region. Three disease loci (CORD1, CORD4 and RCD1) are insufficiently mapped on the respective human chromosomes and were therefore not included in the analysis.
To test how each of the two datasets, the 43,109 human non-redundant SGPs and the 13K retinome transcripts, is distributed over the genome, the non-parametric, distribution-free Kolmogorov-Smirnov Goodness-of-Fit Test was used . Statistical significance of the median difference in paired chromosomal distribution of retinome transcripts versus the SGPs was then evaluated by the non-parametric Wilcoxon two-sample paired signed rank test . To carry out the test we calculated the difference between all genes and retinal genes per 5-Mb bin. To correct for the total number of genes within the two groups, the SGPs per bin were adjusted by a factor of 13,037/43,109 = 0.30. Mean and median values per bin were 21.05 and 10.93 for all genes and 21.26 and 19.93 for retinal genes, respectively.
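The per-bin adjustment and the rank sums underlying the signed rank test can be sketched as follows. This is a minimal illustration under stated assumptions: zero differences are dropped, tied absolute differences receive average ranks, and the conversion of the rank sums into a p-value is left to a statistics package. The bin counts in the example are hypothetical.

```python
# Scaling factor from the text: retinome size / number of SGPs
SCALE = 13037 / 43109


def scaled_differences(all_genes_per_bin, retinal_per_bin, factor=SCALE):
    """Per-bin difference: retinal gene count minus scaled all-gene count."""
    return [r - a * factor for a, r in zip(all_genes_per_bin, retinal_per_bin)]


def signed_rank_sums(diffs):
    """Return (W+, W-), the rank sums of positive and negative differences."""
    ordered = sorted((d for d in diffs if d != 0), key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        # Find the run of tied absolute values starting at index i
        j = i
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    w_plus = sum(r for d, r in zip(ordered, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return w_plus, w_minus


# Hypothetical counts for four 5-Mb bins
all_genes = [60, 20, 90, 10]
retinal = [25, 4, 30, 3]

diffs = scaled_differences(all_genes, retinal)
w_plus, w_minus = signed_rank_sums(diffs)
print([round(d, 2) for d in diffs], w_plus, w_minus)
```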
This work was supported by grants from the Deutsche Forschungsgemeinschaft (DFG) (We1259/14-2 and 14-3) and the Bundesministerium für Bildung und Forschung (BMBF) (01KW9921/0).
This article is published under license to BioMed Central Ltd. This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.