In silico characterisation and chromosomal localisation of human RRH (peropsin) – implications for opsin evolution

Background The vertebrate opsins are proteins which utilise a retinaldehyde chromophore in their photosensory or photoisomerase roles in the visual/irradiance detection cycle. The majority of the opsins, such as rod and cone opsins, have a very highly conserved gene structure suggesting a common lineage. Exceptions to this are RGR-opsin and melanopsin, whose genes have very different intron insertion positions. The gene structure of another opsin, peropsin (retinal pigment epithelium-derived rhodopsin homologue, RRH) is unknown. Results By in silico analysis of the GenBank database we have determined that the human RRH comprises 7 exons spanning approximately 16.5 kb and is localised to chromosome 4q25 in the following gene sequence: cen-EGF-RRH-IF-qter – a position that excludes this gene as a candidate for the RP29 autosomal recessive retinitis pigmentosa locus. A comparison of opsin gene structures reveals that RRH and RGR share two common intron (introns 1 and 4) insertion positions which may reflect a shared ancestral gene. Conclusion The opsins comprise a diverse group of genes which appear to have arisen from three different lineages. These lineages comprise the "classical opsin superfamily" which includes the rod and cone opsins, pinopsin, VA-opsin, parapinopsin and encephalopsin; the RRH and RGR group; and the melanopsin line. A common lineage for RRH and RGR, together with their sites of expression in the RPE, indicates that peropsin may act as a retinal isomerase.


Background
The photosensory opsins are a family of membrane-bound, heptahelical G-protein coupled receptors (GPCRs) characterised by their ability to covalently bind a vitamin A-based retinaldehyde chromophore via a Schiff base to a lysine residue located in the 7th transmembrane α-helix [1,2]. In the natural environment, the vertebrate vitamin A-based chromophore takes the form of either 11-cis-retinal (A1) or 11-cis-3,4-didehydroretinal (A2) [3]. In the vertebrate retina, the primary events in image detection occur in the outer segments of rod and cone cells, with the opsins located in these lamellar stacks being named after these cell types: rod-opsin in rods and cone-opsins in cones. Absorption of a photon of light by the chromophore located in the retinal binding pocket of an opsin causes its photoisomerisation from the 11-cis to an all-trans conformation. This change of the chromophore induces a conformational change of its surrounding opsin molecule, allowing transducin (G-protein) binding and activation of the phototransduction cascade [1,2]. The vertebrate retina generally contains a single rod opsin class and as many as four different cone opsins [4], though the majority of mammals are dichromats, possessing only two types of cone opsin. In the non-mammals, non-rod, non-cone photosensory opsins have been described. For example, pinopsin (P-opsin) was isolated from the avian pineal [5] and vertebrate ancient opsin (VA-opsin) was isolated from the teleost inner retina and pineal/deep brain structures [6]. Other opsins have also been described from a variety of vertebrate classes which either act as a photoisomerase, e.g. RGR-opsin [7]; or have an unknown function, e.g. peropsin [8], parapinopsin [9] and encephalopsin [10]; or are expressed in photoreceptive cells but have as yet an unknown function, e.g. melanopsin [11].
The one key feature shared by all the opsin classes is a lysine residue in the 7th transmembrane α-helix, which is thought to enable retinal attachment in all cases [12].
The first complete vertebrate opsin sequence (bovine rod opsin) was described in 1983 [13], and this opsin has since become the numeration template for critical residues that are conserved throughout all of the opsins. The genomic structure of bovine rod opsin consists of five exons separated by four introns [13], a structure that has been conserved in all vertebrate rod-opsins including those of the ancient Agnatha [14], with the single exception of the rod opsin in teleost fish which is intronless [15]. Employing the nomenclature proposed by Hunt et al. [16], which correlates the spectral sensitivity (λmax) with amino acid sequence identity, the first cone-opsin sequences isolated were those of the human blue (ultraviolet/violet sensitive, UVS/VS) (OPN1SW), green and red cone opsins (both longwave sensitive, LWS) (OPN1MW, OPN1LW) [17]. The UVS/VS opsin gene shares the same genomic structure as the rod-opsin gene in that it possesses four conserved intron insertion positions. The LWS opsin also shares the four conserved introns with the rod- and UVS/VS cone-opsin, but has an additional fifth intron positioned 5' to the four conserved introns [17], which we have termed intron 0 (zero) [12]. Sequences from the remaining two visual opsin classes of the vertebrates, shortwave sensitive (SWS) and middlewave sensitive (MWS), have been isolated and share the same genomic structure as the rod and UVS/VS cone opsins [18]. Interestingly, the SWS and MWS opsin families found in birds, fish, reptiles and amphibia are absent in the mammalian lineage, a feature thought to be related to the "nocturnal bottleneck" occupied by mammals during their evolution resulting in the loss of these two cone classes [19,20].
Many of the more recently described non-rod, non-cone opsins also share a relatively high degree of intron position conservation with the visual opsins, only differing in the position of intron 2 as in pinopsin where it is shifted 14 nucleotides in a 3' direction [21] and VA-opsin where it is shifted 42 nucleotides in a 3' direction [22], or its absence as in parapinopsin [9] and encephalopsin (OPN3) [23]. However, RGR-opsin (retinal G-protein coupled receptor, RGR) [24] and melanopsin (OPN4) [25] do not share intron insertion positions with any other opsins for which a gene structure is known, whilst the gene structure of peropsin is presently unknown.
Little is known about peropsin (retinal pigment epithelium-derived rhodopsin homologue, RRH) except that it shares ~26% amino acid identity with the photosensory opsins, is uniquely expressed in the retinal pigment epithelium (RPE), and is predicted by synteny with mouse to map to human chromosome 4q [8]. To address this issue, we have determined by in silico database searches that the human peropsin gene (RRH) has seven exons and maps to chromosome 4q25. A comparison with the gene structures of other opsins indicates that RRH and RGR share two intron insertion sites. These findings and their phylogenetic relationships are discussed within the context of the possible function of peropsin. In view of the variety of nomenclature systems that are applied to the various opsin classes [2,12], we have attempted to be as consistent as possible with nomenclature and have deferred to preferred locus symbols accepted by the Gene Nomenclature Committee of The Human Genome Organisation, HUGO (http://www.hugo-international.org/hugo/), and Online Mendelian Inheritance in Man, OMIM (http://www.ncbi.nlm.nih.gov/omim/). Thus, RRH and peropsin should be considered synonymous.

Structure of the RRH gene
We have used the 1374 base pairs (bp) of the RRH (peropsin) cDNA sequence (GenBank: AF012270) that has been previously described [8] to conduct an online BLAST search of the GenBank nr database. The RRH cDNA matched 7 non-contiguous regions in the Plus/Plus strand orientation of the 118642 bp human BAC clone RP11-602N24 (GenBank: AC126283) at a level of 99-100% identity with highly significant Expect-values (E-values); the closer the value to "0", the more "significant" the match [26]. Subsequently, a BLAST 2 Sequences comparison of RP11-602N24 with the RRH cDNA indicated that the RRH cDNA is contained within 7 putative exons. Using these putative exons as a model, a sequence alignment was produced such that each intron concurred with the GT/AG intron donor/acceptor site rule [27] (Figure 1). Exon 1 contains the bases 17-156 of the RRH cDNA, which are equivalent to 34 bp of 5' untranslated region (UTR) and the initial 106 bp of coding sequence. We were unable to align the first 16 bp of the RRH cDNA to RP11-602N24 and have concluded that these 16 bp are a cDNA cloning artifact, since the breakdown of sequence similarity does not occur at an apparent intron acceptor site in the genomic sequence. Exons 2-6 contain the next 793 bp of coding sequence, whilst exon 7 contains the final 115 bp of coding sequence and the 3'UTR. No nucleotide substitutions were observed between the published coding sequence and that derived from RP11-602N24, but three separate nucleotide substitutions were observed in the 3'UTR. The entire exon/intron structure encompasses approximately 16.5 kilobase pairs (kb) and has been ascribed GenBank accession number BK000958.
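The exon bookkeeping and the GT/AG check described above can be sketched in a few lines of Python. This is an illustrative sketch only and not the procedure used in this study: the function names and the toy sequence are hypothetical, and only the base-pair counts are taken from the text.

```python
# Illustrative sketch; function names and the toy sequence are hypothetical.

def intron_spans(exons):
    """Given exon spans as 0-based, half-open (start, end) pairs in genomic
    order, return the spans of the introns lying between consecutive exons."""
    return [(exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1)]


def follows_gt_ag(genomic, exons):
    """True if every intron starts with GT (donor) and ends with AG (acceptor)."""
    genomic = genomic.upper()
    return all(
        genomic[start:start + 2] == "GT" and genomic[end - 2:end] == "AG"
        for start, end in intron_spans(exons)
    )


# Toy two-exon gene (NOT the RRH sequence): exon, intron, exon.
toy_genomic = "ATGGCC" + "GTAAGTTTTCAG" + "GATTAA"
toy_exons = [(0, 6), (18, 24)]
gt_ag_ok = follows_gt_ag(toy_genomic, toy_exons)

# Base-pair counts quoted in the text for the RRH cDNA.
exon1_length = 156 - 17 + 1              # cDNA bases 17-156, inclusive
exon1_utr, exon1_coding = 34, 106        # 5' UTR and coding parts of exon 1
coding_total = exon1_coding + 793 + 115  # exon 1 + exons 2-6 + exon 7

print(gt_ag_ok)
print(exon1_length == exon1_utr + exon1_coding)
print(coding_total, coding_total % 3)
```

With the figures quoted above, the coding segments sum to 1014 bp, a multiple of three, as expected for an open reading frame.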

Chromosomal localisation of RRH
In order to map RRH, we subjected the sequence of RP11-602N24 to a BLAST search and found that it contains the I factor (complement) gene, IF, which has been shown to map to 4q25 [28]. Localisation of RP11-602N24 and hence RRH to 4q25 is consistent with the syntenic prediction from the mapping data for mouse Rrh [8]. To further refine the localisation of RRH, a BLAST search using a terminal 2 kb of sequence (bases 116643-118642) of RP11-602N24 identified BAC clone B200N5 which maps to 4q25 and contains approximately 143 kb of sequence (GenBank: AC005509). A BLAST search using B200N5 indicated that this clone contains the epidermal growth factor gene, EGF, which is also known to map to 4q25 in the order cen-EGF-IF-qter [28]. We assembled a mini-contig consisting of approximately 260 kb of sequence from clones RP11-602N24 and B200N5 and have been able to determine the following gene order: cen-EGF~85 kb~RRH~26 kb~IF-qter, together with the direction of transcription of each gene (Figure 2). Localisation of RRH to an interval of approximately 110 kb between EGF and IF on chromosome 4q25 excludes this gene as a causative candidate for the autosomal recessive retinitis pigmentosa locus (RP29) located on 4q32-q34 [29].
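Selecting the terminal 2 kb of a clone as a BLAST query is a matter of coordinate conversion. The sketch below converts the 1-based, inclusive coordinates quoted above into a Python slice and confirms that bases 116643-118642 of a 118642 bp clone span exactly 2000 bp. The helper name is hypothetical, and the placeholder string merely stands in for the clone sequence, which is not reproduced here.

```python
def subsequence(seq, first, last):
    """Return bases first..last of seq using 1-based, inclusive coordinates."""
    if not (1 <= first <= last <= len(seq)):
        raise ValueError("coordinates fall outside the sequence")
    return seq[first - 1:last]


clone_length = 118642              # length of RP11-602N24 quoted in the text
placeholder = "N" * clone_length   # stands in for the real clone sequence
query = subsequence(placeholder, 116643, 118642)
print(len(query))
```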

Comparison of opsin gene structures
To determine whether any of the intron insertion sites in RRH are conserved with those of other opsins, nucleotide sequences of representatives of the various opsin classes were aligned and intron positions marked (Figure 3). RRH shares two common intron insertion sites (1 and 4) with RGR, whilst the other four intron insertion sites appear novel amongst the opsin families represented. RRH and RGR share two further introns that are present in relatively close proximity. Intron 3 of RRH is located 9 bp in a 3' direction of intron 3 in RGR (both introns being in phase +1), whilst intron 5 in RRH is located 3 bp (1 codon) in a 3' direction of the corresponding intron in RGR (both introns being in phase 0). Between RRH and members of the other opsin families only two intron positions are in relatively close proximity: intron 6 in RRH is located 1 bp 5' of intron 4 in the rod and cone opsins, and intron 2 of RRH (phase 0) is 8 bp 3' of intron 3 (phase +1) of OPN4.
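The phase values quoted above can be cross-checked with simple modular arithmetic. The sketch below assumes the common convention that an intron's phase is the number of coding bases preceding it, modulo 3 (so that phase 0 lies between codons), and it assumes that the aligned genes have no alignment gaps between the intron positions being compared. The function names are hypothetical.

```python
def intron_phase(coding_bases_upstream):
    """Phase of an intron given the number of coding bases 5' of it."""
    return coding_bases_upstream % 3


def phase_after_shift(phase, shift_bp):
    """Phase of an intron lying shift_bp further 3' than one of known phase,
    assuming an ungapped alignment between the two positions."""
    return (phase + shift_bp) % 3


# Exon 1 of RRH carries 106 bp of coding sequence (figure from the text),
# so under this convention intron 1 would be in phase 1.
rrh_intron1_phase = intron_phase(106)

# Offsets and phases quoted in the text, checked for internal consistency.
checks = {
    "RRH intron 3, 9 bp 3' of RGR intron 3 (both phase 1)":
        phase_after_shift(1, 9) == 1,
    "RRH intron 5, 3 bp 3' of the RGR intron (both phase 0)":
        phase_after_shift(0, 3) == 0,
    "RRH intron 2 (phase 0), 8 bp 3' of OPN4 intron 3 (phase 1)":
        phase_after_shift(1, 8) == 0,
}

print(rrh_intron1_phase)
for label, consistent in checks.items():
    print(label, consistent)
```

Under these assumptions all three quoted offsets agree with the quoted phases: a shift that is a multiple of 3 bp preserves phase, whereas the 8 bp shift relative to OPN4 moves a phase 1 intron to phase 0.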
From the perspective of intron positions, it would appear that the opsins have arisen from three ancestral genes (Figure 4). Considering the clade (81% bootstrap confidence) which includes the visual opsins (rod and cone), brain opsins (pinopsin, VA-opsin, parapinopsin) and encephalopsin, it can be seen that three intron positions (introns 1, 3 and 4) are perfectly conserved throughout (Figures 3 and 4). Given the organisation of the lamprey (Agnatha) rod opsin gene [14] and pinopsin gene [30] (which has an intron arrangement of the VA-opsin family rather than that of the pinopsin family [22]), these highly conserved gene structures have been in existence for at least 550 million years. Further support for an ancestral chordate opsin gene with these three conserved introns is provided by an opsin from the sea squirt Ciona intestinalis. This urochordate possesses Ci-opsin1 whose gene has seven introns, three of which are in positions conserved with those in vertebrate visual opsins [31], corresponding to positions 1, 3 and 4 [12]. Given their common lineage, we have chosen to term this collection of opsins the "classical opsin superfamily".
RRH and RGR form a separate clade (87% bootstrap confidence) and share two conserved intron positions with each other, 1 and 4, but none with the classical opsin superfamily or melanopsin (Figures 3 and 4). This may indicate that RRH and RGR derive from the same ancestral gene which possessed introns at positions equivalent to positions 1 and 4 in RRH and RGR, even though they only share ~25% amino acid identity across the transmembrane domains compared with ≥ 40% identity exhibited between the majority of the classical opsin superfamily [12]. An exception is encephalopsin, which shares 30% identity with other members of the classical opsin superfamily [12] but clearly has a highly conserved gene structure. The phenomenon of intron sliding or slippage [32], which has been proposed as a mechanism contributing to the slight variations in position of some introns observed between some insect opsins [33], may explain the 1 bp shift between intron 6 of RRH and intron 4 of the classical opsin superfamily, but their proximity may also be due to chance. Whilst single nucleotide intron slippage is an evolutionary phenomenon thought to occur in <5% of introns [34], these two opsin groups do not appear to have a recent common ancestor, which probably indicates a random insertion of introns. Similarly, the close proximity of introns 2 and 3 of RRH to intron 3 of OPN4 and of RGR, respectively, is also likely to be a chance occurrence given that intron insertion and deletion events are much more common than intron sliding events [32,35,36].
OPN4 (melanopsin) shares no conserved intron positions with any of the other opsin classes, and it has been proposed that this opsin class represents a separate line of opsin evolution [12,37]. Collectively the data suggest that three separate lines of evolution have occurred to form the present day vertebrate opsins represented by the structurally highly conserved classical opsin superfamily, the more relaxed RRH and RGR grouping, and OPN4 which stands as the single member of the melanopsin class to date.
Figure legend. Human RRH coding sequence and conceptual translation (peropsin) with transmembrane domains (boldface) as predicted by the rod opsin model of Palczewski et al. [48]. The six intron insertion sites in the RRH gene defined in Figure 1 are indicated by black filled circles. Equivalent intron insertion sites for the visual (rod and cone) opsins are indicated by red filled circles; intron positions 1-4 are common to all rod (RHO) and cone opsins, whilst intron 0 is only found in the LWS opsins (OPN1MW, OPN1LW) [13,17]. The shifted second introns of the pinopsin and VA-opsin families [21,22] are indicated as 2a and 2b respectively in red circles. Parapineal opsin [9] and encephalopsin (OPN3) [10] possess only introns 1, 3 and 4 of the rod and cone opsins. Equivalent intron insertion sites for RGR-opsin (RGR) [24] are indicated by open circles, whilst those for melanopsin (OPN4) [25] are indicated by yellow filled circles. Only intron insertion sites 1-7 for melanopsin are indicated due to the extreme length of the melanopsin C-terminus, which makes positioning of introns 8 and 9 inaccurate since there is no overlap with any other opsin. Note that intron insertion sites 1 and 4 in RRH are equivalent to intron insertion sites 1 and 4 in RGR.
An ancestral link between RRH and RGR may provide an insight as to the potential function of peropsin. RGR-opsin is known to act as a photoisomerase in the reconversion of retinal chromophore from the all-trans to the 11-cis conformation [38,39], and perhaps peropsin also functions as an isomerase in some manner given its localisation in the RPE. This suggestion is supported by recent findings that indicate that an amphioxus (Branchiostoma belcheri) homologue of peropsin acts as an all-trans to 11-cis photoisomerase [40]. Alternatively, peropsin may act as a thermal isomerase since Rgr-/- mice are able to convert all-trans-retinal to the 11-cis conformation under dark adaptation [7]. It has been suggested that peropsin may also bind a different retinal isomer, or indeed a non-retinoid ligand [8]. We have recently shown that the non-rod, non-cone opsins such as RGR-opsin, peropsin, melanopsin and encephalopsin are all expressed early in the embryonic development of the mammalian eye. These opsins are expressed by E11.5 in mice and by 8.6 weeks in humans [41], a finding which in all cases predates the expression of the visual rod and cone opsins [42]. This early expression pattern may be supportive of a role in retinoid metabolism for these opsins in the developing retina [43], or may indicate that these opsins have roles in the embryonic eye that are quite different from those in the adult eye.

Conclusions
Using GenBank database searches we have been able to determine that the human peropsin gene, RRH, comprises seven exons spanning about 16.5 kb. By assembling sequences from large genomic clones we have determined that RRH localises within an interval of approximately 110 kb between EGF and IF on chromosome 4q25. Of the six introns present in RRH, two are located in positions conserved with introns 1 and 4 of RGR, which, given the phylogenetic relationship of RRH and RGR, may suggest that these two opsins have arisen from a common ancestor and, by inference, possess a common function, e.g. acting as a retinal chromophore (photo)isomerase. Our data also strengthen the argument that the present day opsins are represented by genes that have arisen from three different ancestral genes, which have given rise to (1) a classical opsin superfamily consisting of the visual opsins and those opsins that share a highly conserved gene structure with the visual opsins; (2) a peropsin and RGR-opsin family; and (3) a melanopsin family.

Database Searches
The GenBank database was screened using the online BLAST [26] server (http://www.ncbi.nlm.nih.gov/BLAST/). Searches were carried out using the standard nucleotide-nucleotide BLAST (blastn) option against the "nr database" using default values with the low complexity filter off. Subsequent sequence manipulations utilised the online BLAST 2 Sequences [44] server (http://www.ncbi.nlm.nih.gov/blast/bl2seq/bl2.html) and MacVector 7.0 (Accelrys Ltd., Cambridge, UK).
[45] (http://aix1.uottawa.ca/~xxia/software/software.htm), and converted to amino acid sequences which were then aligned using the ClustalW [46] option. In order to maintain codon integrity the nucleotide sequences were then aligned to the amino acid alignment. Phylogenetic analyses were conducted using MEGA version 2.1 [47] (http://www.megasoftware.net). A maximum parsimony tree with branch confidence values based on 500 bootstrap replicates was constructed and the tree drawn in TreeViewPPC version 1.6.6 (http://taxonomy.zoology.gla.ac.uk/rod/treeview.html). Branches with less than 40% support were collapsed.
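Aligning nucleotide sequences to an amino acid alignment, as described above, amounts to replacing each aligned residue with its codon and each gap with three gap characters. The following is a minimal sketch of that idea and not the software used in this study; the function name and the toy sequences are hypothetical.

```python
def thread_codons(aligned_protein, cds, gap="-"):
    """Map an unaligned coding sequence onto a gapped protein alignment row,
    emitting one codon per residue and three gap characters per protein gap."""
    residues = sum(1 for aa in aligned_protein if aa != gap)
    if len(cds) < 3 * residues:
        raise ValueError("coding sequence too short for the aligned protein")
    out, pos = [], 0
    for aa in aligned_protein:
        if aa == gap:
            out.append(gap * 3)           # keep the gap codon-sized
        else:
            out.append(cds[pos:pos + 3])  # next codon of the coding sequence
            pos += 3
    return "".join(out)


# Toy example (not an opsin sequence): M-K-gap-L threaded with ATG AAA CTG.
aligned = thread_codons("MK-L", "ATGAAACTG")
print(aligned)  # ATGAAA---CTG
```

Because gaps are always inserted in multiples of three, every codon stays intact in the resulting nucleotide alignment.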