Structure and expression pattern of Oct4 gene are conserved in vole Microtus rossiaemeridionalis

Background Oct4 is a POU-domain transcription factor that is essential for maintaining pluripotency in several mammalian species. The mouse, human, and bovine Oct4 orthologs display high conservation of nucleotide sequence and genomic organization. Results Here we report the isolation of an Oct4 ortholog from the common vole (Microtus rossiaemeridionalis). The organization and exon-intron structure of the vole Oct4 gene are similar to those in other mammalian species. The gene consists of five exons and a regulatory region that includes the minimal promoter and the proximal and distal enhancers. The promoter and regulatory regions of the vole Oct4 gene also display high similarity to the corresponding regions of Oct4 in other mammalian species, and are active when transiently transfected within luciferase reporter constructs into mouse P19 embryonic carcinoma cells and TG-2a embryonic stem cells. Vole Oct4 expression is detectable from the morula stage until day 17 of embryonic development. Conclusion Genomic organization of this gene and its intron-exon structure in vole are identical to those in all previously studied species: it comprises five exons and a regulatory region containing several conserved elements. The activity of the Oct4 gene in vole, as in mouse, is confined to pluripotent cells.


Background
The transcription factor Oct4, known also as Oct3 and Oct3/4, belongs to class V of the POU (Pit, Oct, Unc) transcription factor family. The POU family includes transcription factors containing the POU domain and regulating transcription via binding to an octamer motif located in the promoter or enhancer regions of target genes [1,2].
An important role of Oct4 in sustaining pluripotency of preimplantation embryonic cells and mouse embryonic stem cells (ES cells) has been convincingly demonstrated using directed mutagenesis and RNA interference [3][4][5]. Embryos homozygous for an Oct4 mutation die during implantation because they are unable to form the inner cell mass (ICM) of the blastocyst [3]. Suppression of Oct4 expression in ES cells by RNA interference induces cell differentiation into trophectodermal derivatives [4].
Oct4 gene expression in the preimplantation mouse embryo is confined exclusively to pluripotent ICM cells [9]. After implantation, Oct4 expression declines in somatic tissues and persists only in primordial germ cells. In addition, expression of Oct4 is characteristic of embryonic stem (ES) cell lines, embryonic carcinoma (EC) cells, and embryonic germ (EG) cell lines [2,10]. Unlike in mouse, Oct4 expression in human, cattle, and swine preimplantation embryos is not restricted to the ICM, but is also observed in trophectoderm cells [11][12][13].
Oct4 expression is regulated at the transcriptional level by cis-regulatory elements located in the 5' region of the gene [14][15][16]. Analysis of LacZ reporter gene expression controlled by a fragment of the mouse Oct4 genomic locus revealed two elements essential for the cell-specific expression of Oct4, named the proximal and distal enhancers. The distal enhancer is active in ICM, ES, EG, and primordial germ cells, whereas the proximal enhancer is needed to activate Oct4 in embryonic ectoderm and mouse EC cells [15]. So far, the structure, expression, and regulation of the Oct4 gene have been studied comprehensively in mouse, human, and cattle [7,17,18]. In this work, a genomic copy and cDNA of the vole Oct4 gene were isolated and cloned, and the exon-intron structure and expression of the gene were studied. Nucleotide sequences of the coding and regulatory regions of vole Oct4 were compared to the corresponding sequences of six mammalian species: mouse, rat, human, chimpanzee, cattle, and dog. Luciferase reporter constructs demonstrated that individual elements of the regulatory region of vole Oct4 are functionally active in mouse pluripotent cells, which suggests that the system regulating this gene is highly conserved in mammals. A comprehensive study of genes involved in maintaining pluripotency, such as Oct4, in voles will provide new information about species-specific features of their structure, expression, and regulation, and thereby help to optimize experiments aimed at deriving vole ES cells.

Results
Nucleotide sequence, exon-intron structure, and expression of M. rossiaemeridionalis Oct4 gene
A clone containing the genomic sequence of vole Oct4 was isolated by screening the M. rossiaemeridionalis genomic phage library. Overall, 10315 bp were sequenced, within which five exons and the regulatory region of the Oct4 gene were provisionally mapped on the basis of homology to the mouse Oct4 gene sequence. The coding region contains no stop codons, and canonical splice sites are present at the putative exon-intron boundaries. The minimal promoter and the proximal and distal enhancers were identified within the regulatory region by comparative sequence analysis. Repeated DNA sequences were found in introns and in the 5' and 3' regions. These repeats are represented mainly by SINE elements (Figure 1). The sequences located 3' to exon 5 of the vole Oct4 gene, which were sequenced only partially, displayed similarity to the major histocompatibility complex.
To verify the exon-intron structure of the vole Oct4 gene, 3'- and 5'-RACE experiments were performed. Preimplantation embryos of M. rossiaemeridionalis were used as a source of RNA. Comparison of cDNA clones obtained by RACE with the genomic sequence confirmed that the vole Oct4 gene contains five exons corresponding to the Oct4 exons of the other mammalian species studied (Figure 1). 3'-RACE allowed us to detect expression of the vole Oct4 gene during preimplantation development at the morula and blastocyst stages (2.5-4.5 dpc) and in genital ridges of postimplantation stages from 8 dpc up to 17 dpc (Figure 2). Transcription of the Oct4 gene was undetectable in the organs of adult voles (the liver, kidneys, spleen, intestines, and testes), trophoblast stem cells, and extraembryonic endoderm cells (data not shown).
Figure 1. Oct4 gene organization in mouse (M) and vole (V).

Analysis of the activity of vole Oct4 promoter
To analyse the role of the putative regulatory elements of the vole Oct4 gene, we used a luciferase reporter assay (Figure 3). Pluripotent ES cell lines have not been derived for vole species; we therefore transiently transfected luciferase reporter vectors containing various elements of the putative vole Oct4 regulatory region into the mouse ES cell line TG-2a, mouse EC cells P19, and vole trophoblast stem cells R1. The empty vector pGL2-Basic was used as a negative control. In pluripotent mouse TG-2a cells, the pDEH6 plasmid, which contains the promoter and both enhancers (-2186/+106) in a direct orientation with respect to the luciferase gene, displayed the highest luciferase activity, exceeding the background level (the luciferase activity of the empty vector pGL2-Basic) approximately sixfold (Figure 3). A similar, if not more pronounced, effect was observed when the same construct was transfected into mouse EC cells. Deletion of the distal enhancer (pΔDE1, -1895/+106) almost completely abolished promoter activity in TG-2a cells; however, it had no effect on the activity in P19 EC cells, which showed luciferase activity as high as that of the complete construct. This observation is in line with the data obtained for mouse, which indicate a crucial role of the distal enhancer for Oct4 activity in ES cells [15].
Deletion of the proximal enhancer (PE 1A, pDEH6K4) or of both the distal and proximal enhancers (pDEH6S7) reduced promoter activity two- to threefold, demonstrating that the vole Oct4 promoter is active in pluripotent cells by itself, but at a significantly lower level (Figure 3). The lowest activity, comparable to the background, was observed for constructs pDEH7 and pDE2. The promoter and proximal enhancer in clone pDEH7 were in the opposite orientation to the gene, and this region (-1895/+106) was deleted in clone pDE2. Consistent with the lack of Oct4 expression in trophoblast cells, no luciferase activity was detected when the constructs were transfected into the vole TS cell line R1. The data obtained in this experiment indicate that, as in mouse, vole Oct4 regulatory elements are necessary for efficient and cell-type-specific expression of the gene.

Comparative sequence analysis
The vole Oct4 nucleotide sequence was aligned with the Oct4 sequences of mouse, rat, human, chimpanzee, cattle, and dog obtained from GenBank (Figure 4 and Table 1). Overall nucleotide sequence homology varies between 81% (vole vs. dog) and 89% (vole vs. rat); however, the homology is much higher for exons 2 and 3, which encode the DNA-binding POU domain, in all pairs of species compared (Table 1). The homology is lower (55-70%) in intronic regions and in the 5' and 3' UTRs, especially where species-specific mobile elements are located (Figure 4).
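The pairwise homology values reported here are percent identities over aligned sequences. The text does not state which software or gap treatment was used, so the following is only a minimal sketch of one common convention: identity is counted over alignment columns in which neither sequence has a gap. The function name `percent_identity` and the toy sequences are hypothetical and are not taken from the vole data.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences.

    Assumption: columns where either sequence has a gap are skipped.
    Other conventions (e.g. counting gaps as mismatches) give other values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments (illustrative only, not real Oct4 sequence).
toy_a = "ATGGC-GGACACCTG"
toy_b = "ATGGCTGGTCACCTG"
toy_identity = percent_identity(toy_a, toy_b)
```

Because the choice of gap handling changes the result, values computed this way are comparable only when the same convention is applied to every species pair.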
Comparison of the Oct4 amino acid sequences demonstrates a relatively high level of similarity in the POU-specific domain (encoded by exons 2 and 3; see Figure 1 and Table 1) for all species under study. Surprisingly, the highest homology is observed between vole and cattle (95%) and the lowest between vole and mouse (91%). Thus, the homology between the amino acid sequences of the POU domain in evolutionarily more distant species pairs (vole-cattle, vole-human, vole-chimpanzee, and vole-dog) was higher than in pairs of evolutionarily related species belonging to the same order (vole-mouse, vole-rat). When comparing the total homology of Oct4 proteins, we found that vole and rat are the most similar (87%). It is of considerable interest that the mouse Oct4 protein shows the lowest amino acid homology with the closely related vole (81%), as well as with all other species studied (data not shown). Such low similarity between the mouse and vole Oct4 proteins contrasts with the much higher nucleotide homology (87%) between the Oct4 genes of these species.
Next we compared the regions important for Oct4 gene regulation and function (Figure 5). It was demonstrated previously that three motifs (GGGAGGG in the proximal enhancer, CCCTCCC in the distal enhancer, and GGGGGCGGGG in the minimal promoter) represent transcription factor binding sites in mouse ES and EC cells [19]. Treatment of undifferentiated ES and EC cells with retinoic acid causes loss of transcription factor binding to these motifs, thereby suppressing Oct4 gene expression and leading to cell differentiation. We therefore analysed the conservation of these three motifs, as well as of several other motifs located in the regulatory region.
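One detail that is easy to overlook is that the proximal-enhancer motif GGGAGGG and the distal-enhancer motif CCCTCCC are reverse complements of each other, i.e. the same double-stranded site read from opposite strands. The short sketch below checks this; the helper name `reverse_complement` is illustrative and is not part of the original analysis.

```python
# Translation table for the four DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# Motifs named in the text.
proximal_motif = "GGGAGGG"
distal_motif = "CCCTCCC"
promoter_motif = "GGGGGCGGGG"

same_site_opposite_strands = reverse_complement(proximal_motif) == distal_motif
promoter_motif_rc = reverse_complement(promoter_motif)
```

A motif search on one strand should therefore also scan for the reverse complement if sites on both strands are to be reported.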
The homology between the minimal promoters ranges from 83 to 98% among the mammalian species compared. In vole and other mammals, the GGGGGCGGGG motif (positions -126/-117 in vole) within the minimal promoter is the most conserved (Fig. 5C). It is a potential binding site for transcription factors of the Sp family, partially overlaps the hormone-responsive element (HRE) (positions -119/-101 in vole), is sensitive to retinoic acid, and is recognized by some nuclear receptors [8,14,15,19,20] ( This results in the formation of a GGGAGGG site, which is also potentially functional (Figure 5A, B and Table 2).
Five so-called E boxes, which have a 5'-CANNTG-3' consensus and are potential binding sites for transcription factor Mash-2 [21], were detected in the 5' region of the vole Oct4 gene (-2547/-1). The most conserved E box is located at positions -332/-327 and -978/-973 of the vole Oct4 gene regulatory region; it retains the consensus in the four species compared (no data are available for dog). The E box at positions -978/-973 is identical to the human, chimpanzee, and bovine sequences, but differs in mouse and rat. The E box at positions -479/-474 displays 100% homology to the human and chimpanzee sequences and is absent in the mouse and rat sequences. Potential binding sites for transcription factor Pem, with an ATTA consensus, were In the 5' region of the vole Oct4 gene, an autoregulatory composite site was detected, containing potential binding sites for transcription factors Oct4 (5'-AGATGCAT-3') and Sox2 (5'-GACAAAG-3'). Electrophoretic mobility shift assays and chromatin immunoprecipitation experiments indicate that transcription factors Oct4 and Sox2 bind directly to the composite Oct4/Sox2 elements in the Oct4 and Sox2 genes in mouse and human ES cells [22,23]. In vole, this site is located at positions -2081/-2067 and displays 87-93% homology to the sequences found in the regulatory regions of other mammalian species (Figure 5A and Table 2).
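Locating E boxes amounts to scanning the upstream sequence for the degenerate consensus 5'-CANNTG-3'. The sketch below shows one way to do this with a regular expression and to report hits in promoter coordinates. It assumes that the input sequence ends at position -1 (immediately upstream of the transcription start site), so that the first base of a sequence of length n is position -n; the function name `find_e_boxes` and the toy sequence are hypothetical and do not reproduce the vole data.

```python
import re

# 5'-CANNTG-3' with N = any base; the lookahead also reports overlapping hits.
E_BOX = re.compile(r"(?=(CA[ACGT]{2}TG))")


def find_e_boxes(upstream_seq: str) -> list[tuple[int, int, str]]:
    """Return (start, end, site) for each E box on the given strand.

    Assumption: upstream_seq covers positions -len(seq) .. -1, so index i
    (0-based) corresponds to promoter coordinate i - len(seq).
    """
    seq = upstream_seq.upper()
    n = len(seq)
    hits = []
    for match in E_BOX.finditer(seq):
        start = match.start() - n           # coordinate of the first base
        end = start + len(match.group(1)) - 1  # coordinate of the last base
        hits.append((start, end, match.group(1)))
    return hits


# Toy 18-bp upstream fragment (illustrative only).
toy_upstream = "TTCAGCTGAACACGTGTT"
toy_hits = find_e_boxes(toy_upstream)
```

With this convention a 6-bp site is reported as a pair such as -332/-327, matching the notation used in the text. Because CANNTG is palindromic in its fixed positions, any site on the reverse strand is also found by scanning the forward strand.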
The sequences located upstream of the Oct4 minimal promoter are not conserved; their homology reflects the degree of evolutionary relationship of the species compared. Nevertheless, the short sequences corresponding to site 2A of the distal enhancer and sites 1A and 1B of the proximal enhancer are distinctly detectable even in evolutionarily distant species. The homology between vole and the species taken for comparison varies from 81 to 94% in the region of site 2A of the distal enhancer, from 68 to 89% in the region of proximal enhancer site 1A, and from 88 to 96% in the region of proximal enhancer site 1B (Table 2).

Discussion
In this work, an ortholog of the vole M. rossiaemeridionalis gene encoding transcription factor Oct4, which is involved in sustaining the pluripotent cell state at the early stages of embryogenesis, has been isolated and studied. Genomic organization of this gene and its intron-exon structure in vole are identical to those in all previously studied species: it comprises five exons and the regulatory region containing several conserved elements. A tight coupling with the major histocompatibility complex is typical of the vole Oct4, as well as of its orthologs in mouse, cattle, and human [7,17,18].
Comparative analysis of nucleotide sequences and the experiments with transient transfection of the Oct4 gene promoter and its regulatory regions within reporter constructs demonstrate once again the conservation of these sequences and their functions. Individual substitutions and insertions in this region detected in the vole Oct4 do not prevent transcription of this gene in pluripotent mouse cells. The element that is absolutely necessary for the activity of this gene is the region of the minimal promoter (-527/-1); the other elements cannot drive transcription by themselves and are only able to modulate its level. Moreover, the activation and modulation of transcription depend not only on the presence of certain elements, but also on their correct orientation relative to the transcription start site. In addition, the results obtained demonstrate that Oct4 regulatory regions function in different types of pluripotent cells. The lower activity of the construct with the deleted distal enhancer in TG-2a cells compared with P19 cells supports the idea that the distal enhancer is a key element providing high activity of the Oct4 gene in embryonic stem cells [15].
[Figure: Alignments of nucleotide sequences of the regulatory regions of the Oct4 genes belonging to seven mammalian species.]
The comparative analysis of the nucleotide sequences of Oct4 orthologs in seven mammalian species demonstrated high conservation of the genomic organization, the coding region, and the main elements (minimal promoter and distal and proximal enhancers) of the Oct4 gene regulatory region. Phylogenetic sequence analysis of the promoter regions showed that the most conserved elements in this region are the sites for transcription factors of the Sp1 family and the hormone-responsive element, which retain high similarity and similar organisation. In addition, high homology of the GC-rich motifs and the autoregulatory Oct4/Sox2 site was observed, whereas the homology of other elements in the regulatory region, in particular the sites for transcription factors Mash-2 and Pem, is rather limited. This may suggest that the GC-rich motifs and the autoregulatory Oct4/Sox2 site are the elements most essential for the regulation of Oct4 gene expression.
Note that Oct4 is the most conserved among the genes involved in sustaining pluripotency. For comparison, the mouse and human Nanog genes show 58% nucleotide sequence similarity in the coding region [24]. However, despite the conservation of the Oct4 sequences, considerable interspecific differences in the expression of this gene are detectable during ontogenesis.
Expression of this gene in vole is detected at the early preimplantation (morula, blastocyst) stages and also in the genital ridges of postimplantation embryos. Presumably, the expression of the Oct4 gene in postimplantation vole embryos depends on its activity in primordial germ cells, the gamete precursors, as was also shown for mouse [25]. Expression of the vole Oct4 gene is undetectable in the organs of adult animals and in somatic cell lines, unlike human Oct4, which is detected in adult tissues [17,18]. The vole Oct4 transcript shows no alternative splice variants, in contrast to the human transcript, for which alternative splicing is characteristic [18]. Since expression of this gene was not found in the trophoblast stem cell lines and extraembryonic endoderm lines of M. rossiaemeridionalis, we suggest that, at the late blastocyst stage, this gene in vole, as in mouse, is expressed only in epiblast cells. Conceivably, the epiblast cells in vole late blastocysts retain the pluripotent state at least until the start of implantation, which gives reason to expect that ES cells can be successfully derived in this species.

Conclusion
In this study, we cloned, sequenced, and analysed the vole ortholog of the Oct4 gene. Special attention was paid to the regulatory region of the Oct4 gene, which was studied using reporter constructs and comparative sequence analysis. The comparison involved a large number of species (seven, including the vole sequence that we determined) belonging to various families, which allowed us to detect the most evolutionarily conserved elements in the regulatory region containing the potential transcription factor binding sites.
Using the reporter constructs, we have demonstrated that the enhancer elements and promoter of the vole Oct4 gene are active in mouse pluripotent cells. We analysed vole Oct4 gene expression in early preimplantation embryos, genital ridges of postimplantation embryos, adult tissues, and stem cell lines that are precursors of trophoblast and extraembryonic endoderm. We found expression only in preimplantation embryos and in genital ridges, which contain primordial germ cells, consistent with Oct4 being expressed specifically in pluripotent cells.

Microtus rossiaemeridionalis genomic library screening
A Lambda DASH II genomic library of the vole M. rossiaemeridionalis [26] was screened according to a conventional method [27] using a vole genomic PCR fragment as a probe.

Reporter vector construction
To obtain a reporter construct containing the distal enhancer of the vole Oct4 gene (pDE2), a fragment of the regulatory region (-2186/-1895) was cloned into the pGL2-Basic vector (Promega) at the BglII-HindIII sites 5' of the luciferase gene. To produce a construct comprising all elements of the regulatory region, a (-1895/+106) fragment containing the proximal enhancer and promoter region of the vole Oct4 gene was cloned into plasmid pDE2 at the HindIII site (pDEH6 construct). The (-1895/+106) fragment was also cloned into pDE2 in the opposite orientation (pDEH7 construct). In addition, the (-1895/+106) fragment alone was cloned into the pGL2-Basic vector in both orientations relative to the luciferase gene (constructs pΔDE1 and pΔDE2, respectively). Deletion of site 1A in the proximal enhancer was achieved by KpnI digestion of the pDEH6 plasmid followed by self-ligation, producing pDEH6K4. The pDEH6S7 plasmid, containing only the promoter region (-603/+106) of the vole Oct4 gene, was obtained by excision of a SmaI fragment from the pDEH6 plasmid.

Transient transfection with reporter vectors
TG-2a mouse ES cells, P19 mouse EC cells, and R1 vole trophoblast stem cells were plated into six-well plates (Nunc) at 0.5 × 10^6 cells per well and cultivated for 24 h. Each reporter construct (4 μg/well) was transfected into the cells using Lipofectamine 2000 (Invitrogen). Cell lysates were prepared 48 h after transfection, and luciferase activity was measured using the Luciferase Assay System (Promega). Transfection of the pGL2-Basic vector (Promega) without an insert was used as a control for the basal level of luciferase activity. Each transfection experiment was performed in triplicate.
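Reporter activity in the Results is expressed relative to the empty-vector background (for example, "approximately sixfold"). The exact calculation is not described, so the following is only a minimal sketch assuming that fold activity is the mean of the triplicate readings for a construct divided by the mean of the triplicate readings for pGL2-Basic. The function name and all numerical readings below are hypothetical and are not measured data.

```python
from statistics import mean


def fold_over_background(construct_readings, empty_vector_readings):
    """Mean luciferase signal of a construct divided by the mean signal
    of the empty vector (assumed normalisation; not stated in the text)."""
    background = mean(empty_vector_readings)
    if background <= 0:
        raise ValueError("background signal must be positive")
    return mean(construct_readings) / background


# Hypothetical triplicate readings in arbitrary light units.
empty_vector = [95, 100, 105]
full_construct = [590, 610, 600]

fold_activity = fold_over_background(full_construct, empty_vector)
```

Dividing by the empty-vector mean makes values comparable between constructs within one cell line; comparisons between cell lines would additionally require a control for transfection efficiency, which is not described in this section.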