Expression of alternatively spliced isoforms of human Sp7 in osteoblast-like cells

Background Osteogenic and chondrocytic differentiation involves a cascade of coordinated transcription factor gene expression that regulates proliferation and matrix protein formation in a defined temporo-spatial manner. Bone morphogenetic protein-2 induces expression of the murine Osterix/Specificity protein-7 (Sp7) transcription factor that is required for osteoblast differentiation and bone formation. Regulation of its expression may prove useful for mediating skeletal repair. Results Sp7, the human homologue of the mouse Osterix gene, maps to 12q13.13, close to Sp1 and homeobox gene cluster-C. The first two exons of the 3-exon gene are alternatively spliced, encoding a 431-residue long protein isoform and an amino-terminus truncated 413-residue short protein isoform. The human Sp7 protein is a member of the Sp family having 78% identity with Sp1 in the three, Cys2-His2 type, DNA-binding zinc-fingers, but there is little homology elsewhere. The Sp7 mRNA was expressed in human foetal osteoblasts and craniofacial osteoblasts, chondrocytes and the osteosarcoma cell lines HOS and MG63, but was not detected in adult femoral osteoblasts. Generally, the expression of the short (or beta) protein isoform of Sp7 was much higher than the long (or alpha) protein isoform. No expression of either isoform was found in a panel of other cell types. However, in tissues, low levels of Sp7 were detected in testis, heart, brain, placenta, lung, pancreas, ovary and spleen. Conclusions Sp7 expression in humans is largely confined to osteoblasts and chondrocytes, both of which differentiate from the mesenchymal lineage. Of the two protein isoforms, the short isoform is most abundant.


Background
The osteoblast, the bone-forming cell, is the precursor of the osteocyte [1]. The ability to regulate osteoblast proliferation and subsequent differentiation might provide a useful cell source for tissue engineering and repair [2][3][4]. The murine transcription factor Osterix is required for osteoblast differentiation and bone formation, since no bone formation occurs in Osterix null mice [5]. To further elucidate the molecular basis of osteoblast-specific gene expression and differentiation, we examined the sequence and expression of the human homologue of the Osterix transcription factor, Specificity protein-7 (Sp7).
Runx2/Cbfa1 is regarded as the master transcription factor for osteoblast differentiation, since osteoblast differentiation is arrested in both endochondral and intramembranous skeletons in Runx2/Cbfa1 null mice [11][12][13]. In these Runx2/Cbfa1 null mice Sp7/Osterix is not expressed, whereas in Sp7/Osterix null mice Runx2/Cbfa1 is expressed, suggesting that Runx2/Cbfa1 acts upstream of Sp7/Osterix [5]. During bone development the mineralised cartilage matrix of the endochondral skeleton is invaded by preosteoblasts derived from mesenchymal cells. In Sp7/Osterix null mice these Runx2/Cbfa1-positive preosteoblasts do not deposit bone matrix and still express chondrocyte marker genes, suggesting that they have the potential to develop into chondrocytes or osteoblasts. Sp7/Osterix regulates the expression of a number of important osteoblast genes such as osteocalcin, osteonectin, osteopontin, bone sialoprotein and type I collagen. Sp7/Osterix was first identified by its induction in the murine pluripotent myoblast C2C12 cell line stimulated with bone morphogenetic protein-2 (BMP-2) and is similarly upregulated in bone marrow stromal cells and chondrocytes [4,5,14]. In bone marrow stromal cells from COX-2 knockout mice, Osterix expression is reduced compared to wild-type mice and can be recovered by the addition of prostaglandin E2, indicating that COX-2 mediated skeletal repair involves Sp7/Osterix [4].
Sp7 belongs to the Sp subgroup of the Krüppel-like family of transcription factors (XKLF), which are characterised by a three zinc-finger DNA-binding domain located towards the carboxy-terminus of the protein [15][16][17]. In mammals, 6 other specificity proteins, Sp1-Sp6, have been identified and characterised to date. Outside the zinc-finger DNA-binding domain, the homology between the proteins is much lower and is often not significant. The Sp proteins contain central activation domains that are often glutamine-rich or serine/threonine-rich [17]. Some genes contain a TATA box in their promoters that is involved in binding the transcription factor TFIID and the subsequent initiation of transcription. However, other genes lack a TATA box, and instead contain another major eukaryotic promoter element, the GC-box, with the consensus sequence GGGCGG [18]. Specificity protein-1 (Sp1) is a ubiquitously expressed transcription factor that binds the GC-box [19] and assists the further binding of TFIID for the initiation of transcription. Sp2 shows the lowest level of similarity with the other Sp proteins, with similarity restricted to the DNA-binding domain, and binds to the GT-box rather than the GC-box [18]. Sp3 is ubiquitously expressed. However, it is involved in the control of transcription of a number of tissue-specific genes [20]. Sp3 null mice suffer from a lack of ossification, impaired bone formation and decreased expression of osteocalcin, an osteoblast-specific gene, all associated with a delay in osteoblast differentiation [17]. In these mice expression of Runx2/Cbfa1 was not affected, suggesting that Sp3 is involved in a Runx2/Cbfa1-independent regulation of osteoblast differentiation. Sp1 and Sp3 may compete for the same GC-boxes, as Sp3 was found to suppress the transcriptional activator function of Sp1 by binding to Sp1-specific sites [21].
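The GC-box consensus mentioned above can be located in a promoter by a simple scan of both strands. The following is a minimal sketch, assuming an exact match to GGGCGG is sufficient; the function names and the toy sequence are illustrative assumptions and are not taken from the Sp7 promoter or from the methods of this study.

```python
# Minimal sketch: exact-match scan for the GC-box consensus on both strands.
# Function names and the toy sequence are hypothetical.

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def find_gc_boxes(seq: str, motif: str = "GGGCGG"):
    """Return (0-based position, strand) for every exact motif match.

    A '-' strand hit means the reverse complement of the motif is present
    at that position on the given (plus) strand.
    """
    seq = seq.upper()
    rc_motif = reverse_complement(motif)
    hits = []
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        if window == motif:
            hits.append((i, "+"))
        elif window == rc_motif:
            hits.append((i, "-"))
    return hits


if __name__ == "__main__":
    toy_promoter = "ATGGGCGGTTCCGCCCAT"  # invented sequence for illustration
    print(find_gc_boxes(toy_promoter))
```

Real binding-site prediction normally tolerates mismatches (for example with a position weight matrix); the exact-match scan is only meant to make the consensus concrete.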
Sp1 and Sp3 are also associated with chondrocyte de-differentiation by regulating the α1(II) procollagen gene (COL2A1) [22,23]. Type II collagen is a specific marker of chondrocyte differentiation, and chondrocytes can de-differentiate and turn into osteoarthritic chondrocytes by reducing the production of type II collagen and increasing the production of types I and III collagen. Here, Sp1 was an activator of COL2A1 transcription, whilst Sp3 acted as a repressor. Sp4, a transcriptional activator, is highly expressed in tissues of the central nervous system [20]. Sp5 shares homology with Sp1 only in the DNA-binding domain, binds the GC-box with high affinity and is expressed during early development, mediating the process of gastrulation and axial elongation [24]. Currently, only the sequences of human and mouse Sp6 are known (GenBank accession numbers AK127850 and AK029830 respectively). Here we describe the Sp7 transcription factor, the human homologue of Osterix, and identify two alternatively spliced isoforms.

Human Sp7 cDNAs
We cloned two human Sp7 cDNAs by RT-PCR from foetal osteoblast RNA and determined their sequence (Fig. 1). The clones differed at their 5' termini, utilising two different first exons that are alternatively spliced. Clone 1 utilises exon 1B and encodes the long or alpha protein isoform of Sp7 (Fig. 1B) (GenBank accession No. AY150673). Clone 2 utilises exon 1A, which is untranslated, and encodes the short or beta protein isoform (Figs. 1A and 2B) (GenBank accession No. AY150674). The first ATG codon of clone 1 is in excellent sequence consensus for an initiation methionine, having an adenosine at -3 and a guanosine at +4, and that of clone 2 is in good sequence consensus, having a guanosine at -3 [25]. The sequence of clone 1, encoding the long protein isoform, is similar to another GenBank clone (accession No. AF477981, B.W. Ganss, unpublished), which is longer and contains a potential 3' polyadenylation signal sequence.
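The start-codon context described above depends on two positions, -3 and +4, counted with the A of the ATG as +1. The following is a minimal sketch of how those positions can be read from a cDNA sequence. The three-level labelling ("strong", "adequate", "weak") is a common simplification and an assumption here, not the scoring used in this study, and the test sequences are invented rather than taken from the Sp7 clones.

```python
# Minimal sketch: inspect the -3 and +4 positions around a start codon.
# The strength labels are an assumed simplification; sequences are toys.

def kozak_context(seq: str, atg_index: int):
    """Return (base at -3, base at +4, label) for the ATG at atg_index.

    atg_index is the 0-based index of the A of the ATG (position +1).
    """
    seq = seq.upper().replace("U", "T")
    if seq[atg_index:atg_index + 3] != "ATG":
        raise ValueError("no ATG at the given index")
    if atg_index < 3 or atg_index + 3 >= len(seq):
        raise ValueError("sequence too short to read positions -3 and +4")
    minus3 = seq[atg_index - 3]   # position -3
    plus4 = seq[atg_index + 3]    # position +4, first base after the ATG
    has_purine = minus3 in "AG"
    has_g4 = plus4 == "G"
    if has_purine and has_g4:
        label = "strong"
    elif has_purine or has_g4:
        label = "adequate"
    else:
        label = "weak"
    return minus3, plus4, label


if __name__ == "__main__":
    print(kozak_context("GCCACCATGGCG", 6))
    print(kozak_context("TTTGTTATGCAA", 6))
```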

Human Sp7 gene
The location and structure of the human Sp7 gene were identified by BLAST searches of the human genome using the two human Sp7/Osx cDNA sequences. The gene is located on genomic DNA clone RP11-147A18 (GenBank accession No. AC021103, Genome Sequencing Center, St. Louis, USA). The human Sp7 gene maps to 12q13.13 and neighbours the Sp1 gene, which lies 57 kb away (Fig. 2A). These two genes are also linked to the homeobox gene cluster HOX C, which lies approximately 0.4 Mbp away (Fig. 2A). The human Sp7 gene spans 9.7 kb and lacks a CpG island, a characteristic of genes that do not have a housekeeping role. It has three exons (Fig. 2B). However, the first two exons are alternatively spliced, being separated by the 392 bp intron 1a, followed by the much longer, 6205 bp intron 1b. All splice donor/acceptor sites contained consensus GT/AG dinucleotides. An analysis of the human Sp7 promoter region identified certain potential cis-regulatory sites (Fig. 2C) that are conserved in both the mouse and rat Sp7/Osx promoters (located on mouse supercontig Mm15_WIFeb01_286, Sanger Centre, U.K. and rat chromosome 7q35, accession number AC097309, Human Genome Sequencing Center, Houston, USA).
Figure 1 Translation of the human Sp7 cDNA sequences. (A) Sequence of 5' untranslated alternatively spliced exon 1A encoding the short protein isoform. (B) Sequence encoding long protein isoform. The exon/exon boundaries (shown in blue bold type and underlined) were determined by comparison with the sequence of genomic DNA clone RP11-147A18. The atg/methionine start codon for the long protein isoform is shown in bold and coloured red and that of the short protein isoform coloured pink. The ORF stop codon, tga, is shown in bold and coloured red and is indicated by a red asterisk (*).
Binding sites for the S8 homeobox gene and core-binding factor alpha-1 (Cbfa1/Osf2/Runx2) (also called an osteocalcin-specific element 2, OSE2) are located approximately 90-100 bp upstream of exon 1a. The S8 site is on the plus strand and partially overlaps with the Runx2/Cbfa1 site on the negative strand. There is a conserved inverted CCAAT box, a potential nuclear factor Y (NF-Y) binding site [26], approximately 115-125 bp upstream of exon 1b.

Comparison of the human and rodent Sp7 protein sequences
The ORF of clone 1 encodes the 431-residue alpha protein with a molecular mass of 44,994 Da and an isoelectric point of 8.67. The human Sp7 protein is very similar to that of mouse [5] and rat (accession number BK001412), having 95% identity and 99% similarity, with an insertion of 3 residues in the human amino-terminus (Fig. 3). The ORF of clone 2 starts at the second ATG of clone 1, thereby encoding the amino-terminal truncated 413-residue beta protein.
Figure 2 Chromosomal localisation and gene structure of the human Sp7 gene.
Figure 3 Residues conserved with the mouse protein are shown by (*), strongly conserved residues by (:) and weakly conserved residues by (.). Residues are colour coded: acidic, DE, blue; basic, KR, pink; polar, CGHNQSTY, green and hydrophobic, AFILMPVW, red. The start of translation for the short protein isoform is indicated by a red M. A repetitive motif, GSSPL, is highlighted in yellow. The proline rich domain is underlined. A putative nuclear localization signal is indicated, NLS. By comparison with Sp1, those residues under the alignment illustrate the 3 zinc finger structures, coloured blue, green and red, with those Cys and His residues involved in co-ordinating the zinc ion shown in bold, those residues involved in stabilising the fold in italics and those residues involved in DNA binding underlined.
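Percent identity figures such as those quoted for the human, mouse and rat proteins are derived from a pairwise alignment. The following is a minimal sketch, assuming the two sequences are already aligned with '-' for gaps and that gap columns are excluded from the denominator. Other conventions (for example dividing by the full alignment length) give slightly different values, and the convention used for the figures in this paper is not stated. The sequences in the example are invented.

```python
# Minimal sketch: percent identity over ungapped columns of a pairwise
# alignment. The gap-handling convention is an assumption; sequences are toys.

def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Return percent identity, ignoring columns that contain a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gap column: not counted
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Seven ungapped columns, six identical
    print(round(percent_identity("MLTAAC-K", "MLSAACGK"), 1))
```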
The protein contains three classical zinc finger structures (residues 294-376) of the Cys2-His2 type, where the conserved cysteine residues in two short β-sheets and the histidine residues in an α-helix tetrahedrally co-ordinate a zinc ion. The amino terminal part of the helix binds the major groove in DNA-binding zinc fingers. These fingers in Sp7 can be described by the motifs HxCxxxxCxxxYxKxxHLxAHxxxH, FxCxxxxCxxxFxRxxELxRHxxxH and FxCx--xCxxxFxRxxHLxKHxxxH for fingers 1 to 3 respectively, where those hydrophobic residues in italics are important for the stability of the fold, and those underlined are likely to bind DNA [28]. These zinc finger motifs are highly homologous to those present in the Specificity proteins 1 and 3 and similar to other Krüppel family transcription factors [15,18]. The most similar human protein to Sp7 is Sp6 (Fig. 4). A basic region, RKK (residues 289-291), is similar to the nuclear localization signal of the related immediate-early transcription factor, Egr-1 [29].
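The zinc-finger motifs above use x for any residue, so each can be converted directly into a regular expression and searched for in a protein sequence. The following is a minimal sketch under two assumptions: a '-' in a motif is an alignment gap and is dropped, and an exact match at every fixed position is required. The function name and the synthetic test sequence are illustrative only.

```python
# Minimal sketch: turn the x-notation finger motifs into regular expressions.
# '-' is treated as an alignment gap (an assumption); test data are synthetic.
import re

FINGER_MOTIFS = {
    1: "HxCxxxxCxxxYxKxxHLxAHxxxH",
    2: "FxCxxxxCxxxFxRxxELxRHxxxH",
    3: "FxCx--xCxxxFxRxxHLxKHxxxH",
}


def motif_to_regex(motif: str) -> "re.Pattern[str]":
    """Convert a motif into a compiled regex: 'x' matches any one residue."""
    parts = []
    for ch in motif:
        if ch == "-":
            continue          # alignment gap: no residue at this column
        parts.append("." if ch == "x" else re.escape(ch))
    return re.compile("".join(parts))


def find_fingers(protein: str):
    """Return {finger number: list of 0-based match start positions}."""
    protein = protein.upper()
    return {
        number: [m.start() for m in motif_to_regex(motif).finditer(protein)]
        for number, motif in FINGER_MOTIFS.items()
    }


if __name__ == "__main__":
    # Synthetic sequence: finger 2 motif with every x replaced by alanine
    synthetic = "MK" + FINGER_MOTIFS[2].replace("x", "A") + "GG"
    print(find_fingers(synthetic))
```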
The protein is glycine and proline rich. The proline rich region (20% proline, residues 65-249) is involved in strong transcriptional activation [5]. Apart from the zinc-finger domain, the protein lacks any similarity to known domains. There are two copies of a GSSPL motif of unknown function at the amino-terminus of the long protein isoform, whereas the short protein isoform possesses one alpha-helix/GSSPL motif.
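A composition figure such as the proline content of a region can be checked by counting residues over a 1-based, inclusive residue range. The following is a minimal sketch; the function name is hypothetical and the example uses an invented peptide, since the Sp7 sequence itself is not reproduced here.

```python
# Minimal sketch: fraction of one residue within a 1-based inclusive region.
# The example peptide is invented; it is not the Sp7 sequence.

def residue_fraction(protein: str, residue: str, start: int, end: int) -> float:
    """Return the fraction of `residue` in protein[start..end] (1-based, inclusive)."""
    if not (1 <= start <= end <= len(protein)):
        raise ValueError("region lies outside the sequence")
    region = protein.upper()[start - 1:end]
    return region.count(residue.upper()) / len(region)


if __name__ == "__main__":
    toy_peptide = "PPAPGA"
    # Residues 1-4 are P, P, A, P
    print(residue_fraction(toy_peptide, "P", 1, 4))
```

Applied to the full protein sequence with the region 65-249, the same call would give the proline fraction of the activation domain.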

Tissue and cellular distribution of human Sp7 mRNA
The expression of both long and short isoforms of Sp7 in osteoblast cell types was examined by RT-PCR using primers specific for each isoform (Fig. 5A). The Sp7 short protein isoform was expressed in foetal and craniofacial osteoblasts, chondrocytes and the osteosarcoma cell lines HOS and MG63, but was below the limit of detection in adult osteoblasts. In general, the expression of the Sp7 long protein isoform was much lower than that of the short protein isoform, being found in foetal and craniofacial osteoblasts and the osteosarcoma cell lines HOS and MG63, but not in adult osteoblasts or chondrocytes. Runx2/Cbfa1 and the housekeeping gene β-actin were expressed in all osteoblast-like cell types (Fig. 5A).
The expression of Sp7 in a range of adult tissues was examined by RT-PCR using oligo-dT primed cDNAs. Isoform-specific expression was not detected. However, using primers located in the 3'UTR that detect both isoforms, expression was found in testis, heart, brain, placenta, lung, pancreas, ovary and spleen (Fig. 5B). HsTGF-β-IEG located in the 3'UTR of another transcription factor, TCF23, [30] no expression was detected under these amplification conditions (data not shown), indicating that the cDNAs were free of genomic DNA.

Discussion
We have cloned two human Sp7 cDNAs that are homologues of the murine Osterix transcription factor [5], which is specifically expressed in all developing bones. As would be expected for a gene involved in bone development, the Sp7 gene has a limited phylogenetic distribution, being identified only in mammals, amphibians and teleosts (bony fish) to date. The long Sp7 protein isoform has 94.7% identity with the murine protein. The Sp7 proteins share a high degree of homology with other members of the Sp family in the three zinc-finger DNA-binding domain. The human Sp7 protein has 78% identity and 89% similarity with human Sp1 protein in the DNA-binding domain, but the two proteins have little homology elsewhere in their sequences. All those residues in the three zinc fingers that are important for maintaining the structure and that contact DNA are conserved between Sp7 and Sp1, indicating that Sp7 will bind Sp1 DNA consensus sites. Indeed, Osterix/Sp7 bound a consensus Sp1 binding site and Sp1-like sequences in the collagen type-Iα1 and collagen type-2α1 promoters [5]. No genetic bone disease has yet been associated with the Sp7 gene [31]. However, we speculate that abnormal Sp7 binding to the polymorphic Sp1 binding site in the collagen type I α1 gene promoter [32] may lead to reduced bone density and osteoporosis.
The human Sp7 gene is separated by 57 kb from the Sp1 gene at chromosomal localisation 12q13.13. This arrangement is similar in the mouse, with the genes being separated by 53 kb on a supercontig (Mm15_WIFeb01_286) on chromosome 15 between the Wnt10b and Itga5 genes [5], and leads to the suggestion that the two genes evolved through a duplication event from a common ancestor. However, they are encoded on opposite strands and have very different gene structures, with Sp1 having 6 exons. These two Sp genes are in close proximity to the HOX-C gene cluster. Colocalization between Sp genes and HOX gene clusters is a recurring feature of the genome [33]. In humans, Sp2 and Sp6 are linked to the HOX-B gene cluster located on chromosome 17q21.3-q22, Sp3 and Sp5 are linked to the HOX-D gene cluster located on chromosome 2q31.1, and Sp4 is linked to the HOX-A gene cluster located on chromosome 7q21.3-q22. Together, the combination of Sp and homeobox genes may form a genome unit in which transcription of these genes is co-ordinated, since they are all involved in determining the pattern of development.
Both the human and rodent Sp7 genes have a similar structure, spanning 9.7 kb and having three exons with the first two exons being alternatively spliced. The two alternatively spliced human Sp7 cDNAs encode a full-length protein (alpha or long isoform) and an amino-terminal truncated protein (beta or short isoform) of 431 and 413 residues respectively. This arrangement is also found in the homologous rodent genes. Alternatively spliced variants are a feature of other members of the Sp/XKLF family [34]. TIEG, an oestrogen regulated protein in osteoblasts, has 2 isoforms that are expressed from alternative promoters in a similar manner to Sp7 [35][36][37]. The isoforms are called the TGFβ-inducible early gene (TIEG1) and the early growth response gene-α (TIEG2). TIEG1 possesses an extra 10 residues at its amino-terminus. Alternative isoforms of other family members may possess different transcriptional properties, since different Sp3 isoforms can stimulate or repress transcription [38,39].
We have shown that the relative abundance of the two Sp7 isoforms varies in different cell types. There are some potential transcription factor binding sites that are conserved in the promoter regions of the human and rodent Sp7 genes. These may be important in regulating the expression of the different Sp7 isoforms. Close to the start of transcription of the short protein isoform are overlapping S8 homeobox transcription factor and Runx2/Cbfa1 (OSE2) elements [40][41][42]. The S8 homeobox transcription factor is involved in mesenchyme-specific pattern formation in the embryo, and Runx2/Cbfa1 is essential for osteoblast differentiation [41,43]. In mice, Runx2/Cbfa1 acts upstream of Sp7 in bone development [5], so the presence of a conserved OSE2 element suggests that Runx2/Cbfa1 may directly regulate the expression of the short protein isoform of Sp7. Runx2/Cbfa1 has two amino-terminal isoforms which exert different functions in the process of osteoblast differentiation [13,44,45]. However, the role of these isoforms in regulating Sp7 expression remains to be determined. The presence of a CCAAT box, a potential NF-Y binding site close to the start of transcription of the long protein isoform, suggests that expression of this isoform may be regulated by NF-Y in a similar manner to the way that bone sialoprotein expression is stimulated by the NF-Y transcription factor [46].
Human Sp7 mRNA expression was limited to cells of the mesenchymal lineage such as foetal and craniofacial osteoblasts, chondrocytes and the osteosarcoma cell lines HOS and MG63. Similarly, in mouse and rat cell lines Sp7/osterix expression was found in the preosteoblastic cell line, MC3T3-E1, but not in fibroblasts, nor plasmacytoma or pheochromocytoma cells [5]. The expression pattern of Sp7 supports the idea that this transcription factor plays a role in osteoblast differentiation in both endochondral and intramembranous ossification, since it was expressed in foetal osteoblasts and neonatal craniofacial osteoblasts.
Adult osteoblasts expressed Runx2/Cbfa1, which acts upstream of Sp7 [5]. However, Sp7 expression was below the level of detection in these cells. It remains to be determined whether loss of Sp7 expression is the result of partial de-differentiation in the in vitro culture conditions or reflects the situation in aged bone. Adults constantly remodel bone and a possible consequence of a loss of Sp7 expression in human adult osteoblasts may be a reduction in bone mineralization. Potential treatments that increase Sp7 expression may help treat bone-wasting diseases such as osteoporosis.
Human articular chondrocytes also expressed Runx2/Cbfa1 and Sp7. In mouse embryos, Sp7/osterix is expressed in young differentiating chondrocytes and germ tooth mesenchymal cells, but not in condensed chondrogenic mesenchyme nor in later chondrocytic cells [5]. Rat chondrosarcoma cells were also found to express Sp7/osterix. This suggests that Sp7 plays a role in chondrocyte differentiation, but not in cartilage matrix deposition.
Osteosarcomas are primary malignant tumours of bone or soft parts arising from bone-forming mesenchymal cells. Sp7 was expressed in both of the human osteosarcoma cell lines examined and has previously been found in the rat osteosarcoma cell line ROS17/2.8 [5]. Additionally, amplification of chromosomal region 12q13-15, which contains Sp7, has been noted in various human osteosarcomas [47,48], suggesting that Sp7 may be involved in neoplastic disease.

Conclusions
The expression of Sp7 in human primary foetal osteoblast cells involved in endochondral and intramembranous ossification is consistent with a role for Sp7 in the coordination of changes in transcription required to generate bone formation. However, expression of Sp7 in chondrocytes suggests that it also plays a role in cartilage formation. Two different promoters generate two alternatively spliced isoforms, the proximal promoter generating the full-length protein and the distal promoter an amino-terminal truncated short protein isoform. Generally, the short protein isoform was more abundant.

Cell culture and RNA extraction
Two cell lines derived from osteosarcomas, HOS and MG-63 [49,50], were a kind gift from Colin Scotchford (University of Nottingham). They were cultured at 37°C in 5% CO2 using Dulbecco's Modified Eagle's Medium, containing 4 mM L-glutamine, 4500 mg glucose/L, 1500 mg bicarbonate/L (Invitrogen, UK) with the addition of 10% foetal bovine serum, 10 µM ascorbic acid, 100 IU/mL penicillin and 50 µg/mL streptomycin. Primary human osteoblasts were isolated from trabecular bone of femoral heads taken during total hip arthroplasty and cultured as previously described [51]. Primary human craniofacial osteoblasts were obtained from paediatric skull and cultured as previously described [52]. Human primary foetal osteoblasts were obtained and cultured as previously described [53]. Human primary articular chondrocytes were obtained from isolated femoral heads and cultured as previously described [54]. Total RNA was extracted from cells using the SV Total RNA Isolation System (Promega, UK). The concentration and purity of eluted RNA were determined spectrophotometrically and the quality of the RNA was verified by non-denaturing agarose gel electrophoresis. For Sp7 cloning, total RNA was reverse transcribed with an oligo-dT primer using ThermoScript, an AMV RNase H-minus reverse transcriptase (Invitrogen, UK).

Molecular cloning of human Sp7
The mouse Osterix/Sp7 cDNA sequence [5] was used to search the human genome sequence and the EST database to identify the human homologue of Sp7. PCR primers designed from the sequence of genomic DNA clone RP11-147A18 and an EST (accession No. BM687008) were used to clone two alternatively spliced isoforms of the human Sp7 cDNA by PCR from foetal osteoblast cDNA. The PCR primers were: forward, long protein isoform A, TCTCTCCATCTGCCTGG; forward, short protein isoform B, ACCCGTTGCCTGCACTCTC and common reverse, CACAATGTTCTCTCCCCAAGCT (Helena Biosciences, France). RT and PCR were carried out in a thermocycler (PE Applied Biosystems 2400) using Advantage cDNA polymerase (Clontech, UK). PCR products were excised from agarose gels stained with ethidium bromide and eluted from the agarose using a DNA extraction kit (Qiagen, UK). The PCR products were cloned into the T-A vector pCR4-TOPO (Invitrogen, The Netherlands). Transformed colonies were picked and vectors containing inserts were extracted using the Wizard Plus SV minipreps DNA purification system (Promega, UK) and sequenced in both directions as previously described [55].

Tissue and cellular distribution of human Sp7 mRNA by reverse transcriptase PCR
Human cDNA was analysed for the relative expression of the Sp7, Runx2/Cbfa1 and β-actin mRNA by real-time PCR. Sixteen adult tissue cDNAs (BD Clontech, UK) were generated from polyA+ selected RNA and reverse transcribed using an oligo-dT primer. Cell type cDNAs were generated from total RNA and reverse transcribed using random hexamers. Approximately 4.0 ng of cDNA from each tissue, and cDNA derived from 50 ng of total RNA from each cell type, was amplified by PCR using Taq Gold polymerase. Tissue master mixes were divided into gene-specific mixes with the addition of PCR primers to a final concentration of 200 µM. The primers were: Sp7 3'UTR, TTGACTGCAGAGCAGGTTCCT and GCTGCAAGCTCTCCATAACCA, producing a 97 bp amplicon; Sp7 forward primer for the long protein isoform, TCCTCCCTGCTTGAGGAGGA (exon 1b/2); Sp7 forward primer for the short protein isoform, CAGGTTCCCCCAGGAGGA (exon 1a/2) and common reverse primer, AGTCCCGCAGAGGGCTAGAG (exon 2), producing 100 and 98 bp amplicons respectively; Runx2/Cbfa1, AGAAGAGCCAGGCAGGTGCT (exon 6/7) and TTCGTGGGTTGGAGAAGCG (exon 7), producing a 102 bp amplicon, a measure of the sum of all Runx2/Cbfa1 isoforms; and β-actin, GGCCACGGCTGCTTC and GTTGGCGTACAGGTCTTTGC, producing a 208 bp amplicon. The amplification conditions were: a 10 min hot start to activate the polymerase followed by 50 cycles of 95°C for 15 sec and 60°C for 1 min. Amplification specificity was confirmed by agarose gel electrophoresis and direct sequencing of the amplicons.
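Basic properties of the primers listed above, such as length and GC content, can be tabulated with a few lines of code. The following is a minimal sketch that also gives a rough melting temperature using the Wallace rule, Tm = 2(A+T) + 4(G+C). That rule is an assumption introduced here for illustration, it is only a crude estimate for primers of this length, and it is not the method used to design the primers in this study. Only the Sp7 3'UTR forward primer is taken from the text; the function names are hypothetical.

```python
# Minimal sketch: primer length, GC fraction and a rough Wallace-rule Tm.
# The Wallace rule is an assumed, approximate estimate, not the authors' method.

def gc_fraction(primer: str) -> float:
    """Return the fraction of G and C bases in the primer."""
    primer = primer.upper()
    if not primer:
        raise ValueError("empty primer")
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer: str) -> int:
    """Return the Wallace-rule melting temperature estimate in degrees C."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def describe(primer: str) -> dict:
    """Summarise a primer as a dictionary of simple properties."""
    return {
        "length": len(primer),
        "gc_fraction": round(gc_fraction(primer), 3),
        "wallace_tm": wallace_tm(primer),
    }


if __name__ == "__main__":
    sp7_3utr_forward = "TTGACTGCAGAGCAGGTTCCT"  # from the primer list above
    print(describe(sp7_3utr_forward))
```

Nearest-neighbour thermodynamic models give more reliable melting temperatures and would be the usual choice when designing primers for a 60°C annealing step.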