Analysis of the conservation of synteny between Fugu and human chromosome 12

Background: The pufferfish Fugu rubripes (Fugu) with its compact genome is increasingly recognized as an important vertebrate model for comparative genomic studies. In particular, large regions of conserved synteny between human and Fugu genomes indicate its utility to identify disease-causing genes. The human chromosome 12p12 is frequently deleted in various hematological malignancies and solid tumors, but the actual tumor suppressor gene remains unidentified.
Results: We investigated approximately 200 kb of the genomic region surrounding the ETV6 locus in Fugu (fETV6) in order to find conserved functional features, such as genes or regulatory regions, that could give insight into the nature of the genes targeted by deletions in human cancer cells. Seven genes were identified near the fETV6 locus. We found that the synteny with human chromosome 12 was conserved, but extensive genomic rearrangements occurred between the Fugu and human ETV6 loci.
Conclusion: This comparative analysis led to the identification of previously uncharacterized genes in the human genome and some potentially important regulatory sequences as well. This is a good indication that the analysis of the compact Fugu genome will be valuable to identify functional features that have been conserved throughout the evolution of vertebrates.


Background
The short arm of chromosome 12 is frequently deleted in a wide variety of hematological malignancies [1,2] as well as in certain solid tumours, including breast, lung, ovarian, and prostate carcinomas (reviewed in Aissani et al. [3]). Loss of heterozygosity studies revealed hemizygous deletions of chromosome 12p12 in 26 to 47 % of pre-B acute lymphoblastic leukemia (ALL) cases, making it one of the most common genetic alterations found in this disease [4][5][6]. The frequent loss of genetic material in tumour cells is usually indicative of the inactivation of tumour suppressor genes. The construction of both high-resolution genetic and physical maps led to the delineation of the shortest commonly deleted region in ALL patients. This 1 Mb interval is flanked by the genes ETV6 and CDKN1B [4,7,8], which encode a member of the ets-like family of transcription factors (reviewed in Rubnitz et al. [9]) and a cyclin-dependent kinase inhibitor [10,11], respectively. Eight other genes have been mapped within this region: LRP6 encodes a member of the LDL receptor family; GPR19 encodes the G-protein coupled receptor 19; CREBL2 encodes Cre binding protein-like 2; BCL-G encodes a new member of the BCL-2 family that possesses proapoptotic activity; MKP-7 encodes a new member of the dual-specificity ser/thr and tyr phosphatase family; and three previously uncharacterized genes, LOH1CR12, LOH2CR12 and LOH3CR12, have unknown functions [12].
The pufferfish Fugu rubripes (Fugu) is a vertebrate model for comparative genome analysis because of its compact 400 Mb genome [13]. The Fugu genome is about eight times smaller than its human counterpart, but has a similar gene set both in terms of total number and structure. Only 3 % of the human genome is coding, compared to nearly 20 % for Fugu, which greatly facilitates gene identification and cloning. Fugu has been successfully used to characterize functional protein domains [14,15] as well as putative regulatory elements [16][17][18]. If substantial linkage and synteny are conserved, the Fugu genome could be a very useful tool for transcriptional unit mapping and comparative positional cloning of human homologous genes [19][20][21]. Indeed, many reports present evidence of gene order conservation (synteny) over relatively long stretches of sequence between human and Fugu, indicating that Fugu could be used to identify new genes (reviewed in Venkatesh et al. [22]). However, other reports showed considerable differences in the order of the genes in some regions between these two species and gave rise to a controversy on the potential of Fugu as a model organism for positional cloning (e.g. Gilley and Fried [23]). So far, too few loci have been analysed to give an accurate view of the extent of conserved synteny and conserved gene order between the Fugu and human genomes (e.g. Smith et al. [24]).
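The figures quoted above can be turned into a rough back-of-the-envelope check of why the two gene sets are comparable in size. The following is a minimal sketch, not a measurement: the human genome size is simply derived from the "eight times smaller" statement, and the variable names are illustrative.

```python
# Values quoted in the text; the human genome size is derived from the
# "about eight times smaller" statement and is therefore only approximate.
FUGU_GENOME_MB = 400
SIZE_RATIO = 8
FUGU_CODING_FRACTION = 0.20   # "nearly 20 %"
HUMAN_CODING_FRACTION = 0.03  # "only 3 %"


def coding_mb(genome_mb, coding_fraction):
    """Return the approximate amount of coding sequence in Mb."""
    return genome_mb * coding_fraction


human_genome_mb = FUGU_GENOME_MB * SIZE_RATIO
fugu_coding_mb = coding_mb(FUGU_GENOME_MB, FUGU_CODING_FRACTION)
human_coding_mb = coding_mb(human_genome_mb, HUMAN_CODING_FRACTION)

print(f"Fugu : {FUGU_GENOME_MB} Mb genome, ~{fugu_coding_mb:.0f} Mb coding")
print(f"Human: {human_genome_mb} Mb genome, ~{human_coding_mb:.0f} Mb coding")
```

Under these assumptions the two genomes carry coding sequence of the same order of magnitude, which is consistent with a similar gene set packed into a much smaller genome.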
In the present study, we wanted to characterize the Fugu ETV6 (fETV6) locus in order to identify genes that might have been missed previously in the human putative tumor suppressor locus. We characterized approximately 200 kb of genomic sequences around the fETV6 locus. We identified seven genes, including three uncharacterized genes in the human genome and some potentially important regulatory sequences. However, although some synteny with chromosome 12 was conserved, extensive genomic rearrangements have occurred between the Fugu and human ETV6 loci. These observations raise important questions on the usefulness of Fugu as a model for disease gene cloning.

Results
We previously isolated Fugu genomic clones that encompassed the fETV6 gene [25]. To characterize the genomic regions surrounding the fETV6 locus, 8 overlapping clones extending toward the 3' end of fETV6 were mapped by Southern blot hybridizations. The resulting physical map covering approximately 200 kb is shown in Fig. 1. To delineate the physical relationship between the human and the fETV6 loci, we sequenced two contiguous but non-overlapping Fugu BAC clones, 220c16 (approximately 65 kb) and 231j9 (approximately 110 kb), using a shotgun cloning strategy. A total of 506 sequence reads corresponding to more than 400 kb were assembled into 87 contigs using DNAStar software. About 60 kb of sequence, from 22 contigs, was obtained for BAC 220c16 (GenBank acc. no. AY339212) and 95 kb of sequence, from 65 contigs, was obtained for BAC 231j9 (GenBank acc. no. AY339213). Most contigs could be precisely assembled using the ETV6 genomic sequence (GenBank acc. no. AF340230) and shotgun reads from the cosmid clone 188P21 characterized as part of the Fugu genome project ([26]; see Fig. 1). After assembly, we found that both BAC clones ended at a common HindIII site used for cloning.

Sequence analysis of the fETV6 locus
The NIX integrated resource was used to predict the presence of 8 putative coding sequences in the 155 kb of sequence. Blast analysis of the sequence revealed homology to several known or hypothetical proteins in addition to ETV6 (Table 1). From the Sp6 end of the BAC 220c16 (see Fig. 1) the gene order is: Tissue inhibitor of metalloproteinase 3 (Timp3), Synapsin III (SynIII), hypothetical gene similar to FLJ22693 and chromosome 12 open reading frame gene 6 (C12ORF6), fETV6, DKFZp761E1217, Solute carrier family 16, member 7 (SLC16A7), Lig-2 as well as Ku-70-binding protein 3 (KUB3). With the exception of KUB3, the transcription orientation could be inferred (Fig. 1). The human homologues of 5 of the 8 genes mapped to human chromosome 12. Most Fugu genes show high sequence similarity to their known human orthologs (Table 1). The number of exons and the intron/exon organization are well conserved between the 2 species (data not shown).
We found strong sequence identity to the genes SynIII and Timp3 within the genomic sequences derived from the BAC 220c16 (Table 1), although only the first 5 exons of SynIII and the last 4 exons of Timp3 were present in this clone. As in human, Timp3 is located within the fifth intron of SynIII on the opposite strand [27]. Of note, the Drosophila homologues of the Timp and Syn genes are organized in the same way [27]. Because both genes map to human chromosome 22, we wanted to gain more insight into the evolution of this locus by studying the genome organisation of zebrafish, another teleost fish. Using a zebrafish radiation hybrid panel, we mapped the ETV6 gene between the markers Z1366 and Z21743 on zebrafish linkage group LG4 (data not shown). Z21743 was identified as SynII, but closer inspection revealed similar identity (88 % vs 85 %) to SynIII at the protein level, suggesting that this marker might in fact be SynIII and thus indicating that ETV6 and SynIII are syntenic in zebrafish and Fugu, but not in human.
Immediately upstream of fSynIII, on the same strand, we identified a Fugu homolog of the human gene FLJ22693 (Table 1), encoding a predicted protein that has similarity to the TRF1-interacting ankyrin-related ADP-ribose polymerase, although its actual function is unknown. Blast analysis revealed 3 human genomic clones (AC004961, AC004849 and AC025816), all mapping to chromosome 7q33-35, that contained the hypothetical gene FLJ22693. This locus seems to have been involved in several rearrangements since at least six partial and two complete copies of this gene are present, although only one could encode a functional protein. Because of the relatively low homology between the human and the Fugu genes (47 % identity at the nucleotide level and 38 % identity at the protein level), it is possible that the actual ortholog might be different. For instance, we observed another homolog, C12ORF6 (GenBank acc. no. NM_020367), with 45 % identity at the nucleotide level, which lies on chromosome 12p13. However, more than 8 Mb of sequence separates C12ORF6 from ETV6 on human chromosome 12.
Figure 1. Physical map of the Fugu ETV6 locus. The direction of transcription of the identified genes, when known, is indicated with an arrow. BACs and cosmids used in this study are indicated under the physical map. The approximate sizes of the inserts are indicated in parentheses. The position of the Sp6 and T7 extremities of the clones and the presence of NotI, SfiI and SrfI restriction sites are also indicated. The asterisk (*) indicates a putatively non-functional copy of the ADP-ribose polymerase-like I gene.
The next Fugu gene, DKFZp761E1217, showed strong sequence identity (81 %) to many uncharacterized human mRNAs (data not shown). Its function could not be inferred since no known domains are encoded by this gene. However, the clone with the greatest amino acid identity (91 %; GenBank acc. no. AL834160) was located on human chromosome 12 (Table 1, Fig. 1), and is thus probably the actual ortholog. Transcribed in the opposite direction, we found the Fugu ortholog of SLC16A7 (solute carrier family 16 member 7), which has 6 exons and encodes a monocarboxylic acid transporter; its human counterpart is present on chromosome 12 [28]. The seventh gene from the left (see Fig. 1) showed significant homology (72 % identity) to a human cDNA (GenBank acc. no. BC012380) (Table 1) encoding a predicted protein with 67 % sequence similarity to LIG-1, an integral membrane protein; we named this gene LIG-2. We observed a non-coding sequence element, within a LIG-2 intron, that was extremely conserved between Fugu and human (Fig. 2). Although other LIG-2 homologues were observed in the human genome, this conserved sequence was only conclusively identified on chromosome 12, thus identifying LIG-2 as the true ortholog of the Fugu gene and illustrating that the observed syntenic relationship could be useful for predicting human orthologs for Fugu genes. Finally, the ortholog of the KUB3 gene was identified within the sequencing data released from cosmid 188P21. This gene encodes a member of the Ku70-binding protein family, although its exact function is unknown [29].
If we assume that C12ORF6 is the actual ortholog of the Fugu gene, then synteny with human chromosome 12 is observed with 6 consecutive Fugu genes (ADP-ribose polymerase-like I, ETV6, DKFZp761E1217, SLC16A7, LIG-2 and KUB3). However, gene order is not conserved since many other human genes lie in the chromosome 12 region covered by this Fugu contig and many of them can be found in other Fugu loci (see Fig. 3). These observations suggest that multiple rearrangements, linkage breaks and gene losses occurred since the two species diverged 430 Myr ago.
Figure 2. Highly conserved non-coding sequence within a LIG-2 intron. Vertical bars indicate identical nucleotides. Dashes were introduced to maintain the maximal alignment.
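The identity percentages reported in this section, and the bar-and-dash layout described for the Figure 2 alignment, can be illustrated with a short sketch. The two aligned sequences below are hypothetical toy strings, not the Fugu or human LIG-2 intron sequences, and the choice to exclude gapped columns from the denominator is an assumption: alignment programs differ on this convention, and the source does not say which one was used.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gapped columns are skipped (assumed convention)
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def match_line(aligned_a, aligned_b):
    """Build the line of vertical bars that marks identical nucleotides."""
    return "".join(
        "|" if a == b and a != "-" else " "
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
    )


# Hypothetical toy alignment; dashes mark gaps introduced by the aligner.
toy_a = "ACGT-ACGTA"
toy_b = "ACGTTACCTA"

print(toy_a)
print(match_line(toy_a, toy_b))
print(toy_b)
print(f"{percent_identity(toy_a, toy_b):.1f} % identity")
```

In the toy example, 8 of the 9 ungapped columns match. Counting gapped columns in the denominator instead would give a lower figure, so the convention should be stated whenever identities from different tools are compared.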

Discussion
Conservation of synteny and linkage between distant species is a characteristic that could be integrated in positional cloning strategies. Fugu was proposed as a vertebrate model for this purpose since its genome is extremely compact, it shares a similar gene content with humans, and many examples of conserved synteny with human have been documented (reviewed in Venkatesh et al. [22]). However, only a few reports could demonstrate that gene order was indeed conserved. These include the PAX6 region on chromosome 11p [30] and the Alzheimer disease locus on chromosome 14 [21]. Other reports have shown extensive rearrangements between human and Fugu genomic regions, including the surfeit locus [23] and the Hox gene cluster [31]. Additional work is needed to uncover the complete set of conserved syntenies in order to gain insights into the extent to which gene order has been preserved within syntenic groups.
In this study, we characterized approximately 200 kb of Fugu genomic sequence encompassing fETV6 in order to find new genes that would be associated with the human ETV6 locus that is frequently rearranged in human malignancies. We found 8 genes, 6 of which had homologs located on human chromosome 12, indicating that synteny was to some extent maintained throughout evolution. However, when taken together with data from another study of Fugu homologs on human chromosome 12 [32], it became obvious that many chromosomal inversions have occurred since the divergence of both species 430 Myr ago (Fig. 3). Similar observations were made with other human genes present on the tumor suppressor locus on chromosome 12, including CDKN1B, using the now available complete Fugu genomic sequence (data not shown; [33]). Our data are consistent with others suggesting that at least some regions of the Fugu genome underwent substantial intrachromosomal rearrangements over evolution [22][23][24]26,34,35]. The presence of conserved synteny but lack of gene linkage is in line with the hypothesis that inversions were more prevalent than translocations in the evolution of fish genomes (e.g. Postlethwait et al. [36]). In this regard, computer simulation showed that between 4000 and 16000 chromosomal rearrangements occurred since the divergence between Fugu and human, which is a much greater rate than the 180 rearrangements predicted between human and mouse, even when normalized for the divergence time [37]. One plausible explanation is that the bony fish lineage underwent a complete genome duplication that was followed by differential gene loss. As many as 80 % of these duplicated genes were selectively lost [38][39][40]. This mechanism should in principle maintain synteny but not absolute gene order, which is what we observed. Notably, the genes ETV6 and SynIII, which lie on two different human chromosomes, are on the same linkage group in both Fugu and zebrafish. This would indicate that the observed chromosomal rearrangement is not specific to Fugu but rather generalized in teleost lineages, thus making these fishes a less suitable model for positional cloning of human genes.
Figure 3. Synteny analysis between Fugu and the human chromosome 12. The physical relationship between the genes present near the fETV6 (this study) and fPTHLH (see reference 32) loci and their homologues on the human chromosome 12 is indicated with straight lines. Distances between the genes are not to scale. KCNA1: potassium voltage-gated channel, shaker-related subfamily member 1 (NM_000217); PTHLH: parathyroid hormone-like hormone (NM_002820); LDHB: lactate dehydrogenase B (NM_002300); TMPO: thymopoietin (NM_003276).
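The distinction drawn here between conserved synteny (genes on the same chromosome) and conserved gene order (collinearity) can be made concrete with a small sketch. The gene names, chromosome labels and positions below are hypothetical placeholders chosen only to illustrate the two tests; they are not the coordinates of the genes analysed in this study.

```python
def is_syntenic(genes, chromosome_of):
    """True if every gene maps to the same chromosome in the other species."""
    return len({chromosome_of[g] for g in genes}) == 1


def is_collinear(genes, position_of):
    """True if gene order is preserved, allowing inversion of the whole block."""
    positions = [position_of[g] for g in genes]
    return positions == sorted(positions) or positions == sorted(positions, reverse=True)


def count_breakpoints(genes, position_of):
    """Count neighbouring pairs in species A that are not neighbours in species B."""
    rank = {g: i for i, g in enumerate(sorted(genes, key=position_of.get))}
    return sum(1 for a, b in zip(genes, genes[1:]) if abs(rank[a] - rank[b]) != 1)


# Hypothetical example: six genes listed in their order along a species A contig.
order_a = ["g1", "g2", "g3", "g4", "g5", "g6"]
# Hypothetical chromosome and position (arbitrary units) of each homolog in species B.
chromosome_b = {g: "chrA" for g in order_a}
position_b = {"g1": 10, "g2": 50, "g3": 30, "g4": 20, "g5": 60, "g6": 40}

print("syntenic:   ", is_syntenic(order_a, chromosome_b))
print("collinear:  ", is_collinear(order_a, position_b))
print("breakpoints:", count_breakpoints(order_a, position_b))
```

In this toy case all six homologs sit on one chromosome, so synteny is conserved, yet their order is scrambled, which is the pattern the text describes for the fETV6 region. Adding a gene whose homolog lies on a different chromosome (as with SynIII in human) would make `is_syntenic` return False.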
On the other hand, our findings support the notion that the Fugu genome is suitable to confirm genes and find putative regulatory domains in the human genome [41]. The characterization of both coding and non-coding sequences allowed the identification of regions of synteny, thus providing tools to distinguish the actual orthologs from other homologs. In this study, by characterizing a short region of 200 kb in Fugu, the genomic structure of three uncharacterized human genes (LIG-2, ADP-ribose polymerase-like I and DKFZp761E1217) was determined using the Fugu sequence (data not shown), indicating the importance of this model. Furthermore, one conserved non-coding sequence was identified within a LIG-2 intron (this study) and another one in an ETV6 intron [25]. These sequences are more conserved than the actual coding sequence, indicating a putative functional significance. They may act as transcriptional enhancers (e.g. Muller et al. [42]), although further experiments will be needed to prove it. All these characteristics show that the Fugu genome is a powerful tool to characterize the human genome. Thus, in addition to its importance in understanding vertebrate evolution, the availability of a complete pufferfish genome sequence [33] will also be of great importance for the annotation and characterization of the human genome.

Conclusions
This comparative analysis led to the identification of previously uncharacterized genes in the human genome and some potentially important regulatory sequences as well. This is a good indication that the analysis of the compact Fugu genome will be valuable to identify functional features that have been conserved throughout the evolution of vertebrates.

Isolation of BAC clones
Two gridded genomic Fugu libraries, a cosmid library 66 in Lawrist 4 (constructed by C. Burgtorf and distributed by RZPD, Berlin, Germany) and a BAC library cloned in pBeloBAC11 (obtained from Genome Systems), were screened at low stringency with a human ETV6 cDNA probe as described in Montpetit and Sinnett [25]. Four overlapping BAC clones, 220c16, 225j23, 227h21 and 235e13, were obtained. To construct a physical map, the genomic Fugu clones were digested with various endonucleases (NotI, SfiI and SrfI) followed by Southern hybridization with T7, Sp6 and ETV6-derived probes. To extend the contig toward the T7 extremity of the BAC 227h21, the sequence information generated from this end was used to design specific PCR primers (227T7F: AAGCTTGGATCTGGGTCCGTC and 227T7R: GTCCTTTCATTCCACCACAG) that amplified a 200 bp fragment. The latter was used as a probe to rehybridize both Fugu genomic libraries. This led to the identification of three new overlapping clones, one cosmid (ICRFc66G248Q1.3) and 2 BACs (207f2 and 231j9).
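As a quick sanity check on the two primers quoted above, the sketch below computes their length, GC content and an approximate melting temperature, and looks for the HindIII recognition site (AAGCTT). The primer sequences are taken from the text with the line-break hyphens removed. The melting temperature uses the Wallace rule, Tm = 2(A+T) + 4(G+C), which is a rough rule of thumb for short oligonucleotides and is an assumption of this sketch, not the method used by the authors.

```python
# Primer sequences as quoted in the text (line-break hyphens removed).
PRIMERS = {
    "227T7F": "AAGCTTGGATCTGGGTCCGTC",
    "227T7R": "GTCCTTTCATTCCACCACAG",
}
HINDIII_SITE = "AAGCTT"


def gc_count(seq):
    """Number of G and C bases in the sequence."""
    seq = seq.upper()
    return seq.count("G") + seq.count("C")


def wallace_tm(seq):
    """Approximate Tm in degrees C by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    gc = gc_count(seq)
    at = len(seq) - gc  # assumes the sequence contains only A, C, G and T
    return 2 * at + 4 * gc


def reverse_complement(seq):
    """Reverse complement of a DNA sequence containing only A, C, G and T."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


for name, seq in PRIMERS.items():
    gc_percent = 100.0 * gc_count(seq) / len(seq)
    print(
        f"{name}: {len(seq)} nt, GC {gc_percent:.0f} %, "
        f"Tm ~{wallace_tm(seq)} C, HindIII site: {HINDIII_SITE in seq}"
    )
```

The forward primer begins with AAGCTT, which fits the observation in the Results that the BAC inserts end at a HindIII cloning site.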

Large-scale DNA sequencing
The BACs 220c16 and 231j9 were sequenced to 3.1- and 2.2-fold redundancy, respectively, using a shotgun strategy as described in Wilson et al. [43]. Briefly, the clones were isolated using NucleoBond Plasmid Maxi Kits (Clontech) and randomly sheared by nebulization. Blunt ends were ensured using mung bean nuclease, Klenow fragment and DNA polymerase I, and fragments from 1.5 to 3 kb were size-fractionated by agarose gel electrophoresis. Fragments were ligated into SmaI-digested M13mp9 vector and single-stranded templates were purified using Qiagen M13 plates. Random clones from each sub-library were sequenced with dye-terminator and dye-primer chemistry (Amersham and ABI) on either ABI 373 or 377 automated DNA sequencers.
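The redundancy values above can be related to the amount of sequence recovered for each BAC. The sketch below uses the Lander-Waterman idealization, in which reads fall uniformly at random and the expected fraction of the insert covered at redundancy c is 1 - e^(-c). This model is an assumption introduced here for illustration and is not part of the authors' analysis. The insert sizes and recovered sequence lengths are the approximate values given in the Results.

```python
import math


def expected_covered_fraction(redundancy):
    """Lander-Waterman expectation: fraction of the target covered by >= 1 read."""
    return 1.0 - math.exp(-redundancy)


# Approximate values quoted in the text: (fold redundancy, insert kb, sequence obtained kb).
BACS = {
    "220c16": (3.1, 65, 60),
    "231j9": (2.2, 110, 95),
}

for name, (redundancy, insert_kb, obtained_kb) in BACS.items():
    expected = expected_covered_fraction(redundancy)
    observed = obtained_kb / insert_kb
    print(
        f"{name}: {redundancy}x redundancy, "
        f"expected ~{expected:.0%} covered, observed ~{observed:.0%}"
    )
```

Under this idealized model, low-redundancy shotgun sequencing is expected to leave gaps, which is consistent with each BAC being recovered as many contigs rather than one finished sequence.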

Sequence analysis
DNAStar software was used for gel trace analysis, contig assembly and to obtain global alignments of genomic data. Repeat elements were identified using RepeatMasker2 (http://ftp.genome.washington.edu/cgibin/RepeatMasker). The NIX software package (http://www.hgmp.mrc.ac.uk/NIX), which integrates several different algorithms for exon prediction (FGenes, Genescan, Grail, Hexon, etc.) and Blast analysis (Swissprot, dbEST, Unigene, etc.), was used to identify the putative genes. All other sequence analyses were performed using BLAST or CD-Search programs on the NCBI site (http://www.ncbi.nlm.nih.gov). Sequence from clone 188P21 was available from the Fugu sequencing project (http://Fugu.hgmp.mrc.ac.uk/). Multiple sequence alignments were obtained using the ClustalW algorithm. The physical position of human genes was obtained using the November 2002 version of the Human Genome Browser (http://genome.ucsc.edu).