Comparative genomics using teleost fish helps to systematically identify target gene bodies of functionally defined human enhancers

Background
The human genome is enriched with thousands of conserved non-coding elements (CNEs). Recently, a medium-throughput strategy was employed to analyze the ability of human CNEs to drive tissue-specific expression during mouse embryogenesis. These data led to the establishment of a publicly available genome-wide catalog of functionally defined human enhancers. The scattering of enhancers over large regions in vertebrate genomes seriously impedes attempts to pinpoint their precise target genes. Such associations are a prerequisite for exploring the significance of this in vivo characterized catalog of human enhancers in development, disease and evolution.

Results
This study is an attempt to systematically identify the target gene bodies of functionally defined human CNE-enhancers. For this purpose we adopted an orthology/paralogy mapping approach and compared the CNE-induced reporter expression with the reported endogenous expression patterns of neighboring genes. This procedure pinpointed specific target gene bodies for a total of 192 human CNE-enhancers. This enables us to gauge the maximum genomic search space for enhancer hunting: 4 Mb of genomic sequence around the gene of interest (2 Mb on either side). Furthermore, we used a human-rodent comparison of a set of 159 orthologous enhancer pairs to infer that central nervous system (CNS)-specific gene expression is closely associated with cooperative interaction among at least eight distinct transcription factors: SOX5, HFH, SOX17, HNF3β, c-FOS, Tal1beta-E47S, MEF and FREAC.

Conclusions
The systematic wiring of cis-acting sites to their target gene bodies is an important step toward unraveling the role of the in vivo characterized catalog of human enhancers in development, physiology and medicine.
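The search space stated in the Results (2 Mb on either side of the gene of interest) can be read as a simple interval filter. The following is a minimal sketch under stated assumptions, not the procedure used in the study: the `Interval` class, the function name `enhancers_in_search_space`, and all coordinates are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

FLANK_BP = 2_000_000  # 2 Mb on either side of the gene, as stated in the abstract


@dataclass(frozen=True)
class Interval:
    name: str
    chrom: str
    start: int  # hypothetical coordinate in bp
    end: int    # hypothetical coordinate in bp, end >= start


def enhancers_in_search_space(gene, enhancers, flank=FLANK_BP):
    """Return enhancers overlapping [gene.start - flank, gene.end + flank]."""
    window_start = max(0, gene.start - flank)
    window_end = gene.end + flank
    return [
        e for e in enhancers
        if e.chrom == gene.chrom and e.start <= window_end and e.end >= window_start
    ]


# Toy data: names and coordinates are invented for illustration only.
gene = Interval("geneX", "chr1", 5_000_000, 5_100_000)
candidates = [
    Interval("cne_near", "chr1", 3_500_000, 3_501_000),   # within 2 Mb upstream
    Interval("cne_far", "chr1", 7_500_000, 7_501_000),    # beyond 2 Mb downstream
    Interval("cne_other_chrom", "chr2", 5_000_000, 5_001_000),
]
print([e.name for e in enhancers_in_search_space(gene, candidates)])
```

The window is a deliberately generous upper bound: a gene found inside it is only a candidate, and the expression comparison and synteny analysis described in the Results are still needed to choose among candidates.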


Figure 5. Target gene identification of human CNE-enhancers through orthology mapping (hs174/hs644/hs1022/hs12).
Target genes of human CNE-enhancers were identified by tracing the genic context of their orthologous copies in the teleost fish lineage. (A) A human cis-regulatory element is positioned in the intergenic space between PKN2 and LMO4, suggesting that this enhancer might be associated with one of these genes. Examining the genes neighboring this enhancer in teleost fish suggests that PKN2 is a bystander gene, because in all teleost fish analyzed this gene is physically uncoupled from the CNE-enhancer. Another gene, HS2ST1, present in the neighborhood of the human CNE-enhancer (upstream of PKN2), remains linked to this enhancer in all teleost fish analyzed. However, the duplication of this locus in stickleback indicates that HS2ST1 is also a bystander gene, because the HS2ST1 ortholog is lost from one of the duplicated fragments (on Group XIII). The only gene in the human locus that has preserved its association with this CNE-enhancer in all teleost fish analyzed (even in the duplicated loci of stickleback) is LMO4. Therefore, the human cis-regulatory element positioned in the intergenic space between PKN2 and LMO4 is unambiguously associated with LMO4 and is named CNE-LMO4. (B) A human CNE-enhancer is positioned within the intronic interval of the HDAC9 gene on chromosome 7. Approximately 2 Mb of the human locus encompassing this enhancer (containing at least 6 genes) was analyzed for the maintenance of conserved gene content in teleost fish. The locus appears to be duplicated in medaka and zebrafish. The differential gene loss from the duplicated teleost loci suggests that the CNE-enhancer within the intragenic interval of the human HDAC9 gene is associated with the regulation of TWIST1. It is noteworthy that in zebrafish, in addition to the locus duplication event, an independent gene duplication event occurred that produced two tandem copies of TWIST1 and the associated CNE-TWIST1. (C) A human CNE-enhancer is positioned within the intronic interval of the EBF1 gene on chromosome 5. Comparative analysis of this locus in teleost fish revealed that the genomic interval encompassing this conserved enhancer is duplicated in zebrafish and Fugu. It appears that, after duplication of the locus, one copy of this CNE-enhancer was lost in both fishes. Tracing the correlation between gene loss and enhancer loss, and between gene retention and enhancer retention, in the duplicated fish loci suggests that the CNE-enhancer within the intragenic interval of human EBF1 might be associated with the regulation of the EBF1 gene. (D) Comparative analysis of the genic context around a CNE-enhancer within the intronic interval of human WWOX suggests that this enhancer acts at a distance of ~2 Mb on the MAF gene.
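The reasoning in panel (A) amounts to asking which human neighbor of the enhancer stays linked to its ortholog in every teleost locus, including duplicated ones. A minimal sketch of that logic follows, assuming each locus is reduced to the set of genes found next to the CNE ortholog. The function name `conserved_neighbors` and the locus labels are hypothetical; the gene contents only paraphrase what the caption states for panel (A) and are not taken from the underlying data.

```python
def conserved_neighbors(human_neighbors, teleost_loci):
    """Return the human neighbor genes still linked to the CNE in every teleost locus."""
    retained = set(human_neighbors)
    for linked_genes in teleost_loci.values():
        # A gene missing from any locus that keeps the CNE is treated as a bystander.
        retained &= set(linked_genes)
    return retained


# Genes flanking the human enhancer, as named in panel (A).
human_neighbors = ["HS2ST1", "PKN2", "LMO4"]

# Illustrative loci (labels are placeholders): PKN2 is uncoupled everywhere,
# and HS2ST1 is lost from the stickleback Group XIII copy.
teleost_loci = {
    "teleost_locus_example": ["HS2ST1", "LMO4"],
    "stickleback_copy_1": ["HS2ST1", "LMO4"],
    "stickleback_group_XIII": ["LMO4"],
}

print(sorted(conserved_neighbors(human_neighbors, teleost_loci)))  # ['LMO4']
```

A strict intersection like this is only a first pass. When it leaves more than one gene, as in the cases where the caption describes an enhancer as associated with either flanking gene, expression pattern comparison is needed to decide.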

Analysis of the genic environment of human CNE-enhancers in orthologous teleost fish genomic intervals helps in identifying their target genes. (A) A CNE-enhancer within an intron of the human ELP4 gene is associated with the neighboring PAX6 gene. (B) A human CNE-enhancer positioned on chromosome 7 was probably duplicated in the common ancestor of teleost fish. Comparative syntenic analysis of the human locus in multiple fish lineages, together with the redundancy between the CNE-induced reporter expression pattern and the endogenous expression pattern of EN2, clearly suggests that this CNE-enhancer is associated with the regulation of human EN2. (C) The conserved positioning of a CNE-enhancer within the intergenic space between CENTG2 and GBX2 in human and in all fish lineages analyzed, together with expression pattern studies, suggests that this CNE is associated with either of the flanking genes.

Analysis of human CNE-enhancer loci in orthologous teleost fish genomic intervals helps in identifying their target genes. Genes controlled by specific regulatory elements are identified through systematic analysis of the orthologous genomic content, harboring functionally identified CNE-enhancers, of tetrapod and teleost lineages. (A) The CNE residing between FUSSEL18 and SMAD2 in human is also conserved along with SMAD2 in teleost fishes, with differential loss and gain of bystander genes.