Identification of functional TFAP2A and SP1 binding sites in new TFAP2A-modulated genes

Background Different approaches have been developed to dissect the interplay between transcription factors (TFs) and their cis-acting sequences on DNA in order to identify TF target genes. Here we used a combination of computational and experimental approaches to identify novel direct targets of TFAP2A, a key TF for a variety of physiological and pathological cellular processes. Gene expression profiles of HeLa cells, either silenced for TFAP2A by RNA interference or not, were previously compared and a set of differentially expressed genes was revealed. Results The regulatory regions of 494 TFAP2A-modulated genes were analyzed for the presence of TFAP2A binding sites, employing the canonical TFAP2A Positional Weight Matrix (PWM) reported in Jaspar (http://jaspar.genereg.net/). A total of 264 genes containing at least 2 high score TFAP2A binding sites were identified, showing a central role in "Cellular Movement" and "Cellular Development". In a search for TFs that could cooperate with TFAP2A, a statistically significant enrichment for SP1 binding sites was found for TFAP2A-activated but not repressed genes. The direct binding of TFAP2A or SP1 to a random subset of TFAP2A-modulated genes was demonstrated by Chromatin ImmunoPrecipitation (ChIP) assay and the TFAP2A-driven regulation of the DCBLD2/ESDN/CLCP1 gene was studied in detail. Conclusions We showed that our computational approaches applied to microarray-selected genes are valid tools to identify functional TF binding sites in gene regulatory regions, as confirmed by experimental validations. In addition, we demonstrated a fine-tuned regulation of DCBLD2/ESDN transcription by TFAP2A.


Background
The coordination of various complex biological functions as well as the response to environmental and developmental stimuli are governed by biochemical processes that regulate gene activity. Transcription is the initial step of gene expression and it involves a multitude of transcription factors (TFs), their corresponding cis-acting elements on DNA, additional co-factors and the influence of chromatin structure [1]. Functional TF binding sites (TFBSs) can be identified in the genome by computational approaches or experimentally by Chromatin ImmunoPrecipitation and hybridization on a genomic microarray (ChIP on Chip) [2] or by high-throughput selection procedures (SELEX) in which pools of random DNA sequences are mixed with a TF and those that are preferentially bound are recovered and sequenced [3,4].
However, an alternative and very promising approach consists of combining in silico TFBS predictions in the gene promoter regions with microarray analyses that compare gene expression of cells in which a TF is either overexpressed or deleted [5][6][7]. Indeed, the analysis of regulatory sequences of putative co-regulated genes might be useful in identifying common cis-regulatory elements recognized by specific TFs [5]. The microarray assays help to narrow down the number of genes to be analyzed, focusing on those more likely to be regulated by the same TFs, thus reducing the false positive and false negative rates.
Every TFAP2 protein possesses a unique, highly conserved helix-span-helix dimerization motif at the C-terminal half of the protein, a central basic region and a less conserved proline- and glutamine-rich domain at the amino terminus [14]. The helix-span-helix motif and the basic region mediate DNA binding and dimerization [15] while the proline- and glutamine-rich region is responsible for transcriptional transactivation. The TFAP2 proteins are able to form hetero- as well as homo-dimers and bind to GC-rich DNA sequences within regulatory regions of their target genes, mediating both activation and repression of gene transcription [12]. Functional TFAP2 binding sites, such as 5'-GCCN3GGC-3' or 5'-GCCN4GGC-3' or 5'-GCCN3/4GGG-3', have been identified [16]. However, other well characterized binding sites, such as 5'-CCCCAGGC-3' [17] or others [18], which differ considerably from the previous sequences, have also been found, indicating that TFAP2 binding sites may represent promiscuous GC-rich elements varying considerably in binding affinity. This makes the computational identification of TFAP2 binding sites a non-trivial process. A Positional Frequency Matrix (PFM) obtained by multiple alignment algorithms, which leads to nucleotide scores indexed by letters and positions, is often used to localize degenerate cis-regulatory elements [19]. In addition, given that TFAP2 isoforms are very similar in their DNA binding domains, a specific sequence preference between different TFAP2 proteins has not been found, as demonstrated by an in vitro binding site selection with recombinant TFAP2A and TFAP2C proteins [20].
We previously performed whole-genome microarray analysis for HeLa cells either silenced for TFAP2A by RNA interference or not and identified a set of differentially expressed genes [39]. The regulatory regions (-900/+100, considering the TSS as +1) of the genes that unambiguously mapped to known ENSEMBL IDs were analyzed for the presence of TFAP2A binding sites, employing the canonical Positional Weight Matrix (PWM) reported in Jaspar (http://jaspar.genereg.net/). A total of 264 genes containing at least 2 high score TFAP2A binding sites were identified, several of which could be validated by Chromatin ImmunoPrecipitation (ChIP) assays. Additionally, a detailed analysis of the TFAP2A-driven regulation of the Discoidin, CUB and LCCL domain containing 2/Endothelial and Smooth muscle Derived Neuropilin like/CUB, LCCL-homology, coagulation factor V/VIII homology domains (DCBLD2/ESDN/CLCP1) gene was performed. Finally, we searched for TFs that might cooperate with TFAP2A in the transcriptional regulation of genes containing at least 2 high score TFAP2A binding sites and found SP1 as a potential candidate for TFAP2A-activated genes.

Identification of TFAP2A binding sites in newly identified TFAP2A-modulated genes
In order to define TFAP2A binding sites in newly identified TFAP2A-modulated genes [39] we first assembled a dataset of core promoter regions (-900/+100, considering the TSS as +1) for all known human protein-coding genes (21316) using the ENSEMBL database and searched for TFAP2A binding sites employing the canonical TFAP2A Positional Weight Matrix (PWM) MA0003 (Figure 1) reported in the Jaspar database. Affinity scores were assigned using standard log-likelihood ratios [40] and a binding site was defined as an oligonucleotide with a log-likelihood ratio higher than 66% of the maximum score attainable with the PWM. After ranking the binding sites by score, we used various thresholds (top-scoring 10%, 20% and 30% sites) to classify the genes containing at least one or two high score TFAP2A binding sites (Table 1). In the following we will mostly consider the top-scoring 20% binding sites. We then focused on the set of differentially expressed genes identified by microarray analysis [39], in which gene expression profiles of HeLa cells, either silenced for TFAP2A by RNA interference or not, were compared considering a Fold Change (FC) beyond ±1.5 and a p value (pv) < 0.01. For each gene the longest available transcript was chosen (see Additional file 1). Significant enrichment for TFAP2A binding sites was found in the regulatory regions of these genes compared with genome-wide abundance, as calculated using an exact Fisher test (Table 1). In the whole genome, the genes containing at least one or two high score TFAP2A binding sites were 12686 and 8636, respectively. Among TFAP2A-regulated genes, 363 out of 494 genes (ENSEMBL) contained at least one high score TFAP2A binding site, while 264 out of 494 genes (157 down- and 107 up-regulated; ENSEMBL) contained a minimum of two sites, indicating an enrichment for TFAP2A binding sites in the TFAP2A-regulated genes (pv = 1.5E-05; see Table 1). The results for different thresholds (top 10% and 30%) were similarly significant and are shown in Table 1. We ranked the genes according to the number of TFAP2A binding sites present in their core promoter regions (Table 2) and found that the majority of the genes contained one or two TFAP2A binding sites. Importantly, already reported TFAP2A target genes were identified in our analysis (see Additional file 1).
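The enrichment test described in this section can be illustrated with a minimal sketch of a one-sided exact Fisher (hypergeometric) test, using only the gene counts quoted in the text. The function name and the choice of a one-sided tail are assumptions made for illustration; the value it prints need not match the p value reported in Table 1, which may have been computed with a different background set or test sidedness.

```python
from math import comb

def hypergeom_enrichment_pvalue(population, population_hits, sample, sample_hits):
    """One-sided exact test: P(X >= sample_hits) when `sample` genes are drawn
    without replacement from `population` genes, of which `population_hits`
    carry the feature (here: at least two high score binding sites)."""
    total = comb(population, sample)
    upper = min(sample, population_hits)
    tail = sum(
        comb(population_hits, k) * comb(population - population_hits, sample - k)
        for k in range(sample_hits, upper + 1)
    )
    # Exact integer arithmetic; the final ratio is converted to a float.
    return tail / total

# Counts quoted in the text (top-scoring 20% sites, at least two sites per promoter):
# 8636 of 21316 genes genome-wide, 264 of 494 TFAP2A-modulated genes.
p_two_sites = hypergeom_enrichment_pvalue(21316, 8636, 494, 264)
print(f"one-sided enrichment p value: {p_two_sites:.3g}")
```

The same function can be applied to the "at least one site" counts (12686 genome-wide, 363 of 494) or to the other score thresholds.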

Functional classes enrichment for predicted TFAP2A target genes
To identify the functional pathways in which the potential TFAP2A targets could be involved, Gene Ontology (GO) and network analyses were performed for the 264 TFAP2A-modulated genes containing at least two TFAP2A high score binding sites using the Ingenuity Pathway Analysis Systems. Two high score molecular networks were identified and in Figure 2A and 2B we show a selection of these genes and their connections with TFAP2A. The first network was associated with Cellular Movement (Figure 2A, score 38) and included 26 genes, e.g. SLIT2 (slit homolog 2 - Drosophila); PDGFA (platelet-derived growth factor alpha polypeptide); RAC1 and RAC2 (ras-related C3 botulinum toxin substrate 1 and 2, rho family, small GTP binding protein Rac1 and Rac2); DCBLD2/ESDN (discoidin, CUB and LCCL domain containing 2/Endothelial and Smooth muscle cell Derived Neuropilin-like molecule); ACTA2 (actin, alpha 2, smooth muscle, aorta). The second network was associated with Cellular Development (Figure 2B, score 38) and included 24 genes, e.g. PPARG (peroxisome proliferator-activated receptor gamma); MAPK1 (mitogen-activated protein kinase 1); CXCL1 (chemokine, C-X-C motif, ligand 1 - melanoma growth stimulating activity, alpha); ADAMTS1 (metallopeptidase with thrombospondin type 1 motif, 1); AREG (amphiregulin); IL11 (interleukin 11).
ChIP analysis was performed on HeLa cells, which endogenously express TFAP2A, and, as shown in Figures 3 and 6, enrichment for TFAP2A was found on the promoter of each gene compared with negative controls, suggesting in vivo binding and direct regulation of these genes by TFAP2A. Negative controls for ChIP analysis were performed using genes in which low score or no TFAP2A binding sites were identified, such as PLCXD2 (pleckstrin homology-like domain family B member 2) or IFI44 (interferon-induced protein 44). In fact, no enrichment for TFAP2A was observed in the promoter of these two genes compared with the negative IgG controls, suggesting that genes containing only low score or no TFAP2A binding sites are not direct TFAP2A targets and that their TFAP2A-dependent modulation is indirect. ChIP analysis for the PPARG and PLCXD2 genes was also performed in HepG2 cells, which do not express TFAP2A, and no enrichment for TFAP2A was observed for any of the analyzed sequences, supporting the significance of the results obtained in HeLa cells (Figures 3 and 6).
4B and 4C. Both cell lines were transiently co-transfected with either ESDNwt or its 5' deletant pGL3-ESDN-DEL3 (del3) starting at -950 or the pGL3-Basic (basic) control reporter vector, and with an expression plasmid for TFAP2A, pSP(RSV)TFAP2A (TFAP2A), or its control empty vector (EV) (Figure 4B and 4C). Alternatively, HeLa cells (Figure 4B) were transfected with an expression vector for TFAP2A silencing, pSUPER-TFAP2AshRNA2 (shTFAP2A), or with the empty pSUPER control vector (shEV). In addition, cells were transfected with the pRLTK vector for Renilla luciferase expression, to normalize for transfection efficiency. TFAP2A basal levels, overexpression or silencing were verified by Western Blot (WB) analyses (Figure 4B and 4C), where Glyceraldehyde-3-phosphate dehydrogenase (GAPDH) was used as loading control. A 3-fold higher activity was observed for the ESDNwt reporter vector in MDA-MB-231 cells compared with HeLa cells (compare Figure 4B with 4C). The inhibitory function of TFAP2A on DCBLD2/ESDN gene transcription was further supported when HeLa and MDA-MB-231 cells were co-transfected with ESDNwt and TFAP2A (250 ng, unless otherwise specified), resulting in a 2- and 3.5-fold reduction in luciferase activity, respectively (Figure 4B and 4C). Reporter activity decreased as the amount of transfected TFAP2A expression vector increased (12.5, 125 or 250 ng), as shown in Figure 4C for MDA-MB-231 cells. Conversely, TFAP2A silencing in HeLa cells caused a 1.6-fold increase in reporter activity (Figure 4B). Altogether, these results strongly suggest a direct repressive activity of TFAP2A on the DCBLD2/ESDN promoter and are in agreement with our previous microarray results [39].
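The fold changes quoted for the reporter assays rest on normalizing firefly luciferase readings to the co-transfected Renilla control. A minimal sketch of that arithmetic is given here; the function names and all readings are hypothetical and chosen only so that the example reproduces a 2-fold reduction, and they are not data from this study.

```python
def normalized_activity(firefly, renilla):
    """Firefly luciferase signal corrected for transfection efficiency
    by dividing by the Renilla luciferase signal of the same well."""
    return firefly / renilla

def fold_change(treated, control):
    """Ratio of normalized reporter activity, treated versus control."""
    return treated / control

# Hypothetical readings (arbitrary light units), NOT measurements from the paper.
control = normalized_activity(firefly=12000.0, renilla=400.0)   # empty vector (EV)
with_tf = normalized_activity(firefly=5400.0, renilla=360.0)    # TFAP2A expression vector

fc = fold_change(with_tf, control)
print(f"fold change: {fc:.2f} (a {1 / fc:.1f}-fold reduction)")
```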

Specific role of the TFAP2A binding sites present in DCBLD2/ESDN promoter
A detailed functional analysis was performed for the main TFAP2A binding site present in DCBLD2/ESDN promoter by carrying out site-directed mutagenesis to
Table 3: TFAP2A binding site sequences present in the regulatory regions of some TFAP2A-modulated genes.
gesting that each binding site plays a role in repressing the promoter activity.

Identification of Transcription Factor Binding Sites (TFBSs) present in the promoter regions of TFAP2A-regulated genes by over-represented DNA oligonucleotides or oPOSSUM or MEME analysis
The properties of the core promoter region sequences of the 264 best 20% TFAP2A-regulated genes were studied by using three different approaches based on different biological assumptions and statistical filters such as: 1) short over-represented oligonucleotides; 2) oPOSSUM; 3) MEME.

Over-represented oligonucleotides
We performed a genome-wide characterization of the previously described core promoters of human protein-coding genes, working on the statistical properties of short (5 to 9 nt) DNA oligonucleotides (oligos) present in these sequences. In particular, we identified sets of genes sharing over-represented oligos in their promoter regions according to a binomial model (see Methods for details).
We then characterized the evolutionary properties of these oligos using a "conserved over-representation" approach, an alignment-free methodology applied to human-mouse comparison [41]. The resulting sets of genes were then compared with the TFAP2A up- and down-regulated gene datasets described above, and the enrichment for oligos in TFAP2A-regulated genes was assessed in the different sets of genes using an exact Fisher test (pv < 0.05). Results were ranked according to the corresponding pv and the top 10 over-represented oligos are shown in Table 4. For some of these oligos the over-representation is conserved (indicated in Table 4 with an asterisk, *), suggesting an evolutionarily conserved role for them. When possible, a known TFBS consensus was then associated with each oligo, using the list of TRANSFAC TFBS consensus sequences reported in [42]. Interestingly, as shown in Table 4, some of the most over-represented oligos found for the TFAP2A-down-regulated genes correspond to the SP1 consensus sequence (Fisher test: 7.76E-07; 38 target genes). For the up-regulated genes we found many over-represented oligos but we were not able to link them to any known TF.

oPOSSUM
oPOSSUM [43] is able to evaluate the over-representation of known TFBS on human/mouse conserved regions. We used this tool to analyze the sets of TFAP2A-regulated genes, setting the parameters indicated in Methods. Results are reported in

MEME
MEME is a software tool for the ab initio identification of relevant motifs in a given set of sequences, in which a motif is a sequence pattern that occurs repeatedly in a group of related DNA sequences [44]. The parameters used in our analysis are indicated in Methods and the results obtained are reported in Table 6: MEME regular expression motifs and the relative E-values for the most interesting motifs are indicated. MEME results are not directly associated with known TFBSs; to investigate whether the resulting motifs could be recognized as known TFBSs, the same approach used previously for the oligo analysis was used here [42]. For this identification, a perfect match between the MEME regular expression and the IUPAC equivalent was required, as shown in Table 6, and an association between motifs and known TFs was found in some cases. Enrichment for SP1 was also found with this method, in particular for down-regulated genes (E-value: 3.1E-085; 24 target genes).
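The requirement of a perfect match between a MEME regular expression and an IUPAC consensus can be sketched as a position-by-position comparison of allowed nucleotides. The helper names and the two example patterns are illustrative assumptions and are not taken from Table 6 or from [42].

```python
import re

# Standard IUPAC nucleotide codes mapped to the bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def iupac_to_sets(consensus):
    """Turn an IUPAC consensus string into one set of allowed bases per position."""
    return [frozenset(IUPAC[code]) for code in consensus.upper()]

def meme_regex_to_sets(pattern):
    """Turn a MEME-style regular expression such as 'GG[GT]GC' into
    one set of allowed bases per position."""
    tokens = re.findall(r"\[([ACGT]+)\]|([ACGT])", pattern.upper())
    return [frozenset(group or single) for group, single in tokens]

def perfect_match(meme_pattern, consensus):
    """True only if both descriptions have the same length and allow
    exactly the same bases at every position."""
    return meme_regex_to_sets(meme_pattern) == iupac_to_sets(consensus)

# Illustrative patterns only (hypothetical, GC-box-like).
print(perfect_match("GG[GT]GCGG[AG]", "GGKGCGGR"))  # same sets at all 8 positions
print(perfect_match("GGGGCGGGG", "GGKGCGGR"))       # different length
```

A stricter or looser criterion (for example, allowing the motif to be a sub-pattern of the consensus) would require a different comparison; the equality test above is only the simplest reading of "perfect match".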

Positioning of the SP1 binding sites in the promoter regions of TFAP2A-down-regulated genes and functional validation
After having observed an enrichment for SP1 binding sites in the 157 TFAP2A-down-regulated genes with the three methods mentioned above, we searched for SP1 sites in the promoter regions of these genes using the same approach employed to recognize TFAP2A binding sites: Jaspar, SP1 Positional Weight Matrix MA0079, cut off on the best 20% score. A total of 57 genes containing at least one SP1 binding site were identified and are listed in Table 7.
The same table also indicates whether these SP1 motifs were identified with the over-represented oligo, oPOSSUM or MEME approach. SP1 binding sites were found in: 2 genes common to the oligo analysis and MEME; 12 genes common to the oligo analysis and oPOSSUM; 5 genes common to MEME and oPOSSUM; the only gene common to all three analyses was OLFML2A. It is important to underline that SP1 sites did not overlap with TFAP2A binding motifs (data not shown). Potential SP1 binding was tested by Chromatin ImmunoPrecipitation (ChIP) assay for 4 candidate target genes containing one or two best 20% SP1 and TFAP2A binding sites, as indicated by Jaspar PWMs (see Figure 6). The 4 candidate genes were CASP9 (caspase 9); KRT16 (keratin 16); KRT17 (keratin 17) and TGFBI (Transforming Growth Factor B-Induced). ChIP analysis was performed on HeLa cells, using specific anti-SP1 and anti-TFAP2A antibodies. As shown in Figure 6, enrichment for SP1 together with TFAP2A was found on the promoter of each gene compared with negative controls, suggesting a functional role for both cis-elements. Negative controls for ChIP analysis were performed using genes where no high score binding sites for either SP1 or TFAP2A were identified

Discussion
The results presented in this work show how powerful an in silico approach can be for the identification of functional Transcription Factor Binding Sites (TFBSs), in particular when computational investigations are associated with microarray analysis. In fact, while microarray analysis is, by definition, not able to discriminate between direct and indirect Transcription Factor (TF) gene expression modulation [45], the positioning of TFBSs on differentially expressed genes allows the identification of genes directly regulated by the TF of interest. By analyzing gene expression profiles of HeLa cells either silenced for TFAP2A by RNA interference or not [39] we were previously able to identify a subset of new TFAP2A-regulated genes on which it was possible to position high score TFAP2A binding sites and find an enrichment of sites compared with genome-wide abundance [5,6]. The strength of our computational approach was consolidated by experimental validations revealing TFAP2A binding to the high score TFAP2A sites identified but not to portions of DNA without TFAP2A binding sequences or containing only low score sites.
A network analysis performed with Ingenuity Pathway Analysis Systems for the TFAP2A-modulated genes containing at least two high score TFAP2A binding sites revealed "Cellular Movement" and "Cellular Development" as the main networks, confirming results previously obtained in our and other laboratories. In fact, many reports demonstrated that TFAP2A plays a major role in development [12] and we recently showed a function for TFAP2A in cell migration and/or invasion for tumor cells [39] and neurons [46]. Genes involved in both biological processes were previously identified; however, several new ones have been discovered here. Among the already known genes, PPARG, MAPK1, and VEGF [47] are present in the networks, confirming the validity of our analysis.
The "Cellular Movement" network includes genes with specific biological functions and some examples are listed here. SLIT2 (slit homolog 2 - Drosophila) is involved in induction of negative chemotaxis in neuronal cells, glial cell migration, motor axon guidance and nervous system development [48]. PDGFs (platelet-derived growth factors) are known to regulate cell proliferation as well as migration for mesenchymal or endothelial cells [49].
For each TF its consensus binding site sequence, abbreviated name, number of target genes and statistical enrichment values (E-value) are reported.
Interestingly, in vascular cells PDGFBB is known to upregulate DCBLD2/ESDN (discoidin, CUB and LCCL domain containing 2/Endothelial and Smooth muscle cell Derived Neuropilin-like molecule), the most TFAP2A-modulated gene in our microarray analysis. PDGFBB is also known to be related to another TFAP2A-modulated gene, ACTA2 (actin, alpha 2, smooth muscle, aorta), which codes for a protein belonging to the actin family that plays a role in cell motility, structure and integrity and regulates blood pressure via vascular and smooth muscle contraction [50]. RAC1 and RAC2 (ras-related C3 botulinum toxin substrate 1 and 2) are small GTPases belonging to the RAS superfamily and regulate a variety of cellular events, including growth control, cytoskeletal reorganization and protein kinase activation [51]. PDGF and RAC1 are connected with each other since it was demonstrated that all RAC1-related GTPases expressed in mouse primary fibroblasts, Cdc42, Rac1, and RhoG, are required for efficient migration following PDGF stimulation [52]. Some of the genes included in the Cellular Development network are described here. PPARs (peroxisome proliferator-activated receptors) form heterodimers with retinoid X receptors (RXRs) and regulate the transcription of various genes. Three subtypes of PPARs are known: PPARA, PPARD and PPARG. The last one regulates adipocyte differentiation and is involved in the pathology of numerous diseases including obesity, diabetes, atherosclerosis and cancer [53]. MAPK1 (mitogen-activated protein kinase 1) is a member of the MAP kinase family and it is involved in cellular proliferation, differentiation, transcription regulation and development [54]. CXCL1 (chemokine, C-X-C motif, ligand 1 - melanoma growth stimulating activity, alpha) belongs to the Chemokine family, a group of small, structurally related molecules that regulate cell trafficking of various types of leukocytes via the interaction with a subset of 7-transmembrane, G protein-coupled receptors [55]. In addition, CXCL1 is known to play a major role in inflammation, angiogenesis, tumorigenesis and wound healing [56]. The ADAMTS1 (a disintegrin-like and metalloprotease with thrombospondin type 1 motif, 1) gene encodes a member of the ADAMTS protein family that has anti-angiogenic activity, and the expression of this gene may be associated with various inflammatory processes as well as development of cancer and cachexia [57]. The protein encoded by the AREG (amphiregulin) gene is a member of the epidermal growth factor (EGF) family involved in cell growth stimulation of astrocytes, Schwann cells, fibroblasts and epithelial cells by interacting with the EGF receptor [58]. IL11 (interleukin 11) encodes a cytokine which stimulates the T-cell-dependent development of immunoglobulin-producing B cells and potentiates proliferation of hematopoietic stem cells and megakaryocyte progenitors [59].
Since it is well known that TFAP2A cooperates with other transcription factors (TFs) to regulate transcription, three different methods, over-represented oligonucleotides, oPOSSUM and MEME, were used to identify TFs that could possibly work in cooperation with TFAP2A to regulate the 264 genes containing the best score TFAP2A binding sites. Remarkably, a common enrichment for SP1 binding sites was found in genes containing at least one or two high score TFAP2A binding sites and transcriptionally activated by TFAP2A but not in the repressed ones. SP1 is known to cooperate with TFAP2A in transcription [60,61]; however, here we underline the importance of SP1 specifically in TFAP2A-mediated gene activation, but not in transcriptional repression, and localize SP1 binding sites to DNA nucleotide sequences distinct from TFAP2A binding sites, although, from our experiments, we cannot exclude a possible physical interaction between TFAP2A and SP1.
Table 7: Summary of TFAP2A-down-modulated genes containing SP1 binding sites in their promoters. A search for SP1 Transcription Factor Binding Sites (TFBS) was performed on the promoter regions (-900/+100) of 157 TFAP2A-down-modulated genes [39], mapped in ENSEMBL, containing at least one or two best 20% TFAP2A binding sites (see Table 1) using the SP1 Positional Weight Matrix (PWM) provided by Jaspar. A total of 57 genes containing SP1 TFBS were identified. The outcomes obtained with the Oligo, oPOSSUM and MEME analyses are also indicated (see Tables 4, 5 and 6).

Among the ChIP validated genes, the gene encoding Discoidin, CUB and LCCL domain containing 2/Endothelial and Smooth muscle Derived Neuropilin like (DCBLD2/ESDN) was the most modulated in our microarray analysis (FC +5.7). This, together with its interesting functions, is one of the reasons why we decided to investigate its TFAP2A-dependent transcription in detail. Its protein structure resembles that of neuropilins, transmembrane proteins which are promiscuous for ligands and co-receptors. DCBLD2/ESDN is ubiquitously expressed but linked to metastasis formation since it was cloned from highly metastatic lung cancer cells, in which it was found to be significantly up-regulated [62]. Various functional studies link DCBLD2/ESDN to tumor progression but a specific role in tumor promotion or repression has not been defined yet [63,64]. DCBLD2/ESDN expression was analyzed in melanoma and breast cell lines in our (unpublished data) and other laboratories [63,65] and was detected only in highly metastatic cells but not in their related poorly malignant variants, suggesting a positive role for DCBLD2/ESDN in tumor progression. DCBLD2/ESDN was also shown to be part of an invasive breast cancer gene signature [66]. However, in HeLa cells, used for our microarray analysis, we demonstrated that TFAP2A regulates tumor cell motility and invasion, at least partially, via DCBLD2/ESDN in a negative manner [39]. Here TFAP2A down-modulation prompted DCBLD2/ESDN up-regulation, suggesting a possible direct repressive effect of TFAP2A on DCBLD2/ESDN transcription. In our present investigation, overexpression of TFAP2A in cells expressing low or high levels of TFAP2A, MDA-MB-231 and HeLa cells respectively, led to decreased DCBLD2/ESDN promoter activity, although in MDA-MB-231 cells DCBLD2/ESDN promoter activity was higher in comparison with HeLa cells. Accordingly, TFAP2A silencing induced higher DCBLD2/ESDN promoter activity. Importantly, the negative effect of TFAP2A on the DCBLD2/ESDN promoter was dose-dependent since, when MDA-MB-231 cells were transfected with increasing levels of the TFAP2A-expression vector, a proportional down-regulation of transcription was observed. Altogether, these findings strongly suggest a direct repressive activity of TFAP2A on the DCBLD2/ESDN promoter and are in agreement with our microarray results [39]. We therefore hypothesized that, if TFAP2A represses DCBLD2/ESDN transcription, inverse expression profiles should exist for the two genes, and we consulted an on-line expression atlas for RNA expression in tumors (http://biogps.gnf.org). In some cases high DCBLD2/ESDN expression coincided with very low levels of TFAP2A in tumor cells, while in other cases high or low DCBLD2/ESDN expression co-existed with high TFAP2A expression.
In addition, for many tumors both genes showed comparably low or medium RNA levels. Since it is known that TFAP2 activity can be modulated by a wide range of interacting proteins [12], it is conceivable that differential expression and functional roles of TFAP2A co-factors may account for distinct effects on DCBLD2/ESDN gene transcription. Moreover, the presence of other TFAP2 isoforms and their relative ratios may be crucial here. On the other hand, in many cases, both TFAP2A and DCBLD2/ESDN genes might not be expressed. Finally, it is important to keep in mind that RNA levels do not always correspond to actual protein levels or activity. For instance, TFAP2 proteins are known to be modified post-translationally by phosphorylation, sumoylation and redox status, which may affect their activity and cellular localization [12]. Three high and several low score TFAP2A binding sites were identified in the promoter region of the DCBLD2/ESDN gene by computational analysis; however, we only investigated the functional role and contribution of the main TFAP2A binding sequences. By doing so we observed that each TFAP2A site was essential for repression of DCBLD2/ESDN transcription: the inactivation of one, two or three sites equally affected promoter activity in luciferase assays. These experimental validations confirm once again that our computational analyses represent a powerful tool for the identification of TF regulatory targets by predicting precisely their cis-elements [19]. To better understand the repressive effect of TFAP2A on DCBLD2/ESDN transcription, the interaction of TFAP2A with other co-factors will be studied in the future. A better comprehension of the TFAP2A-driven regulation of the DCBLD2/ESDN gene should provide novel and useful insights into mechanisms of tumor progression and metastasis formation.

Conclusions
Our study was essential for: 1) identifying functional TFAP2A binding sites in novel TFAP2A-regulated genes; 2) defining "Cellular Movement" and "Cellular Development" as the main networks in which the TFAP2A target genes are involved; 3) associating SP1 with TFAP2A-mediated transcriptional activation but not repression; 4) dissecting the TFAP2A-driven regulation of DCBLD2/ESDN, an important player in tumor progression.

Definition of promoter sequences and TFAP2A binding site identification genome-wide
Whole-genome human protein-coding gene sequences and annotations were downloaded from the ENSEMBL database, version 46 [67]. Only the longest transcript was considered for each gene and the promoter region was defined as 900 bp upstream and 100 bp downstream of the Transcriptional Start Site (TSS), i.e. -900/+100 [68]. Each promoter sequence was analyzed using the canonical TFAP2A Position Weight Matrix (PWM) MA0003 reported in the Jaspar database (http://jaspar.genereg.net/), which consists of a 9 nucleotide GC-rich sequence. Affinity scores were assigned to each TFAP2A binding site using a standard log-likelihood ratio (LLR) scoring function with intergenic background frequencies. All sites with a score exceeding 66% of the maximum possible score for the PWM were initially selected, then ranked by score. We considered the top ranking sites (the thresholds used were 10%, 20% and 30%) to identify genes carrying at least two high score sites in their regulatory region. The software described in [69] was used to rank the sites.
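The scoring step can be sketched as follows: a count matrix is converted into log-likelihood ratios against a background, and every window scoring above 66% of the maximum attainable score is kept and ranked. This is only an illustration under stated assumptions: the 9-column count matrix is a toy GCCNNNGGC-like matrix and not the real MA0003 matrix, the background is uniform rather than intergenic, a pseudocount of 1 is assumed, and only the forward strand is scanned. It is not the software of [69].

```python
import math

BASES = "ACGT"

def column(preferred=None, total=20):
    """Toy count column: strongly prefers one base, or is uniform if preferred is None."""
    if preferred is None:
        return {b: total / 4 for b in BASES}
    return {b: (total - 3 if b == preferred else 1) for b in BASES}

# Hypothetical 9-nt GC-rich matrix (GCC NNN GGC), NOT the Jaspar MA0003 counts.
TOY_COUNTS = ([column(b) for b in "GCC"]
              + [column() for _ in range(3)]
              + [column(b) for b in "GGC"])
UNIFORM_BG = {b: 0.25 for b in BASES}  # assumption; the text uses intergenic frequencies

def log_odds_matrix(counts, background, pseudocount=1.0):
    """Log-likelihood ratio of each base at each position versus the background."""
    matrix = []
    for col in counts:
        total = sum(col[b] for b in BASES) + pseudocount * len(BASES)
        matrix.append({b: math.log(((col[b] + pseudocount) / total) / background[b])
                       for b in BASES})
    return matrix

def scan_promoter(sequence, matrix, fraction=0.66):
    """Return (start, site, score) for windows scoring above fraction * max score,
    sorted from best to worst score."""
    max_score = sum(max(col.values()) for col in matrix)
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(ch not in BASES for ch in window):
            continue  # skip windows with ambiguous bases such as N
        score = sum(col[ch] for col, ch in zip(matrix, window))
        if score > fraction * max_score:
            hits.append((start, window, score))
    return sorted(hits, key=lambda hit: hit[2], reverse=True)

matrix = log_odds_matrix(TOY_COUNTS, UNIFORM_BG)
hits = scan_promoter("ATATGCCTCAGGCATAT", matrix)
for start, site, score in hits:
    print(start, site, round(score, 2))
```

Selecting the top 10%, 20% or 30% of sites would then amount to pooling the hits from all promoters, sorting them by score and keeping the corresponding fraction.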
Identification of TFAP2A potential co-factors in the (-900/+100) promoter regions using three different approaches

1) Over-represented and Conserved Oligonucleotides (Table 4)
We first classified human and mouse genes in two categories (CG-rich and CG-poor) by analyzing the CG content of their promoters, using the median CG content of the whole dataset as threshold. The two categories of genes were then independently searched for over-represented 5 to 9 bps-long oligonucleotides (oligos), where the over-representation was assessed using a binomial model [70] and the overall frequency f(w) of each oligo w was computed as:

$$f(w) = \frac{N(w)}{\sum_{w'} N(w')}$$

where N(w) is the number of times that w occurs in the entire collection of sequences and the sum runs over all oligos w' of the same length as w. In turn, n_g(w) is the number of occurrences of w in the promoter region of each gene g. The statistical significance of over-representation was determined using the binomial P-value:

$$P_g(w) = \sum_{n = n_g(w)}^{L_g} \binom{L_g}{n} f(w)^{n} \left(1 - f(w)\right)^{L_g - n}$$

where L_g is the total number of oligos of the same length as w that can be found in the promoter region of g. Self-overlapping matches of the same oligo were discarded and motifs were counted on both DNA strands. For each oligo w we defined the set S(w) of the genes whose promoter shows over-representation of w (P_g(w) < 0.01). An oligo w was defined "conserved over-represented" if the sets of genes S_human(w) and S_mouse(w) contained a significantly larger number of orthologous genes than expected by chance. Pairs of human-mouse orthologous genes were obtained from ENSEMBL. In order to obtain one-to-one orthology relationships, only orthologous genes defined as Unique Blast Reciprocal Hit were considered. The significance of the overlap between S_human(w) and S_mouse(w) was determined with Fisher's exact test, and multiple testing was taken into account by computing the False Discovery Rate (FDR) with the method of Benjamini and Yekutieli [71]. For further analysis we retained the oligos with FDR < 0.1. Additional details on this procedure can be found in [41,72]. In order to identify possible TFAP2A co-factors, the over-represented oligo sequences were directly compared with the known consensus sequences for vertebrate Transcription Factors (TFs) [42], and the association between motifs and TFs was accepted only if the over-represented oligo fully overlapped the consensus sequence (according to the IUPAC alphabet).
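As an illustration of the two statistical steps in this procedure, the following Python sketch computes an upper-tail binomial P-value for one oligo in one promoter and applies a Benjamini-Yekutieli adjustment to a list of P-values. It is a sketch under stated assumptions, not the pipeline of [41,72]: the function names and the log-space summation are choices made here, and the frequency passed in is assumed to lie strictly between 0 and 1.

```python
from math import exp, lgamma, log, log1p


def binomial_pvalue(n_obs, n_trials, freq):
    """Upper-tail probability P(X >= n_obs) for X ~ Binomial(n_trials, freq).

    n_obs    : occurrences of the oligo in one promoter
    n_trials : number of oligos of that length in the promoter
    freq     : overall frequency of the oligo, assumed 0 < freq < 1
    Terms are summed in log space so long promoters do not overflow.
    """
    if n_obs <= 0:
        return 1.0
    log_f, log_q = log(freq), log1p(-freq)
    total = 0.0
    for n in range(n_obs, n_trials + 1):
        log_term = (
            lgamma(n_trials + 1) - lgamma(n + 1) - lgamma(n_trials - n + 1)
            + n * log_f + (n_trials - n) * log_q
        )
        total += exp(log_term)
    return min(total, 1.0)


def benjamini_yekutieli(pvalues):
    """Return Benjamini-Yekutieli adjusted P-values in the input order."""
    m = len(pvalues)
    c_m = sum(1.0 / i for i in range(1, m + 1))  # harmonic correction term
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m * c_m / rank)
        adjusted[idx] = running_min
    return adjusted
```

Oligos would then be retained when their adjusted value falls below the chosen FDR threshold (0.1 in the text above).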

2) oPOSSUM (Table 5)
The oPOSSUM program [43] was used to identify Transcription Factor Binding Sites (TFBS) recognized by potential TFAP2A co-factors, considering 60% sequence conservation between human and mouse as a minimum requirement. With this approach the regulatory region explored coincided with the smallest cut-off we could choose (-2000/0), even if it included additional upstream bps compared with the over-represented oligo (see above) and MEME (see below) analyses. The other parameters were left unchanged. A Fisher's exact test with a p-value < 0.05 was performed here to identify the highly enriched TFBS [43].
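For readers unfamiliar with this kind of enrichment test, a one-sided Fisher's exact test can be written directly from the hypergeometric distribution, as in the Python sketch below. This is only an illustration of the statistic and does not reproduce oPOSSUM's implementation; the function name, the argument names and the choice of a right-tailed test are assumptions.

```python
from math import comb


def fisher_right_tailed(hits_in_set, set_size, hits_in_population, population_size):
    """Right-tailed Fisher's exact test P-value.

    Probability of seeing at least `hits_in_set` genes carrying a given TFBS
    in a gene set of `set_size` genes drawn from a population of
    `population_size` genes (the set is part of the population), of which
    `hits_in_population` carry the TFBS.
    """
    upper = min(set_size, hits_in_population)
    tail = sum(
        comb(hits_in_population, k)
        * comb(population_size - hits_in_population, set_size - k)
        for k in range(hits_in_set, upper + 1)
    )
    return tail / comb(population_size, set_size)
```

With hypothetical counts, `fisher_right_tailed(5, 5, 5, 10)` gives 1/252, the chance that all five genes of a five-gene set carry a site present in half of a ten-gene population.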

3) MEME (Table 6)
The MEME program [44] was used to identify TFAP2A potential co-factors, considering 20 bps as the maximum length of any motif (to fit the standard size of a typical TFBS) and searching for motifs on both DNA strands. To assess whether the motifs obtained by MEME may be associated with any known TFBS, each motif was assigned to a putative TF based on [42].

Localization of SP1 TFBS in the (-900/+100) promoter regions
Using the three approaches mentioned above it was possible to identify an enrichment for SP1 TFBS in the promoter regions of 157 TFAP2A-down-modulated genes mapped in ENSEMBL containing at least two high-score TFAP2A binding sites (see Table 1). In order to position the SP1 TFBS, an additional analysis using the JASPAR PWM for SP1 (MA0079) was performed on the (-900/+100) promoter regions of this group of down-regulated genes, as described for TFAP2A (see above).

Cell lines, Antibodies and DNA constructs
The following human cell lines were used; their origins and general properties, as illustrated by American Type Culture Collection (ATCC, Manassas, VA, USA), are as follows: HeLa: cervix adenocarcinoma (AC), HPV-18 positive; MDA-MB-231: breast AC, pleural effusion; HepG2: human Caucasian hepatocyte carcinoma. Each cell line was grown as suggested by ATCC. Primary antibodies used were: anti-TFAP2A mAb 3B5, anti-TFAP2A pAb C-18 and anti-GAPDH pAb V-18 (Santa Cruz Biotechnology, Santa Cruz, CA), anti-SP1 pAb (Active Motif, Carlsbad, CA) and anti-acetylated-H3 histone pAb (Upstate Biotechnology, Lake Placid, NY, USA). Secondary antibodies used were: goat anti-mouse or anti-rabbit IgG HRP-conjugated and donkey anti-goat IgG HRP-conjugated (Santa Cruz Biotechnology, Santa Cruz, CA). The pSP(RSV)TFAP2A and pSP(RSV)-empty expression vectors, a gift from Dr. H. Hurst [73,74], were respectively used to overexpress human TFAP2A in cells and as empty vector control. TFAP2AshRNA2 [39] and the pSUPER.retro.puro vector (OligoEngine, Seattle, WA, USA) were respectively used to down-modulate human TFAP2A in cells and as empty vector control.

Molecular cloning of the human Endothelial and Smooth muscle Derived Neuropilin like (DCBLD2/ESDN/CLCP1) promoter
The upstream regulatory region of the DCBLD2/ESDN/CLCP1 (discoidin, CUB and LCCL domain containing 2/Endothelial and Smooth muscle cell Derived Neuropilin-like molecule/CUB, LCCL-homology, coagulation factor V/VIII homology domains protein) gene was identified using the National Center for Biotechnology Information (NCBI) gene bank database (accession number NM_080927).

Chromatin ImmunoPrecipitation (ChIP) assays
ChIP was performed using the Magna ChIP™ G kit (Millipore, Billerica, MA) reagents and protocols. Briefly, chromatin was prepared from HeLa or HepG2 cells at ~80-90% confluence. Cells were crosslinked with 1% formaldehyde (Sigma Aldrich, St Louis, MO) for 10 min at 37°C. Chromatin shearing was obtained by digesting the DNA with an enzymatic shearing cocktail (200 U/ml) for 10 min at 37°C. Chromatin was pre-cleared with Protein G beads to reduce non-specific background. Pre-cleared chromatin was immunoprecipitated overnight using 2 μg of specific antibodies. In addition, chromatin aliquots were precipitated with either non-specific IgGs or with anti-RNA polymerase II or anti-acetyl-histone H3 antibody, used respectively as negative and positive controls. Immunoprecipitated chromatin was collected by adding protein G beads to the tubes, and the beads were washed with ChIP-IT™ Washing Buffers supplemented with protease inhibitors. Immunoprecipitated DNA was collected and, after reversing the cross-links, purified using the QIAquick® PCR Purification Kit mini spin-columns (QIAGEN, Stanford, CA), according to the manufacturer's instructions. The eluted immunoprecipitated DNA was analyzed by PCR, together with a non-immunoprecipitated chromatin sample (input). Polymerase chain reaction (PCR) was performed using Taq DNA Polymerase (Invitrogen Life Technologies, Carlsbad, CA) with 1× PCR Buffer without MgCl2, 0.2 mM dNTPs, 1.5 mM MgCl2, 0.5 μM forward and reverse primers, 0.625 U Taq DNA Polymerase and 10 μL of precipitated DNA. The annealing temperature and the number of cycles were specific for each primer pair. The different primer pairs were designed using the NCBI Primer designing tool; primer sequences as well as PCR experimental conditions are reported in Additional file 2. To verify the quality of our ChIP reactions, primers to the GAPDH promoter were used as positive controls.

Transient transfections and luciferase assays
Twenty-four hours before transfection, HeLa or MDA-MB-231 cells were seeded in 24-well plates at 8 × 10^4 cells per well. Cells were transfected using Lipofectamine 2000 (Invitrogen Life Technologies, Carlsbad, CA), following the manufacturer's instructions, with 700 ng of either pGL3-Basic (control) or the various pGL3-DCBLD2/ESDN promoter fragments generated and mentioned above, in the presence of 20 ng of pRLTK (Promega, Madison, WI) to normalize for transfection efficiency. In co-transfection experiments, 250 ng of pSP(RSV)TFAP2A or pSP(RSV)NN or TFAP2AshRNA2 or pSUPER.retro.puro vectors were used. Forty-eight hours after transfection, cell extracts were prepared by adding 100 μl of 1× Passive Reporter Lysis Buffer (Promega, Madison, WI). The luciferase activities were measured using the Dual Luciferase Assay System (Promega, Madison, WI) according to the manufacturer's instructions. Each transfection was performed in triplicate and repeated three times. For the statistical analysis a Student's t-test was performed: * p < 0.05; ** p < 0.01; *** p < 0.001.
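The normalization and comparison steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names are invented here, the pooled-variance form of Student's t-test is assumed, and only the t statistic and its degrees of freedom are returned (converting them to a p-value requires a t-distribution, for example via `scipy.stats.ttest_ind`).

```python
from math import sqrt
from statistics import mean, stdev, variance


def relative_luciferase(firefly, renilla):
    """Normalize each Firefly reading by the Renilla reading of the same well."""
    return [f / r for f, r in zip(firefly, renilla)]


def fold_change(sample_ratios, control_ratios):
    """Mean fold change versus the control vector, with the standard error
    of the replicate wells."""
    control_mean = mean(control_ratios)
    values = [x / control_mean for x in sample_ratios]
    return mean(values), stdev(values) / sqrt(len(values))


def student_t(a, b):
    """Two-sample Student's t statistic (pooled variance) and degrees of freedom."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2
```

For a triplicate transfection, each argument would hold three readings, one per well.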

Ingenuity Pathway Analysis Systems
The Ingenuity Pathways Knowledge Base http://www.ingenuity.com is currently the world's largest database of knowledge on biological networks, with annotations organized by experts. We exploited this database to define the presence of functional associations within the genes detected by microarray analysis, to identify enriched ontological gene classes and to draw simplified network connections among genes. Each network was ranked based on "scores" which consider how relevant the networks are to the genes in our input dataset. Each score is based on a p-value calculation, which takes into account the probability that the genes present in a network are found in it just by chance. Mathematically, the score is the negative exponent of the p-value from a right-tailed Fisher's exact test, that is, its negative base-10 logarithm (for example, a p-value of 10^-6 gives a score of 6).

Additional material
Additional file 1 Microarray analysis (Whole Human Genome Agilent 44 K) was performed on HeLa cells transiently transfected with either generic non-silencing (NS) or specific TFAP2A (oligo TFAP2A) siRNA oligos. The accession numbers of the 494 modulated genes [39] were unambiguously converted to ENSEMBL IDs (version 46) and used for our analysis as listed here.

Table 2: Distribution of genes containing TFAP2A binding sites.
Distribution of genome-wide or TFAP2A-modulated genes based on the absolute number of genes (or percentages, %) containing one or more high score (best 20%) TFAP2A binding sites in their core promoters.

obtain 7 bp deletions in the central portion of each TFAP2A binding site in single or multiple combinations to generate the constructs reported in Figure 5. Mutations in the TFAP2A binding sites 1 or 2 or 3 (ESDNMUT1 or ESDNMUT2 or ESDNMUT3) produced statistically significant increased promoter activity, just like mutations for multiple TFAP2A binding sites 1,2 or 1,3 or 2,3 or 1,2,3 (ESDNMUT1; ESDNMUT1,2; ESDNMUT1,3; ESDNMUT2,3; ESDNMUT1,2,3), as indicated by the Student's t-tests: * p < 0.05; ** p < 0.01 sug-

Figure 1 Sequence logo and canonical Positional Weight Matrix (PWM) of TFAP2A. Logo (A) and frequencies (B) defining the TFAP2A PWM MA0003 are shown as provided by the Jaspar database http://jaspar.genereg.net/.

Figure 2 Ingenuity Pathway Analysis Systems for the genes containing at least 2 high score TFAP2A binding sites. Specific functional networks for the 264 genes containing at least 2 high score TFAP2A binding sites were obtained using Ingenuity Pathway Analysis Systems. A simplified representation of the two main molecular networks identified, "Cellular Movement" (A) and "Cellular Development" (B), is shown. Legend: Up- or down-regulated genes are shown in red or green respectively; continuous grey lines indicate direct interactions experimentally proven in literature; dashed grey lines represent potential indirect connections; continuous black lines represent direct connections demonstrated by our microarray and ChIP analyses; dashed black lines represent potential direct connections demonstrated by our microarray results [39]. Octagons indicate genes absent in the 264 gene dataset but related to it as indicated from the literature. Red lines indicate direct connections demonstrated by our microarray, ChIP and promoter analyses.

Figure 3 In vivo Chromatin ImmunoPrecipitation (ChIP) analysis for putative TFAP2A binding sites on the core promoters of potential TFAP2A target genes. Pre-cleared chromatin from HeLa or HepG2 cells was immunoprecipitated with either non-specific IgG (IgG, negative control) or anti-RNA polymerase II or anti-acetylhistone H3 (positive controls: Positive Ctrl) or anti-TFAP2A (TFAP2A) antibodies. Immunoprecipitated DNA or non-immunoprecipitated chromatin samples (input) were amplified by PCR using primer pairs designed with the NCBI Primer Designing Tool (see Methods) across the TFAP2A putative binding sites. CD59: CD59 molecule, complement regulatory protein; CXCL1: chemokine (C-X-C motif) ligand 1 (melanoma growth stimulating activity alpha); DCBLD2/ESDN: endothelial and smooth muscle cell derived neuropilin like molecule; EREG: epiregulin; FASTK: fast kinase; GLO1: glyoxalase I; IFI44: interferon-induced protein 44; PLCXD2: pleckstrin homology-like domain family B member 2; PPARG: peroxisome proliferator activated receptor gamma; SLIT2: slit homolog 2 (Drosophila). C-, PCR negative control; FC, fold changes obtained from the microarray analysis [39]; n°, number of TFAP2A binding sites identified computationally. Three independent experiments were performed and a representative one is shown here.

Figure 4 Regulation of the DCBLD2/ESDN promoter by TFAP2A. (A) The genomic organization of the human DCBLD2/ESDN promoter (-2185/+89) is shown and the three highest score TFAP2A binding sites identified are framed and named 1 or 2 or 3. The Transcription Start Site (TSS) is considered +1. (B)(C) HeLa (B) and MDA-MB-231 (C) cells were transiently co-transfected with either pGL3-Basic (basic) or pGL3-ESDN-WT (ESDNWT) or pGL3-ESDN-del3 (del3) vectors together with pSP(RSV)TFAP2A (TFAP2A) or TFAP2AshRNA2 (shTFAP2A) or their respective control empty vectors (EV or shEV) to obtain TFAP2A over-expression or down-modulation. The pRLTK (Renilla luciferase) vector was co-transfected to evaluate transfection efficiency and perform normalizations. Forty-eight hours later Firefly Luciferase activity was measured and normalized against Renilla Luciferase. Fold Changes were calculated referring to the pGL3-basic control vector and expressed as Relative Luciferase Units (RLUs). 250 ng of TFAP2A or shTFAP2A or EV or shEV were transfected unless specified differently. Three independent experiments were performed in triplicate and a representative one is shown here. The error bars indicate the Standard Errors (SE) of the triplicates. A Student's t-test was performed to evaluate if the experiments were statistically significant. * p < 0.05; ** p < 0.01; *** p < 0.001. TFAP2A levels were measured by Western Blot (WB) analysis. Glyceraldehyde-3-phosphate-dehydrogenase (GAPDH) expression was used for protein loading controls.

Figure 6 In vivo Chromatin ImmunoPrecipitation (ChIP) analysis for putative SP1 binding sites on the core promoters of TFAP2A target genes. Pre-cleared chromatin from HeLa cells was immunoprecipitated with either non-specific IgGs (IgG, negative control) or anti-RNA polymerase II or anti-acetylhistone H3 (positive controls: Positive Ctrl) or anti-SP1 (SP1) or anti-TFAP2A (TFAP2A) antibodies (Ab). Immunoprecipitated DNA or non-immunoprecipitated chromatin samples (Input) were amplified by PCR using primer pairs designed with the NCBI Primer Designing Tool (see Methods) across the TFAP2A or the SP1 putative binding sites. ADAMTS1: ADAM metallopeptidase with thrombospondin type 1 motif, 1; CASP9: caspase 9; KRT16: keratin 16; KRT17: keratin 17; PLCXD2: pleckstrin homology-like domain family B member 2; TGFBI: Transforming Growth Factor Beta-Induced. The number (n°) of computationally predicted high score TFAP2A or SP1 binding sites is indicated. C-, PCR negative control; FC, fold changes obtained from the microarray analysis [39]. Three independent experiments were performed and a representative one is shown here.

Table 3: TFAP2A binding site sequences present in the regulatory regions of some TFAP2A-modulated genes.
05; Z-score: 14.76; 44 target genes) on down-regulated genes. An enrichment for Pax5 (Fisher score: 3.31 E-03; Z-score: 9.98; 12 target genes) and Cebpa (Fisher score: 6.67 E-03; Z-score: 8.20; 41 target genes) was observed for the up-regulated genes, even if the statistical relevance was weaker than that observed for the down-regulated genes.

Table 4: Identification of transcription factor binding sites (TFBSs) present in TFAP2A-modulated genes by over-represented DNA oligonucleotides.
For each TF its consensus binding site sequence, abbreviated name, number of target genes and statistical enrichment values (Fisher-score) are reported. * conserved over-represented oligo

Table 5: Identification of transcription factor binding sites (TFBSs) present in TFAP2A-modulated genes by oPOSSUM.
For each TF its Positional Weight Matrix (PWM) number, abbreviated name, number of target genes and statistical enrichment values (Z-score, Fisher-score) are reported.