Comparative genome analysis of cortactin and HS1: the significance of the F-actin binding repeat domain

Background In human carcinomas, overexpression of cortactin correlates with poor prognosis. Cortactin is an F-actin-binding protein involved in cytoskeletal rearrangements and cell migration by promoting actin-related protein (Arp)2/3-mediated actin polymerization. It shares high amino acid sequence and structural similarity with hematopoietic lineage cell-specific protein 1 (HS1), although their functions differ considerably. In this manuscript we describe the genomic organization of these two genes in a variety of species by a combination of cloning and database searches. Based on our analysis, we predict how the actin-binding repeat domain arose during evolution. Results Cortactin homologues exist in sponges, worms, shrimps, insects, urochordates, fishes, amphibians, birds and mammals, whereas HS1 exists in vertebrates only, suggesting that both genes were derived from an ancestral cortactin gene by duplication. In agreement with this, comparative genome analysis revealed very similar exon-intron structures and sequence homologies, especially over the regions that encode the characteristic, highly conserved F-actin-binding repeat domain. Cortactin splice variants affecting this F-actin-binding domain were identified not only in mammals, but also in amphibians, fishes and birds. In mammals, cortactin is ubiquitously expressed except in hematopoietic cells, whereas HS1 is mainly expressed in hematopoietic cells. In accordance with their distinct tissue specificity, the putative promoter region of cortactin differs from that of HS1. Conclusions Comparative analysis of the genomic organization and amino acid sequences of cortactin and HS1 provides insight into their origin and evolution. Our analysis shows that both genes originated from a gene duplication event, after which HS1 lost two repeats, whereas cortactin gained one repeat. 
Our analysis genetically underscores the significance of the F-actin binding domain in cytoskeletal remodeling, which is of importance for the major role of HS1 in apoptosis and for cortactin in cell migration.

The deduced amino acid sequence of cortactin revealed three main domains: the N-terminal acidic domain containing a DDW Arp2/3-binding motif, followed by an F-actin-binding domain of six and one-half tandem 37-amino-acid repeats; a central region; and an SH3 domain at the extreme C-terminus. The DDW Arp2/3-binding site and the actin-binding domain together regulate F-actin polymerization and dynamics by activating the Arp2/3 complex [16], and both are necessary for translocation of cortactin to sites of actin polymerization [17]. Recently, we reported the identification of two alternative splice variants of human cortactin, lacking either the 6th repeat or both the 5th and 6th repeats, which are present in normal tissues as well as in squamous cell carcinoma cell lines [14]. These splice variants differ significantly in their ability to (i) bind F-actin, (ii) cross-link F-actin, (iii) activate Arp2/3-mediated actin polymerization and (iv) induce cell migration in vitro [14]. This indicates that the number of repeats also determines the affinity for F-actin and the ability to regulate cell migration. Similar cortactin splice variants have also been reported in the mouse [18], rat [19] and frog [20]. The SH3 domain is a conserved protein module found in various signaling proteins; it mediates interactions with proteins such as N-WASP in actin polymerization, dynamin-2 in endocytosis, ZO-1 in cell-cell interactions and SHANK-2 in neuronal growth cones (reviewed in [21]). The central part of the protein, between the F-actin repeat domain and the SH3 domain, contains an alpha-helical sequence and a proline-rich region with three c-Src tyrosine phosphorylation sites [22,23] and three serine/threonine phosphorylation sites [24]. Cortactin tyrosine phosphorylation occurs in response to growth factor treatment, integrin cross-linking, bacterial invasion and cell shrinkage (reviewed in [21]). 
Tyrosine phosphorylation of cortactin reduces its F-actin cross-linking activity and is required for its ability to stimulate cell migration [13]. Since cortactin operates mainly in cytoskeletal rearrangements, it may link other proteins via its SH3 domain to sites of actin polymerization. Alternatively, serine phosphorylation of cortactin by Erk enhances, whereas tyrosine phosphorylation by Src inhibits, the activation of N-WASP by cortactin [25], thereby affecting actin polymerization. This suggests that cortactin may in the first instance be directed to sites of actin polymerization by other proteins. Thus, changes in protein expression level, phosphorylation state, the relative expression of splice variants and interactions with other proteins can all influence cell migration.
Cortactin shows the highest similarity to hematopoietic lineage cell-specific protein 1 (HS1). Human HS1 (also designated HCLS1; see GeneCards [26]) was originally isolated by its homology to the adenovirus E1A gene [27]. The overall amino acid similarity of HS1 to cortactin is 51%, but it is highest in the SH3 domain (86%) and the 37-amino-acid repeat domain (86%), except that HS1 carries only three and one-half repeats. Despite this high homology, the function of HS1 differs considerably from that of cortactin. First, HS1 is mainly expressed in hematopoietic cells [27], whereas cortactin is widely expressed in all cell types except most hematopoietic cells [28]. The two genes are co-expressed only in platelets and megakaryocytes [29,30]. Second, in concordance with this tissue distribution, HS1 is tyrosine phosphorylated after receptor cross-linking in B cells [31], T cells [32], mast cells [33] and erythroid cells [34], but at residues different from the functional phosphorylation residues in cortactin [13,23]. Third, HS1 is, like cortactin, a cytoplasmic protein, but after tyrosine phosphorylation HS1 translocates to the nucleus [35], whereas cortactin has not been found in the nucleus. This is because HS1, but not cortactin, contains a nuclear localization signal (NLS) [36,37]. Fourth, HS1 plays an important role in receptor-mediated apoptosis and proliferative responses, as demonstrated by the analysis of HS1-deficient mice [38] and WEHI-231 B lymphoma cells [37,39]. An HS1 tyrosine mutant that could not translocate to the nucleus also failed to induce apoptosis [37]. Consistent with its role in apoptosis, HS1 is able to bind the mitochondrial protein HAX-1, a Bcl-2-like protein [40]. Finally, the C-terminal SH3 domain of HS1 binds different proteins (the Ste20-related kinase HPK1 [41] and HS1-BP3 [42]) from those that bind cortactin, despite the very high amino acid sequence similarity of the two SH3 domains (86%). 
This most probably reflects their different tissue-specific expression patterns.
Cortactin and HS1 also share remarkable similarities. First, HS1 binds directly to Arp2/3 via its DDW motif and is involved in Arp2/3-mediated actin polymerization in vitro, although less efficiently than cortactin [43]. Second, HS1 binds to F-actin via its 37-amino-acid repeat domain [36], although, in contrast to cortactin, it contains only three and one-half repeats. Third, HS1 splice variants have also been detected, such as a variant lacking the 3rd repeat [44]. Fourth, HS1 is sequentially phosphorylated on three tyrosine residues by various Src family tyrosine kinases [31,45] and on two serine/threonine residues [30], although at residues different from those in cortactin [25]. Finally, both cortactin and HS1 can accumulate in podosomes, structures found in osteoclasts [46] and macrophages [47], but also in RSV-transformed cells [48] and carcinoma cells [49].
Although cortactin and HS1 share high amino acid sequence and structural similarity, their functions differ considerably. In this paper, we compare their genomic organization to gain more insight into their evolution, which may form the basis for understanding the specific functions of both genes. We describe the genomic organization and the exon-intron boundaries of human cortactin. The genomic and cDNA sequences, as well as the deduced amino acid sequence, of human cortactin were compared with cortactin and HS1 genes from other species. Genomic comparisons shed light on the evolution of the conserved F-actin-binding repeat domain, underscore its significance for HS1 and cortactin, and highlight the importance of alternative splicing for cortactin function.

Results and discussion
The genomic organization of cortactin homologues
We have previously described the isolation and sequencing of the EMS1 cDNA [28,49] (Figure 1A). By amplifying the intron sequences (smaller than 2 kb) using primers on adjacent exons, followed by end-sequencing of these products, we confirmed the intron/exon boundaries of the human EMS1/cortactin gene. Other cortactin homologues have been reported in mouse [3], rat [19], chicken [50], fruit fly (Drosophila melanogaster) [51] and frog (Xenopus laevis) [20]. We searched numerous databases for all known cortactin genes in other species (listed in Table 1). The identification is based on overall amino acid sequence and structural homology with human cortactin. Cortactin homologues exist in mammals (human, chimpanzee, cattle, pig, mouse, rat), birds (chicken), amphibians (frog), fishes (zebrafish, pufferfish), urochordates (sea squirt), invertebrates (sea urchin), insects (fruit fly, mosquito), shrimps, worms and sponges. To date, there is no evidence for the existence of cortactin in unicellular species or in plants. Thus, cortactin seems to be restricted to metazoans.
Table 1 notes (partial): (o) The cDNA and protein sequences deduced from the zebrafish genomic sequence Finished_845 are more closely related to human HS1, whereas the zebrafish mRNA/protein sequence (AF527956, AAQ09010) shows more homology to human cortactin. (p) The cDNA and protein sequences deduced from the chicken genomic sequence ENSGALG00000009778 show more homology to human HS1 [67].
For several species, both cDNA and genomic sequences (total or partial) are available, and we were therefore able to reveal their genomic organization using BLASTn. The exon/intron boundaries were determined and compared with those of human cortactin [see Additional file 1]. As schematically presented in Figure 1, the genomic organization, the lengths of the exons and the locations of the exon/intron boundaries are highly conserved from urochordates to mammals. Pufferfishes have the smallest known vertebrate genomes, owing largely to much shorter introns; nevertheless, most exon/intron boundaries are conserved and similar to those of mammalian cortactin. Intriguingly, the number of repeats in the actin-binding domain differs between species (Figure 1A-G). The number of exons and the locations of the intron/exon borders of insect cortactin (Drosophila and mosquito) differ considerably from those of mammalian cortactin, although the protein sequences are very similar. Drosophila and mosquito cortactin carry 4 repeats in the actin-binding domain. In both species, repeats 1 to 3 and repeat 4 are on separate exons; in mosquito, the 4th repeat of the actin-binding domain is encoded by a single 111-bp exon, exon 2 (Figure 1F,G). Both sponge (the lowest metazoan) and sea squirt (urochordate) cortactin proteins carry 5 repeats. During evolution, after the emergence of sponges and worms, the coelomata divided into insects and urochordates (which later evolved into vertebrates). The genomic organization of ancestors of the coelomata should reveal the roots of cortactin evolution. However, complete cDNA and/or genomic DNA sequences of cortactin homologues in these species are not yet available.
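The link between the 111-bp mosquito exon and a single repeat is codon arithmetic: 37 codons × 3 nt = 111 nt. A minimal Python sketch of that check follows; the function names are hypothetical, and it assumes the exon is entirely coding and in frame (codons split across exon boundaries are ignored).

```python
CODON_NT = 3     # nucleotides per codon
REPEAT_AA = 37   # residues in one cortactin F-actin-binding repeat


def codons_in_exon(exon_nt: int) -> int:
    """Number of complete codons in a fully coding exon of exon_nt nucleotides."""
    return exon_nt // CODON_NT


def encodes_exactly_one_repeat(exon_nt: int) -> bool:
    """True if the exon length equals one 37-residue repeat (37 x 3 = 111 nt)."""
    return exon_nt == REPEAT_AA * CODON_NT


# The 111-bp mosquito exon 2 described above
print(codons_in_exon(111))              # 37
print(encodes_exactly_one_repeat(111))  # True
```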

The genomic organization of HS1 homologues
Both nucleotide and amino acid sequence comparisons with cortactin revealed the highest similarity to hematopoietic lineage cell-specific protein 1 (HS1). So far, HS1 homologues have been reported in human [27], mouse [33], rat and chimpanzee (NCBI database), suggesting that HS1 exists in mammals only. We determined the intron/exon boundaries of mammalian HS1 genes by aligning the cDNA with the genomic DNA using BLASTn (Figure 1H). In addition to the single cortactin homologue found in each of the other species, nucleotide sequence comparisons using the mammalian HS1 mRNA and genomic DNA sequences revealed (incomplete) genomic sequences in chicken, pufferfish, zebrafish and frog (Table 1 and Figure 1I-M) that were more closely related to the HS1 protein (Figure 3 and [see Additional file 3]). Because no HS1 homologues for these species were present in the mRNA/dbEST databases (except for X. laevis HS1), the cDNA (and corresponding protein) sequences were deduced from the genomic DNA with BLASTn or were predicted by Ensembl. Thus, two cortactin-related proteins exist in these non-mammalian vertebrates. To distinguish between cortactin and HS1 variants, only the most conserved N-terminal part of the cortactin and HS1 protein variants, up to and including repeat 3 (corresponding to amino acids 1-190 of human cortactin), was used in BLASTp analysis. In each species, one protein variant turned out to be more homologous to human cortactin and was called cortactin, whereas the other was more closely related to HS1 and was called HS1. This analysis unveiled HS1 proteins with more than 3 repeats in chicken and the pufferfish Tetraodon nigroviridis (4 1/2 repeats), the pufferfish Takifugu rubripes and Xenopus laevis (5 1/2 repeats) and zebrafish (6 1/2 repeats) (Figure 1I-M).
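The paralog assignment described above (comparing each species' N-terminal region with human cortactin and human HS1 and naming it after the better match) can be sketched as follows. This is not the authors' pipeline: as a stand-in for BLASTp it uses a plain Needleman-Wunsch global alignment score with a simple match/mismatch/gap scheme, and the sequences are short hypothetical placeholders, not real cortactin or HS1 fragments.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch score with a linear gap penalty (score only, no traceback)."""
    prev = [j * gap for j in range(len(b) + 1)]  # row for an empty prefix of a
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[-1]


def classify_paralog(query: str, cortactin_ref: str, hs1_ref: str) -> str:
    """Label a query 'cortactin' or 'HS1' by its better-scoring reference (ties -> cortactin)."""
    s_ctn = global_alignment_score(query, cortactin_ref)
    s_hs1 = global_alignment_score(query, hs1_ref)
    return "cortactin" if s_ctn >= s_hs1 else "HS1"


# Hypothetical placeholder sequences (NOT real N-terminal fragments).
CORTACTIN_REF = "MWKDDWETGFGGKFG"
HS1_REF = "MSKHAAPSGYEEKYG"
query = "MWKDDWEQGFGGKYG"

print(global_alignment_score(query, CORTACTIN_REF))     # 11
print(classify_paralog(query, CORTACTIN_REF, HS1_REF))  # cortactin
```

A real analysis would use substitution matrices (e.g. BLOSUM62) and affine gaps, as BLASTp does; the identity scheme here only keeps the example short.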
Moreover, alignment of the exon/intron boundaries of these HS1 genes with those of the mammalian HS1 genes [see Additional file 2] revealed that exon 7 (repeat 3) of HS1 was most similar to exon 10 (repeat 5) of cortactin, suggesting that in mammals, exons 8 and 9 (encoding repeats 3 and 4) of HS1 were lost during evolution. This is supported by the presence of at least one 111-nucleotide sequence in the 5670-bp intron 6 of human HS1 (positions 3271-3381) that the program HMMER detects when searching with a consensus sequence of the 37-amino-acid repeats. However, this sequence is not functional: it does not qualify as an exon according to the consensus sequence of exon-intron junctions (the 'gt ... ag' rule of intron sequences), and no human HS1 transcripts or ESTs including this sequence are present in the NCBI databases. In summary, HS1 is not restricted to mammals but also exists in fishes, amphibians and birds, and its genomic structure is very similar to that of cortactin.
Figure 1. Exon map of cortactin and HS1 from different species.
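The reason for rejecting the HMMER hit in intron 6 (no canonical splice junctions) can be expressed as a small check on an interval of an intron. The sketch below, with hypothetical names and a toy sequence, takes 1-based inclusive coordinates (as in 3271-3381, i.e. 111 nt) and reports the interval length, whether it is a multiple of 3, and whether the interval is preceded by an 'ag' acceptor and followed by a 'gt' (or the rarer 'gc') donor dinucleotide. Real splice-site assessment also weighs the wider consensus context, which this sketch ignores.

```python
def candidate_exon_report(intron: str, start: int, end: int) -> dict:
    """Inspect the 1-based, inclusive interval [start, end] of an intron sequence.

    Reports the interval length, whether it is a multiple of 3, and the two
    nucleotides immediately upstream (expected 'ag' for a 3' splice site) and
    downstream (usually 'gt', rarely 'gc', for a 5' splice site).
    """
    seq = intron.lower()
    if start < 3 or end < start or end + 2 > len(seq):
        raise ValueError("interval must leave 2 nt of flank on both sides")
    s0 = start - 1                 # 0-based index of the first interval base
    length = end - start + 1
    acceptor = seq[s0 - 2:s0]      # last 2 nt before the interval
    donor = seq[end:end + 2]       # first 2 nt after the interval
    return {
        "length": length,
        "multiple_of_3": length % 3 == 0,
        "acceptor": acceptor,
        "donor": donor,
        "canonical_flanks": acceptor == "ag" and donor in ("gt", "gc"),
    }


# Hypothetical toy intron: positions 6-14 are flanked by 'ag' ... 'gt'.
toy = "aaaag" + "c" * 9 + "gtaaa"
report = candidate_exon_report(toy, 6, 14)
print(report["length"], report["canonical_flanks"])  # 9 True
```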

Different promoter regions explain distinct tissue specificity of cortactin and HS1
Cortactin is widely expressed in most cell types, suggesting that it is important for vital functions, while HS1 expression is restricted to hematopoietic cells, suggesting that it was tailored later in evolution to serve a specific function in these cells. In view of their tissue-specific expression patterns, we hypothesized that their expression is differently regulated. Therefore, we compared the upstream promoter regions of several cortactin and HS1 genes (Figure 2). The 5' flanking region of the mammalian cortactin gene is very GC-rich and contains putative Sp1 transcription factor binding sites, which are common to many TATA-less promoters and typical of promoter regions of widely expressed 'housekeeping' genes. The HS1 promoters contain putative binding sites for Ets family transcription factors, which are specific for hematopoietic cells, are involved in controlling the expression of many B-cell- and macrophage-specific genes [52], and are critical for the development of the lymphoid and myeloid cell lineages. The promoter regions of Drosophila and mosquito cortactin contain putative binding sites for transcription factors found in both mammalian cortactin and HS1 promoters. Thus, at least in mammals, the nature of the promoters seems to determine both the broad expression of cortactin in various tissues except most hematopoietic cells and the restriction of HS1 expression to hematopoietic cells.
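The promoter comparison rests on base composition and transcription factor binding sites. As a rough illustration only, the sketch below computes GC content and counts matches, on both strands, to two simplified consensus cores: the Sp1 GC box (GGGCGG) and the Ets core (GGAA/GGAT). The analysis in this paper used TFSEARCH; exact string matching like this is a hypothetical simplification that will both miss and over-call sites, and the toy fragment is invented.

```python
import re

SP1_GC_BOX = "GGGCGG"  # simplified Sp1 GC-box core (assumption)
ETS_CORE = "GGA[AT]"   # simplified Ets core, GGAA or GGAT (assumption)

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def gc_content(seq: str) -> float:
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s) if s else 0.0


def count_motif_both_strands(seq: str, pattern: str) -> int:
    """Count (possibly overlapping) regex matches on the forward and reverse strands."""
    s = seq.upper()
    lookahead = re.compile(f"(?={pattern})")  # zero-width match, so overlaps count
    forward = sum(1 for _ in lookahead.finditer(s))
    reverse = sum(1 for _ in lookahead.finditer(reverse_complement(s)))
    return forward + reverse


toy_promoter = "GGGCGGATTTCC"  # hypothetical fragment, not a real promoter
print(round(gc_content(toy_promoter), 3))                  # 0.667
print(count_motif_both_strands(toy_promoter, SP1_GC_BOX))  # 1
print(count_motif_both_strands(toy_promoter, ETS_CORE))    # 2
```

Because both strands are scanned independently, a palindromic site would be counted twice; weight-matrix tools handle scoring and strand reporting more carefully.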

The significance of the actin binding repeat domain in cortactin and HS1
We recently reported the identification of two alternative splice variants of human cortactin: SV1-cortactin, lacking the 6th repeat, and SV2, lacking the 5th and 6th repeats, which show altered F-actin-binding properties and decreased cell migration [14]. As shown in Table 1, cortactin splice variants exist in other mammals as well as in chicken and frog. So far, splice variants have not been identified in other species, suggesting that alternative splicing of cortactin is restricted to higher metazoans. All cortactin intron sequences bordering the splice site junctions follow the general GT/AG rule [53] except for intron 11 (GC/AG) [see Additional file 1]. As has been shown for other genes, a GT-to-GC transition might be responsible for the generation of an alternative mRNA transcript [54]. However, in frog (Xenopus laevis), the SV1-cortactin variant exists even though the splice donor of intron 11 begins with GT [20]. Thus, across the genomes of these species, alternative splicing of the actin-binding domain of cortactin seems to have been facilitated during evolution by a GT-to-GC transition at the splice donor, creating cortactin variants that influence cellular properties [14]. The tissue-dependent relative expression of cortactin splice variants [14] suggests that they might have tissue-specific functions, such as fine-tuning the organization of the F-actin cytoskeleton and consequently regulating cell adhesion and migration.
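Two checks underlie this discussion: which dinucleotides border an intron (GT/AG versus GC/AG), and whether skipping one or more repeat-encoding exons keeps the downstream reading frame intact. The sketch below, with hypothetical function names, assumes intron sequences are given 5' to 3' and that each skipped exon length is known; the 111-bp value in the example is the single-repeat exon length mentioned earlier, used only as an illustration.

```python
def intron_dinucleotides(intron: str) -> str:
    """Return 'donor/acceptor' dinucleotides of an intron given 5'->3', e.g. 'GT/AG'."""
    s = intron.upper()
    if len(s) < 4:
        raise ValueError("intron too short")
    return f"{s[:2]}/{s[-2:]}"


def skipping_preserves_frame(skipped_exon_lengths) -> bool:
    """True if removing exons of these lengths (nt) keeps the downstream reading frame."""
    return sum(skipped_exon_lengths) % 3 == 0


print(intron_dinucleotides("gtaagtctttcag"))  # GT/AG (canonical)
print(intron_dinucleotides("gcaagtctttcag"))  # GC/AG (non-canonical donor)
print(skipping_preserves_frame([111]))        # True: one 37-codon repeat
print(skipping_preserves_frame([111, 111]))   # True: two repeats
print(skipping_preserves_frame([100]))        # False: frameshift
```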
Alternative splicing also occurs in human HS1. Recently, a splice variant lacking the 3rd repeat (exon 7) was found in a patient with systemic lupus erythematosus (SLE) [44], resulting in enhanced BCR-mediated cell death. This alternative splicing event was due to a germline mutation. In contrast, the splice donor of HS1 intron 6 begins with GC [see Additional file 2]. Given the similarities between cortactin and HS1, it might be of interest to investigate whether splicing of HS1 exon 6 occurs and what its biological consequences might be. The 3rd repeat and its NLS link HS1 to a role in apoptosis, whereas no such role has been described for cortactin, which lacks an NLS. Since the cytoskeleton architecture of hematopoietic lineage cells is very different from that of adherent cells, it is likely that HS1 plays an important role in the construction of tissue-type-specific actin networks. Other actin cytoskeleton factors, such as the WASP family activators of the Arp2/3 complex, have also been reported to have distinct tissue-specific expression profiles. Thus, the apparent role of HS1 in apoptosis is likely due to its function in actin remodeling. Additionally, our genomic comparisons revealed that the 3rd repeat of HS1 corresponds to the 5th repeat of cortactin; it might therefore be of interest to investigate whether the cortactin SV2 variant (lacking the 5th and 6th repeats) is involved in apoptosis.
The 4th repeat of cortactin has been suggested to be required for F-actin binding [17]. Genomic comparisons revealed that HS1 lacks this 4th repeat. Nonetheless, HS1 does bind to F-actin and activates the Arp2/3 complex, although less efficiently than cortactin [43]. This suggests that not only a single repeat but also the number of repeats is crucial for F-actin-binding affinity [14,18]. In addition, HS1 contains a PIP2-binding site in each of its 3 repeats, whereas cortactin has only one, in the 4th repeat.
PIP2 reduces F-actin cross-linking by cortactin, probably owing to competition for the same binding site. Because of its higher affinity for PIP2 [36], HS1 restores cortactin-mediated F-actin cross-linking by trapping PIP2. This might be of importance in platelets and megakaryocytes, where both cortactin and HS1 are expressed. Taken together, the composition of the repeat domain is also involved in diversifying the functions of both genes.
Figure 2. A schematic view of 800 bp of the proximal promoters.
An elegant way to study the function of a protein is to perform loss-of-function experiments. So far, cortactin knockout models have not been generated successfully, because deletion of one cortactin allele leads to premature differentiation of embryonic stem cells (personal communication in [55]). However, complete loss-of-function mutants of the Drosophila cortactin gene were viable and fertile, apart from impaired border cell migration during oogenesis [56]. Down-regulation of cortactin by RNA interference revealed an essential role for cortactin in dendritic spine morphogenesis [57] and in E-cadherin-mediated contact formation in epithelial cells [58]. Mice lacking HS1 showed normal development of the lymphoid system [38]; however, antigen-receptor-induced clonal expansion and deletion of B and T lymphocytes were impaired. Thus, loss-of-function studies underscore the divergent functions of HS1 and cortactin in different cell systems.
Figure 3. Phylogenetic relationship of cortactin and HS1 genes.

Cortactin and HS1 are derived from an ancestral vertebrate cortactin-gene by gene duplication
To examine the genesis of the cortactin family, we studied the relationship between the cortactin and HS1 homologues by generating a phylogenetic tree based on a multiple sequence alignment with the ClustalW 1.83 program [see Additional file 3]. We compared the N-terminal regions up to and including repeat 3 (corresponding to amino acids 1 to 190 of human cortactin), because this is the best-conserved region among all homologues (Figure 3). One cluster contains all known HS1 proteins and appears to be most closely related to a cluster composed of insect (mosquito (Ag), Drosophila (Dm)), sea urchin (Sp) and sponge (Sd) cortactins. This last cluster contains all species that have only one gene (with the highest similarity to cortactin). This suggests that with the appearance of the vertebrates, an ancestral gene was duplicated to create two genes, which later evolved into cortactin and HS1. This hypothesis is supported by the facts that many genes were duplicated at this stage of evolution, that the overall amino acid sequences of both genes are very similar, and that the introns are located at the same amino acid positions. Furthermore, gene duplication often correlates with a tissue-specific expression pattern of the duplicated genes, which is true for mammalian cortactin and HS1. Figure 4 displays a hypothetical model for the origin of the cortactin and HS1 genes during evolution. The oldest lineage in this model is the sponge, which, like the sea squirt (urochordate), carries one cortactin protein with 5 1/2 repeats. Insects also have a single cortactin gene, which evolved to 4 1/2 repeats. During evolution, after the emergence of sponges and worms, the coelomata divided into insects and urochordates (which later evolved into vertebrates). This suggests that the number of repeats decreased during evolution in the insect lineage. 
Unfortunately, no genomic sequences of ancestors of the coelomata that could reveal the roots of cortactin evolution are available yet to perform more detailed genomic analysis.
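The tree in this study was built with ClustalW. To illustrate how pairwise distances from such an alignment become a tree topology, the sketch below implements UPGMA (unweighted pair group method with arithmetic mean), a simpler average-linkage method that assumes a constant evolutionary rate; it is a stand-in, not the method used here. Labels and distances in the example are hypothetical, and branch lengths are omitted from the Newick output.

```python
def upgma(labels, distances):
    """Return a UPGMA tree topology as a Newick string.

    labels: leaf names; distances: {(a, b): d} for every unordered leaf pair.
    Branch lengths are omitted; ties are broken by dictionary order.
    """
    size = {lab: 1 for lab in labels}
    d = {frozenset(pair): dist for pair, dist in distances.items()}
    while len(size) > 1:
        a, b = sorted(min(d, key=d.get))  # closest pair of clusters
        merged = f"({a},{b})"
        sa, sb = size.pop(a), size.pop(b)
        for c in size:  # size-weighted average distance to every other cluster
            d[frozenset((merged, c))] = (
                sa * d[frozenset((a, c))] + sb * d[frozenset((b, c))]
            ) / (sa + sb)
        # drop all distances that involve the two merged clusters
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        size[merged] = sa + sb
    (root,) = size
    return root + ";"


# Hypothetical distances between four placeholder sequences A-D.
toy = {("A", "B"): 0.2, ("A", "C"): 0.5, ("A", "D"): 0.6,
       ("B", "C"): 0.5, ("B", "D"): 0.6, ("C", "D"): 0.3}
print(upgma(["A", "B", "C", "D"], toy))  # ((A,B),(C,D));
```

Neighbor-joining, which does not assume a clock, is usually preferred for divergent protein families; UPGMA is shown only because it is the shortest correct clustering example.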
The genome of the pufferfish Takifugu rubripes contains two cortactin-related genomic sequences, both including 5 1/2 repeats. Most likely, an ancestral vertebrate cortactin gene underwent gene duplication. From this point on in evolution, two cortactin/HS1-related genes are present in all higher species. One gene evolved into mammalian HS1, with a specific function in apoptosis in hematopoietic cells. For this function, exons 8 and 9 (encoding repeats 3 and 4) were apparently not required and were lost during evolution. However, the HS1 proteins of the pufferfish Takifugu rubripes and the frog Xenopus laevis contain 5 1/2 repeats, while chicken and pufferfish Tetraodon nigroviridis HS1 carry 4 1/2 repeats. It might be of interest to investigate the function of these HS1 proteins and how they differ functionally from mammalian HS1. The other gene evolved into the ubiquitously expressed mammalian cortactin protein, with a vital function in the organization of the cytoskeleton and cell migration. The 6th repeat of cortactin most likely originated from a duplication of the 5th repeat, since the 6th repeat is most similar to the 5th repeat in all species with 6 1/2 repeats. We recently demonstrated that 6 1/2 repeats are necessary for optimal F-actin cross-linking activity and cell migration, while the splice variant lacking both the 5th and 6th repeats (SV2) was less efficient [14]. Thus, the number of repeats in the F-actin-binding domain of cortactin fine-tunes its function in cytoskeletal remodeling. Consistent with this, in higher metazoans, alternative splicing of the F-actin-binding domain is most likely facilitated by a GT-to-GC transition in the splice donor. Alternatively, we cannot exclude that the gene duplication took place after duplication of the 5th repeat (dotted arrows in Figure 4), since both zebrafish cortactin and HS1 contain 6 1/2 repeats.
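The argument that repeat 6 arose from repeat 5 rests on repeat 6 being most similar to repeat 5. Because the repeats have a fixed length (37 amino acids), ungapped identity is a reasonable first comparison. The sketch below, with hypothetical names, returns the repeat most identical to a chosen target; the 8-residue strings in the example are invented placeholders, not real cortactin repeats.

```python
def fractional_identity(a: str, b: str) -> float:
    """Ungapped identity between two equal-length sequences, as a fraction 0..1."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def closest_repeat(repeats: dict, target: int) -> int:
    """Return the repeat number (key) most identical to repeats[target]."""
    others = [k for k in repeats if k != target]
    return max(others, key=lambda k: fractional_identity(repeats[target], repeats[k]))


# Invented 8-residue placeholders; real cortactin repeats are 37 residues long.
toy_repeats = {1: "GFGGKYGI", 2: "GFGGKFGV", 3: "GFGGKYGV",
               4: "GWDHQEKL", 5: "GFDYKEKL", 6: "GFDYKEKV"}
print(closest_repeat(toy_repeats, 6))  # 5
```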

Conclusions
We report the genomic organization of cortactin and HS1 genes of several species. These genes display a conserved genomic organization, as their coding regions have almost identical exon/intron structures. Comparison of the 5' flanking sequences reveals possible regulatory elements that may underlie their specific tissue distribution. Comparative analysis of the genomic organization and amino acid sequences of cortactin and HS1 provides insight into the evolution of the conserved actin-binding repeat domain, which forms the basis for understanding the specific functions of both genes. Most likely, both genes originated from a gene duplication event, after which HS1 lost two repeats, whereas cortactin gained one repeat. Our analysis genetically underscores the significance of the F-actin-binding domain in cytoskeletal remodeling, which is important for the major role of HS1 in apoptosis and of cortactin in cell migration.

The genomic structure of human cortactin
To determine the genomic structure of the human cortactin gene, an algorithm was applied based on the consensus sequence of exon-intron junctions (the 'gt ... ag' rule of intronic sequences) as well as on the codon usage within the ORF [61].
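The ORF component of such a prediction can be sketched as a search for the longest ATG-to-stop open reading frame on the forward strand. This covers only one ingredient: the codon-usage scoring and splice-junction model of [61] are not reproduced, and the function name and toy sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> tuple:
    """Return (start, end) of the longest ATG...stop ORF on the forward strand.

    Coordinates are 0-based and half-open, and include the stop codon;
    (-1, -1) means that no complete ORF was found.
    """
    s = seq.upper()
    best = (-1, -1)
    for frame in range(3):
        start = None
        for i in range(frame, len(s) - 2, 3):  # every complete codon in this frame
            codon = s[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) > (best[1] - best[0]):
                    best = (start, i + 3)
                start = None
    return best


toy_cdna = "CCATGAAATTTTAGCC"  # hypothetical sequence
start, end = longest_orf(toy_cdna)
print(start, end, toy_cdna[start:end])  # 2 14 ATGAAATTTTAG
```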
To confirm the predicted genomic structure, we determined the intron/exon boundaries using a cloning procedure as described [62]. Genomic DNA of two cosmid clones, COS-7.12 and COS-3.72, covering the cortactin gene as determined with the full-length cDNA [5], was amplified with randomly selected primers from the cDNA sequence (GenBank accession no. M98343). All PCR products that were larger than those from the cDNA control sample were considered to contain intron sequences and were compared with the genomic sequence (accession numbers AP000487 and AP000405) using BLASTn [59]. Introns 1, 5, 8, 12 and 13 were too large to obtain a reliable sequence.
Figure 4. Model for the origin of cortactin and HS1 during evolution.
Because no overlapping genomic sequences immediately 5' of the first exon were present in the database, we sequenced a 2.7-kb HincII-HincII fragment representing the first exon and its 5'-flanking sequences from cosmid COS-7.12, cloned into pUC18 (p5'EMS_3135). In addition, we sequenced a 5-kb PCR product obtained with a 5' primer in the vector (within the TET gene) and a 3' primer (p3135p601: 5'-ccgggtcggccctggattcc-3') within exon 1, subcloned into pUC18 (p5'EMS_4911). The nucleotide sequences of both products were compared with the genomic clones representing the cortactin gene in the NCBI database (accession numbers AP000487 (GI 8118774 and GI 6277297) and AP000405 (GI 8118742)) and used to define the 7.4-kb 5'-flanking region. The PROSCAN program [63] from BIMAS was used to define the 316-bp promoter region preceding exon 1. Putative transcription factor binding sites were determined with the TFSEARCH program [64] and are graphically represented in Figure 2. Sequences of human cortactin were submitted to NCBI GenBank [65] under accession nos. M98343 (cDNA) and AJ288897 (promoter).

Database searching
The (deduced) protein and genomic sequences of all cortactin and HS1 genes were retrieved from various websites, and the available sequence data are summarized in Table 1. In addition, partial cortactin sequences (ESTs and/or genomic) of various organisms were identified based on amino acid sequence homology with known cortactin proteins. The genomic organization of the sea squirt and Takifugu rubripes genes could not be completely elucidated, because cDNA/genomic sequences were only partially available. All data were compiled using BLAST searches of the following databases: National Center for Biotechnology Information [75] and UCSC Genome Bioinformatics (Santa Cruz, CA, USA) [76].
To determine the exon/intron boundaries of all cortactin and HS1 genes, the available genomic sequences were aligned with each species-specific cDNA sequence using the BLAST program of NCBI. Using the same algorithms as described for human cortactin, the exon/intron boundaries could be predicted. The complete genomic sequences of the 5' flanking regions of cortactin from human, chimpanzee, mouse, rat, fruit fly and mosquito were determined using the accession numbers of genomic DNA listed in Table 1. Putative transcription factor binding sites in 800 bp of the 5' flanking regions were determined with the TFSEARCH program (Figure 2). The putative exon in intron 6 of HS1 was predicted with the bioinformatics program HMMER [77]. The 6 1/2 repeats of the human cortactin actin-binding domain were aligned, resulting in a consensus sequence (kfGvqkdrvDksAvGfdyqekvekhesqkDysk). This consensus sequence was searched with HMMER against intron 6 of human HS1 in a tBLASTn-like manner (protein query against translated nucleotide sequence). With an acceptable E-value (0.095), the program predicted an exon in intron 6 (at positions 3271-3381).
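A consensus such as the one above can be derived column by column from the aligned repeats. The sketch below assumes equal-length, already-aligned repeats and uses a hypothetical case rule (uppercase when the most frequent residue occurs in at least 75% of repeats, lowercase otherwise) to mimic mixed-case output; it does not reproduce HMMER's profile-based consensus, and the example repeats are invented.

```python
from collections import Counter


def consensus(aligned, threshold=0.75):
    """Majority-rule consensus of equal-length aligned sequences.

    The most frequent residue in each column is written in uppercase if its
    frequency is at least `threshold`, and in lowercase otherwise.
    """
    n = len(aligned)
    if n == 0 or len({len(s) for s in aligned}) != 1:
        raise ValueError("need at least one sequence, all of equal length")
    out = []
    for column in zip(*aligned):
        residue, count = Counter(column).most_common(1)[0]
        out.append(residue.upper() if count / n >= threshold else residue.lower())
    return "".join(out)


# Invented 8-residue aligned repeats (not real cortactin sequence).
toy_aligned = ["GFGGKFGV", "GFGGKYGV", "GFDYKEKV", "GFGGKYGI"]
print(consensus(toy_aligned))  # GFGGKyGV
```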

Amino acid sequence comparisons
Sequence alignments were carried out using the BLAST program of NCBI. Multiple sequence alignments of the various cortactin proteins were constructed using Basic GeneBee ClustalW 1.83 [78]. Genome, cDNA or protein sequences were compiled for all cortactin homologues, but the number of repeats differs across species and between HS1 and cortactin. Therefore, only the N-terminal part of the cortactin and HS1 proteins up to and including repeat 3 (corresponding to amino acids 1-190 of human cortactin) was used to generate a phylogenetic tree, because this is the most conserved part. Predicted nuclear localization signal sequences were obtained using the Predict NLS program [79].
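NLS prediction programs compare sequences against collections of known NLS motifs. As a much narrower illustration, the sketch below scans for the classical monopartite NLS consensus K-(K/R)-X-(K/R) and reports overlapping matches; the pattern choice is an assumption, and the example is a toy peptide containing the SV40 large T antigen NLS (PKKKRKV).

```python
import re

# Classical monopartite NLS consensus K-(K/R)-X-(K/R); an assumed, simplified pattern.
MONOPARTITE_NLS = re.compile(r"(?=(K[KR].[KR]))")  # lookahead keeps overlapping hits


def find_monopartite_nls(protein: str) -> list:
    """Return (0-based position, matched motif) for every overlapping match."""
    return [(m.start(), m.group(1)) for m in MONOPARTITE_NLS.finditer(protein.upper())]


toy_peptide = "MAPKKKRKVED"  # toy peptide containing the SV40 NLS PKKKRKV
print(find_monopartite_nls(toy_peptide))  # [(3, 'KKKR'), (4, 'KKRK')]
```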

Authors' contributions
AGSHvR designed the study on comparative genome analysis, performed database searches, sequence alignments and gene structure prediction and drafted the manuscript. ESS designed, conducted and analyzed the cloning and sequencing of the promoter of human cortactin. VvBvS conducted and analyzed the PCR and sequencing experiments of the exon-intron boundaries of human cortactin and its splice variants. PMK read the manuscript and provided comments. ES helped with writing the paper, provided overall technical guidance and coordination. All authors read and approved the final manuscript.