Chicken genome analysis reveals novel genes encoding biotin-binding proteins related to avidin family

Background A chicken egg contains several biotin-binding proteins (BBPs), whose complete DNA and amino acid sequences are not known. In order to identify and characterise these genes and proteins we studied chicken cDNAs and genes available in the NCBI database and chicken genome database using the reported N-terminal amino acid sequences of chicken egg-yolk BBPs as search strings. Results Two separate hits showing significant homology for these N-terminal sequences were discovered. For one of these hits, the chromosomal location in the immediate proximity of the avidin gene family was found. Both of these hits encode proteins having high sequence similarity with avidin suggesting that chicken BBPs are paralogous to avidin family. In particular, almost all residues corresponding to biotin binding in avidin are conserved in these putative BBP proteins. One of the found DNA sequences, however, seems to encode a carboxy-terminal extension not present in avidin. Conclusion We describe here the predicted properties of the putative BBP genes and proteins. Our present observations link BBP genes together with avidin gene family and shed more light on the genetic arrangement and variability of this family. In addition, comparative modelling revealed the potential structural elements important for the functional and structural properties of the putative BBP proteins.


Background
Chickens are known to produce several different proteins which bind biotin in a non-covalent fashion. One of them is avidin, which is expressed by oviduct cells upon progesterone induction and is then transferred to the egg-white where it constitutes a minor fraction of the total protein content of the egg-white [1]. Independently of progesterone, avidin expression is also induced by an inflammation response in almost all of the studied chicken tissues [2]. Another biotin-binder, called literally biotin-binding protein (BBP), is presumably induced by estrogen [3] and secreted from the liver into chicken plasma [4]. From plasma, the BBP is thought to be deposited in egg-yolk [5]. In addition, Seshagiri and Adiga found another egg-white BBP, distinct from avidin, the biochemical characteristics of which resemble those reported for yolk BBP [6].
The affinities that avidin and yolk BBP exhibit toward biotin are extremely high, the dissociation constant being femtomolar for avidin [1] and picomolar for BBP [7]. According to the published data, the yolk BBP serves as a biotin reserve for the developing embryo and hence it is saturated with the vitamin [5]. In contrast, avidin in egg-white is mainly found as an apoprotein and it is assumed to function as an antimicrobial agent that harvests free biotin from its environment [1,2]. Because of its high affinity for biotin, avidin has long been used as a separation, labelling and targeting tool in various bioscience fields [8].
The yolk BBP has been further characterised to consist of two different forms, BBP-I and BBP-II [3,9]. It has been proposed that BBP-I is the primary gene product (67 kDa) that is converted by proteolytic cleavage to BBP-II (19 kDa) [4]. The biological function of BBP-I is believed to be a general biotin transporter in plasma, whereas the actual deposition role in egg-yolk is reserved for BBP-II [3]. BBP-II is a tetrameric protein, like avidin, and is composed of subunits homologous to each other. BBP-I has been thought to be a pseudotetramer containing four binding domains in a polypeptide chain and its gene should, therefore, contain four subsequent repeats encoding for similar peptide sequences [4,9]. The egg-white BBP was also reported to exist in a large form similar to yolk BBP-I [6]. Interestingly, some egg-laying species, such as turkeys and alligators, showed only one type of BBP in the yolk of their eggs [10].
Despite their similar function, some biochemical properties of BBPs and avidin are different. The pI of the chicken yolk BBP (not defined which form) was reported to be 4.6 [7] in contrast to avidin which has a highly basic pI (≈ 10.4) [1]. BBP-I exhibits higher thermostability, being active at 60°C, whereas BBP-II is denatured at temperatures above 40°C [10]. Bush and White III have published the N-terminal amino acid sequences for both chicken yolk BBP forms, which are highly similar to each other and also resemble the avidin N-terminal sequence [4,10]. BBP-I has been proposed to be a glycoprotein [7] whereas BBP-II is shown to be nonglycosylated [10]. Avidin is known to contain one N-linked carbohydrate moiety per subunit [1]. Differences in the radiobiotin exchange rates between these two BBP forms have also been observed: BBP-I showed slower exchange than BBP-II [10].
The published N-terminal sequences, the similar overall sizes of the proteins and the tetrameric appearance of BBP-II as well as the reported high biotin-binding affinities suggest that the BBPs could be related to avidin. Structurally and functionally, avidin is considered to be a member of the larger protein superfamily called calycins [11]. These proteins form a large and divergent family of relatively small extracellular proteins which typically bind small hydrophobic ligands. Two particular groups of the calycin protein family, the lipocalins and avidins, are β-barrels composed of eight antiparallel β-strands. The ligand is bound inside the protein at one end of the β-barrel [12]. An interesting feature among the lipocalins and avidins is a structural signature wherein a conserved basic amino acid residue, close to the last β-strand, packs over a specific tryptophan residue on the first β-strand and forms hydrogen bonds with the short 3₁₀ helix prior to the first β-strand [11].
Because the genes, genomic locations, cDNAs or full amino acid sequences of chicken BBPs are not known, it is impossible to evaluate their true relationship to avidin or, in a broader sense, to the calycin protein superfamily. New hope of solving this enigma arose when the first draft of the chicken genome was published in March 2004 [13,14]. In addition to the genome project, a comprehensive collection of chicken cDNAs is also in progress [15].
In the current study we searched these databases in order to find the cDNAs and genes for BBPs. Indeed, we found two independent cDNAs whose translated amino acid sequences fitted well to the published N-terminal sequences of BBPs. The genomic fragments corresponding to these cDNAs were also identified and analysed. They showed features similar to those of the avidin gene family members. Interestingly, one of these BBP gene candidates is located together with the avidin gene family in the chicken chromosome Z [16]. In addition, more evidence supporting the previous hypothesis of the high recombination frequency in the avidin gene family is gathered. One of the two putative BBPs was found to significantly resemble avidin, showing a theoretical molecular mass and pI close to those of avidin, whereas the other showed theoretical characteristics fitting more closely to those published for BBPs. Almost all amino acids important for biotin binding in avidin [17] are conserved in both of these supposed BBPs. Neither of the found cDNAs/genes, however, encodes a protein composed of four similar domains as expected for the isolated pseudotetrameric BBP-I [3]. Instead, the encoded proteins show calculated molecular masses corresponding to one BBP domain per polypeptide. In silico analysis of these genes as well as modelled structures of the putative BBP proteins are presented.

Database queries and sequence analyses
The gene containing a sequence fully identical to the BBP-A cDNA, with three introns, was found in the chicken genome database in Contig166.108. Two contig sequences (Contig55972.2 and Contig26844.1) containing parts of BBP-B were found in the database. These were manually joined together (Figure 1). The final product contained five changes in the nucleotide sequence when compared to BBP-B cDNA ([GenBank:BX936151]), causing differences in three amino acid residues close to the C-terminal part of the protein (N118I, V119L, F120L).
The avidin gene and three avidin-related genes were also found from the chicken genome database. The gene of BBP-A and a novel allele of one of the previously cloned AVRs (or a novel avidin-related gene), which we named AVR-A, were found in the same Contig166.108. AVR-A was similar to AVR2 and AVR6 (Figure 4) [16]. BBP-A and AVR-A genes point towards each other (BBP-A→ ←AVR-A) separated by an intergenic distance of 8.1 kb. A chicken repeat 1 (CR1) element [18] was found between these two genes. The distance between AVR-A and CR1 was 0.6 kb while the distance between the BBP-A gene and CR1 element was 6.1 kb. CR1 pointed towards BBP-A and it was in parallel orientation with AVR-A. Previously, Wallén et al. have reported CR1 elements located 1.4-2.1 kb upstream from the 5'-ends of AVR4 and AVR5 genes and pointing towards the genes [19]. Contig166.109 contains a partial gene (named AVR-B) clearly resembling AVR4. However, it has a mutation that converts Phe-29 in the AVR4 protein to leucine. In addition, Contig166.110 contains a partial gene (named AVR-C) resembling AVR2 with the exception that it codes for Ser and Arg in positions 25 and 26 (as in avidin) instead of Asp and Asn found in AVR2 (Figure 4). Finally, Contig166.111 contains the avidin gene (Figure 1).
[Figure: Schematic presentation of the genomic locations and orientations of the genes.]
Alignment of BBP cDNAs with their corresponding DNA contig sequences revealed that both of these genes contain four exons and three introns, as shown for avidin and avidin-related genes [20]. The exon and intron lengths of the BBP genes and their comparison with the avidin gene structure are shown in Figure 2. The fourth exons are cut after the stop-codon, and the first exons (N-terminus) are cleaved before the ATG starting the open reading frame. The sizes of the exons are relatively similar with the exception of the fourth exon of BBP-B which encodes 96 amino acid residues, compared to the 30-42 residues in BBP-A and AVD/AVRs, respectively. The number of variable sites among the exons ranged from 25% (fourth exons) to 58% (second exons). The first intron is similar in size in all compared sequences, whereas the second intron is considerably longer in the avidin and AVR genes (about 425 bp) than in the BBP genes (175 bp in BBP-A and 114 bp in BBP-B). On the contrary, the third intron is longer in the BBP-B gene (252 bp) than in the avidin/AVR genes (87 bp) (Figure 2). The number of variable sites among intron sequences ranged from 24% in the third intron to 59% in the first intron.
A high similarity among the genes was observed at the exon/intron junctions as shown in Figure 2. Sequence divergence (p-distance) among avidin and BBP genes ranged from 1.4% between AVR4 and AVR-B to 48.5% between AVR-B and BBP-B (Table 2). Similar values were obtained when sequence divergence among exons only or introns only (the combined sequence) was analysed (not shown).
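The p-distance used above is simply the proportion of aligned sites at which two sequences differ. The following minimal Python sketch illustrates the calculation; it is not the MEGA implementation, the function names are hypothetical, the toy sequences are invented for illustration (they are not the avidin or BBP genes), and skipping gapped sites (pairwise deletion) is an assumption about gap handling.

```python
from itertools import combinations


def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence carries a gap are skipped
    (pairwise deletion; an assumption made for this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def distance_matrix(alignment):
    """Pairwise p-distances for a dict mapping name -> aligned sequence."""
    return {
        (x, y): p_distance(alignment[x], alignment[y])
        for x, y in combinations(alignment, 2)
    }


# Invented toy alignment, for illustration only.
toy_alignment = {
    "geneX": "ATGGTGCA-T",
    "geneY": "ATGGTCCAAT",
    "geneZ": "ATCGTCGAAT",
}
toy_matrix = distance_matrix(toy_alignment)
for pair, dist in toy_matrix.items():
    print(pair, f"{100 * dist:.1f}%")
```

Multiplying the result by 100 gives the percentage divergence in the form reported in Table 2.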
The phylogenetic relationship of the AVD, AVRs and biotin-binding protein genes is shown graphically in Figure 3 (the same relationships were obtained from the amino acid sequences; not shown). In the unrooted tree, avidin and avidin related genes formed a well supported cluster, which was the sister group of BBP-A. Finally, basal to the tree, was BBP-B.
All characterised genes contained a potential promoter region upstream of the coding region according to the prediction program used. In the case of AVR-C, the upstream region of the gene was not analysed due to a missing sequence. The promoters of BBP-A and BBP-B contained a TATA sequence TATAAA at position (-30)-(-25) nt upstream of the predicted transcription initiation site. In the case of the AVD and AVR-A/B genes, the sequence AATAAAA was detected (-31)-(-25) nt upstream of the predicted transcription initiation site. The putative promoter regions contained possible binding sites for several transcription factors (not shown).
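The positional statement above can be illustrated with a short Python sketch that scans a window upstream of a given transcription initiation site for an exact motif. It is only an illustration of the coordinate convention, not the Dragon Promoter Finder algorithm; the function name, the default window and the toy sequences are hypothetical, and the convention that position -1 is the base immediately upstream of the initiation site (+1) is an assumption.

```python
def find_motif_upstream(sequence, tss_index, motif, first=-40, last=-20):
    """Return (start, end) positions, relative to the initiation site,
    of exact motif matches that start within [first, last].

    tss_index is the 0-based index of the initiation site (+1);
    position -k therefore corresponds to index tss_index - k.
    """
    seq = sequence.upper()
    motif = motif.upper()
    hits = []
    for start_pos in range(first, last + 1):
        end_pos = start_pos + len(motif) - 1
        if end_pos > -1:
            break  # the motif would extend into the transcribed region
        i = tss_index + start_pos
        if i < 0:
            continue  # window reaches beyond the available sequence
        if seq[i:i + len(motif)] == motif:
            hits.append((start_pos, end_pos))
    return hits


# Invented toy promoters: 50 nt of upstream sequence, then the transcript.
toy_tata = "G" * 20 + "TATAAA" + "C" * 24 + "ACGTACGT"
toy_aata = "G" * 19 + "AATAAAA" + "C" * 24 + "ACGTACGT"

tata_hits = find_motif_upstream(toy_tata, 50, "TATAAA")
aata_hits = find_motif_upstream(toy_aata, 50, "AATAAAA")
print(tata_hits, aata_hits)
```

In these toy sequences the six-nucleotide motif occupies positions -30 to -25 and the seven-nucleotide motif positions -31 to -25, mirroring the spans described in the text.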

Amino acid primary sequence characteristics
The most obvious difference between BBP-B and both avidin and BBP-A is a C-terminal extension which makes it 18 residues longer than avidin and 22 residues longer than BBP-A. The sequence identity between BBP-A and BBP-B is 49%. The identity between the aligned regions of chicken avidin and BBP-A is 59% and between BBP-B and avidin is 47%. The residues involved in biotin binding in avidin [17] are almost perfectly conserved in both the BBP-forms (Figure 4). The only substitutions within these residues are Ser-73 which is replaced with alanine in BBP-A and Ser-75 which is replaced with alanine in BBP-B. Moreover, the T-A-T sequence in avidin [...]
[Figure 3: Evolutionary relationships of genes. Neighbour-joining tree obtained from the gene sequences of avidin, avidin-related genes 2, 4 and 6, and the BBP-A and BBP-B genes. The tree was obtained from a pairwise p-distance matrix between sequences as implemented in MEGA v.3. Numbers indicate node bootstrap supports.]
[Figure: Sequence alignment.]
Both BBPs have one possible N-glycosylation site: Asn-Met-Thr-Ile (residues 17-20) in BBP-A, identical to that of avidin, and Asn-Ala-Thr-Thr (residues 74-77) in BBP-B. The prediction shows, however, a low probability for glycosylation to occur in BBP-B.
The cysteine residues (Cys-4 and Cys-83) which form the intrasubunit disulphide bridge in avidin are conserved in both BBPs. In addition, two cysteine residues are found in the putative signal sequence of BBP-A and one in the signal sequence of BBP-B. Furthermore, there are two additional cysteines in BBP-B, one in the position corresponding to Glu-43 in avidin and one located in the middle of its C-terminal extension.
The aromatic amino acids are conserved throughout the sequences. The only exceptions are the two tryptophans found only in BBP-B in the region corresponding to β-sheet 5 in avidin.

Secondary and tertiary structure characteristics based on the homology modelling
The residues in BBP proteins corresponding to the β-sheet secondary structure elements of avidin are significantly more conserved when compared to the loop regions of avidin.
Overall, the homology modelling strongly suggests avidin-like secondary (Figure 4) and tertiary (Figure 5) structures for both BBPs.
Based on the modelled structures, both BBPs have the common lipocalin-motif: Gly-Xaa-Trp residues close to the N-terminus (residues 8-10 in alignment) and arginine in the last β-strand. This structural signature indicates that BBPs belong to the calycin superfamily together with avidin and streptavidin (which is a bacterial analogue of chicken avidin) [12].
At the tertiary structure level the most striking feature of the BBPs, when compared to avidin, is the conservation of the amino acid residues forming the inner part of the β-barrel. These amino acids also include almost all biotin-binding contact residues (Figures 5F, 5G). The hydrogen bond between biotin and Asp-128 in streptavidin [23], and biotin and the analogous residue Asn-118 in avidin (Hytönen VP et al., unpublished results), are known to be important for their ligand binding. The bonding network including this residue comprises bonds between Gln-24 and Asp-128 in streptavidin [24] and Asn-118 and Asp-13 in avidin [17]. The residue corresponding to Asn-118 in avidin is conserved in both BBPs.
The role of the C-terminal extension of BBP-B was explored by modelling. Since there is an orphan cysteine residue near the end of β-strand 4 in BBP-B and another cysteine residue close to the end of the C-terminal extension, one could assume a disulphide bridge between these cysteines. Several details support this possibility. Firstly, the region close to the cysteine residue in β-strand 4 in BBP-B seems to be rather hydrophobic (K9L, N17L, T34L, E46I), in comparison with the corresponding region in avidin. This might indicate the presence of a shielding structure in this region (i.e. contact with another protein or peptide) (Figure 5C). Secondly, the distance between the end of β-strand 8 and the cysteine in β-strand 4 is in good agreement with the length of the polypeptide sequence. Thirdly, similar structures are found in structurally similar lipocalin family proteins. For example, retinol-binding protein (PDB code: 1RBP) has a similar α-helix connected by a disulphide bridge in the corresponding region [25].

Quaternary structure: interface-regions
All residues forming the 1-2 interface (numbering according to Livnah et al. [17]) in avidin are conserved in all of the studied proteins (Table 3). The 1-4 interface, being the most extensive, shows interesting similarities and differences when compared to that of avidin. Residues Gln-53, Thr-67, Trp-70, Gln-82, Val-103 and Thr-113 in this interface are conserved in all of the studied proteins. Asn-54 in the 1-4 interface has been shown to have a central role in the structurally important hydrogen-bonding network in the avidin structure [17,26]. Interestingly, histidine is found in this position in the AVR-proteins, which are known to be stable tetramers [27,28]. Glutamine in this position in the BBP model-structures seems to be able to form similar contacts between the subunits over the 1-4 interface. Taken together, only 10 out of 21 interacting residues in the 1-4 interface are conserved in BBP-A when compared to avidin, with the value being 8 out of 21 in BBP-B.
In the 1-3 interface, Val-115 is conserved, whereas both Met-96 and Ile-117 show variance both in BBPs and AVRs. The position of Met-96 shows interesting substitutions in other proteins since this residue faces the identical residue from the neighbouring subunit in avidin structure. According to previous mutagenesis studies this residue is known to be important for the tetrameric quaternary structure of avidin [26,29]. According to the model structure, Arg-117 in BBP-B might form an interesting intersubunit salt bridge with Glu-13 from subunit 3.

Discussion
Circumstantial evidence has indicated that chicken yolk BBPs may be structurally related to avidin and other members of the avidin family. We therefore scanned the chicken genome data to test this hypothesis. The database queries revealed two independent hits showing high similarity to the published N-terminal sequences of yolk BBPs I and II [4] and, indeed, a potential kinship between BBPs and avidin gene family members was revealed.
The avidin gene belongs to the gene family that has several other members called AVRs (avidin related genes). Previously, seven different AVR-genes have been cloned [16,20] and the chromosomal location of this gene family has been tracked down to a relatively short region in the telomeric region q21 of the chicken sex chromosome Z [16]. It seems that the number of AVR genes varies between individual chickens and even between cells within the same chicken [30]. The deposited genome data, analysed in the present study, support this observation demonstrating a novel assembly of 3 AVR genes together in the same cluster with the avidin gene. Interestingly, the two AVR genes found in the chicken genome database seem to be novel variants of the formerly cloned AVRs, which also supports the previous hypothesis of the high recombination frequency within the avidin gene family [30,31].
Our observations link the BBPs to the avidin family for the first time, at the cDNA and gene level. There are many independent features indicating this. Firstly, the found cDNAs encode proteins that are evidently homologous to avidin. Secondly, the genomic location of the BBP-A gene close to the avidin gene family supports their relationship. Finally, the exon/intron structures of the BBP genes and avidin family genes are similar to each other.
According to the phylogenetic relationships and genome locations of the genes, one scenario for the BBP/avidin evolution is as follows: an initial duplication may have occurred leading to the origin of BBP-B and the precursor of the BBP-A/avidin family, followed by a further duplication leading to the origin of BBP-A and the precursor of the avidin family. This could have been followed by the formation of AVD and an AVR gene and finally the duplication of the latter into several avidin-related genes.
According to molecular modelling, BBP-A and B proteins both showed features that make them suitable for biotin binding. The biotin-binding contact amino acids of avidin [17] were almost perfectly conserved in both BBP sequences. In addition, good conservation of the inner part of the β-barrel in both BBPs is also important for the function of the ligand binding cavity. If we assume that one or both of these putative BBPs represent yolk BBPs I and/or II, other sequence differences should explain their observed weaker affinity for biotin [7]. A similar observation has been made for AVRs, which have only a few differences in their biotin-binding residues when compared to avidin, but still exhibit remarkable differences in their biotin-binding affinities [27].
It is probable that both BBP-A and BBP-B form tetrameric quaternary structures similar to that of avidin. The basis for this assumption is the fact that the conservation patterns of the presumed interface residues were highly similar to those of AVRs, which are also known to form extremely stable tetramers [27,28]. The 1-2 interface, in particular, in which mutations have previously been shown to be extremely important to the stability properties of both avidin and streptavidin [32][33][34], was perfectly conserved in both BBPs. Hence the existing changes were concentrated on the 1-3 and 1-4 interfaces which are known to tolerate substitutions in avidin and AVRs [27][28][29]. The yolk BBPs have, however, been reported to be clearly less heat-stable than avidin [10] and, therefore, either the sequence differences in the interfaces may explain this difference or the tertiary structure of the BBP barrel may be weaker than that of avidin. Overall, interfaces in the BBP models suggested tetramer formation, since the putative interface regions were hydrophobic. Furthermore, differences at the subunit interfaces of the BBP models, when compared to those of avidin, were at least partially complementary. For example, the Thr-80-Val mutation at the 1-4 interface of BBP-A seemed to be in a highly hydrophobic environment.
The most striking feature that distinguished BBP-B from avidin and BBP-A was its additional C-terminal extension, approximately 20 amino acid residues long. According to modelling, this stretch could form an α-helix and the cysteine residue at the end of this stretch could form a disulphide bridge with another cysteine residue in β-strand 4 (Figure 5C). It is, however, hard to interpret the relevance of this predicted α-helix and the possible effect of the helix and the cysteine bridge on the structural and functional properties of BBP-B. One effect could be that it strengthens the structure of the protein. Interestingly, many members of the lipocalin protein family have similar C-terminal α-helical domains [11,12]. Nonetheless, the exon/intron structure of the avidin gene family is different when compared to the lipocalin family [35,36]. This suggests that even if the overall tertiary structures of these proteins are similar, the evolutionary distance between these lineages is very long, or alternatively these protein families have developed independently. The manner in which BBP-B has acquired its C-terminal extension, found in lipocalins, therefore remains an enigma. Alternative models for this extension can be made as well; the distance between the beginning of the C-terminal extension (Lys-123) of subunit 1 and the free cysteine in loop 3 of subunits 3 and 4 is around 40 Å (not shown). This is also approximately the length of the fully extended C-terminal extension (Lys-123-Cys-138). Based on these distances both 1-3 and 1-4 intersubunit disulphide bridges are possible.
In the light of the current study, the story of the chicken BBP proteins becomes less clear. As before, we have two possible candidates. However, these cDNAs/genes are able to encode proteins having a molecular mass of 14-16.5 kDa, which corresponds to only one BBP subunit instead of the four subsequent repeats hypothesized earlier [3,9]. This means that either the database still lacks the genuine yolk-BBP gene or that some other phenomenon, like the above-mentioned hypothetical disulphide bridges, must explain the previously reported molecular weight properties of BBP-I. Nonetheless, the fact that cDNAs for BBP-A and BBP-B were isolated from the chicken liver library [15], and that the putative proteins they encode have signal sequences, suggests that these cDNAs may really be the yolk BBP cDNAs, which have been reported to be secreted from the liver into the egg [4].
If we try to fit the characteristics of the found BBP candidates to those previously associated with BBPs, BBP-B looks more promising. It has a similarly low pI [7], its C-terminal extension makes its molecular mass closer to that determined for BBP-II [10] and it is most probably nonglycosylated as is BBP-II [10]. What is then the role of the BBP-A gene? Is it the mysterious BBP isolated from egg-white [6] or some unknown chicken BBP? It is evident that we need to continue the database queries and/or start cloning BBP genes to clarify this puzzle. In addition, we need to produce these found putative BBPs as recombinant proteins to investigate whether their properties are in agreement with the previous findings and the models of the present study. Furthermore, these new proteins can serve as a source for development of new tools for life sciences. For example, it could be possible to construct a chimeric avidin-BBP-dimer [37] to adjust the ligand-binding properties of the resultant hybrid protein.

Conclusion
We have identified two putative genes and cDNAs for chicken egg-yolk biotin-binding proteins from the NCBI database and the chicken genome database. The genomic location and the structures of the found genes and the proteins they encode clearly link BBPs to the avidin family and, moreover, give an insight into the evolutionary history of this gene family. Our molecular modelling results support many preceding observations concerning the biochemical properties of BBPs but also call some of the previous hypotheses into question. Most importantly, the gene/cDNA structures provided no evidence of proteolytic processing of pseudotetramers to tetramers that has been presented as a possible maturation process for BBPs, i.e. conversion of BBP-I to BBP-II.

Database queries and sequence analyses
The N-terminal sequences VEIKXQLSGLWENEQDSLMEISALADDGG and VERKXQLSGLWENEQDSLMEISALADDLEN [4] were used to search the deposited collection of chicken cDNAs [15] by using TBlastn at the NCBI web site. The obtained cDNA sequences were used to find the corresponding genes and their genomic locations by searching the chicken genome database at the Ensembl web site [38] using blastn. Furthermore, the cDNAs of avidin [39] and AVRs [16] were used as search strings against this database. The intron/exon structures of these putative genes were analyzed. DNA sequences of AVD, AVR2, AVR4, AVR6, AVR-A, AVR-B, AVR-C and BBP-A and BBP-B were aligned exon by exon and intron by intron using Clustal X in multiple alignment mode with default values for both pairwise and multiple alignment parameters. Relationships among avidin and other biotin-binding proteins were obtained by the Neighbour Joining method from the p-distance matrix as implemented in Mega software [40]. The Dragon Promoter Finder v. 1.5 [41] was used to predict the promoter regions of the genes. The located promoter regions were further characterised using the transcription factor analysis implemented in the Dragon Promoter Finder program using Match™ [42] with default parameters.
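To make the tree-building step concrete, the following is a minimal Python sketch of the standard Neighbour Joining procedure applied to a pairwise distance matrix. It is not the Mega implementation and omits bootstrap support; the function name, the internal node labels and the five-taxon toy matrix are hypothetical and are not the distances of Table 2.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Build an unrooted tree from pairwise distances.

    names: list of taxon labels.
    dist:  dict mapping frozenset({a, b}) -> distance.
    Returns a list of (node, neighbour, branch_length) edges.
    """
    d = dict(dist)
    nodes = list(names)
    edges = []
    counter = 0
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence of each node from all others.
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i) for i in nodes}
        # Pick the pair minimising Q(i, j) = (n - 2) d(i, j) - r_i - r_j.
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]],
        )
        dij = d[frozenset((i, j))]
        # Branch lengths from the joined pair to the new internal node.
        li = 0.5 * dij + (r[i] - r[j]) / (2 * (n - 2))
        lj = dij - li
        counter += 1
        u = f"node{counter}"
        edges += [(i, u, li), (j, u, lj)]
        # Distances from the new node to every remaining node.
        for k in nodes:
            if k not in (i, j):
                d[frozenset((u, k))] = 0.5 * (
                    d[frozenset((i, k))] + d[frozenset((j, k))] - dij
                )
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    edges.append((a, b, d[frozenset((a, b))]))
    return edges


# Invented additive toy matrix for five taxa, for illustration only.
toy_pairs = {
    ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
    ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
    ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3,
}
toy_dist = {frozenset(pair): float(v) for pair, v in toy_pairs.items()}
toy_edges = neighbor_joining(list("abcde"), toy_dist)
for edge in toy_edges:
    print(edge)
```

Because the toy matrix is additive, the path lengths in the resulting tree reproduce the input distances exactly; real p-distance matrices are generally not additive, so the tree is then only an approximation.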

Structural modelling and polypeptide analysis
The three-dimensional structure of the avidin-biotin complex (PDB code: 2avi [17]) obtained from the Protein Data Bank [43] was used as a template structure in BBP modelling. Sequence alignment of all proteins was made using the MALIGN [44] multiple alignment tool of BODIL [45,46] by using a structure-based sequence comparison matrix [47] with a gap penalty of 40. Comparative models of BBPs were made with Modeller 6v2 [48] according to the alignment: disulphide bridges were forced between cysteine residues 3 and 83 in both BBPs and also between cysteine residues 43 and 138 in BBP-B. Furthermore, the carboxy-terminal extension of BBP-B (amino acids 129-137) was forced to adopt an α-helical conformation. Visual analysis of the obtained models was done with the BODIL molecular modelling program. Alignment representation was made using ALSCRIPT [49] and protein representations were made using PyMOL [50]. The putative signal cleavage sites were analyzed by SignalP [21,51]. The theoretical molecular weights, pIs and extinction coefficients were calculated using the program ProtParam [52,53]. The potential N-glycosylation sites and their qualities were studied by NetNglyc [54].