Genomic analysis reveals extensive gene duplication within the bovine TRB locus

Background: Diverse TR and IG repertoires are generated by V(D)J somatic recombination. Genomic studies have been pivotal in cataloguing the V, D, J and C genes present in the various TR/IG loci and describing how duplication events have expanded the number of these genes. Such studies have also provided insights into the evolution of these loci and the complex mechanisms that regulate TR/IG expression. In this study we analyze the sequence of the third bovine genome assembly to characterize the germline repertoire of bovine TRB genes and compare the organization, evolution and regulatory structure of the bovine TRB locus with that of humans and mice.

Results: The TRB locus in the third bovine genome assembly is distributed over 5 scaffolds, extending to ~730 Kb. The available sequence contains 134 TRBV genes, assigned to 24 subgroups, and 3 clusters of DJC genes, each comprising a single TRBD gene, 5–7 TRBJ genes and a single TRBC gene. Seventy-nine of the TRBV genes are predicted to be functional. Comparison with the human and murine TRB loci shows that the gene order, as well as the sequences of non-coding elements that regulate TRB expression, are highly conserved in the bovine. Dot-plot analyses demonstrate that expansion of the genomic TRBV repertoire has occurred via a complex and extensive series of duplications, predominantly involving DNA blocks containing multiple genes. These duplication events have resulted in massive expansion of several TRBV subgroups, most notably TRBV6, 9 and 21, which contain 40, 35 and 16 members respectively. Similarly, duplication has led to the generation of a third DJC cluster. Analysis of cDNA data confirms the diversity of the TRBV genes and, in addition, identifies a substantial number of TRBV genes, predominantly from the larger subgroups, which are still absent from the genome assembly. The observed gene duplication within the bovine TRB locus has created a repertoire of phylogenetically diverse functional TRBV genes, which is substantially larger than that described for humans and mice.

Conclusion: The analyses completed in this study reveal that, although the gene content and organization of the bovine TRB locus are broadly similar to those of humans and mice, multiple duplication events have led to a marked expansion in the number of TRB genes. Similar expansions in other ruminant TR loci suggest strong evolutionary pressures in this lineage have selected for the development of enlarged sets of TR genes that can contribute to diverse TR repertoires.


Background
Diverse αβTR repertoires are crucial to the maintenance of effective T cell-mediated immunity [1]. Estimates based on direct measurement indicate that in humans and mice, individuals express a repertoire of approximately 2 × 10^7 [2] and 2 × 10^6 [3] unique αβTRs respectively. As with the other antigen-specific receptors (IG of B cells and γδTRs of γδT cells), diversity is generated in lymphocytic precursors by somatic recombination of discontiguous variable (V), diversity (D; TRB chains but not TRA chains) and joining (J) genes to form the membrane-distal variable domains. Diversity is derived both from the different permutations of V(D)J genes used to form the TRA and TRB chains expressed by individual thymocytes (combinatorial diversity) and from the activity of terminal deoxynucleotidyl transferase and exonuclease at the V(D)J junction during recombination (junctional diversity). Consequently, much of the diversity is focused in the third complementarity determining region (CDR3), which is encoded by the V(D)J junction and forms the most intimate association with the antigenic peptide component of the peptide-MHC (pMHC) ligand of αβTRs, whereas the CDR1 and CDR2 of the TRA and TRB chains, which predominantly interact with the MHC, are encoded within the germline V genes [4,5].
TRB chain genes are located in the TRB locus, which in humans is ~620 Kb long and situated on chromosome 7 and in mice is ~700 Kb and located on chromosome 6 [6][7][8]. In both species, the organisation of TRB genes is similar, with a library of TRBV genes positioned at the 5' end and 2 DJC clusters (each composed of a single TRBD, 6-7 TRBJ and a single TRBC gene) followed by a single TRBV gene with an inverted transcriptional orientation located at the 3' end [9,10]. The germline repertoire of TRBV genes in humans is composed of 65 genes belonging to 30 subgroups (genes with > 75% nucleotide identity), whilst in mice the repertoire comprises 35 genes belonging to 31 subgroups [10][11][12]. The disparity between the number of TRBV genes in the 2 species is the result of multiple duplication events within the human TRB locus, most of which have involved tandem duplication of blocks of DNA (homology units) containing genes from more than one subgroup [10,13].

V(D)J recombination is initiated by site-specific DNA cleavage at recombination signal sequences (RSs) mediated by enzymes encoded by recombination activating genes (RAG) 1 and 2 [14]. RSs comprise conserved heptamer and nonamer sequences separated by spacers of either 12 bp (12-RS, located 5' to TRBD and TRBJ genes) or 23 bp (23-RS, located 3' to TRBV and TRBD genes). Correct V(D)J assembly is achieved because recombination can only occur between genes flanked by RSs of dissimilar length (the '12/23 rule'), and direct TRBV/TRBJ recombination is prohibited by the 'beyond 12/23' phenomenon [15][16][17]. As with other antigen-specific receptor loci, recombination in the TRB locus is under strict lineage-, stage- and allele-specific regulation associated with control of RAG accessibility to RSs mediated through alterations in chromatin structure (the 'accessibility hypothesis') [18][19][20]. Numerous studies have shown that both the TRB enhancer (Eβ) and transcriptional promoters within the TRB locus serve as RAG accessibility control elements, playing a critical role in regulating chromatin structure and therefore recombination of TRB genes [21][22][23][24][25][26][27].
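The joining constraints described above can be illustrated with a minimal sketch. The lookup table and function names below are hypothetical and purely illustrative; the model encodes only the RS spacer lengths stated in the text and treats the 'beyond 12/23' restriction as a simple exclusion of direct TRBV-to-TRBJ joining, which is an assumption made for brevity rather than a description of the underlying mechanism.

```python
# Spacer lengths (bp) of the RSs flanking each TRB gene type, as given in the text.
# Keys are (gene type, side of the gene on which the RS lies).
RS_SPACER = {
    ("TRBV", "3'"): 23,
    ("TRBD", "5'"): 12,
    ("TRBD", "3'"): 23,
    ("TRBJ", "5'"): 12,
}


def obeys_12_23_rule(upstream: str, downstream: str) -> bool:
    """True if the 3' RS of `upstream` and the 5' RS of `downstream`
    both exist and have dissimilar spacer lengths (one 12 bp, one 23 bp)."""
    a = RS_SPACER.get((upstream, "3'"))
    b = RS_SPACER.get((downstream, "5'"))
    return a is not None and b is not None and {a, b} == {12, 23}


def is_permitted(upstream: str, downstream: str) -> bool:
    """Apply the 12/23 rule plus the 'beyond 12/23' exclusion of direct V-to-J joining."""
    if (upstream, downstream) == ("TRBV", "TRBJ"):
        return False  # satisfies the 12/23 rule but is prohibited in the TRB locus
    return obeys_12_23_rule(upstream, downstream)


for pair in [("TRBV", "TRBD"), ("TRBD", "TRBJ"), ("TRBV", "TRBJ")]:
    print(pair, is_permitted(*pair))
```

In this sketch a TRBV/TRBJ pair passes the spacer check but is still rejected, which is the distinction the 'beyond 12/23' phenomenon draws.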
Current knowledge of the TRB gene repertoires of agriculturally important artiodactyl species (e.g. pigs, cattle and sheep) is limited. Published analyses of rearranged TRB transcripts have demonstrated the expression of 19 TRBV subgroups in pigs [28,29], 13 subgroups in sheep [30] and 17 subgroups in cattle, some of which have undergone extensive duplication [31][32][33][34]. Information on the genomic organisation of the TRB loci is predominantly restricted to the DJC region, which in the pig was found to be composed of 2 tandemly arranged DJC clusters [35] but in sheep contained 3 tandemly arranged DJC clusters [36]. Preliminary analysis of a BAC clone corresponding to part of the DJC region indicates that in cattle the DJC region may also consist of 3 DJC clusters [37].
Sequencing of the complete TRB loci in human and mice allowed the repertoire of TRB genes in these species to be fully characterised and also permitted analysis of the organisation, regulation and evolution of this immunologically important locus [9,10]. In this study we have used the sequence of the third bovine genome assembly (Btau_3.1) to further study the bovine TRB repertoire and TRB locus. Although the sequence of the TRB locus is incomplete, the results reveal that duplication within the locus has been prolific leading to a massive expansion of TRBV gene numbers and the generation of a third DJC cluster. Furthermore, the analysis shows that the genomic organisation of the TRB locus and the non-coding elements that regulate TRB expression are highly conserved in cattle when compared to that of humans and mice.
nucleotide identity with the other members of this subgroup.
Of the 24 bovine subgroups present in the genome assembly, 11 have multiple members. Subgroups TRBV6, 9 and 21 have all undergone substantial expansion, having 40, 35 and 16 members respectively, together representing 68% of the total Btau_3.1 TRBV gene repertoire. Southern blot analysis corroborates the presence of large numbers of TRBV6 and 9 genes in the genome (Figure 1).
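The 68% figure follows directly from the subgroup sizes quoted above; the short check below simply reproduces that arithmetic (variable names are illustrative).

```python
# Member counts of the three most expanded subgroups and the total TRBV count in Btau_3.1.
total_trbv = 134
expanded = {"TRBV6": 40, "TRBV9": 35, "TRBV21": 16}

members = sum(expanded.values())   # 91 genes
share = members / total_trbv       # fraction of the TRBV repertoire
print(f"{members} of {total_trbv} TRBV genes = {share:.0%}")
```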
A prominent feature of the genomic organisation of TRBV genes (Figure 2) [...] genes are located (Figure 3). Six homology units, ranging in size from ~7 Kb to ~31 Kb and encompassing from 1 to 11 TRBV genes, were identified. Three of these homology units (represented by the orange, dark blue and black bars in Figure 2) have undergone multiple (2-3) duplications: variation in the length of the different copies of these homology units (represented by broken lines in Figure 2) suggests that either i) distinct iterations of a duplication event have involved different components of the homology unit or ii) the different copies have been subject to different post-duplication deletions.
The levels of nucleotide identity between TRBV genes in corresponding positions in homology units are frequently high: 12 pairs of TRBV6 genes, 11 pairs of TRBV9 and 1 pair each of TRBV19 and TRBV20 have identical coding sequences, whilst 1 pair of TRBV4 genes and 3 pairs of TRBV21 as well as 4 triplets of TRBV6 and 4 triplets of TRBV9 genes have > 97% sequence identity in the coding region.

Duplication has expanded the repertoire of TRBD, TRBJ and TRBC genes in the bovine genome
A total of 3 TRBD, 18 TRBJ and 3 TRBC genes were identified in the assembly (Additional File 1). These genes were all located within a ~26 Kb region of scaffold Chr4.003.108 and organised into 3 tandemly arranged clusters, each of ~7 Kb length and composed of a single TRBD gene, 5-7 TRBJ genes and a single TRBC gene (Figure 2). Dot-plot analysis reveals that the presence of a third DJC cluster is attributable to duplication of a ~7 Kb region, one copy of which incorporates TRBC1, TRBD2 and the TRBJ2 cluster whilst the other copy incorporates TRBC2, TRBD3 and the TRBJ3 cluster (Figure 4). Numerous interruptions in the line representing the duplicated region indicate that there has been significant post-duplication deletion/insertion related modification of the duplicated region.
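To illustrate how a dot-plot exposes a tandem duplication, the sketch below compares a sequence against itself using exact k-mer matches and counts matches by their distance from the main diagonal. This is a simplified illustration, not the software or parameters used in the study; the function names and the toy sequence are hypothetical.

```python
def dot_plot_matches(seq_x: str, seq_y: str, k: int = 6):
    """Return (i, j) pairs where the k-mer starting at seq_x[i] equals the k-mer starting at seq_y[j]."""
    index = {}
    for j in range(len(seq_y) - k + 1):
        index.setdefault(seq_y[j:j + k], []).append(j)
    return [(i, j)
            for i in range(len(seq_x) - k + 1)
            for j in index.get(seq_x[i:i + k], [])]


def off_diagonal_offsets(seq: str, k: int = 6):
    """Self-comparison: count matches at each offset (j - i) above the main diagonal.
    Many matches at a single offset form a line parallel to the diagonal,
    the signature of a duplicated block."""
    counts = {}
    for i, j in dot_plot_matches(seq, seq, k):
        if j > i:
            counts[j - i] = counts.get(j - i, 0) + 1
    return counts


# Hypothetical toy sequence: a 12 bp unit duplicated in tandem between short flanks.
unit = "ACGTTGCAAGCT"
toy = "GGG" + unit + unit + "CCC"
offsets = off_diagonal_offsets(toy, k=6)
print(offsets)
```

In the toy example all off-diagonal matches fall at an offset equal to the unit length; in real data, breaks in such a line would correspond to the post-duplication insertions and deletions noted above.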
The nucleotide and deduced amino acid sequence of the 3 TRBD and 18 TRBJ genes as well as the flanking RS are shown in Figure 5a and 5b respectively. The 13 bp (TRBD1) or 16 bp (TRBD2 and 3) TRBD genes are G-rich and encode at least one glycine in all 3 potential reading frames with the exception of the 3rd reading frame of TRBD1. The TRBJ genes range in size from 43 bp to 59 bp in length and all encode the canonical FGXG amino acid motif that defines TRBJ genes.
As with all mammalian TRBC genes so far characterised, bovine TRBC1 and TRBC3 genes are composed of 4 exons, 3 introns and a 3'UTR region. The structure of the TRBC2 gene is anticipated to be the same but due to a region of undetermined sequence between exons 1 and 3 we were unable to identify exon 2. The exon nucleotide sequences of TRBC1 and 3 are very similar (97%), resulting in the [...] (Figure 6a). The incomplete sequence for TRBC2 is predicted to encode a product identical to that of TRBC1. In contrast to the high levels of pairwise identity between the exonic nucleotide sequences of all 3 TRBC genes, the nucleotide sequences of the 3rd intron and the 3'UTR regions of TRBC3 show low identity with TRBC1 and 2, whereas the latter two genes show a high level of identity (Figure 6b). The similarity in the lengths of TRBD2 and 3, the phylogenetic clustering of TRBJ2 and TRBJ3 genes in corresponding genomic positions (Figure 7) and the similarity in the sequences of the 3rd introns and 3'UTRs of TRBC1 and 2 all reflect the duplication history of the DJC region as described in Figure 4.

[Figure: Dot-plot analyses of Chr4.003.105]

The repertoire of functional TRBV, TRBD and TRBJ genes available for somatic recombination is large and phylogenetically diverse
Computational analysis was used to predict the functional competency of the TRBV, TRBD and TRBJ genes present in the genome assembly. Fifty-five (41%) of the TRBV genes identified are predicted to be pseudogenes (Additional File 2), whilst TRBJ1-2 (which has a 1 bp deletion that results in the canonical FGXG motif being lost in the ORF) and TRBJ1-3 (which lacks an RS that is compatible with somatic recombination) are also predicted to be non-functional (Figure 5). Thus, the functional repertoire comprises 79 (59%) TRBV genes (comprising 66 unique coding TRBV sequences) belonging to 19 different subgroups, 3 TRBD genes and 16 TRBJ genes. This provides a potential 3168 (66 × 3 × 16) unique VDJ permutations that can be used during somatic recombination of TRB chains.
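One of the functionality criteria mentioned above, retention of the FGXG motif in the TRBJ open reading frame, can be sketched as follows. The DNA sequence is a hypothetical example constructed for illustration (it is not a bovine TRBJ sequence), and the helper names are assumptions; the sketch only shows how a single-base deletion upstream of the motif shifts the reading frame so that FGXG is no longer encoded.

```python
import re

# Standard genetic code, built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO))

FGXG = re.compile(r"FG.G")  # F, G, any residue, G


def translate(dna: str) -> str:
    """Translate from the first base, ignoring any trailing partial codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def has_fgxg(dna: str) -> bool:
    """True if the translated sequence contains the FGXG motif."""
    return FGXG.search(translate(dna)) is not None


# Hypothetical in-frame sequence encoding E V F F G K G T.
intact = "GAAGTCTTCTTTGGCAAAGGCACC"
# The same sequence with one base removed upstream of the motif (frameshift).
deleted = intact[:4] + intact[5:]

print(translate(intact), has_fgxg(intact))
print(translate(deleted), has_fgxg(deleted))
```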
Phylogenetic analysis demonstrates that the repertoire of functional TRBV genes is diverse (Figure 8), with representatives in each of the 6 phylogenetic groups (A-F) described previously in humans and mice [13,39]. The phylogenetic groupings were supported by high (99%) bootstrap values (P_B), with the exception of group A (P_B = 76%). Maximum likelihood analysis using a variety of nucleotide models provides essentially similar phylogenetic clustering (data not shown), indicating the reliability of the tree presented in Figure 8. The extensive intermingling of murine, human and bovine TRBV subgroups is consistent with the establishment of distinct subgroups having occurred prior to mammalian radiation. Conversely, the formation of distinct clades of TRBV genes of orthologous subgroups from different species (e.g. TRBV6 genes from human and bovine form distinct clades) indicates that duplication within subgroups has predominantly occurred post-speciation. Despite this and the substantial disparity in the number of functional TRBV genes present in the 3 species, the distribution amongst the different phylogenetic groups is similar (Figure 8b). Phylogenetic groups C and F form a minor component of the functional TRBV repertoire, whilst the contributions from groups B and D are also fairly modest. In contrast, group E and to an even greater extent group A are overrepresented, together representing between 61.9% (in the mouse) and 81.6% (in humans) of the total functional repertoire.
Phylogenetic analysis resolves the functional TRBJ genes in human, mice and Btau_3.1 into 12 groups (Figure 7). With the exception of group 8, each group is supported by high P_B values and is composed of orthologues that share a conserved order in the genome; consistent with the duplication history of the DJC region, TRBJ genes from both the 2nd and 3rd bovine DJC clusters group together with the respective genes from the 2nd murine and human DJC clusters. Group 8, which contains TRBJ2-3, human and murine TRBJ2-4 and bovine TRBJ3-3 and 3-4 genes is only supported by a P_B value of 57%. The diversity of the functional TRBJ repertoire across the 3 species is comparable, with humans having functional genes in each of the 12 phylogenetic groups whilst in both mice and Btau_3.1 only 2 groups lack functional members: groups 3 (TRBJ1-

Figure 4. Dot-plot analysis of the bovine DJC region on Chr4.003.108. Duplication of a ~7 Kb region (diagonal line between black arrows) has generated a third DJC cluster. One of the homology units incorporates TRBC1, TRBD2 and the TRBJ2 whilst the other incorporates TRBC2, TRBD3 and TRBJ3. Smaller lines parallel to the main diagonal reflect the similarity in sequence of TRBC3 with TRBC1 and 2 (grey arrows).
Figure 5. The genomic sequence of the (A) 3 TRBD genes and (B) 18 TRBJ genes. The nucleotide and predicted amino acid sequences of (A) The TRBD genes. TRBD genes have the potential to be read in all 3 reading frames, and with the exception of the 3rd reading frame of TRBD1 encode at least 1 glycine residue. (B) The TRBJ genes. TRBJ1-3 is predicted to be non-functional due to loss of the consensus RS heptamer sequence (bold and underlined). (†) In the genome TRBJ1-2 has a frameshift due to a single base pair deletion in the TRBJ region and would therefore be predicted to be a pseudogene, but based on sequences correlating with this TRBJ gene derived from cDNA analyses we have introduced a thymidine (shown in parentheses).

Comparison with cDNA data identifies additional TRBV gene sequences missing from the genome assembly
Using a variety of RT-PCR based methods, our group has isolated and sequenced over 1000 partial TRB chain cDNAs [31][32][33][40]. With a few exceptions, these cDNA sequences incorporated > 230 bp of the TRBV gene (i.e. over 80% of the sequence encoding the mature peptide) and in some cases the full length of the TRBV gene. Based on the assumption that sequences sharing ≤ 97% nucleotide identity represent distinct genes, as applied in studies of human and murine TRBV genes [41,42], our analysis identified 86 putative unique TRBV genes belonging to 22 subgroups (Table 1). Analysis of the sequence data available for each cDNA sequence indicated that only one of these genes is predicted to be non-functional (TRBV6-6, due to a loss of a conserved cysteine encoding codon at position 104 according to the IMGT numbering

Figure 6. The bovine TRBC genes.

Figure 7. Neighbor-joining phylogenetic tree of the functional genomic repertoire of murine, human and bovine TRBJ genes. Analysis was completed on the coding and RS nucleotide sequence of functional TRBJ genes following complete deletion to remove gaps in the alignment. The final dataset included 59 positions. The 12 phylogenetic groups (1-12) have been indicated and the percentage bootstrap interior branch test value (P_B) based on 1000 replications is shown for each of the groups. Generally each group is composed of genes from the 3 species that share a conserved order in the genome; group 8 is unique in containing the orthologues of two adjacent genes, human and murine TRBJ2-3 and TRBJ2-4 (and in the bovine TRBJ3-3 and TRBJ3-4 as well as TRBJ2-3).

at least some of the cDNAs fall into the latter category, is supported by the identification of sequences exhibiting 100% identity with 4 of these cDNA sequences, in the genome project's WGS trace archive (data not shown).
Conversely, 40 (63.5%) of the 63 predicted functional genes identified in these subgroups within the genome did not have cDNA sequences displaying 100% nucleotide identity. Twenty-two of these (34.9%) showed 98-99% identity with cDNA sequences, whilst the remaining 18 (28.6%) exhibited < 97% identity to any of the cDNA sequences. In contrast to the findings with multi-member subgroups, cDNAs corresponding to 9 subgroups with single members identified in the genome showed 100% identity with their respective genome sequence. Thus, comparison with cDNA evidence suggests that substantial numbers of genes, predominantly from the large subgroups, are still absent from Btau_3.1.
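The identity thresholds used in this comparison can be made concrete with a small sketch. It assumes the two sequences have already been aligned to equal length (gap columns, marked "-", are skipped) and applies the ≤ 97% criterion for distinct genes described above; the function names and the toy sequences are hypothetical and do not represent bovine TRBV data.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions between two pre-aligned sequences,
    ignoring columns in which either sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no comparable positions")
    matches = sum(x == y for x, y in compared)
    return 100.0 * matches / len(compared)


def is_distinct_gene(identity: float, threshold: float = 97.0) -> bool:
    """Sequences sharing <= threshold % identity are treated as distinct genes."""
    return identity <= threshold


# Hypothetical 100 nt reference and two variants with 3 and 2 substitutions.
ref = "ACGT" * 25
three_mismatches = "CGT" + ref[3:]
two_mismatches = "CG" + ref[2:]

for label, seq in [("3 mismatches", three_mismatches), ("2 mismatches", two_mismatches)]:
    pid = percent_identity(ref, seq)
    print(label, pid, "distinct gene" if is_distinct_gene(pid) else "same gene")
```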
In contrast to the TRBV situation, all TRBD and TRBC genes and the 16 functional TRBJ genes identified in Btau_3.1 have been found expressed in cDNA. In addition, a functional allele of the TRBJ1-2 gene, which compared to the genomic sequence has a 1 bp insertion that restores the ORF encoding the FGXG motif (Figure 5), has been identified. No evidence for further TRBD, TRBJ or TRBC genes was found from cDNA analysis, suggesting the repertoire of these genes in Btau_3.1 is complete.

Conserved synteny between the human TRB locus and scaffolds Chr4.003.105 and Chr4.003.108
The organisation of genes within Chr4.003.105 and Chr4.003.108 shows marked conservation in order with that of genes at the 5' and 3' ends of the human TRB locus respectively (Figure 9). Genes belonging to orthologous TRBV subgroups show a similar order although in some areas substantial tandem duplication has obscured synteny at the level of individual genes (e.g. the TRBV3-13 regions in the human TRB locus and on Chr4.003.105). TRBVX, the only bovine TRBV gene that has no human orthologue, is located in a position (between the dopamine-β-hydroxylase-like (DβH-like) gene and trypsinogen genes) syntenic with its murine orthologue (mTRBV1). As mentioned previously, synteny is also shown in the organisation of TRBJ genes, with human and bovine orthologues occupying conserved positions in their relative clusters.
Synteny also extends to non-TRB genes located within and adjacent to the human TRB locus. The 5 trypsinogen genes located on Chr4.003.105 and Chr4.003.108 are syntenic to those located towards the 5' end and 3' end of the human TRB locus respectively, and the DβH-like gene flanking the 5' end of the human TRB locus and the ephrin type-b receptor 6 precursor (EPH6), transient receptor potential cation channel subfamily V (TRPV)

Figure 9. Comparison of the genomic organisation of genes on Chr4.003.105 and Chr4.003.108 with the human TRB locus. The relative position of genes or groups of genes in the human TRB locus and the orthologues on Chr4.003.105 and Chr4.003.108 are shown. Human TRBV genes without bovine orthologues are shown in red script, as is bovine TRBVX which lacks a human orthologue. The hatch areas marked with an asterisk in Chr4.003.105 and Chr4.003.108 indicate large areas of undetermined sequence. DβH-like (dopamine-β-hydroxylase like gene), TRY (trypsinogen genes), EPH-6 (ephrin type-b receptor 6 precursor), TRPV5 (transient receptor potential cation channel subfamily V member 5), TRPV6 and Kell (Kell blood group glycoprotein).

Although fluorescent in situ hybridisation studies have previously shown that the position of the TRB locus with respect to the blue cone pigment (BCP) and chloride channel protein 1 (CLCN1) genes are conserved between ruminants and humans [46], this analysis shows for the first time the high levels of synteny between human and bovine orthologues both within and adjacent to the TRB locus.
Extrapolation of this synteny predicts that Chr4.003.105 and Chr4.003.108 (in reverse complement) should be juxtaposed on chromosome 4, whilst Chr4.003.106, which contains bovine orthologues to numerous genes that in humans are telomeric to the TRB locus (including CLCN1), should be located 3' to Chr4.003.108, and Chr4.003.107, which contains a bovine orthologue to the acylglycerol kinase (AGK) gene that in humans lies centromeric to the TRB locus, should be positioned 5' to Chr4.003.105. This location of Chr4.003.106 has also been predicted by clone paired-end analysis (data not shown).

RS and regulatory elements sequences are conserved in the bovine TRB locus
The RS sequences of the bovine TRBV, TRBJ and TRBD genes show a high degree of similarity to canonical RS sequences defined for the corresponding human and murine genes (Figure 10). In the bovine TRBV 23-RS sequences the CACAG of the heptamer and a poly-A stretch in the centre of the nonamer show a high degree of intra- and inter-species conservation. Although conservation of the spacer is less marked, the CTGCA sequence proximal to the heptamer is reasonably well conserved and similar to that of humans. Despite more limited conservation, the 8 bp proximal to the nonamer also displays a degree of cross-species similarity. Similarly, the bovine TRBJ RS exhibits intra- and inter-species conservation of the first 3 bp (CAC) of the heptamer sequence and a poly-A stretch in the nonamer. Conservation in the spacer is limited but overrepresentation of G at the position 6 bp from the heptamer and C 4 bp from the nonamer is seen in both the bovine and human.
We identified a 187 bp sequence ~8.7 Kb 3' to the TRBC3 gene that displays high nucleotide similarity with the sequences of the enhancers (Eβ) identified in the murine (76.2%) and human (78.3%) TRB loci [47][48][49]. Sequences of the protein binding sites described in the Eβs of humans (Tβ2-4) and mice (βE1-6) are well conserved in the aligned bovine sequence (Figure 11a); several of the transcription factor binding sites shown to be functionally important in the regulation of Eβ function [47][48][49][50], such as the GATA-binding site in βE1/Tβ2 and the κE2-binding motif in βE3, are absolutely conserved, whilst others (such as the CRE in βE2/Tβ2) show minimal sequence divergence. In contrast, the sequence of the TRBD1 promoter (PDβ1), which includes the ~300 bp directly upstream of the TRBD1 gene and has been well defined in the mouse [51,52], displays a more limited nucleotide identity (59.2%) with the bovine sequence. As shown in Figure 11b, some transcription factor binding sites demonstrated to be important for PDβ1 function (SP-1 and GATA) in mice and/or humans are absent from the bovine sequence, whereas others (TATA box, AP-1 and Ikaros/Lyf-1) have been well conserved [51][52][53].
We were also able to identify a conserved cAMP responsive element (CRE) motif (AGTGAxxTGA) in the ~80-120 bp upstream sequence of 57 (42.6%) of the bovine TRBV genes (Figure 11c). This motif is found within conserved decamer sequences in the promoter regions of some murine and human TRBV genes [54] and has been shown to specifically bind a splice variant of a CRE binding protein preferentially expressed in the thymus [55]. In general, the CRE motif has been found in bovine genes that are members of subgroups that are orthologous to the human TRBV subgroups in which the CRE motif is also found [10].
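A search for the CRE motif described above can be expressed as a simple pattern match, treating the two "x" positions as any nucleotide. The sketch below is illustrative only: the upstream sequence is a hypothetical example rather than bovine data, the function name is an assumption, and no attempt is made to restrict hits to the ~80-120 bp upstream window.

```python
import re

# AGTGAxxTGA, where x is any of the four nucleotides.
CRE_MOTIF = re.compile(r"AGTGA[ACGT]{2}TGA")


def find_cre(upstream: str):
    """Return the 0-based start positions of CRE motif matches in an upstream sequence."""
    return [m.start() for m in CRE_MOTIF.finditer(upstream.upper())]


# Hypothetical upstream fragment containing one copy of the motif.
example_upstream = "ttcagAGTGACATGActgg"
hits = find_cre(example_upstream)
print(hits)
```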

Discussion
Sequencing of the human and murine TRB loci has defined the repertoire of TRB genes in these species as well as provided insights into the organisation, evolution and regulation of this immunologically important locus [9,10]. Although the bovine TRB locus sequence in the third bovine genome assembly is incomplete, the analysis conducted in the present study has provided insight into the nature of the bovine TRB gene repertoire and its genomic organisation and evolution.
The most striking result from the study was the large number of TRBV genes identified (134), which is over twice the number found in humans and almost four times that in mice [11,12]. Although 11 of the 24 bovine subgroups identified in the genome contain multiple genes, the majority of the TRBV repertoire expansion is attributable to the extensive membership of just 3 subgroups, TRBV6 (40 members), 9 (35 members) and 21 (16 members). By comparison, the largest subgroups in humans are TRBV6 and TRBV7, with 9 members each, whilst in mice the only multi-membered subgroups are TRBV12 and 13 with 3 members each. As in humans, the expansion of the TRBV repertoire has predominantly occurred through the tandem duplication of DNA blocks containing genes from more than 1 subgroup [9,10]. Dot-plot analyses show that this duplication in the bovine is complex, leading to the generation of 6 homology units ranging in size from ~7 Kb to ~31 Kb and encompassing between 1 and 11 TRBV genes. Unequal cross-over (non-homologous meiotic recombination) between genome-wide repeats (e.g. SINEs, LINEs and LTRs) has been proposed to act as the substrate for such duplication events in TR loci [9]. Although genome-wide repeats are found in the DNA surrounding the bovine TRBV genes (Additional file 3), as in the human TRB locus they are only rarely found at the boundaries of duplicated homology units (data not shown), suggesting their contribution to mediating duplication is minimal [10].
Although gene conversion of TRBV genes has been documented [56], as with other multi-gene families in the immune system, TRBV genes predominantly follow a 'birth-and-death' model of evolution [13,57,58], by which new genes are created by repeated gene duplication, some of which are maintained in the genome whilst others are deleted or become non-functional due to mutation. Genes maintained following duplication are subject to progressive divergence, providing the opportunity for diversification of the gene repertoire. Gene duplication within the TR loci has occurred sporadically over hundreds of millions of years with ancient duplications accounting for the generation of different subgroups and more recent duplications giving rise to the different members within subgroups [9,59]. The continuous nature of duplication and divergence of bovine TRBV genes is evident in the multi-membered subgroups where nucleotide identity between members ranges between 75.5% and 100%. The complete identity observed between some TRBV genes suggests that some of the duplication events have occurred very recently. Similar features have been described for the murine TRA and human IGκ loci, within

Figure 10. Comparison of recombination signal sequences of human, murine and bovine TRB genes.

Figure 11. Sequence comparison of regulatory elements in the bovine, human and murine TRB loci.
The distribution of TRBV genes over 5 scaffolds and the presence of > 180 Kb of undetermined sequence within two of the scaffolds indicate that characterisation of the genomic TRBV repertoire remains incomplete. Comparison with cDNA sequence data indicates that the number of undefined genes is substantial: only 36/86 (42%) of TRBV genes identified from cDNA analysis have corresponding identical sequences in Btau_3.1. Most of the identified TRBV genes missing from the assembly are members of the large subgroups TRBV6, 9, 19, 20, 21 and 29, further enhancing their numerical dominance. Although it is anticipated that completion of the TRB locus sequence will incorporate significant numbers of additional TRBV genes, the possible existence of insertion-deletion related polymorphisms (IDRPs), which can lead to intra-species variation in genomic TRBV gene repertoires as described in human and murine TRB loci [65][66][67][68], may result in some of the genes identified in cDNA being genuinely absent from the sequenced bovine genome.

The proportion of TRBV pseudogenes in Btau_3.1 is 41%, comparable to that seen in both humans (29%) and mice (40%), suggesting that the 'death rate' in TRBV gene evolution is generally high [58]. Pseudogene formation has occurred sporadically throughout the evolution of TRBV genes, with genes that have lost function tending to subsequently accumulate further lesions [9]. The majority of bovine TRBV pseudogenes (57%) contain a single lesion and thus appear to have arisen recently; the remaining 43% have multiple lesions of varying severity and complexity (Additional file 2). In addition to pseudogenes we also identified 7 sequences showing limited local similarity to TRBV genes in Btau_3.1 (Figure 2, open boxes). Such severely mutated TRBV 'relics', 22 of which have been identified in the human TRB locus [10], are considered to represent the remnants of ancient pseudogene formation.
In contradiction to a previous report [39], the repertoire of functional TRBV genes in Btau_3.1 exhibits a level of phylogenetic diversity similar to that of humans and mice. Phylogenetic groups A and E are over-represented in all 3 species, which in humans and cattle is largely attributable to expansion of subgroups TRBV5, 6, 7 and 10 and TRBV6, 9 and 21 respectively; in mice the expansion of subgroups TRBV12 and 13 makes a more modest contribution to this over-representation. Much of the expansion of human subgroups TRBV5, 6 and 7 occurred 24-32 MYA [13] and similarly, as described above, in bovines much of the expansion of subgroups TRBV6, 9 and 21 appears to be very recent. As these expansions have occurred subsequent to primate/artiodactyl divergence (~100 MYA) [69], over-representation of phylogenetic groups A and E must have occurred as parallel but independent events in these lineages, raising interesting questions about the evolutionary pressures that shape the functional TRBV repertoire.
In contrast to the wide variation in the organisation of TRBD, TRBJ and TRBC genes in the TRB locus seen in non-mammalian vertebrates [70][71][72][73][74], in mammals the arrangement of tandemly located DJC clusters is well conserved [10,35,36,75,76]. Although most placental species studied have 2, variation in the number of DJC clusters has been observed, with unequal cross-over events between TRBC genes usually invoked as the most likely explanation for this variation [36,77,78]. The results from this study provide the first description of the entire bovine DJC region and confirm that, like sheep, cattle have 3 complete DJC clusters [36,37]. Dot-plot and sequence analyses indicate that unequal crossover between the ancestral TRBC1 and TRBC3 genes led to duplication of a region incorporating TRBC1, TRBD3 and TRBJ3 genes, generating the DJC2 cluster. The similarity with the structure of the ovine DJC region suggests that this duplication event occurred prior to ovine/bovine divergence 35.7 MYA [69]. As with duplication of TRBV genes, expansion of TRBD and TRBJ gene numbers has increased the number of genes available to partake in somatic recombination: the 3168 different VDJ permutations possible from the functional genes present in Btau_3.1 is considerably more than that for either humans (42 × 2 × 13 = 1092) or mice (21 × 2 × 11 = 462). Interestingly, the bovine TRBD1 gene is the first TRBD gene described that does not encode a glycine residue (considered integral to the structure of the CDR3β) in all 3 reading frames [79]. However, analysis of cDNA reveals functional TRB chains that use TRBD1 in the reading frame that does not encode a glycine but that have generated a glycine codon by nucleotide editing at the VJ junction (data not shown).
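The human and murine permutation counts quoted above follow from a simple product of functional V, D and J gene numbers. The short Python sketch below reproduces that arithmetic and the resulting fold differences. The function name is illustrative only, and the bovine figure is taken as reported in the text rather than recomputed, because the per-cluster gene counts needed to derive it are not given in this section.

```python
# Naive V x D x J product, reproducing the human and mouse figures quoted in the text.
# Gene counts come from the text; the function name is an illustrative assumption.

def vdj_permutations(n_v: int, n_d: int, n_j: int) -> int:
    """Number of V-D-J combinations if every V, D and J gene could be joined."""
    return n_v * n_d * n_j

human = vdj_permutations(42, 2, 13)   # 42 x 2 x 13
mouse = vdj_permutations(21, 2, 11)   # 21 x 2 x 11
bovine = 3168                         # value reported in the text, not recomputed here

print(human, mouse)                   # 1092 462
print(round(bovine / human, 1))       # fold difference, cattle vs humans
print(round(bovine / mouse, 1))       # fold difference, cattle vs mice
```

On these figures the potential bovine VDJ repertoire is roughly three-fold that of humans and nearly seven-fold that of mice.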
In contrast to TRBV, TRBD and TRBJ genes, which encode products that bind to a diverse array of peptide-MHC ligands, TRBC gene products interact with components of the CD3 complex, which are non-polymorphic. Consequently, due to structural restrictions, TRBC genes are subject to concerted evolutionary pressures, with intra-species homogenization through gene conversion evident in both humans and mice [9,80]. Similarly, the bovine TRBC genes were found to encode near-identical products, most likely as a result of gene conversion, although in the case of TRBC1 and TRBC2 genes this more probably reflects minimal divergence following duplication.
Comparison with the human and murine sequences shows that non-coding elements that regulate TRB expression, such as the Eβ, promoters and RSs, are highly conserved in the bovine. This is consistent with work demonstrating that the critical role of RSs has enforced a high level of evolutionary conservation [70,73,74,81] and that Eβ and PDβ1 sequences are well conserved in eutherian species [36,52]. Although transcription factor binding sites are less well conserved in the putative PDβ1 than in the Eβ sequence, the Ikaros/Lyf-1 and Ap-1 binding sites of the PDβ1, which are vital in enforcing stage-specific recombination (i.e. Dβ-Jβ prior to Vβ-DβJβ recombination), are conserved [53,82]. Our analysis of putative TRBV promoter elements was restricted to the well described CRE motif [9,10,54]. However, TRBV promoters are complex, and expression of TRBV genes whose promoters lack the CRE motif is maintained through the function of other transcription factor binding sites [83]. A more detailed analysis of the bovine TRBV promoters would be interesting given the potential influence this may have on shaping the expressed TRBV repertoire [25], but is beyond the scope of the current study.
The portion of the bovine TRB locus described in Btau_3.1 encompasses > 730 Kb of sequence (excluding the regions of undetermined sequence in Chr4.003.105 and Chr4.003.108). Thus, although incomplete, the bovine TRB locus is larger than that of either humans (620 Kb) or mice (700 Kb), mainly as a consequence of the duplications leading to the dramatic expansion of the V genes. In contrast to V genes, duplication of trypsinogen genes within the TRB locus is more limited in the bovine (Figure 2), where only 5 trypsinogen genes were identified, compared to the human and murine loci, where more extensive duplication has led to the presence of 8 and 20 trypsinogen genes respectively. Despite the differences in duplication events, the organisation of both TR and non-TR genes within and adjacent to the TRB locus exhibits a striking conserved synteny between cattle, humans and mice [9,84]. Indeed, the organisation of genes within the TRB locus and its position relative to adjacent loci is ancient, with marked conserved synteny also demonstrated between eutherian and marsupial mammalian species and, to a large extent, chickens [9,75]. Given the evidence for conserved synteny of TRBV gene organisation despite dissimilar duplication/deletion events between mice, humans and cattle, the results of the analysis completed in this study suggest that several subgroups including TRBV1, 2, 17, 22 and 23, which were not identified in the genome assembly or from cDNA sequences, may have been deleted from the bovine genome (Figure 9). Conservation of synteny would predict that the genomic location of the TRBV27 gene identified from cDNA analysis will be within the region of undetermined sequence in Chr4.003.108 between the TRBV26 and 28 genes (Figure 9).

Conclusion
The primary purpose of this study was to analyse the sequence data made available from the third bovine genome assembly to gain a better understanding of the bovine TRB gene repertoire and the organization and evolution of the bovine TRB locus. The results of this analysis have shown that: (1) the bovine TRBV genomic repertoire has been dramatically expanded through a complex series of duplication events and, although incomplete, is the largest described to date, with these duplication events having led to massive expansion in the membership of certain TRBV subgroups, particularly TRBV6, 9 and 21; (2) duplication has generated 3 DJC clusters compared to 2 in humans and mice; (3) the elements that regulate TRB expression and the organisation of genes within and adjacent to the TRB loci exhibit high levels of conservation between humans, mice and cattle; and (4) cDNA evidence indicates that a substantial number of TRBV genes, predominantly from the larger subgroups, are absent from the current assembly.
Notwithstanding the incomplete assembly of the TRB locus, the results of these analyses clearly demonstrate that cattle possess a phylogenetically diverse repertoire of functional TRB genes that is substantially larger than that described for other species. These findings, together with emerging evidence of similar expansions of gene repertoires for other TR chains in ruminants [85,86], suggest that strong evolutionary pressures have driven a generic enlargement of TR gene numbers, and thus greater potential TR diversity, in the ruminant lineage. Further studies are required to define the full extent of these expansions and to understand their evolutionary basis.
To be considered functional, TRBV gene segment sequences were required to maintain i) splice sites appropriate for RNA editing, ii) open reading frames, which include codons for the conserved cysteine, tryptophan and cysteine residues at positions 23, 41 and 104 (IMGT unique numbering system [43]) respectively and iii) a 23-RS compatible with somatic recombination [98,99].
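As an illustration of criterion (ii) only, the following minimal Python sketch tests a translated V-region for an open reading frame and for the conserved residues at IMGT positions 23, 41 and 104. It is not the procedure used in this study. It assumes the amino acid sequence has already been gapped to IMGT unique numbering so that string index i corresponds to IMGT position i + 1, that '*' marks a stop codon and that '.' marks an IMGT gap. The function and variable names are hypothetical, and the splice-site and 23-RS criteria are not covered.

```python
# Sketch of criterion (ii) only: open reading frame plus conserved Cys23, Trp41, Cys104.
# Assumption: `aa` is already gapped to IMGT unique numbering, so aa[i] is IMGT position i + 1.
# Assumption: '*' denotes a stop codon and '.' an IMGT gap.

CONSERVED = {23: "C", 41: "W", 104: "C"}  # IMGT position -> expected residue

def passes_orf_criterion(aa: str) -> bool:
    """Return True if there is no stop codon and the three conserved residues are present."""
    if "*" in aa:
        return False                      # in-frame stop codon breaks the ORF
    if len(aa) < max(CONSERVED):
        return False                      # too short to reach position 104
    return all(aa[pos - 1] == res for pos, res in CONSERVED.items())

def make_example() -> str:
    """Build a synthetic (non-biological) placeholder sequence carrying the conserved residues."""
    seq = ["A"] * 108
    for pos, res in CONSERVED.items():
        seq[pos - 1] = res
    return "".join(seq)

example = make_example()
print(passes_orf_criterion(example))                              # True
print(passes_orf_criterion(example.replace("W", "R")))            # False: Trp41 lost
print(passes_orf_criterion(example[:50] + "*" + example[51:]))    # False: stop codon
```

A sequence failing this check would be a candidate pseudogene, although a full classification would also need the splice-site and RS checks described above.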

Nomenclature
As the sequence of the TRB locus was incomplete, it was not possible to fully implement the IMGT nomenclature system, which requires knowledge of the genomic order of genes from the 5' to 3' end of the locus [100]. Genomic bovine TRBV gene subgroups have been named according to the orthologous subgroups in humans, and members of subgroups have been given an alphabetic rather than numeric description to avoid confusion with previously published cDNA data [32]. The DJC region of the locus appears complete and so the TRBD, TRBJ and TRBC genes have been named according to their 5' to 3' order in the genome.

Phylogenetic analysis
Phylogenetic analysis was performed on the nucleotide sequences of functional TRBV genes (coding sequences) and TRBJ genes (coding sequence + RS) of humans, mice and cattle (the latter as identified in Btau_3.1). Neighbour-joining (NJ) analysis [101] was performed with the MEGA4 software [102,103], using the uncorrected nucleotide differences (p-distance), which is known to give better results when a large number of sequences containing a relatively small number of nucleotides are examined [104]. Maximum likelihood analysis was performed under a variety of substitution models (Jukes-Cantor, Kimura 2-parameter, Felsenstein 81, Felsenstein 84, Tamura-Nei 93 and General Time Reversible) as implemented by the PHYML programme [105,106], using the phylogenetic tree produced by NJ analysis as the primary tree. In each case the reliability of the resulting trees was estimated by the approximate Likelihood Ratio Test (aLRT) method [107].
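For readers unfamiliar with the p-distance used above, it is simply the proportion of compared sites at which two aligned sequences differ, with no correction for multiple substitutions. The Python sketch below illustrates the calculation and is not the MEGA4 implementation. It assumes the two sequences are already aligned to equal length, and it skips any site where either sequence has a gap (pairwise deletion), which is one of several possible gap treatments and an assumption here. The function name is hypothetical.

```python
# Sketch of the uncorrected p-distance between two pre-aligned nucleotide sequences.
# Assumption: gaps ('-' or '.') are handled by pairwise deletion; this may differ
# from the gap treatment chosen in any particular MEGA4 analysis.

def p_distance(seq_a: str, seq_b: str, gap_chars: str = "-.") -> float:
    """Proportion of differing sites among sites with a nucleotide in both sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in gap_chars or b in gap_chars:
            continue                      # skip sites with a gap in either sequence
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared

print(p_distance("ACGTACGT", "ACGTACGA"))  # 0.125 (1 difference over 8 sites)
```

A matrix of such pairwise values over all sequences is the input from which a neighbour-joining tree is built.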

Authors' contributions
TC, JA, AL and WM conceived the study. TC performed the genome analysis, the cDNA and Southern blot work, and with JA completed the sequence and phylogenetic analysis. All authors read and approved the manuscript.