Annotation and classification of the bovine T cell receptor delta genes

Background: γδ T cells differ from αβ T cells with regard to the types of antigen with which their T cell receptors interact; γδ T cell antigens are not necessarily peptides nor are they presented on MHC. Cattle are considered a "γδ T cell high" species, indicating that they have an increased proportion of γδ T cells in circulation relative to that in "γδ T cell low" species such as humans and mice. Prior to the onset of the studies described here, there was limited information regarding the genes that code for the T cell receptor delta chains of this γδ T cell high species.
Results: By annotating the bovine (Bos taurus) genome Btau_3.1 assembly we found 56 distinct T cell receptor delta (TRD) variable (V) genes, 52 of which belong to the TRDV1 subgroup and were co-mingled with the T cell receptor alpha variable (TRAV) genes. In addition, two genes belonging to the TRDV2 subgroup and single TRDV3 and TRDV4 genes were found. We confirmed the presence of five diversity (D) genes, three joining (J) genes and a single constant (C) gene and describe the organization of the TRD locus. The TRDV4 gene is found downstream of the C gene and in an inverted orientation of transcription, consistent with its orthologs in humans and mice. cDNA evidence was assessed to validate expression of the variable genes and showed that one to five D genes could be incorporated into a single transcript. Finally, we grouped the bovine and ovine TRDV1 genes into sets based on their relatedness.
Conclusions: The bovine genome contains a large and diverse repertoire of TRD genes when compared to the genomes of "γδ T cell low" species. This suggests that in cattle γδ T cells play a more important role in immune function since they would be predicted to bind a greater variety of antigens.


Background
T lymphocytes can be subdivided into at least two types based on the expression of either the αβ or γδ T cell receptor. Although both perceive antigen, they differ in the types of antigens with which they react. αβ T cell receptors react with antigenic protein peptides in the context of self major histocompatibility complex (MHC) proteins while γδ T cell receptors may react with proteins but this does not involve MHC presentation. They also react with autologous molecules on cells [1][2][3][4][5][6] as well as nonproteinaceous molecules [7]. The gene repertoires that code for the γδ T cell receptor chains and the T cell receptor gamma (TRG) and delta (TRD) gene locus organizations have been extensively described for humans and mice but to a lesser extent for the artiodactyls which includes ruminants and swine. These latter are "γδ T cell high" species because of their high levels of γδ T cells in circulation ("γδ T cell low" species exhibit much lower levels of γδ T cells in circulation). It is clear that while both αβ and γδ T cells have large and diverse T cell receptor gene repertoires [8][9][10][11][12][13][14], "γδ T cell high" species have a TRD gene repertoire that is much more extensive than that in the "γδ T cell low" species [15][16][17][18][19][20][21]. The bovine (Bos taurus) locus organization and TRD gene repertoire are the subject of this work.
T cell receptor delta and beta chains are encoded by the rearrangement of variable (V), joining (J) and diversity (D) genes making them more complex than the T cell receptor gamma and alpha chains which lack D gene products. In all mammals evaluated the genes encoding the T cell receptor beta and gamma chains are found at the T cell receptor beta (TRB) and TRG loci, respectively. The genes that encode the T cell receptor delta and alpha chains are found at a single chromosomal location with the TRD genes embedded within the T cell receptor alpha (TRA) locus. For both humans and mice these are located on chromosome 14 and encompass over 1 megabase (Mb) for the combined TRA/TRD locus [13]. The TRD locus embedded within the TRA locus spans approximately 60 kb in humans and 275 kb in mice. The loci comprise TRDV genes (two in humans and five in mice), followed by TRDD genes (three in humans and two in mice), TRDJ genes (four in humans and two in mice), a single TRDC gene and an additional TRDV gene that is located 3' of the TRDC gene in an inverted transcriptional orientation [10,14]. In addition, five and ten functional TRAV/DV genes (that rearrange to either TRDD or TRAJ) have been identified for humans and mice, respectively, and these are found upstream of the embedded TRD locus [10,14].
Limited evidence derived from analysis of cDNA clones from cattle, sheep and swine, as well as limited germline information for sheep, suggests that the general organization of the bovine TRA/TRD locus does not differ greatly from that of humans and mice [17,20]. It also suggests that a much larger repertoire of TRDV genes exists for cattle, as well as for the other "γδ T cell high" species sheep and swine [15,17,20,21]. Indeed, recent mapping of the bovine TRA/TRD locus identified over 100 TRDV and over 300 TRAV or TRAV/DV genes in cattle [22]. For cattle, four genes belonging to three different small subgroups (TRDV2, TRDV3 and TRDV4) have been identified that are orthologous to those same subgroups in sheep [17,23]. In contrast, while the bovine TRDV1 genes have been found to be related to the single TRDV1 gene that occurs in humans, expansion of the TRDV1 subgroup accounts for the larger number of TRDV genes in the "γδ T cell high" species [15,17,20,21,22]. The swine and sheep TRDV1 subgroups have been estimated to contain at least 31 and 40 genes, respectively [15,21] while the bovine TRDV1 subgroup has been reported to contain at least 37 genes [20]. However, current knowledge about ruminant and swine TRD genes is predominantly based on cDNA evidence rather than on genomic DNA. Because of this, it has been unclear whether the large number of cDNA TRDV1 sequences is the result of multiple genes or polymorphisms among animals; thus, there has been no clear way to classify and name these transcripts. By annotating the bovine genome Btau_3.1 assembly, here we demonstrated the presence of 56 TRDV genes, 52 of which belong to the TRDV1 subgroup. We also proposed TRDV1 sets to classify the bovine and ovine TRDV1 genes based on their phylogenetic grouping.
The existence of multiple D and J genes also contributes to diversity of possible T cell receptor delta chain amino acid sequences since a particular TRDV gene is expected to be able to recombine with any D and J gene. The region of the protein derived from the V-D-J gene rearrangement is known as the complementarity determining region 3 (CDR3), while the CDR1 and CDR2 loops are germline-encoded by the TRDV gene. The V-D-J gene rearrangement involves recognition of the recombination signal (RS) sequences for subsequent DNA cleavage by the enzymes RAG1 and RAG2 [24]. T cell receptor delta chain sequence diversity is augmented by junctional flexibility and the addition of N and P nucleotides. RS flank the V, D and J genes and are composed of highly conserved heptamer and nonamer sequences separated by either a 12 base pair (bp) spacer (located 5' of TRDD and TRDJ genes) or a 23 bp spacer (located 3' of TRDV and TRDD genes). Since the conserved heptamer and nonamer sequences and spacer lengths have been found to be important for efficient recombination [25] they were also evaluated here. Finally, recent evidence demonstrated that diversity of T cell receptor delta chain sequences is amplified by the occurrence of multiple TRDD genes within a single CDR3 [20,26]. Thus we also evaluated TRDD gene usage by analyzing cDNA sequences and report those findings here.
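The RS layout described above (heptamer, then a 12 or 23 bp spacer, then nonamer) can be illustrated with a small scanner. This is only a sketch under stated assumptions, not the annotation procedure used in this work: the function name `find_rs`, the scoring threshold and the toy sequence are hypothetical. It requires the heptamer to begin with CAC and scores the nonamer against the consensus ACAAAAACC, and it scans the heptamer-first orientation only, so RS located 5' of TRDD and TRDJ genes would need the reverse complement of the input.

```python
# Nonamer consensus as reported in the literature cited in the text.
NONAMER_CONSENSUS = "ACAAAAACC"


def find_rs(seq, spacer_len=23, min_nonamer_matches=6):
    """Scan for candidate RS in heptamer-spacer-nonamer orientation.

    Returns a list of (heptamer_start, heptamer, spacer, nonamer, score),
    where score is the number of nonamer positions matching the consensus.
    The threshold min_nonamer_matches is an arbitrary illustrative choice.
    """
    seq = seq.upper()
    total = 7 + spacer_len + 9
    hits = []
    for i in range(len(seq) - total + 1):
        heptamer = seq[i:i + 7]
        # Only the first three heptamer bases (CAC) are required here.
        if not heptamer.startswith("CAC"):
            continue
        spacer = seq[i + 7:i + 7 + spacer_len]
        nonamer = seq[i + 7 + spacer_len:i + total]
        score = sum(a == b for a, b in zip(nonamer, NONAMER_CONSENSUS))
        if score >= min_nonamer_matches:
            hits.append((i, heptamer, spacer, nonamer, score))
    return hits


# Made-up sequence (not a bovine gene): 3 nt of V-EXON end, then a 23 bp-spacer RS.
toy = "GGG" + "CACAGTG" + "T" * 23 + "ACAAAAACC" + "GG"
for hit in find_rs(toy, spacer_len=23):
    print(hit)
```

Calling `find_rs(toy, spacer_len=12)` on the same toy sequence returns no hit, because the nonamer is then sought at the wrong distance from the heptamer.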

Gene structure and genomic organization of bovine TRD genes
Here we set out to annotate and classify the bovine TRD genes. Prior to the initiation of the studies described here, many cDNA sequences representing TRDV1 gene transcripts had been reported [16][17][18][19][20][27], but the actual number of functional bovine TRDV1 genes and their sequences were unknown. As a result, it was not possible to differentiate between cDNA sequences derived from distinct genes and those that represented polymorphisms of the same gene among animals. Furthermore, the existence of TRDV2, TRDV3 and TRDV4 genes had been demonstrated [17], along with the TRDJ and TRDD genes [20], but their numbers and placement within the bovine genome had not been definitively described.
The exon-intron structures of TRDV genes ( Figure  1A) and of the single TRDC gene ( Figure 1B) were determined. A TRDV gene encompasses a coding region of approximately 560 bp and comprises two exons. The first exon, L-PART1, is 49, 55 and 37 bp in length for TRDV1/TRDV2, TRDV3 and TRDV4, respectively. The second exon, V-EXON, is between 271 and 335 bp in length for TRDV1, and is 299, 293 and 309 bp long for TRDV2, TRDV3 and TRDV4, respectively. The TRDC gene encompasses a coding region of 1282 bp comprising three exons. RS sequences were identified, as expected, adjacent to the 3' end of each V-EXON (Figure 1A) while the termination codon was at the 3' end of TRDC exon 3. Table 1 summarizes the proposed functional (or Open Reading Frame [ORF]) TRD genes that were identified within the bovine Btau_3.1 assembly. Most genes were identified within regions that were not placed on a chromosome (i.e. were found on ChrUn). However, it is expected that, like in primates and rodents, these genes are all located within the TRA/TRD locus which in cattle is on chromosome 10. The bovine consensus gene set (known as GLEAN) was used to identify gene prediction models of TRD genes. Where applicable, GLEAN numbers are listed in Table 1, along with the Bovine Genome Scaffold identification number, location and orientation. It should be noted that the orientation of each contig individually, as well as in relation to other contigs, is not definitive at this point for the bovine Btau_3.1 assembly. All putatively functional (or ORF) genes (as determined based on criteria described in Methods) are shown. Based on analysis of flanking and intronic genomic sequence it was determined that many of these gene models might represent the same genes although they were found more than once. It is possible that this is due to an anomaly in the Btau_3.1 assembly, although it cannot be ruled out that those gene models do indeed represent duplicated genes that are present in the genome. 
67 TRDV1, three TRDV2, two TRDV3 and one TRDV4 genes were identified; after consideration of multiple gene models that might be representative of a single gene we propose the presence of 52 TRDV1, two TRDV2, one TRDV3 and one TRDV4 genes. The number of TRDV2, TRDV3 and TRDV4 genes is fairly consistent with what has been previously reported [17,20], although evidence for two TRDV3 genes has been presented by others [20]. In addition to the TRDV genes, five TRDD genes, three TRDJ genes and a single TRDC gene were identified, also consistent with previous reports [17,20].
The number of functional TRD genes among mammalian species is compared in Table 2. All species evaluated have genes for at least three TRDV subgroups. It is notable that humans have only a single TRDV1 gene (the mouse genes most closely related to human TRDV1 are named TRDV2-1 and TRDV2-2) while the TRDV1 subgroup in cattle and other artiodactyls are large, multigene families. Because of the disparity in numbers of TRDV1 genes in the human and cattle genomes, organization of their TRA/TRD loci cannot be compared. Furthermore, additional assembly of the bovine genome is required in order to definitively determine the size of the TRA/TRD locus, to refine the gene organization, and to evaluate duplication events that resulted in the large TRDV1 subgroup. Schematic representations for regions in which three or more TRDV genes were identified are shown (Figure 2A). Schematics are shown to scale and gene orientation is as indicated, based on the Btau_3.1 assembly. As in other species, a single TRDC gene (see Figure 2B) as well as one TRDV gene located downstream of TRDC in an inverted transcriptional orientation (named TRDV4 in cattle) were identified [17]. In some cases multiple gene models  were found to represent the same gene (see Table 1, as indicated above, and Figure 2) and therefore subsequent analyses were performed after removing redundant sequences. It is likely that the occurrence of redundant sequences is a result of assembly anomalies; however, it is also possible that these duplicated TRDV genes are in fact present within the bovine genome.

Table 1 footnotes. Gene designations with the same symbols denote gene models that are thought to be representative of the same gene based on analysis of genomic sequence. 2 NA denotes identified genes lacking GLEAN models.
Figure 2 Schematic representation of the bovine (Bos taurus) TRD locus organization. (A) 52 unique TRDV1 genes were identified and are all predicted to be functional (or ORF) and found on chromosome 10, although many of them were unplaced in the Btau_3.1 assembly. Multiple TRDV genes were identified in two or three gene prediction models, as indicated, and labels correspond to those used in Table 1. However, evidence suggests that in each case only a single gene is represented. Genomic organization is shown in those cases when three or more TRDV genes were found within a single scaffolded region. TRA genes are not shown although in many cases occur among the TRDV genes. In addition to some TRDV1 genes, the schematic includes TRDV2-1 on ChrUn.158 (no other TRDV genes are included in the schematic). (B) Genomic organization of the five TRDD genes, three TRDJ genes and the single TRDC gene is shown, along with TRDV4 and TRAC. Gene designations, orientations and Bovine Genome Scaffold identifications are as indicated. The determination of TRDV gene orientation was based on the assembly of the Bovine Genome Scaffolds; it is possible that some scaffolds were assembled in an incorrect orientation individually and/or in relation to other scaffolds. Diagrams are shown to scale with base pair increments beneath the schematics. With the exception of TRAC, TRA genes are not shown. TRDV2-2 and TRDV3 are not shown because those genes were identified on scaffolds lacking additional TRDV genes.

TRDV gene analysis
Deduced amino acid sequences of the 56 TRDV genes identified here were initially aligned using ClustalW2 [28]. Alignments were refined in BioEdit according to the IMGT unique numbering for the V-DOMAIN [29], visualized with Jalview [30] and are shown (Figure 3A) with IMGT numbering and framework regions (FR) and CDR, as indicated. TRDV1 genes reported here are temporarily designated TRDV1a-TRDV1bp until the entire chromosomal sequence has been assembled. TRDV1 genes encode CDR1 that range in length from five to ten amino acids (except for TRDV1o which encodes an 18 amino acid long CDR1) while TRDV2, TRDV3 and TRDV4 genes all encode CDR1 that are seven amino acids in length. In contrast, most CDR2 of TRDV1, TRDV2 and TRDV3 genes were three amino acids in length, except for four of the TRDV1 genes as noted below, while that of TRDV4 was five amino acids long. In all but two cases (TRDV1bc and TRDV1w), TRDV1 and TRDV2 genes encode a QXS motif in the CDR2. Interestingly, TRDV1f, TRDV1ae, TRDV1ar and TRDV1o (see Figure 3A) all contained a deletion which spans the last FR2 amino acid to the fifth FR3 amino acid, resulting in a lack of CDR2 in those genes. An identical deletion had also been found in the sheep TRDV1S29 gene [15]. Also, the RS of TRDV1bg and TRDV1bj were found to occur farther downstream than those of the other TRDV1 genes identified here, resulting in a putative transcript that extends 15 (for TRDV1bg) or 12 (for TRDV1bj) amino acids beyond the conserved 2nd-CYS 104, as opposed to extending only three or four amino acids which is the case for the other TRDV1 genes. Because of the unusual nature of those six genes, until their expression by γδ T cells is validated they will be considered ORF instead of functional as defined by IMGT [31]. 2D structure graphical representations of TRDV1, TRDV2, TRDV3 and TRDV4 genes (Figure 3B) were generated using IMGT/V-QUEST [32] and the IMGT/Collier-de-Perles tool program [33]. The CDR1 and CDR2 loops are indicated in red and yellow, respectively. The amino acids 1st-CYS 23, CONSERVED-TRP 41, hydrophobic amino acid (Leu) 89 and 2nd-CYS 104, which are conserved in all four TRDV subgroups, are shown with letters in red.
Figure 3 … Table 1 for GLEAN identification numbers). IMGT unique numbering for V-DOMAIN [29,32] is indicated above the alignment and conserved cysteines are indicated below the alignment. (B) IMGT Collier de Perles [33] are shown for TRDV1 (based on TRDV1a, GLEAN_22158), TRDV2-1 (no GLEAN identification), TRDV3-1 (no GLEAN identification) and TRDV4 (GLEAN_19724) and were determined using IMGT/V-QUEST [32]. The CDR-IMGT lengths are indicated.
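CDR lengths of the kind reported above can be read directly from a sequence that has been gapped to the IMGT unique numbering, where CDR1-IMGT occupies positions 27 to 38 and CDR2-IMGT positions 56 to 65. The sketch below assumes such a gapped string with position 1 at index 0; the helper names and the toy sequences are hypothetical and are not bovine TRDV sequences.

```python
# IMGT unique numbering for the V-DOMAIN (1-based, inclusive).
CDR1_IMGT = (27, 38)
CDR2_IMGT = (56, 65)


def cdr_length(gapped, start, end, gap_chars=".-"):
    """Count non-gap residues between IMGT positions start and end."""
    region = gapped[start - 1:end]
    return sum(ch not in gap_chars for ch in region)


def cdr_lengths(gapped):
    """Return CDR1 and CDR2 lengths for an IMGT-gapped V-REGION string."""
    return {
        "CDR1": cdr_length(gapped, *CDR1_IMGT),
        "CDR2": cdr_length(gapped, *CDR2_IMGT),
    }


def toy_v_domain(cdr1, cdr2):
    """Build a made-up 104-position string with the conserved landmarks.

    Gaps are padded on the right of each CDR for simplicity; real IMGT
    gapping places them in the middle of the loop, which does not change
    the residue count.
    """
    fr1 = "A" * 22 + "C" + "A" * 3   # positions 1-26, 1st-CYS at 23
    fr2 = "AA" + "W" + "A" * 14      # positions 39-55, CONSERVED-TRP at 41
    fr3 = "A" * 38 + "C"             # positions 66-104, 2nd-CYS at 104
    return fr1 + cdr1.ljust(12, ".") + fr2 + cdr2.ljust(10, ".") + fr3


typical = toy_v_domain("SGYSTNY", "QGS")   # seven-residue CDR1, QXS-like CDR2
no_cdr2 = toy_v_domain("SGYSTNY", "")      # mimics a gene lacking CDR2
print(cdr_lengths(typical))
print(cdr_lengths(no_cdr2))
```

A string with an empty CDR2 region gives a CDR2 length of zero, which is how the deletion described for TRDV1f, TRDV1ae, TRDV1ar and TRDV1o would appear in this representation.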
As for sheep, genes in the bovine TRDV1 family share many characteristics including a 20 amino acid long highly conserved leader region and the YFC motif found in the 3' end of FR3 (see Figure 3A). However, bovine TRDV1 genes described here were further classified into eleven sets by constructing a phylogenetic tree using the neighbor-joining method allowing us to compare them to equivalent human, mouse and ovine sequences ( Figure 4). The grouping of ovine TRDV1 genes by this method corresponded with what had been previously described [15]. The tree shows grouping of bovine TRDV1 gene sequences into eleven sets, as indicated, and the percentage interior test values (based on 1000 replicates) are shown for each set. Classification was further supported based on CDR characteristics used to classify ovine TRDV1 genes [15] (Table 3). Previously described characteristics used to classify TRDV1 genes included CDR1 length (five, seven, nine or ten amino acids), the chemical characteristics of the amino acid at position 57 (within CDR2) and the presence or absence of Trp 107 in CDR3. The number of TRDV1 genes evaluated here that occur within each set is also shown ( Table 4) with set 1 being the largest and set 4 being the smallest. While some sets can share the same features reported in Table 3 (i.e. sets 5 and 8, 4 and 6) the classification of TRDV1 genes into those separate sets is supported based on the phylogenetic analysis ( Figure 4) and additional sequence features (data not shown).
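Distance-based tree building such as neighbor-joining starts from a matrix of pairwise distances over an alignment, and the Figure 4 legend notes that alignment columns containing gaps were removed (complete deletion). The sketch below shows those two preparatory steps only. It uses the simple p-distance (proportion of differing sites) as a stand-in, which is an assumption and not necessarily the distance measure used for Figure 4; the function names and toy alignment are hypothetical.

```python
def complete_deletion(alignment, gap_chars="-."):
    """Drop every alignment column in which any sequence has a gap."""
    names = list(alignment)
    columns = zip(*(alignment[name] for name in names))
    kept = [col for col in columns if not any(ch in gap_chars for ch in col)]
    return {name: "".join(col[i] for col in kept) for i, name in enumerate(names)}


def p_distance(a, b):
    """Proportion of sites that differ between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(alignment):
    """Pairwise p-distances after complete deletion of gapped columns."""
    clean = complete_deletion(alignment)
    names = list(clean)
    return {(m, n): p_distance(clean[m], clean[n]) for m in names for n in names}


# Made-up aligned sequences, not TRDV genes.
toy_alignment = {
    "g1": "ACGT-ACGTA",
    "g2": "ACGTTACGTA",
    "g3": "ACCTTAC-TA",
}
matrix = distance_matrix(toy_alignment)
print(matrix[("g1", "g2")], matrix[("g1", "g3")])
```

The resulting matrix is the kind of input a neighbor-joining implementation would take; the joining step itself and the interior branch test are not shown here.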
When bovine TRDV1 RS (total of 52 sequences) and ovine TRDV1 RS (total of 12 sequences) were evaluated, a high level of sequence conservation was observed both among and within species, as depicted in the sequence logos shown in Figure 5A. When the RS of bovine TRDV4 orthologs were compared (including those for human TRDV3, mouse TRDV5 and ovine TRDV4) a similarly high level of sequence conservation was observed among species ( Figure 5B). In contrast, comparison of the remaining TRDV RS (those for bovine TRDV2 and TRDV3, ovine TRDV2 and TRDV3, mouse TRDV1, TRDV2 and TRDV4, human TRDV1 and TRDV2) revealed conservation in only the heptamer and nonamer sequences ( Figure 5C). This is consistent with the lack of relatedness of these genes as determined based on sequence analysis (refer to Figure 4). Overall, sequence conservation was observed within the RS heptamer and nonamer of all TRDV genes, regardless of species (compare Figure 5A, Figure 5B and Figure 5C).

TRDD and TRDJ gene analysis
Genomic locations, sequences and RS of bovine TRDD and TRDJ genes were also determined (Table 1 and Additional File 1) and confirmed what had been reported previously [20]. Evidence for five TRDD genes was found and deduced amino acid sequences for all three open reading frames are shown in Additional File 1. Additional putative TRDD genes were identified based on potential RS (data not shown); however, subsequent cDNA sequence analyses provided little or no evidence that those putative TRDD genes were valid and thus they are not presented. Three TRDJ genes were found and their nucleotide and deduced amino acid sequences are also reported in Additional File 1. Alignments comparing bovine, ovine, swine, human and mouse TRDJ gene nucleotide and deduced amino acid sequences are shown (Figure 6A and Figure 6B). A phylogenetic tree (Figure 6C), constructed using the neighbor-joining method to evaluate the relatedness of the above sequences, reflected the alignment. Bovine, human and murine 5'D-RS and 3'D-RS of TRDD genes and J-RS of TRDJ genes were evaluated for sequence conservation (data not shown). Despite a low number of sequences used in the comparisons (because few TRDD and TRDJ genes exist in the genomes of those species) there appeared to be conserved heptamer and nonamer sequences among the RS for all species evaluated. Although heptamers were found to vary, as expected, they all contained the 5' CAC (or 3' GTG for 5'D-RS and J-RS) consensus sequence which has been found to be critical for recombination [25]. Nonamers generally contained the previously reported consensus sequence ACAAAAACC [34] (or GGTTTTTGT for 5'D-RS and J-RS) although this was found to be less conserved. This is consistent with findings that the nonamer sequence requirements are less rigid than those for the heptamer sequence in order to obtain efficient recombination [25].
Figure 4 … [47] was used to classify TRDV genes using non-redundant bovine (bo) genomic TRDV sequences identified here and the previously classified ovine (ov), murine (mo) and human (hu) TRDV sequences (described in Methods). Bovine TRAV (GenBank accession number BC148926) was included in the analysis and was used to root the tree. The optimal tree with the sum of branch length = 5.37540770 is shown. Complete deletion to eliminate gaps was performed and the final dataset included a total of 207 positions. Eleven phylogenetic sets are indicated along with the percentage interior branch test value based on 1000 replicates for each set.
Prior to these studies the number of TRDV1 genes found in the bovine genome, and their sequences, had not been resolved. Therefore, it was impossible to classify cloned TRDV1 sequences based on their encoding genes. Furthermore, TRDV1 genes are highly polymorphic but the subgroup also contains genes that differ by only one or three nucleotides in their coding regions (for example TRDV1aa and TRDV1bn, TRDV1v and TRDV1as). Therefore, it has been very difficult to differentiate between expressed TRDV1 sequences with regard to whether they represent the same or different genes. Here we classified previously-identified TRDV1 cloned cDNA sequences based on several criteria which included their relative placement in trees based on nucleotide and deduced amino acid sequences, specific sequence characteristics and percent identities based on nucleotide sequence alignments (data not shown). Results are summarized in Table 5 with the provisional IMGT nomenclature, corresponding genes and TRDV1 sets (as defined in Figure 4 and Tables 3 and 4) indicated.
In some cases it was impossible to classify TRDV1 cDNA sequences definitively, because some TRDV1 genes share very high sequence identity; those cases are indicated in Table 5. Furthermore, in 20 cases (see   Table 5) no corresponding TRDV1 genes could be identified although cDNA sequences were found to have features characteristic of particular sets, as indicated. It is possible that no corresponding genes were found because additional TRDV1 genes are present in the bovine genome but were not identified in this study due to gaps in the analyzed regions of the Btau_3.1 assembly. Only cDNA sequences that differed from potential corresponding gene sequences by six or fewer nucleotides were classified; however, it remains possible that those cDNA sequences are not in fact representative of the genes indicated. Overall, corresponding genes were identified for most cDNA sequences (35 out of 55 sequences), although 36 of the 52 predicted TRDV1 genes reported here lacked cDNA confirmation and those genes were found to be distributed among all sets. Last, for the sake of completeness we note that cDNA evidence for bovine TRDV2, TRDV3 and TRDV4 genes reported here has already been demonstrated [17] and has been confirmed [20]. Identification of the TRDD genes within the bovine genome allowed us to evaluate TRDD gene usage in cDNA sequences that had been previously sequenced in our laboratory. CDR3 of 73 TRDV1, 17 TRDV2, 17 TRDV3 and 36 TRDV4 containing sequences were analyzed ( Table 6). All D genes were represented in transcripts containing genes of the four TRDV subgroups. P nucleotides were observed flanking untrimmed TRDD regions and extensive N-regions were frequently found between TRDD regions. Some cases were ambiguous because nucleotides could be attributed to either P-or N-region but also to the presence of more than one TRDD gene. 
Therefore, we applied the criterion that at least six nucleotides of a particular TRDD gene must be present to claim usage of that gene with reasonable confidence. Examples of TRDD gene usage with nucleotide and deduced amino acid sequences are shown (Figure 7). These demonstrated the incorporation of one to five TRDD genes within a single CDR3. Only cases in which the order of TRDD genes in the CDR3 corresponded to that found in the genome were retained.
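The two rules stated above (a minimum of six nucleotides from a TRDD gene, and TRDD genes appearing in genomic order) can be expressed as a short check. This is a simplified sketch, not the analysis pipeline used in this work: it looks only for the longest exact shared run between each germline TRDD sequence and the CDR3 and does not model P or N nucleotides. The function names, the germline sequences and the CDR3 strings are all made up for illustration.

```python
def longest_common_substring(a, b):
    """Return (length, start_in_a) of the longest exact run shared by a and b."""
    best_len, best_start = 0, -1
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_start = cur[j], i - cur[j]
        prev = cur
    return best_len, best_start


def trdd_usage(cdr3, germline_d, min_len=6):
    """Call TRDD genes in a CDR3 nucleotide sequence.

    germline_d is a list of (name, sequence) pairs in genomic 5'-to-3' order.
    Returns (calls, in_genomic_order), where calls holds
    (name, start_in_cdr3, matched_length) for genes meeting min_len.
    """
    calls = []
    for name, d_seq in germline_d:
        length, start = longest_common_substring(cdr3, d_seq)
        if length >= min_len:
            calls.append((name, start, length))
    starts = [start for _, start, _ in calls]
    return calls, starts == sorted(starts)


# Hypothetical germline D segments and CDR3s (not bovine sequences).
germline = [
    ("TRDD_a", "GGGATACGTGGT"),
    ("TRDD_b", "TTCCAGTCAGCA"),
    ("TRDD_c", "ACTGGAGCTTAT"),
]
cdr3 = "TGTGCC" + "GATACGTG" + "AAG" + "CAGTCAGC" + "GGAAACTC"
calls, ordered = trdd_usage(cdr3, germline)
print(calls, ordered)
```

In this toy case two of the three hypothetical D segments pass the six-nucleotide threshold and appear in genomic order, so the rearrangement would be retained; a CDR3 with the same segments in the opposite order would be flagged and excluded.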

Figure 6 TRDJ gene sequence alignments and phylogenetic tree. Alignments of TRDJ gene (A) nucleotide and (B) deduced amino acid sequences, using bovine (bo) genomic TRDJ sequences identified here and previously classified ovine (ov), swine (sw), human (hu) and murine (mo) TRDJ sequences (described in Methods) are shown with identities indicated by a dot (.) and gaps indicated by a dash (-). The Neighbor-Joining method [47] was used to infer evolutionary history of the above mentioned sequences. The optimal tree (C) with the sum of branch length = 2.43145392 is shown. The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (1000 replicates) are shown next to the branches [48]. The Maximum Composite Likelihood method [49] was used to compute evolutionary distances. Complete deletion to eliminate gaps was performed and the final dataset included a total of 42 positions.
Table 5 footnotes. ND indicates cDNA sequences for which no corresponding gene could be identified in this study. 4 Number of nucleotides that are different between the Provisional IMGT Nomenclature and the corresponding gene sequences between amino acid positions 1 and 104 (2nd-CYS); NA indicates sequence that lacks corresponding gene. 5 Genes listed as corresponding are the closest matches; however, classification of these genes was unclear. 6 cDNA sequence used for comparison was partial.
Figure 7 TRD rearranged CDR3 sequences and TRDD gene usage. TRD rearranged CDR3 nucleotide sequences, and their corresponding deduced amino acid sequences, derived from cDNA from peripheral blood mononuclear cells, were aligned to demonstrate TRDD usage. Germline TRDD sequences are shown above representative TRDV1, TRDV2, TRDV3 and TRDV4 sequences containing between one and five TRDD genes. TRDD genes are shaded and gene usage was determined by the presence of at least six nucleotides of a particular TRDD gene. Sufficient cDNA sequence was analyzed to determine TRDV gene usage; however, the sequences shown here have been truncated after the 2nd-CYS 104 and at the beginning of the TRDJ gene.

Discussion
In order to evaluate and characterize the bovine T cell receptor delta gene repertoire and their genomic organization we annotated the TRD genes in the Btau_3.1 assembly. Here we describe the existence of two TRDV2 genes, as well as a single gene each for TRDV3 and TRDV4. We confirmed the presence of three TRDJ and five TRDD genes [20] and evaluated TRD junctional diversity incorporating these genes. Furthermore, we described the existence of 52 TRDV1 genes (46 of them functional and six of them ORF), evaluated them for their sequence characteristics and further classified them into eleven sets based on phylogenetic analysis. As for other mammals, the bovine TRD locus was found embedded within the TRA locus and thus TRDV1 genes were distributed amongst the TRAV genes. Indeed, there were a number of genes for which assignment to the TRAV or TRDV group was ambiguous. It is important to note that the presence of TRAV/DV genes is certain and has been reported [17] but the ability of TRDV or TRAV genes (as defined based on sequence characteristics) to rearrange with either TRDD or TRAJ must be determined experimentally and will be evaluated in future studies. Evaluation of the genes contained within the TRA/TRD locus was reported [22] following the initial submission of this work with overall findings similar to those discussed here. The question remains as to whether or not the findings reported here represent the complete TRD gene repertoire because of the gaps in the Btau_3.1 assembly. The presence of additional TRDV1 genes is almost certain because of extensive gaps in this region although it is unlikely that additional TRDD or TRDJ genes are present in the genome since that region is fully scaffolded and has been carefully evaluated for the presence of additional genes. That region also contains TRDV4 and its analysis is likely definitive. In addition, previous cDNA evidence supports the existence of only two TRDV2 genes [17] and is consistent with our findings. The existence of two TRDV3 genes has been reported [20] but is not supported by our findings which may be due to gaps in the genome assembly. Even with gaps, locus organization can be assisted by comparison to other species. Indeed, all mammals studied have genes that are closely related to human TRDV1 with some being distributed among the TRAV genes. The bovine TRDV4 gene is located downstream of the single TRDC gene in inverse orientation, as found for the human TRDV3 and mouse TRDV5 genes. It is notable that there is significant sequence conservation among the orthologous bovine TRDV4, human TRDV3, mouse TRDV5 and ovine TRDV4 genes as well as their entire RS. This suggests that they derive from a common ancestral gene that appeared before the separation of these species. By comparison, there are no human or mouse genes that are equivalent to bovine TRDV2 or TRDV3.
The enormous expansion of the TRDV1 repertoire that has been observed for cattle, sheep and pig [15,21] is striking when compared with the single TRDV1 gene found in human (and its two orthologs in mouse) and thus does not assist us in predicting the extent of the possible missing information.
TRDV1 gene subgroup expansion is also striking when compared with other TRDV gene subgroups which, even in artiodactyls, are found to contain only one or two members. While the number of TRGV genes found in each species does not vary considerably [31,35,36], expansion of the TRDV1 genes in bovine, sheep and pig is consistent with them being "γδ T cell high" species, implying that this increase in antigen receptor diversity permits them to be stimulated by and to respond to more types of antigens. The presence of fewer TRDV genes in "γδ T cell low" species, such as humans and mice, suggests that these species evolved so that γδ T cells play a less important role in their immune function. Sequence conservation among bovine TRDV1 RS (including the 23-bp spacer sequences), supports the concept that the large TRDV1 subgroup arose from multiple duplication events. Many bovine TRDV1 genes share high percent identities, indicating that in some cases this subgroup is not as diverse as the number of individual genes might suggest. For example, some TRDV1 sequences differed by as few as four (TRDV1m and TRDV1s), three (TRDV1v and TRDV1as) or even one (TRDV1aa and TRDV1bn) nucleotide in their coding regions even though all were determined to be distinct genes based on analysis of the flanking genomic sequences. However, it cannot be ruled out that some of these gene models represent the same gene but do not appear as such due to sequencing and/or assembly anomalies. The question of why numerous and highly similar TRDV1 genes have been maintained in the bovine genome is compelling and studies to evaluate their functions will be pursued.
Based on phylogenetic sequence analysis and gene features, we subdivided the bovine TRDV1 genes into eleven sets. Each set corresponds to intraspecies gene duplications of ancestral genes that were present before the separation between bovine and ovine species. We also assigned bovine cDNA sequences to germline genes identified in this work (see Table 5). Of 55 previously identified TRDV1 cDNA sequences evaluated, 20 could not be assigned to a germline gene, possibly because of gaps in that region of the genome assembly. Of the 52 TRDV1 genes identified in this study, cDNA evidence existed for only 16. It is possible that, because a majority of the previously identified TRDV1 cDNA sequences were derived from peripheral blood mononuclear cells [16][17][18][19][20], the remaining TRDV1 genes that lack cDNA verification might be expressed in other tissues that have not been evaluated. It is also possible that other factors, such as age or immunological experience, might impact the expression of particular genes and thus have prevented verification of those genes. Future studies will focus on specifically amplifying individual TRDV1 transcripts in order to verify that they are functional and to determine whether they are expressed in an age- and/or tissue-dependent manner.
The potential importance of the role of the T cell receptor delta rearranged CDR3 is evident in studies evaluating the binding of the T10 and T22 antigens [26,37,38] by murine γδ T cells since antigen recognition by the CDR3 was found to be autonomous, that is, not dependent on the TRDV gene associated with it [39]. It could be reasoned that, because the genomes of humans and mice lack the extensive TRDV repertoire that has been observed in artiodactyls, this would be compensated for by diversity of the genes that rearrange to form the CDR3 (i.e. TRDD and TRDJ genes). Moreover, because many bovine TRDV1 amino acid sequences are very similar, structurally it might be expected that their ligand binding regions, including CDR3, would also need to be as diverse as that in humans and mice. Humans have three TRDD and four TRDJ genes while mice have two TRDD and two TRDJ genes. With the exception of the human TRDJ genes, these numbers all fall short of the five TRDD and three TRDJ genes identified in cattle. We demonstrated here that in cattle the TRD CDR3 contained combinations that include between one and five TRDD genes and would range in length from nine to 37 amino acids. With the exception of TRDD4, the germline bovine TRDD genes predominantly encode glycine and the hydrophobic residues valine and tryptophan, with the neutral residues threonine and tyrosine being encoded to a lesser extent (see Additional File 1). This is reminiscent of findings for the equine VH CDR3 loops in which a higher proportion of glycine and lower proportion of cysteine content was identified when sequences were compared with those from humans, sheep and pigs [40]. As was suggested for equine VH CDR3, this indicates that bovine TRDV CDR3 loops have increased flexibility and are therefore better suited to recognize a large number of antigenic conformations.
Considering this, it is unlikely that CDR3 diversity in mice and humans compensates for their less diverse TRDV repertoire and, in fact, it seems that cattle have more scope for antigen binding on all levels.
If the highly diverse CDR3 is indeed the main structural component involved in antigen recognition, then the TRDV gene products might be most important for interactions with co-receptors or antigen presenting cells, whatever their nature may be. While the interactions between γδ T cell receptors and the antigens that they bind are still largely unknown, it has been demonstrated that for the αβ T cell receptor the CDR1 loops (which are germline encoded) are involved with binding peptide as well as MHC. The CDR2 loops (also germline encoded) bind only MHC. The TRDV CDR1 in cattle exhibited diverse lengths and were generally found to be five to ten amino acids long, as for mouse and human TRDV, TRBV and TRAV CDR1. This diverse bovine TRDV CDR1 repertoire may contribute to antigen binding, in contrast to what has been reported for murine γδ T cell recognition of T22 as described above. It was found that bovine TRDV1 CDR2 were either only three amino acids long or absent completely (e.g., TRDV1f, TRDV1ae, TRDV1ar and TRDV1o), in contrast to TRBV CDR2 lengths of five to seven amino acids in mammals.
It is yet to be determined whether the TRDV1 genes lacking CDR2 are functional; however, if they are found to be so, this lack of, or abbreviation of, the CDR2 in bovine TRDV1 genes is logical since γδ T cells are not MHC-restricted.

Conclusions
Based on annotations of the bovine genome, we identified the TRD genes, including 56 TRDV genes, five TRDD genes, three TRDJ genes and the single TRDC gene, and described their organization within the TRD locus. Furthermore, we report that the TRDV1 subgroup contains 52 genes, indicating that this subgroup underwent expansion as has been found for sheep and swine. The maintenance of this large gene subgroup in the bovine genome indicates a unique role for these genes in γδ T cell biology.

Genome Annotation
In conjunction with the Bovine Genome Sequencing Consortium (http://genomes.arc.georgetown.edu/bovine/), manual annotation of the T cell receptor delta genes was performed using the Apollo Genome Annotation and Curation Tool, version 1.6.5 [41] and the bovine genome assembly Btau_3.1 [42]. First, a BLAST search of bovine T cell receptor delta cDNA sequences against the Bovine Official Gene Set (GLEAN) was performed in order to identify predicted gene models. These were then analyzed using the Apollo software and the following actions were performed where applicable: (i) models were checked for correct exon-intron structure, (ii) initiation and termination codons were identified where applicable and in other cases 5' and 3' ends were set based on RS and splice sites, (iii) exons were either added or deleted if it was determined that the coding region in the predicted model was incorrect, (iv) predicted gene models were split when a single model encompassed more than one gene or merged when two models coded for a single gene and (v) RS were identified for TRDV, TRDD and TRDJ genes. Predicted gene models identified from the BLAST search were considered pseudogenes, and were not included in subsequent analyses, when premature stop codons or frameshifts occurred in areas where the sequence integrity was deemed adequate. Furthermore, predicted gene models were considered Open Reading Frame (ORF), as defined by IMGT [31], if the coding region had an open reading frame but the sequence structure differed significantly from known TRDV sequences; ORF gene models were included in subsequent analyses. Following annotation, predicted gene model identity and classification were verified using BLAST searches and IMGT/V-QUEST [32]; only gene models that were unambiguously identified as TRDV genes, and not TRAV/DV genes, based on those results were included in subsequent analyses.
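The pseudogene and ORF criteria described above can be illustrated with a minimal Python sketch. It is not the procedure used in this work, which relied on manual curation in Apollo: the function name `classify_gene_model`, its flags and the label "functional" (borrowed from IMGT functionality terminology) are assumptions introduced for illustration. Frameshifts and resemblance to known TRDV sequences are assumed to have been judged beforehand from an alignment and are passed in as flags, and the assessment of sequence integrity is not modelled.

```python
# Standard genetic code, indexed by codon with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds):
    """Translate a coding sequence in frame 1.

    A trailing partial codon is ignored; codons with ambiguous bases give 'X'.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3))


def classify_gene_model(cds, has_frameshift=False, resembles_known_trdv=True):
    """Hypothetical triage of a predicted gene model (illustrative only)."""
    protein = translate(cds)
    # A stop codon anywhere before the final codon is treated as premature.
    has_premature_stop = "*" in protein[:-1]
    if has_premature_stop or has_frameshift:
        return "pseudogene"   # excluded from subsequent analyses
    if not resembles_known_trdv:
        return "ORF"          # open reading frame with atypical structure
    return "functional"
```

For example, `classify_gene_model("ATGGCCTGATGT")` returns `"pseudogene"` because the third codon, TGA, is an in-frame stop.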

Sequence analyses
Nucleotide sequences were aligned and analyzed using BioEdit version 7.0.5.3 [43]. Exon-intron structure schematics were based on alignments of cDNA and genomic DNA sequence using SIM4 [44] and visualized with LalnView (http://pbil.univ-lyon1.fr/software/lalnview.html). GLEAN numbers of the annotated bovine gene sequences used in sequence analyses are reported in Table 1. Published T cell receptor delta sequences, derived from cloned RT-PCR products, were submitted to GenBank (http://www.ncbi.nlm.nih.gov/Genbank/index.html; see Table 5 for accession numbers) and assigned to TRDV genes identified here. CDR3 sequence data described in Table 6, from bovine TRD rearranged genes, are available upon request. Rearranged CDR3 lengths were determined from positions 105 to 117, that is, between the 2nd-CYS 104 and J-PHE 118 of the FGXG motif, according to the IMGT unique numbering for V-DOMAIN [29]. The accession numbers of additional sequences used in analyses are as follows: human (Homo sapiens): TRDV1 (M22198), TRDV2 (X15207), TRDV3 (M23326), TRDJ1 (M20289, AE000661), TRDJ2 protocol. cDNA clones were sequenced commercially (GeneWiz, South Plainfield, NJ) in order to verify the insert identity and for subsequent sequence analysis.
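The rule for CDR3 length stated above (the residues between 2nd-CYS 104 and J-PHE 118 of the FGXG motif) can be sketched in Python as follows. This is a simplified illustration and not the IMGT/V-QUEST procedure: the function name `cdr3_between_anchors` is hypothetical, the 0-based index of 2nd-CYS 104 is assumed to be known already (for example from an IMGT-numbered alignment), and the J motif is taken to be the first F-G-X-G after that cysteine. The example sequence is made up.

```python
import re

# J-PHE 118 followed by G-X-G, written as a regular expression.
J_MOTIF = re.compile(r"FG.G")


def cdr3_between_anchors(aa_seq, cys104_index):
    """Return the residues strictly between 2nd-CYS 104 and J-PHE 118.

    aa_seq       -- amino acid sequence of a rearranged V-DOMAIN
    cys104_index -- 0-based index of the conserved cysteine (assumed known)
    Returns None when no F-G-X-G motif follows the cysteine.
    """
    aa_seq = aa_seq.upper()
    if aa_seq[cys104_index] != "C":
        raise ValueError("residue at cys104_index is not a cysteine")
    match = J_MOTIF.search(aa_seq, cys104_index + 1)
    if match is None:
        return None
    return aa_seq[cys104_index + 1:match.start()]


# Made-up fragment: ...YYC | CDR3 | FGKG...
example = "YYCALWELGGTDKLIFGKGTRVTVEP"
cdr3 = cdr3_between_anchors(example, 2)
print(cdr3, len(cdr3))
```

For the made-up fragment above, with the cysteine at index 2, the function returns `ALWELGGTDKLI`, a CDR3 of 12 amino acids. Anchoring on a known cysteine position rather than searching for it avoids misplacing the start of the CDR3 when the loop itself contains a cysteine.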