Genome-wide association analysis of canine T zone lymphoma identifies link to hypothyroidism and a shared association with mast-cell tumors

Background T zone lymphoma (TZL), a histologic variant of peripheral T cell lymphoma, represents about 12% of all canine lymphomas. Golden Retrievers appear predisposed, representing over 40% of TZL cases. Prior research found that asymptomatic aged Golden Retrievers frequently have populations of T zone-like cells (phenotypically identical to TZL) of undetermined significance (TZUS), potentially representing a pre-clinical state. These findings suggest a genetic risk factor for this disease and caused us to investigate potential genes of interest using a genome-wide association study of privately-owned U.S. Golden Retrievers. Results Dogs were categorized as TZL (n = 95), TZUS (n = 142), or control (n = 101) using flow cytometry and genotyped using the Illumina CanineHD BeadChip. Using a mixed linear model adjusting for population stratification, we found association with genome-wide significance in regions on chromosomes 8 and 14. The chromosome 14 peak included four SNPs (Odds Ratio = 1.18–1.19, p = .3 × 10− 5–5.1 × 10− 5) near three hyaluronidase genes (SPAM1, HYAL4, and HYALP1). Targeted resequencing of this region using a custom sequence capture array identified missense mutations in all three genes; the variant in SPAM1 was predicted to be damaging. These mutations were also associated with risk for mast cell tumors among Golden Retrievers in an unrelated study. The chromosome 8 peak contained 7 SNPs (Odds Ratio = 1.24–1.42, p = 2.7 × 10− 7–7.5 × 10− 5) near genes involved in thyroid hormone regulation (DIO2 and TSHR). A prior study from our laboratory found hypothyroidism is inversely associated with TZL risk. No coding mutations were found with targeted resequencing but identified variants may play a regulatory role for all or some of the genes. Conclusions The pathogenesis of canine TZL may be related to hyaluronan breakdown and subsequent production of pro-inflammatory and pro-oncogenic byproducts. The association on chromosome 8 may indicate thyroid hormone is involved in TZL development, consistent with findings from a previous study evaluating epidemiologic risk factors for TZL. Future work is needed to elucidate these mechanisms.


Background
T zone lymphoma (TZL), a histologic variant of peripheral T cell lymphoma (PTCL), accounts for about 12% of all canine lymphomas [1,2] but is almost never seen in human patients. In dogs, this disease follows an indolent course with average survival of > 2 years independent of treatment, compared to < 1 year with most other lymphoma subtypes [3][4][5]. TZL can be readily diagnosed by histopathology or by flow cytometric identification of a homogeneous expansion of T cells lacking expression of CD45, a pan-leukocyte surface marker [3,6,7]. Previously, we observed that > 30% of Golden Retrievers without lymphocytosis or lymphadenopathy have T cells phenotypically similar (lacking CD45 expression) to TZL in their blood [8]; as we are unsure of the clinical relevance of this finding, we have adopted the term T zonelike cells of undetermined significance (TZUS) for these dogs [9]. We hypothesize that TZUS may represent a pre-clinical state that could undergo neoplastic transformation and progress to overt TZL.
Few studies have investigated the pathogenesis of canine TZL. We recently reported that both hypothyroidism and omega-3 supplementation are associated with decreased odds of TZL [9]. It has also been noted that over 40% of TZL cases are Golden Retrievers [3]. This finding suggests a genetic predisposition for TZL and caused us to pursue a study to identify potential pathways of interest. To date, no studies have agnostically evaluated germline risk for PTCL in dogs or humans.
The objective of this study was to identify genetic risk factors for canine TZL using a genome-wide association study (GWAS) and subsequent targeted sequencing. This aim of this study is to provide insight into the etiology and underlying risk for developing this disease.

TZUS and controls indistinguishable by GWAS
When the combined TZL and TZUS group was compared to controls, no p-values were outside the 95% confidence interval threshold on the quantile-quantile (QQ)-plot (Additional file 1 A). In contrast, when TZL were compared to the combined TZUS and control group, a group of SNPs significantly deviated from the expected distribution (Fig. 1). Supporting this, pairwise GWAS of TZL versus controls and TZL versus TZUS had suggestive associations for this group of SNPs, despite none of the p-values falling outside the 95% confidence interval (CI) on the QQ-plot (Additional file 1 B and C). This implies TZUS and controls are similar, and the enhanced power from combining them as a reference group allows those SNPs to reach genome-wide significance. In contrast, the TZUS versus control comparison did not share any suggestive SNPs with the TZL versus control comparison, as would be expected if TZL and TZUS were similar. We thus chose to combine TZUS and controls for our main analysis and will reference it as the "TZL versus all" comparison for the remainder of the paper. Top peak is near thyroid stimulating hormone receptor locus The strongest GWAS peak contained seven SNPs on chromosome 8 from 52,650,576-53,818,371 bp ( Fig. 1; Table 1). The associated allele for these SNPs was present in about 16% of TZL (range 15-25%) compared to 6% of the reference group (range 4-12%). The top SNP (BICF2P948919; Odds Ratio [OR] = 1.39, p = 2.66 × 10 − 7 ) was located at 53,818,371 bp and was in strong linkage disequilibrium (LD) (R 2 > 0.7) with three significantly associated SNPs in that region and moderate LD (R 2 0.25-0.6) with the other three significantly associated SNPs (Fig. 2). Using the PLINK clumping analysis, we determined that the four SNPs in strong LD (including the top SNP) formed one haplotype block, and the remaining three SNPs were not in strong enough LD with any other SNPs to form blocks. The p-values for all seven associated SNPs on chromosome 8 were non-significant (range 0.17-0.99) in the conditional analysis, suggesting they represent one signal ( Table 1). The haplotype block containing the top SNP is within the noncoding region of Suppressor of Lin-12-Like Protein 1 (SEL1L) gene (Fig. 2). Having at least one risk haplotype was substantially more common among TZL (29%) versus TZUS or controls (12 and 7.5%, respectively).

Targeted resequencing of the chromosome 8 region identifies potential regulatory variants
Targeted resequencing was performed on 16 dogs selected for variation in risk and non-risk haplotypes.
Sequence capture of the 3 Mb region on chromosome 8 identified 814 single nucleotide variants (SNVs) and 229 insertions and deletions (indels) that passed our filters. Median coverage across the region was 131x. Three synonymous coding variants were found in the SEL1L gene (cfa8:53,771,782, cfa8:52,779,502, cfa8:53,797,623). All other identified variants were potential modifiers, including 3′ UTR variants (three SNVs and one indel near CEP128, two SNVs near GTF2A1), up-and downstream gene variants, intron variants, and non-coding transcript exon variants (Additional files 2 and 3). Evaluation of the corresponding positions in the human genome determined multiple variants were in potential regulatory elements (of 685 that were converted [541 SNV, 144 indels]; based on H3K27AC marks and GeneHancer scoring). Two sets of variants were in enhancers for DIO2 (Type II Iodothyronine Deiodinase) and seven sets of variants were in enhancers for combinations of CEP128 (Centrosomal Protein 128), GTF2A1 (General Transcription Factor IIA Subunit 1), STON2 (Stonin2), and SEL1L (Additional file 4).

Shared association with mast cell tumor cases on chromosome 14
The second top association peak is on chromosome 14 and contains four SNPs from 11,778,977-11,807,161 bp ( Table 1). All SNPs were in strong LD (R 2 > 0.9) with the top SNP (OR = 1.18, p = 8.39 × 10 − 5 ). Three of the four SNPs had previously been reported to be associated with mast cell tumors (MCTs) among American Golden Retrievers [10]. Thus, we assessed our data in combination with the American Golden Retriever data from the publicly available MCT dataset. 1 After independently conducting the quality control protocol outlined in the methods section for each dataset, files were merged so that the new "case" population included TZL and MCT cases, whereas the reference population contained TZUS and controls from the TZL dataset and controls from the MCT dataset. Multidimensional scaling (MDS) was performed using PLINK to assess for population stratification (Additional file 5). The chromosome 14 peak for the combined dataset was wider and more strongly associated, with the top SNP reaching p = 1.5 × 10 − 9 ( Fig. 3; similar association shown in Additional file 6A without the addition of controls from the MCT dataset). A GWAS including the TZL dataset and only MCT controls showed no increased association at the chromosome 14 peak (Additional file 6B), confirming that this is a shared association for the two different cancers and not simply a result of increased power from the additional controls. We evaluated haplotype blocks in the combined dataset. The top SNP from the combined dataset was the same as the top SNP in the TZL-only dataset (BICF2G630521681; Table 2). These SNPs are part of a nine SNP haplotype block that spans 11,695,969-11,807,161 bp ( Fig. 4). When we ran a conditional GWAS controlling for the top SNP, none of the SNPs in the larger associated region remained significant (p > 0.3), suggesting they all represent one signal ( were predicted to be benign (PolyPhen-2 score < 0.15). Conversion of these coordinates to CanFam2 determined the non-synonymous mutations in SPAM1 and HYAL4 were identical to those identified in the MCT study [10]. Additional non-coding variants were identified near these genes, including 5′ UTR variants (two SNVs, one indel in HYAL4), 3′ UTR variants (two SNVs, two indels in HYAL4 and three SNVs in SPAM1), up-and downstream gene variants, and intron variants (Additional files 2 and 3). One synonymous coding SNP was identified in ENSCAFG00000024436 (cfa14:11,768,664) (Additional file 2).

Potential cumulative risk for chromosomes 8 and 14
Distribution of number of risk haplotypes by phenotype are shown in Fig. 5. Only 8 dogs (7 of which were cases) had > 3 risk haplotypes, so counts were categorized as 0, 1, > 2 for analysis. Number of risk haplotypes was significantly associated with TZL (p-value < 0.001), indicating a potential cumulative risk. Larger sample sizes are necessary to evaluate statistical interaction of the chromosome 8 and 14 haplotypes.

Additional significantly associated GWAS SNPs
Associated SNPs were also seen on chromosomes 2, 17, and 29, but our study did not have the power to accurately determine the regions of association. We conducted a restricted maximum likelihood analysis [11], assuming TZL has a 2% prevalence in the Golden Retriever breed, and found that the combined set of 17 significant SNPs in our dataset (Table 1)

Discussion
In a GWAS to identify genetic risk factors for TZL in Golden Retrievers, we identified associated regions on chromosomes 8 and 14. Subsequent resequencing of a subset of dogs identified non-synonymous mutations in three hyaluronidase genes on chromosome 14 (SPAM1, HYAL4, and HYALP1). Coding mutations were not found in the chromosome 8 region but identified variants may be located in regulatory elements for numerous genes, including DIO2, CEP128, GTF2A1, STON2, and SEL1L.
Mutations in hyaluronidase genes are associated with risk for TZL and MCT The four SNPs significantly associated with TZL are depicted in red and the nine-SNP haplotype block they represent is shaded in yellow. e Closeup of the region from 11.7-11.8 Mbp where coding mutations (shown in red) were found on resequencing. f Haplotype block containing nine associated SNPs on cfa14 (BICF2G630521558, BICF2G630521572, BICF2G630521606, BICF2G630521619, BICF2P867665, TIGRP2P186605, BICF2G630521678, BICF2G630521681, BICF2G630521696). The risk haplotype was CTTCGGACG and non-risk was TCCTTAGTA. Dogs were considered recombined if neither combination was present and were considered unknown if the genotype for one or more SNPs was missing potential shared mechanism for TZL and MCT pathogenesis. One potential mechanism is via hyaluronan turnover, which is caused by the interaction of hyaluronan and CD44, a cell surface glycoprotein expressed on both T cells and mast cells [12]. This turnover leads to increased low molecular weight hyaluronan, the byproducts of which are pro-inflammatory and pro-oncogenic, with implications in cell proliferation, migration, and angiogenesis [13,14]. In contrast, high molecular weight hyaluronan and decreased hyaluronidase activity have been associated with the increased longevity and cancer resistance seen in naked mole rats [13]. It would be informative to measure hyaluronan in TZL and controls to determine whether the ratio of low to high molecular weight hyaluronan is altered in TZL. Most mammals have six hyaluronidase-like genes, clustered on two chromosomes. In dogs, HYAL1, HYAL2 and HYAL3 are clustered on cfa20, whereas SPAM1, HYAL4, and ENSCAFG00000024436 are clustered on cfa14. ENSCAFG00000024436 is homologous to HYALP1, which is an expressed pseudogene in people [15]. HYALP1 is believed to be functional in other mammals [15], although its functional status is unknown in dogs. SPAM1 is considered a testis hyaluronidase and is important during egg fertilization by sperm [16]. However, SPAM1 has been detected in the epididymis, seminal vesicles, prostate, female genital tract, breast, placenta, fetal tissue, and certain malignancies [17][18][19], suggesting it is multifunctional and not sperm-specific. Despite the potential shared pathogenesis of TZL and MCT, we did not see an association between MCT and TZL in our dataset. Of dogs where medical history was known, 3/76 TZL (4%), 8/142 TZUS (6%), and 4/ 103 controls (4%) had a history of or concurrent MCT. This suggests these the diseases develop independently despite their shared mechanisms. More research is necessary to understand the role of these hyaluronidases in dogs and evaluate how the observed variants alter expression of hyaluronidases and downstream signaling.

Thyroid hormone metabolism may influence TZL risk
In a parallel study, we determined dogs with hypothyroidism were significantly less likely to develop TZL than dogs without hypothyroidism [9]. As thyroid hormone plays an important role in cell growth and metabolism, we hypothesize that lack of this hormone may decrease T cell proliferation and therefore help prevent the development of TZL. A recent study reported an association between polymorphisms in CEP128 and autoimmune thyroid disease in humans, although the mechanism underlying this association is unclear [20]. While we did not identify coding mutations within CEP128, it is possible that mutations we identified in regulatory elements could have similar downstream effects. Additionally, while SnpEff did not predict our SNVs to be modifiers of DIO2 or TSHR, it is possible that the regulatory elements of these genes are far up-or downstream as seen in people. Canine genome annotations for this region may not yet be able to predict these relationships.
While canine hypothyroidism is generally thought to be caused by lymphocytic thyroiditis or idiopathic atrophy [21], it is plausible that changes in expression of DIO2 or TSHR could influence its development. Thyroid hormone regulation depends on an axis of multiple hormones and organs. Thyroid stimulating hormone, released from the pituitary, binds TSHR on the thyroid gland, causing release of thyroxine and, to a lesser extent, triiodothyronine [22]. DIO2 is one of two hormones Recombined haplotypes were considered non-risk. Dogs were considered unknown if the genotype for one or more SNPs was missing responsible for converting thyroxine to triiodothyronine, the more active form, in the peripheral organs [23]. It is feasible that changes in the expression of either of these genes could alter thyroid hormone production and function. SEL1L, another gene in this region, encodes a protein that is part of a complex involved in endoplasmic reticulum-associated degradation of misfolded proteins [24]. Interestingly, levels of Deiodinase 2, the product of DIO2, are tightly regulated, and synthesis can be inhibited by endoplasmic reticulum stress via endoplasmic reticulum-associated degradation [25]. Thus, alterations in SEL1L and endoplasmic reticulum stress could also impact thyroid hormone regulation. While a role of thyroid hormone is plausible based on variants identified on cfa8 and our parallel finding of an inverse association of hypothyroidism and TZL risk, we cannot rule out the possibility of a spurious finding due to chance overrepresentation of dogs with hypothyroidism among our control population. Further studies are needed to validate this finding in an independent population.

Golden Retriever predisposition to TZL
We believe TZUS represents a precursor state to TZL, so we hypothesized TZUS dogs would share the same genetic variants as TZL cases. However, we were unable to differentiate TZUS from controls in our GWAS analysis. The high prevalence of TZL among Golden Retrievers and corresponding high prevalence of TZUS suggest the genetic basis for developing CD5 + CD45 − T cells may be fixed among this breed, and different from the genes that control progression to neoplasia in these cells. If this is the case, we would be unable to identify the genetic risk factor for developing CD5 + CD45 − T cells in our study. Future studies may evaluate fixed regions of the Golden Retriever genome to identify candidate genes. Additionally, a GWAS of TZL among a less predisposed breed may distinguish additional associated regions not identified in our study. It is worth noting that Golden Retrievers of European descent appear less likely to develop TZL [26]. As such, delving into genomic differences between European and American Golden Retrievers may provide insight into regions that could underly TZL risk.

Conclusions
Canine genomics are informative for human genomics and offer computational benefits due to the comparatively recent development of dog breeds. Within a dog breed, there is reduced genetic variation [27,28], allowing us to use smaller sample sizes and fewer genetic markers when evaluating genetic risk factors for canine diseases [28][29][30]. Little is known about the functional implications of the mutations identified on cfa8. Since variants in this region are in moderate to high LD, it is difficult to prioritize which variants are important in disease pathogenesis versus which are bystanders inherited with the causative mutation. Additional studies are necessary to elucidate these associations and better understand the effect of these variants. The likely importance of hyaluronidases and shared association with MCT is noteworthy and warrants further investigation. Further research will increase our understanding of how these coding mutations alter hyaluronidase function. Ultimately, future research will help elucidate TZL pathogenesis and identify causative variants that may be biomarkers or disease risk or potential therapeutic targets.

Study participants
All dogs were recruited from the privately-owned pet population in the United States from October 2013 through May 2015. The study was conducted with approval from the Colorado State University Institutional Animal Care and Use committee (Protocol 13-4473A). Written informed consent was obtained from all dog owners; dogs remained under the custody and care of their owners for the duration of the study. Detailed recruitment information for the larger study population has been previously described [9]. Briefly, TZL cases were identified through submissions to Colorado State University's Clinical Immunology laboratory. Lymphoma-free Golden Retrievers aged > 9 years were recruited from 1) the submitting clinic of TZL cases and 2) email solicitation to Golden Retriever owners in the Canine Lifetime Health Project [9]. Peripheral blood samples were obtained from all participants, and a subset of dogs with adequate DNA quality and quantity (as described below) were used for the GWAS study. Flow cytometric analysis of peripheral blood samples was used to categorize dogs as TZL, TZUS, or controls. Flow cytometry was carried out as previously described [3] and samples were analyzed with the antibody combinations listed in Additional file 8 using a 3-laser Coulter Gallios. 2 We defined TZL cases (n = 95) as a homogeneous expansion of CD5 + CD45 − T cells and lymphocytosis (> 5000 lymphocytes/μL), lymphadenopathy (noted on veterinarian-completed submission form), or both (Additional file 9A). We defined dogs as TZUS (n = 142) if they were > 9 years of age and had no history or clinical signs of a lymphoproliferative disease (no lymphadenopathy or lymphocytosis), but had a small population of CD5 + CD45 − T cells on flow cytometry (> 1% of total T cells; Additional file 9B). Control dogs (n = 101) were those > 9 years of age with no history or suspicion of a lymphoproliferative disease, no population of CD5 + CD45 − T cells identified by flow cytometry (< 1% of total T cells; Additional file 9C), and no evidence of a clonal T cell population in the peripheral blood as assessed using the PCR for Antigen Receptor Rearrangement (PARR) assay [31].

Genome-wide association mapping
Genomic DNA was extracted from white blood cell pellets of peripheral blood samples using the QIAamp DNA blood Midi Kit. 3 DNA quality and quantity were assessed using NanoDrop 4 and only samples with 1) a concentration of at least 35 ng/μL, 2) over 1000 ng of DNA total, and 3) an A260/280 ratio of 1.7-2.0 were submitted for genotyping. Genotyping was performed at GeneSeek Inc. 5 using the Illumina 170 K CanineHD BeadChip SNP array [32]. PLINK software [33,34] was used to perform data quality control, removing individuals with call rates < 97.5% and SNPs with call rates < 97.5% or minor allele frequency < 5%. Only autosomal chromosomes were analyzed.
MDS was performed using PLINK to ensure there were no distinct groupings based on phenotype (TZL/TZUS/control), which would indicate population stratification and/or residual confounding. While we saw no obvious deviations on these plots, we were concerned for bias based on European versus American descent due to apparent divergence in this breed [10,29]. To determine whether any of our dogs were likely of European descent, we downloaded a publicly available dataset 1 [10] including both European and American Golden Retrievers, conducted the same quality control protocol as described above, and merged the two datasets. We then created MDS plots to determine which dogs in our dataset clustered with the known European dogs. Dogs with a value for the first cluster < 2.5 standard deviations from the mean value for known European dogs were removed (Additional file 10).
Genome-wide complex trait analysis (GCTA) software [11] was used to estimate a genetic relationship matrix (GRM) and remove highly related individuals (one dog was removed for each pair of dogs with the same phenotype and a GRM value of 0.25 [half-sibling level]). The diseasegenotype association was estimated using GCTA, adjusting for the first principal component of the GRM in a mixed linear model to correct for cryptic relatedness [35]. QQ-plots with 95% CIs calculated based on the beta distribution of observed p-values were created to assess possible genomic inflation and to establish suggestive significance levels.
It is currently unclear whether TZUS dogs share some or all of the genetic predisposition for TZL, or they simply represent normal variation among controls. To determine in which category they are most genetically informative, we performed separate association studies comparing 1) combined TZL and TZUS versus controls, 2) TZL versus combined TZUS and controls, and 3) pairwise comparisons (TZL versus control, TZL versus TZUS, and TZUS versus control). Quantile-quantile plots were used to determine which analyses had enough power to be evaluated. After peaks of interest were identified, we used GCTA to conduct a conditional GWAS. By adjusting for the genotype of the top SNP of each peak of interest, we can evaluate whether the other significantly associated SNPs in that peak are statistically independent from the top SNP (i.e. whether there are one or multiple peaks).

Haplotype block definition and association analysis
Haplotype blocks for associated loci were defined based on boundaries identified both by clumping analysis in PLINK and R 2 -based LD analysis in Haploview [36]. For clumping analysis, the dataset was subsetted to a region including all SNPs with R 2 > 0.2 from the top SNP on that chromosome. Because of the genomic structure of dogs, a maximum window size was set to 5000 kb. For each block, their haplotype, frequencies, chi-square test, and p-value were obtained using PLINK. To assess cumulative risk, we additionally categorized dogs by number of risk haplotypes (zero to four) present for the associated regions. Logistic regression was used to evaluate the association of number of risk haplotypes and TZL.

Targeted sequencing
Sixteen dogs (10 TZL, 3 TZUS, 3 controls), selected for optimal haplotype representation (i.e. to represent risk-risk, risk-non-risk, and non-risk-non-risk for each haplotype) and distribution in MDS plot, were sequenced across the associated genomic regions (Additional file 11). A custom sequence capture array was designed (NimbleGen SeqCap EZ Developer Kit 6 ) to cover the top associated regions (16.1 Mb total; CanFam 3.1 cfa8:51,700,000-54,800,000, cfa14:8,000, 000-16,100,000, cfa29:7,600,000-12,500,000). Regions were chosen to include all SNPs with R 2 > 0.2 from the top SNP. Standard indexed Illumina libraries were prepared with the KAPA HyperPlus library preparation kit. 7 Targeted pooled (4 samples) libraries were captured by hybridization in solution using the custom probe pool. Estimated coverage of the 16.1 Mb target region was 95%. Library constructions, pooling and captures were performed following the SeqCap EZ HyperCap Workflow User's Guide (V 1.0) 6 . Per suggestion from NimbleGen, developer's reagent (06684335001) was used in place of COT-1. Index-specific hybridization enhancing oligonucleotides were used to improve the efficiency of genomic region capture. Sequencing was carried out on an Illumina NextSeq 500.