Whole genome sequence analysis of equid gammaherpesvirus -2 field isolates reveals high levels of genomic diversity and recombination

Equid gammaherpesvirus 2 (EHV2) is a gammaherpesvirus with a widespread distribution in horse populations globally. Although its pathogenic significance can be unclear in most cases of infection, EHV2 infection can cause upper respiratory tract disease in foals. Co-infection of different strains of EHV2 in an individual horse is common. Small regions of the EHV2 genome have shown considerable genetic heterogeneity. This could suggest genomic recombination between different strains of EHV2, similar to the extensive recombination networks that have been demonstrated for some alphaherpesviruses. This study examined natural recombination and genome diversity of EHV2 field isolates. Whole genome sequencing analysis of 18 EHV2 isolates, along with analysis of two publicly available EHV2 genomes, revealed variation in genomes sizes (from 173.7 to 184.8 kbp), guanine plus cytosine content (from 56.7 to 57.8%) and the size of the terminal repeat regions (from 17,196 to 17,551 bp). The nucleotide sequence identity between the genomes ranged from 86.2 to 99.7%. The estimated average inter-strain nucleotide diversity between the 20 EHV2 genomes was 2.9%. Individual gene sequences showed varying levels of nucleotide diversity and ranged between 0 and 38.1%. The ratio of nonsynonymous substitutions, Ka, to synonymous substitutions, Ks, (Ka/Ks) suggests that over 50% of EHV2 genes are undergoing diversifying selection. Recombination analyses of the 20 EHV2 genome sequences using the recombination detection program (RDP4) and SplitsTree revealed evidence of viral recombination. Analysis of the 18 new EHV2 genomes alongside the 2 previously sequenced genomes revealed a high degree of genetic diversity and extensive recombination networks. Herpesvirus genome diversification and virus evolution can be driven by recombination, and our findings are consistent with recombination being a key mechanism by which EHV2 genomes may vary and evolve.

showing clinical signs of disease and in healthy horses [19,26,48,58,91], including in Icelandic horses which have been isolated from direct contact with other horse populations for over 1000 years [92]. A range of clinical signs have been associated with EHV2 infection, including mild to severe upper respiratory tract disease, and keratoconjunctivitis [5,9,44,58,99]. EHV2 has been detected and isolated from a variety of tissues and samples including the trachea, lymph nodes, lung, spleen, kidney, gastric mucosal epithelium, bronchoalveolar lavage fluid, as well as nasal and ocular swabs of both healthy horses and those showing different signs of clinical disease [28,29,65,92]. Herpesviruses establish lifelong latent infection within the host at varying sites [1,71]. B lymphocytes are the major site of latency for EHV2 infection [23,55].
This study aimed to investigate EHV2 genome variation and recombination by the determination and analysis of the full genome sequences of historical (archived) and contemporary EHV2 isolates from Australian horses.

Culture of contemporary EHV2 isolates from peripheral blood mononuclear cells (PBMCs)
Contemporary EHV2 isolates were collected by coculture of peripheral blood mononuclear cells (PBMCs) originating from 5 different horses, with equine fetal kidney (EFK) cell monolayers after 5 to 21 days of incubation. Of 149 plaques, most were identified as either EHV2 or equid gammaherpesvirus 5 (EHV5) by polymerase chain reaction (PCR) ( Table 1). In total, more plaques tested positive to only EHV2 (n = 58/149, 39%) compared to plaques positive for only EHV5 (n = 26/149, 17%) ( Table 1). To ensure distinct EHV2 isolates were selected for genome analysis, plaque purified isolates from the PBMC co-cultures were initially characterised by qPCRhigh resolution melt (qPCR-HRM) curve analysis of the gB gene and then a subset of the 58 EHV2-positive isolates (n = 8) with distinct melt profiles were selected for sequencing (Table 1) along with a further 10 EHV2 historical isolates (   (homologue of EBV gp350) while the strain G9/92 encodes the full-length protein. Three other isolates in this study also contained mutations in ORF51. These, along with other genes containing mutations that disrupt ORFs (INDELs indivisible by three or deletions resulting in a frameshift) were predicted to result in a truncated protein being expressed as summarised in Table 3.
The percentage nucleotide identity between the 20 genome sequences revealed a high level of genetic diversity amongst EHV2 strains, where nucleotide identities ranged from 86.2 to 99.7%. A nucleotide identity matrix is provided (see Supplementary Table 2, Additional file 2).
Genome sequence polymorphism was interrogated by DnaSP analysis [72,77]. Alignment of the 20 whole genomes contained 175,141 nucleotide sites. Of these, 149,886 sites contained no gaps and 9.1% (15,879) of these sites were polymorphic (S). The average number of nucleotide differences (k) between any two genomes was 4323. Estimated inter-strain nucleotide diversity (π) was 0.029 which represents 2.9% of the analysed sequence sites across the full genome. We analysed the π values of other selected viruses using publicly available complete genome sequences. For equine alphaherpesvirus 4 (EHV4, n = 14) and equine alphaherpesvirus 1 (EHV1, n = 22) strains published by Vaz et al. [95], π was much lower, at 0.0014 and 0.0011, respectively. The inter-strain diversity of EHV2 is more comparable to that of the highly variable betaherpesvirus human cytomegalovirus (HCMV, n = 124, π = 0.021) [77] than that of the gammaherpesvirus EBV (n = 60, π = 0.0079) [64].

Analysis of the diversity and divergence of EHV2 genes
The percentage nucleotide identity amongst the 20 EHV2 isolates for each gene was evaluated from the alignment of each gene sequence (Fig. 3a). The average nucleotide diversity (π) for individual gene sequences was determined for all pairwise comparisons of each gene from the 20 isolates (Fig. 3b). Genes with low nucleotide diversity values (Ka < 0.002) in EHV2 were mostly involved in DNA-processing (ORFs 6, 9, 18, 25, 26, 44 and 54), while  Table 5) [77]. The EHV2 core genome contains 13 genes that are unique to the EGHVs (EHV2 and EHV5). These genes, annotated with the 'E' prefix are located towards both genomic termini, or close to the internal repeat regions and mostly (9 of 13) displayed high diversity (Ka values > 0.025 and π values 0.035 -0.381) (Fig. 3, Table 5,  and Supplementary Table 3, Additional file 2).
The Ka/Ks ratio is used as a measure of the selection pressure acting on a gene [98]. A ratio close to zero indicates strong negative/purifying selection, a ratio close to 1 indicates neutral selection or genetic drift, while a ratio higher than 1 indicates positive/diversifying selection. Almost half of the EHV2 genes (47%) had a Ka/Ks ratio < 0.5, 17% had a ratio between 0.5 and 1, and 36% had a ratio > 1 (see Supplementary Table 3, Additional file 2). Only a weak correlation (Spearman's rank correlation analysis, ρ = 0.212, P = 0.06) was observed between selection pressure (Ka/Ks ratio) and nucleotide diversity (π) of EHV2 genes. High Ka/Ks ratios do not directly translate to a high diversity and vice versa.
The gB, gH and DNA polymerase genes are commonly investigated in EHV2 studies and thus additional analyses were performed for gB and gH genes that incorporated other publicly available sequences from other EHV2 isolates (see Supplementary Table 4, Additional file 2 for details of other sequences). Of these 3 genes, the highest nucleotide identities were observed for the DNA polymerase gene, which also had the lowest Ka value (0.001) ( Table 4). All three genes had Ka/Ks ratios < 0.5 suggesting negative selection (see Supplementary Table 3, Additional file 2).

Amino acid sequence analysis of EHV2 gB sites
The amino acid sequence of the gB from 30 EHV2 strains (including 18 determined in this study and 12 sourced from GenBank, see Supplementary Table 4, Additional file 2) showed conservation of both the proposed endoproteolytic furin cleavage site R-X-K/R-R at residues 437-440 and the GQLG sequence at residues 580-591 reported to be conserved in all herpesviruses examined. The 13 cysteine residues are also conserved in the 30 gB sequences including synonymous substitution observed in two foreign isolates [34,36,75,81]. Antigenic sites of EHV2 gB had previously been characterised based on variability of 4 strains (86-67, 2-141-67, 5FN and T-2) and their reactivity to a panel of monoclonal antibodies [36]. Variability amongst the 30 gB aa sequences within the antigenic site I (aa 27-51), show 18 isolates were 86-67-like and shared between 89.47% to 100% aa identity at this site. The antigenic sites designated II and III (Fig. 4) are further confirmed as regions of high variability amongst EHV2 strains and the 30 strains shared 55-100% and 31-100% aa identities at these sites, respectively. Figure 4 also shows regions of amino acid variation beyond those previously identified antigenic sites.

Phylogenetic analysis of EHV2 GPCR gene family
EHV2 encodes 3 GPCR genes (ORF74, E1, and E6) which share some similarities to cellular chemokine receptors    [87]. The nucleotide sequences of E6, ORF74 and E1 from this study were aligned with sequences (see Supplementary Table 4, Additional file 2) representing previously identified genogroups [75]. The genogroups identified by phylogenetic analysis in the current study were consistent with previous findings [75] except for ORF74. E6 was the least variable gene and divided the isolates into 2 clear genogroups (Fig. 5A), while E1 generally grouped in the 6 existing genogroups (Fig. 5B). ORF74 grouped similarly to the Sharp study [75] although several additional sequences identified in the current study might form a new genogroup (Fig. 5C). Consistent with this, both ORF74 and E1 are amongst the most divergent genes in EHV2, Ka > 0.025 (Table 5) with Ka/Ks ratios of 0.25 and 0.4, respectively, indicating they are under a negative/purifying selection (see Supplementary Table 3, Additional file 2). It was observed that the EHV2 viruses recovered from the same horse did not share similar genogroups in all the GPCR genes (E6, E1, and ORF74) (Fig. 5).

Recombination analysis of EHV2 genome
The 20 aligned EHV2 genome sequences were examined for evidence of recombination across the complete (excluding the left TR) and unique genome regions, as well as repeat regions, IR (1 and 2), and TR using SplitsTree4 [38]. The reticulate networks and pair-wise homoplasy index (Phi) test detected significant recombination between the 20 EHV2 strains in all the genomic regions analysed (Fig. 6). Reticulate phylogenetic recombination networks were also generated for some genes (gB, gH and the 3 GPCRs) (see Supplementary Fig. 5, Additional file 1). The RDP4 program was used to detect recombination events and recombination breakpoints. Evidence of multiple recombination events was detected ( Fig. 6 and Supplementary Table 5, Additional file 2) and recombination breakpoints were widespread through the length of the genome (Fig. 7). Overall, 155 recombination events were reported, representing those detected by 3 programs or more. Of these 105 were detected by 5 or more programs (see Supplementary  Table 5, Additional file 2). While a large number of breakpoints were shown across the length of the EHV2 genome, no significant recombination hot spots were detected.

Discussion
Earlier studies have reported genetic diversity in EHV2 using different techniques, including restriction endonuclease digestion of viral DNA, antigenic studies and evolutionary studies [12,14,36,54,67]. Due to few available complete genome sequences for EHV2 and EHV5, partial and complete sequences of selected genomic regions such as gB, gH, and DNA polymerase have been analysed in various studies, and have consistently revealed a higher diversity for EHV2 compared with EHV5 [5,83,91]. The full genome sequences of eighteen EHV2 isolates generated in this study add substantially to the two full genome sequences previously reported [87,102]. Genomic heterogeneity was observed between EHV2 viruses isolated from individual horses in this study supporting previous reports that genetically heterogenous strains of EHV2 can coinfect the same animal [7,12,14,91]. Advances in high-throughput sequencing and genome-wide analyses of herpesviruses have aided the exploration of their diversity and evolution [69]. Analysis of the twenty EHV2 genome sequences revealed a high level of genomic diversity, consistent with previous reports of genetic diversity between EHV2 isolates [9,12,11,14,15,75,91]. The level of inter-strain nucleotide diversity of the EHV2 genomes is higher compared to the other herpesvirus genomes of EBV (gammaherpesvirus), EHV1 and EHV4 (alphaherpesviruses) and HCMV and murine cytomegalovirus (MCMV, betaherpesviruses) genomes reported elsewhere [64], [77], [79], [95].
Genetic variation is a putative driver of Kaposi's sarcoma-associated herpesvirus (KSHV) and EBV infection, and may be associated with the site of isolation, clinical syndromes, and geographical location [18,40,63,64,84,101]. Genomically distinct EHV2 viruses were isolated from individual horses in this and other studies [9,11,12,14,15,75,91]. Whether this variation is a driver of EHV2 infection remains to be determined, although high genetic diversity may enhance the ability of the virus to modulate host immunity [9,25,36,37,58,62,73,91,92]. Even though all the EHV2 isolates sequenced in this study were from Victoria, Australia, representatives of each genogroup of E1, E6 and ORF74 were found in this and other studies [75]. Similarly, variations in gB gene appeared to show no geographical association [5,58,83,91]. These findings suggest there are no geographic associations with EHV2 genomic variation. This is in contrast to findings in some other gammaherpesviruses (EBV) where geographical association with genome sequence have been observed [17,88,103].
Variations in the nucleotide diversity of individual genes was observed throughout the length of EHV2 genome (Fig. 3b). This is consistent with reports in other herpesviruses including HCMV, human herpes simplex virus 1 (HSV1), EBV and KSHV, where isolates display uneven distribution of diversity, with high diversity observed in genes required for persistence viral infection and latency establishment, including structural genes and some glycoproteins [64,70,86,101].
EHV2 genes that are more diverse have a range of roles in replication and pathogenesis. These include viral proteins that are targets of the immune response (gB, gH) [11,36,76,82,93] and some genes involved in viral immune evasion (ORFs 52 and 64, homologs of EBV BLRF2 and BPLF1) [24,94], as well as in the establishment of latency (ORF73 homologue of KSHV latency associated nuclear antigen LANA-1) [31,80,100]. The diversity of these genes may be driven by pressure from escaping immune responses and establishing successful viral infection. Antigenic variation in neutralisation epitopes of EHV2 gB have been suggested as a means of immune escape and may drive some of the genetic diversity detected in gB and perhaps other glycoproteins [11,36]. During the latent phase, maintenance of the episomal DNA requires Maximum-likelihood phylogenetic trees for complete nucleotide sequences of (a) E6, (b) E1, and (c) 74 genes for a panel of EHV2 Australian isolates and representatives of previously reported EHV2 genogroups for each gene [75]. The genogroup designation from a previous study are shown by the numbers [75]. Phylogenetic analysis was inferred by using the Maximum likelihood method based on the General Time Reversible model (GTR) using 1000 bootstrap replicates. The percentage (> 80%) of trees in which the associated taxa clustered together is shown next to branches. The trees were initially built by applying BioNJ method to a matrix of pairwise distances estimated using Maximum Composite Likelihood (MCL) approach. The tree is drawn to scale, with branch lengths measured in the number of substitutions per site as indicated on the scale bar. Isolates in oval (red or black) indicate those recovered from the same horse replication and division of daughter cell nuclei, KSHV LANA-1 and EBV EBNA-1 are involved in tethering the viral episome to host cell chromatin to facilitate replication. This process entails modulating multiple cellular signalling pathway to recruit enzymes that modify chromatin, replication, and transcription factors to ensure persistent latent infection [6,32,68].
ORF51 is a unique gammaherpesvirus-gene with the homolog in EBV (BLLF1, gp350) known to mediate virus attachment to human B lymphocytes [94] and is a target of neutralizing antibodies in vivo [56,90]. EHV2 ORF51 (second most diverse gene) in this study displays a higher diversity than EBV BLLF1, marked by a higher number of non-synonymous variations [78]. The function of ORF51 in EHV2 is unknown and prior to this study, this ORF was severely truncated in the first complete EHV2 sequence identified   [87] while the second complete genome G9/92 encodes the full gene [102]. This study found full length gp350 homologues are encoded in 16 of the 20 complete genomes now known. Future studies are required to elucidate whether this protein mediates attachment of EHV2 to B cells in a manner homologous to EBV, and how the high level of diversity relates to its function during infection.
Most EHV2-specific genes are highly diverse and are located near the genomic termini (E1-3, E6A, E9-10). This echoes findings from MCMV genome analyses where the level of genetic variability is highest at the genome termini and the most diverse genes are specific to MCMV [79]. Similarly, both ends of KSHV genome have hypervariable genes (K1 and K15) [74]. The presence of more lineage specific genes at the genomic termini is a consistent feature of herpesvirus evolution [20].
We identified 27 EHV2 genes with Ka/Ks ratios greater than 1 (Table S3) compared to only one such gene in the MCMV genome (m102.1) and no HCMV and HSV1 genes with Ka/Ks ratio > 1 [77,79,86]. Nucleotide diversity (π) and Ka/Ks ratio of EHV2 genes are not strongly correlated (Spearman ρ = 0.212). The Ka parameter has been suggested to be relatively consistent for defining gene divergence [46,97]. Interestingly, some EHV2 genes, ). Similarly, the MCMV gB, which is the target of most circulating anti-MCMV antibodies despite the variability it displays, has a Ka/Ks ratio of 0.18 which indicates strong purifying selection [79]. EBV latency associated genes (EBNA2, EBNA3 and LMP1) displayed the highest diversity of all EBV genes, marked by a greater extent of nonsynonymous variations (Ka = 0.06 compared to 0.086 in EHV2) [64]. Functional constraints on these genes may explain the selection pressure acting on them, and the variations could have been introduced through recombination [75]. These pressures may also exist for some of the conserved EHV2 genes, similar to what has also been reported in alphaherpesviruses [89]. The data and results from this study would be useful for future studies on the possible differences in prevalence, pathogenicity or tropism of different EHV2 variants.
Recombination is an important mechanism through which genetic differences that have arisen by mutation can be shuffled to further increase genetic variability and then re-distributed through viral populations [4]. This has been demonstrated for EBV latency associated genes which show high levels of diversity and have high recombination rates [8,64,101]. Previously, genetic variability of EHV2 genes such as gB, gH and the GPCRs (ORFs 74, E1 and E6) was suggested to be due to recombination [11,75]. The number of recombination events detected in EHV2 (155 events in 20 genome sequences as detected by 3 or more programs, and 105 events as detected by 5 or more programs (see Supplementary Table 5, Additional file 2) is higher compared to EHV4 (5 events in 14 genome sequences as detected by 3 or more programs) [95], and MCMV (86 recombination events in 12 genome sequences as detected by 5 or more programs) [79] and consistent with recombination being a key driver of EHV2 genetic diversity [75]. Recombination and genomic variation are correlated in herpesviruses, including in the betaherpesvirus HCMV and several alphaherpesviruses such as HSV1, avian infectious laryngotracheitis virus (ILTV) and EHV4 [10,50,77,95]. In all these herpesvirus species, high recombination rates have been attributed, in part, to a high prevalence of infection and high rates of co-infections in hosts. Infection with EHV2 is known to be highly prevalent in horse populations [30,61,83] and co-infection with different EHV2 strains is common in horses [9,11,12,14,75,91]. In our study, we observed recombination breakpoints spread across the EHV2 genome, with more points concentrated in some genomic regions (genome termini) than in others. Similar observations have been made for other herpesviruses [64,79]. Multiple reticulate network trees revealed extensive recombination amongst EHV2 isolates as shown by isolates clustering in different groups when different genome regions or individual genes were analysed (complete, unique, and repeat regions, or gB, gH, and the GPCRs). In addition, EHV2 isolates from the same horse did not share the same genotype group for all the three GPCR genes, the gB gene and the gH gene.

Conclusion
Our understanding of the genome diversity of EHV2 increases with the availability of sequence data. The 18 EHV2 full-genome sequences generated from this study contribute to genomic studies of EHV2. Analyses of the resultant 20 genome sequences enabled us to assess EHV2 genetic diversity and recombination in genomic regions and individual genes. Our findings point to notable or unique characteristics of EHV2 compared to many other herpesvirus species, including a comparatively high number of recombination breakpoints, a high level of genetic diversity and a large proportion of genes seemingly undergoing diversifying selection. These results are likely to reflect the biology of this evolutionarily successful virus, including its infection and immune evasion characteristics and the high prevalence of infection in horses, as well as co-infections in individual hosts.

Viruses and cells
Eighteen Australian isolates of EHV2 were used in this study, including ten archived viruses isolated between 1967 and 1994 and eight isolates collected for this study in 2018 (Table 2).
To isolate the contemporary viruses, PBMCs were isolated from two ponies and three Thoroughbred (TB) horses in Victoria, Australia. Whole blood was collected with approval from the Faculty of Veterinary and Agricultural Sciences Animal Ethics Committee at the University of Melbourne (approval number 1714237.1). Whole blood was collected from the 5 horses (80 ml of blood per horse was collected into heparin treated tubes, final concentration 20 IU/ml) and immediately chilled on ice and transported to our laboratories for Ficoll-Paque purification as previously described [55]. Approximately 10 6 PBMCs in 500 µl volumes were overlayed onto primary equine foetal kidney (EFK) cell monolayers under methyl cellulose overlay media in 6-well trays [85]. Isolated viral plaques were picked and identified as either EHV2 or EHV5 by PCR screening [27] before selected plaques were plaque-purified on EFK monolayers two more times followed by amplification of virus stocks and virus purification using sucrose or Ficoll gradients as previously described [2,85]. PCR identification of these plaques is summarised in Table 1.
To select a range of distinct viruses for sequencing from the contemporary collection, qPCR-HRM curve analysis of a region of EHV2 gB was used to compare EHV2 isolates. Primers corresponding to nucleotides (nt) 2096-2115 (forward primer 5' -GCA AGG TGG TGG TCA ATG TG -3') and 2535-2555 (reverse primer 5' GGC TCA TAA TCC CCC TCA TCG -3') were used, numbered according to the EHV2.86-67 genome (GenBank accession NC_001650). Details of the isolates selected for whole genome sequencing are shown in Tables 1 and 2.

DNA extraction, genome sequencing and assembly
Viral DNA was extracted from purified virus using the High Pure PCR template preparation kit (Roche) according to the manufacturer's instructions. Following DNA extraction, sequencing of all the selected isolates was performed as described previously [50]. Briefly, libraries were prepared using 1 ng of DNA with the Illlumina Nextera XT kit and sequenced according to the manufacturer's instruction using Illumina MiSeq with V3 chemistry (Illumina 15,046,563 v02). The reads produced consist of paired end read of 150 bp.
All the analysis performed in Geneious was done using version 11.1.5 (www. genei ous. com).
The quality of the resulting paired end sequencing reads was assessed using FASTQC software (https:// www. bioin forma tics. babra ham. ac. uk/ proje cts/ fastqc/) [3]. Read quality was processed using BBDuk 1.0, a plugin in Geneious [43]. Assembly was done using Geneious mapper at medium-low sensitivity option with 5 iterations as recommended by the program [95]. Sequence reads were assembled by first mapping to the reference strain EHV2 86-67 (NC_001650) excluding the first terminal repeat region, with a modification of the maximum mismatch allowance to 40% to generate a 'preliminary genome sequence' for each isolate. The 'generate consensus sequence' option in Geneious was used to derive the consensus sequences based on the majority of nucleotides in the coverage area at a cut off of 30. All other parameters used default settings. The genome assembly of each isolate was completed by iteratively mapping the sequence reads of each isolate onto its preliminary genome sequence until the entire genome length was covered, or the number of reads mapped to the preliminary genome sequence stopped increasing. The reference sequence was used in areas with low coverage and quality.
De novo assembly was used to assemble two of the genomes (Fin60-72 and 157IFEye-69) to compare the two methods of genome assembly as described previously [95]. The alignment of the two genome sequences derived from the two methods showed some differences in selected genome regions, particularly in repeat rich areas. The sequence reads were remapped separately to each genome sequence as reference to closely examine the regions of disagreement, which are shown to be mostly associated with areas containing repeats and INDELS. Some CDS regions showing disagreement, specifically those with large INDELs, were characterized using PCR amplification and analysis of amplicon size using the following primes (

Genome annotation and sequence alignment
The annotations from the reference genome (86/67) were transferred to the new genome sequences using the "Annotate From" and "Transfer Annotation" functions in Geneious on default settings. Open reading frames (ORFs) were detected by using Geneious 'Find ORFs' option. ORFs were recognized as initiated by the start codon (ATG), ending with a termination codon (TGA, TAA or TAG) and a minimum length of 120 bp. The transferred annotations were then curated using the predicted ORF sizes. ORFs were individually extracted and genomic features in EHV2 strains 86-67 and G9/92 were verified by BLAST searches of the GenBank database (National Center for Biotechnology Information (NCBI) website http:// www. ncbi. nlm. nih. gov/ BLAST/ using BLASTn, BLASTx, tBLASTx.
The complete genome sequences of all 18 EHV2 isolates derived from this study were aligned together with strains 86-67 and G9/92 using the Multiple Alignment with Fast Fourier Transformation (MAFFT) 7.450 plugin in Geneious [41,42]. The prototype strain EHV2 86-67 [87] as used as reference sequence for the alignments. To extend the analysis of some EHV2 genes, such as glycoproteins and GPCRs beyond 18 sequences generated in this study, publicly available complete sequences of these genes in NCBI database were included in our analyses (see Supplementary Table 4, Additional file 2). Comparative analysis of aligned complete genome and selected genes sequences were performed in Geneious 11.1.5.

Comparative sequence analysis
To perform phylogenetic analysis, the best-fit model was first determined using the IQ-TREE web server http:// iqtree. cibiv. univie. ac. at/ [59]. GTR model was indicated as best fit with a gamma distribution and BIONJ tree builder. The maximum likelihood phylogenetic analysis was performed on the alignments of complete EHV2 gB, gH, E1, E6 and ORF74 genes from this study and those sourced from NCBI (see Supplementary Table 4, Additional file 2), and complete genome (excluding 1TR) using GTR model (GTR + 1 + G4) within the Molecular Evolutionary Genetics Analysis (MEGA) version 7 [47,96]. Trees were initially built using an improved version of Neighbor joining (NJ) algorithm (BIONJ) and Nearest-Neighbor-Interchange (NNI) for Heuristic analysis using the Maximum Composite Likelihood (MCL) with 1000 bootstrap replicates and a support threshold of 90% [33].
The DNA polymorphism option of the DNA Sequence Polymorphism (DnaSP) software 6.12.01 × 64 [72] was used to calculate whole genome diversity parameters, including nucleotide diversity (π), the number of polymorphic sites, and the average number of nucleotide differences, excluding gapped sites. Window size and step size were set at 500 nt and 100 nt, respectively, as previously described [77]. Also, nonsynonymous substitutions per nonsynonymous site (Ka) and synonymous substitutions per synonymous site (Ks) were calculated using the synonymous and nonsynonymous substitutions option [72].
Sequence similarity plots between EHV2 strains were performed using SimPlot software version 3.5.1 as previously described [49]. This software calculates and plots the percent identity of the query sequence to a panel of reference sequences in a sliding window, which is moved across the alignment in steps.

Recombination analysis
In preparing the aligned genome sequences for recombination analysis, tandem repeat regions were first identified and deleted using Phobos [53] plugin in Geneious as previously described [95]. Recombination analysis was performed on aligned complete genome sequences, individual unique region, internal repeats (IR1 and IR2) and terminal repeat regions. Evidence of intraspecific recombination was examined using recombination detection programs, RDP4 version 4.95 [51,52] and SplitsTree4 V 4.14.6 [39]. Recombination network trees were generated as previously described [95] and the pair-wise homoplasy (Phi) test [15] was used to analyse the statistical significance of recombination networks as executed within SplitsTree4. Six different programs in RDP4 were executed using default setting to detect recombination breakpoints including RDP, GENE-CONV, 3Seq, Maximum Chi Square (MaxChi), SiScan and BootScan [51,52]. Only break points detected by at least three programs with a Bonferroni-corrected P value < 0.05 were reported. Duplicate and flagged breakpoints were omitted. A plot of the distribution of recombination break points along the length of the genome was generated as previously described [50].