Genomic nucleotide-based distance analysis for delimiting old world monkey derived herpes simplex virus species

Herpes simplex viruses form a genus within the alphaherpesvirus subfamily, with three identified viral species isolated from Old World monkeys (OWM); Macacine alphaherpesvirus 1 (McHV-1; herpes B), Cercopithecine alphaherpesvirus 2 (SA8), and Papiine alphaherpesvirus 2 (PaHV-2; herpes papio). Herpes B is endemic to macaques, while PaHV-2 and SA8 appear endemic to baboons. All three viruses are genetically and antigenically similar, with SA8 and PaHV-2 thought to be avirulent in humans, while herpes B is a biosafety level 4 pathogen. Recently, next-generation sequencing (NGS) has resulted in an increased number of published OWM herpes simplex genomes, allowing an encompassing phylogenetic analysis. In this study, phylogenetic networks, in conjunction with a genome-based genetic distance cutoff method were used to examine 27 OWM monkey herpes simplex isolates. Genome-based genetic distances were calculated, resulting in distances between lion and pig-tailed simplex viruses themselves, and versus herpes B core strains that were higher than those between PaHV-2 and SA8 (approximately 14 and 10% respectively). The species distance cutoff was determined to be 8.94%, with the method recovering separate species status for PaHV-2 and SA8 and showed that lion and pig-tailed simplex viruses (vs core herpes B strains) were well over the distance species cutoff. We propose designating lion and pig-tailed simplex viruses as separate, individual viral species, and that this may be the first identification of viral cryptic species.


Background
The alphaherpesvirinae comprise a subfamily within Herpesviridae, with most of its members establishing latency in the peripheral nervous system. The five genera which comprise the alphaherpesvirinae infect birds (Iltovirus, Mardivirus), sea turtles (Scutavirus), mammals (Varicellovirus, Simplevirus), as well as lizards (currently unassigned). Until fairly recently, simplex viruses were thought to only infect primates, however simplex viruses have been isolated from cattle, bats, rabbits, and marsupials [1][2][3][4][5]. Various species of macaque monkeys are the natural reservoir for the herpes B simplex virus. Herpes B was first described in 1933, following an incident where a 29-year-old laboratory worker was bitten by an asymptomatic monkey and later died from encephalitis [6,7]. Herpes B has been demonstrated to be highly neurovirulent with~80% mortality and is categorized as a BSL-4 level pathogen by the CDC [8,9]. In spite of considerable work with macaques in laboratory settings, as well as close contact between humans and macaques particularly in Asia, there have only been 46 documented cases of zoonotic transmission since 1933 [10,11]. A recent commentary has questioned the high neurovirulence of herpes B and has raised the possibility of higher rates of viral shedding in laboratory settings due to stress [11].
Herpes B has an approximately 156,400 bp genome, a high GC content of 74.5%, and has been shown to be closely related to Papiine alphaherpesvirus 2 (PaHV-2; herpes papio) and Cercopithecine alphaherpesvirus 2 (SA8). With the advent of next-generation sequencing (NGS) the genomes of 19 herpes B isolates have been sequenced [12][13][14]. The sequenced strains were isolated from six macaque species; Macaca (M.) fascularis (crabeating; cynomologous; cyno), M. fuscata (Japanese), M. mulatta (rhesus), M. nemestrina (pig-tailed), M. radiata (bonnet), and M. silenus (lion-tailed). Macaque phylogenetic research has shown that of the macaque species featured in the current study, M. silenus and M. nemestrina are basal to the remaining species [15]. A herpes B multi-isolate analysis previously showed that herpes B strains isolated from M. silenus and M. nemestrina were distant from the remaining macaque derived sequences according to percent coding identity [12].
For several decades, the classic definition of species originating from Ernst Mayr has been "species are groups of actually or potentially interbreeding natural populations, which are reproductively isolated from other such groups" [16,17]. This definition is problematic in virology as viruses undergo recombination [18][19][20][21][22][23][24][25][26], but they do not interbreed per se, so an alternative definition is required. The definition of species has not been static, with several alternative species concepts proposed based on biological, ecological, evolutionary, cohesion, phylogenetic, phenetic, and genotypic cluster properties, many of which have further subdivisions [27]. Related to challenges regarding species concepts, are cryptic species (non-viral) which have been described since the early eighteenth century [28,29]. Cryptic species appear identical based on morphology but are on different evolutionary paths [29]. The definition of cryptic species lacks clarity, however, a recently proposed conceptual framework for identifying cryptic species involves "statistically separable and divergent genotypic clusters" [29]. To address these challenges several methods of species delimitation have been used in organisms ranging from bacteria to eukaryotes such as arbitrary distance thresholds, in silico DNA-DNA hybridization (isDDH) and generalized mixed Yule coalescent (GMYC) [30][31][32][33]. Previous phylogenetic studies of porcine circovirus type 2 (PCV2), H5N1 influenza, feline herpesvirus 1 (FHV-1), and the varicellovirus genus have used genomic nucleotide distance to establish intraspecies clade cutoffs [34][35][36][37]. The goal of the current study was to use this genomic distance cutoff approach to determine if the herpes B strains isolated from M. silenus and M.

Old world monkey simplex virus phylogeny
To investigate if the pig and lion-tailed macaque simplex viruses warranted separate species status, the genomes of the available Old World monkey (OWM) derived simplex viruses were downloaded from Genbank ( Table 1). The available PaHV-2 strains were included in the analysis in order set an overall species cutoff for the OWM simplex viruses. The viral genomes were first aligned, and then the terminal repeat segments were deleted from the genomic multiple sequence alignment (MSA). The optimal nucleotide substitution model for the dataset was also calculated. This MSA alignment was used to generate a phylogenetic network which illustrates phylogenetic dissonance within the dataset (Fig. 1a). The phylogenetic network in Fig. 1a shows a "genetic continuum" with the core herpes B strains at one end, the pig and lion-tailed macaque derived strains located approximately in the middle, and the baboon viruses at the opposite end of the continuum. Additionally, the herpes B strain E90-136, isolated from a cyno macaque was separated from the core herpes B strains. A maximum likelihood (ML) tree was also generated to establish phylogenetic robustness, and the subsequent tree produced highly similar results to phylogenetic network (Fig. 1b). The OWM simplex virus phylogenetic network and ML tree ( Fig. 1a and b) show similar phylogenetic tree topology to the Old World monkey hosts (Fig. 1c).

Establishing species level cutoffs
Genomic nucleotide distance-based cutoff values have been used in the past in an effort to define viral intraspecies clades empirically [34][35][36][37]. In the current study we applied this distance-based method to define species level cutoffs. To begin to establish species level cutoffs, the maximum composite likelihood (MCL) pairwise distances between the 28 OWM viruses was calculated, the frequencies plotted, and a kernel density graph was overlaid (Fig. 2a). A genomic distance cutoff for establishing species status was derived by marking the lowest point of the kernel density plot (8.94%) and is denoted by the vertical dashed line in Fig. 2a. Thus, for the current data set, genomic distances over 8.94% merit species status, and under 8.94% do not. Using this genomic nucleotidebased distance cutoff approach, the pig and lion-tailed macaque simplex viruses merit separate, individual species status, as the distances between each other was 10.1%. The distance of the pig and lion-tailed macaques from the core herpes B strains was approximately 14% (Fig. 2b), suggesting they are separate species. Using this method, SA8 and PaHV-2 retained species status, however the outlying core herpes B isolate E90-136 did not merit species status (6.1% distance; Fig. 2b).

Core herpes B clade
The core herpes B strains isolated from rhesus, bonnet, and Japanese macaques were next examined to establish intraspecies genomic distance-based clade cutoff. Similar to the method described above, MSAs comprising the 15 core herpes B strains identified in Fig. 1a and b were generated with and without an outgroup (M. nemestrina isolate KQ). Next, a phylogenetic network and maximum likelihood tree were constructed ( Fig. 3a and b) based on the alignment with an outgroup. The tree topology patterns between the two phylogenetic methods were nearly identical, with two basic groupings, aside from an outlier strain (9400371). Next, pairwise distances between the core herpes B strains were calculated using the core herpes B MSA without an outgroup, and the frequencies were plotted (Fig. 3c). The genomic distance clade cutoff derived from the kernel density trough was 0.2031% (Fig. 3c). The distance between groups 1 and 2 was 0.7689% (Fig. 3d), which is above the distance cutoff validating their status as clades. The distance between strain 9400371 and clades 1 and 2 was 0.07246 and 0.05295% respectively, therefore because these values are above the 0.02031% cutoff value, strain 9400371 may warrant consideration as a single member of a third clade. Host species differs between the Genbank annotation and the corresponding publication [12] c Strain was originally isolated from C. neglectus, however subsequent work showed the natural reservoir is M. silenus [41]  PaHV-2 clade structure The phylogenetic structure of the seven available PaHV-2 genomic sequences was examined examined. Both the phylogenetic network and maximum likelihood tree recovered three groupings ( Fig. 4a and b). The clade cutoffs were performed in the same manner as described above, with the cutoff value calculated at 1.9611% distance (Fig. 4c).
The distances between groups 1, 2 and 3 were above the cutoff (Fig. 4d), thus validating their clade status.

Discussion
In the current study we utilized a genomic nucleotide distance-based method previously used for identifying phylogenetic clades and applied it to detect viral species.
The results suggest that herpes simplex viruses isolated from lion and pig-tailed macaques should be designated as separate species. To our knowledge this is the first time this technique was been applied to virus species and may be useful in detecting cryptic viral species. Fig. 2 Establishing viral species cutoff value. Pairwise distances in the Old World monkey virus alignment were calculated using Mega 7 [46], and the frequencies plotted using the R package. A kernel density plot was also generated and combined with the distance frequencies (a). A distance cutoff value was established by determining the trough of the kernel plot, which is depicted by a vertical dotted line (8.94%). Mega 7 was used to calculate between group distances which is shown in Figure b (See figure on previous page.) Fig. 1 Phylogenetic analysis of Old World monkey (OWM) derived simplex viruses. OWM viral genomic sequences (Table 1) were aligned with MAFFT ver. 7.394 and the optimal substitution model was calculated by IQ-Tree [42,43]. a Phylogenetic network generated from the alignment using Splitstree ver. 4.14 and the HKY + G + I substitution model (gaps deleted; p-inv = 0.469; gamma = 1.138) [44] was used. b Maximum Likelihood tree was generated from an alignment using HSV-1 as an outgroup using RAxMLGUI (GTRCATI; ver 1.3) [45]. Figure c shows a macaque monkey phylogenetic tree based on data presented by Li et al. [15] Host-virus co-speciation Herpesviruses have been shown to cospeciate with their hosts [47], however they can cross species barriers [48], especially in captivity [38,39,41,[49][50][51][52][53]. These captive transmissions, especially between macaque species can complicate phylogenetic analysis. In particular, crossspecies transmission appears to be fairly common among the core herpes B strains, and has been discussed previously in depth by Eberle et al. [12]. In some of the herpes B strains, the original source of the virus appears to be unclear. For instance, the cynomolgus macaque derived strain E90-136 is more distant and phylogenetically separated from the core herpes B strains (Fig. 1), however it was not sufficiently distant (Fig. 2) to be considered a separate species. Interestingly, strain E90-136 was isolated from a cyno macaque which died due to a disseminated infection caused by the virus [54]. Herpes B strains are generally asymptomatic within the natural host, which may suggest that cyno macques are not the natural reservoir for this particular viral strain. For other OWM strains, interspecies spread is well documented.
The isolate 8100812 was originally isolated from a DeBrazza monkey, however restriction digest patterns showed that the lion-tailed macaque was the natural host [41]. Phylogenetically, this appears appropriate as strain 8100812 forms a node with the two pig-tailed macaque isolates ( Fig. 1a and b), and importantly matches phylogenetic profile of the macaque species themselves (Fig. 1c). The correlation between lion and pig-tailed viruses and macaque phylogeny strongly suggests hostvirus co-speciation. Additionally, while natural crossspecies viral transmissions between animals does occur [48,[55][56][57], natural species viral transmissions between the animals and viruses in this study are fairly unlikely given the natural host ranges of the monkeys (Fig. 5).
The reduced likelihood of natural cross species transmission is important as it increases the probability of hostvirus co-evolution. Further, for example while lion-tailed and bonnet macaques ranges overlap, different living strategies (frugivorous and arboreal vs generalist in human dominated environments respectively) [58,59] between these animals make cross transmission unlikely.

Viral species concept
Standard definitions of what constitutes a biological species, such as a reproductively isolated population [16], are insufficient for viruses as they replicate, but do not reproduce like other organisms. Originally, viruses were simply classified according to the host that was infected, i.e. bacterial, plant or animal [60]. It wasn't until 1950 that official principles of animal virus classification were established, with categories such as morphology, chemical composition, method of transmission, tropism and symptomatology [60]. In 1963 the International Committee on Nomenclature of Viruses (ICNV) was established and in 1966 the body proposed a taxonomic framework and classification rules which included class, order, family. This organization is now known as the International Committee for Taxonomy of Viruses (ICTV) [60,61]. In 1990 the ICTV established an official definition of viral species which was stated as "a virus species is a polythetic class of viruses that constitutes a replicating lineage and occupies a particular ecological niche" [62], and has since evolved to state "a monophyletic group of viruses whose properties can be distinguished from those of other species by multiple criteria … .not limited to natural and experimental host range, . While this statement recommends distinguishing properties for determining species, the process is still ambiguous. We chose to focus our efforts on genomic distance in order to apply a quantitative measure to delimit viral species. Several species delimitation methods have been used in bacteria and eukaryotes. One of the most common and recent methods for species delimitation in bacteria and eukaryotes is generalized mixed Yule coalescent, where branching patterns of a single tree transition from Yule process inter-species branching to coalescent process intra-species branching [33]. Single loci can be used for this method, however more recently multiple genes and morphological characters can be used [64]. Previously, a distance method based on gene homology and sharing was used to reevaluate viral family classifications [65]. A relatively simple genomic distance cutoff method has been used to validate viral clades [34][35][36][37] and was applied to delimit species in the current study. While the kernel density plot combined with genetic distance cutoff method described here is simplistic compared to the computation heavy generalized mixed Yule coalescent method, whole genomes and therefore more phylogenetic signal is available for analysis. We did not compare the various species delimiting methods to the genetic distance cutoff method as this was beyond the focus of the study. A caveat with the distance cutoff value used in the current study is that the cutoff value is not universal, but dataset dependent. A potential issue with using the distance cutoff method to establish species boundaries is that as the genomes of additional viruses are sequenced, the species cutoff value could potentially shift, resulting in species cutoff values that could vary over time. A general complication of the method used in the current study and in other genetic data delimitation techniques is that the methods may be delimiting populations, and not necessarily species [66]. We cannot eliminate this possibility in our analysis however this is unlikely given the large distance values between species in the dataset.
In our study to determine if the lion and pig-tailed derived simplex viruses were species separate from herpes B, we included all sequenced Old World monkey strains in an effort not to bias the results and establish a general cutoff for the Old World monkey group. The results of our study showed the genome-based genetic distance between lion/pig-tailed macaque derived viruses and the core herpes B strains were both approximately 14%, which was actually greater than the distance observed (~10%) between SA8 and PaHV-2 (Fig. 2b), previously established viral species. The recovery of SA8 and PaHV-2 as separate species helps to validate the method. Both of these values were well above the species cutoff value (8.94%; Fig. 2b). The genetic distance data, and the data supporting co-speciation of the lion and pig-tailed macaque viruses reinforces the idea that these should be designated as separate, individual species from herpes B, and each other.

Cryptic viral species
The term cryptic species is related to similar concepts such as sibling species, species complex, and superspecies, with the definitions between these concepts often blurred. Cryptic species are generally defined as species which appear virtually identical phenotypically, but belong to different taxa, and were thus "hidden". It should be noted that it is not unusual for non-viral cryptic species to have some morphological differences in terms of color, size, and markings [67,68]. Cryptic species were originally described three centuries ago [28,29], and with modern molecular techniques have been increasingly identified across multiple organisms [69][70][71][72][73]. To our knowledge, the concept of cryptic species has not been applied to viruses, however species complex occasionally has [74,75]. From the phylogenetic network of the Old World monkey simplex viruses (Fig. 1a), these viruses could be described as a series of species complexes (i.e. a group closely related viruses that are difficult to separate), one comprising the macaque viruses and a second encompassing the baboon simplex viruses. The genetic distance cutoff method may be useful in establishing species boundaries in these complexes, as the method confirmed species status for the baboon derived PaHV-2 and SA8. Importantly, the method identified lion and pig-tailed simplex viruses as separate species (Fig. 2), defining these viruses essentially cryptic species. The genetic distance cutoff method provides a quantitative threshold to determine species status and could be another tool for establishing species status among viral cryptic species complexes.

Challenges and issues
There are multiple challenges in defining species, for example recently, even in defined species, fertile hybrids among plants, birds, fish, and even mammals are not uncommon [76][77][78][79], suggesting reproductive barriers may not always separate species. This may call into question as to what constitutes a species. As previously stated, viruses do not reproduce per se, however they do recombine, and herpesviruses have been shown to be highly recombinogenic [26,80]. Several recent studies have found natural interspecies recombinants between HSV-1 and HSV-2 [18,20], although they share approximately 70% sequence similarity [81]. While natural recombinants between OWM viruses, which have lower genetic distances than HSV-1 and 2 have not been reported, it seems reasonable to assume it is possible. It is therefore unlikely that the ability to recombine in a host would be a factor in defining species in primate herpes simplex viruses.
Species defining methods related to virion morphology, serology, as well as gene homology and function are problematic in primate herpes simplex viruses as these characteristics are highly similar, with one of the only differences being the apparent lack of γ 1 34.5 in the Old World monkey simplex viruses [40]. Virus morphology in particular is difficult to distinguish between simplex viruses, as an older study found that virion morphology is nearly the same between HSV and herpes B, however there may some minor differences in morphogenesis [82]. Further, to our knowledge, differences in virus morphology between in the various herpes B strains has not been investigated. From the studies performed so far, the herpes B strains examined here appear to be nearly identical in nearly every respect, including the ability to infect multiple monkey species. Future studies may be able to detect morphological difference in the viral virions or at the protein structure level. Pathogenicity is one determinative method in which there appears to be a difference between the pig and lion-tailed macaque viruses and the remaining herpes B strains. Studies performed by Eberle et al. examining the lethal dose (LD 50 ) of the sequenced herpes B strains in mice showed that the pig and pion-tailed macaque simplex viruses had different lethality phenotypes compared to the remaining herpes B strains [12]. Importantly, the LD 50 values for the pig and lion-tailed viruses were > 10 7 PFU, while the average for the remaining herpes B strains was approximately 10 4 PFU. In addition to the species delimiting method described here, pathogenicity differences support separate species designations for the pig and lion-tailed macaque simplex viruses.

Implications of separate species designation
There are several related scientific threads derived from giving species designations to the pig and lion-tailed macaque species. The first is acknowledging that these viruses are on separate evolutionary paths from each other, and from other herpes B strains. This may result in closer examination of possible phenotypic differences between herpes B strains, and among other groups of closely related viruses. Further, possible future transcriptomic or proteomic data conclusions from core herpes B strains for example will not be assumed for the pig and lion-tailed herpes simplex viruses and would require separate experimentation.

Herpes B core phylogeny
Phylogenetic analysis of the remaining herpes B strains showed a core group, designated core herpes B, containing two main clades ( Fig. 3a and b). Core herpes B clade 1 contained strains with longer branch lengths compared to clade 2, with strains derived from M. mulatta, M. radiata (strain M12-O), and M. fuscata (strain 7709642). It is unclear why the branch lengths are longer in clade 1, however the isolation locations and host species are variable [12] and may contribute to the greater genetic distances. The strains comprising clade 2 were all isolated from M. mulatta, and from two locations. It is possible that clade two represents a rhesus only strain grouping. Herpes B core strain 9400371 may represent a sole member of a third clade, with genetic distances from clades one and two that were above the cutoff threshold (Fig. 3d). The original host for this virus is unclear as the Genbank annotation (KY628983.1) states that it is rhesus macaque, however the corresponding publication [12] states it is from a cynomolgus macaque. If strain 9400371, is derived from a cynomolgus macaque, future research will help determine if is the first member of a cynomolgus macaque clade.

Conclusion
In conclusion, genome-based phylogenetic and genetic distance cutoff techniques were applied to the available Old World monkey simplex virus genome sequences. The results showed that lion and pig-tailed macaque simplex viruses were approximately 14% distant from core herpes B strains, which was more distant than between PaHV-2 strains and SA8, previously established viral species. The genomic distance cutoff method recovered PaHV-2 and SA8 as separate species, and lion and pig-tailed macaque simplex viruses as separate species, effectively identifying these macaque viruses as cryptic species. Based on the genetic distance analysis, the fact that the OWM hosts are designated as separate species, and herpes viruses co-evolve with their hosts, we propose establishing lion and pig-tailed macaque simplex viruses as separate species. This may be the first identification of cryptic viral species.

Genome sequences and genomic sequence alignment
The genomic sequences of the viral strains used in the current study were downloaded from NCBI and can be found in Table 1. Several genomic multiple sequence alignments (MSAs) were generated with MAFFT (Linux ver. 7.394) using the FFT-NS-1 strategy option [42,83]. MSAs with and without an outgroup were generated for herpes B, PaHV-2, and all available Old World monkey (OWM) genomic sequences. The generated MSAs were manually inspected, and locally aligned for optimization using ClustalW within the MEGA 7 package [46,84]. The alignments generated for this study can be downloaded at https://brandt.ophth.wisc.edu/data-sets/.

Nucleotide substitution model optimization and phylogeny
Prior to phylogenetic network construction, the optimal substitution model for each MSA, and subsequent optimal model parameters were calculated using IQ-TREE version 1.6.3 [43]. Phylogenetic networks for each of the alignments were generated using Splitstree 4 [44] using the optimal substitution model and parameters calculated by IQ-TREE. Maximum likelihood trees were generated using RAxMLGUI (ver. 1.3) using the GTRCATI option with 1000 bootstrap replicates [45].

Genomic nucleotide distance and clade cutoff calculations
To determine clade cutoff parameters, pairwise distances were first calculated using the genomic MSAs without outgroups. The genomic MSAs without outgroups were used in order to minimize alignment gaps usually created by including an outgroup sequence. A statistical description of establishing clades using genomic nucleotide distance has been previously described [35]. Briefly a variance analysis framework was used, where the F statistic.
was calculated for each dataset and plotted as a curve. Maximum composite likelihood (MCL) pairwise distances were calculated with MEGA 7 rather than uncorrected p-distances as have been used previously [34][35][36][37]. Species distance cutoffs were established by using the Old World monkey MSA, followed by graphing the frequency of the pairwise MCL distances using the R software package (ver. 3.4.4) [85]. A kernel density plot was also generated in R to assist in determining the clade cutoff value by finding the trough between the low and high MCL distance populations. Intraspecies clade cutoffs were established in a similar manner, using the core herpes B, and herpes papio MSAs (minus outgroup) respectively.