In search of the Aplysia immunome: an in silico study

The immune repertoires of mollusks beyond commercially important organisms such as the pacific oyster Crassostrea gigas or vectors for human pathogens like the bloodfluke planorb Biomphalaria glabrata are understudied. Despite being an important model for neural aging and the role of inflammation in neuropathic pain, the immune repertoire of Aplysia californica is poorly understood. Recent discovery of a neurotropic nidovirus in Aplysia has highlighted the need for a better understanding of the Aplysia immunome. To address this gap in the literature, the Aplysia reference genome was mined using InterProScan and OrthoFinder for putative immune genes. The Aplysia genome encodes orthologs of all critical components of the classical Toll-like receptor (TLR) signaling pathway. The presence of many more TLRs and TLR associated adapters than known from vertebrates suggest yet uncharacterized, novel TLR associated signaling pathways. Aplysia also retains many nucleotide receptors and antiviral effectors known to play a key role in viral defense in vertebrates. However, the absence of key antiviral signaling adapters MAVS and STING in the Aplysia genome suggests divergence from vertebrates and bivalves in these pathways. The resulting immune gene set of this in silico study provides a basis for interpretation of future immune studies in this important model organism.


Background
Making inferences of immune function from genomics and transcriptomics data can be challenging with nonmammalian or non-ecdysozoan models due to a paucity of functional research and gene characterization outside these two groups. Immune research in mollusks has historically centered on commercially important species such as the Pacific oyster Crassostrea gigas [1][2][3], or on models of parasite transmission such as the bloodfluke planorb Biomphalaria glabrata [4][5][6]. Recent work has ventured to describe immune function in other mollusks, mainly in other commercially important bivalves [7][8][9], but also in a few important molluscan models such as the pond snail Lymnaea stagnalis [10] and the common octopus Octopus vulgaris [11]. These organisms often lack reference genomes or transcriptomes, and researchers must rely on de novo assembly and homology-based transcript annotation. Many other mollusk models for which publicly available reference sequences are available have not been comprehensively surveyed for immune related genes, such as the marine model Aplysia californica.
Despite being a premier neural model for more than half a century and boasting a reference genome and transcriptome, the immune repertoire of Aplysia is still poorly understood [12][13][14]. Historically, the extent of interest in the Aplysia immune response was a model for the role of inflammation and immune response in neuropathic pain [15][16][17][18]. Recently, however, a novel virus in the order Nidovirales was discovered in the marine model Aplysia californica [19]. This virus appears to have a broad tropism, with particularly high viral expression levels in the nervous system [20]. This discovery presents an opportunity to further develop Aplysia as a new model for neurotropic viral infection, particularly in the context of neural aging. However, the dearth of functional annotation and lack of even a putative immune gene set makes future inference into the immune response to this novel virus difficult. While some putative immune associated genes have been identified in Aplysia previously in the context of the neuronal transcriptome [14] or aging associated neuro-inflammation [21], these studies did not perform a comprehensive survey of the Aplysia immunome. To address this gap in the Aplysia literature, an in silico survey of Aplysia reference genome was conducted to identify and describe a putative immunome to aid in future immunological studies in this well-established model.

Results
A combination of InterProScan (IPS) protein domain annotation and OrthoFinder gene homology search hierarchical clustering was used to identify genes in the Aplysia genome with potential for immune function (see Methods). Candidate immune genes were first identified by conserved protein domains typical of immune genes in other taxa and then vetted for ortholoy to said immune genes with OrthoFinder. This was particularly necessary for genes which cannot be easily identified by diagnostic conserved domains alone, such as inhibitors of NFkB. Of the 26,658 predicted proteins in the AplCal3.0 gene feature format file (gff ) available from the NCBI RefSeq FTP site, 26,209 (98.3%) were annotated with at least one analyses incorporated into IPS, and 22,511 (84.4%) proteins were annotated with at least one InterPro accession.
From the 5 species used for OrthoFinder analysis (Human, Drosophila melanogaster, Crassostrea gigas, Biomphalaria glabrata, and Aplysia californica), proteins were sorted into 19,353 orthogroups. Of these orthogroups, 12,594 (65.1%) contain genes from at least 2 species while 4606 (23.8%) contained genes from across all 5 species surveyed. Comparing among groups, 1,723 orthogroups were common to mollusks and humans but not the ecdysozoan Drosophila. A further 1501 orthogroups were common only to the mollusks surveyed and 1918 were common only to the two gastropods. A total of 849 orthogroups were unique to Aplysia.
By mining the IPS results for immune domain signatures (see Methods) and vetting the resultant set with OrthoFinder results, 2241 Aplysia proteins from 1669 genes were identified as having potential immune function. These putative immune genes are described in detail below.
IPS annotation of the Aplysia californica RefSeq predicted protein models can be found in supplemental file SFile1. Full OrthFinder results can be found in supplemental file SFile2. The fully compiled immunome can be found in supplemental file SFile3.

Pattern recognition receptors (PRRs) Glycan receptors
Several PRRs are adapted to detect the glycans of various pathogens, including peptidogylcans, beta 1,3-glucan from fungi, and lipoglycans like bacterial liposaccharide (LPS) [22][23][24]. Many such PRRs were identified in Aplysia, including: five proteins in the CD36-like family, three gram-negative bacteria binding-proteins (LOC101850763, LOC101851233, LOC101861522), and two peptidoglycan recognition proteins (LOC101855568, LOC101858995). Orthologs of Beta 1,3-glucan recognition proteins (BGRP) which are antimicrobial PRRs known from insects, were not identified, however. In vertebrates, detection of bacterial liposaccharides (LPS) is facilitated by an extracellular cascade of the LPS binding proteins LBP, CD14, and MD2. LPB binds LPS and facilitates its binding to CD14, which then delivers LPS to the MD-TLR4 receptor complex [25]. A single LBP protein, LOC101845376, as well as 15 proteins with MD-2-related lipid-recognition domains were identified, however potential orthologs of CD14 were not (Table 1).
A unique subclass of lectins that pairs an immunoglobulin superfamily (IgSf ) domain with a carbohydrate recognition domain, called Variable Immunoglobulin and Lectin domain containing molecules (VIgL), have been shown to play an important role in innate immunity in gastropods [4]. The first identified VIgL in Biomphalaria was dubbed a fibrinogen related protein (FREP). Later, genes with similar domain architecture but galectin or C-type lectin carbohydrate recognitions domains were dubbed GREP and CREP respectively.
Based on conserved domains identified in IPS, 2 possible CREP genes (LOC101860578 and LOC101862063) were identified. It was not possible to find any FREPs based on conserved domains, although two Aplysia FREPs have been reported previously. Because IgSf domains of mollusk VIgLs are difficult to detect for automated software such as those used by InterProScan [10], BLAST homology searches to known gastropod VIgLs from Biomphalaria were used to identify these subsets of FBGDC, CTLDC, and galectin domain containing genes. This approach recovered the two previously identified FREPs [27] and aforementioned CREPs. Multiple sequence alignment of the two putative CREPs with previously described Aplysia FREPs via Clustal Omega identified potentially distantly related IgSf regions ( Fig. 1) [28].

Broad Spectrum PRR families
While several families of PRRs are dedicated to a specific class of PAMPs, the Toll-like receptor family of PRRs can detect a diverse range of PAMPs. For example, in mammals TLRs can detect bacterial lipopeptides (TLR1,2,3), glycans (TLR4), nucleic acids (3,7,8,9), and bacterial proteins (TLR2,5) [39]. In Aplysia 39 genes with Toll/interleukin-1 receptor homology (TIR) domain and Leucine Rich Repeat (LRR) domains characteristic of TLRs were identified. OrthoFinder divided these putative TLRs in 17 orthogroups. Of these, four orthogroups were unique to Aplysia (OG0001270, OG0009332, OG0016038, and OG0009441). Of the 17, only three were not unique to gastropods, with OG0011109 and OG0012371 containing a protein from C. gigas, and OG0003873 containing proteins from C. gigas and Drosophila.
Neighbor joining tree with 1000 boostraps of TIR domain region of these putative TLRs broadly supports OrthoFinder's division of TLR groups (Fig. 4). Only TIR domains from LOC101851910 (XP_005112480.1) and LOC101863390 (XP_012942443.2) clustered away from their assigned orthogroups (OG0001276 and OG0007258 respectively). However, these divergent TIR assignments corresponded to weekly supported branches with low bootstrap values.
Interestingly, LOC101863390 contains 2 TIR domains. In addition, LOC101863390 also contains 3 transmembrane domains, and two sets of LRRs that give the appearance of an end-to-end fusion of two TLRs.
Phylogenetic clustering of TLR sequences from Aplysia californica, Biomphalaria glabrata, Crassosteraea gigas, Nematostella, Drosophila melanogaster, Strongylocentrotus purpuratus, and humans revealed lineage specific trends in TLR diversification (Fig. 5). Molluscan TLRs clustered together with other protostome TLRs and exhibited a unique molluscan radiation within this group. This protostome radiation clustered apart from a massive radiation in S. purpuratus. The number of publicly available reference TLR protein sequences for C. gigas was substantially less than the manually annotated number previously reported by Zhang et al. (18 vs. 83) [3], making direct comparison difficult. However, these results do suggest a unique molluscan TLR radiation. Furthermore, TLRs from then two gastropods primarily originate from this uniquely molluscan radiation or its sister branch, unlike available C. gigas TLRs.
While most of the Aplysia TLRs contain only TIR and LRR domains, several contain additional domains. Two of these putative TLRs (LOC101850809) and LOC101845154) contain detectable Cysteine-rich c-terminal flanking region in proximity of their LRR  EF-hand domain (Fig. 6). While these two genes are assigned to different orthogorups, tree analysis demonstrates they form a clade. Only two TLRs were identified as having c-terminal flanking regions (LOC101850809 and LOC101845154). Like TLRs, Nucleotide-binding oligomerization domain-like receptors (NOD-like receptors, NLRs) are a diverse family of PRRs that detect a similarly broad range of PAMPs in vertebrates [40]. In Aplysia, only 4 proteins with the characteristic NACHT domain of NLRs were identified, only one of which contained both a NACHT domain and LRR domains typical of NLRs. This NLR-like protein also contains a Death-like domain and a DZIP3like HEPN domain.
Like TLRs and NLRs, Receptor Cysteine-Rich (SRCR) Domains have been implicated in innate immune function across taxa against a broad suite of pathogens [41]. SRCR PRRs have been described in bivalves where they are found on the surface of haemocytes and respond to PAMPS [42]. In Aplysia, 11 SRCR genes were identified (Table 4).

Adapter proteins
PRRs in several signaling cascades associate with adaptor proteins to facilitate transduction of pattern recognition into activation of kinases and ultimately transcriptional changes to facilitate an immune response, most notably the TLRs. Adaptor proteins of TLRs complex with their target receptor via their TIR domains. The most conserved TLR adaptor among animals is Myeloid differentiation primary response 88 (MyD88), which complexes with TLRs via its TIR domain and with Interleukin-1 receptor-associated kinases (IRAKs) via its Death-like domain (DLD) in a complex called a Myddosome [43]. In Aplysia 5 MyD88-like proteins were identified. One of these contained only the TIR and DLD typical of vertebrate MyD88 (LOC101855709), while three contained an additional Armadillo-type fold (LOC101850562, LOC101850562, and LOC101858605), and the last contained the same ROCO domain as described earlier in some AcTLRs (LOC101861518). OrthoFinder homology search also identified LOC101847182 as a MyD88like protein based on TIR domain homology, but this gene lacked any death-like domains. In addition to MyD88, 4 other TIR domain containing (TIRDC) adaptors have been identified: MyD88-adaptor-like (Mal), TICAM1 and 2 (also known as TRIF and TRAM), and sterile-alpha and Armadillo motif protein (SARM) [44]. Although orthologs of MAL, TRIF, or TRAM could not be identified, three Aplysia SARMs (LOC101849069, LOC101850713, and LOC106011731) were identified. In addition, a diverse group of TIRDC proteins with as of yet unknown function were also identified. Of these, 10 contained Armadillo-type folds, two contained EGF-like domains, one contained COR domain, and the remaining 17 contained only a TIR domain. Potential orthologs of other, non-TIRDC TLR adaptor proteins Toll-Interacting Protein (TOLLIP, LOC101845996) and evolutionarily conserved signaling intermediate in Toll pathway (ECSIT, LOC101857763) were also identified.
One of the several downstream kinase complexes associated with PRR activation is the TAK1-TAB complex. The TAB proteins TAB1-3 facilitate the association of the kinase TAK1 with ubiquitin chain scaffolds to colocalize with downstream targets [45]. A TAB1 ortholog was difficult to identify based on protein domain alone, as the only InterPro domain in vertebrate TAB1 is the PPM-type phosphatase domain (IPR001932) which is found in 14 Aplysia genes. However, OrthoFinder homology search identified a gene with two isoforms (LOC101854409) listed as TAB1 on the NCBI. The other TAK1 adaptors TAB2 and 3 were similarly difficult to identify based on IPS alone. It was not possible to identify vertebrate-like TAB2/3 proteins that contain both a CUE domain and RAnBP2 zinc finger. OrthoFinder homology search identified two isoforms of the same gene (LOC101854409), which is identified as TAB2 on the NCBI. Notably, this gene lacks any CUE domain but does contain a RAnBP2 zinc finger needed for ubiquitin chain binding and subsequent signaling [46]. In fact, 19 RanBP2-type zinc finger domain containing genes were identified (Table 5).

E3 ubiquitin ligases
Ubiquitylation of various adaptor proteins and kinase complexes is a critical component of PRR signaling cascades downstream of many PRRs, notably TLRs, NLRs, and RLRs [47]. Multi-species Neighbor Joining Tree of Toll-like Receptor proteins. Toll-like receptor genes were extracted a custom InterProScan annotation of the Aplysia californcia RefSeq protein modles (AplCal, GCF_000002075.1) and publicly available InterPro annotations of proteins from the UniprotKB data base for Biomphalaria glabrata (Bglab, UP000076420_IPS), Crassosteraea gigas (Cgig, UP000005408), Nematostella vectensis (Nvect, UP000001593), Drosophila melanogaster (Dromel, UP000000803), Strongylocentrotus purpuratus (Spurp, UP000007110), and human (Hs, UP000005640) based on presence of TIR and LRR domains. Protein sequences were aligned and used to generate a neighbor joining tree with the JTT model. Branches and leaf nodes are colored based on the species from which the sequence originated. Arcs describe major groupings of TLRs in the tree: a S. purpuratus dominated branch, a set of TLRs containing mostly human TLRs (dubbed vertebrate-like), a major branch that contains sequences only from the protostomes assessed (as well as lone TLR from Nematostella), and a branch containing only molluscan representatives. This tree corroborates earlier findings that massive TLR expansions in S. purpuratus and mollusks represent distinct, lineage specific radiations of TLR genes   Tumor necrosis factor receptor (TNFR)-associated factor (TRAF) proteins play a key role in activating kinases during PRR signaling events [48]. In Aplysia, 6 TRAF-like proteins were identified: two resembling TRAF2 (LOC101856795 and LOC101844983), one resembling TRAF3 (LOC101846107), one resembling TRAF4 (LOC101859360, and two resembling TRAF6 (LOC101856862 and LOC101859878).
The LUBAC complex, comprised of proteins HOIP, HOIL1, and Sharpin, also plays a key role in RLR and TLR signaling, particularly in the activation of the IkB complex in NFkB signaling [49]. In Aplysia, putative orthologs of LUBAC complex components HOIP (LOC101849805) and HOIL1 (LOC101863809) were identified. Although an individual gene representing the third LUBAC component, Sharpin, was not identified, one of the 4 AcHOIL1 isoforms (XP_035825204.1) resembled Sharpin more then HOIL1 in conserved domain composition (i.e. lacking a TRIAD domain) and thus may represent a functional analog of Sharpin.
Other E3 ligases function to modulate or inhibit proinflammatory immune signaling. Pellino proteins modulates the activity of several immune associated kinases such as RIPKs and IRAKs [51]. In Aplysia only one Pellino gene (LOC101863733) was identified. Similarly, the diverse family of tripartite-motif contain proteins (TRIMs) regulate immune and proinflammatory signaling cascade via Ubiquitylation of various kinases [52]. TRIMs generally contain a b-box domain and a RING type zinc finger domain [53]. Twenty-two such genes in Aplysia were identified.
While the baculoviral Inhibitor of Apoptosis (IAP) repeat (BIR) family of proteins are known to modulate apoptosis, BIRC2/3, also known as cIAP1/2, are also known to modulate immune associated NFkB signaling [54]. The Aplysia genome encodes 22 BIR domain containing proteins, 11 of which also contained RING zinc finger domains that represent potential BIRC E3 ligases. Three of these also contained Death-like domains typical of cIAPs (LOC101851721, LOC101860034, and LOC101852760).
Signaling downstream of RLRs also depends on ubiquitylation via TRIMs and other E3 ligases such as MEX3C and RIPLET [55]. In Aplysia, a single potential MEX3C ortholog (LOC101854000) and two RIPLET-like genes  (Table 6).

Kinases
Interleukin-1 Receptor Associated Kinases (IRAKs) are serine/threonine kinases that complex with MyD88 via their Death-like domains and phosphorylate downstream adaptors in TIR signaling [56]. Two IRAK genes were identified, one (LOC101845636) is listed as IRAK4 on the NCBI, and the other (LOC101845840) assigned to the Pelle-family of IRAKs by PANTHER (PTHR24419:SF33).
In vertebrates, IRAK activation leads to activation of TRAF6, which ubiquitylates and activates the TAK1-TAB complex which contains the mitogen activate protein kinase (MAPK) kinase kinase 7 (MAP3K7/ TAK1) [57]. In Aplysia one ortholog of TAK1 which has 10 protein isoforms (LOC101852077) was identified. TAK1 phosphorylates two subsequent kinase chains: the IKK complex and mitogen activated protein kinases (MAPKs), that lead to translocation of transcription factors NFkB and AP-1 to the nucleus [45].
Parallel to IKKs, TAK1 and other MAP3Ks activate MAP2Ks (MKKs or MEKs), which activate MAPKs which in turn activate transcription factors, notably those in the AP-1 family [61]. In Aplysia several putative orthologs of several immune associated kinases in this three-tiered cascade were identified.
In addition to TAK1, one MAP3K in the Apoptosis signal-regulating kinase (ASK) family (LOC101862124) was identified, a MAP3K10 (LOC101864517), and a MAP3K12/13 (LOC101864115). Based on InterProScan annotation, it was not possible to identify orthologs of immune relevant MAP3Ks MAP3K14/NIK or MAP3K8/ Tpl2. Annotation by PANTHER identified an additional 17 MAP3Ks.
MAP2Ks were difficult to identify purely using IPS domain annotation. Using the CDD analysis generated by IPS, a MAPK/ERK Kinase (MEK) with two isoforms (LOC101850764) was identified. Annotation by PANTHER further identified an additional 4 MAP2Ks (LOC101845548, LOC101846645, LOC101846695, LOC101860284).
An ortholog of RIO Kinase 3 (RIOK3) which has been implicated as suppressor of RLR signaling but also as a critical adaptor for transcription factor IRF3 [62,63] was also identified.
Notably, it was not possible to identify any orthologs of receptor-interacting serine/threonine protein kinases (RIPKs) which also play an important role in immune and inflammatory signaling [64].

Transcription factors
The ultimate action of PRR signal transduction cascades is the activation and translocation to the nucleus of prosurvival/pro-inflammatory transcription factors NFkB and AP-1 and interferon (IFN) regulatory factors (IRFs) [65].
Nuclear factor kappa beta (NFkB) transcription factors are the quintessential and evolutionarily conserved immune/inflammation transcription factors [66]. Based on Rel homology domains (IPR032397 and IPR011539) a p105-like protein with two isoforms (LOC101855313) and a RelB/Dorsal-like (LOC101860399) protein were identified. A putative ortholog of Nuclear factor of activated T cells (NFAT, LOC101858233), which are a related class of immune transcription factors, was also identified. NFkB proteins are characteristically bound by inhibitor proteins (NFKBI/IkB) which are the target of IKKs. Aplysia contains putative orthologs of NFKBIA with 4 isoforms (LOC101852802) and of NFKBI Cactus (LOC101863680).
Just as NFkB is the target of the IKK branch of immune signaling, activator protein 1 (AP-1) is the downstream target of the MAPK cascade. AP-1 is a dimeric transcription factor made up of any two of a related group of proteins families known as JUN, FOS, MAF, and ATF [67]. In Aplysia, 8 AP-1 type transcription factors were identified, including one identified as ATF2-like (LOC101856191). Homology search further identified the Aplysia c-Jun gene (LOC101863323).
Parallel to NFkB and AP-1, IRFs are activated by immune signaling such as TLRs or cytosolic nucleic acid sensors and facilitate the translation of interferon cytokines and interferon stimulated genes (ISG) to counteract infection [68]. Seven IRFs were identified: 2 IRF1/2-like that lack SMAD domains (LOC101855936 and LOC101850966) and 5 SMAD domain containing IRFs. Translocation of these transcription factors is facilitated by major vault proteins (MVP) [69], 4 of which were identified in Aplysia (Table 8). Successful activation of NFkB, AP-1, and IRFs transcription factors leads to the production of cytokines that recruit another family of critical immune transcription factor: Signal Transducer and Activator of Transcription (STAT) proteins. Cytokine receptors for Interleukins and Interferons complex with Janus Kinases (JAKs) which phosphorylate STAT transcription factors to promote transcription of interferon stimulated genes (ISGs) [70]. This pathway is then autoinhibited by Suppressor of Cytokine Signaling (SOCS) proteins. In Aplysia, a single JAK (LOC101861981), a single STAT (LOC101861517), and 5 SOCS (LOC101845832) were identified.
Another major transcription factor that plays a role in immune response is the pro-proliferative Myelocytomatosis oncogene (MYC). While MYC is generally known for its role in modulation of the cell cycle and cellular proliferation, it has also been demonstrated to play a role in not only immune cell proliferation but also immunomodulation [71]. Conserved domain search failed to identify an Aplysia MYC based on conserved leucine zipper (IPR003327) and n-terminal domains (IPR002418). Homology search also failed to identify an Aplysia ortholog of MYC. However, 52 genes were identified with a myc-type basic helixloop-helix (bHLH) domain typical of transcription factors related to MYC, leaving the possibility a MYC-like transcription factor may yet be identified.

Cytokines
Successful activation of immune receptor transduction results in the secretion of pro-inflammatory cytokines [72]. Activation of pro-inflammatory cytokine receptors and associated pathways then stimulates the proliferation of immune cells in an inflammatory immune response [73]. Seven putative orthologs of the pro-inflammatory cytokine Macrophage migration inhibitory factor (MIF), a progranulin ortholog (LOC101846478), 4 putative tumor necrosis factors (TNF), and 4 interleukin-17 -like cytokines (IL-17) was identified. Three putative IL-17 receptors and 15 TNF receptor-like genes also were identified (Table 9).
TNF receptors associate with adaptor proteins via Death domains (DD) to transduce signaling downstream to the pro-inflammatory NFkB signaling pathway or to the pro-apoptotic caspase pathway. These DD containing signaling adaptors are TRADD, CRADD/RAIDD, and FADD. Putative CRADD (LOC101855728) and FADD (LOC101861195) orthologs were identified, but not TRADD.
Although not adapter proteins per se, A disintegrin and metalloproteinase (ADAM) proteins have been implicated in immune function, notably in cytokine signaling, particularly of TNF and Interleukins [74]. Twenty-five ADAMs were identified based on conserved peptidase domain (IPR001590).

Apoptosis
Apoptosis is the programmed self-destruction of cells and plays key roles in diverse biological processes such as development and the immune response. In mollusks, apoptosis is believed to play a critical role in the immune response by destroying pathogen infected cells and limiting the spread of infections without inducing inflammation [75]. Indeed, apoptosis serves as a key mechanism for slowing the spread of viral infections across taxa [76]. In addition to BIR containing apoptosis inhibitors mentioned in previous sections, several genes in the Bcl-2 family of proteins also act as important  NFkB  IPR033926, IPR011539  IPR032397, IPR011539  3  LOC101855313, LOC101858233, LOC101860399   IkB  IPR002110, IPR038753  IPR038753, IPR036770  1  LOC101852802   MVP  IPR041139, IPR021870  IPR043023, IPR036013  4  LOC101847145, LOC101848074,  LOC101861920, LOC101860982   STAT  IPR000980, IPR013800,  IPR013801, IPR013799 IPR013801, IPR001217 1 LOC101861517 modulators of apoptosis [77]. In Aplysia,6 Bcl-2 family proteins, including putative ortholog of anti-apoptotic Bcl-2 (LOC101848156), as well as pro-apoptotic proteins Bax (LOC101854270), Bak (LOC101851280), and Bok (LOC101854628) were identified. Two potential Fas Apoptosis Inhibitor Molecules (FAIM, LOC101863121 and LOC106012994), which act as apoptosis inhibitors and play a role in IFN signaling [78] were identified. Apoptosis signaling is most notably mediated through pro-inflammatory and pro-apoptotic proteases called caspases [79]. Eight caspase genes were identified. Other apoptotic regulators Death factor and mitochondrial metabolism regulator AIF (LOC100533557) [80] and IAP inhibitor and pro-apoptotic factor HTRA2 [81] was also identified (Table 10).

Anti-microbial
In mollusks, often the first line of defense against pathogens is the mucus, which contains a diverse array of antimicrobial compounds including lectins and mucins [75,82]. In addition to previously discussed lectins, the Aplysia genome also encodes 13 mucins. Among the most ancient antimicrobial proteins are those that punch holes in the cellular membranes of pathogens, called pore forming proteins (PFP) [83,84]. In Aplysia, 14 Aerolysin type, 5 ETX/MTX-2 type, and three MACPF containing MPEG-like pore forming proteins were identified. Lysozymes are similarly ubiquitous components of the innate immune system that act to lyse pathogen membranes [85]. In Aplysia three lysozymes were identified (Table 11). Other antimicrobial proteins known from bivalves such as defensins, mytilins, or mytimicins were not identified.

Anti-viral
In the case of virus infection specifically, cells mobilize a diversity of antiviral proteins that hamper virus entry, uncoating, replication, translation, assembly, and release [29]. Two dynamin family Mx-like proteins (LOC101848065 and LOC101863954), one RNA Specific Adenosine Deaminase (ADAR) with three isoforms (LOC101855902), four viperins/RSAD, 8 caveolins, 5 Gilt-like proteins, three APOBEC3-like proteins, and 10 Interferon-induced transmembrane protein/Dispanin family (IPR007593) proteins were identified. OrthoFinder homology search identified one further potential ADAR gene (LOC101863647), which contains dsRNA binding domains (IPR014720) and adenosine deaminase/ editase domains (IPR002466) but lacks a Z-binding Table 9 Putative cytokine signaling associated genes identified in Aplysia. See Table 1 for table format   Similarly to DNA damage response genes being implicated as PRRS, poly(ADP-ribose) polymerases (PARPs) which are associated with DNA damage response have also been demonstrated to have broad spectrum antiviral and pro-inflammatory effects [86,87]. In Aplysia, 17 PARP genes were identified (Table 12).

ROS and Antioxidants
A prime function of immune cells is the capacity to recognize, phagocytose, and destroy pathogens during an immune response. Phagocytosed pathogens are destroyed by the generation of reactive oxygen species (ROS). ROS also play a role in potentiating pro-inflammatory signaling cascades during immune signaling [89]. In Aplysia a diverse set of ROS generating enzymes were identified, including: 3 dual oxidases (DUOX) and 2 associated assembly factors (DUOXA), 7 nitric oxide synthases (NOS), 5 NADPH oxidases, and one peroxidasin (LOC100533347). Furthermore, 20 L-amino-acid oxidases (LAAO) were identified, among which was antimicrobial peptide Aplysianin-A (LOC100533295) [90]. Due to the cytotoxic nature of ROS, levels of reactive species are tightly regulated in the cell via antioxidant enzymes. Additionally, antioxidant enzymes have also been demonstrated to act as immunomodulators of proinflammatory cytokines [91]. In Aplysia, a robust complement of antioxidant enzymes was identified, including: 8 Cu-Zn superoxide dismutases (SOD), one Fe-Mn superoxide dismuates (MnSod, LOC101852344), two catalases (LOC101854685 and LOC101860167), and 14 peroxiredoxins. In the glutathione antioxidant pathway, 3 glutathione peroxidases, 8 glutaredoxins, a glutathione reductase (LOC101847425), and 68 glutathione S-transferases were identified (Table 14).

Heat shock proteins
Stress signaling during an immune response is known to induce the transcription of heat shock protein (HSP), which also have been suggested to have immunomodulatory function [92]. In Aplysia, many HSPs across several families were identified (Table 15), including: 20 HSP20s, 12 HSP70s, 4 HSP90s, and one heat shock factor (HSF, LOC101851429).

Melanization
In arthropods, part of the immune response involves the production of toxic quinone compounds, wound healing factors, and pathogen encapsulating melanin in a process known as melanization. Although most well described in arthropods, melanization as an immune response has also been described in several bivalves [93] and in the gastropod B. glabrata [94]. While 5 laccases and 6 tyrosinases that participate in melanization were identified in Aplysia, an ortholog of the critical Phenol Oxidase central to arthropod melanization could not be identified.

Complement system
A major component of the vertebrate immune response is the complement system, which facilitates the formation of the pathogen destroying membrane attack complex (MAC) and the recruitment of phagocytic cells [95]. The complement system is triggered after PAMP/DAMP binding to one of three lectins: Complement 1 (C1q), MBL, or fibrinogen and collagen domain containing lectins (ficolins) [96]. While neither MBL nor ficolins were identified in Aplysia, 11 C1q domain containing genes (C1qDC) were identified. In addition, two orthologs for C1q receptor Multiple epidermal growth factorlike domains protein 10 (MEGF10, LOC101848617 and LOC106013429) were identified. Activation of complement PRRs recruits a group of thioester containing proteins called complement factors.
Thioester containing proteins (TEPs) play an important role in pathogen defense across the animal kingdom. TEPs are subdivided into complement factors in the complement system and alpha 2 Macroglobulins (a2Ms) which act as protease inhibitors [96]. In Aplysia 13 TEPs based on conserved thioester domains (IPR011626) were identified. Two of these, LOC101848728 and LOC101862906, could be classified as complement factors C3 and C4-A respectively based on conserved domains. LOC101854858 and LOC101861335 were further classified as CD109-like proteins based on conserved domains. On the other hand, a factor B ortholog (Bf ) based on conserved domains could not be identified. However, unlike vertebrate Bf which contains type A von Willebrand factor, Sushi, and Trypsin domains, the C. gigas gene identified as Bf by Zhang et al. 2015 contains von Willebrand type A (IPR002035), Sushi (IPR000436), EGF (IPR000742), and Pentraxin domains (IPR001759) [3]. Based on these domains, an Aplysia Bf (LOC101854327) was identified from among the Aplysia genes identified as Pentraxins (Table 16).

Discussion
In this in silico analysis, a robust immune complement in the Aplysia genome was proposed. This includes PRRs and adaptor proteins, signal transducers, transcription factors, and subsequent cytokine signaling components [97][98][99]. Among these are NFkB transcription factors, the IKK complex, a diverse compliment of MAPKs and associated kinases, the TAB-TAK1 complex, E3 Ubiquitin ligases like Pellino, TRAFs, and the LUBAC complex, and a diverse collection of TIR domain adaptors and receptors.

Table 15
Putative heat shock proteins genes identified in Aplysia. See Table 1 for table format description   Required Domains  Genes   Name  Strict  Relaxed  count  Accessions   HSF  IPR003594, IPR000232  IPR027725  1  LOC101851429   HSP20  IPR002068  IPR008978  25  See supplemental file SFile3   HSP70  IPR018181  IPR013126  12  See supplemental file SFile3   HSP90  IPR019805, IPR020575, IPR003594  IPR001404  4 LOC101845707, LOC101859343, LOC101861407, LOC101859248  , IPR009048, IPR011626  IPR008993, IPR036595  2  LOC101848728, LOC101862906   CD109  IPR009048, IPR011625, IPR041813,  IPR013783, IPR001599, IPR002890,  IPR041555   IPR036595, IPR011626, IPR011625,  IPR041813, IPR013783, IPR001599   2  LOC101854858, LOC101861335   Bf  IPR002035, IPR000436, IPR001254  (IPR000436, IPR000742, IPR002035,  IPR001759)   IPR011360  1  LOC101854327   MEGF10  IPR001881, IPR000742, IPR011489  2  LOC101848617, LOC106013429   TEP  IPR011626  13  See supplemental file SFile3 Functional studies in C. gigas have demonstrated that the oyster TLRs function as immune sensors and signal through MyD88 down to NfKB as in vertebrates [100]. Similarly. Studies have shown that the oyster IKK complex does indeed form from two IKKs and adapter NEMO as in mammals. Furthermore, oyster IKK Fig. 8 Orthologous components of the TLR signaling cascade in Aplysia californica and other animals. Immune genes of interest were extracted from a custom InterProScan annotation of the Aplysia californcia RefSeq predicted protein models (AplCal, GCF_000002075.1) and publicly available InterPro annotations of proteins from the UniprotKB data base for Biomphalaria glabrata (Bglab, UP000076420_IPS), Crassosteraea gigas (Cgig, UP000005408), Nematostella vectensis (Nvect, UP000001593), Drosophila melanogaster (Dromel, UP000000803), Strongylocentrotus purpuratus (Spurp, UP000007110), and human (Hs, UP000005640) based on InterPro domains detailed in supplementary data Supplemental File SFile5. Results were further refined using sequence similarity search among listed proteomes and predicted protein models using OrthoFinder and BLASTP. Columns of the table represent a single species, while rows represent proteins known to play key roles in TLR signaling in mammals. Proteins are grouped according to their functional roles in TLR signaling: TLR receptors, Adapter proteins like MyD88, E3 Ubiquitin ligases, Kinase signaling complexes and components, and Transcription factors and associated proteins. Numbers in each cell represent the number of protein hits to each protein type, and thus differ from gene level numbers present in the main text. Note that hits are to Uniprot KB proteomes and as such differ from previously reported numbers for C. gigas and S. purpuratus which used predicted gene models. Aplysia protein numbers for major components are broadly similar to other animals, although TLRs appear to be highly diversified compared to mammals and arthropods, but still fewer in number than in S. purpuratus and previously reported numbers for C. gigas interacts with TRAF6 and MyD88 and induces the transcription of NF-kB and IRF target genes, likely through NFkB and IRF8 orthologs [101]. Indeed, mollusks in general appear to retain a TLR signaling cascade like that of vertebrates [98]. As in other mollusks, the Aplysia genome encodes essentially all the major components of the canonical TLR signaling cascade (Figs. 8). Given the broad conservation of genes from the TLR cascade in mollusks, and the conserved function of many of those genes as described in oyster, Aplysia possibly also retain a vertebrate-like TLR signaling cascade (Fig. 9). However, molluscan TLRs appear to have undergone massive expansion compared to vertebrates, suggesting a more diverse TLR signaling pool beyond the canonical pathway conserved with vertebrates.
Previously, a survey of the C. gigas genome demonstrated a massive expansion of the short protostome type TLRs and a smaller expansion of TLRs containing leucine-rich repeat c-terminals. This expansion led to a comparatively high TLR count, though was different in type compared to S. purpuratus which exhibited an expansion but in vertebrate-type TLRs. These TLRs were upregulated in response to boiotic challenge, suggesting a unique molluscan immune TLR cascade. Compared to C. gigas and S. purpuratus, Aplysia is comparatively poor in TLRs (39 in Aplysia vs 83 in C. gigas and 222 in S. purpuratus). Nevertheless, like other invertebrates, the Aplysia genome encodes many more TLRs than in vertebrates. While direct comparison to previously reported C. gigas numbers was not possible due to differences in custom vs reference protein number for C. gigas, the neighbor joining tree generated in this study corroborates earlier results that demonstrate that the massice expansion of TLRs in S. purpuratus and in mollusks represent distinct expansion of different TLR families [3]. Interestingly, unlike the C. gigas TLRs, the gastropod TLRs predominantly clustered within the proteosome specific branch of the tree, with the majority clustering in the mollusk specific branch. These results agree with OrthoFinder results which suggest most of the gastropod TLRs form a unique group apart from arthropods, veterbrates, and even bivalves. This may suggest a unique gastropod TLR expansion that may hint at unique gastropod TLR signaling pathways.
Concomitantly, The Aplysia genome further contains a diversity of TIRDC that may represent novel or expanded TLR signaling adaptors. Among these are several MyD88 and SARM-like TIR adaptor proteins, both of which are known from diverse lineages and are considered the most ancient of the TLR adapters [44,102]. However, many contain domains unknown from vertebrate TIRDC adapter. The presence of Armadillo-like folds in several Aplysia TLRs and TIRDCs may represent a novel TLR signaling cascade. Similarly, the presence of ROCO domains in two AcTLRs, two AcMyD88s, and an AcTIRDC may represent another novel GTPAse TLR signaling cascade. The inability to identify orthologs of TIR adapters TICAM1/TRIF and TICAM2/TRAM was expected as this MyD88-independent TLR signaling cascade arose only in the chordate lineage [102]. This suggests that the TRIF/TRAM/TBK1 signaling axis of some vertebrate TLRs is absent in Aplysia, or that another set of TIRDC adaptors fill an analogous role of TICAMs in non-canonical NFkB signaling. Downstream of TIRDCs, Aplysia retains two IRAKs as in other invertebrates, like the arthropod Drosophila and basal chordate Amphioxus. Given that these two IRAK proteins transduce TIR activation in different ways between taxa, it is unclear how Aplysia IRAKs interact with Aplysia TRAFs or Cactus orthologs [56].
Unlike in arthropods, Aplysia also appears to retain several key components of the complement system, including C1q lectins, many thioester proteins including factors C3 and C4, and factor B. While the complement system has substantially changed in the vertebrate lineage since the emergence of jawless fish, evidence suggests that the basic components of this pathway have ancient roots in early metazoa [103]. In this context, the Aplysia complement system likely serves to opsonize pathogens and induce inflammation [104]. In oyster, the C1q domain containing family underwent an extensive expansion like the TLRs [3]. In comparison, Aplysia is comparatively poor in C1qDC proteins, both compared to oyster and human.
Of particular interest for future studies of the Aplysia Abyssovirus, the Aplysia genome encodes many anti-viral genes. Among these are 5 Gamma-interferon-inducible lysosomal thiol reductases (Gilt/IFI30), which are key ISGs that inhibit viral entry in mammals [105]. Crustacean orthologs of Gilt have been demonstrated to have conserved viral entry-inhibiting function [106]. Although reported as an Mx ortholog, DQ821497 of Haliotis discus appears to be an ortholog of Aplysia Gilt based on homology and conserved domains. This abalone Gilt has been demonstrated to be induced by viral RNA mimic Poly I:C, suggesting conserved antiviral function of gastropod Gilt orthologs as well [107]. Vertebrate Mx proteins are unrelated to Gilt proteins and are instead related to dynamins and serve to prevent entry and replication of a diverse array of viruses [108]. Several dynamin-like genes were identified in Aplysia, but tree-based result from OrthoFinder suggests these Aplysia dynamins are more like vertebrate dynamins than vertebrate Mx proteins, and so may not exhibit any antiviral activity.
The Aplysia genome also encodes orthologs of antiviral deaminases ADAR and APOBEC3. These ISGs inhibit Fig. 9 Potentially Toll-like receptor signaling cascade in Aplysia californica. Based upon conservation of key components of the vertebrates TOLL-like receptor (TLR) cascade and conserved function of those components in other mollusks, we propose a vertebrate-like TLR signaling cascade in Aplysia californica. The Aplysia genome encodes putative orthologs of all major components of the classical TLR singaling cascade, including a diversity of TLRs, adapter protein MyD88, two MyD88 interacting IRAKs, E3 ligases and scaffolding protein, E3 ligase Pellino, E3 ligase LUBAC complex comprised of HOIP, HOIL1, and SHARPIN, the Tak1 kinase complex comprising of TAB1, TAB2/3, and TAK1, the IkB Kinase (IKK) complex made up of IKKa, IKKb, and IKKg/NEMO, several MAP kinases, and transcription factors in the NFkB, interferon regulatory factor (IRF), and AP-1 family. The Aplysia genome also encodes for elements of non-canonical TLR/NFkB singaling including TRAF3, non-canoninical IKK TABK1 and associated adapter NAP1. A broad suit of non-Myd88 TIR domain containing proteins suggests a diverse suit of alternative TLR signaling cascades that have yet to be described viral replication and function by introducing point mutations into viral genomes [109,110]. RNA editing of viral genomes by molluscan ADAR genes has been demonstrated to play an important role in the immune response to malacoherpseviruses [111]. The Aplysia genome also encodes a large complement of PARP proteins, which have been demonstrated to have strong anti-viral action against a diverse suite of viruses in mammals [112]. If these defenses are insufficient to protect cells from active infection, the Aplysia genome also encodes several viperins to prevent viral spread. Viperins are ISGs that serve to inhibit viral budding and release from lipid bodies in diverse taxa [113]. Conservation of function for molluscan viperins has been demonstrated in oysters [114]. Although the Aplysia genome encodes an immune compliment with substantial conservation with known innate immune genes, several major immune pathways or components appear to be missing.
Despite containing two non-canonical IKKs, the Aplysia genome lacks many of the components associated with TBK1 signaling in vertebrates. The adaptor proteins STING and MAVS, neither of which have identifiable orthologs in Aplysia, are both known to act as scaffolds that facilitate TBK1 activation of IRF3 in response to antiviral PRR activation [115].
While present in bivalves, the absence of STING has been described previously in heterobranch gastropods like Aplysia [98]. STING acts as an adaptor that facilitates phosphorylation of IRF3 by TBK1 in a diverse suit of nucleotide sensing pathways, many of which are either absent in Aplysia (DAI, PYHIN proteins), or non-productive in the heterobranch lineage (cGAS). The Aplysia genome does however encode for the MRN and DNAPK complexes, which have also been described as able to activate TBK1 in a STING dependent and independent fashions. Aplysia also lack the upstream components of TLR associated TBK1 signaling; notably TICAM proteins and TBK1 adaptors TANK and SINTBAD that form an analogous signaling cascade to the MyD88-TAB-TAK1 axis. TRADD, which is also known to complex with TRAF2/3 and activate the TANK-TBK1-IRF axis is also absent. Interestingly, Aplysia does retain an ortholog of RIOK3, which has been described as an essential adaptor of TBK1 and as an inhibiter of RLR signaling [62,63,116]. Given the diversity of yet undescribed TIRDC proteins in Aplysia, perhaps a convergent non-canonical IKK signaling pathway has emerged in the heterobranch lineage.
Given the absence of STING in heterobranchs, despite its presence in the bivalve lineage, it is perhaps not surprising that Aplysia also lack a MAVS ortholog [117,118]. Nevertheless, the Aplysia genome does still include 3 putative RLRs that act upstream of MAVS in vertebrates and presumably bivalves. Notably, the Aplysia RLRs lack any CARD or DLD which act to form complexes between RLRs and MAVS in canonical RLR signaling [119]. This condition of Aplysia RLRs is conserved with those in Biomphalaria, suggesting a functional divergence of heterobranch RLRs from those in bivalves. Perhaps heterobranch RLRs follow a similar pattern to the highly divergent immunoglobulin domains of VIgLs, where the functional domain is conserved but so highly divergent that it is difficult for automated systems to detect. Perhaps a similarly highly divergent MAVS-like protein has yet to be identified in heterobranchs, or a novel signaling axis for RLRs has emerged in the absence of MAVS. Future studies coprecipitating Aplysia RLRs with interacting proteins may be able to uncover such a highly divergent MAVS, if it indeed exists.
Another notably absent immune axis is that of OAS1 and its downstream effector RNAse L. OAS acts as a sensor for a diverse repertoire of viruses, including RNA viruses such as SARS-CoV2 [120].
In addition to missing some PRRs and associated signaling cascades, the absence of certain pro-inflammatory pathways may suggest Aplysia exhibits an inflammatory response different from vertebrates. A major component of the vertebrate inflammatory response is the formation of the NLRP3 inflammasome [121,122]. While originally believed to have evolved in vertebrates, recent evidence suggests that the NLRP3 inflammasome may have antecedents in invertebrate innate immunity as orthologs of NLRP3 exhibit immune function in echinoderms and arthropods [123,124]. The only Aplysia NLR identified does contain a Death-like domain, considered a possible predecessor to the CARD domain of NLRP3 in vertebrates, suggesting perhaps some conserved function [125]. However, the dearth of NLRs, lack of PYHIN proteins, and absence of RIPKs suggest that Aplysia does not generate inflammasomes in response to pro-inflammatory signaling [126].
Studies in bivalves suggest an ancestral molluscan innate immune system with a high degree of conservation with that of deuterostomes [3,8,[127][128][129]. However, the absence of several PRRs and their adaptors suggests that the immune response in Aplysia may be substantially different not only from vertebrates but also from bivalves. Antiviral immunity in arthropods acts through different mechanisms than in vertebrates, notably lacking RLR antiviral signaling. Arthropods detect and respond to viruses via the RNA interference, TLR, JAK/STAT, and IMD/PGRP pathways [88,130]. Perhaps heterobranchs gastropods, exhibiting a reduced immune complement compared to their bivalve relatives, have developed alternative, lineage specific immune pathways as occurred in arthropods.

Conclusion
Together these results illustrate the difficulty of interpreting transcriptomic and genomics data in understudied lineages. Due to the overrepresentation of mammals and ecdysozoans in immune research, there is risk of erroneous interpretation of gene function based on similarity searches alone in mollusks. In these results, areas of broad conservation, notably in canonical toll-like receptor signaling were identified. However, Aplysia also exhibits areas of stark deviation from mammalian and even bivalve immunity, notably the absence of STING and MAVS. This illustrates the need for a healthy measure of caution when investigating immunity in Aplysia. By combining homology searches with annotation of conserved protein functional domains, an annotation set has been created that will provide a more nuanced basis for interpretation of immune-associated data in Aplysia. However, due to the in silico nature of this study, it is not possible to fully address potentially novel elements in Aplysia immunity. This dataset will serve as a springboard for future immune studies in Aplysia to further elucidate immune function in this venerable marine model.

Methods
In order to identify a suite of potential immune associated genes in Aplysia californica, conserved protein domain annotation via InterProScan was combined with homology search and tree based ortholog identification via OrthoFinder.
InterProScan (IPS) is a software suite that annotates queried proteins via several algorithms for in silico functional characterization [131,132]. Annotations are given not only for the InterPro database, but also participating databases such as PFAM, SMART, PANTHER. For this analysis, the publicly available AplCal3.0 RefSeq reference genome assembly (GCF_000002075.1), generated from whole genome shotgun sequencing by the Broad Institute, was used [133]. Aplysia RefSeq predicted protein models for this assembly (GCF_000002075.1_Apl-Cal3.0_protein.faa) were annotated for conserved domains, motifs, and protein families via IPS (version 5.52-86.0) on the University of Miami Center for Computation Science PEGASUS supercomputer [132]. IPS config file used can be found in supplemental file SFile4. The resulting database was then searched for proteins annotated with conserved domain suites that characterize described immune-associated genes in other organisms. The list of genes of interest and their requisite domains was constructed by combining previous in silico immune gene studies in Pacific oyster Crassostrea gigas [3] and pond snail Lymanaea stagnalis [10]. This list was further expanded upon based on other reviews of immune-associated genes in vertebrates, mollusks, and arthropods [2, 4, 5, 9, 29-31, 33, 48, 55, 61, 63-65, 69, 84, 98, 110, 129, 134-141]. The full list of genes and requisite domains used can be found in supplemental file SFile5.
Conserved domain family screening of a proteome can recover putative members of known gene families with high confidence. Nevertheless, for proteins that are difficult to distinguish from a domain level alone (ie IkB with only Ankyrin repeat domains), this approach alone is insufficient. Furthermore, in several protein families of interest, molluscan proteins lack several domains or contain extra domains relative to their vertebrate and arthropod orthologs. In order to avoid missing these genes, previously identified molluscan representatives of these gene families were used in homology-based searches and validated with conserved domain data.
For more traditional homology searches for immune genes, reference predicted protein models for Aplysia, Crassostrea gigas (GCA_000297895.1_oyster_v9_protein. faa), and Biomphalaria glabrata (GCF_000457365.1_ ASM45736v1_protein_Bglabrata.faa) from the RefSeq database [142,143], and reference proteomes for Homo sapiens (UP000005640_9606_Homo_Sapies) and Drosophila melanogaster (UP000000803_7227_DroMel) from UNIPROT were analyzed with OrthoFinder. OrthoFinder combines reciprocal, all-against-all BLAST homology search with hierarchical clustering to identify probable orthologs and clusters of orthologs called orthogroups [144]. These collections of orthologs were then queried for genes identified as immune associated in C. gigas [3], B glabrata [5], human, and Drosophila (SFile2).
The two sets of putative immune genes generated by these two analyses were then inspected and compared by hand to verify that hits contained requisite functional domains and were likely relatives of target genes. Data manipulation was done in the R statistical environment using RStudio and the tidyverse software suite [145,146].
TIR domains of the longest protein isoform of each putative Aplysia TLR gene were extracted from RefSeq predicted protein fasta files using predicted TIR domains from IPS using tidyverse and the Biostrings R package [147]. Extracted TIR domain sequences were aligned using the msa R package with default clustal parameters [148]. A neighbor joining tree from Aplysia TLR TIR multi-sequence alignments was constructed with the Phangorn R package using the standard scripts for amino acid trees from the developer [149,150]. Briefly, tree was built with a k of 4, a proportion of invariable sites of 2, and using the Jones-Taylor-Thornton model and bootstrapped with 1000 iterations. The resulting tree was visualized with the treeio and ggtree R packages [151][152][153][154].
TLR proteins for phylogenetic tree of TLRs were extracted from publicly available InterPro annotations of proteins from the UniprotKB data base for Biomphalaria glabrata (Bglab, UP000076420_IPS), Crassosteraea gigas (Cgig, UP000005408), Nematostella vectensis (Nvect, UP000001593), Drosophila melanogaster (Dromel, UP000000803), Strongylocentrotus purpuratus (Spurp, UP000007110), and human (Hs, UP000005640). These files were queried for TLRs based on TIR domain and LRR domains as was done with AplCal3.0 IPS annotation described above. Predicted TLR sequences were download from the UnirpotKB (or RefSeq for Aplysia and Nematostella) and used to generate a tree as described above for TIR domain sequences. The tree was not bootstrapped due to long iteration time. Tree was imported and visualized with treeio and ggtree.