Functional characterization of protein domains common to animal viruses and mouse
© Kinjo et al; licensee BioMed Central Ltd. 2011
Published: 30 November 2011
Many viruses contain genes that originate from their hosts. Some of these acquired genes give viruses the ability to interfere with host immune responses by various mechanisms. Genes of host origin that appear commonly in viruses code for proteins that span a wide range of functions, from kinases and phosphatases, to cytokines and their receptors, to ubiquitin ligases and proteases. While many important cases of such lateral gene transfer in viruses have been documented, there has yet to be a genome-wide survey of viral-encoded genes acquired from animal hosts.
Here we carry out such a survey in order to gain insight into the host immune system. We have made the results available as a web-based tool that supports both virus-centered and host-centered queries (http://imm.ifrec.osaka-u.ac.jp/musvirus/). We examine the relationship between acquired genes and immune function, and compare host-virus homology with gene expression data in stimulated dendritic cells and T cells. We find that genes whose expression changes significantly during the innate antiviral immune response have more homologs in animal viruses than genes whose expression does not change or genes involved in the adaptive immune response.
Statistics gathered from the MusVirus database support earlier reports of gene transfer from host to virus and indicate that viruses are more likely to acquire genes involved in innate antiviral immune responses than those involved in acquired immune responses.
Some viruses that infect vertebrates are able to acquire genes from their hosts [1, 2]. Host organisms, in turn, have developed intricate mechanisms such as the innate and adaptive immune responses to defend themselves against viruses and other pathogens. Acquired genes that increase a virus's chances of survival within the host can thus give us a unique view of the host’s immune system. To understand the defense systems of vertebrates against viruses, comprehensive knowledge of the relationships between viral and host proteins is necessary. Here, we select mouse (Mus musculus) as a model vertebrate organism, since genetic techniques are well established for mice, and many mouse genes are orthologous to human genes. We considered the overlap between entries in the Conserved Domain Database (CDD) and homologous domains in mouse and in a wide range of viral proteins. The result of this comprehensive comparison was compiled as the MusVirus database (DB).
In order to characterize the functions of acquired domains on a biological level, we mapped each mouse entry to DNA microarray probe identifiers, allowing sets of differentially expressed genes to be uploaded to MusVirus. We examined the number of viral homologs of genes differentially expressed in dendritic cells following stimulation of innate immune response pathways. For comparison, we examined genes whose expression levels remained unchanged upon such stimulation, as well as genes differentially expressed upon stimulation of T cell receptors in T cells. Together, these results indicated that genes involved in innate immune response pathways were more likely to be acquired by viruses than genes involved in adaptive immune response pathways or genes whose expression levels did not change significantly upon stimulation.
Results and discussion
Overview of homologs between mouse and viral proteins
Shared conserved domains
After identifying homologs between mouse and viral proteins, we next asked whether the whole-sequence similarities between homologs are biased toward specific domains. To analyze the structure of these similarities further, we performed PSI-BLAST searches of both mouse and viral proteins against the CDD database and identified shared conserved domains. Of 46,288 CDD domains in total, 7,222 were shared between at least one pair of mouse and viral proteins. The most frequently shared CDD domain was integrin-linked kinase (KOG0195), which has been linked to mammary tumor progression and was found in 1,247 mouse proteins and 495 viral proteins. Of the 20 most frequently shared CDD domains, nine were kinases and eight contained ankyrin repeats. A total of 366 CDD domains matched more than 1,000 mouse proteins, and 86% of these were kinases. To systematically identify the characteristics of shared conserved domains, we compared the frequencies of the words in annotations between shared CDDs and non-shared CDDs (a “non-shared CDD” is a CDD that matches either a mouse or a viral protein but not both). The words most characteristic of the shared CDDs were related to serine/threonine kinases (“STKs”) and protein tyrosine kinases (“PTK,” “PTKs,” “PTKc”), followed by the GT1 family of glycosyltransferases (“GT1”) and the CCX motif found in various Rab subfamilies (“CCX”).
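The word-frequency comparison described above can be sketched as follows. This is only an illustration under stated assumptions, not the procedure used to build MusVirus: the tokenization, the add-one smoothing, and the log-ratio score are all assumptions, and the annotation strings are hypothetical.

```python
import math
import re
from collections import Counter


def word_counts(annotations):
    """Count how many annotations mention each lower-cased word."""
    counts = Counter()
    for text in annotations:
        # Count each word once per annotation so long descriptions do not dominate.
        counts.update(set(re.findall(r"[A-Za-z0-9]+", text.lower())))
    return counts


def enriched_words(shared, non_shared, top=5):
    """Rank words by a smoothed log-ratio of shared vs non-shared frequency."""
    shared_counts = word_counts(shared)
    other_counts = word_counts(non_shared)
    scores = {}
    for word, n in shared_counts.items():
        # Add-one smoothing (an assumption) avoids division by zero.
        p_shared = (n + 1) / (len(shared) + 2)
        p_other = (other_counts[word] + 1) / (len(non_shared) + 2)
        scores[word] = math.log(p_shared / p_other)
    # Highest score first; ties are broken alphabetically.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top]


# Hypothetical annotation strings, for illustration only.
shared = ["Catalytic domain of STKs", "PTKs catalytic domain", "STKs kinase domain"]
non_shared = ["Zinc finger domain", "Catalytic domain of a protease"]
ranking = enriched_words(shared, non_shared)
```

Words that occur in both groups at similar rates (such as “domain”) score near zero, while words concentrated in the shared group rise to the top of the ranking.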
Distribution of mouse homologs across mouse-infecting viruses
Number of homologs¹ | Total number of proteins² | Number of mouse homologs³
Murid herpesvirus 4
Moloney murine leukemia virus
Friend murine leukemia virus
Mouse mammary tumor virus
Moloney murine sarcoma virus
Murid herpesvirus 2
Abelson murine leukemia virus
Murine hepatitis virus strain A59
Murine hepatitis virus strain JHM
Murine osteosarcoma virus
Murid herpesvirus 1
Murine type C retrovirus
Rauscher murine leukemia virus
Murid herpesvirus 4 proteins with mouse homologs.
Number of mouse homologs
complement control protein
E3 ubiquitin ligase MIR1
DNA polymerase catalytic subunit
tegument protein G75C
ribonucleotide reductase subunit 2
tegument serine/threonine protein kinase
membrane protein G74
ribonucleotide reductase large
Genes induced by innate immune responses abundant in MusVirus DB
As discussed above, MusVirus contains proteins with various biochemical annotations. We next asked whether MusVirus hits are related by their biological functions as well. Since database annotations of biological function are less complete than those of biochemical function, we grouped proteins by their corresponding gene expression levels in stimulated immune cells. Specifically, we examined expression levels of genes induced in the innate and adaptive immune responses. When infected with viruses, animal hosts mount pleiotropic immune responses to limit viral dissemination. In the course of the response, many genes are induced, including not only mediators of antiviral signaling, such as type I IFN genes and proinflammatory cytokines, but also cell-intrinsic effectors such as the GTPase Mx1, the RNase Isg20, the SAM domain protein viperin, and the zinc finger protein ZAP1. As shown above, and discussed elsewhere, viruses have many proteins homologous to such host factors. It is highly likely that such viral proteins mimic antiviral effectors induced in host cells upon viral infection.
To test this hypothesis, we analyzed microarray data from mouse GM-CSF-induced dendritic cells stimulated with the Toll-like receptor 4 (TLR4) ligand lipopolysaccharide (LPS), the TLR2 ligand Pam3CSK4 (PAM), or Newcastle disease virus (NDV). Probes whose expression levels changed 3-fold or more formed the “3-fold” set, and probes whose levels changed 1.5-fold or less formed the “1.5-fold” set. The 3-fold set represents genes induced or suppressed after stimulation, and thus directly or indirectly involved in innate immunity. In contrast, the 1.5-fold set represents genes whose expression levels do not change appreciably upon stimulation. By using observed gene expression levels in this way, we avoided the errors associated with incomplete functional annotations that would compromise statistics derived from database queries.
Each set corresponded to a list of Affymetrix identifiers, which could be directly uploaded to MusVirus. A utility script was prepared that converted the resulting output to a table that could easily be imported into third-party software.
We first computed the number of MusVirus hits for each gene in each set. As shown in Table 3, the 3-fold sets for LPS and NDV stimulation had significantly higher mean numbers of viral homologs than the corresponding 1.5-fold sets, whereas the 3-fold set for PAM stimulation showed only a marginal difference. Viruses and LPS, but not PAM, induce an IRF-dependent antiviral signaling pathway. This result therefore suggests that genes induced or suppressed by antiviral signaling are preferentially mimicked by viruses.
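The partition into the two probe sets can be sketched as below. This is a minimal illustration that assumes expression values are on a log2 scale and that induction and suppression are treated symmetrically; the probe identifiers and values are hypothetical.

```python
import math


def classify_probes(log2_stimulated, log2_unstimulated):
    """Split probes into the "3-fold" and "1.5-fold" sets by absolute log2 change."""
    three_fold, unchanged = set(), set()
    for probe, stim in log2_stimulated.items():
        change = abs(stim - log2_unstimulated[probe])
        if change >= math.log2(3):
            three_fold.add(probe)   # induced or suppressed at least 3-fold
        elif change <= math.log2(1.5):
            unchanged.add(probe)    # changed by 1.5-fold or less
        # Probes between the two thresholds belong to neither set.
    return three_fold, unchanged


# Hypothetical probe identifiers and log2 expression values.
unstim = {"probe_a": 8.0, "probe_b": 8.0, "probe_c": 8.3, "probe_d": 8.0}
stim = {"probe_a": 10.0, "probe_b": 6.0, "probe_c": 8.5, "probe_d": 9.0}
three_fold, unchanged = classify_probes(stim, unstim)
```

Leaving a gap between the two thresholds keeps ambiguous probes out of both sets, so the comparison is between clearly responsive and clearly unresponsive genes.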
Mean numbers of hits in MusVirus for each gene set.
1.79 × 10⁻³
3.23 × 10⁻⁴
5.15 × 10⁻²
1.50 × 10⁻¹
Using MusVirus to search for proteins with known biochemical function
In this section we demonstrate how to search for proteins using keywords that map to specific biochemical functions. For the following searches we used Mode-2 (low-complexity filtering turned on), as described in the Methods “Sequence Comparison” subsection. We focused on several functional classes of proteins that are known to affect the host immune response. These included kinases, cytokines and their receptors, and proteins involved in ubiquitination.
Viruses often encode kinases to control viral growth by regulating viral gene expression, DNA synthesis, and tissue tropism. Searching the entire “viral” DB for the word “kinase” resulted in 933 hits, 701 of which had mouse-virus homologs. Restricting the search to viral proteins with “kinase” in the annotation field yielded 134 hits with homologs in mouse. By restricting the search to proteins with homologs, we expect to increase the chance of finding examples where the acquired gene is used to exploit host signaling pathways for propagation or to subvert host immune responses.
Viruses target cytokines and chemokines directly, through molecular mimicry, and also indirectly, by targeting upstream or downstream regulators in cytokine/chemokine-mediated pathways. Chemokines and cytokines are attractive targets for acquisition by viruses since they can affect a broad range of host defense responses, including intra- and intercellular signaling, chemotaxis, and cell death. They are also efficiently stored in DNA viral genomes due to their small size. The most studied viruses that encode homologs of host cytokines are herpesviruses. When the search “cytokine” NOT “receptor” was performed against the viral-centered database with “require mouse-virus homology” disabled, 10 hits were found, most of which were from dsDNA viruses (herpesviruses, variola virus, infectious spleen and kidney necrosis virus, etc.). The top hit was a putative CC-type chemokine, U83, in Human herpesvirus 6B, which is also found in the 6A strain and functions as a chemoattractant for monocytes.
Some viruses manipulate their hosts' ubiquitin system for their propagation. Some ubiquitin-like proteins in viruses such as USP7 protect other viral proteins from ubiquitination in the host cell and suppress NF-κB signaling. Searching CDD annotations for viral proteins with the keyword “ubiquitin” identified 246 mouse-virus homolog hits, mostly to E3 ubiquitin ligases. Ubiquitin E3 ligase-like proteins are known to be involved in processes such as activation of viral and cellular genes, degradation of receptor molecules (e.g., MHC class I), and suppression of innate immunity. Among such ubiquitin E3 ligase-like proteins, 19 infected cell protein 0 (ICP0) entries from various herpesviruses were found in MusVirus, and they were indeed homologous to many mouse proteins containing the RING finger motif.
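A query of the form “cytokine” NOT “receptor”, with an optional homology requirement, can be sketched as follows. The record layout, field names, and substring matching are assumptions made for illustration and do not describe the actual MusVirus schema or search engine.

```python
from dataclasses import dataclass


@dataclass
class ViralProtein:
    name: str
    annotation: str
    mouse_homologs: int  # number of mouse proteins with detected homology


def search(proteins, include, exclude=(), require_homology=False):
    """Return proteins whose annotation has every `include` word and no `exclude` word."""
    hits = []
    for protein in proteins:
        text = protein.annotation.lower()
        # Case-insensitive substring matching (an assumption).
        if not all(word.lower() in text for word in include):
            continue
        if any(word.lower() in text for word in exclude):
            continue
        if require_homology and protein.mouse_homologs == 0:
            continue
        hits.append(protein)
    return hits


# Hypothetical records, for illustration only.
records = [
    ViralProtein("vp1", "putative CC-type chemokine, cytokine family", 2),
    ViralProtein("vp2", "cytokine receptor homolog", 5),
    ViralProtein("vp3", "secreted cytokine-like protein", 0),
]
no_receptor = search(records, include=["cytokine"], exclude=["receptor"])
with_homologs = search(records, include=["cytokine"], exclude=["receptor"],
                       require_homology=True)
```

Toggling `require_homology` mirrors the choice between listing every annotated viral protein and listing only those with a detected mouse counterpart.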
Recently, a number of examples of viruses acquiring genes involved in host defense have been described [5–7, 10, 17, 18]. Together, these case studies provide crucial information about regulation of the animal immune system. The purpose of MusVirus is to provide a platform where researchers can systematically search for functionally relevant homologous relationships between animal virus and host genes. MusVirus was designed to operate on the level of protein domains in order to make biochemical functional interpretation of results as unambiguous as possible.
As the examples in this study show, MusVirus hits cover various functional classes of proteins exploited by viruses. To make MusVirus useful as a tool for functional prediction and discovery, we included a mapping between DNA microarray identifiers and mouse proteins. Our investigation of gene expression data in dendritic cells stimulated with several ligands (LPS, Pam3CSK4, and NDV) indicates that genes differentially expressed during the innate immune response have significantly more viral homologs than genes whose expression levels do not change. Interestingly, such genes belonged particularly to the antiviral innate immune response (NDV and LPS 3-fold sets) as opposed to the antibacterial innate immune response (PAM 3-fold set). Analysis of gene expression in T cells indicates that viruses are more likely to acquire genes involved in innate immune responses than those involved in acquired immune responses.
Although we have focused on mouse as a host species of animal viruses in this study, the present approach is readily applicable to other host species such as humans. Compiling databases analogous to MusVirus for various host species may help to elucidate the universality as well as the intricate diversity of viral infection, host defense mechanisms, and their interplay. Indeed, the above hits to mouse kinases (Rrm1, Cdk6, Crkrs, EF-1delta, CK-IIbeta, and the large subunit of RNA polymerase II) and cytokine receptors (interferon gamma receptor 1) all had human orthologs.
Collectively, MusVirus captures the preference of viruses to mimic proteins involved in acute antiviral immune responses, presumably because viruses take advantage of these proteins to suppress the host immune system for survival and dissemination. We believe that this database will benefit research in both immunology and virology by providing a vehicle for functional insights into various genes.
A total of 40,732 mouse proteins were obtained from the Ensembl database (NCBIM37.56 release). The viral proteins were obtained from RefSeq. A list of the correspondence between viruses and their hosts was provided by the Database of the International Committee on Taxonomy of Viruses (ICTVdB). Proteins from viruses that infect Algae, Archaea, Bacteria, Fungi, or Plants, as well as all phages, were discarded. All other viral proteins, including those with un-annotated hosts, were retained. In total, there were 32,928 proteins from 1,190 viral species. We note that proteins from animal viruses whose host is not mouse were retained since their homologs may also provide valuable information about animal immune systems. There were 24 viral species whose host was identified as “mouse” (more precisely, their scientific names contained the words “mouse,” “murid” or “murine”). In this work, we refer to these 24 viruses as “mouse-infecting viruses.”
We employed two different modes for finding homologs between mouse and viruses. In Mode-1, PSI-BLAST was used for sequence similarity searches. Position-specific scoring matrices (PSSMs) were created for mouse and viral proteins by iterating PSI-BLAST (with the low-complexity filter) three times against the UniRef90 database with an e-value cutoff of 10⁻³ and with no limit on the number of displayed alignments (other parameters were set to default values). After a PSSM was made for each mouse (or viral) protein sequence, PSI-BLAST searches were conducted against viral (or mouse) proteins, UniProt/SwissProt, and CDD databases (only amino acid sequences, not PSSMs, were used for CDD). In the production run of PSI-BLAST, we turned off the low-complexity filter. Although this may introduce some false positives, we noticed that some meaningful hits were only found without low-complexity filtering. In Mode-2, we used BLAST with the low-complexity filter in all cases, except for the search against CDD, where RPS-BLAST was used with the CDD PSSMs. In all BLAST/RPS-BLAST searches, the parameters were equivalent to those in Mode-1.
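The host-based filtering and the name-based definition of mouse-infecting viruses can be sketched as below. The host category strings, the use of the virus name to detect phages, and the example records are assumptions for illustration; they do not reproduce the ICTVdB format.

```python
import re

# Host groups whose viruses are discarded (category strings are hypothetical).
EXCLUDED_HOSTS = {"algae", "archaea", "bacteria", "fungi", "plants"}
MOUSE_WORDS = re.compile(r"\b(mouse|murid|murine)\b", re.IGNORECASE)


def keep_virus(name, host):
    """Keep animal viruses and viruses with unannotated hosts; drop phages and excluded hosts."""
    if "phage" in name.lower():
        return False  # detecting phages by name is an assumption
    return host is None or host.lower() not in EXCLUDED_HOSTS


def is_mouse_infecting(name):
    """Flag a virus whose name contains the word mouse, murid, or murine."""
    return MOUSE_WORDS.search(name) is not None


# Hypothetical (virus name, host) records.
viruses = [
    ("Murid herpesvirus 4", "Vertebrates"),
    ("Enterobacteria phage T4", "Bacteria"),
    ("Tobacco mosaic virus", "Plants"),
    ("Human herpesvirus 6B", "Vertebrates"),
    ("Moloney murine leukemia virus", None),
]
retained = [name for name, host in viruses if keep_virus(name, host)]
mouse_infecting = [name for name in retained if is_mouse_infecting(name)]
```

Because viruses with unannotated hosts are retained, the filter errs on the side of inclusion, which matches the stated aim of keeping any protein that might be informative about animal immune systems.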
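As a rough illustration of the PSSM-building step in Mode-1, the sketch below assembles an argument list for the BLAST+ `psiblast` program. The use of BLAST+ (rather than whichever BLAST release was actually used), the mapping of the stated cutoff onto `-evalue`, and all file and database names are assumptions. The command is only constructed here, not executed.

```python
def pssm_build_command(query_fasta, pssm_path, uniref_db="uniref90"):
    """Arguments for building a PSSM with three filtered PSI-BLAST iterations."""
    return [
        "psiblast",
        "-query", query_fasta,
        "-db", uniref_db,            # hypothetical local database name
        "-num_iterations", "3",      # three PSI-BLAST iterations
        "-evalue", "1e-3",           # assumed to correspond to the stated cutoff
        "-seg", "yes",               # low-complexity filter on while building the PSSM
        "-out_pssm", pssm_path,      # save the resulting PSSM for later searches
    ]


# Hypothetical file names, for illustration only.
command = pssm_build_command("mouse_protein.fasta", "mouse_protein.pssm")
```

The resulting list could be passed to `subprocess.run` on a machine where BLAST+ and a formatted UniRef90 database are installed.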
Compilation of sequence comparison data
Sources of protein sequences and annotations.
A web interface for the MusVirus database was constructed (http://imm.ifrec.osaka-u.ac.jp/musvirus/). The user can construct queries by searching over various fields in MusVirus. CDD keywords are useful for finding proteins that share particular conserved domains, as opposed to sharing only local homology. For mouse proteins, Affymetrix IDs can also be used as input. This feature is useful for identifying the intersection between inducible genes and the corresponding proteins shared between mouse and virus. It is also possible to search by browsing a viral taxonomy tree, thus restricting the search to a particular subset of viruses.
Microarray data processing and statistical analysis
Microarray data of gene expression in GM-CSF-induced bone marrow dendritic cells (GM-DC) stimulated with Newcastle disease virus were described previously. Microarray data of gene expression in GM-DC stimulated with Pam3CSK4 or LPS, and in T cells stimulated with anti-CD3 antibody, were described previously and were obtained from NCBI GEO (accession numbers GSE17721 and GSE12464, respectively). Robust multiarray expression values were calculated from the raw intensity data using R and Bioconductor (http://www.bioconductor.org). Probes showing a difference of 3-fold or more relative to unstimulated conditions were denoted the “3-fold set”. Probes showing a difference of 1.5-fold or less were denoted the “1.5-fold set”. For each probe, the number of hits in MusVirus was counted using Mode-2, as described above, and this number was then divided by the number of proteins corresponding to the probe. The resulting value was used as the number of MusVirus hits for the probe. Welch’s t test was then applied to these sets, and p values were calculated using R.
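The per-probe normalization and the test statistic can be sketched as follows. This is a minimal illustration using only the Python standard library, whereas the analysis itself was done in R; it computes Welch's t statistic and the Welch-Satterthwaite degrees of freedom but not the p value, and the input numbers are hypothetical.

```python
import math
from statistics import mean, variance


def hits_per_probe(hit_count, protein_count):
    """Normalize a probe's MusVirus hit count by the number of proteins mapped to it."""
    return hit_count / protein_count


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    se2_a = variance(sample_a) / len(sample_a)  # squared standard error of mean A
    se2_b = variance(sample_b) / len(sample_b)  # squared standard error of mean B
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(sample_a) - 1) + se2_b ** 2 / (len(sample_b) - 1)
    )
    return t, df


# Hypothetical normalized hit counts for a 3-fold set and a 1.5-fold set.
three_fold_hits = [4.0, 6.0, 8.0, 10.0]
unchanged_hits = [1.0, 2.0, 3.0]
t_stat, dof = welch_t(three_fold_hits, unchanged_hits)
```

Welch's test does not assume equal variances in the two sets, which suits a comparison between sets of very different sizes and spreads.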
The authors thank R.J. Milewski and R. Amphlett for helping prototype the MusVirus web interface in the early stages of the project.
This article has been published as part of BMC Genomics Volume 12 Supplement 3, 2011: Tenth International Conference on Bioinformatics – First ISCB Asia Joint Conference 2011 (InCoB/ISCB-Asia 2011): Computational Biology. The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2164/12?issue=S3.
1. Brander C, Walker BD: Modulation of host immune responses by clinically relevant human DNA and RNA viruses. Curr Opin Microbiol. 2000, 3 (4): 379-386. 10.1016/S1369-5274(00)00108-9.
2. Finlay BB, McFadden G: Anti-immunology: evasion of the host immune system by bacterial and viral pathogens. Cell. 2006, 124 (4): 767-782. 10.1016/j.cell.2006.01.034.
3. Kawai S, Yoshida M, Segawa K, Sugiyama H, Ishizaki R, Toyoshima K: Characterization of Y73, an avian sarcoma virus: a unique transforming gene and its product, a phosphopolyprotein with protein kinase activity. Proc Natl Acad Sci U S A. 1980, 77 (10): 6199-6203. 10.1073/pnas.77.10.6199.
4. Pontier SM, Huck L, White DE, Rayment J, Sanguin-Gendreau V, Hennessy B, Zuo D, St-Arnaud R, Mills GB, Dedhar S, et al: Integrin-linked kinase has a critical role in ErbB2 mammary tumor progression: implications for human breast cancer. Oncogene. 2010, 29 (23): 3374-3385. 10.1038/onc.2010.86.
5. Alcami A: Viral mimicry of cytokines, chemokines and their receptors. Nat Rev Immunol. 2003, 3 (1): 36-50. 10.1038/nri980.
6. Sodhi A, Montaner S, Gutkind JS: Viral hijacking of G-protein-coupled-receptor signalling networks. Nat Rev Mol Cell Biol. 2004, 5 (12): 998-1012. 10.1038/nrm1529.
7. Gershburg E, Pagano JS: Conserved herpesvirus protein kinases. Biochim Biophys Acta. 2008, 1784 (1): 203-212.
8. Sadler AJ, Williams BR: Interferon-inducible antiviral effectors. Nat Rev Immunol. 2008, 8 (7): 559-568. 10.1038/nri2314.
9. Akira S, Uematsu S, Takeuchi O: Pathogen recognition and innate immunity. Cell. 2006, 124 (4): 783-801. 10.1016/j.cell.2006.02.015.
10. Kawaguchi Y, Kato K: Protein kinases conserved in herpesviruses potentially share a function mimicking the cellular protein kinase cdc2. Rev Med Virol. 2003, 13 (5): 331-340. 10.1002/rmv.402.
11. Malumbres M, Barbacid M: Mammalian cyclin-dependent kinases. Trends Biochem Sci. 2005, 30 (11): 630-641. 10.1016/j.tibs.2005.09.005.
12. Rahman MM, McFadden G: Modulation of tumor necrosis factor by microbial pathogens. PLoS Pathog. 2006, 2 (2): e4. 10.1371/journal.ppat.0020004.
13. Seregin SV, Babkina IN, Nesterov AE, Sinyakov AN, Shchelkunov SN: Comparative studies of gamma-interferon receptor-like proteins of variola major and variola minor viruses. FEBS Lett. 1996, 382 (1-2): 79-83. 10.1016/0014-5793(96)00069-5.
14. Viejo-Borbolla A, Martin AP, Muniz LR, Shang L, Marchesi F, Thirunarayanan N, Harpaz N, Garcia RA, Apostolaki M, Furtado GC, et al: Attenuation of TNF-driven murine ileitis by intestinal expression of the viral immunomodulator CrmD. Mucosal Immunol. 2010, 3 (6): 633-644. 10.1038/mi.2010.40.
15. Xu X, Nash P, McFadden G: Myxoma virus expresses a TNF receptor homolog with two distinct functions. Virus Genes. 2000, 21 (1-2): 97-109.
16. Zou P, Isegawa Y, Nakano K, Haque M, Horiguchi Y, Yamanishi K: Human herpesvirus 6 open reading frame U83 encodes a functional chemokine. J Virol. 1999, 73 (7): 5926-5933.
17. Randow F, Lehner PJ: Viral avoidance and exploitation of the ubiquitin system. Nat Cell Biol. 2009, 11 (5): 527-534. 10.1038/ncb0509-527.
18. Panus JF, Smith CA, Ray CA, Smith TD, Patel DD, Pickup DJ: Cowpox virus encodes a fifth member of the tumor necrosis factor receptor family: a soluble, secreted CD30 homologue. Proc Natl Acad Sci U S A. 2002, 99 (12): 8348-8353. 10.1073/pnas.122238599.
19. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25 (17): 3389-3402. 10.1093/nar/25.17.3389.
20. Suzek BE, Huang H, McGarvey P, Mazumder R, Wu CH: UniRef: comprehensive and non-redundant UniProt reference clusters. Bioinformatics. 2007, 23 (10): 1282-1288. 10.1093/bioinformatics/btm098.
21. Smedley D, Haider S, Ballester B, Holland R, London D, Thorisson G, Kasprzyk A: BioMart--biological queries made easy. BMC Genomics. 2009, 10: 22. 10.1186/1471-2164-10-22.
22. Pierleoni A, Martelli PL, Fariselli P, Casadio R: eSLDB: eukaryotic subcellular localization database. Nucleic Acids Res. 2007, 35 (Database issue): D208-212.
23. ICTVdB, the Database of the International Committee on Taxonomy of Viruses: http://www.ncbi.nlm.nih.gov/ICTVdb/. 2010.
24. Matsui K, Kumagai Y, Kato H, Sato S, Kawagoe T, Uematsu S, Takeuchi O, Akira S: Cutting edge: Role of TANK-binding kinase 1 and inducible IkappaB kinase in IFN responses against viruses in innate immune cells. J Immunol. 2006, 177 (9): 5785-5789.
25. Amit I, Garber M, Chevrier N, Leite AP, Donner Y, Eisenhaure T, Guttman M, Grenier JK, Li W, Zuk O, et al: Unbiased reconstruction of a mammalian transcriptional network mediating pathogen responses. Science. 2009, 326 (5950): 257-263. 10.1126/science.1179050.
26. Blomberg KE, Boucheron N, Lindvall JM, Yu L, Raberger J, Berglof A, Ellmeier W, Smith CE: Transcriptional signatures of Itk-deficient CD3+, CD4+ and CD8+ T-cells. BMC Genomics. 2009, 10: 233. 10.1186/1471-2164-10-233.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.