Ixodes scapularis tick serine proteinase inhibitor (serpin) gene family; annotation and transcriptional analysis
© Mulenga et al. 2009
Received: 31 October 2008
Accepted: 12 May 2009
Published: 12 May 2009
Serine proteinase inhibitors (serpins) are a large superfamily of structurally related but functionally diverse proteins that control essential proteolytic pathways in most branches of life. Given their importance in the biology of many organisms, it is an appealing concept that ticks might use serpins to evade host defenses, and that immunizing against these proteins or disrupting their functions could serve as a tick control strategy.
A sequence homology search strategy has allowed us to identify at least 45 tick serpin genes in the Ixodes scapularis genome that are structurally segregated into 32 intronless and 13 intron-containing genes. Nine of the intron-containing serpins occur in a cluster of 11 genes that span 170 kb of DNA sequence. Based on consensus amino acid residues in the reactive center loop (RCL) and signal peptide scanning, 93% are putatively inhibitory while 82% are putatively extracellular. Among the 11 different amino acid residues that are predicted at the P1 sites, 16 sequences possess basic amino acid (R/K) residues. Temporal and spatial expression analyses revealed that 40 of the 45 serpins are differentially expressed in salivary glands (SG) and/or midguts (MG) of unfed and partially fed ticks. Ten of the 38 serpin genes were expressed from six to 24 hrs of feeding, while six and five genes are predominantly or exclusively expressed in the MG and SG, respectively.
Given the diversity among tick species, sizes of tick serpin families are likely to be variable. However, this study provides insight into the potential sizes of serpin protein families in ticks. Ticks must overcome inflammation, complement activation and blood coagulation to complete feeding. Since these pathways are regulated by serpins that have basic residues at their P1 sites, we speculate that I. scapularis may utilize some of the serpins reported in this study to manipulate host defense. We have discussed our data in the context of advances in the molecular physiology of I. scapularis. Although the paper is descriptive, this study provides the first step toward a comprehensive understanding of serpins in tick physiology.
Ticks, which are divided into two families, Ixodidae (hard ticks) and Argasidae (soft ticks), are important vectors of several pathogens with a global veterinary and public health impact [1, 2]. Although research on ticks has for the most part been directed towards agricultural interests, ticks have been recognized as the second most important vectors of human disease agents after mosquitoes. Globally, the impact of tick borne disease agents on public health has been on a dramatic rise [3–6] since the discovery of Borrelia burgdorferi as the causative agent for Lyme disease in 1982 [7, 8]. A literature review by Parola and Raoult listed 15 new tick borne bacterial agents that were discovered or recognized as human pathogens between 1982 and 2004. In the United States, ticks transmit more causative agents of vector borne diseases than any other vector arthropod. Tick borne human diseases reported in the USA include babesiosis, ehrlichiosis, Southern Tick-Associated Rash Illness (STARI), Lyme disease, tick-borne relapsing fever, anaplasmosis and Rocky Mountain spotted fever. From a human health perspective, Ixodes scapularis and its close relatives, I. pacificus, I. ricinus, I. persulcatus and I. holocyclus, are the most important ticks as they transmit the majority of emerging human disease pathogens. The importance of these ticks to human health was the key justification for funding of the I. scapularis genome sequencing project. A key anticipated outcome of the tick genome sequencing project is the opportunity to identify unique tick genes that could be exploited for tick control and thus for the control of tick borne diseases. We are interested in understanding the role of the serine proteinase inhibitors (serpins) in tick physiology and feeding.
Serpins represent the largest family of proteinase inhibitors and are widely conserved across taxa, from animals to plants, viruses and bacteria [11–18]. Of the 68 families of proteinase inhibitors listed on the MEROPS database (version 7.6, http://merops.sanger.ac.uk/), the serpin family (denoted as I4) has the largest number of entries. In humans, serpins make up 2% of total blood plasma proteins and are involved in the regulation of important pathways such as blood coagulation, complement activation, inflammation, fertilization and food digestion [11–14]. The importance of serpins in humans is attested to by more than 90 human diseases, such as cirrhosis, emphysema, blood coagulation disorders and dementia, that arise from serpin malfunctions due to mutation. In arthropods, serpins have been linked to immunity in mosquitoes [21, 22], the fruit fly [23–26] and the tobacco hornworm, to development in the fruit fly, and to control of the hemolymph coagulation cascade in the horseshoe crab [29–31]. Given the importance of serpins in the biology of multicellular organisms, it has been hypothesized that ticks might utilize serpins to evade host defenses, and that immunizing against or disrupting the functions of these proteins is an appealing option for designing new tick control strategies [17, 32].
More than 30 serpin encoding cDNAs have now been cloned in several economically and medically important ticks including Amblyomma americanum, R. appendiculatus, I. ricinus and I. scapularis. Serpin encoding ESTs and cDNA sequences of I. scapularis and R. microplus are also available in GenBank. Studies by Sugino et al., Imamura et al. and Prevot et al. have provided encouraging evidence suggesting the potential of serpins as targets for tick control. These authors [36–38] showed that feeding of ticks on animals immunized with recombinant tick serpins caused ticks to obtain smaller blood meals and, as a consequence, reduced tick fecundity and increased tick mortality. These findings clearly suggest that serpins play important roles in tick physiology.
Given the availability of I. scapularis genome sequence information in GenBank and at VectorBase, we sought to gain some insight into the size of the serpin protein family in ticks. We report here on the identification and characterization of at least 45 I. scapularis serpin genes. Sequence based analyses show that ~93% of these serpins are putatively inhibitory, with 88% (31/35) of full-length serpins being putatively extracellular. Our RT-PCR expression analyses demonstrated that 84% (38/45) are differentially expressed in midguts and salivary glands of unfed and partially fed ticks. We discuss our findings in the context of advances in I. scapularis tick molecular biology.
For comparisons to known serpins, assembled coding sequences were scanned against known protein entries in GenBank using the BLASTx or BLASTp homology search program. To validate their accuracy, inferred amino acid sequences were inspected for two amino acid motifs, "NAVYKFG" and "DVNEEG," that are conserved in most known serpins [30, 44]. Sequences were classed as full-length or partial on the basis that a typical serpin molecule is 350–450 amino acid residues long [11, 12]. To gain insight on probable functionality, inferred amino acid sequences were scanned against amino acid motif entries in ScanProsite and for signal peptides on the SignalP server (ExPASy Proteomics Server http://ca.expasy.org/). The reactive center loop (RCL), which determines the functionality (inhibitory or non-inhibitory) of a serpin molecule, was identified based on the consensus 20/21 residue peptide "p17 [E]-p16 [E/K/R]-p15 [G]-p14 [T/S]-p13 [X]-p12-9 [AGS]-p8-1 [X]-p1'-4'" in the carboxy-terminal region [11, 45]. The numbering here follows the convention in which residues on the amino terminal side of the scissile bond (p1-p1') are labeled p17 to p1 and those on the carboxy terminal side are labeled p1' to p4'. The putative scissile bond (p1-p1') and the p1 residue were predicted based on the conserved feature that there are generally 17 amino acid residues (p17 to p1) between the start of the hinge region of the RCL and the scissile bond.
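The screening criteria described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `screen_serpin`, the exact-string motif matching, the choice of the candidate closest to the carboxy terminus, and the toy sequence are all assumptions made for illustration. Only the two motifs, the 350–450 residue range, the RCL consensus and the definition of basic P1 residues (R/K) come from the text.

```python
import re

# Motifs and length range quoted in the text. Exact string matching is a
# simplifying assumption of this sketch.
SERPIN_MOTIFS = ("NAVYKFG", "DVNEEG")
FULL_LENGTH_RANGE = (350, 450)

# RCL consensus from the text: p17 E, p16 E/K/R, p15 G, p14 T/S, p13 any,
# p12-p9 A/G/S, p8-p1 any, p1'-p4' any (21 residues in total).
# The zero-width lookahead reports overlapping candidates as well.
RCL_PATTERN = re.compile(r"(?=(E[EKR]G[TS].[AGS]{4}.{8}.{4}))")

BASIC_RESIDUES = {"R", "K"}


def screen_serpin(seq):
    """Return a small report for one inferred amino acid sequence."""
    seq = seq.upper()
    low, high = FULL_LENGTH_RANGE
    report = {
        "length": len(seq),
        "full_length": low <= len(seq) <= high,
        "motifs_found": [m for m in SERPIN_MOTIFS if m in seq],
        "rcl": None,
        "p1": None,
        "p1_position": None,  # 1-based position in the sequence
        "p1_basic": None,
    }
    candidates = list(RCL_PATTERN.finditer(seq))
    if candidates:
        # Assumption: keep the candidate closest to the carboxy terminus.
        last = candidates[-1]
        rcl = last.group(1)
        p1 = rcl[16]  # p17 is index 0, so p1 is index 16
        report.update(
            rcl=rcl,
            p1=p1,
            p1_position=last.start() + 17,
            p1_basic=p1 in BASIC_RESIDUES,
        )
    return report


# Toy sequence invented for this example; it is not a real serpin.
toy = ("M" * 300 + "NAVYKFG" + "A" * 20 + "DVNEEG" + "L" * 10
       + "EEGTEAAAATAVIMSARSLPP" + "K" * 10)
report = screen_serpin(toy)
print(report["rcl"], report["p1"], report["p1_basic"])
```

Because some characterized serpins use shorter RCLs, a fixed 17-residue offset of this kind should be read as a prediction rather than a definitive assignment of the scissile bond.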
In order to determine relationships among tick serpins, 68 tick serpin sequences (34 full-length serpins from this study and 34 other tick serpins downloaded from GenBank), together with human α1-antitrypsin (AAB59495) as the outgroup, were used to construct a guide tree using the neighbor joining method. Specifications were set for bootstrap values at 1000 replications, gaps proportionately distributed and correction for distance set to a Poisson distribution. To determine the homologous features that influenced the clustering patterns, sequences of each clade were subjected to multiple sequence alignment analyses.
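As a hedged illustration of the distance correction mentioned above, the Poisson-corrected distance between two aligned protein sequences is commonly computed as d = -ln(1 - p), where p is the proportion of aligned sites that differ. The sketch below uses hypothetical function names and skips any column that contains a gap, which is an assumption of this example and not necessarily the gap treatment used in the study.

```python
import math


def p_distance(aligned_a, aligned_b):
    """Proportion of differing sites between two aligned sequences.

    Columns with a gap ('-') in either sequence are ignored (assumption).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b)
             if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable (ungapped) columns")
    differences = sum(1 for x, y in pairs if x != y)
    return differences / len(pairs)


def poisson_distance(aligned_a, aligned_b):
    """Poisson-corrected distance d = -ln(1 - p)."""
    p = p_distance(aligned_a, aligned_b)
    if p >= 1.0:
        return math.inf  # the correction is undefined when every site differs
    return -math.log(1.0 - p)


# One difference in four compared sites gives p = 0.25.
print(poisson_distance("ACDE", "ACDF"))
```

A matrix of such pairwise distances is the input that a neighbor joining implementation would then cluster into a tree.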
To determine secondary structures, randomly selected representative sequences of each cluster in the guide phylogeny tree (serpins S1, 19, 22, 34 and 35) were subjected to structure based sequence alignment using the web based STRAP (Structure based Alignment Program) http://www.charite.de/bioinf/strap/. Subsequently, secondary structures were superimposed on the structurally aligned sequences using the human α1-antitrypsin tertiary structure (1HP7) as template. The aligned sequences were then viewed using the GeneDoc sequence analysis software for Windows http://www.nrbsc.org/gfx/genedoc/index.html.
Gene specific PCR primers to amplify 16 serpins that have basic residues at their P1 site
Gene specific PCR primers used to amplify transcripts of I. scapularis serpins that have non-basic residues at their P1 sites
Table 3. Summary and characterization of serpin inferred amino acid sequences
Columns: Serpin # ID; Trace archive EST/mate accession #s; Whole genome shotgun accession #s; Best match (% identity)
IriS-4, DQ915844 (56)
ABJB010418850 and ABJB011135496
IriS-4, DQ915844 (51)
IriS-4, DQ915844 (56)
IriS-4, DQ915844 (57)
ABJB011103238 and ABJB010459188
IscAAV80788 (97) REF1
IriS-1 DQ915842 (95)
HloS-2, AB162827 (50)
IriS-1 DQ915842 (30)
IriS-8 DQ915845 (98)
ABJB010938637 and ABJB010859665
HloS-2, AB162827 (50)
ABJB011069597 and ABJB010801481
ABJB010325534 and ABJB010620410
HloS-2, AB162827 (51)
ABJB010111671 and ABJB010862034
HloS-2, AB162827 (50)
HloS-2, AB162827 (44)
HloS-2, AB162827 (48)
IriS-2, DQ915843 (46)
IriS-2, DQ915843 (60)
IriS-2, DQ915843 (93)
IriS-1 DQ915842 (61)
As summarized in Table 3, all I. scapularis serpin sequences showed best matches exclusively to other tick serpin sequences. Except for S5, 9, 17 and 40 that respectively showed 96, 97, 99, 96 and 95% amino acid identity to I. ricinus serpin (Irs) 4 (accession [acc] # DQ915844), Irs1 (acc# DQ915842), Irs8 (acc# DQ915845), the I. ricinus immunosuppressant serpin (AJ269658) and Irs2 (acc# DQ915843), the identity levels between serpins in this study and other tick serpins ranged between 34 and 57% (Table 3). Two of the serpin genes in this study, S5 and 7, are 99 and 98% identical to previously annotated I. scapularis serpins, AAM93649 and AAV80788 respectively (Table 3). At the time of preparing this manuscript, a preliminary annotation of the I. scapularis genome (version 0.5) was released http://www.vectorbase.org/index.php. When scanned against this database, 88% (40/45) of the serpin sequences produced matches with e-values of zero. Because neither supercontig accession numbers nor the preliminary gene annotations are referenced in GenBank, whole genome shotgun sequence and/or trace archive accession numbers are given in Table 3 to identify the source of the primary sequence data used in this study.
Motif scan analyses on the ScanProsite server revealed that the serpin signature motif pattern PS00284 ([LIVMFY]-G-[LIVMFYAC]-[DNQ]-[RKHQS]-[PST]-F-[LIVMFY]-[LIVMFYC]-X-[LIVMFAH]) was present in all sequences (results not shown). On the SignalP3.0 server http://www.cbs.dtu.dk/services/SignalP/, 89% (31/35) of the full-length serpin sequences were predicted to have leader sequences (Table 3). Other important motifs include the putative N-glycosylation (NX[S/T]) sites in 38 of the 45 serpin sequences. Except for S11, which was predicted to have eight putative N-glycosylation sites, the rest have between one and three sites. We also found the cell attachment motif "RGD" in four sequences, S1, 7, 17 and 42.
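The two short motifs reported above are simple enough to locate with regular expressions. The following is a minimal sketch, assuming the N-glycosylation sequon exactly as written in the text (NX[S/T], with X being any residue) and using hypothetical function names. It is not a substitute for ScanProsite, and the example sequence is invented.

```python
import re

# Zero-width lookaheads so that overlapping sites (e.g. "NNTT") are all found.
N_GLYC_PATTERN = re.compile(r"(?=N.[ST])")
RGD_PATTERN = re.compile(r"(?=RGD)")


def n_glycosylation_sites(seq):
    """1-based positions of the N in each putative NX[S/T] sequon."""
    return [m.start() + 1 for m in N_GLYC_PATTERN.finditer(seq.upper())]


def rgd_sites(seq):
    """1-based positions of the R in each RGD cell attachment motif."""
    return [m.start() + 1 for m in RGD_PATTERN.finditer(seq.upper())]


# Invented peptide used only to exercise the two functions.
example = "ANGSANNTTRGD"
print(n_glycosylation_sites(example))
print(rgd_sites(example))
```

Sites found this way are only putative; whether a sequon is actually glycosylated cannot be decided from sequence alone.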
Analysis of genome sequence data has led to the discovery of large families of serpins in multicellular organisms, including 36 in humans, nine in Caenorhabditis elegans, 29 in Drosophila and plants. On the MEROPS database, 17 and 18 serpin sequences are listed for Aedes aegypti and Anopheles gambiae. In ticks, documentation of multiple serpin encoding cDNAs has provided indirect evidence suggesting that ticks do encode large serpin families. For instance, we recently described 17 serpin cDNAs that are expressed by A. americanum during feeding. Here we describe the annotation and characterization of 45 serpin genes in the I. scapularis genome. The observed high amino acid sequence identity between I. scapularis and I. ricinus serpins was not surprising as these two ticks belong to the same genus. It is important to point out that eight of the 45 annotated serpins were not represented in the preliminary peptide build at VectorBase. This finding may raise the prospect of error in the annotations reported here. However, this possibility is ruled out, as ESTs of six of the eight serpins (S1, 13, 23, 24, 33 and 39) were present in the trace archive database, while coding regions of serpins S26 and 32 were amplified from unfed and partially fed ticks (see figure 7).
The adoption of the consensus serpin secondary structures [11, 12] and the high conservation of the core amino acid residues that underpin the structure and functionality of serpins strongly suggest that I. scapularis serpins are functional. The majority of known serpins function as inhibitors of serine proteinases, hence the name. However, others with activity against cysteine proteinases and those with no inhibitory function have also been described. Although we are able to distinguish between inhibitory and non-inhibitory serpins on the basis of sequence analysis, the data available in this study are insufficient to specify their preferred proteinase substrates. Putative RCLs and scissile bonds of serpins in this study were predicted based on the consensus that there are 17 amino acid residues between the start of the RCL hinge region and the scissile bond [11, 45, 51]. As some of the characterized serpins, such as α2-antiplasmin or serpin1k from Manduca sexta, utilize shorter RCLs, we are interpreting our predictions of RCLs and scissile bonds with caution.
Our finding that 82% of the full-length serpins in this study have signal peptides is consistent with observations in humans, where the majority of known serpins exist in the extracellular form. Findings in this study are not unique, in that similar results were reported in A. americanum, where 13 of the 17 putatively inhibitory serpins were predicted to be extracellular. From the perspective of finding target antigens for vaccine development, it is encouraging to note that the majority of I. scapularis serpins are putatively extracellular, as they will be accessible to host immune response factors. Predictions based on sequence analysis may not be consistent with the situation in vivo. However, it is interesting to note that the four serpin sequences (S30, 32, 35 and 38) that were predicted to be intracellular based on the lack of a signal peptide also possess "C" residues in the exposed regions of their RCL, a feature that has been observed in many intracellular serpins.
The use of alternatively spliced RCLs appears to be a widespread strategy in insects to diversify the range of target proteinases that can be controlled by a single serpin gene [56–58]. A classic example of this phenomenon is serpin gene-1 of the tobacco hornworm, M. sexta, which has 12 different alternatively spliced RCLs. Effectively this gives rise to 12 serpins regulating 12 different proteolytic pathways. An interesting structural feature among the 12 variants of M. sexta serpin gene-1 is that the first 336 amino acids are identical, with differences restricted to the RCL. The identity patterns among I. scapularis serpin sequences, in which stretches of identical and variable domains are spread across the entire sequence, suggest that the diversity among serpins in this study may have arisen by gene duplication rather than by alternative splicing of RCL encoding exons.
From the perspective of understanding how the tick manipulates the mammalian host's defense against tick feeding, the finding of 18 serpins with basic residues at their P1 sites was exciting. In humans, this feature is associated with key serpins such as α1-antichymotrypsin, α2-antiplasmin, antithrombin III, protein C inhibitor and C1 inhibitor, which regulate important pathways such as inflammation, blood coagulation and complement activation. As these pathways are thought to represent the mammalian host's defense against tick feeding [59, 60], it is tempting to imagine that ticks could utilize some of these serpins to manipulate host defense to facilitate tick feeding and disease transmission. It is also possible that these serpins are not directly involved in facilitation of feeding. However, like their mammalian counterparts, they may be involved in regulation of important pathways in the tick, which if disrupted can affect the capacity of ticks as vectors.
Although the biological significance of gene expression data will be strengthened if correlated with protein production, our RT-PCR data provide some useful insights on the probable biological roles of the serpin genes in this study in the physiology of I. scapularis. Speculatively, the induction or up regulation of I. scapularis genes after ticks had penetrated host skin may signal their involvement in facilitation of blood meal uptake. This is particularly true for the 11 genes that were induced in both SG and MG (S1, 2, 3, 4 and 7) or SG alone (S5, 6, 16, 26, 35 and 36) in ticks that were fed for 6–24 hrs. This period corresponds to the preparatory feeding phase, when the tick attaches onto host skin and creates its feeding lesion. During this period the tick must overcome inflammation and blood coagulation for it to successfully start the feeding process. Similarly, the group of serpin genes (S17, 23, 25, 32, 37, 38 and 40) that were constitutively expressed but whose transcript abundance increased with tick feeding may also play a role in facilitating blood meal uptake. Genes that were constitutively expressed but progressively down regulated as ticks continued to feed (S9, 12 and 27 in the MG, and S19 in both SG and MG) could be involved in regulating physiological processes at the front end of the tick feeding process. The expression of S10, 14, 18, 21 and 22 specifically or predominantly in the MG is interesting, as it signals a potential role for these genes not only in blood meal processing, but also in the crossing of the gut barrier by pathogens. From the perspective of our long-term interest in finding tick proteins that are used by ticks to evade host immunity, it was exciting to note that some serpins were specifically expressed in SG. It will be exciting to investigate whether the proteins encoded by any of these genes are injected into the host during tick feeding.
It is possible that the genes analyzed here could be expressed in multiple tick organs besides the SG and MG. However, from the perspective of our long-term interest in understanding the molecular mechanisms that underlie tick-host interactions, our analysis here is biased towards biological functions of serpins at the SG and MG levels. The SG is critical for feeding and disease transmission, while the MG is important for blood meal processing and the passage of pathogens from the blood meal into the tick hemolymph. Our future questions will thus address the role of the serpins in facilitation of tick feeding and blood meal processing.
Most known serpins are glycosylated [12, 16] and thus it is not surprising that 40 of the 43 serpin sequences that were tested are predicted to possess putative N-glycosylation sites. As pool feeders, ticks accomplish feeding by lacerating small blood vessels and then sucking blood from the hematoma that forms in the feeding site. In order to complete feeding, ticks must prevent host blood from coagulating to ensure continued blood flow into the hematoma for the entire feeding period, which lasts 10–14 days for adult ticks and 4–7 days for immature ticks. From the perspective of solving the paradox of how the tick interferes with the coagulation cascade of its mammalian host, it was interesting to note that ~9% (4/45) of the sequences contain the RGD motif. Previous studies have shown that tick proteins containing the RGD motif, such as variabilin and savignygrin, were potent inhibitors of platelet aggregation. Platelet aggregation is critical to stopping bleeding of injured small blood vessels, such as occurs at the tick bite. Thus, if functional, the RGD motif containing serpins could represent important molecular targets for countering the ability of ticks to suppress the mammalian host's blood clotting system. In addition to the anti-platelet aggregation function, RGD motif containing proteins are also involved in regulating cell-cell interactions in mammals, and in immunity in arthropods and plants. From the foregoing, it is clear that the RGD motif containing serpins could also be involved in regulation of multiple other functions in the tick, besides platelet function at the tick feeding site. It is interesting to note that our RT-PCR data suggested that the expression of three (S1, 7 and 16) of the four RGD motif containing serpin genes was responsive to tick feeding activity (see figure 8).
When compared to the 3100 Mbp human genome, which encodes at least 36 serpin genes, the 45 serpin genes identified in I. scapularis, which has a 2100 Mbp genome, is a considerably high number. While the biological significance of the high number of serpin genes in I. scapularis cannot be ascertained at present, we speculate that it may signal the importance of serpins in tick physiology. In light of the lack of genome sequence data for many tick species, the sizes of tick serpin families will remain unknown. Ticks are diverse, both in terms of their biology and their genome sizes [66–68]. Thus it is likely that the sizes of serpin families in ticks are going to vary. However, this study provides some insight into the probable sizes of serpin families in ticks.
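The comparison of serpin counts against genome sizes can be made concrete as a gene density. The short sketch below uses only the figures quoted above (45 genes in 2100 Mbp and 36 genes in 3100 Mbp); the function name and the choice of genes per Gbp as the unit are assumptions of this illustration.

```python
def serpins_per_gbp(gene_count, genome_size_mbp):
    """Number of serpin genes per 1000 Mbp (1 Gbp) of genome."""
    return gene_count / genome_size_mbp * 1000.0


tick_density = serpins_per_gbp(45, 2100)   # I. scapularis figures from the text
human_density = serpins_per_gbp(36, 3100)  # human figures from the text

print(f"I. scapularis: {tick_density:.1f} serpin genes per Gbp")
print(f"Human: {human_density:.1f} serpin genes per Gbp")
print(f"Ratio: {tick_density / human_density:.2f}")
```

On these figures the tick genome carries roughly 21 serpin genes per Gbp against roughly 12 for the human genome, a ratio of about 1.8. This is a crude normalization that ignores differences in total gene number and repeat content between the two genomes.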
We thank Dr. Shahid Karim (formerly at University of Rhode Island, now at the University of Southern Mississippi, Department of Biological Sciences) for providing the Ixodes scapularis cDNA that was used in the transcription analysis. Funding of this work was provided by start up funds (Texas A & M University) and research support from the National Institute of Allergy and Infectious Diseases (grant # AI074789) to AM.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.