Dramatic expansion of the black widow toxin arsenal uncovered by multi-tissue transcriptomics and venom proteomics
© Haney et al.; licensee BioMed Central Ltd. 2014
Received: 30 December 2013
Accepted: 8 May 2014
Published: 11 June 2014
Animal venoms attract enormous interest given their potential for pharmacological discovery and understanding the evolution of natural chemistries. Next-generation transcriptomics and proteomics provide unparalleled, but underexploited, capabilities for venom characterization. We combined multi-tissue RNA-Seq with mass spectrometry and bioinformatic analyses to determine venom gland specific transcripts and venom proteins from the Western black widow spider (Latrodectus hesperus) and investigated their evolution.
We estimated expression of 97,217 L. hesperus transcripts in venom glands relative to silk and cephalothorax tissues. We identified 695 venom gland specific transcripts (VSTs), many of which BLAST and GO term analyses indicate may function as toxins or their delivery agents. ~38% of VSTs had BLAST hits, including latrotoxins, inhibitor cystine knot toxins, CRISPs, hyaluronidases, chitinase, and proteases, and 59% of VSTs had predicted protein domains. Latrotoxins are venom toxins that cause massive neurotransmitter release from vertebrate or invertebrate neurons. We discovered ≥ 20 divergent latrotoxin paralogs expressed in L. hesperus venom glands, significantly increasing this biomedically important family. Mass spectrometry of L. hesperus venom identified 49 proteins from VSTs, 24 of which BLAST to toxins. Phylogenetic analyses showed venom gland specific gene family expansions and shifts in tissue expression.
Quantitative expression analyses comparing multiple tissues are necessary to identify venom gland specific transcripts. We present a black widow venom specific exome that uncovers a trove of diverse toxins and associated proteins, suggesting a dynamic evolutionary history. This justifies a reevaluation of the functional activities of black widow venom in light of its emerging complexity.
Keywords: RNA-Seq, Latrotoxins, Venom, Mass spectrometry, Transcriptomics, Spider
Venomous taxa have evolved many times within the metazoa, and occur in both vertebrates and invertebrates. The venoms these diverse taxa produce are chemically complex and play key roles in organismal ecology, functioning in both predation and defense. Molecules contributing to the toxicity of venom are the focus of sustained effort aimed at characterizing their physiological roles and biochemical action, given their potential in pharmacological and biomedical applications. Venom toxins are often members of large gene families, and the study of their evolution can illuminate the roles of gene duplication, convergence and positive selection in generating the functional diversity of venoms. Determining the molecular diversity of venoms is the necessary first step in this process, yet few studies have utilized large scale approaches for venom characterization.
Spiders (Order Araneae) are the most species-rich venomous clade, with >44,000 described species, the overwhelming majority of which are venomous. Estimates of the number of unique venom peptides and proteins produced by members of this clade range from 1.5 to 20 million [5–7], significantly more than are estimated from other major clades of venomous invertebrates such as scorpions and cone snails [8, 9]. The venoms of some spiders have been extensively studied, largely due to the potential for isolating novel insecticidal toxins, and for reasons of direct medical concern [10–13]. However, past work has focused on a small fraction of total spider species, and much of the molecular diversity of spider venoms remains to be discovered.
Spider venom proteins characterized to date belong to several different broad classes: enzymes (such as proteases, phospholipases and hyaluronidases), small linear cytolytic peptides, and neurotoxins with differing functionality and size range. The most commonly documented form of spider neurotoxin is a small (<15 kDa), disulfide-rich peptide. The disulfide bonds give rise to one of three typical structural motifs, the disulfide-directed β-hairpin, the Kunitz motif, or the inhibitor cystine knot (ICK), the last of which appears to be the most common amongst studied spider venoms. The compact structure of ICK peptides renders them highly resistant to the actions of proteases in envenomated organisms, contributing to their efficacy. Different ICK peptides specifically target different ion channels in the nervous system, and diverse sets of these peptides can occur within the venom of even a single species [12, 14], acting synergistically with one another and with small linear peptides [14, 16, 17] in a manner similar to the “toxin cabals” of cone snails.
The most prominent exception to this dominance of small (<15 kDa) molecules in venom occurs in the black widow spiders (genus Latrodectus, family Theridiidae), which contain multiple large (>130 kDa) neurotoxic proteins known as latrotoxins, encoded by paralogous loci [19–26]. The best studied of the latrotoxins, α-latrotoxin, forms tetrameric complexes which bind to vertebrate presynaptic receptors and insert into neuronal membranes, forming calcium-permeable ion channels that stimulate massive neurotransmitter release. α-Latrotoxin is also widely known as the causative agent of the extreme pain associated with black widow bites. Other functionally characterized latrotoxins differ in their phyletic specificity, affecting the nervous systems of only insects or crustaceans. Latrotoxin proteins are accompanied in the venom by low-molecular weight peptides called latrodectins (also known as α-latrotoxin associated LMWPs) that may enhance latrotoxin toxicity [20, 28], although they exhibit no toxicity themselves.
Given the large number of peptides and proteins remaining to be discovered in the venoms of spider species, next generation RNA sequencing (RNA-Seq) methods are particularly well suited for rapidly obtaining a comprehensive inventory of venom components, as well as an improved functional understanding of the venom gland. The high throughput of next-generation sequencing allows for profiling of transcripts over a wide range of abundance, providing an accurate picture of differential expression across tissues within an organism. A multi-tissue approach allows for the identification of transcripts with highly biased expression in the venom gland, whose products are candidates for function in the venom as toxins, or in venom production. Venom gland specific sequences can then be subjected to bioinformatic and evolutionary analyses to discover novel toxins and to better understand their origins and the mechanisms generating their diversity. The insights provided by transcriptomic data can be greatly enhanced by proteomics approaches which permit a direct examination of the peptide and protein composition of venoms, typically with methods coupling liquid chromatography based separation to mass spectrometry. These methods have begun to be applied to a range of species, leading to an expansion of the number of venom peptide and protein toxins known from arachnids [31, 32].
In this study we present an integrated set of multi-tissue transcriptomic and proteomic data from the Western black widow spider, Latrodectus hesperus, to investigate the composition and evolution of its venom. The venom of this species remains largely unexplored, despite the relevance of black widows to human health and the importance of their venom in studies of vertebrate neurotransmission [33–35]. We identify transcripts with biased expression in the venom gland relative to other tissues, and potential toxin transcripts in the venom gland exome, using bioinformatics-based approaches. We also explore the relative abundance of transcripts specific to the venom gland and quantify the representation of the biological functions and processes in which these transcripts take part. We identify prominent toxin families, and perform phylogenetic analyses to investigate their evolution. Lastly, we explicitly identify the secreted peptide and protein component of the venom using a mass spectrometric based proteomic approach. Our transcriptome and proteome provide complementary data in order to separate the secreted venom components from the cast of molecules that support toxin production within the gland.
Bioinformatic functional categorization of the L. hesperus venom gland transcriptome
Summary of groups of toxins and enzymes in L. hesperus venom gland specific transcripts
Facilitate latrotoxin action
Form calcium channels, cause neurotransmitter release in target organism
Potential ion channel action
Calcium channel blocker, smooth muscle paralysis
Tissue degradation/spreading factor
Alter function of neuronal ion channels
Tissue degradation/spreading factor
Breakdown of arthropod exoskeleton
Breakdown of extracellular matrix/spreading factor
Overrepresented GO terms in the L. hesperus venom gland specific transcript set
other organism cell membrane
other organism membrane
other organism presynaptic membrane
integral to membrane
viral genome replication
structural constituent of cuticle
RNA-directed RNA polymerase activity
serine-type endopeptidase activity
mRNA methyltransferase activity
carbonate dehydratase activity
Of the 695 VSTs, 414 had at least one protein domain prediction from InterProScan, including 179 sequences with no significant BLAST hit at UniProt. Among all protein domains identified more than five times amongst the VSTs, ankyrin domains were most common, while leucine-rich repeat, low density lipoprotein receptor class A, immunoglobulin, chitin-binding, helix loop helix, latrotoxin C-terminal, venom allergen 5, serine protease and metalloprotease domains also commonly occurred in predicted proteins from the VST set (Additional file 2, Additional file 3).
L. hesperus toxin diversity and evolution
We searched the entire translated L. hesperus transcriptome to identify other sequences with homology to latrotoxins, but lacking venom gland biased expression. As ankyrin domains are common components of many non-homologous proteins with diverse functions, we limited the BLASTp search to the conserved and distinct N-terminus of the latrotoxin protein, which lacks ankyrin repeats. Two hits were recovered. However, read count data indicate that they lack expression in tissues other than venom gland, and were not included as VSTs because they did not reach the minimum read count threshold for inclusion. These two sequences were not included in phylogenetic analyses, as they did not meet the minimum length requirement.
ICK toxins and other small proteins with potential toxicity
Summary of putative toxins with no BLAST hit
SP, TM, NC, CC
SP, TM, NC
TM, NC, CP
TM, NC, CP
CRISP proteins and enzymes
Transcripts with homology to several types of enzymes were found in the L. hesperus VST set. A total of two hyaluronidases, a single chitinase, and three lipases (phospholipase C, AB hydrolase) were identified. A total of seven distinct serine protease sequences and eight M13 metalloproteases were found among the 695 transcripts in the venom gland specific set. In addition, single sequences with homology to O-sialoglycoprotein endopeptidases and gamma glutamyl transpeptidases were recovered (Additional file 1).
Clustering analysis of venom-gland specific proteins
There were several clusters with members homologous to known toxins. Under the most permissive clustering criterion, the largest of these groups had 34 members, all but three of which had best BLASTx hits to latrotoxins in the UniProt database, with the other sequences likely clustered due to weak similarity in the ankyrin repeat regions. A second group contained four additional latrotoxin sequences. Membership in the larger group was highly sensitive to the stringency of the clustering parameters, as at 35% overlap and 35% identity, only 22 sequences remained, all with homology to latrotoxins, and at 45% overlap and 45% identity this cluster had fragmented into several smaller clusters, the largest of which contained six members (Additional file 6). The four sequences with homology to ICK toxins also formed a group at the lowest clustering stringency, but this group appeared more coherent: these sequences remained clustered as stringency was increased until 75% overlap at 75% identity was reached.
Other clusters containing more than five members at the most permissive threshold (30% overlap, 30% sequence identity), and representing putative venom gland expressed families, included sequences with homology to cuticular proteins (18 members), M13 metalloproteases (11), leucine-rich repeat (LRR) proteins (7), and serine proteases (6), while the two CRISP proteins identified by BLAST homology clustered with an uncharacterized protein.
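The effect of clustering stringency described above can be illustrated with a minimal sketch of single-linkage clustering, in which two sequences join the same cluster whenever a chain of pairwise matches passing both thresholds connects them. This is not BLASTclust itself: the function name, the input format (pre-computed pairwise percent identity and percent overlap) and the example values are assumptions made for illustration only.

```python
def single_linkage(ids, pairs, min_identity, min_overlap):
    """Group sequence IDs by single linkage.

    ids   : list of sequence identifiers
    pairs : iterable of (id_a, id_b, percent_identity, percent_overlap)
            (hypothetical pre-computed pairwise alignment summaries)
    A pair links two sequences only if it meets BOTH thresholds.
    Returns clusters sorted from largest to smallest.
    """
    parent = {i: i for i in ids}

    def find(x):
        # Follow parent pointers to the cluster root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, identity, overlap in pairs:
        if identity >= min_identity and overlap >= min_overlap:
            parent[find(a)] = find(b)

    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values(), key=len, reverse=True)


# Hypothetical example: t3 joins t1 and t2 only at the permissive threshold.
ids = ["t1", "t2", "t3", "t4"]
pairs = [("t1", "t2", 80, 90), ("t2", "t3", 32, 31), ("t3", "t4", 20, 50)]
permissive = single_linkage(ids, pairs, min_identity=30, min_overlap=30)
stringent = single_linkage(ids, pairs, min_identity=45, min_overlap=45)
```

Because one weak link is enough to merge two groups under single linkage, large clusters held together by marginal similarity (such as shared ankyrin repeat regions) would be expected to fragment as the thresholds are raised, whereas groups of closely similar sequences stay intact.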
Highly expressed venom gland transcripts
Proteomic and bioinformatic analysis of secreted components
Predicted neurotoxin proteins identified in venom
Best BLAST hit
# unique peptides
Approximately 12.5% (87) of protein translations from the L. hesperus VSTs possessed a predicted signal sequence. If only the 313 proteins with a putative methionine start codon are considered, this figure rises to 24.9%. Amongst the toxin homologs in this set, none of the predicted latrotoxin proteins contained a typical eukaryotic signal sequence, while all four ICK toxins, both CRISP toxins, and both latrodectins contained a signal sequence, as did all seven other potential ICK toxins with no significant BLAST homology. Five proteases (four serine proteases and one metalloprotease) also had a predicted signal sequence. Thirty-six of the 49 predicted proteins from VSTs detected in venom by mass spectrometry contained an M-start, of which 22 (61%) had predicted signal sequences, consistent with their function as a venom component, as opposed to having an intracellular function.
Spiders are the most species-rich clade of venomous metazoans, and it is likely that millions of toxic compounds remain to be identified in their venom [7, 45]. Next generation transcriptomic and proteomic methods, when used in combination, offer a powerful approach to cataloguing and understanding this complexity, as well as its evolution. By applying these methods to Latrodectus hesperus, in the context of a multi-tissue expression analysis, we have identified 695 transcript sequences with strongly biased venom gland expression in this species and confirmed the presence of 61 proteins in its venom. The inferred functions of these sequences indicate that the venom of black widow spiders is extremely diverse at the molecular level, and is the product of a complex evolutionary history.
Molecular diversity in the L. hesperus venom gland and functional implications
We found that only 22% of the 695 L. hesperus VSTs shared some sequence overlap at the protein level through BLASTclust analyses, implying that a wide diversity of proteins contribute to venom gland function. Nevertheless, we estimated that at least 20 distinct latrotoxin paralogs are expressed in the black widow venom gland, constituting by far the largest gene family in the venom gland specific set of sequences. The latrotoxin proteins predicted from these transcripts were divergent in amino acid sequence and motif organization (Figure 2, Figure 3), and thus it is likely that they represent distinct loci. While seven latrotoxins have been assigned names based on their taxonomic specificity (5 insect-specific, 1 vertebrate-specific, 1 crustacean-specific) in the related species L. tredecimguttatus, the sequences of only four of these seven functionally characterized latrotoxins are definitively known [19–22]. We identified orthologs of these four functionally characterized latrotoxins in our transcriptome, but have also quintupled the number of sequenced latrotoxin paralogs in L. hesperus. While the functionality of these novel latrotoxins is unknown, some of these sequences have best BLASTx hits to the vertebrate-specific α-latrotoxin. Although functional testing is a requirement for confirmation, some of these sequences could represent heretofore unknown vertebrate specific neurotoxins. Such discoveries are significant because vertebrate neurotoxins have important applications in neurophysiological research, considering the fundamental role of α-latrotoxin in deciphering the molecular mechanisms of neurotransmission. The extensive diversity found among the vertebrate receptors of latrotoxins, such as neurexins and latrophilins [46–48], suggests that some of these new latrotoxin variants may interact specifically with different receptor isoforms and could play important roles in their characterization.
The variable number of ankyrin domains predicted from nearly full-length sequences in this study could contribute to altered functionality, including the ability of latrotoxin monomers to tetramerize, given the role of ankyrin repeats in protein-protein interactions.
Small cysteine rich neurotoxic proteins with the inhibitor cystine knot motif dominate the venoms of many spider species. Our BLAST analyses identified four putative ICK toxin sequences amongst the L. hesperus VSTs and one was present in the exuded venom. In addition to these ICK toxins, other small cysteine-rich sequences were venom gland specific in expression and some were present in the venom. Some of these toxins may also be ICK toxins as they possess a predicted ICK domain, while others may represent distinctly different molecular scaffolds, although further research is necessary on their structure and function. The presence of both latrotoxins and ICK toxins in Latrodectus venom also suggests novel avenues in research as to how small, selective ion-channel toxins may act synergistically with the non-selective cation channels created by latrotoxin pores in the presynaptic membrane [49, 50]. Three additional cysteine-rich proteins with homology to CRISP toxins (or found by clustering analyses) were also strongly biased towards expression in L. hesperus venom gland and present in the venom. CRISP family members were also found to be expressed in the venom gland of the related species L. tredecimguttatus, indicating that this toxin type may be more widespread within the genus.
Among the other venom gland specific transcripts were multiple sequences with homology to proteins with nervous system related functions (Additional file 1). Examples of these included bruchpilot from Drosophila melanogaster, involved in synaptic plasticity and regulation, and neural cell adhesion molecule L1, the Drosophila ortholog of which plays a critical role in neural development. L1-type cell adhesion molecules also play a role in presynaptic organization, and often interact with ankyrin repeat containing proteins. Given the importance of the ankyrin repeat-containing latrotoxins in black widow venom, the venom gland biased expression of these transcripts is intriguing, although their links to the action of latrotoxins are speculative at this point. Lastly, eight sequences with homology to leucine-rich repeat (LRR) proteins were also venom gland specific, and a number of these proteins play key roles in neuronal development and maintenance in both invertebrates and vertebrates [54, 55]. These results suggest either that homologs of spider proteins involved in neuronal development or function are being co-opted for venom expression, or that unrelated venom gland expressed sequences may mimic neuronal proteins.
Evolutionary diversification of black widow venom toxins
The development of pools of diverse toxin molecules in venom often involves the expansion of gene families. This process can generate large numbers of distinct transcripts and peptides in certain toxin classes. In cone snails, species may produce from 100 to 300 small ICK peptides known as conotoxins. Conotoxins are notable for their rapid evolution and the extreme divergence among paralogs within a species at the amino acid level. Similarly, sequencing of spider venom gland transcripts has revealed single species ICK toxin libraries containing more than 100 distinct members [12, 57]. While ICK toxin sequences can also differ dramatically among spiders, clades of more closely related sequences also occur in some spider species, and likely represent more recent, species-specific gene family diversification. This may be true in the case of the L. hesperus sequences with BLAST homology to known ICKs. Yet, we also found seven additional ICK motif containing sequences, which were more diverse in length, signal sequence and cysteine arrangement, suggesting the recruitment of multiple ICK motif encoding proteins for black widow venom expression.
Latrotoxins, while the most diverse toxin type in this study, as a whole appear to be limited in phylogenetic distribution, and the origins of these toxins are obscure. Only one paralog (α-latrotoxin) has been recognized outside the genus Latrodectus, and to date latrotoxins are only known from three genera of Theridiidae. Although repeated ankyrin domains are found in a wide range of unrelated proteins of various functions, the latrotoxin N-terminal region appears to be largely unique to latrotoxins. A BLASTp search with latrotoxin N-terminal sequences (first 320 amino acids) against the non-venom gland specific L. hesperus transcriptome did not find any significant hits. However, we performed a BLASTp search with the L. hesperus α-latrotoxin N-terminal region against NCBI’s nr database, and found a significant hit to a hypothetical protein from Diplorickettsia massiliensis (Accession WP_010598965; e-value 1e-16), an obligate intracellular bacterium isolated from the tick Ixodes ricinus, which is a human disease vector. In addition to N-terminal region sequence similarity, the overall length (1286 amino acids) and possession of multiple ankyrin repeats of this bacterial protein are reminiscent of latrotoxins. A recent study by Zhang et al. described similarities between the C-terminal domain of latrotoxins and proteins from arthropod bacterial endosymbionts such as Wolbachia and Rickettsiella, and suggested that spider latrotoxins were acquired via lateral gene transfer from bacteria. Alternatively, Garb and Hayashi suggested a possible link between latrotoxins and dTRP1a, a Drosophila calcium permeable transmembrane channel protein involved in sensitivity to temperature and chemical irritation that contains numerous ankyrin repeats. As genome sequences for Latrodectus and related theridiid species become available, these questions regarding the evolutionary origin of latrotoxins may become answerable.
Given the broader phylogenetic distribution of α-latrotoxin outside of L. hesperus, it will be important to determine if the additional latrotoxins we uncovered have orthologs in closely related species having venom that is less toxic to vertebrates when compared to venom from black widows. Phylogenetic analyses of the latrotoxin family across multiple species may illuminate the ecological adaptations of widow spiders, particularly in terms of understanding the functional utility of latrotoxins for a generalist predator of diverse insects and small vertebrates. Three insect specific latrotoxins previously identified in protein separation studies may be represented in the additional latrotoxins we have recovered, but the functional and taxonomic specificity of the others remains to be determined. Such functional analyses will be necessary to reconstruct whether ancestral latrotoxins have undergone a functional shift from arthropod to vertebrate specificity or vice versa. A comprehensive latrotoxin phylogeny across species could also determine whether gene family expansions are lineage-specific, and correlate with increased venom toxicity and diet breadth.
In contrast to latrotoxins and ICK toxins, the cysteine-rich secretory proteins (CRISPs) are not particularly diverse within the L. hesperus VSTs, but we were able to identify three additional transcripts with homology to CRISPs that do not show venom gland specificity. A CRISP phylogeny including diverse venomous, non-venomous and hematophagous arthropods indicates a dynamic evolutionary history for this gene family, with multiple recruitments to function in venom or salivary glands, including a potentially recent CRISP protein recruitment for venom function in Latrodectus. A similar conclusion was reached with a less densely sampled, but broader taxonomic selection of CRISPs, and more extensive arthropod transcriptomic and genomic resources may identify the gene duplications and changes in tissue-specific expression patterns leading to this pattern.
Highly expressed transcripts, venom composition and secretory mechanisms
Among the venom gland specific transcript set, overall expression is dominated by putative neurotoxins and their associated molecules, although they make up only a minority of the distinct transcripts. Strikingly, the proportion of transcripts that latrodectins represent is similar to that for all latrotoxin sequences, although latrodectin sequence diversity was at least ten times lower than that of latrotoxins. This suggests that the role of latrodectins in facilitating latrotoxin toxicity may be the same for all latrotoxins, including novel forms identified in this study. Protease expression also accounts for a substantial proportion of VST abundance, and several proteases were amongst the most abundant transcripts in the venom gland specific set.
Proteomic analysis of L. hesperus venom also indicates that at least some proteases are secreted, as together with other enzymes (hyaluronidases and chitinase), they were identified in L. hesperus venom. Hyaluronidases are found in venom from a range of spider species, but whether proteases are an active component of venom in spiders has been a subject of some debate, as some authors argue that protease activity in venom is due to digestive secretion contamination. Our finding of proteases with venom gland specificity, together with the presence of a subset of proteases in the venom, some with predicted secretory signal sequences, may be related to a dual function. Some L. hesperus proteases may in fact function in prey immobilization, either acting as toxin spreading factors, or in hemostasis disruption, as is the case in snakes [7, 61], while others may be involved in processing toxin preproproteins into mature toxins.
Our mass spectrometry analyses indicated that the majority of the neurotoxin transcripts specific to the venom gland encoded peptides and proteins that were secreted into the venom. Predicted neurotoxins that were not present in collected venom may reflect the variability inherent in venom-related gene expression, as data acquisition for the transcriptome and proteome was performed on different individuals. It may also reflect variation in the processes of translation or secretion among individual spiders. Overall, the limited number of venom gland specific genes whose products are found in the venom itself is rather unexpected, given the purported mechanism of L. hesperus secretion into the venom gland lumen, in which the secretory cells disintegrate and expel the entirety of their contents [23, 62]. Yet there would appear to be some filtering mechanism that is selective against most proteins from VSTs, as few appear in the venom itself. The possession of a signal sequence may constitute such a filter. While only a minority (25%) of complete predicted proteins from VSTs have a predicted signal sequence, the majority of proteins (67%) identified in the venom by mass spectrometry have predicted signals. Latrotoxins seem to be an exception, lacking a typical eukaryotic secretion signal, yet being common in the venom itself. However, previous work has indicated the presence of a cleaved sequence on the N-terminus that could potentially function as a non-canonical secretory signal.
In this study, next-generation RNA sequencing of multiple tissues coupled to proteomics has provided a wealth of insight into venom gland expression and the molecular complexity of Latrodectus venom. Numerous new variants of known toxins were identified, and potentially novel toxins of unknown function recovered, suggesting the need for a fundamental reconsideration of the functional activities of black widow spider venom in natural prey and in human envenomation. The extreme pain associated with black widow spider bites is typically accompanied by additional symptoms (e.g., diaphoresis, hypertension, paresthesia, fasciculations), which in addition to α-latrotoxin, may be caused by other toxins uncovered in this study. This expanded toxin library can also be mined for novel molecular probes or drug leads. Of particular interest for neurophysiology is the large number (≥20) of previously unknown latrotoxin variants and 11 ICK motif containing proteins discovered in this study, which may offer new avenues for dissecting the molecular mechanism of neurotransmitter release and for characterizing neuronal ion channels. These functionally diverse latrotoxins comprise a large venom gland expressed gene family with a highly restricted phylogenetic distribution, suggesting they have undergone a rapid evolutionary expansion in black widow spiders.
L. hesperus transcriptome sequencing and assembly
Paired-end Illumina sequencing was performed by the Genomics Core at the University of California, Riverside, on cDNA libraries generated using the Illumina mRNA sequencing sample preparation kit with mRNA from three tissue types: (1) venom gland, (2) silk glands and (3) cephalothorax minus venom glands, each in a single lane. After trimming of adapters and low quality sequence, reads from each individual library were separately assembled using Trinity, and subjected to CAP3 to merge transcripts under default parameters and reduce redundancy in the transcript set, producing contigs with the tissue type as a prefix (e.g. venom_Contig0000). CAP3 was then applied a second time to merge transcripts across tissue-specific assemblies and produce a set of contigs with no prefix (e.g. Contig0000) as well as retaining contigs from the tissue specific CAP3 assemblies with a tissue-specific prefix, together with non-merged transcripts that retain the original Trinity nomenclature (e.g. venom_comp00000_c0_seq0) with a prefix indicating their tissue origin. All sequences were screened for homology to the UniProt database using BLASTx with an e-value cutoff of 1e-5. Open reading frames (ORFs) for all transcripts were predicted in all six frames using GetORF, filtering out ORFs less than 90 bp in length. A best protein prediction for each contig was generated with a custom Perl script by (1) extracting the longest reading frame in the same frame as the best BLASTx hit, or (2) by extracting the longest reading frame for contigs lacking a BLASTx hit. However, proteins with a methionine start codon were selected if bounded by stop codons on the 5′ and 3′ ends, indicating the potential for a full-length ORF, and if the M-start ORF was at least 75% of the longest predicted ORF.
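The protein prediction rule in the preceding paragraph can be summarized in a short sketch. It is not the authors' Perl script: the `Orf` record, the function name and the decision to restrict M-start candidates to the BLASTx frame are assumptions made for illustration, since the text does not specify these details.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Orf:
    frame: int          # reading frame, one of -3, -2, -1, 1, 2, 3
    protein: str        # translated amino acid sequence
    full_length: bool   # M-start ORF bounded by stop codons at both ends


def best_protein(orfs: Sequence[Orf],
                 blast_frame: Optional[int] = None,
                 min_fraction: float = 0.75) -> Optional[Orf]:
    """Pick one protein prediction for a contig.

    If blast_frame is given, only ORFs in the frame of the best BLASTx hit
    are considered (assumption: this also applies to M-start ORFs).
    A potentially full-length M-start ORF is preferred when it is at least
    min_fraction of the length of the longest candidate ORF.
    """
    candidates = [o for o in orfs
                  if blast_frame is None or o.frame == blast_frame]
    if not candidates:
        return None
    longest = max(candidates, key=lambda o: len(o.protein))
    complete = [o for o in candidates if o.full_length]
    if complete:
        best_complete = max(complete, key=lambda o: len(o.protein))
        if len(best_complete.protein) >= min_fraction * len(longest.protein):
            return best_complete
    return longest


# Hypothetical contig: a 100-residue partial ORF and an 80-residue M-start ORF.
orfs = [Orf(1, "A" * 100, False), Orf(1, "M" + "A" * 79, True)]
chosen = best_protein(orfs, blast_frame=1)
```

In this example the 80-residue M-start ORF is chosen because it reaches 80% of the longest ORF; had it been shorter than 75 residues, the longer partial ORF would be retained instead.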
After CAP3 assembly at the nucleotide level some transcripts that produced identical amino acid sequences persisted in the data set. Hence we further filtered the transcript set to produce a non-redundant set of proteins and their associated nucleotide sequences. BLASTclust was employed to identify sets of protein sequences in which members were identical over their entire region of overlap. In cases in which proteins varied in length within a cluster, all but the longest member of the cluster were removed from both the protein and nucleotide sequence libraries using a custom Perl script. Where members were of equal length, the first member was arbitrarily chosen to represent that cluster.
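A minimal sketch of this redundancy filter is given below, assuming the clusters have already been computed and are available as lists of sequence identifiers. The function name and input format are hypothetical and do not reproduce the authors' Perl script.

```python
def pick_representatives(clusters, proteins):
    """Keep one sequence per cluster of overlapping identical proteins.

    clusters : list of lists of sequence IDs (one inner list per cluster)
    proteins : dict mapping sequence ID -> amino acid sequence
    The longest member is kept; when members tie in length, the first
    one listed is kept, because max() returns the first maximal item.
    """
    return [max(members, key=lambda m: len(proteins[m]))
            for members in clusters]


# Hypothetical example: p2 is longest in its cluster; p3 and p4 tie.
proteins = {"p1": "MKT", "p2": "MKTLL", "p3": "MA", "p4": "MC"}
clusters = [["p1", "p2"], ["p3", "p4"]]
kept = pick_representatives(clusters, proteins)
```

The identifiers returned can then be used to subset both the protein and nucleotide libraries so that the two stay in step.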
Identification of venom gland specific transcripts
To identify venom gland specific transcripts (VSTs), RSEM was used to estimate transcript abundances by mapping reads from the venom, cephalothorax and silk libraries against the assembled and filtered non-redundant transcriptome using Bowtie with default parameters. Expected read counts per million (eCPM) were calculated for each transcript in each tissue, and the distributions of the log ratio of venom gland eCPM to silk eCPM and of venom gland eCPM to cephalothorax eCPM were plotted. VSTs were identified in two ways. First, transcripts with venom gland expression greater than one eCPM and zero eCPM in the other two tissues were selected. Second, among the remaining transcripts, those with at least one eCPM in venom and with both ratios (venom eCPM/silk eCPM and venom eCPM/cephalothorax eCPM) in the upper 2.5% of their respective distributions were selected. Together, transcripts from these two categories constitute the venom gland specific set. Fragments per kilobase per million reads (FPKM) values were also calculated in RSEM for comparing abundances among VSTs.
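The two selection criteria can be sketched in Python as follows. This is an illustration under stated assumptions, not the pipeline used in the study: the function names are hypothetical, eCPM values are assumed to be supplied as `(venom, silk, cephalothorax)` tuples, and transcripts with a zero count in venom or in only one of the other tissues (for which a log ratio is undefined) are simply left out of the ratio distributions, a case the text does not address.

```python
import math
from typing import Dict, Iterable, Set, Tuple


def upper_cutoff(values: Iterable[float], top_fraction: float) -> float:
    """Smallest value that still falls in the upper tail of the distribution."""
    ordered = sorted(values)
    k = max(1, math.floor(len(ordered) * top_fraction))  # size of the upper tail
    return ordered[-k]


def select_vsts(ecpm: Dict[str, Tuple[float, float, float]],
                top_fraction: float = 0.025,
                min_venom: float = 1.0) -> Set[str]:
    """ecpm maps transcript id -> (venom, silk, cephalothorax) eCPM."""
    # Category 1: more than one eCPM in venom and zero in both other tissues.
    exclusive = {t for t, (v, s, c) in ecpm.items()
                 if v > min_venom and s == 0 and c == 0}
    # Category 2: remaining transcripts with defined log ratios (assumption).
    rest = {t: vals for t, vals in ecpm.items()
            if t not in exclusive and all(x > 0 for x in vals)}
    if not rest:
        return exclusive
    silk_ratio = {t: math.log(v / s) for t, (v, s, c) in rest.items()}
    ceph_ratio = {t: math.log(v / c) for t, (v, s, c) in rest.items()}
    silk_cut = upper_cutoff(silk_ratio.values(), top_fraction)
    ceph_cut = upper_cutoff(ceph_ratio.values(), top_fraction)
    ratio_based = {t for t, (v, s, c) in rest.items()
                   if v >= min_venom
                   and silk_ratio[t] >= silk_cut
                   and ceph_ratio[t] >= ceph_cut}
    return exclusive | ratio_based
```

Requiring both ratios to exceed their cutoffs keeps transcripts that are enriched in venom gland relative to each of the other tissues, rather than relative to only one of them.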
Functional analysis of venom gland specific transcripts
GO terms were retrieved from UniProt-GOA for the best BLASTx hit to each sequence and used to annotate the L. hesperus sequence set. Additional GO terms were mapped by searching the Pfam-A database for sequence homology to predicted protein sequences using the probabilistic hidden Markov models implemented in HMMER 3.0.
To identify biological processes and functions important in the venom gland, GOseq was used to find overrepresented gene ontology categories in the set of venom gland specific transcripts while correcting for potential transcript length bias in RNA-Seq expression estimates. Standard GO overrepresentation analysis assumes that all genes are equally likely to be identified as differentially expressed. This assumption does not hold for read count based methods such as RNA-Seq, and its violation causes false positives for categories with an excess of long genes.
Identification of toxins in the venom gland specific set
Sequences with homology to known toxins were identified in the UniProt BLASTx results using text searches. We identified the potential presence of families of toxin and other transcripts specifically expressed in the venom gland of L. hesperus by clustering predicted protein sequences using the BLASTclust algorithm under both permissive and stringent criteria. The BLASTclust output was parsed with a custom Perl script to calculate group sizes, group numbers and group composition by appending BLASTx results.
InterProScan was used on predicted proteins to identify the domain architecture of gene products. ClanTox was used to predict the potential toxicity of translated proteins. The algorithm takes into account features of the frequency and distribution of cysteine residues in the primary sequence of known peptide toxins. ClanTox produces four categories of toxin predictions based on statistical confidence, ranging from N (probably not toxin-like) to P3 (toxin-like). Knoter1D was used to predict the connectivity of inhibitor cystine knot structures (also referred to as knottins) from the primary sequence of peptides and proteins. Given that toxins function within an extracellular secretion, predicted proteins were scanned with SignalP 4.1 for the presence of a signal sequence indicating targeting to the secretory pathway.
Venom collection and mass spectrometry
We determined the proteins present in the venom of L. hesperus by collecting venom exuded by anesthetized adult females subjected to electrostimulation with a 10 V current via a capillary tube, and subsequently diluting the venom in 5 μL of distilled water. The trypsin-digested diluted venom was analyzed by MudPIT analysis, performed by the Arizona Proteomics Consortium at the University of Arizona. This method uses a multidimensional liquid chromatography separation followed by tandem mass spectrometry (LC-MS/MS) and the Sequest algorithm to identify digested peptides in L. hesperus venom secretions. Scaffold software (Proteome Software, Portland, Oregon) was then used to map peptides found in venom to the predicted protein sequences from the L. hesperus assembled transcriptome, together with L. hesperus venom gland ESTs and all L. hesperus protein sequences available at NCBI, to identify secreted products. Only sequences with protein and peptide probabilities in excess of 95%, and with at least two mapped unique peptides, were considered to be present in venom.
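The acceptance criteria for venom proteins amount to a simple filter, sketched below in Python. The record layout (a dict with `protein`, `protein_prob` and a list of `(peptide, probability)` pairs) and the function name are hypothetical and do not correspond to a Scaffold export format. The sketch assumes that a peptide counts toward the two-peptide minimum only if its own probability exceeds the threshold.

```python
from typing import Dict, List


def present_in_venom(hits: List[Dict], min_prob: float = 0.95,
                     min_unique_peptides: int = 2) -> List[str]:
    """Return identifiers of proteins that pass both thresholds."""
    accepted = []
    for hit in hits:
        # Unique peptide sequences whose probability exceeds the threshold.
        confident = {seq for seq, prob in hit["peptides"] if prob > min_prob}
        if hit["protein_prob"] > min_prob and len(confident) >= min_unique_peptides:
            accepted.append(hit["protein"])
    return accepted


# Toy records: identifiers, peptides and probabilities are made up.
hits = [
    {"protein": "p1", "protein_prob": 0.99,
     "peptides": [("AAK", 0.99), ("GGR", 0.97)]},
    {"protein": "p2", "protein_prob": 0.99,
     "peptides": [("AAK", 0.99), ("AAK", 0.98)]},   # one unique peptide only
    {"protein": "p3", "protein_prob": 0.90,
     "peptides": [("LLK", 0.99), ("VVR", 0.99)]},   # protein probability too low
]
accepted = present_in_venom(hits)
```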
Phylogenetic analysis
Alignments of amino acid sequences were constructed with the COBALT web server at NCBI using default settings for gap penalties and query clustering, and with RPS BLAST enabled. Alignments were trimmed manually or with trimAl 1.2 using the automated1 setting to remove poorly aligned regions or regions with an excessive amount of missing data. Phylogenetic trees were constructed for members of specific gene families using Bayesian analysis of amino acid sequences in MrBayes 3.2.2, sampling across fixed amino acid rate matrices. Two simultaneous runs of 1,000,000-5,000,000 generations using a single Markov chain were performed. Convergence was achieved in all analyses as determined by an average standard deviation of split frequencies < 0.01, effective sample sizes for all parameters > 100, and potential scale reduction factors for all parameters of approximately 1. The first 25% of trees sampled were discarded as burn-in and a 50% majority rule consensus was constructed for each analysis using posterior probability (PP) as a measure of clade support. Maximum-likelihood trees for the same set of gene families were found with RAxML using the BLOSUM62 substitution rate matrix with gamma distributed rate variation among sites. One thousand bootstrap pseudoreplicates were performed to assess support for clades.
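The burn-in and majority-rule steps can be illustrated with a short Python sketch. It is not how MrBayes computes its consensus internally. Each sampled tree is represented simply as a set of clades (frozensets of taxon names), the function name `clade_support` is hypothetical, and a clade is assumed to enter the consensus when its frequency among post-burn-in samples is strictly greater than 0.5.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Set

Clade = FrozenSet[str]


def clade_support(sampled_trees: List[Set[Clade]],
                  burnin_fraction: float = 0.25,
                  threshold: float = 0.5) -> Dict[Clade, float]:
    """Posterior probability of each clade retained by majority rule."""
    # Discard the first fraction of sampled trees as burn-in.
    start = int(len(sampled_trees) * burnin_fraction)
    kept = sampled_trees[start:]
    # Count how many retained trees contain each clade.
    counts = Counter(clade for tree in kept for clade in tree)
    # Keep clades whose frequency (posterior probability) exceeds the threshold.
    return {clade: n / len(kept) for clade, n in counts.items()
            if n / len(kept) > threshold}


# Toy sample of eight trees over taxa A-D (made up for illustration).
AB, CD, AC = frozenset("AB"), frozenset("CD"), frozenset("AC")
samples = [{AC}, {AC}, {AB, CD}, {AB, CD}, {AB}, {AB, CD}, {AC}, {AB, CD}]
support = clade_support(samples)
```

In this toy sample the first two trees are discarded as burn-in, and clades AB and CD are retained with frequencies of 5/6 and 4/6 among the six remaining trees.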
Availability of supporting data
All reads and the final transcriptome described in the manuscript are available under BioProject accession PRJNA242358. Illumina sequence reads have been deposited in NCBI’s Sequence Read Archive (SRA) under the following accession numbers: Venom (Sample: SAMN2720862, Experiment: SRX512000, Reads: SRR1219652); Cephalothorax (Sample: SAMN2708870, Experiment: SRX511999, Reads: SRR1219650); Silk (Sample: SAMN2720861, Experiment: SRX512001, Reads: SRR1219665). Venom gland ESTs are available under NCBI accession numbers JZ577614-JZ578096.
Abbreviations
VST: Venom gland specific transcript
ICK: Inhibitor cystine knot
CRISP: Cysteine-rich secretory protein
eCPM: Expected counts per million
MudPIT: Multidimensional protein identification technology
EST: Expressed sequence tag
ORF: Open reading frame
FPKM: Fragments per kilobase per million reads
We appreciate the analytical and intellectual contributions of Alex Lancaster, Susan Corbett, Ryan Fitzpatrick, Peter Arensburger and George Tsaprailis towards completing this manuscript. This work was supported by the National Institutes of Health (1F32GM083661-01 and 1R15GM097714-01 to JG; F32 GM78875-1A to NA), and National Science Foundation (IOS-0951886 to NA, IOS-0951061 to CH).
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.