Venom has evolved in parallel in multiple animal lineages for self-defense, prey capture or both. Venomous animals are widely distributed across the tree of life and include snakes, arachnids (including spiders and scorpions), mollusks (including cone snails and octopuses), cnidarians (including jellyfish), insects (including bees and beetles) and some teleost fishes (as reviewed in [1]). Venoms are typically complex mixtures of bioactive peptides and/or proteins, formally referred to as ‘toxins’. Toxins are highly specific in their activity, and different toxins may cause very different pharmacological effects. They act by binding to ion channels and disrupting metabolic pathways, which leads to paralysis, pain, hematological disturbances, immune reactions, necrosis and apoptosis in the envenomated animal [2, 3]. Because of their specificity, toxins can be used as experimental tools or probes to study cell mechanisms and to develop novel medicines and drugs [3]. The study of venoms, which involves categorizing the toxins that constitute a venom and their activities, has already led to novel pharmaceuticals, for example the ACE inhibitor Captopril®, developed from the venom of the snake Bothrops jararaca [4]. A glioma cell-binding toxin from the venom of the death-stalker scorpion Leiurus quinquestriatus is already in use for cancer therapeutics [5]. Other examples are the antimicrobial peptides (AMPs), also found in scorpions, used for treating infections caused by antibiotic-resistant bacteria, fungi and even viruses [6, 7]. These examples demonstrate the potential benefit of scorpion venom and toxin research in the development of novel medicines.
Because of the great diversity, variability, selectivity and applicability of toxins, it is crucial to study additional venoms, and especially to identify novel toxins that might be used to develop new drugs and medicines against, for example, ion channel-associated diseases such as autoimmune diseases, chronic pain, diabetes, epilepsy and gliomas. However, identifying new toxins for drug development is also challenging, since most peptides, including toxins, are easily broken down when ingested or cause adverse reactions when injected as a drug.
Scorpion venoms typically consist of a complex mixture of small polypeptides, enzymes, nucleotides, lipids, mucoproteins and biogenic amines, as well as unidentified substances [8]. In these venom mixtures, polypeptides and enzymes are the most prominent and toxic components [9]. Based on structure and effect, scorpion toxins are generally divided into two classes: disulfide-bridged peptides (DBPs) and non-disulfide-bridged peptides (NDBPs) [9,10,11]. DBPs have at least two cysteines that interact to form a disulfide bridge. The major scorpion toxin families that have these bridges, listed in order of medical relevance, are sodium-channel binding toxins (NaTx), potassium-channel binding toxins (KTx), chloride-channel binding toxins (ClTx), calcium-channel binding toxins (CaTx), Kunitz-type toxins and M-theraphotoxins. These toxin families are also the most lethal to humans [9,10,11]. The other class of toxins, the NDBPs, is much more diverse and less studied, owing both to their less harmful nature and to their generally lower expression levels. There are two sub-groups of NDBPs: cationic peptides and highly acidic peptides [9]. Although some studies have successfully identified multiple highly acidic peptides, these peptides have not yet been functionally categorized [12, 13]. Other NDBPs have recently been identified and functionally characterized. Typical biological activities of these NDBPs include antimicrobial, hemolytic, cytolytic and bradykinin-potentiating activities, making this group extremely diverse ([11], and as reviewed in [14]).
In transcriptome analysis, the resolution often depends on the amount of data available to annotate the transcripts and on the published annotated reference genomes that aid transcript annotation. With only two scorpion genomes currently accessible (Centruroides sculpturatus and Mesobuthus martensii, with 30,465 and 32,016 coding genes, respectively), for which toxin genes were not validated, annotating venom gland transcriptomes becomes inherently difficult [15]. Another major issue for scorpion transcriptomics, compared to e.g. snake transcriptomics, is the limited number of reference genes and proteins available for annotation. The NCBI database holds approximately 30,000 scorpion genes and over 44,000 scorpion proteins, while the same database stores over 114,000 snake genes and 323,000 snake proteins. Furthermore, most of these stored scorpion proteins are housekeeping proteins, leaving only 4500 scorpion proteins labelled as scorpion toxins, compared to the 10,000 snake toxins in the NCBI database. This greatly reduces the number of references that can be used to annotate a scorpion transcriptome, and more specifically scorpion toxin diversity [15]. In addition, the toxin diversity of scorpion venom is in general higher than that of snake venom.
To identify new biomedically useful DBP and NDBP toxins, this study focused on six scorpions belonging to three families. We included four buthid scorpions: (i) Androctonus mauritanicus, (ii) Babycurus gigas, (iii) Grosphus grandidieri and (iv) Hottentotta gentili. Of the 20 scorpion families recognized by Sharma et al. [16], the family Buthidae contains almost all species that are significantly harmful to humans. Approximately 2400 scorpion species have been described and, of the 30 or so that are considered medically relevant to humans, 29 are from the family Buthidae. This family is known for the abundance of potent ion-channel toxins in its venom. Since buthid scorpions seem the most active pharmacologically, and their venom contains medically relevant ion-channel targeting toxins, the venoms of scorpions from this family have been extensively studied [17]. However, this has diverted attention away from the other scorpion families. Studies have shown that some toxins in non-buthid scorpions possess unique biological activities and applications [18,19,20]. Therefore, we also included one scorpion from the family Iuridae, (v) Protoiurus kraepelini, and one scorpion from the family Diplocentridae, (vi) Nebo hierichonticus.
The first aim of this study was to identify the venom composition of the six scorpion species listed above through high-throughput sequencing and transcriptome analysis. The benefits of high-throughput sequencing methods are efficiency and speed. Furthermore, these methods make it easy to convert transcript coverage into expression-related data, and they increase the probability of finding novel proteins [17, 21,22,23]. In this study both the telson (stinger) and the chela (pincer) of each of the six scorpions were sequenced, resulting in two transcriptomes per scorpion. The chela transcriptomes were then used to filter housekeeping transcripts and other general regulatory transcripts out of the telson transcriptomes. The remaining telson transcripts were studied with an automated annotation pipeline. This pipeline uses datasets downloaded from UniProt [24] (downloaded on February 8th 2018) and labels each transcript as physiological, as a toxin (with its toxin family) or as unidentified. With this pipeline, the venom composition of the six scorpions could be categorized. The second aim of this study was to find novel toxins or novel toxin families. This was done by selecting highly expressed unidentified transcripts from the transcriptomes. For these transcripts, typical toxin-like features were predicted where present, such as a signal peptide (an essential structure of each scorpion toxin), a cysteine pattern and other conserved domains.
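The filtering and candidate-selection logic described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the `Transcript` fields, the exact-match filter against chela sequences (a stand-in for whatever similarity criterion is actually applied), the expression threshold and the cysteine-pattern notation are all assumptions made for illustration. Signal peptide and conserved domain prediction require dedicated tools and are not shown.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    name: str          # hypothetical transcript identifier
    protein: str       # translated amino-acid sequence (one-letter code)
    label: str         # "physiological", "toxin" or "unidentified"
    expression: float  # hypothetical expression value


def filter_telson(telson, chela_proteins):
    """Keep telson transcripts whose protein is absent from the chela set.

    Exact sequence identity is an assumption; a real analysis would
    more likely use a sequence-similarity search.
    """
    return [t for t in telson if t.protein not in chela_proteins]


def candidate_novel(transcripts, min_expression):
    """Return unidentified transcripts at or above the expression
    threshold, most highly expressed first."""
    hits = [t for t in transcripts
            if t.label == "unidentified" and t.expression >= min_expression]
    return sorted(hits, key=lambda t: t.expression, reverse=True)


def cysteine_pattern(protein):
    """Summarise cysteine spacing as 'C<gap>C...'; adjacent cysteines
    are written 'CC'. Returns '' if the sequence has no cysteine."""
    positions = [i for i, aa in enumerate(protein) if aa == "C"]
    if not positions:
        return ""
    parts = ["C"]
    for prev, cur in zip(positions, positions[1:]):
        gap = cur - prev - 1  # residues between two consecutive cysteines
        parts.append(f"{gap}C" if gap else "C")
    return "".join(parts)


# Toy data, invented for this example only.
telson = [
    Transcript("t1", "MKACDEFCGHCC", "unidentified", 50.0),
    Transcript("t2", "MSTAKLVE", "physiological", 90.0),
    Transcript("t3", "MKCAC", "unidentified", 5.0),
]
chela = {"MSTAKLVE"}

for t in candidate_novel(filter_telson(telson, chela), min_expression=10.0):
    print(t.name, cysteine_pattern(t.protein))
```

In this toy run only `t1` survives both steps: `t2` is removed because its sequence also occurs in the chela set, and `t3` falls below the illustrative expression threshold.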
This study is, to our knowledge, the first to focus on the transcriptomics of multiple scorpion families (Fig. 1). The high-throughput sequencing approach increased the probability of finding novel toxins and provided enough data for comparative transcriptomics. No transcriptomic studies have previously been conducted on either Iuridae or Diplocentridae, and no venom studies have been conducted on Diplocentridae. We expect that this wide taxonomic approach will increase the chances of identifying novel peptides.