Draft genome sequences of Hirudo medicinalis and salivary transcriptome of three closely related medicinal leeches

Background Salivary cell secretion (SCS) plays a critical role in blood feeding by medicinal leeches, making them of use for certain medical purposes even today. Results We annotated the Hirudo medicinalis genome and performed RNA-seq on salivary cells isolated from three closely related leech species, H. medicinalis, Hirudo orientalis, and Hirudo verbana. Differential expression analysis verified by proteomics identified salivary cell-specific gene expression, many of which encode previously unknown salivary components. However, the genes encoding known anticoagulants have been found to be expressed not only in salivary cells. The function-related analysis of the unique salivary cell genes enabled an update of the concept of interactions between salivary proteins and components of haemostasis. Conclusions Here we report a genome draft of Hirudo medicinalis and describe identification of novel salivary proteins and new homologs of genes encoding known anticoagulants in transcriptomes of three medicinal leech species. Our data provide new insights in genetics of blood-feeding lifestyle in leeches.

Ligand-binding domain of nuclear hormone receptor g11255

Accessory proteins
vWFD domain: We identified sequences that encode parts of a larger protein, which is the product of a single gene (g10240.t1) and a large, complex protein that comprises several domains, such as the von Willebrand factor type D domain, C8 domain, and trypsin inhibitor-like Cys-rich domain. The combination of all such domains within a single protein features spondins and mucins. Most secreted mucins are major components of mucus and mucosa and are found in salivary gland secretions, where they perform diverse functions [1].
Calreticulin: This protein family has various functions and is involved in diverse processes. The intracellular functions include chaperon activity, and these proteins play an important role in embryogenesis [2,3]. A number of studies have shown that calreticulin plays an active role in haemostasis and interacts with numerous coagulation and endothelial factors [4]. In vivo experiments revealed that these proteins exhibit antithrombotic activity [5]. Due to this property, calreticulin has been patented as an anticoagulant. This protein family is often found in omics data for haematophagous species [6,7]. They may play a modulating role in host haemostasis through binding calcium ions, which act as coagulation cofactors. Leech calreticulin was demonstrated to interact with the complement component C1q and plays an important role in restoring normal synaptic connections in the central nervous system of the leech [8].
Calmodulin: Calmodulin is a conserved multifunctional calcium binding protein containing the EF-hand domain. This protein was shown to activate intracellular eNOS in endothelial cells and to regulate collagendependent platelet aggregation [9]. Calmodulin, like calreticulin, is presumed to participate in the removal of free calcium ions.
Thioredoxin-like domain containing protein: Thioredoxin transfers electrons, acts as a chaperone, and, when coupled with glutathione, is a strong antioxidant involved in reactive oxygen species scavenging [10]. This protein has been found to play an important role in the immune response that emerged from peptide antigen-class I MHC interactions [11] and to participate in platelet activation and regulation [12]. Human proteins homologous to thioredoxin (for example, disulfide-isomerase A6) play a role in platelet aggregation induced by agonists, such as convulxin, collagen, and thrombin [13]. Some human extracellular disulfide isomerases are known to mediate platelet aggregation and thrombus formation (ERp57) [14]. They are secreted by platelets on the surface of the membrane and interact with calnexin and calreticulin.
In addition to the listed proteins, we identified globin, EGF-like protein, arginine ADP-ribosyltransferase, aspartyl aminopeptidase, and aldehyde dehydrogenase. The role of these molecules in haemostasis is currently unclear. In both proteomic and transcriptomic data, we identified cytoskeletal proteins, transcription factors, heat shock proteins, and ribosomal proteins.

Determination of the leech's species
Determination of the leech's species was performed by PCR amplification and subsequent sequencing of the gene regions of nuclear and mitochondrial ribosomal RNA [2]. For amplification of the mitochondrial genes 12S-rRNA we used oligonucleotides 12A and 12B. For amplification of ITS2 nuclear gene we used oligonucleotides IS-A and IS-B. The amplified DNA fragments were sequenced using the same oligonucleotides and compares the sequence contained in the GenBank database: Hirudo medicinalis -AY763166 (IS2) and AY763156 (12S), Hirudo orientalis -AY763170 (IS2) and AY763163 (12S), Hirudo verbana -AY763167 (IS2) and (12S).

Sanger sequencing
The sequences of the amplicons were obtained by Sanger dideoxy sequencing method using Big DyeTM Terminator v.3.1 Cycle Sequencing Kit and ABI Prism Genetic Analyzer 3730XL following the manufacturer's instructions (Applied Biosystem, USA).

Genomic DNA extraction from Hirudo medicinalis
Genomic DNA was isolated using standard molecular biology technique with modifications. Pre-sliced leech body (or 15-20 newborn leeches) were placed in a 50 ml tube and 10 ml of SDS-extraction buffer (100 mM NaCl, 10 mM Tris-Cl pH 8, 25 mM EDTA pH 8, 0.5% SDS) and 150 μL of proteinase K (Amresco, USA; 20 mg/ml) were added. The sample was incubated at 37°C for 18 h, inverting the tube three or four times during the first 2 h. Then, 10 ml of phenol was added and mixed by inverting the tube gently for 10 min. After centrifugation at 3500 g for 15 min, the aqueous phase containing the DNA was extracted a second time with 10 ml of phenol:chloroform (50:50). The white aqueous phase was transferred into a new tube and an equal volume of cold (-20°C) 98% ethanol was added. The DNA was precipitated for 20 min at -80°C, and spun at 3500 g for 20 min. The supernatant was discarded and the pellet washed with cold 70% ethanol. The ethanol was removed and the pellet resuspended in 1 ml of TE buffer (10 mM Tris-Cl pH8, 1 mM EDTA pH 8).

Cleaning up genomic DNA
To

Draft genome annotation
For performing annotation, three sets of "hints" were formed. A hint is a range into genome sequence likely belongs to a genetic feature such as exon, intron etc. The first set was generated by using H. robusta proteins. They were mapped on assembly scaffolds by using BLAT [15] software(version 34x13) with parameters "-t=dnax -q=prot". BLAT hits were converted into Augustus [16] software hints by using provided with augustus script "blat2hints.pl".
Second set was generated by using contigs from transcriptome de novo assembly. It was created by assembly reads from RNA-seq libraries with using Trinity [17] software (version r20131110) with default parameters. Contigs from transcriptome assembly aligned with genome assembly scaffolds by using BLAT software (version 34x13) with default parameters. BLAT hits were converted into augusts software hints by using "blat2hints.pl" Third sets of hints were generated by using RNA-seq reads. Hints were formed by using standard pipeline similar to described in papers [18,19] based on BLAT read alignments with genome scaffolds. Sets of hints were combined and provided to Augustus software during annotation. Annotaion was performed by using Augustus (version 3.7.1) with parameters "--genemodel=complete --gff3=on --species=fly".

Laser microdissection for mRNA-seq
To synthesize cDNA libraries from salivary gland cells of three leech species H. verbana, H. orientalis and H. medicinalis we isolated tissue fragments enriched with this cell type using laser microdissection. Leeches were stretched out on metallic foil by two forceps. Than leeches were allowed to attach this foil and briefly moved into liquid nitrogen. Deeply frozen leeches were detached from metallic foil and anterior part of the body containing salivary gland cell was cut by lancet from the rest of the body. Severed anterior part was attached to aluminum base for preparation of cryo-sections using OCT compound at -20°C. 10 µm sagital sections were made from each leech (the temperature of chamber -20°C, temperature of specimen -24°C) and placed on membrane slides (slides with PEN foil having 2 µm thickness). Slides stored at -70°C until need. For laser microdissection slides were briefly thawed and immediately fixed in 100% ice-cold methanol for 1 minute followed by air drying. Then slides rehydrate in DEPK water for 1 minute, stained in 0.5% methylene blue solution in DEPK water for 1 minute, briefly washed in DEPK water and dehydrate in 100% methanol for 3 minutes followed by air drying. Slides were placed into specimen holder of Leica Laser Microdissection System LMD7000. 0.5 µl tubes with caps filled with 30 µl of RNeasy extraction solution were placed into collector holder. Methylene blue administration allowed us to visualize arias on sections enriched with salivary gland cells by strong blue staining of these cells in contrast to adjacent, poorly stained tissues. These areas were isolated into caps filled with extraction buffer by laser microdissection. As a control we isolated muscles from adjacent areas. In all cases we collected material from 10 sections into one cap. Then cap was removed from the holder, the tube was filled with 170 µl of RNeasy extraction solution, the tube was closed with cap and briefly centrifuged. The tube was stored at -70°C until need.

Real-time PCR
RNA isolation from the LMD samples was carried out with Trizol reagent (Thermo Fisher Scientific) according to the manufacturer's instructions. Total RNA was treated with 2 units DNase I (Thermo Fisher Scientific) in the presence of 20 units ribonuclease inhibitor for 30 min at 37 °C, then phenol-chloroform extraction was carried out, followed by ethanol precipitation. The total amount of RNA was determined using Qubit® 2.0 Fluorometer (Life Technologies) and Quant-iT ™ RNA Assay Kit, 5-100 ng (Thermo Fisher Scientific). The synthesis of the first cDNA strand was performed with using the RevertAid RT Reverse Transcription Kit (Thermo Fisher Scientific). For reverse transcription reactions used 30 ng of total RNA, 100 pmol of primer T20, 20 units of a ribonuclease inhibitor and 200 ea Reverse transcriptase. The reaction mixture was incubated at 42 °C for 60 min and the reaction was terminated by heating at 70 °C for 5 min and further used for the Real-time PCR reaction.
Real-time PCR was performed using CFX96 Touch thermocycler (BioRad) with 20 µl reaction mixture with 0.5 units Taq polymerase (Thermo Fisher Scientific), appropriate buffer and dNTP, 10ng cDNA, 5 pM of each primer, and 0.1x SYBR-Green (Thermo Fisher Scientific). The samples were subjected to the following thermal profile for amplification: 95 °C for 120s , followed by 40 cycles of 94 °C for 15s; 58 °C annealing for 20s; 65 °C extension for 60s + detection.
The calculation of the relative change in the level of expression of the target genes (destabilase, hirudin) was carried out on the basis of the results of quantitative real-time PCR. The amount of cDNA corresponding to the transcripts of the target and reference genes was determined by the difference in the threshold reaction cycle (Ct) for each sample. All assays were performed with three technical replications. Data was processed using the 'delta delta Ct' method. Expression level of destabilase and hirudin was determined by using DebRT1 / DebRT2 and Hir1 / Hir2 primers, respectively. For data normalisation, house-keeping gene (reference gene) tubulin was used with TubHM1 / TubHM2 primers.
primer list:

Filter-aided sample preparation (FASP)
Freeze -dried secret leech was resuspended in 500 μL of 50 mM M Tris/HCl, pH 8.0 with 0,1% sodium deoxycholate (DCNa) (Sigma), placed in centrifugal ultrafiltration units with a nominal molecular weight cutoff of 10,000 (Millipore) and centrifuged at 14,000 × g, 20 °C, for 20 min. 400 μL of 50mM Tris/HCl, pH 8.0 with 8 M urea and 0,1% DCNa was added to the filters sample, mixed and centrifuged at 14,000×g, 20°C, for 20 min. The eluates were discarded, 400 μL of 50mM Tris/HCl, pH 8.0 with 0,1% DCNa was added to the filtration unit, and the units were centrifuged again. These two steps were repeated twice. For reducing of protein disulfide bonds 100 μL of 10 mM dithiothreitol (DTT) (BioRad) in 50mM Tris/HCl, pH 8.0 with 0,1% DCNa was added to the filters, samples were incubated at 60°C for 30 min and centrifuged at 14,000×g, 20°C, for 10 min. For alkylating of cysteine residues 100 μL 55 mM iodoacetamide (BioRad) was added to the filters, samples were incubated at room temperature and in dark for 30 min and centrifuged at 14,000×g, 20°C, for 10 min. Filters were washed twice with 200 μL of 50 mM ammonium bicarbonate with 0,1% DCNa. Proteins were digested in 40 μL of 50 mM ammonium bicarbonate with 0,1% DCNa at 37°C for 16 h, using trypsin (Trypsin Gold, Mass Spectrometry Grade, Promega) at an enzyme to protein ratio of 1:50. The released peptides were collected by centrifugation at 14,000×g for 15 min followed by two washes with 50 μL 0.5 M NaCl. The combined hydrolyzate was desalted using a Discovery DSC-18 Tube (Supelco) according to the manufacturer protocol. Peptides were eluted with 700 µL 75% ACN, 0.1% TFA, dried in a SpeedVac (Labconco) and resuspended in 3% ACN, 0.1% TFA to the final concentration of 5 µg/µL.

Gel-free trypsin digestion using surfactant RapiGest SF (Waters)
Freeze-dried leeches secret was treated with 5 μl of 10% RapiGest SF (Waters) and 1 μl nuclease mix (GE Healthcare) for 30 min at 4 °C, then resuspended in 45 μl of 100 mM ammonium bicarbonate, vortexed and heated at 100 °C for 5 min. After cooling to room temperature insoluble material was removed by centrifugation at 15000 g for 5 min. Supernatant was separated and protein concentration was determined by BCA (BCA Assay Kit, Sigma). Protein cysteine bonds were reduced with 10 mM DTT for 30 min at 60°C and alkylated with 30 mM iodoacetamide in the dark at RT for 30 min. The step with adding DTT was repeated. After alkylation, trypsin (Trypsin Gold, Mass Spectrometry Grade, Promega) was added to supernatant in ratio trypsin : protein equal 1 : 50 and incubated at 37°C for night. For inactivation of trypsin activity and degradation of the acid-labile surfactant stock solution of trifluoroacetic acid (Sigma) (final concentration in sample 0,5%) was added to digest, incubated at 37°C for 45 min and centrifugated at 15000 g for 10 min for removing of surfactant. Hydrolyzate was desalted using a Discovery DSC-18 Tube (Supelco) according to the manufacturer protocol. Peptides were eluted with 700 µL 75% ACN, 0.1% TFA, dried in a SpeedVac (Labconco) and resuspended in 3% ACN, 0.1% TFA to the final concentration of 5 µg/µL.

In-gel digestion of protein sample
Freeze -dried secret leech was resuspended in 500 μL of 50 mM Tris/HCl, pH 6,8, placed in centrifugal ultrafiltration units with a nominal molecular weight cutoff of 3,000 (Millipore) and centrifuged at 14,000 × g, 20°C, for 20 min. This step was repeated twice. Protein concentration in concentrated sample was determined by Bradford (Bradford Protein Assay Kit, BioRad). Then Laemmli buffer was added to concentrated sample (1 : 1), heated for 5 min at 95 ° C and centrifuged at 15,000 g for 10 min. The supernatant was separated and proteins separated by 7.5 % SDS-PAGE. After electrophoresis was stopped the gel was stained with Coomassie G-250. For digestion by trypsin, gel slices for each sample were cut into 3 pieces. Each of these pieces was cut into small (1 x 1 mm) cubes and transferred into tube. Gel samples were destained with 50% ACN in 50 mM ammonium bicarbonate at 50 ° C. Protein disulfide bonds were reduced with 10 mM DTT (BioRad) in 100 mM ammonium bicarbonate at 56°C for 30 min and subsequently alkylated with 55 mM IAA in 100 mM ammonium bicarbonate at room temperature for 20 min (in complete darkness). To neutralize unbound IAA 10 mM DTT solution in 100 mM ammonium bicarbonate was added to gel sample again. Upon alkylation the gel samples dehydrated by addition of neat ACN. After removal of the ACN, the samples were subjected to in-gel digestion. Solution of trypsin (20 nanog / µl ) (Trypsin Gold, Mass Spectrometry Grade, Promega) in 40 mM ammonium bicarbonate and 10 % ACN was added to the dried gel pieces so that the solution covered the gel pieces, and incubated at first 1 hour on ice, and then 16 hours at 37 °C. The resulting tryptic peptides extracted from the gel successively the following solutions : 5 % formic acid, 50 % ACN and 5 % formic acid, 75 % ACN and 5 % formic acid. All solutions were prepared on water mQ. Finally the extracted peptides were combined, dried on a vacuum concentrator CentriVap (Labconco) and dissolved in 50 µl of a solution containing 3% ACN and 0.1% TFA. The obtained peptides extract was desalted using a Discovery DSC-18 Tube (Supelco) according to the manufacturer protocol. Peptides were eluted with 700 µL 75% ACN, 0.1% TFA, dried in a SpeedVac (Labconco) and resuspended in 3% ACN, 0.1% TFA to the final concentration of 5 µg/µL.

TripleTOF
Analysis was performed on a TripleTOF 5600+ mass-spectrometer with a NanoSpray III ion source (Sciex, USA) coupled to a NanoLC Ultra 2D+ nano-HPLC system (Eksigent, USA). The HPLC system was configured in a trap-elute mode. For a sample loading buffer and buffer A, the mix of 98.9% water, 1% methanol, 0.1% formic acid (v/v) was used. Buffer B was 99.9% acetonitrile, 0.1% formic acid (v/v). Samples were loaded on a trap column Chrom XP C18 3 mm 120 Å 350 mm*0.5 mm (Eksigent, USA) at a flow rate of 3.5 ul/min over 10 min and eluted through the separation column 3C18-CL-120 (3 µm 120 Å) 0.075 mm*150 mm (Eksigent, USA) at a flow rate of 300 nl/min. The gradient was from 5 to 40% of buffer B in 120 min. The column and the precolumn were regenerated between runs by washing with 95% of buffer B for 7 min and equilibrated with 5% of buffer B for 25 min. Between the samples to ensure the absence of carryover both the column and the precolumn were thoroughly washed with a blank injection trap-elute gradient that included five 7 minute 5-95-95-5%B waves followed by 25 min 5%B equilibration.
Mass spectra were acquired in a positive ion mode. Information-dependent mass-spectrometer experiment included 1 survey MS1 scan followed by 50 dependent MS2 scans. MS1 acquisition parameters were as follows: mass range for analysis and subsequent ion selection for MS2 analysis was 300-1250 m/z, signal accumulation time was 250 ms. Ions for MS2 analysis were selected on the basis of intensity with the threshold of 400 cps and the charge state from 2 to 5. MS2 acquisition parameters were as follows: resolution of quadrupole was set to UNIT (0.7 Da), measurement mass range was 200-1800 m/z, optimization of ion beam focus was to obtain maximal sensitivity, signal accumulation time was 50 ms for each parent ion. Collision activated dissociation was performed with nitrogen gas with collision energy ramping from 25 to 55 V within 50 ms signal accumulation time. Analyzed parent ions were sent to dynamic exclusion list for 15 sec in order to get the next MS2 spectra of the same compound around its chromatographic peak apex (minimum peak width throughout the gradient was about 30 s).
For protein identification, .wiff data files were analyzed with ProteinPilot 4.5 revision 1656 (ABSciex) using search algorithm Paragon 4.5.0.0 revision 1654 (Sciex, USA) and a standard set of identification settings to search against . The following parameters were used: alkylation of cysteine -iodoacetamide, trypsin digestion, TripleTOF 5600 equipment, species: none, thorough search with additional statistical FDR analysis, and therefore detected protein threshold equal or more than 0,05. Peptide identifications were processed with default settings by a ProteinPilot software built-in ProGroup algorithm. The protein identification list was obtained with the threshold reliable protein ID unused score calculated by ProteomicS Performance Evaluation Pipeline Software (PSPEP) algorithm for 1% global FDR from fit. Then accidental decoy proteins identification was removed as an all peptides containing nontryptic digestion, internal cleavages sites, all modifications except alkylated cysteine and nonmodified cysteine was removed too. All proteins covered with remaining peptides formed final proteins list.

Orbitrap
Peptide separations were carried out using an Ultimate 3000 RSLCnano system (Dionex). The sample was loaded into a trap column (Acclaim PepMap, 2 cm × 75 μm inner diameter, C18, 3 μm, 100 A) (Thermo Scientific) at 2 μl/min with 0.1% formic acid in water. After 5 min, the trap column was set on-line with an analytical column (Zorbax 300SB-C18, 15 cm × 75 μm inner diameter, 3,5μm) (Agilent). Peptide elution was performed by applying a mixture of solvents A and B. Solvent A was HPLC grade water with 0.1% (v/v) formic acid, and solvent B was 80%(v/v) HPLC grade acetonitrile/water with 0.1% (v/v) formic acid. Separations were performed by applying a linear gradient from 5% to 60% solvent B at 400 nl/min over 85 min followed by a washing step (10 min at 99% solvent B) and an equilibration step (25min at 5% solvent B).
MS analyses were performed using a Q-Exactive HF mass spectrometer (Thermo Scientific, Germany). A nanospray Flex ion source with 1800 V ionization voltage and 200°C capillary temperature were used. MS data were acquired in a data-dependent strategy selecting the fragmentation events based on the precursor abundance in the survey scan. MS survey scans were acquired at a resolution of 70000, scan range 400-1200 m/z, target AGC values of 1 × 106, maximum injection time 50 ms. Fragmentation was performed for 25 most abundant ions with a normalized collision energy of 35, dynamic exclusion 10 s. MS/MS scans were acquired with resolution of 17500, target AGC values of 1 × 105, maximum injection time 100 ms, isolation window 2,0 m/z. Obtained mass-spectra in thermo.raw format was converted in non-calibrated peaklists (.mgf extension) with MSconvert program from open source ProteoWizard program bundle. This peaklists were analyzed with ProteinPilot 4.5 revision 1656 (ABSciex) as was mentioned above.
OR The gradient was start from 100% of buffer A during 9 minutes, then from 100% of buffer A to 5% of buffer B in 1 minutes, then from 5% to 30% of buffer B in 140 minutes and finally from 30% to 99% in 5 minutes (for total proteome FASP-DCA-Urea, RapiGest). The column and precolumn were regenerated between runs by washing with 99% of buffer B for 10 minutes and equilibrated with 100% of buffer A for 20 minutes. Between the different samples to ensure the absence of carryover both the column and the precolumn were thoroughly washed with two blank injection: first was made with trap-elute gradient that included 12 10 minute 0-5-99-0%B waves followed by 25 min 100%A equilibration and the second was blank injection with standard separation gradient. Between the different pieces of the same gel only blank injection with standard separation gradient was performed.
Ion source parameters were as follows: drying gas (nitrogen) flow rate -4 l/min, heater temperature -170°С, ionization voltage -1700V, positive ion mode. Information-dependent mass-spectrometer experiment included 1 survey MS1 scan followed by 20 dependent MS2 scans with frequency of ToF spectra acquisition 4 Hz, total cycle time 2.2 sec. MS1 acquisition parameters were as follows: mass range for analysis and subsequent ion selection for MS2 analysis was 200-1500 m/z. Ions for MS2 analysis were selected on the basis of intensity with the threshold of 1000 cps and the charge state from 2 to 5. MS2 acquisition parameters were as follows: measurement mass range was 200-1500 m/z. Collision activated dissociation was performed with nitrogen gas with collision energy ramping from 25 to 55 V depending on m/z of precursor ion. Analyzed parent ions were sent to dynamic exclusion list for 0.25 min in order to get the next MS2 spectra of the same compound around its chromatographic peak apex (minimum peak width throughout the gradient was about 30 s). Raw spectra with the use of program Compass DataAnalysis (Bruker Daltonics GmbH, Germany) were converted in non-calibrated peaklists (.mgf extension). This peaklists were analyzed with ProteinPilot 4.5 revision 1656 (ABSciex) as was mentioned above.