Transcriptional profiling of identified neurons in leech

While leeches in the genus Hirudo have long been models for neurobiology, the molecular underpinnings of nervous system structure and function in this group remain largely unknown. To begin to bridge this gap, we performed RNASeq on pools of identified neurons of the central nervous system (CNS): sensory T (touch), P (pressure) and N (nociception) neurons; neurosecretory Retzius cells; and ganglia from which these four cell types had been removed. Bioinformatic analyses identified 3565 putative genes whose expression differed significantly among the samples. These genes clustered into 9 groups which could be associated with one or more of the identified cell types. We verified predicted expression patterns through in situ hybridization on whole CNS ganglia, and found that orthologous genes were for the most part similarly expressed in a divergent leech genus, suggesting evolutionarily conserved roles for these genes. Transcriptional profiling allowed us to identify candidate phenotype-defining genes from expanded gene families. Thus, we identified one of eight hyperpolarization-activated cyclic-nucleotide gated (HCN) channels as a candidate for mediating the prominent sag current in P neurons, and found that one of five inositol triphosphate receptors (IP3Rs), representing a sub-family of IP3Rs absent from vertebrate genomes, is expressed with high specificity in T cells. We also identified one of two piezo genes, two of ~ 65 deg/enac genes, and one of at least 16 transient receptor potential (trp) genes as prime candidates for involvement in sensory transduction in the three distinct classes of leech mechanosensory neurons. Our study defines distinct transcriptional profiles for four different neuronal types within the leech CNS, in addition to providing a second ganglionic transcriptome for the species. From these data we identified five gene families that may facilitate the sensory capabilities of these neurons, thus laying the basis for future work leveraging the strengths of the leech system to investigate the molecular processes underlying and linking mechanosensation, cell type specification, and behavior.

Background A major evolutionary advantage arising from multicellularity has been the possibility for species to generate "professional" cell types whose highly differentiated and more or less fixed phenotypes allow them to specialize in particular functions. The broad outlines of how this is achieved through cascading interactions of unequal cell divisions and inherited determinants, intercellular signaling, and transcriptional networks is understood, but numerous questions remain. Nowhere is this more evident than in the nervous system, where the diversity of morphologically defined cell types is further complicated by molecular and physiological distinctions [1][2][3]. Large scale transcriptional profiling at the single cell level (scRNAseq) is a powerful approach to this problem for complex vertebrate nervous systems, but at present this approach suffers from major limitations.
First is the trade-off between sequencing depth and the mRNA content of the starting sample. Low abundance transcripts are apt to be missing from the transcriptional profile altogether for scRNASeq, and stochastic variation in which transcripts are counted makes it hard to know which profiles mark phenotypically equivalent cells as opposed to subtle but significant sub-types. A second limitation of common scRNAseq technologies is the need to dissociate the tissue into its constituent cells as part of the procedure, with loss of spatial information regarding cell identity. While spatially resolved transcription profiling techniques are emerging [4], these approaches would be expected to reduce sensitivity to transcriptional differences even further than standard single-cell approaches.
Certain invertebrate nervous systems offer advantages in addressing these problems, just as they proved advantageous for elucidating aspects of neural mechanisms and neural circuits underlying behavior [5][6][7]. For example, their neurons are often larger in size than those in vertebrates--many neurons in gastropod molluscs, for example, are so large that even individual neurons can be transcriptionally profiled to a much greater depth than is possible for mammalian neurons [8]. In invertebrates, moreover, one or a few similar neurons coordinate functions that in vertebrates require hundreds or thousands of similar neurons. This together with the extensive body of previous work in certain invertebrate systems offers the ability to study individual cells with well-defined physiological properties and functions [6]. In addition, the comparative approach inherent in studying a range of invertebrate systems provides an evolutionary perspective to investigations of how neuronal phenotypes are defined at the molecular level.
Among invertebrates, leeches, primarily the medicinal leech species Hirudo medicinalis and H. verbana, have long been models for neurobiology. Pioneering neuroanatomical studies in the late nineteenth century [9] laid the basis for work which combines different experimental approaches to study facets of neurobiology and neurodevelopment ranging from behavior to ion channel function in life stages from the embryo to the adult [10][11][12][13][14][15].
The leech CNS comprises a ventral nerve cord of 32 segmental ganglia connected at its anterior end to a non-segmental dorsal ganglion. Twenty-one segmental ganglia innervate segments in the midbody of the animal. Anteriorly, four fused segmental ganglia constitute the ventral portion of the head-brain, connected to the non-segmental dorsal ganglion by circumesophageal nerves; seven fused segmental ganglia make up a tailbrain that innervates the posterior sucker.
In Hirudo, most segmental ganglia contain approximately 400 bilaterally paired, individually identifiable neurons distributed in a stereotyped manner. Many identified neurons including sensory neurons, motoneurons and the large neuromodulatory serotonergic Retzius neurons are conserved among different segments in each individual, among individuals within each species, and among different species. The physiological characteristics and behavioral roles are well-known for many of these neurons, including three distinct classes of mechanosensory neurons, the T (touch), P (pressure), and N (nociceptive) cells, whose large cell bodies can be visually identified by their size and position within the segmental ganglia (Fig. 1a).
Three bilateral pairs of T cells exhibit brief action potentials, each followed by a rapidly recovering afterhyperpolarization [16]. In response to gentle touch or water flow, T cells fire rapidly adapting bursts of action potentials. The three ipsilateral T cells have partially overlapping dorsal, ventral and lateral receptive fields, respectively, within the ipsilateral body wall [16,17]. Two bilateral pairs of P cells exhibit somewhat slower action potentials and a marked sag potential in response to hyperpolarizing current injections [18]. Their mechanical thresholds are higher than those of T cells, and unlike T cells they give sustained responses during mechanical stimulation [16,[19][20][21]. The medial and lateral cell bodies of P cells innervate partially overlapping dorsal and ventral receptive fields, respectively. P cells exemplify the use of population coding vectors to denote the position of stimuli [19,20]. The two pairs of N cells in each ganglion are polymodal nociceptors. In addition to exhibiting the highest threshold to mechanical stimulation of the skin, they respond to other noxious stimuli such as acid, high osmolarity, heat and capsaicin [22]. Their action potentials are followed by prominent afterhyperpolarizations [16]. In addition to overlapping receptive fields in the body wall, the N cells with more medial cell bodies also innervate the gut [17]; medial and lateral N cells also differ in their sensitivity to capsaicin and acid [22].
In addition to the three classes of mechanosensory neurons, each ganglion contains a prominent pair of serotonergic neuromodulatory Retzius (Rz) cells [23,24]. The electrically coupled Retzius cells have the largest cell bodies in the ganglion [25]; depending on the pattern of electrical activity they may release serotonin from nerve endings, from the axon or from the soma [26].
Identified neurons in the adult leech can be removed from ganglia individually; isolated neurons maintain their electrophysiological properties and may grow or form connections with appropriate targets [27,28]. Thus, the leech Hirudo provides a system in which the physiological and behavioral functions of distinct, clearly defined classes of mechanosensory (T, P and N cells) and neurosecretory (Rz) neurons can be examined in detail. Here, we have used the fact that these isolated cells robustly maintain their specific phenotypes to remove and pool neurons of specific phenotypes for RNA extraction and sequencing. For comparison, we have also profiled the transcriptome of the ganglia from which all four of these cell types had been removed. Bioinformatic analyses identified more than two thousand candidate genes whose expression differed significantly among the samples; these genes formed clusters which could be associated to varying extents with one or more of the identified cell types. We verified predicted expression patterns for selected genes through in situ hybridization (ISH) on whole leech ganglia. We also found that orthologous genes were (with certain exceptions) similarly expressed in ganglia of a rather distantly related leech, Helobdella austinensis, suggesting that the genes we assayed play evolutionarily conserved roles in this group.
In combination with genome data, transcriptional profiling also allowed us to identify candidate genes for future experiments from among expanded gene families, including specific piezo, deg/enac and trp genes as possible mechanotransducers in the T, P and N neurons.

Each neuronal phenotype exhibits a distinctive transcriptional profile
To determine the transcriptional profile of the four cell types, we first created a reference transcriptome by combining the RNA-Seq libraries made from pools of identified T, P, N, and Rz neurons, and libraries made from the remainder of the ganglion after dissection of the four cell types, hereinafter referred to as ganglion-minus (Gm). Three biological replicates were prepared and sequenced for each cell/tissue type, for a total of 15 libraries (Table 1) One of the three Rz replicate libraries yielded a mapping rate 20% lower than any of the other 14 libraries, and was not included in this analysis. In addition, one P cell replicate was confirmed to be an outlier through robustPCA analysis [29] and one N cell replicate was a borderline outlier by robustPCA analysis and then confirmed to be an outlier by its placement on an MDS graph (Fig. S1). Both were removed from the subsequent analyses.
We processed and assembled the resultant reads with Trinity [30] to obtain a transcriptome containing 113, 388 "isoforms", sequences that may represent variants due to processes such as differential splicing. These isoforms were then grouped into 51,875 "genes" (Supplementary File 1), or unique sequence groups generated by Trinity. The assembled transcriptome has an average sequence length of 786 bp and an N50 of 1173 bp. By comparison, the average transcript length of the 24,432 predicted genes (gene models) in the draft genome assembled for the leech species Helobdella robusta is 1.2 kb and their N50 is 1763 bp. Thus, we attribute the discrepancy between the number of "genes" in the H. verbana transcriptome and the number of gene models in the Helobdella genome to a failure to assemble full Hirudo transcripts, so that two or more "genes" correspond to different parts of a single predicted Helobdella gene. This is supported by the fact that in a reduced dataset of only those transcripts that contain an open reading frame of 50 amino acids or more as measured by Transd e c o d e r ( h t t p s : / / g i t h u b . c o m / T r a n s D e c o d e r / TransDecoder/wiki) there are 35,590 Trinity "genes" and 89,363 "transcripts", suggesting that the remaining genes are either non-coding sequence (long non-coding RNAs or UTR) or misassembled transcripts (Supplementary File 2). As all of the transcripts had total read support of 1 TPM (Transcript per Million [31];) or greater, it is likely that they are short or 3′-biased. It is also possible that different potential splice isoforms were separated into separate "genes" when in fact they arise from the same genomic locus. In any case, the sequencing depth achieved by pooling large numbers of phenotypically distinct cell types should prove an important resource for future profiling work at the level of individual neurons.
Our BLAST analysis of the transcriptome assembly revealed that 17,632 (34%) of the "genes" had a significant (e-value < 0.05) BLAST hit in the H. robusta gene model database; consistent with the reasoning presented above, there were many cases in which two or more Hirudo "genes" mapped to a single Helobdella gene model. As two Hirudo medicinalis genomes were recently published [32,33], we also compared the transcriptome to the gene models of one [33] and showed that 28,021 (54%) of the "genes" had a significant BLAST hit. A similar but slightly lower fraction of the Hirudo "genes" 15,646 (30%) had a significant BLAST hit against the SwissProt non-redundant database [34]; we speculate that many of the Hirudo "genes" that failed to map to either the Helobdella or Hirudo genomes or the SwissProt database represent the more rapidly diverging untranslated regions (UTRs) of the transcripts. In the reduced ORF-filtered transcriptome the BLAST rates were 17, 632 (50%) for Helobdella proteins, 28,020 (79%) for H. medicinalis proteins, and 15,646 (44%) for the SWISS-Prot database. Finally, when we mapped the original sequence libraries back to the transcriptome we found that all libraries mapped within a range of 78 to 93.6%, suggesting that the transcriptome is representative of the input sequences.
To test the prediction that different cell types have distinct transcriptional profiles, we performed a Multi-Dimensional Scaling (MDS) analysis of the twelve libraries on genes that had a TPM > 2. As expected, the twelve samples segregated into groups roughly corresponding to cell type. The highest degree of internal consistency was for the three T cell transcriptomes and for the three Gm transcriptomes. The Rz cells exhibited the most divergent transcriptional profiles from the other sample types, consistent with their neurosecretory function (Fig.  1b). The N and P cells showed the highest similarity to each other, which correlates with their electrophysiological similarities.

Comparisons among cell types reveal clusters of differentially expressed genes
To determine the functional implications of the differences in transcriptional profiles, we performed all ten possible pairwise comparisons of the five different transcriptomes. These comparisons yielded a set of 3565 differentially expressed genes ( comparisons summarized in Table 2 was 7229, reflecting the fact that many of the 3565 differentially expressed genes occurred in more than one of the pairwise comparisons. The similarities and differences in overall transcriptional profiles among the samples, and in the expression of individual genes across samples, were explored using a hierarchical clustering analysis on the biological replicates and on the individual expression profiles of the differentially expressed genes described above (Fig. 2a). Individual samples grouped primarily by cell type (Fig. 1b).
Naively, one might have expected to obtain five clearly separated clusters of similar gene expression profiles, corresponding to the five sample types. But this was not the case--there was no discrimination height on the gene clustering tree that delineated five clusters correlating with the five sample types (Fig. 2b). At least in retrospect, this initial expectation seems unlikely, given the fact that multiple genes showed up in more than one pairwise comparison (Table 2), and also given the heterogeneity of cell types within the Gm samples. Instead, once the genes were clustered by similarities in expression, we chose a discrimination height on the tree (dotted line in Fig. 2b) that highlighted at least one distinct cluster for each cell type, giving nine clusters of similarly expressed genes for further analysis ( Fig. 2b and S2).
The mean expression patterns of the clusters ( Fig. 2c and S2) reveal that most of them contain genes that are enriched in particular sample types: genes in Clusters 2  T 0 and 8 tended to be enriched in Retzius cells; genes in Clusters 4 and 5 tended to be enriched in P cells; genes in Cluster 3 tended to be enriched in T cells; genes in Cluster 7 tended to be enriched in N cells; genes in Clusters 1 and 9 tended to be enriched in Gm samples; and those in Cluster 6 were only statistically associated with downregulation in Retzius cells.
To assess the statistical significance of the correlations between expression of the genes in each cluster with the enrichment in a particular cell type, we performed a cluster-trait analysis on the gene expression clusters (Fig. 2d,

In situ hybridization reveals gene expression predicted by transcriptional profiling
To test the validity of the gene expression profiles emerging from the RNASeq analyses, we performed ISH on isolated Hirudo ganglia, using probes for genes selected from various clusters (Table 3). This validation was of particular importance because of the possibility for errors in generating the pools of cells used for transcriptional profiling. For example, while the crosscontamination rate for the Rz samples should be near zero, because these cells are unmistakable due to their uniquely large size and position in the ganglion, P cell samples might be contaminated with occasional Leydig cells [35], which are similar in size and position to lateral P cells, notwithstanding differences in pigmentation. Similarly, rare cross-contamination of T and N neuron samples may occur because these two cell types, while differing in size, occupy adjacent and variable locations in the anterior portion of the ganglion.
For our ISH analysis, we sought to focus on genes with relatively abundant transcripts and relatively selective expression in one of the four cell types under investigation. For this purpose, we first re-examined the set of 3565 differentially expressed genes to identify those with: 1) BLAST e-values below 0.05 to both the SwissProt and Helobdella protein databases; 2) TPM counts above 20 in at least 2 biological replicates, and; 3) a standard error of the mean TPM count not exceeding 40% of the mean of the cell type with the highest expression level. Finally, we excluded genes from Cluster 6, which was not significantly associated with any of the four neuronal subtypes of interest, and Clusters 1 and 9, because they represents genes that were not expected to be enriched in the neuronal phenotypes of interest, because they are associated primarily with the Gm samples. From the resulting list we also excluded those with GO Terms indicating mitochondrial or ribosomal functions, leaving a list of 415 candidates (Supplementary File 4) from which we chose genes of interest as described below for ISH analysis (Table 3).
Current ISH protocols for adult Hirudo ganglia require the microsurgical removal, prior to fixation, of a protective sheath that encapsulates the ganglion [36][37][38]. Unfortunately, this results in the occasional loss or displacement of cell bodies, especially near the lateral edges of the ganglion where the cuts are made. Thus, the counts and spatial distribution of neuronal cell bodies observed in ISH experiments are more variable than in intact ganglia. Nonetheless, as described below, all of the genes tested exhibited characteristic patterns of In what follows, the gene names used result from molecular phylogenetic and BLAST analyses, as will be explained later in this paper. Consistent with the serotonergic character of the Rz neurons (e.g., [24,39]), genes encoding proteins involved in biosynthesis and transport of serotonin were prominent components of their transcriptional profile. In particular, Cluster 2 was enriched for transcripts encoding tryptophan hydroxylase (hve-tph, Trinity Gene ID comp20323_c0) and dopa decarboxylase (hve-ddc, comp24417_c0), two enzymes required for serotonin biosynthesis.
Hve-tph transcripts were readily detected by ISH in the giant Rz neurons, which occupy a prominent anteromedial location on the ventral surface of the ganglion. A strong ISH signal for hve-tph was also observed in two other pairs of smaller serotonergic neurons (cell pairs 21 and 61 [40];; Fig. 3a). Surprisingly, however, while hveddc was also readily detected in Rz neurons, we failed to detect an ISH signal for this transcript in cells 21 or 61 (Fig. 3b). The contrast between the ISH results for hvetph and hve-ddc was consistent with the differences in the transcript levels obtained from the Gm samples-those samples showed significantly higher counts for hve-tph transcripts than did the T, P, or N samples (Fig.  3a), whereas the counts for hve-ddc were uniformly low in all but the Rz samples (Fig. 3b).
ISH for three genes whose transcripts were relatively abundant and enriched in the P transcriptomes showed expression in both the P and N neurons (Fig. 3c-e). These genes, all associated with Cluster 4, encode a protocadherin homolog (hve-pcad1, comp15454_c0); a hyperpolarization-activated, cyclic nucleotide-gated cation channel (hve-hcn4, comp20462_ c0); and a voltage-gated potassium channel (hve-vgkc1, comp25036_c1). Probes for these three genes labeled combinations of N and P cells that varied somewhat from ganglion to ganglion and even across the midline of individual ganglia (Fig. 3c-e). We attribute this variability, at least in part, to loss or displacement of cells caused by desheathing the ganglia, but an alternative and interesting possibility is that gene expression differs between the medial and lateral members of the N and P cell pairs. Pharmacological and anatomical studies have revealed differences in innervation and sensory coding by the medial and lateral N Fig. 3 In situ hybridization (ISH) verification of expression patterns found by RNASeq. Eight Trinity genes from four clusters were chosen to represent the widest variety of potential staining patterns. a-h. In each panel: the graph at left denotes the expression levels of a Trinity gene in the RNASeq analysis, with the cell type on the X-axis and the average normalized read count of the transcript in transcripts per kilobase million (TPM) on the Y-axis; error bars denote the standard error of the mean; the micrograph at right shows a typical ISH staining pattern for the Trinity gene in an adult H. verbana (Hve) ganglion. All ganglia are oriented ventral-side up unless otherwise indicated. Tph = tryptophan hydroxylase, ddc = dopa decarboxylase, pcad1 = protocadherin 1, hcn4 = hyperpolarization-activated cyclic nucleotide-gated channel 4, vgkc1 = voltage-gated potassium channel 1, cola = collagen-alpha, ip3rb2 = inositol triphosphate receptor b2, hyp1 = hypothetical 1 cells [22], which we expect to reflect differences in their gene expression patterns.
In the transcriptional profiles of T neurons, two genes in Cluster 3 stood out rather unexpectedly, because their biochemical functions seem to correspond to ubiquitously expressed "housekeeping" genes. One encodes a putative collagen-alpha (hve-cola, comp13598_c0) and the other encodes a putative receptor for inositol triphosphate (hve-ip3rb2, comp24045_c0). For both these genes, the ISH pattern was exceptionally clear, showing three bilateral pairs of labeled neurons in the anterolateral portions of the ganglia correlating with the known positions of the T neuron cell bodies ( Fig. 3f and g). In light of these results, we speculate that the seemingly increased transcript counts for these genes in the N cell transcriptomes ( Fig. 3f and g) may represent errors in cell identification during sample preparation.
As a further test for the inferred identity of the neurons expressing hve-ip3rb2, and to illustrate the potential for combining molecular and physiological approaches in Hirudo ganglia, we used standard techniques to identify T cells by intracellular electrical recordings, and then labeled the identified T neurons by iontophoretic injection of a charged, fixable fluorescent dextran (see Materials and Methods for details). When such preparations were fixed and processed for hve-ip3rb2 ISH, the ISH product co-localized with the fluorescently labeled neurons, as expected (Fig. 4).
In addition to the genes discussed above, we also carried out ISH for a transcript representing a gene for which orthologs are known only from other annelid species. Such genes are candidates for evolutionary novelties, representing hypothetical (hyp) proteins. We chose one such candidate (comp21991_c0, hve-hyp1) from Cluster 5. While hve-hyp1 did not satisfy the criteria for selection above due to a lack of similarity to any proteins in the SWISS-Prot database, we chose it for further analysis to begin to probe how lineage-specific genes may be involved in neuronal specification and function.
Consistent with the read counts (Fig. 3h), hve-hyp1 was expressed primarily in N and P neurons (Fig. 3h).
To explore the extent to which the neuronal markers identified in Hirudo may be applicable to other leech species, we identified Helobdella orthologs for several of the differentially expressed Hirudo genes described above, and then performed ISH for their transcripts on Helobdella embryos at stage 10-11 of development, by which time the nervous system is fairly well differentiated and yet ISH can be carried out on intact embryos without dissection (Fig. S2).
As expected, two Rz markers (hau-tph and hau-ddc) were expressed in the highly conserved serotonergic Rz neurons [41]. Intriguingly, hau-tph was expressed also in the location of previously described ventrolateral, dorsolateral, and posteromedial serotonergic neurons (Fig. S2 [41];), but as we had observed for Hirudo (Fig. 3), hauddc was not expressed in these cells.
The expression patterns observed for the Helobdella genes hau-pcad1 and hau-hcn4 were also similar to those of their Hirudo orthologs. Because ISH on Helobdella was performed without dissecting the sheath surrounding the ganglion, the expression patterns for these genes were not disrupted by loss or displacement of cells during processing, and clearly repeating patterns were observed. For hau-pcad1, four pairs of cells were observed in most ganglia, in positions expected for the bilateral pairs of medial and lateral N and P neurons, but the expression levels were lower in the putative medial N cell than in the other three cells. For hau-hcn4 three pairs of cells were observed in most ganglia, corresponding to both of the putative P neurons and the lateral but not the medial N neuron.
In contrast to the results for putative Rz, N and P neuron markers, neither of the T cell markers surveyed (hau-ip3rb2 and hau-cola) showed noticeably stronger expression in any particular set of ganglionic neurons (Fig. S2). This result suggests that either the T cells in Helobdella use different genes for their specification or shows that RDA signal in neuronal somata is masked by ISH product. Background signal is due to autofluorescence arising during ISH processing function, or that the T cells lag behind other neurons in their development and were not yet expressing these genes at stage 11.

Molecular phylogeny of amino acid decarboxylases (AADs)
It is paradoxical that the aromatic amino acid decarboxylase gene enriched in Rz neurons was not detected in other known serotonergic neurons in either Hirudo or Helobdella. One explanation for this observation is that the other neurons are recycling serotonin, taking up serotonin released by the neuromodulatory Rz neurons and then releasing it from their own synapses. This seems unparsimonious, however, given that the non-Rz serotonergic neurons do express tryptophan hydroxylase in both species.
Alternatively, these neurons may use a different aromatic AAD to synthesize serotonin. The Helobdella genome encodes four genes annotated as aromatic AADs (AAADs). Three of these genes (JGI gene models 84403, 84539 and 101612) represent comparatively recent duplication events--they are adjacent to one another on genome scaffold 40 and exhibit 57-68% amino acid sequence identity. The fourth gene (JGI gene model 186120), which lies on a separate scaffold and shows only 41-52% sequence identity to the other three, is the ortholog of hve-ddc, the gene expressed in Rz neurons.
To explore this issue further, we constructed a molecular phylogeny for the set of AAD sequences obtained by BLASTing a database of non-redundant protein sequences from two model organisms (mouse Mus musculus and fruit fly Drosophila melanogaster), and five sequenced lophotrochozoan species (Helobdella robusta, polychaete annelid Capitella teleta, bivalve Crassostrea gigas, cephalopod Octopus bimaculoides, and gastropod Lottia gigantea). Mus and Drosophila were chosen to represent the deuterostomes and ecdysozoans, respectively, because the AAADs used for serotonin biosynthesis in these species are known [42,43].
The genes recovered form two main clades (Fig. 5). One clade comprises acidic amino acid decarboxylases, including a paraphyletic group of glutamic acid decarboxylases (GADs) used to synthesize the neurotransmitter GABA. This GAD subclade included sequences from five of the species. The other clade comprises the AAADs. The AAAD clade contains three subclades with broad phylogenetic representation--histidine decarboxylases (HDs, used in histamine biosynthesis, apparently missing in leeches), tyrosine decarboxylases (TDs, used in tyramine and octopamine biosynthesis) and DOPA decarboxylases (DDCs, used in dopamine and serotonin biosynthesis). The gene expressed in leech Rz neurons belongs to the DDC clade, as do the mouse and fly genes used in serotonin biosynthesis. The other three leech AAADs group within the TD subclade. We speculate that one or more of these genes has been co-opted for serotonin biosynthesis in the non-Rz serotonergic neurons but the question remains open. Firstly, the Hirudo transcriptome generated here did not contain identifiable orthologs for all three of the Helobdella TD genes. Moreover, for the one Hirudo transcript (comp21658_ c0) that did show sequence similarity to one of the Helobdella gene models (101612), the normalized read counts were very low (less than 4 in all samples) in all Gm samples, compared with normalized hve-tph read counts of more than 30.

Expansion of the hcn gene family in leech
One of the prominently upregulated genes in the P and, to a lesser extent, the N cells was a hyperpolarizationactivated cyclic nucleotide-gated (HCN) channel (Figs. 3 and S2). This channel is a candidate for mediating a prominent, hyperpolarization activated "sag" current that is characteristic of leech P neurons [44] and heart interneurons [45]. However, evidence of a hyperpolarizationactivated current has also been found in the T, P, N, and Rz neurons [18,46].
In our transcriptome, we found four distinct additional hcn "genes", and examination of another transcriptome [47] yielded two more, suggesting that Hirudo contains at least seven HCN genes. Comparison of these transcripts to the Helobdella genome revealed the presence of seven orthologous, genomically distinct, HCN channel genes and one additional gene as yet found only in Helobdella. While this is not a large gene family in absolute terms, it still represents a significant expansion, given that the largest number found in an animal genome to date is four (for mammals; Fig. 6, expanded tree in Fig. S4). Our phylogenetic analysis revealed that the HCN gene family has expanded independently in the annelid and vertebrate lineages. Moreover, the expansion of the HCN gene family in annelids appears to be quite recent, as the genome of another annelid (Capitella teleta) only encodes one HCN gene, and other lophotrochozoans have at most two (data not shown). Despite their relatively recent emergence, the seven HCN genes in leech appear to have divergent patterns of expression (Fig. 6a). Thus, these results exemplify how gene family diversification may contribute to cell phenotype diversification.

A phylogenetically distinct IP3 receptor (IP3R) sub-type is preferentially expressed in touch sensitive neurons
Finding an IP3R-encoding transcript among the most prominent elements of the T neuron transcriptional profile, as judged by both relative enrichment and transcript abundance, was unexpected, because we usually think of the IP3Rs as ubiquitously expressed regulators of calcium release from endoplasmic reticulum [48]. Therefore, having validated this result by ISH (Figs. 3 and 4), we explored the diversity of this gene family in the Hirudo transcriptome and the Helobdella genome.
Previous work suggests that the bilaterian ancestor had three types of IP3 receptors: IP3RA, IP3RB, which has been lost in the vertebrates, and ryanodine receptors (RyaR; Fig. 7 [49];). Our initial BLAST analyses revealed 20 transcripts in the Hirudo transcriptome that BLAST to the IP3R family, compared with five gene loci in the Helobdella genome. As IP3Rs are typically large proteins, these five genes were often spread out over two or more (machine annotated) gene models in the Helobdella genome. Accordingly, the conceptually translated polypeptides of the 20 Hirudo ip3r transcripts map to different regions of four of the five IP3Rs inferred from the Helobdella genome, so our data are consistent with a set of at least four mutually orthologous IP3Rs in the two leech species.
The five Helobdella IP3Rs include two IP3RA genes, two IP3RB genes, and one RyaR (Fig. 7). Our phylogenetic analysis indicates that the T-enriched IP3R is ip3rb2. The IP3RB family of IP3Rs is understudied because it has not been reported in the vertebrates and is also absent in D. melanogaster and C. elegans. Given the absence of an ISH signal for ip3rb2 in N neurons, we suspect that the apparently significant levels of ip3rb2 in the N cell transcriptome may represent contamination of the N cell pools by mis-identified T neurons. In addition to ip3rb2, the T cells express ip3ra1 and ip3ra2 at levels comparable to the other profiled neurons and the ganglion as a whole.

Canonical mechanoreceptor candidate genes in the transcriptional profiles of the leech nervous system
The T, P and N neurons in the leech nervous system provide a cellularly well-defined and accessible system in which to study the mechanisms of mechanotransduction in three discrete classes of sensory neurons. As a start toward this goal, we examined the transcriptional profiles for genes related to those that have been implicated in mechanotransduction in other systems, including members of the piezo, trp and deg/enac gene families. This approach is particularly relevant for the trp and enac gene families, where the large numbers of paralogs would otherwise complicate systematic analysis.
The piezos are an ancient gene family with homologs in protozoa, plants and animals [50]. The piezos encode multipass transmembrane proteins that are required for touch sensitivity in mammalian Merkel cells [51] and Drosophila nociceptors [52]. In contrast to the Trp and Deg/ENaC channels discussed below, the diverse known physiological roles of Piezos all arise from mechanotransducer functions [53]. The Helobdella genome encodes two piezo genes, but a molecular phylogeny indicates that these are paralogs rather than orthologs of the two mammalian piezos (Fig. 8). The Hirudo ortholog of Helobdella piezo1, comp25540_c0, was among the list of differentially regulated genes and was grouped into cluster 1, as it is enriched in the Gm samples and T cells (Supplementary File 3). Normalized read counts for Hve-piezo1 are elevated 3-fold in T cells relative to the other neuronal phenotypes. Manual inspection of the Hirudo transcriptome revealed an ortholog of Helobdella piezo2 as well, but this gene was not differentially regulated in our analysis (Fig. 8). Thus, leech piezo1 in particular is a candidate for investigation as a mechanotransducer for touch in leech.
The transient receptor potential (trp) genes encode a diverse family of channel proteins, various members of which are implicated in transducing thermal, chemical and mechanical stimuli [54][55][56][57]. Seven subfamilies of Trp channels were present in the bilaterian ancestor (Trps A, C, M, ML, P, N, and V) [58,59].
Many independent trp gene duplications have occurred as well. Thus, we find 16 trp sequences in the Hirudo nervous system transcriptome, each of which has an ortholog in the Helobdella genome; these 16 genes include representatives of all sub-families except TrpN and TrpP (Fig. 9). Duplicate genes are present within the TrpC, M and V sub-families; for the M and V sub-families, these appear to represent duplications that had occurred in the bilaterian (M) and protostome (V) ancestors [58]. Eight genes in the Hirudo neuronal transcriptome lie within the trpA, V, and M families, which contain putative sensory genes in other organisms. Among these eight genes, only one, the Hirudo trpA1 transcript, was enriched in the P cell transcriptome to a statistically significant extent. Thus, this gene is another promising candidate for encoding a sensory transducer in leech.
Two Deg/ENaC channels, encoded by the mec4 and mec10 genes, mediate mechanotransduction in C. elegans touch neurons [60]. The deg/enac gene family has expanded extensively in the lineage leading to leech from the annelid ancestor; the Helobdella robusta genome contains 65 gene models labeled with the Pfam term 008585, corresponding to ENaCs [61]. This expansion precludes us from identifying orthologs of mec4 and mec10 in leech. Our transcriptome of the Hirudo nervous system yielded at least 13 transcripts encoding putative ENaCs, which BLAST to 9 presumptive orthologs in the Helobdella genome. Two of these Hirudo transcripts appear to be enriched in mechanosensory neurons; one (Hve-degenac1, corresponding to the Helobdella gene model 168363) shows moderate read counts (roughly 100 TPM) in the N cell sample and much lower counts (less than 25 TPM) in the other cell types and in the Gm samples. The other (Hve-degenac2, corresponding to the Helobdella gene model 185250) is expressed at much higher levels in both P and N cell samples (read counts of 1300 and 600 TPM, respectively, Fig. 10).

Transcriptional profiles of individually identified, phenotypically distinct neurons
In the work presented here, we have used deep RNA sequencing on pools of individually dissected cells to generate extensive transcriptional profiles for four physiologically and functionally distinct classes of identified neurons from the leech Hirudo verbana, as well as for the overall ganglion. Pairwise comparisons between and among the datasets allowed us to generate lists of candidates for genes whose differential expression would contribute to the phenotypic differences among the extensively characterized touch (T), pressure (P), nociceptive (N) and serotonergic neurosecretory Retzius (Rz) neurons of Hirudo; in situ hybridization (ISH) allowed us to validate the predicted gene expression patterns of selected genes.
The transcriptomes generated here also set the stage for fine-grained analysis of differences among neurons within these various classes, for example by scRNASeq comparisons of medial and lateral N and P neurons within ganglia, and of segment-specific differences in Rz neurons of reproductive segments M5 and M6, versus other midbody segments [22,62].

Expansion of the hcn and ip3r gene families
Comparative genomic studies indicate that the diversification of bilaterian taxa has been accompanied by lineage-specific expansions of various gene families. Leeches, for example, relative to an inferred annelid ancestor, appear to have undergone an expansion of the gene families encoding innexins, epithelial sodium channels (ENaCs) and homeodomain-containing transcription factors [61,63], and, as presented here, the hyperpolarization-activated, cyclic   Finding that one of these hcn genes, hcn4, is expressed preferentially in P and N cells correlates with previous work showing that P and N cells exhibit enhanced sag voltages and shows how divergence in the regulation of expression among duplicated genes may contribute to divergence in cellular phenotypes. Moreover, this result enables biophysical studies to investigate the extent to which this expansion has been accompanied by functional divergence of the HCN channels.
A second instance of gene family expansion highlighted by this work is in the gene family encoding IP3 receptors (IP3Rs). In the IP3R gene family, three subtypes were inferred for the bilaterian ancestor: IP3RA, IP3RB and RyaR. The leech genome contains duplications of both IP3RA and IP3RB. We found that ip3rb2 is highly enriched in the T neurons (which respond to light touch) relative to any other cells in the ganglion. Molecular phylogenies indicate that the IP3RB sub-family of IP3 receptors was present in the bilaterian ancestor, but has been lost in the lineage leading to vertebrates. Accordingly, the physiological characterization of these receptors for the most part remain to be determined.
While physiological and pharmacological evidence shows that Rz neurons employ both IP3R-and RyaRmediated Ca + 2 release [64,65], we detect ip3ra and ip3rb but not ryar transcripts in this cell type. We posit that this is due to a difference in the limit of detection for expression of this transcript, possibly due to sequence artifacts introduced during the de novo transcriptome generation, as its expression levels are low in all of our samples (Fig. 5). In any case, this discrepancy highlights the need to complement bioinformatic analyses with direct experimentation.
Given that the T cells express both paralogs of the broadly conserved ip3ra sub-family in addition to ip3rb2, we speculate that the broadly expressed IP3RAs carry out the housekeeping functions of Ca + 2 homeostasis, and that the IP3RB family may have been co-opted evolutionarily for T cell-specific functions, the nature of which remain to be determined. We note that mechanical modulation of IP3R-dependent Ca + 2 release has been observed in mouse endothelium [66].

Identification of candidate mechanotransducer genes
Transcriptional profiling allows us to sort through dozens of potential candidates and identify specific piezo, trp and deg/enac homologs as candidate transduction channels in leech mechanosensory neurons. We identify one of two piezo genes, two of~65 deg/enac genes, and one of at least 16 trp genes as prime transduction candidates in the three distinct classes of leech mechanosensory neurons. Moreover, these genes appear to be differentially expressed among the three classes of mechanosensory neurons, which correlates with their distinct physiological properties. Specifically, leech piezo1 appears to be up-regulated in the T neurons, which transduce light touch; this suggests a possible parallel with mammalian piezo, which functions to transduce touch in Merkel cells and the associated sensory neurons [51]. In contrast, the P neurons appear to upregulate the expression of leech trpA1 and one of the many leech deg/enac genes (Hve-degenac2, also increased in N cells), while a different deg/enac is differentially expressed in the N neurons (Hve-degenac1). Since N neurons function as multi-modal nociceptors, responding to salt, acid and heat, we anticipate that other receptors remain to be associated with this class of neurons.
Another intriguing possibility is that the genes that we identify in this study may interact with each other to transduce mechanical signals. For example, in several tissues in vertebrates, Trp channel activation can cause calcium release through IP3 receptors in the ER. While the Trp channels identified in vertebrates, such as TrpM7 [67] and Pkd2 [68], do not have exact orthologs in the leech, the fact that T cells differentially produce a noncanonical IP3 receptor (ip3rb2, Fig. 7) and express a member of the TrpM family (trpM-beta, Fig. 9) suggests that a similar mechanism may be at play in the lighttouch responsive neurons. Given the amenability of the leech for CRISPR/Cas9 and antisense knockdown techniques, we anticipate that future work will be able to test the functional roles of these candidates and other genes in manifesting the distinct phenotypes of the T, P and N neurons.

Similarities and differences between Hirudo and Helobdella
Among leeches, the utility of hirudinid species (chiefly Hirudo verbana and H. medicinalis) as models for physiology and behavior is complemented by the utility of glossiphoniid species (e.g., Helobdella robusta and H. austinensis) as models for studying early development [69]. In addition to the advantages of being able to apply embryological approaches to the ontogeny of behavior, delineating the similarities and differences among different leech species provides an evolutionary perspective for within this taxon. Comparing the expression of orthologous genes in Helobdella and Hirudo revealed similarities in expression of Hau-hcn4, Hau-pcad1, Hauddc and Hau-tph, consistent with known similarities in ganglion architecture among leeches [44,[70][71][72].
One curious similarity between Hirudo and Helobdella is that in both species, the dopa decarboxylase (ddc) gene that is enriched in serotonergic Rz neurons is not expressed in the smaller, dorsolateral and ventrolateral serotonergic neurons, while tryptophan hydroxylase (tph) is strongly expressed in all three serotonergic neurons in both species. It remains to be determined whether the decarboxylation step in serotonin biosynthesis is carried out by the product of one of the duplicated genes in the tyrosine decarboxylase clade of AADs in the lateral serotonergic neurons.
A noteworthy difference between the two species is that neither of the Helobdella orthologs of the T cell marker genes from Hirudo, encoding collagen-alpha and IP3 receptor type B, respectively showed the expected expression patterns in juvenile Helobdella. We speculate that the T neurons in the juvenile Helobdella used here had not yet differentiated to the point of expressing the collagen-alpha or ip3rb2 genes. An obvious experiment is to test for the expression of these genes in ganglia of adult Helobdella, but at present this experiment is technically challenging.

Conclusions
In conclusion, the work presented here provides comprehensive transcriptional profiles for four phenotypically distinct classes of identified neurons in the segmental nervous system of the leech Hirudo verbana, a tractable model for studying the neurobiological basis of behavior in terms of the properties and connections of individually identified cells. We have used ISH to show that candidate genes exhibit the predicted patterns of expression in Hirudo and that orthologous genes are similarly expressed in the nervous system of the leech Helobdella austinensis, a tractable model for studying annelid development at the cellular and molecular levels. This lays the basis for future work leveraging the strengths of each system and their underlying similarities to investigate the molecular processes underlying and linking mechanosensation, cell type specification, and behavior. Finally, investigating the differences between the species provides opportunities for studying the evolution of behavioral differences among species at the same level of cellular and molecular detail.

Isolation of identified Mechanosensory neurons
Visually identified T, P, N and Retzius neurons were isolated individually from the central nervous system of adult H. verbana as described previously [75,76]. In brief, leeches were anesthetized by immersion in ice, and short chains of midbody segmental ganglia (with the exception of ganglia 5 and 6) were dissected in Leech Ringer's solution [77] and pinned in a Sylgard-bottom dish. The ganglia were kept in L-15 culture medium (Gibco) supplemented with 6 mg/ml glucose, 0.1 mg/ml gentamycin (Sigma) and 2% heat-inactivated fetal calf serum (FCS, Microlab). The capsule of each ganglion was opened to expose cell somata and the ganglia chains were incubated for 1 h in 2 mg/ml collagenase/dispase (Boehringer-Mannheim). After the enzyme treatment, Retzius, T, P and N neurons were identified visually by their size and location in the ganglion. Individual neurons were removed from the ganglia by suction using a fire-polished glass pipette. Isolated neurons were rinsed several times in L-15 to sterilize them and remove debris. Three groups of ten leeches were used, yielding three biological replicates, of~300 cells each, for each cell type. Finally, ganglia from which all the mechanosensory and Retzius neurons had been removed were pooled to create three biological replicates to create the "ganglion" transcriptome.

RNA extraction and sequencing
RNA was isolated from pooled neurons that were stored at − 80 in saline solution with 1% (v/v) Recombinant RNasin Ribonuclease Inhibitor (Promega). The RNA was extracted with the Quick-RNA micro prep kit (Zymo) and eluted in 7 μL of nuclease-free water. The mRNA was isolated using the DynaBeads mRNA Purification kit (ThermoFisher) to select for poly-A RNA and libraries were constructed using the PrepX DNA Library kit (Takara). The resultant RNA libraries were sequenced by the Functional Genomics Library at UC Berkeley on an Illumina HiSeq 2500 with 2 × 100 bp reads.

Transcriptome assembly and alignment
We removed sequencing adapters and bases with low quality scores from the reads using the settings: ILLU-MINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILI NG:3 SLIDINGWINDOW:4:15 MINLEN:36. in Trimmomatic [78] and verified the quality post-trimming with FastQC. Overrepresented sequences in the data set post-processing were removed using custom Illumina-Clip parameters in Trimmomatic. To facilitate de novo assembly of the transcripts, we performed k-mer normalization using the built-in kmer normalization script in Trinity [30]. In testing phases we also used khmer [79] with a cutoff value of 20 or 50 and a kmer value of 17 or 20 to compare alternative assembly methods. We then used Trinity [30] to create de novo transcriptomes using these normalized datasets and determined the most representative transcriptome of the datasets. The resultant transcriptome was annotated using BLASTX [80] against the full Helobdella robusta genomic protein model database [61] and the SWIS SProt Database [34]. The trimmed reads were then aligned to the transcriptome and the expression levels were determined using Kallisto [81].

Expression analysis
To identify genes that were differentially expressed among the five sample types, all ten possible pairwise comparisons were performed using the R package edgeR [82] using a p-value of 0.01 after adjusting for multiple comparisons. All differentially regulated genes were then subjected to hierarchical clustering and grouped into 9 clusters using the cutree function and plotted using the R package dendextend [83]. A cutoff giving nine clusters was chosen because it was the smallest number of clusters that allowed for discrimination between all cell groups. Cluster-trait correlation analyses were performed using calculations from the R package WGCNA [84]. Plots were made using the R packages gplots, ggplot2 [85], and reshape2 [86].

In situ hybridization
We dissected out ganglia from various midbody segments (M1 through M21, excluding the reproductive segments M5 and M6) of adult H. verbana on ice and pinned them onto small, Sylgard-coated plastic petri dishes in Leech Ringer's solution [77]. We then removed the proteinaceous outer sheath using a micro knife (Fine Science Tools) on the ventral face of the ganglion to expose the neuronal cell bodies. At this point, roughly trapezoidal wedges of sylgard to which one or more ganglia were pinned were cut from the dishes and transferred to 1.7 ml polypropylene centrifuge tubes for all further processing. This step simplified the solution changes, reduced the volumes of solution required at each step, reduced damage to the ganglia, and prevented the nerves from folding over the ganglia during the final dehydration steps.
We then fixed the ganglia in 4% paraformaldehyde in 0.5x phosphate-buffered saline (PBS) for 2-3 h at room temperature. The ganglia were then brought through a methanol series (5 min at RT in 30, 50, 70, and 90% methanol), washed 3 × 5 min in 100% methanol and stored overnight or until needed in 100% methanol at 4 degrees C, then rehydrated in a methanol series (5 min at RT in 90, 70, 50, and 30% methanol) and washed 2 × 5 min at RT in PBS or PTw (PBS with 0.1% Tween 20). The samples were then digested with 20 μg/mL Proteinase K in PTw for 3-10 min at RT to allow for greater probe access. Digestion was stopped with one 30s wash followed by 2 × 5 min washes in 2 mg/mL glycine in PBS. The ganglia were then washed 3 × 5 min in PBS and post-fixed in 4% paraformaldehyde in PBS for 30-60 min at RT. The fixative was then removed with 2 × 5 min washes in PTw.
To prepare for probe hybridization the ganglia were incubated for 5-10 min in a 1:1 solution of PTw and Hybridization buffer (Hyb: 50% super pure formamide, 5X SSC, 0.2 mg/mL torula RNA, 1X Denhardt's solution, 0.1 mg/mL heparin, 0.1% Tween-20, 1 mg/mL CHAPS and 9.2 mMol Citric Acid), followed by 5 min in Hyb at RT and then 3-16 h in fresh Hyb at 65°C. For hybridization, the ganglia were covered with 500-750 μL of Hyb containing 0.2-2.0 μg/mL riboprobe (denatured at 65°C for 25 min prior to addition) and incubated in the probe solution for 16-48 h at 65°C in a rocking hybridization oven.
Probes were designed by using the IDT PrimerQuest tool to optimize for a PCR product of about 1000 nucleotides from the transcript of interest. Sequences of the primers used for probe synthesis can be found in Table  S1. The PCR products were purified with the GeneJet Gel Extraction Kit (ThermoFisher) and then ligated into plasmids with the pGEM-T Easy Vector System (Promega). These plasmids were transformed into NEB 5alpha competent E. coli (New England Biolabs), amplified in Luria-Bertani broth cultures, purified from the cultures using the QIAprep Spin Miniprep Kit (Qiagen), and sequenced by the UC Berkeley DNA Sequencing Facility. Plasmids from positive clones were linearized by PCR or restriction digest, and riboprobes were synthesized with MEGAscript T7 and SP6 transcription kits (ThermoFisher) and Digoxigenin-11-UTP (Sigma-Aldrich).
To remove unbound probe, the samples were transferred to 2X SSC through a series of 5 min washes at 65°C (Hyb, 1:1 Hyb:SSC, SSC), followed by 20 min 0.2X SSC in PTw, 20 min 0.1X PTw and then 30 s and 5 min in PTw, all at RT. To reduce nonspecific antibody staining the ganglia were incubated with 1X Western Blocking Reagent (Roche) for 2-3 h at RT. The samples were then incubated in 1:4000 anti-Digoxigenin conjugated to Alkaline Phosphatase (Roche) in 1X Western Blocking solution for 18 h at 4°C with rotation. The ganglia were rinsed 3 × 30 s in PTw, followed by 5 × 1-h washes in PTw to remove excess antibody.
To visualize the antibody staining in the ganglia, PTw was replaced with BMPurple reagent (Sigma Aldrich) and incubated at RT in the dark until color was seen. The reaction was terminated by rinsing the samples in PTw 3 × 5 min followed by an hour-long incubation in PTw. The specimens were then cleared in a glycerol series (first 40-50% and then 80%), after which they were unpinned from the sylgard block and mounted for microscopy.

Dye tracing
Individual ganglia were removed by dissection, pinned onto triangular-shaped slivers of Sylgard designed to slip into 1.5 ml conical tubes, and then desheathed on the ventral or dorsal side. In some experiments, one or more T cells were injected with fixable fluorescent dye (Dextran, Texas Red 3000 MW Lysine; ThermoFisher; 50 mg/ ml in the electrode) by iontophoresis, alternating + 5 and − 5 nA pulses, each with a duration of 600 ms. Injection times were typically on the order of 10-15 min total. Cells were fixed and processed for in situ hybridization as described above.

Molecular Phylogenetics
All protein sequences were obtained from the NCBI protein database [87], the H. robusta genome browser [88], the L. gigantea genome browser [89], or the C. teleta genome browser [90]. Protein alignments were generated via the phylogeny.fr website [91] using MUSCLE [92] and curated using Gblocks or Noisy curation alignment software [93,94]. Finally, the maximum-likelihood phylogenies were generated with PhyML [95] using the LG substitution model and 100 replicates during any bootstrapping process. All sequences used in these analyses can be found in Supplementary File 6.

Imaging
Images in this manuscript were acquired with one of the following microscopes and configurations: 1) Leica MDG30 dissecting microscope with a Planapo 2X objective attached to a Leica DFC300 FX camera using Leica Application Suite version 4.3.0 for image acquisition. (Fig. 3a-d, f-h).