A gene expression fingerprint of C. elegans embryonic motor neurons

Background Differential gene expression specifies the highly diverse cell types that constitute the nervous system. With its sequenced genome and simple, well-defined neuroanatomy, the nematode C. elegans is a useful model system in which to correlate gene expression with neuron identity. The UNC-4 transcription factor is expressed in thirteen embryonic motor neurons where it specifies axonal morphology and synaptic function. These cells can be marked with an unc-4::GFP reporter transgene. Here we describe a powerful strategy, Micro-Array Profiling of C. elegans cells (MAPCeL), and confirm that this approach provides a comprehensive gene expression profile of unc-4::GFP motor neurons in vivo. Results Fluorescence Activated Cell Sorting (FACS) was used to isolate unc-4::GFP neurons from primary cultures of C. elegans embryonic cells. Microarray experiments detected 6,217 unique transcripts of which ~1,000 are enriched in unc-4::GFP neurons relative to the average nematode embryonic cell. The reliability of these data was validated by the detection of known cell-specific transcripts and by expression in UNC-4 motor neurons of GFP reporters derived from the enriched data set. In addition to genes involved in neurotransmitter packaging and release, the microarray data include transcripts for receptors to a remarkably wide variety of signaling molecules. The added presence of a robust array of G-protein pathway components is indicative of complex and highly integrated mechanisms for modulating motor neuron activity. Over half of the enriched genes (537) have human homologs, a finding that could reflect substantial overlap with the gene expression repertoire of mammalian motor neurons. Conclusion We have described a microarray-based method, MAPCeL, for profiling gene expression in specific C. elegans motor neurons and provide evidence that this approach can reveal candidate genes for key roles in the differentiation and function of these cells. 
These methods can now be applied to generate a gene expression map of the C. elegans nervous system.


Background
The nervous system is assembled from disparate classes of neurons that together define the overall properties of the network. The specific functions of these neurons are governed by genetic programs that control cell fate [1]. Thus, a key to understanding the molecular basis for neural function is to establish the gene expression blueprint that orchestrates neuronal differentiation. With its simple, well-defined nervous system and powerful genetics, the nematode C. elegans is a useful model system for addressing this issue. The C. elegans hermaphrodite nervous system is composed of exactly 302 neurons. The morphology and connectivity of each one of these neurons has been defined at high resolution [2]. In addition, the birth of each neuroblast is embedded in a lineage diagram of every cell division in C. elegans development [3,4]. The C. elegans genome is fully sequenced and contains ~20,000 predicted genes [5]. At a fundamental level, the identity of a given class of neuron is defined by a unique combination of these genes. In principle, microarray-based strategies could be employed to establish these cell-specific patterns of gene expression. However, the small size of the nematode has limited access to individual cells for molecular analysis. Here we describe a strategy, MAPCeL (Micro-Array Profiling of C. elegans Cells) that overcomes these obstacles to generate neuron-specific gene expression profiles.
MAPCeL exploits recently developed methods of culturing C. elegans embryonic cells. GFP markers for specific classes of neurons and muscle cells are expressed in vitro and can be used to identify the corresponding differentiated cell types. We established that these GFP cells arise at a frequency predicted by their abundance in the intact embryo and display normal morphological, molecular, and physiological characteristics [6]. For example, a GFP reporter for the unc-4 homeodomain transcription factor gene is expressed in 13 motor neurons out of a total of 550 cells in the mature embryo (Figs. 2D, 5A) [7]. In vitro, we detected a comparable fraction (~2%) of unc-4::GFP cells. Moreover, cultured unc-4::GFP cells extend neuron-like processes and express molecular markers also seen in vivo (Fig. 2D-E) [6]. On the basis of these results, we have profiled cultured unc-4::GFP neurons with the expectation that this approach will provide a comprehensive picture of genes expressed in these motor neurons in vivo.
We describe methods for isolating unc-4::GFP-labeled neurons by Fluorescence Activated Cell Sorting (FACS). mRNA from these cells is amplified, labeled and hybridized to the C. elegans Affymetrix Gene Chip. A comparison to microarray data derived from all embryonic cells reveals ~1000 genes with significantly higher levels of expression in unc-4::GFP neurons. The validity of these data is supported by the inclusion of genes known to be expressed in these neurons in vivo and by the generation of new GFP reporters from previously uncharacterized genes on this list. We conclude that MAPCeL offers a reliable strategy for profiling gene expression of a specific motor neuron class. Using this approach, we have provided, for the first time, a comprehensive picture of gene expression in a subset of C. elegans motor neurons. We expect that MAPCeL can now be applied to other C. elegans neurons and thereby link specific neuronal fates with unique combinations of differentially expressed genes.

Results
Profiling strategy
unc-4::GFP is expressed in 13 embryonic motor neurons: one I5 (pharynx), three SAB (retrovesicular ganglion), and nine DA (ventral nerve cord) (Figs 2D, 5A) [7]. Although each of the motor neuron classes is morphologically distinct, the DA and SAB motor neurons, which constitute the majority (12/13) of unc-4::GFP neurons, also share several characteristics including common presynaptic inputs, anteriorly directed axonal processes, cholinergic activity, and similar defects in unc-4 mutants [2,8]. It is therefore reasonable to assume that many of the same genes would be expressed in both of these motor neuron classes and that these could be revealed in microarray experiments.
A schematic of our approach to profile unc-4::GFP cells is presented in Fig. 1. C. elegans embryonic cells were cultured for 24 hr to allow differentiation of GFP-labeled motor neurons. unc-4::GFP cells are rarely observed in freshly dissociated preparations but constitute about 2% of all cells after 1 day in culture. The delayed appearance of unc-4::GFP cells in culture is consistent with the developmental timing of unc-4::GFP expression in vivo; unc-4::GFP motor neurons are normally generated after morphogenesis is initiated [7]. These older embryos are not dissociated by our methods [6]. Fluorescence Activated Cell Sorting (FACS) is used to isolate enriched (~90%) populations of unc-4::GFP cells. RNA is extracted, amplified, and labeled for application to the C. elegans Affymetrix Gene Chip.
Figure 1: MAPCeL strategy for profiling C. elegans GFP neurons. Embryos are isolated from gravid adults and treated with chitinase to degrade the egg shell. Embryonic cells are cultured for 24 hours and enriched by FACS. Amplified, labeled aRNA is hybridized to the Affymetrix C. elegans array.

unc-4::GFP motor neurons are isolated by FACS
It is necessary to plate freshly dissociated embryonic cells on a solid substrate to promote differentiation and to prevent clumping. Although C. elegans neurons show extensive morphological differentiation on peanut lectin-coated glass, they also adhere avidly and cannot be easily removed. We discovered that cells plated on poly-L-lysine coated surfaces also differentiate but can be readily dissociated from the substrate by gentle trituration. A fluorescence profile was established for cells from the non-GFP wildtype strain (N2) to identify autofluorescent intestinal cells. Because these cells autofluoresce in both the Propidium Iodide (PI) and GFP channels, they are largely restricted to the diagonal axis of this scatter plot (Fig 2A). PI was added immediately prior to sorting to stain damaged cells (~20%). Separate experiments with PI-stained wildtype cells and with cells from unc-4::GFP embryos were used to establish sorting gates for PI and GFP-labeled cells, respectively (Fig 2B). As shown in Fig 2C, unc-4::GFP neurons were simultaneously gated by light scattering parameters. This gate was established empirically to achieve ~90% enrichment of unc-4::GFP labeled cells (Fig 2G). We typically obtained about 40,000 unc-4::GFP neurons from each sort. RNA from the equivalent of 100,000 unc-4::GFP neurons was pooled for each separate microarray experiment. We will refer to microarray results from unc-4::GFP marked cells as the "unc-4::GFP motor neuron" data set. Reference RNA was extracted from all viable cells sorted from a 24 hr culture of wildtype embryonic cells. Microarray results with this "Reference" data set should reflect transcript levels in the average differentiated embryonic cell.
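The cell numbers quoted above can be related with some simple arithmetic. The following is a minimal sketch, not part of the published protocol: the function names are hypothetical, and the number of sorts pooled per experiment is our own estimate from the reported yields rather than a figure stated in the text.

```python
import math

def fold_enrichment(purity_after, fraction_before):
    """Ratio of the GFP-positive fraction after sorting to the fraction before."""
    return purity_after / fraction_before

def sorts_needed(cells_required, cells_per_sort):
    """Smallest whole number of sorts that yields at least cells_required cells."""
    return math.ceil(cells_required / cells_per_sort)

# Values taken from the text (approximate where the text says "about" or "~").
in_vivo_fraction = 13 / 550   # unc-4::GFP neurons among cells of the mature embryo
culture_fraction = 0.02       # ~2% of cells after 1 day in culture
sorted_purity = 0.90          # ~90% unc-4::GFP cells after FACS

print(f"in vivo fraction: {in_vivo_fraction:.1%}")
print(f"fold enrichment by FACS: {fold_enrichment(sorted_purity, culture_fraction):.0f}x")
# Assumption: each sort yields ~40,000 cells and 100,000 cell equivalents are pooled.
print(f"sorts per experiment: {sorts_needed(100_000, 40_000)}")
```

Under these assumptions, sorting raises the unc-4::GFP fraction by roughly 45-fold, and about three typical sorts would supply the RNA for one microarray experiment.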

Microarray experiments yield reproducible profiles
Data obtained from successive hybridizations of two separate arrays with the same labeled probe yielded a coefficient of determination, R² = 0.99 (data not shown). This result indicates that potential differences between individual Affymetrix arrays or hybridization and scanning procedures are not significant sources of error. The overall concurrence of the experimental (unc-4::GFP motor neuron) and Reference data is illustrated graphically in the scatter plots shown in panels A and B of Fig 3. To assess the reproducibility of sample preparation methods (e.g. FACS isolation, RNA extraction, amplification, labeling, etc.), R² was calculated for each pairwise combination of independent samples. An average R² of 0.96 (n = 4) was calculated for the wildtype (N2) reference samples (Fig 3E); the average R² was 0.92 (n = 3) for the unc-4::GFP motor neuron data set (Fig 3D). These values are indicative of highly similar samples and thereby show that our methods are reliable.
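The pairwise reproducibility measure described above can be sketched as follows. This is an illustrative example only: it assumes that the coefficient of determination is the squared Pearson correlation of a simple linear fit between two replicate intensity vectors, and the intensity values are invented, not taken from the arrays.

```python
from itertools import combinations

def r_squared(x, y):
    """Squared Pearson correlation between two equal-length intensity vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)

def mean_pairwise_r2(replicates):
    """Average R^2 over every pairwise combination of replicate samples."""
    pairs = list(combinations(replicates, 2))
    return sum(r_squared(a, b) for a, b in pairs) / len(pairs)

# Hypothetical log2 probe-set intensities for three independent replicates.
replicates = [
    [8.1, 10.4, 6.2, 12.0, 9.3],
    [8.3, 10.1, 6.0, 11.8, 9.6],
    [7.9, 10.6, 6.5, 12.2, 9.1],
]
print(round(mean_pairwise_r2(replicates), 3))
```

With three replicates there are three pairwise comparisons; with four there are six.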

Detecting Expressed Genes (EGs)
Differential hybridization to perfect match (PM) vs mismatch (MM) oligo probes on the Affymetrix chip was used to identify transcripts reliably detected as "present" in the Reference and unc-4::GFP motor neuron data sets (see Methods, Additional Files 4,5). This list was adjusted in two ways for the unc-4::GFP motor neuron data set to arrive at a more accurate representation of Expressed Genes (EGs) (Additional Files 7,17). In the first instance, transcripts that were statistically downregulated in unc-4::GFP motor neurons relative to the wildtype reference were removed from the "present" list, as these are likely to be detected only because they are highly enriched in the contaminating non-GFP cells (~10%) (Additional File 6). Conversely, we included transcripts that were considered enriched according to our statistical methods but originally scored as "absent" on the basis of PM vs MM signals used by Affymetrix MAS 5.0 software (see Methods, Additional file 17). This second adjustment simply acknowledges that enriched transcripts are clearly expressed and therefore should be scored as "present." We refer to the transcripts in these modified lists as EGs (Expressed Genes). A total of 9,103 EGs were detected in the Reference data set and 6,217 EGs in the unc-4::GFP motor neuron data set (Fig 4) (Additional Files 4, 7). Overall, 10,071 unique transcripts were detected in these experiments, or about 50% of all predicted C. elegans ORFs [9] (Additional file 8). These results are comparable to microarray data from whole embryos that also detected about half of the predicted C. elegans genes [10]. Genes that are not detected may be expressed in a relatively small number of cells. This point is substantiated by our finding that 968 EGs in the unc-4::GFP motor neuron data set are not scored as present in the Reference data set (Additional file 15, Fig 4). 
For example, the transcription factor UNC-3 is normally expressed in a small subset of embryonic neurons including the DAs [11]. The unc-3 transcript is enriched in the unc-4::GFP motor neuron data set (Table 2, Additional file 9) but is not detected in the Reference (Additional file 4). Thus, it seems likely that the overall number of EGs should increase as additional classes of embryonic cells are profiled (RMF, SEV, SJB, DMM, unpublished data).
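The two adjustments used to derive the EG list amount to simple set operations, sketched below. The gene identifiers are placeholders and the function name is hypothetical; the statistical calls themselves ("present", enriched, depleted) are assumed to have been made upstream. The last lines check that the reported counts are mutually consistent.

```python
def expressed_genes(present, depleted, enriched):
    """Remove depleted transcripts from the 'present' calls, then add enriched ones."""
    return (set(present) - set(depleted)) | set(enriched)

# Hypothetical identifiers, for illustration only.
present = {"geneA", "geneB", "geneC", "geneD"}
depleted = {"geneD"}            # likely contributed by contaminating non-GFP cells
enriched = {"geneA", "geneE"}   # geneE was scored "absent" but is enriched

print(sorted(expressed_genes(present, depleted, enriched)))
# ['geneA', 'geneB', 'geneC', 'geneE']

# Counts reported in the text.
reference_egs = 9103      # EGs in the Reference data set
neuron_egs = 6217         # EGs in the unc-4::GFP motor neuron data set
neuron_only_egs = 968     # neuron EGs not scored as present in the Reference

print(reference_egs + neuron_only_egs)  # 10071 unique transcripts overall
print(neuron_egs - neuron_only_egs)     # 5249 EGs shared by both data sets
```

The union of the two data sets (9,103 + 968) reproduces the 10,071 unique transcripts reported above.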

Selected C. elegans genes are enriched in UNC-4 motor neurons
A majority of transcripts in the Reference and unc-4::GFP motor neuron data sets show comparable levels of expression (Fig 2). Many of these transcripts are likely to encode core functions required in every cell. Other transcripts in this group could be limited to subsets of embryonic cells that include UNC-4 motor neurons. Genes that are widely expressed in neurons, for example, may not be detectably enriched in unc-4::GFP motor neurons in comparison to the Reference because neurons constitute a significant fraction (~40%) of all cells in the embryo. To illustrate this point, we note that UNC-64 (Syntaxin), an integral component of the neurotransmitter release mechanism and therefore expressed in most neurons [12,13], is detected in the unc-4::GFP motor neuron data set but is not enriched (Table 2, Additional Files 7, 14).
As graphically illustrated in the scatter plot shown in Fig. 3C, subsets of genes in the unc-4::GFP motor neuron data set are differentially expressed relative to the average expression levels for all cells in the Reference data set (R² = 0.88). As expected for a gene that is selectively expressed in unc-4::GFP neurons, the hybridization signal for the unc-4 transcript is highly elevated (13×) in comparison to all cells. Significant numbers of genes are also underexpressed in UNC-4 motor neurons relative to other embryonic cells. Transcripts showing a ≥ 1.7-fold intensity difference in the unc-4::GFP motor neuron vs Reference data sets were defined using SAM statistics at a False Discovery Rate (FDR) of ≤ 1%. By these criteria 1012 genes are enriched (red) in UNC-4 motor neurons (Additional file 9) whereas 1596 transcripts are depleted (green) (Additional file 10). The threshold of ≥ 1.7-fold was defined empirically. At higher values (e.g. ≥ 2.0×) genes with known expression in these cells were excluded (e.g. acr-2, unc-5) [14,15] (Additional file 14), whereas a lower threshold of 1.5× included significantly more false positives (e.g. muscle genes, pat-3, sup-10) [16,17].
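The selection criteria described above combine a fold-change cutoff with an FDR cutoff. The sketch below shows only that final filtering step. It does not implement SAM; the q-values are assumed to have been computed beforehand, and the function name, signal values and q-values are hypothetical.

```python
FOLD_THRESHOLD = 1.7   # empirically chosen cutoff from the text
FDR_THRESHOLD = 0.01   # False Discovery Rate of <= 1%

def classify(neuron_signal, reference_signal, q_value,
             fold=FOLD_THRESHOLD, fdr=FDR_THRESHOLD):
    """Label a transcript as enriched, depleted or unchanged.

    q_value is assumed to be an FDR estimate supplied by a separate
    statistical step (SAM in the text); it is not calculated here.
    """
    if q_value > fdr:
        return "unchanged"
    ratio = neuron_signal / reference_signal
    if ratio >= fold:
        return "enriched"
    if ratio <= 1.0 / fold:
        return "depleted"
    return "unchanged"

# Hypothetical signals: a 13x elevated transcript, a 1.8x one, and a 2x lower one.
print(classify(1300.0, 100.0, 0.001))           # enriched
print(classify(180.0, 100.0, 0.001))            # enriched at the 1.7-fold cutoff
print(classify(180.0, 100.0, 0.001, fold=2.0))  # unchanged at a 2.0-fold cutoff
print(classify(100.0, 200.0, 0.005))            # depleted
```

The third call illustrates the trade-off noted in the text: raising the cutoff to 2.0-fold drops transcripts that pass at 1.7-fold.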

Confirmation of UNC-4 motor neuron genes
Information gleaned from the published literature (http://www.textpresso.org) and from WormBase (http://www.wormbase.org) identified 27 genes with known expression in embryonic motor neurons that also express unc-4::GFP (I5, SAB, DA) (Additional file 11). We detect 21 (78%) of these genes as EGs, of which 10 (37%) are enriched. In addition, a significant number of transcripts encoding core neuronal functions (e.g. axon guidance, neurotransmitter signaling, etc.) are detected in the unc-4::GFP data set (Table 2, Additional file 14). For example, in addition to UNC-64 (syntaxin or t-SNARE), other components of the SNARE complex, SNB-1 (synaptobrevin or v-SNARE) and SNAP-25 (Y22F5A.3), are detected [18,19]. We also examined the data set for potential false positives by considering transcripts that are known to be highly expressed in other tissues but not in UNC-4 motor neurons. For example, in the embryo, the homeodomain protein UNC-30 is exclusively detected in DD motor neurons. Expression of the GABA vesicular transporter, UNC-47, in DD motor neurons depends on unc-30 function [20]. UNC-4 motor neurons are cholinergic and, as expected, neither of these GABA-specific transcripts is present in the unc-4::GFP motor neuron data set (Additional file 7).
The strong representation of ~80% of genes known to be expressed in I5, SAB, and DA motor neurons in the unc-4::GFP motor neuron dataset indicates that other previously uncharacterized transcripts in this list are also likely to be expressed in these cells in vivo. To test this idea, we evaluated GFP reporter lines for representative genes detected as enriched in the unc-4::GFP motor neuron data set (Fig. 5). As shown in Table 1, 83% (15/18) of these promoter-GFP fusions show expression in UNC-4 motor neurons in vivo. Of the GFP reporters not detected in these neurons, one of them, T19C4.5, fails to express GFP in any cell. This finding could mean that the upstream sequence selected for this construct does not overlap the endogenous T19C4.5 gene regulatory region. In some cases, cell-specific expression of C. elegans genes depends on distal upstream regions, intronic sequences, or 3' domains that would not be included in these 5' promoter GFP fusions [21]. This explanation could also account for the apparent absence of GFP expression in the unc-4::GFP motor neurons of the nlp-9 and nlp-15 GFP reporters. The validity of this data set is further substantiated by the observation that GFP expression in DA motor neurons is detected even for lower ranking genes (e.g. syg-1::GFP, statistical rank = 877). Thus, we believe that the transcripts listed in the unc-4::GFP motor neuron data set are likely to constitute an accurate representation of genes normally expressed in these cells.
We note that the positive GFP reporters shown in Table 1 are not uniformly detected in UNC-4 neurons: all but one (flp-13) are expressed in the DAs, one in I5 and none in the SAB motor neurons. This bias reflects the relative abundance of DA motor neurons (~70% or 9/13 of unc-4::GFP neurons in vivo) in the cells used to generate this data set and thus could indicate that most of the enriched transcripts are also expressed in the DAs. Therefore, results presented below are largely focused on potential gene functions in DA motor neurons.
Table 1: Expression of promoter-GFP reporters for transcripts enriched in unc-4::GFP motor neuron data set. Reporters were examined for expression in DA, SAB and I5 neurons. 15/18 reporters showed GFP expression (bold type) in these cells. GFP reporters are listed according to statistical rank. All GFP-positive reporters were visible in embryos (data not shown) but were scored in larval animals to ease neuron identification. Columns: Rank, Cosmid, Gene. (Only one row fragment survives: rank 1010, STE20-like serine/threonine kinase MST.)

Families of neuronal genes expressed in UNC-4 motor neurons
Here we describe transcripts detected in the unc-4::GFP dataset with an emphasis on genes that are enriched in these cells and therefore likely to encode proteins with important roles in the differentiation or function of UNC-4 motor neurons (Table 2). A comprehensive discussion of gene families from this list can be found in Additional file 16. Selected examples are presented here. Gene names for enriched transcripts discussed in this section are shown in bold and are listed in Table 2. All EGs are listed in Additional file 7.

Axon guidance and outgrowth
Growth cone steering and cell migration along the dorsal-ventral body axis in C. elegans depend on the UNC-6/netrin guidance cue. The UNC-40/DCC receptor mediates an attractive response to UNC-6/netrin whereas coexpression of UNC-40/DCC with a second UNC-6 receptor, UNC-5, results in repulsion [15,22]. The UNC-6/netrin signal is released from ventral ectoderm [23] to repel growth cones expressing both UNC-40 and UNC-5; this interaction is required for normal outgrowth of DA motor neuron commissures to the dorsal nerve cord [15]. As expected, unc-5 and unc-40 transcripts are enriched in UNC-4 motor neurons. unc-6, which is known to be expressed in the I5 pharyngeal neuron, is also elevated [23]. The CLR-1 receptor protein tyrosine phosphatase (RPTP) is proposed to inhibit attractive UNC-6/netrin responses via interactions with UNC-40. In the DA motor neurons, CLR-1 also promotes UNC-6/netrin repulsion by an UNC-40-independent mechanism [24]. As predicted by these models, the clr-1 transcript is elevated in UNC-4 motor neurons. Relevant to this point, we note that the C. elegans Abelson tyrosine kinase ortholog, abl-1, is also enriched. In Drosophila, Abl tyrosine kinase antagonizes the axon guidance role of RPTPs in motor neurons [25]. It will be interesting to determine if ABL-1 functions similarly in C. elegans and, in this case, if ABL-1 works in opposition to CLR-1 during DA motor axon outgrowth (Fig. 6).
We also detected the axon guidance effectors unc-115 and ced-10 in our microarray dataset. Genetic approaches have shown that unc-115 (AbLIM, actin binding protein) and ced-10 (Rac GTPase) are downstream effectors of UNC-40 signaling and presumptive links to the cytoskeleton [26,27].
Transcripts for genes with general roles in axon outgrowth are enriched in the unc-4::GFP motor neuron data set. These include unc-44 (ankyrin-like), unc-76 (novel) and unc-14 (RUN domain). All three of these genes are highly expressed in the C. elegans nervous system. unc-44 encodes multiple alternatively spliced transcripts with broad roles in axonal morphogenesis [28]. UNC-76 and its vertebrate homologs define a new protein class of unknown biochemical function. In C. elegans, unc-76 mutants show axon outgrowth and fasciculation defects [29]. unc-14 and unc-51 (serine/threonine kinase) mutants display similar neuronal defects with misplaced processes and enlarged abnormal varicosities [30]. UNC-51 (EG) has been proposed to phosphorylate UNC-14 to regulate vesicular trafficking during axonal process outgrowth [31,32].

Wingless signaling
Wingless (Wnt) signaling controls multiple developmental processes in the nervous system ranging from cell determination to axon guidance and synaptogenesis [33,34]. The C. elegans genome contains 5 Wnt genes and 4 Wnt receptors or Frizzled homologs [35].  [36]. A gradient of Wnt signaling controls cell migration along the AP axis in C. elegans [37]. Responsiveness to this graded Wnt signal could account for the anterior polarity of DA motor neurons in the dorsal nerve cord as suggested by the recent finding that commissural axonal polarity along the AP axis in the vertebrate spinal cord is dependent on Wnt signaling [38].

Nicotinic Acetylcholine Receptors (nAChRs)
The C. elegans genome encodes at least 27 distinct nAChR subunits [39]. Two of these, ACR-2 and UNC-63 are expressed in DA class motor neurons [14,40] and are enriched in the unc-4::GFP motor neuron data set. Expression of unc-29 [41] and unc-38 (J.L. Bessereau, personal communication) in ventral cord motor neurons has been previously reported and these are also detected as EGs (Additional file 7). acr-12::GFP is expressed in neurons (A. Gottschalk and W. Schafer, personal communication), and we have validated the enrichment of acr-14 with GFP reporters that confirm expression in DA motor neurons (Fig 5). In body muscle, UNC-63 is an essential component of a levamisole-sensitive nACh receptor that also includes UNC-29, UNC-38, LEV-1 and LEV-8 [40,42].

ACR-12 may coassemble with UNC-63, UNC-29, and UNC-38 to generate a related nACh receptor in UNC-4 motor neurons (A. Gottschalk and W. Schafer, personal communication). Five additional sets of nAChR subunits are detected as EGs and a so-called "orphan" ligand gated ion channel (LGIC) subunit, F21A3.7, with significant similarity to the nAChR gene family, is enriched. Despite the diversity of nAChR subunits expressed in UNC-4 motor neurons and the potentially complex array of resultant receptors, no functions have been directly assigned to nAChRs in these cells [43]. Although loss-of-function mutations in nAChR subunits that are also expressed in muscle (i.e. unc-29, unc-38, unc-63) result in locomotory defects, gene knockouts of acr-2 (Y. Jin, personal communication) and acr-12 (data not shown), which are exclusively expressed in neurons, do not produce obvious effects on motility or behavior. Perhaps the surprisingly large number (12) of nAChR subunit genes detected in these cells results in overlapping functions that mask defects in single gene knockout mutants. Alternatively, these nAChRs may mediate subtle aspects of motor neuron activity. This idea is consistent with models in which nAChRs act presynaptically to modulate neurotransmitter release [44,45]. Finally, we detect enrichment of transcripts for proteins RIC-3 (novel) and LEV-10 (CUB domain) that mediate nAChR localization [46,47].
Figure 6: Model of DA motor neuron axon guidance. The ventrally derived UNC-6/Netrin guidance cue binds to the UNC-40/DCC and UNC-5 receptors to steer the DA motor axon toward the dorsal nerve cord. The receptor tyrosine phosphatase, CLR-1, promotes dorsal motor axon outgrowth via an UNC-40/DCC independent pathway [24]. The transcript encoding the C. elegans ortholog of Abelson tyrosine kinase (ABL-1) is enriched in the unc-4::GFP motor neuron data set and is proposed to antagonize CLR-1 activity.

Ligand-Gated Ion Channels
UNC-4 motor neurons are potentially responsive to additional classes of neurotransmitters. Enrichment of glr-5 (kainate type ionotropic glutamate receptor subunit) is correlated with its known expression in the SAB motor neurons [48]. As members of the GABA/Glycine family of ligand-gated receptors, the presumptive anion channels encoded by T27E9.9 and Y71D11A.5 are predicted to hyperpolarize UNC-4 motor neurons and thus inhibit cholinergic activity [49]. It may be significant that a candidate sodium/chloride-dependent glycine transporter, snf-5, is enriched. (C09E8.1, an outlier in the sodium/chloride-dependent transporter family is also enriched.) In mammalian cells, plasma membrane transporters GLYT1/GLYT2 remove glycine from the synaptic cleft, and in the case of GLYT2, thereby recycle glycine for reuptake into synaptic vesicles [50]. UNC-4 motor neurons do not express the GABA/Glycine vesicular transporter, UNC-47, however, and are therefore unlikely to release glycine presynaptically [51]. In this case, the physiological function of the SNF-5 transporter could mirror that of GLYT1, which is believed to attenuate glycinergic signaling by pumping glycine into a non-glycinergic glial cell [52]. To date, the potential function of glycinergic signaling in C. elegans has not been explored.

G-protein signaling
Cholinergic motor neuron activity in C. elegans is modulated by G-protein signaling pathways that respond to the neurotransmitters acetylcholine, serotonin (5-HT), and dopamine (Fig. 7) [53-55]. In each case, acetylcholine release is either promoted by EGL-30 (Gαq) or inhibited by GOA-1 (Gαo). Input to these antagonistic pathways is provided by G-protein coupled receptors (GPCRs). Pharmacological evidence suggests that a muscarinic acetylcholine receptor activates EGL-30 to enhance ACh release at the neuromuscular synapse [53,56]. The enriched muscarinic AChRs, GAR-2 and GAR-3, could account for this effect [57,58]. Similarly, the enriched 5-HT receptor, SER-4, is a strong candidate for the GPCR mediating the inhibitory effect of serotonin on ACh release from ventral cord motor neurons [54]. Dopamine may either activate or inhibit ACh release within the same cholinergic motor neuron. Activation depends on DOP-1, which is enriched in UNC-4 motor neurons. Inhibition is attributed to DOP-3. Expression of DOP-3 in cholinergic ventral cord motor neurons is reportedly weak and we do not detect the dop-3 transcript in our data set [55]. UNC-4 motor neurons are also potentially responsive to GABA, as a transcript (Y41G9A.4) encoding a metabotropic GABA type B1 receptor is enriched. GABA-dependent effects on cholinergic motor neuron activity have not been previously reported in C. elegans.
Genetic screens for mutations affecting neurotransmitter release have revealed a complex web of interacting components that couple G-protein signaling to synaptic vesicle fusion (Fig 7) -8, unc-13). Lack of enrichment of some of these components is consistent with the widespread utilization of G-protein signaling pathways in C. elegans neurons and muscle cells [61,62]. As noted above, these data have also revealed several additional enriched transcripts with potential roles in G-protein dependent locomotory behavior. egl-47 encodes an orphan GPCR and rgs-1 an RGS protein, both of which can regulate goa-1 signaling in the egg laying circuit [63,64]. RNAi of F39B2.8, which encodes a highly conserved but unusual protein with both serine/threonine kinase and 7-transmembrane domains, results in a locomotory defect [65] that could be indicative of a neuromodulatory function in DA motor neurons. A complete list of G-protein signaling components detected in this dataset can be found in Table 3.

Neuropeptide signaling
The C. elegans genome includes a large and diverse array of genes encoding potential neuroactive peptides. GFP reporter studies indicate that these genes are predominantly expressed in neurons. 23 "flp" genes encoding FMRFamide and related peptides (FaRPs) have been described. FaRPs have been shown to modulate a wide array of invertebrate neural functions [66]. Previously reported expression of flp-2, flp-4, flp-13 in the pharyngeal I5 neuron [67] (Fig 5) is confirmed by their enrichment in the unc-4::GFP motor neuron data set (Table I). flp-5 is also elevated in these cells and 8 additional flps are detected as EGs (Additional Files 7,14). Specific FaRPs modulate cell excitability (flp-13), locomotion (flp-1) and feeding behavior (flp-21) in C. elegans [68,69]. The inhibitory action of the FLP-13 peptide on pharyngeal muscle activity is consistent with its expression in I5 [70].
The C. elegans genome contains 37 genes encoding predicted insulin-like peptides [71]. Transcripts for two of these, ins-1 and ins-18, are enriched; ins-17, ins-24 and ins-30 are present but not significantly elevated relative to other cells. ins-1 and ins-18 have been implicated in the DAF-2 insulin receptor dependent pathways regulating growth, metabolism and lifespan [71].
A total of 32 genes encoding other potential classes of neuropeptides have also been identified in the C. elegans genome. Three of these neuropeptide-like protein (nlp) genes, nlp-9, nlp-15, and nlp-21, are enriched in UNC-4 motor neurons (Table 2). An additional group of 11 nlp transcripts are detected as EGs (Additional File 7, 14). To date, no functions have been directly assigned to nlp genes in C. elegans [72].
[...] Table 3 for protein descriptions. Green denotes interactions that promote acetylcholine (ACh) release and red marks steps that inhibit synaptic vesicle fusion. Neurotransmitters are highlighted in gray boxes. This figure adapted from Reynolds et al. (2004) [60].
Our studies have revealed that a surprisingly large number of neuropeptide genes are transcribed in UNC-4 motor neurons. These results indicate that UNC-4 motor neurons are likely to exhibit significant neurosecretory activity. This conclusion is consistent with our finding that proteases required for neuropeptide processing and activation [T03D8.3 (Proprotein convertase (PC) 2 chaperone), egl-3 (subtilisin-like proprotein convertase) and egl-21 (zinc carboxypeptidase)] are also expressed in these cells [73-75]. Other genes with important roles in neurosecretion are also detected. ric-19 encodes a novel arfaptin-related protein that is believed to function in the Golgi in the generation of neurosecretory granules and may, through this activity and subsequent neuropeptide signaling, exert an indirect effect on ACh release from conventional synaptic vesicles [76,77]. Our finding that ric-19 is highly enriched in cholinergic motor neurons could be indicative of autocrine neuropeptide modulation of ACh secretion at the neuromuscular synapse. Consistent with this idea is our finding that ida-1, a conserved membrane component of the dense core vesicles in which neuropeptides are typically packaged, is an EG [78]. Finally, UNC-31 (CAPS), a known facilitator of dense core vesicular release, is enriched [78]. Plasma membrane fusion of both dense core vesicles and the small, clear vesicles in which classical neurotransmitters are packaged depends on a common set of calcium-activated components [79], most of which are either enriched or present in these cells (see Table 2 and Additional Files 14,16).
In addition to secreting neuropeptides, UNC-4 motor neurons are also likely to respond to neuropeptide signaling. Transcripts for nine putative neuropeptide receptors are enriched (Table 2). RNAi of two of these, npr-2 and F59D12.1, results in locomotory defects that could be indicative of specific functions in DA motor neurons [65]. npr-2 is a close relative of npr-1 (not detected), which has been shown to control social feeding behavior in response to the FLP-21 (not detected) peptide [69]. F59D12.1 is most closely related to melatonin receptors but its in vivo ligand is unknown. Neuropeptides are believed to modulate secretion of classical neurotransmitters [79]. Neuropeptide specific effects on excitatory [...] [80]. Genetic evidence in C. elegans indicates that acetylcholine release at the neuromuscular junction is enhanced by neuropeptide activity [73] and that this pathway depends on the EGL-30 Gqα protein [68] (Table 2). These neuropeptides may be released from neurons and also as a retrograde signal from muscle cells [73,81].

Discussion
We have described MAPCeL, a microarray-based strategy for fingerprinting specific C. elegans neurons, and provide evidence that this approach can reveal a comprehensive picture of gene expression in these cells in vivo. unc-4::GFP-marked neurons were isolated by FACS from primary cultures of embryonic cells and profiled on the C. elegans Affymetrix gene chip. Because these unc-4::GFP neurons differentiate in vitro, it was important to establish that our microarray data provide an accurate representation of gene expression in the intact animal. This conclusion is supported by three observations: (1) A majority (21/27) of genes with known expression in unc-4::GFP neurons are detected in our microarray data set; (2) ~80% (15/18) of GFP reporters constructed for transcripts enriched in UNC-4 motor neurons are expressed in these cells in vivo (Table 1, Fig. 5); (3) Transcripts known to encode proteins with key roles in unc-4::GFP motor neuron differentiation (e.g. axon guidance and outgrowth, synaptogenesis) and function (e.g., neurotransmitter vesicle release, G-protein signaling pathways) are highly represented in our data sets. These findings parallel earlier studies showing that cultured C. elegans neurons and muscle cells adopt apparently normal morphological and physiological characteristics [6] and are consistent with evidence favoring a cell autonomous mode of differentiation for C. elegans embryonic cells after an initial phase of inductive signaling events [4,82]. We have now generated comparable microarray profiles of other motor neuron classes and muscle cells that also show strong congruence with known patterns of gene expression (RMF, SEV, SJB, DMM, unpublished data). We therefore conclude that our approach of profiling GFP marked neurons isolated from primary culture can now be widely applied to fingerprint specific C. elegans embryonic cells. 
In some cases, however, differentiation of a given neuron is likely to depend on specific intercellular signals that primary cultures will not provide. Thus, in every instance, it will be necessary to confirm microarray data by independent methods as described here.

Methods for profiling specific C. elegans cells
Previous studies have described other methods for cataloging transcripts from specific C. elegans cells. Comparisons of microarray data from mutant animals with either supernumerary or absent sensory neurons in the male tail have revealed genes that are preferentially expressed in these cells [83]. However, this approach is limited to cell types that can be manipulated by specific mutants. In addition, this method may be insufficiently sensitive to detect changes in smaller subsets of cells due to high background mRNA from cells that are not affected by the mutation (SEV, DMM, unpublished data). This limitation can be overcome by enriching for mRNA from target cells.
To this end, Zhang et al. (2002) used an approach similar to the strategy outlined in this paper to identify downstream genes of the MEC-3 transcription factor in C. elegans touch neurons [84]. However, this work did not provide a comprehensive cell-specific gene expression profile as we have here, perhaps due to the limited enrichment (~50%) of GFP-labeled touch neurons. We have now optimized the application of nematode embryonic cell culture and FACS technology to obtain ~90% enrichment of GFP-marked neurons and muscle cells (Fig. 2) (RMF, SEV, SJB, DMM, unpublished data). These methods have now been successfully applied to profile other classes of C. elegans embryonic cells [85,86].
MAPCeL cannot be used for postembryonic cells because these apparently do not arise in culture [6]. Microarray profiles of specific larval cells have been obtained, however, by mRNA tagging. In this approach, an epitope-labeled polyA binding protein (FLAG-PAB-1) is expressed transgenically under the control of a cell-specific promoter and mRNAs are isolated by co-immunoprecipitation with anti-FLAG. This method has been used for microarray analysis of C. elegans body muscle cells and ciliated sensory neurons [87,88]. We have now successfully used the mRNA tagging strategy to profile specific subsets of motor neurons from C. elegans larvae (SEV, RMF, J. Watson, S. Kim, P. Roy, DMM, unpublished data). Thus, in principle, it should now be possible to obtain an accurate gene expression profile for virtually any C. elegans cell throughout development.

UNC-4 motor neurons are sensitive to a wide range of neurotransmitters and peptidergic signals
Acetylcholine (ACh) release at the DA neuromuscular junction is presumptively triggered by excitatory input from command interneurons. The strength of the DA cholinergic signal, however, may be strongly modulated by other cells that release neurotransmitters from distal locations. For example, dopamine is produced by 8 neurons, none of which are presynaptic to DA motor neurons [89]. Dopamine, however, is a potent regulator of cholinergic secretory activity in the ventral motor circuit. The dopamine effect is mediated in part by DOP-1, a G-protein coupled receptor (GPCR) [55]. We have confirmed enrichment of the dop-1 transcript and also detected elevated levels of transcripts encoding GPCRs for acetylcholine and serotonin, additional neurotransmitters known to modulate cholinergic motor neuron activity via G-protein signaling pathways [18,55]. Enrichment of a GABA metabotropic receptor transcript offers yet another mechanism for exogenous adjustment of neurotransmitter vesicular fusion in DA motor neurons. Indirect evidence indicates that acetylcholine release from ventral cord motor neurons may also be sensitive to neuropeptide signals from other neurons or muscle cells [73,81]. We have established that unc-4::GFP motor neurons express elevated transcript levels for nine different GPCRs with significant homology to insect or mammalian neuropeptide receptors. This signaling complexity is further compounded by the enrichment of transcripts for 18 members of the serpentine GPCR-like family in unc-4::GFP neurons (Additional Files 9, 16). Ligands for this outlier group of GPCRs are unknown [90]. The picture emerging from these data is of a motor neuron festooned with multiple G-protein linked receptors, each responding to a different class of neurotransmitter or peptidergic signal (Fig 8). In effect, these motor neurons are also functioning as a kind of sensory neuron in which disparate inputs are internally assessed to fine-tune output in concert with temporal requirements for locomotory activity.
Figure 8. Signaling components detected in unc-4::GFP motor neurons.
The microarray data also reveal multiple additional classes of receptors and ion channels through which the differentiation and function of unc-4::GFP motor neurons could be modulated by extracellular signals (Fig 8). Finally, we have detected enrichment of transcripts encoding TGF-β, wingless, and several classes of neuropeptides (Table 2, Additional Files 9, 16). Thus, in addition to responding to a wide range of stimuli, unc-4::GFP motor neurons are also potentially capable of regulating the activities of other cells with a variety of different signals. If an organism as simple as C. elegans builds motor neurons with such sophisticated signaling and response mechanisms, it is tempting to speculate that neurons in other, more advanced species may have evolved even more complex pathways. It is likely, however, that the core signaling systems described here are also conserved. This prediction is underscored by our finding that approximately half of the enriched transcripts (537/1012) and 2/3 of EGs (4050/6217) detected in unc-4::GFP neurons have human homologs (BLAST ≤ e-10) (Additional Files 12, 13).

Applications of MAPCeL
In addition to confirming expression of genes with known roles in unc-4::GFP motor neuron differentiation and function, the microarray data also uncovered strong candidates for new genes governing these events. For example, DA motor axons grow dorsally in response to a ventrally provided repulsive UNC-6/netrin guidance cue [15]. Recent work has shown that the receptor protein tyrosine phosphatase (RPTP), CLR-1, positively enhances this response [24]. As expected, we found that the clr-1 transcript is enriched in the unc-4::GFP motor neuron data set. We also noted enrichment of abl-1, the C. elegans homolog of Abelson tyrosine kinase. By analogy to findings in Drosophila in which Abelson tyrosine kinase functions in opposition to RPTP-dependent axon guidance [91], we propose that ABL-1 antagonizes CLR-1 activity (Fig 6). This model predicts that either genetic ablation or RNAi of abl-1 will suppress the DA motor axon guidance defects of clr-1 mutants.
Another application of this strategy includes the identification of transcription factor target genes. A comparison of expression fingerprints of wildtype cells vs cells that are mutant for a specific transcription factor could reveal downstream genes [84]. For example, the UNC-4 transcription factor regulates axon morphology and synaptic strength in embryonically derived unc-4::GFP neurons [8]. UNC-4 also defines the specificity of synaptic inputs to postembryonically-derived VA motor neurons [7,92,93].
We have now used a combination of MAPCeL and mRNA tagging strategies to identify candidate genes regulated by UNC-4 in these cells (RMF, SEV, DMM, unpublished data). Gene regulatory motifs to which transcription factors bind may also be revealed as common cis-acting sequences in cohorts of co-regulated genes [94].
The C. elegans nervous system is composed of exactly 302 neurons. The morphology and connectivity for every neuron has been defined by serial section electron microscopy to generate a detailed wiring diagram for the entire network [2]. The C. elegans genome is similarly well defined. All 6 chromosomes are completely sequenced and the structure of over 20,000 genes described [5]. Unique combinations of these genes are likely to specify different classes of neurons and their differentiated traits. The problem now is to link the gene map with the wiring diagram. We believe that MAPCeL offers a powerful approach toward achieving this goal.

Conclusion
We have described a new method, Micro-Array Profiling of C. elegans cells, or MAPCeL, for generating gene expression fingerprints of subsets of C. elegans neurons. Embryonic motor neurons marked with a reporter gene, unc-4::GFP, were isolated by FACS from primary cultures and profiled on the C. elegans Affymetrix array. We confirmed that microarray data generated by this approach reliably identify genes expressed by these motor neurons in vivo. We propose that MAPCeL can now be used to generate a gene expression map for the C. elegans nervous system.

Cell Culture
Embryonic cells were obtained using methods previously described [6]. Briefly, embryos were isolated from gravid adults following lysis in a hypochlorite solution. Intact embryos were separated from debris by flotation on 30% sucrose. Eggshells were removed by incubation in 0.5 ml chitinase (0.5 U/ml in egg buffer) for 45 minutes. Following resuspension in L-15 medium supplemented with 10% FBS (L15-10) and antibiotics, the embryos were dissociated by passage through a 5 µm syringe filter (Durapore). Cells were plated on poly-L-lysine (0.01%, Sigma) coated single-well chambered coverglasses (Nalge Nunc International) at a density of ~10 million cells/ml and maintained in L15-10 media. Cells were incubated at 25°C in a humidified chamber. Wildtype (N2) cells were isolated and treated similarly.

FACS analysis
Sorting experiments were performed on a FACStar Plus flow cytometer (Becton Dickinson, San Jose, CA) equipped with a 488 nm argon laser. Emission filters were 530 ± 30 nm for GFP fluorescence and 585 ± 22 nm for PI fluorescence. The machine was flushed with egg buffer [6] prior to sorting to enhance viability. 2 µm fluorescent beads were used to calibrate light scattering parameters for the relatively small size of C. elegans embryonic cells. Cells were sorted at a rate of 4000-5000 cells per second through a 70 µm nozzle.
Immediately prior to sorting, supernatant from the 24 hour cultures was removed and discarded. 1 ml of egg buffer was added to the chamber coverglass. Cells are loosely adherent to poly-L-lysine and can be easily dislodged with gentle pipetting. 3 ml of egg buffer + cells were drawn into a 3 cc syringe and the suspension filtered with a 5 µm Durapore syringe filter. Propidium Iodide (PI) was added to the cell suspension at a final concentration of 5 µg/ml prior to sorting. Autofluorescence levels were established by flow cytometry of cells isolated from wildtype (i.e. non-GFP) embryos (see Fig 2A). Next, wildtype cells stained with PI were used to define the sorting gate for damaged cells. GFP+ cells containing no PI were sorted to establish the intensity range of GFP fluorescence. Finally, unc-4::GFP cells stained with PI were gated (Fig 2B) using the parameters established above. The sorting gate for size and granularity (Fig 2C) was empirically adjusted to exclude cell clumps and debris and to achieve ~90% enrichment for GFP-labeled cells. unc-4::GFP cells were collected in a 15 ml conical tube containing 1 ml of L15-10 media. Cells were pelleted using low-speed centrifugation (300 × g) and either plated on peanut lectin-coated slides for visualization [6] or used for RNA isolation (see below). Reference cells were obtained from 1 day old cultures of embryonic blastomeres isolated from the non-GFP wildtype strain (N2). In this case, all viable cells (i.e. non-PI stained) were collected by FACS for RNA isolation.
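The order of the gating steps above can be illustrated with a short Python sketch. This is only a schematic of the logic (exclude PI-stained cells, require GFP above autofluorescence, restrict size and granularity); the class names, field names and all threshold values are hypothetical and are not the instrument settings used in this study, which were set empirically.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """One flow-cytometry event (a single cell or particle)."""
    gfp: float  # GFP channel intensity (530 nm filter)
    pi: float   # propidium iodide channel intensity (585 nm filter)
    fsc: float  # forward scatter, a proxy for size
    ssc: float  # side scatter, a proxy for granularity


@dataclass
class Gates:
    """Hypothetical thresholds; real values are set empirically per run."""
    gfp_min: float    # just above autofluorescence of wildtype (non-GFP) cells
    pi_max: float     # upper PI limit for undamaged cells
    fsc_range: tuple  # (low, high), chosen to exclude debris and clumps
    ssc_range: tuple  # (low, high) for granularity


def passes(event: Event, g: Gates) -> bool:
    """Return True if the event is viable, GFP-positive and inside the scatter gate."""
    viable = event.pi <= g.pi_max
    gfp_positive = event.gfp >= g.gfp_min
    in_scatter = (g.fsc_range[0] <= event.fsc <= g.fsc_range[1]
                  and g.ssc_range[0] <= event.ssc <= g.ssc_range[1])
    return viable and gfp_positive and in_scatter


# Illustrative values only (arbitrary units).
gates = Gates(gfp_min=100.0, pi_max=50.0,
              fsc_range=(20.0, 200.0), ssc_range=(10.0, 150.0))
events = [
    Event(gfp=500.0, pi=5.0, fsc=80.0, ssc=40.0),    # viable GFP+ cell
    Event(gfp=500.0, pi=900.0, fsc=80.0, ssc=40.0),  # GFP+ but PI-stained (damaged)
    Event(gfp=10.0, pi=5.0, fsc=80.0, ssc=40.0),     # viable but GFP-negative
    Event(gfp=500.0, pi=5.0, fsc=900.0, ssc=40.0),   # clump, outside the scatter gate
]
sorted_events = [e for e in events if passes(e, gates)]
```

For the wildtype reference sample, only the viability condition would apply, since all non-PI-stained cells were collected.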

RNA isolation, amplification, and hybridization
RNA was prepared from FACS-isolated unc-4::GFP cells for comparison to RNA from the wildtype reference strain (N2). Cells were pelleted using low-speed centrifugation (300 × g). The supernatant was removed and RNA was extracted with a micro-RNA isolation kit (Stratagene) using the recommended volumes for 1 million cells. Typical yields were 1 pg total RNA/cell. 100 ng of total RNA was subjected to 2 rounds of amplification, as described in the Affymetrix GeneChip Eukaryotic Small Sample Target Labelling Protocol, with the following modifications. 100 ng (5 pmol) of T7-dT primer (5'-GGCCAGTGAATTGTAATACGACTCACTATAGGGAGGCGG-(dT)24-3') was used as opposed to the recommended 100 pmol. RNA cleanup was achieved using the RNeasy mini kit (Qiagen); 300 µl of 100% ethanol (final concentration = 40% ethanol) was added to the sample prior to absorption to the column matrix. Eluate was passed through the column 2× prior to washing to improve yields. The BioArray High Yield RNA Transcript Labeling Kit (Enzo) was used to biotinylate the sample in the second round of amplification. 10-15 µg of labeled aRNA (amplified RNA) was fragmented and hybridized to the Affymetrix C. elegans chip according to the Affymetrix Expression Analysis Technical Manual. The Agilent Bioanalyzer was used to assess RNA quality prior to labeling and to confirm fragmentation (<200 bp) before hybridization.
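The quantities quoted in this protocol can be cross-checked with simple arithmetic, sketched here in Python. The average nucleotide mass of 330 g/mol is a rule-of-thumb assumption rather than a value from this paper, the primer length of 63 nt is counted from the sequence given above (39 nt plus 24 dT), and the sample volume is inferred from the stated ethanol volume and final concentration, not reported directly. The function names are illustrative.

```python
AVG_NT_MASS = 330.0  # g/mol per nucleotide; rough average for ssDNA (assumption)


def oligo_pmol(mass_ng: float, length_nt: int, nt_mass: float = AVG_NT_MASS) -> float:
    """Convert a mass of single-stranded oligo (ng) to pmol."""
    # ng -> g is 1e-9, mol -> pmol is 1e12, so the net factor is 1e3.
    return mass_ng * 1e3 / (length_nt * nt_mass)


def implied_sample_volume_ul(ethanol_ul: float, final_fraction: float) -> float:
    """Sample volume that gives `final_fraction` ethanol after adding 100% ethanol."""
    # final_fraction = ethanol / (sample + ethanol), solved for sample
    return ethanol_ul * (1.0 - final_fraction) / final_fraction


def cells_for_rna(rna_ng: float, pg_per_cell: float = 1.0) -> float:
    """Number of cells needed to supply `rna_ng` of total RNA."""
    return rna_ng * 1e3 / pg_per_cell


primer_pmol = oligo_pmol(100.0, 39 + 24)             # about 4.8 pmol, i.e. ~5 pmol
sample_ul = implied_sample_volume_ul(300.0, 0.40)    # about 450 µl
cells_needed = cells_for_rna(100.0)                  # 100,000 cells at 1 pg/cell
```

Under these assumptions, 100 ng of the 63-nt primer corresponds to roughly 5 pmol, in line with the figure stated above, and the 100 ng RNA input corresponds to about 100,000 sorted cells at the typical yield of 1 pg per cell.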

Data analysis
The commercially available C. elegans Affymetrix array was used for all experiments. This chip was designed using the December 2000 genome sequence. All probe set information is available at http://www.affymetrix.com as well as http://www.wormbase.org. unc-4::GFP neurons were profiled in triplicate; baseline data (all cells) were obtained from four independent experiments with wildtype embryonic cells. Hybridization intensities for each experiment were scaled in comparison to a global average signal from the same array (A complete list of Affy normalized values can be found in Additional file 2) [95]. Expressed transcripts were initially identified on the basis of a "Present" call in a majority of experiments (2/3 for unc-4::GFP and 3/4 for wildtype cells) as determined by Affymetrix MAS 5.0 (see below) (Additional Files 4,5). In this approach, a Mismatch (MM) value for each feature is compared to a Perfect Match (PM) value to estimate nonspecific binding. This strategy, however, tends to arbitrarily exclude low intensity signals in which PM and MM values may be comparable [96,97]. To avoid this bias in the detection of transcripts that might be differentially elevated in the unc-4::GFP data set, intensity values were normalized using RMA (Robust Multi-Array Analysis) available through GeneTraffic (Iobion) in which the MM values are not considered (Additional file 3) [96,97]. Comparisons of RMA normalized intensities for unc-4::GFP vs reference cells were statistically analyzed using Significance Analysis of Microarrays software (SAM, Stanford) [98,99]. A two-class unpaired analysis of the data was performed to identify genes that differ by ≥ 1.7-fold from the wildtype reference at a False Discovery Rate (FDR) of ≤ 1%. These genes were considered significantly enriched (Additional file 9). This analysis also identified 1600 transcripts that are depleted (1.7×, ≤ 1% FDR) in unc-4::GFP cells vs the wildtype reference (Additional file 10). 
Although 729 of these transcripts are also scored as "present" in the unc-4::GFP motor neuron dataset, we attribute their detection to high expression in the small fraction (~10%) of non-GFP cells contaminating this preparation (see above). Therefore, we excluded all 729 of these wildtype-enriched transcripts from the list of present calls in the unc-4::GFP motor neuron data set (Additional file 6). Finally, to compute the overall sum of Expressed Genes (EGs) in the unc-4::GFP data set we restored 118 unc-4::GFP-enriched genes that were initially excluded from the present list due to high mismatch signals. These considerations produce a final list of 6,217 genes that are detected in unc-4::GFP motor neurons (Additional file 7; see Logic Tree, Additional file 17).
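The steps used to assemble the EG list can be summarized as set operations. The Python sketch below is a toy illustration of that logic only: a majority "Present" rule, removal of transcripts depleted relative to the reference, and restoration of enriched transcripts. The gene labels, call strings and function names are hypothetical, and the sketch does not reproduce MAS 5.0, RMA or SAM.

```python
def majority_present(calls, min_present):
    """True if at least `min_present` replicate calls are 'P' (Present)."""
    return sum(1 for c in calls if c == "P") >= min_present


def expressed_genes(present, depleted, enriched):
    """EG list = (present calls minus depleted transcripts) plus enriched transcripts."""
    return (present - depleted) | enriched


# Toy detection calls for three unc-4::GFP replicates (2 of 3 required).
calls = {
    "geneA": ["P", "P", "A"],
    "geneB": ["P", "P", "P"],
    "geneC": ["P", "A", "A"],  # fails the majority rule
    "geneD": ["P", "P", "P"],  # present, but depleted vs. the reference
    "geneE": ["A", "A", "P"],  # not called present, but significantly enriched
}
present = {g for g, c in calls.items() if majority_present(c, min_present=2)}
depleted = {"geneD"}           # e.g. >= 1.7-fold lower than reference at FDR <= 1%
enriched = {"geneA", "geneE"}  # e.g. >= 1.7-fold higher than reference at FDR <= 1%

egs = expressed_genes(present, depleted, enriched)
```

In this toy case geneD is dropped as likely contamination from non-GFP cells, while geneE is restored because it is enriched even though it was not called present.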

Annotation of datasets
A Wormbase mirror was established by downloading code and databases from http://www.wormbase.org. Using the acedb perl module, an annotation script was generated that queries the wormbase mirror. Affymetrix IDs have been mapped to specific transcripts in wormbase. Text files containing Affy IDs (one per line) and cosmid names are input into the script which then searches the wormbase mirror and matches Affy ID/cosmid name to a specific transcript. Cosmid names are used for this search when Affy IDs have not been mapped in wormbase. This information is used to acquire other linked annotations (i.e. KOG, common name, RNAi phenotype, Expression data, Kim mountain data and Gene Ontology, etc.).
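The lookup order described above (Affy ID first, cosmid name as a fallback) can be sketched as follows. The original script was written in Perl against a WormBase mirror; this Python version swaps in plain in-memory dictionaries for the database, and every identifier, mapping and function name in it is a made-up placeholder rather than a real probe set or transcript.

```python
def annotate(entries, affy_to_transcript, cosmid_to_transcript):
    """Map each (affy_id, cosmid_name) pair to a transcript.

    The Affy ID mapping is tried first; the cosmid name is used only when
    the Affy ID has not been mapped. Unresolved entries map to None.
    """
    result = {}
    for affy_id, cosmid in entries:
        transcript = affy_to_transcript.get(affy_id)
        if transcript is None:
            transcript = cosmid_to_transcript.get(cosmid)
        result[affy_id] = transcript
    return result


# Placeholder lookup tables standing in for queries to the database mirror.
affy_to_transcript = {"probe_001_at": "TRANSCRIPT_A"}
cosmid_to_transcript = {"COSMID1.1": "TRANSCRIPT_A", "COSMID2.3": "TRANSCRIPT_B"}

entries = [
    ("probe_001_at", "COSMID1.1"),  # resolved through the Affy ID
    ("probe_002_at", "COSMID2.3"),  # Affy ID unmapped, resolved through the cosmid
    ("probe_003_at", "COSMID9.9"),  # neither mapping exists
]
annotations = annotate(entries, affy_to_transcript, cosmid_to_transcript)
```

Once a transcript is resolved, linked annotations such as KOG or Gene Ontology terms could be fetched with a second lookup keyed on the transcript name.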

In litero analysis
An extensive literature search was performed using Textpresso (http://www.textpresso.org). The keywords "DA motor neurons" generated a list of 68 citations. A similar search using the keywords "I5 pharyngeal neuron" and "SAB neurons" detected an additional 21 citations. Expression patterns on wormbase were also searched using the "Cell identity" function to identify genes with documented expression in DA, SAB or I5. A list of 27 genes with documented expression in DA motor neurons, the I5 pharyngeal neuron and the SAB neurons was compiled from this information (Additional file 11).

Strains
Nematode strains were maintained at 20-25°C using standard culture methods [100]. The wildtype strain was N2. Transgenic lines carrying promoter GFP fusions are listed in Additional file 1.

Generating transgenic promoter GFP strains
twk-30::GFP (25 ng/µl) was microinjected with the myo-3::dsRed2 marker (25 ng/µl) [101]. Other transgenics were generated by biolistic transformation with promoter::GFP constructs from the Promoterome project (Additional file 1). Primer sequences for "promoterome" constructs can be found at http://vidal.dfci.harvard.edu/promoteromedb [102]. Microparticle bombardment was conducted [103] in a BioRad Biolistic PDS-1000/He equipped with the Hepta Adapter. Gold beads (1 micron) were coated with DNA at 1 µg/µl. 100 mm NGM plates were seeded with a monolayer of ~100,000 L4/adult unc-119 (ed3) animals. For each construct, 1 'shot' was performed using a 1550 psi rupture disk at 28 inches of Hg vacuum. After a 1 hr recovery period, animals were washed from the plates with 7 ml M9 buffer and transferred to 7 NGM plates (1 ml/plate). Animals were grown at 20°C for 1 week. To pick transgenic animals, one-half of the plate was 'chunked' and added to a new 100 mm NGM plate; animals with wildtype movement were picked to 60 mm NGM plates and allowed to self. Worms derived from separate plates were considered independent lines; at least 2 lines were obtained for each construct.

Microscopy
Transgenic animals and cultured cells were visualized by differential interference contrast (DIC) or epifluorescence microscopy using either a Zeiss Axioplan or a Zeiss Axiovert compound microscope. Images were recorded with CCD cameras (ORCA I, ORCA ER, Hamamatsu Corporation, Bridgewater, NJ). Some images were recorded on a Zeiss 510 META confocal microscope.