Complementary RNA amplification methods enhance microarray identification of transcripts expressed in the C. elegans nervous system

Background DNA microarrays provide a powerful method for global analysis of gene expression. The application of this technology to specific cell types and tissues, however, is typically limited by small amounts of available mRNA, thereby necessitating amplification. Here we compare microarray results obtained with two different methods of RNA amplification to profile gene expression in the C. elegans larval nervous system. Results We used the mRNA-tagging strategy to isolate transcripts specifically from C. elegans larval neurons. The WT-Ovation Pico System (WT-Pico) was used to amplify 2 ng of pan-neural RNA to produce labeled cDNA for microarray analysis. These WT-Pico-derived data were compared to microarray results obtained with a labeled aRNA target generated by two rounds of In Vitro Transcription (IVT) of 25 ng of pan-neural RNA. WT-Pico results in a higher fraction of present calls than IVT, a finding consistent with the proposal that DNA-DNA hybridization results in lower mismatch signals than the RNA-DNA heteroduplexes produced by IVT amplification. Microarray data sets from these samples were compared to a reference profile of all larval cells to identify transcripts with elevated expression in neurons. These results were validated by the high proportion of known neuron-expressed genes detected in these profiles and by promoter-GFP constructs for previously uncharacterized genes in these data sets. Together, the IVT and WT-Pico methods identified 2,173 unique neuron-enriched transcripts. Only about half of these transcripts (1,044), however, are detected as enriched by both IVT and WT-Pico amplification. Conclusion We show that two different methods of RNA amplification, IVT and WT-Pico, produce valid microarray profiles of gene expression in the C. elegans larval nervous system with a low rate of false positives. 
However, our results also show that each method of RNA amplification detects a unique subset of bona fide neural-enriched transcripts and thus a wider array of authentic neural genes are identified by the combination of these data sets than by the microarray profiles obtained with either method of RNA amplification alone. With its relative ease of implementation and greater sensitivity, WT-Pico is the preferred method of amplification for cases in which sample RNA is limiting.


Background
The human brain comprises diverse classes of neurons, and many of these neural classes are conserved throughout evolution. Our understanding of the molecular basis for these differences would be greatly advanced by a gene expression map of the nervous system. In principle, this information could be compiled from high density microarray experiments that catalog transcripts expressed in each class of neuron [1,2]. This approach necessarily requires, however, methods for extracting transcripts from individual cell types. Several approaches are now available for overcoming this technical hurdle. Laser Capture Microdissection (LCM) [3] and FACS (Fluorescence Activated Cell Sorting) [4] have been used to isolate specific neurons for RNA extraction. For example, specific GFP-labeled neurons and muscle cells have been obtained by FACS from the nematode, C. elegans, for microarray gene expression profiling experiments [5][6][7][8][9]. In several instances, disruption of specific genes included in these data sets revealed key functional roles in the profiled cell type [5]. A recently developed alternative biochemical strategy is also available for extracting RNA from specific cells that may not be readily dissociated for FACS. In this "mRNA-tagging" approach, an epitope-labeled mRNA binding protein (FLAG-PAB-1) is transgenically expressed with a cell-specific promoter. Bound mRNA is then obtained by co-immunoprecipitation with the FLAG-PAB-1 protein [10]. This method has been utilized to profile tissues and cell types in C. elegans and in Drosophila [8][10][11][12][13][14]. Thus, robust physical and biochemical methods are now available for obtaining mRNA from specific types of neurons in several different organisms. The limited amount of RNA (< 1 μg) available from these approaches, however, typically requires amplification before microarray analysis [15].
One method of amplification, PCR-based exponential amplification, can generate microgram quantities of cDNA from as little as 1 ng of total RNA. PCR-based methods, however, have been shown to be less reproducible than linear amplification methods, such as In Vitro Transcription (IVT) [16]. IVT has been widely utilized for RNA amplification [17]. In this approach, cDNA is initially synthesized to provide a template for amplification by T7 RNA polymerase. In most cases, two rounds of cDNA synthesis and IVT are required to generate sufficient aRNA (> 10 μg) for microarray hybridization. We have used the IVT method to produce robust gene expression profiles of C. elegans neurons and muscle cells [6,8,9,12,18]. In some instances, it was difficult to obtain enough RNA for reliable IVT amplification; in other cases IVT failed for unknown reasons (JDW, SEV and DMM, unpublished data). Thus, we needed a more sensitive and reliable method of RNA amplification. Here we describe microarray results obtained with RNA amplified by WT-Ovation Pico, an isothermal linear amplification system (WT-Pico) [19][20][21][22]. For the first step in this protocol (Fig. 1), cDNA is synthesized with a combination of poly-dT and random primers. The cDNA synthesis primers and the amplification primer are chimeric DNA/RNA oligonucleotides, each comprising a 3' DNA sequence and a 5' RNA sequence. Second strand cDNA synthesis produces a complementary DNA/RNA duplex adjacent to the first-strand priming site. Treatment with RNase H selectively removes the RNA sequence from the heteroduplex and provides a unique priming site for cDNA amplification. A chimeric amplification primer hybridizes to this priming site and is extended by a DNA polymerase with strand displacement activity to generate a new cDNA strand. The RNA portion of the hybridized primer is removed by RNase H and the cycle is re-initiated by annealing of the chimeric amplification primer (SPIA primer).
The net result is synthesis of multiple copies of cDNA in a single amplification step. The WT-Pico system offers the advantage of requiring less time and fewer steps than IVT amplification [19]. In addition, the DNA target generated by WT-Pico amplification is reported to result in more efficient and specific hybridization with DNA probe sets than the labeled aRNA target produced with IVT amplification [23,24]. We utilized the WT-Pico method to amplify RNA extracted from the C. elegans nervous system by the mRNA-tagging method and compared microarray data from this sample to a previously reported pan-neural profile generated with IVT amplification [8]. In our hands, robust microarray data were obtained with WT-Pico amplification from more than 10-fold less starting RNA (2 ng vs 25 ng) than with IVT. In addition, we have confirmed that the WT-Pico cDNA target results in a greater dynamic range and improved signal/noise with more present calls than the IVT-generated aRNA target. Bioinformatic analysis and in vivo expression data from promoter-GFP fusion genes established that the gene expression profile generated with WT-Pico is highly enriched for neuronal transcripts. The microarray data from the pan-neural-derived samples amplified by the IVT and WT-Pico methods identify 2,173 transcripts with elevated expression in the C. elegans nervous system. mRNAs included in this neural-enriched sample encode proteins with a broad array of predicted functions in the C. elegans nervous system. Only ~50% of these transcripts, however, are detected as enriched by both methods of RNA amplification. On the basis of this result, we suggest that the IVT and WT-Pico amplification methods show significant nucleotide sequence bias and therefore that, where possible, comprehensive gene expression profiles should be based on more than one method of RNA amplification.

A comparison of two amplification methods, WT-Pico and IVT
Figure 1. Diagram of the ribo-SPIA process for the synthesis of sscDNA. First strand cDNA is generated from template RNA using reverse transcriptase (RT) and two types of chimeric primers, random and oligo(dT), containing an RNA overhang (Step 1). DNA Polymerase is added to the reaction to generate second strand cDNA (Step 2). sscDNA is amplified from the dscDNA template in a cycle in which a SPIA™ primer (DNA/RNA hybrid) anneals to the template, DNA Polymerase begins duplicating the cDNA, and the RNA portion of the primer is degraded by RNase H (which only degrades RNA when it is in a duplex with DNA), thus allowing another SPIA™ primer to bind to the template and restart the reaction (Step 3).

We used the WT-Pico method (Fig. 1) [20] to amplify RNA obtained from all C. elegans neurons ("pan-neural") by the mRNA-tagging method [8,10]. Microarray data were generated from five independent pan-neural RNA samples. A companion reference data set was obtained with three replicates of RNA from all C. elegans cells. These results were compared to microarray data previously obtained from IVT-amplified samples [8]. Eight pan-neural and reference data sets were produced for each amplification method (Table 1). All WT-Pico amplification reactions were performed with 2 ng of starting RNA whereas the IVT amplifications utilized 25 ng of sample RNA. Comparisons of signal intensities generated from independent replicates showed that the WT-Pico-amplified pan-neural and reference samples are reproducible. For example, the coefficient of determination for the five pan-neural WT-Pico amplified samples, R2 = 0.96, compares favorably to an R2 = 0.98 for the three IVT-generated pan-neural profiles (Fig. 2a-d). Thus, these R2 values are indicative of highly reproducible data sets.
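To illustrate how replicate reproducibility of this kind can be quantified, the following minimal sketch computes the coefficient of determination as the squared Pearson correlation between two vectors of signal intensities. The function name and the toy intensity values are hypothetical and are not taken from the data sets in this study; how the pairwise values were summarized across five (or three) replicates is not specified here and is not modeled.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length intensity vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length vectors with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant vector has no defined correlation")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical log2 intensities for the same probe sets in two replicates.
replicate_a = [6.1, 7.4, 9.8, 11.2, 5.3]
replicate_b = [6.0, 7.9, 9.5, 11.6, 5.1]
print(round(r_squared(replicate_a, replicate_b), 3))
```

Values close to 1 indicate that the two replicates rank and scale the probe sets almost identically.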
We measured other parameters derived from the microarray data to compare the performance of the WT-Pico vs IVT-amplified targets. The C. elegans Affymetrix Gene Chip includes 22,499 probe sets. On the Affymetrix Gene Chip, each Perfect Match (PM) oligonucleotide is paired with a MisMatch (MM) probe that includes a single base pair substitution. The hybridization intensity of each MM probe is subtracted from that of the paired PM probe to correct for stray signal. An overall PM vs MM discrimination score for the probe set is calculated from these values to distinguish between Present, Marginal or Absent transcripts [25] (see Methods). Overall, hybridizations with the WT-Pico amplified sample resulted in a greater number of present calls than with the IVT target (Table 1). For example, an average of 56% (~12,600) of probe sets were scored as present in the pan-neural profiles obtained by WT-Pico amplification whereas an average of only 41% (~9,200) of probe sets were called present in the IVT-generated pan-neural data set (Table 1) (p < 0.05) (see Methods). A similar difference is noted from a comparison of the fraction of transcripts detected on all of the pan-neural arrays. In this case, WT-Pico amplification identifies 9,198 present transcripts whereas the IVT-derived target reports 7,382 present calls (Table 2) [See Additional file 1]. The removal from these comparisons of duplicate probe sets (i.e. probe sets for the same gene) resulted in a final difference of 7,409 present genes in the WT-Pico-derived data set vs 6,354 present calls in the IVT profile (Table 2). The greater number of present calls derived from the WT-Pico data sets is correlated with the finding that the WT-Pico target results in relatively less mismatch hybridization than the IVT sample [23].
For the combined IVT-amplified pan-neural and reference samples, we find that 29 ± 0.5% of MM signals exceed the paired PM value whereas only 24 ± 1% of the WT-Pico-derived probe pairs show MM > PM (p < 0.01) (Fig. 3). Similar results have been noted previously and attributed to the finding that mismatched RNA:DNA heteroduplexes are thermodynamically more stable than comparable DNA:DNA hybrids [23,24].
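The MM > PM comparison above reduces to counting probe pairs in which the mismatch signal exceeds the perfect-match signal. A minimal sketch follows, assuming each array is available as a list of (PM, MM) intensity pairs; the function name and the example intensities are hypothetical.

```python
def fraction_mm_exceeds_pm(probe_pairs):
    """Fraction of (PM, MM) probe pairs in which the MM signal exceeds the PM signal."""
    if not probe_pairs:
        raise ValueError("no probe pairs supplied")
    exceeding = sum(1 for pm, mm in probe_pairs if mm > pm)
    return exceeding / len(probe_pairs)


# Hypothetical (PM, MM) intensities for four probe pairs on one array.
pairs = [(820.0, 310.0), (150.0, 190.0), (2400.0, 600.0), (95.0, 120.0)]
print(fraction_mm_exceeds_pm(pairs))  # 0.5: two of the four pairs have MM > PM
```

Computing this fraction per array and then averaging across arrays gives a per-method value with a spread, which is the form in which the percentages are reported in the text.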

Neuron-enriched transcripts are identified in both the WT-Pico and IVT-amplified samples
To test the ability of the WT-Pico-amplified sample to detect differentially expressed transcripts, the pan-neural data set was compared to the reference profile obtained from all cells (see Methods). As expected, scatter plots reveal significant differences between these data sets with 1,625 transcripts showing elevated intensity values in the pan-neural sample vs 1,325 depleted mRNAs (Fig. 2e) [See Additional file 2]. Similar results (Fig. 2f) were obtained by the IVT amplification method [8]. As an independent test of the validity of these data, the list of 1,625 transcripts showing elevated intensity values in the WT-Pico-derived pan-neural data set (i.e., "enriched genes") was compared to WormBase to identify the subset of transcripts previously described as expressed in neurons [8].
This analysis revealed 520 transcripts in the WT-Pico-amplified data set with known expression patterns in vivo. Of these, 85% are annotated in WormBase as expressed in neurons (Fig. 4). This finding is comparable to the observation that 90% of the 524 transcripts in the enriched IVT-amplified pan-neural profile with expression data in WormBase are also detected in neurons [See Additional file 3]. In both cases, the microarray profiles show a significant bias for authentic neuronal transcripts as only 55% of all genes with expression patterns listed in WormBase are neuronal (Fig. 4) [See Additional file 4]. These findings confirm that both the WT-Pico and IVT amplification methods detect transcripts that are differentially expressed in the C. elegans nervous system.
To estimate the concordance of these data, we compared normalized intensity values for differentially expressed transcripts identified by each method. Log2 of the IVT pan-neural/reference ratio was plotted versus that of the WT-Pico/reference (Fig. 5) for probe sets with present calls in all of the pan-neural samples (see Methods) [See Additional file 5]. The R2 value of 0.72 is indicative of significant correlation between these two amplification methods for the subset of transcripts that are detected in both, an outcome similar to that seen in previous comparisons of WT-Pico vs IVT [20,24].
We expanded this comparison to consider all probe sets on the C. elegans chip. These results are depicted in Fig. 6 in the form of a line graph in which the intensity values for all three of the IVT pan-neural replicates and for the five WT-Pico pan-neural samples are normalized against the corresponding average reference intensities. Lines are color coded as enriched (red), depleted (blue) or unchanged (yellow) relative to the reference. Colors for each gene are fixed by the relative values of sample #3 (vertical bold white line) in the WT-Pico data set. This global analysis suggests an overall trend in which transcripts detected by both methods show similar patterns of differential expression. For example, 53 transcripts enriched in the IVT-derived pan-neural sample encode proteins with established or likely functions in neurotransmitter release at the synapse [8]. Of these genes, 37 (70%) are also enriched in the WT-Pico pan-neural data set and essentially all of these transcripts show intensity values greater than or equal to the reference (Fig. 6B). Similar results were obtained for transcripts encoding FMRFamide-like proteins (flps), a large family of peptide neurotransmitters that are largely restricted to the C. elegans nervous system (Fig. 6C) [26]. In addition to identifying similar trends in the relative intensity values of specific transcripts obtained by both methods, these line graphs also reveal a difference in the apparent overall spread of hybridization signals with the WT-Pico results showing a significantly larger dynamic range of differential expression vs the IVT data set (Fig. 6A, B). A similar result was obtained in a previous comparison of IVT vs WT-Pico derived microarray data [23].

[Figure caption: Microarray data derived from WT-Pico amplified pan-neural samples are reproducible and enriched for neural genes]

WT-Pico and IVT amplified targets reveal distinct neural transcripts
A comparison of the pan-neural enriched transcripts detected in these microarray experiments identifies a core group of 1,044 genes that are detected by both the IVT and WT-Pico methods (Fig. 7) [See Additional file 6]. This analysis also revealed, however, that a comparable number of transcripts is selectively enriched in either the WT-Pico or IVT derived data sets; 581 transcripts are detected as enriched in the WT-Pico pan-neural sample but not in the IVT data set whereas 548 genes are specifically enriched in the IVT pan-neural profile but not in the WT-Pico pan-neural data set [See Additional file 6]. These findings were validated by comparison to independently derived data that measure the expression and function of these genes in the C. elegans nervous system in vivo. First, we established that a majority of genes in either the IVT-only or WT-Pico-only pan-neural enriched data sets with known expression patterns in WormBase are annotated as expressed in neurons (Fig. 7). Additional genetic data have established specific neural functions for a subset of these differentially detected genes. For example, the WT-Pico-only subset of pan-neural enriched transcripts includes rpm-1, an E3 ubiquitin ligase that regulates synaptic assembly (Fig. 2e) [27,28]. Similarly, the transcript encoding the transmembrane protein MIG-13, which affects migration of the Q neuroblast and its descendants, is enriched exclusively in the IVT data set (Fig. 2f) [29]. These findings suggest that each method of RNA amplification may result in the detection of a unique subset of bona fide pan-neural enriched genes. We tested this idea by constructing GFP reporters for a representative set of genes listed in either the WT-Pico-only or IVT-only pan-neural enriched data sets (Table 3). In this approach, the upstream promoter or regulatory region of a specific gene is fused to GFP and reintroduced into the organism to monitor expression in the intact animal (see Methods).
Nine transgenic lines were constructed from the WT-Pico-only data set. Neuronal GFP expression was confirmed in all 9 of these lines with reporters for two genes (ZC155.2, C07H6.1) showing GFP expression exclusively in neurons (Fig. 8). Similarly, 7 out of 8 (Table 3) GFP reporters for genes in the IVT-only neural enriched data set show expression in C. elegans neurons in vivo (Fig. 8). Thus, these results support the conclusion that the pan-neural enriched data sets generated by each of these methods are reliably detecting transcripts expressed in the C. elegans nervous system.

[Figure caption: Signal/Noise is enhanced with WT-Pico vs IVT-amplified targets]
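The overlap between the two enriched lists is a set comparison. The sketch below, with a hypothetical function name and placeholder integer identifiers sized to match the counts reported in the text (1,044 shared, 548 IVT-only, 581 WT-Pico-only), shows how the shared, method-specific and combined totals relate; real probe set or gene identifiers would be used in practice.

```python
def compare_enriched(ivt, wt_pico):
    """Summarize the overlap between two collections of enriched transcript IDs."""
    ivt, wt_pico = set(ivt), set(wt_pico)
    shared = ivt & wt_pico
    union = ivt | wt_pico
    return {
        "shared": len(shared),
        "ivt_only": len(ivt - wt_pico),
        "wt_pico_only": len(wt_pico - ivt),
        "total": len(union),
        "shared_fraction": len(shared) / len(union) if union else 0.0,
    }


# Placeholder IDs only: the ranges are chosen so the set sizes match the text.
ivt_ids = range(0, 1592)        # 1,044 shared + 548 IVT-only
wt_pico_ids = range(548, 2173)  # 1,044 shared + 581 WT-Pico-only
summary = compare_enriched(ivt_ids, wt_pico_ids)
print(summary["shared"], summary["ivt_only"], summary["wt_pico_only"], summary["total"])
print(round(summary["shared_fraction"], 2))
```

With these counts the combined list holds 2,173 transcripts, of which roughly 48% are found by both methods.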

The WT-Pico and IVT amplified samples identify C. elegans genes with homologs expressed in the mammalian brain
Microarray analysis of the IVT-amplified pan-neural sample detected 1,592 transcripts with elevated expression in C. elegans neurons [8] [See Additional file 3]. The independent microarray profile of these samples generated with the WT-Pico method has now identified an additional set of 581 neuron-enriched genes to yield a total of 2,173 transcripts that are highly expressed in the C. elegans nervous system (Fig. 7). Thus, the use of two alternative methods of RNA amplification has significantly expanded (~36%) the list of transcripts that are differentially expressed in C. elegans neurons. To assess the potential value of these additional data for studies of gene function in the nervous system, we identified a subset of genes in the WT-Pico-only list that are evolutionarily conserved but for which biochemical functions have not been previously assigned. This analysis yielded a total of 39 uncharacterized, highly conserved genes [See Additional file 7].
To determine if these transcripts are also expressed in mammalian neurons, we searched the Allen Brain Atlas, an online in situ hybridization database, for evidence of expression in the mouse brain [30]. In situ data are available for 27 apparent homologs of the C. elegans genes on our list of WT-Pico-only enriched transcripts; 74% of these genes (20/27) show expression in the mouse brain.
In the case of the IVT-only enriched transcripts, all seven of the uncharacterized, conserved genes for which in situ data are available in the Allen Brain Atlas are annotated as expressed in the mouse brain [8]. These results support the idea that genes that are uniquely detected by one of these amplification methods are likely to encode authentic neural transcripts and that these combined data can provide potentially valuable clues to gene expression in the human brain.

3' bias does not account for differentially enriched targets identified by either WT-Pico or IVT
WT-Pico uses a combination of Poly-dT and random priming to amplify RNA. In contrast, the first round of the IVT is limited to Poly-dT priming. We speculated that this inherent difference in the amplification procedures might bias IVT towards probesets near the 3' end of a transcript.
To test this hypothesis, each probeset identified as enriched by only IVT or only WT-Pico was mapped with the BLAT tool [31] to a unique chromosomal location in the WS170 assembly. From this position, we calculated the distance from the 3' end of the probeset to the 3' end of the gene it targets [See Additional file 8]. No statistically significant difference was found between the locations of the probesets unique to the WT-Pico method and those unique to the IVT method (p = 0.75). We therefore conclude that differential hybridization of WT-Pico vs IVT-generated targets is not due to a systematic bias of either amplified sample for probe sets near the 3' end of targeted transcripts. It should be noted, however, that the probe sets in the GeneChip expression arrays used in this study are largely directed towards the 3'-end of the transcripts and therefore would not detect WT-Pico derived targets originating from more 5' regions. In the future, it will be interesting to examine transcripts that are independently detected with either IVT or WT-Pico-derived samples for potential nucleotide sequences that could exert differential effects on either RNA amplification or target hybridization.

[Figure caption: Similar overall patterns of differential gene expression are observed for WT-Pico and IVT-amplified pan-neural samples]

Conclusion
We have confirmed that the WT-Pico method affords rapid and efficient RNA amplification with a higher fraction of present calls after microarray hybridization than targets amplified by the IVT protocol. The WT-Pico method is also technically easier to implement than IVT and requires significantly less time to perform. Although both approaches generate robust microarray profiles of gene expression in the C. elegans nervous system, a significant fraction of authentic neuron-enriched transcripts are uniquely identified by each of these methods of RNA amplification. Thus, the combined result obtained with both amplification strategies provides a more complete picture of neural gene expression than either sample alone. For cases in which RNA is limiting, as in the effort to profile single neuron types from C. elegans, the enhanced sensitivity of the WT-Pico method is advantageous.

Generating transgenic lines expressing GFP reporter genes
Promoter-GFP fusion genes were obtained from the Promoterome project and transgenic lines generated by microparticle bombardment as described [6]. Additional file 9 contains a list of strains described in this paper.

mRNA-tagging and RNA amplification
The "in vitro transcribed" or "IVT" microarray data sets used in this paper are described in a previous publication.
To generate these data sets, 25 ng of RNA from three pan-neural replicates and from five independent N2 (reference) samples was amplified by the IVT method [8]. 2 ng of these RNAs was amplified using version 1 of the WT-Ovation Pico System, which combines WT-Ovation™ Pico RNA Amplification System and target preparation according to fragmentation and labeling section of Ovation™ Biotin RNA Amplification and Labeling System as described in the User Guides [33]. Two of the previously prepared reference RNA preparations did not amplify by WT-Pico. Two additional samples were isolated by the mRNA-tagging method from the pan-neural transgenic line, SD1241 [8] for WT-Pico amplification to yield a total of five pan-neural replicates and three reference samples for the "WT-Pico" profiles. Thus, six of the eight pan-neural and reference data sets generated by each of the RNA amplification methods (IVT or WT-Pico) were obtained from identical RNA samples. A quantitative comparison of microarray results obtained from the two new pan-neural RNA samples (DMM10 and DMM11) used for the WT-Pico amplification vs the originally isolated pan-neural preparations (DMM2, DMM3, DMM4) also used for the IVT amplification [8] showed a broadly similar distribution of intensity values (R2 > 0.91) (see Fig. 2).

Microarray data analysis
Microarray data were processed as described [8,9]. Briefly, intensity values from each hybridization were scaled vs a global average signal from the same array and normalized by Robust Multichip Average analysis (RMA) [34]. To identify differentially expressed transcripts, normalized intensity values from the pan-neural data sets were compared to a reference (from all larval cells) using Significance Analysis of Microarray software (SAM) [35]. A two-class unpaired analysis of the data was performed to identify neuron-enriched genes. Pan-neural enriched transcripts in the IVT and WT-Pico-derived data sets were defined as 1.5X elevated vs the reference at a False Discovery Rate (FDR) = 3%. An earlier report describing the IVT-amplified pan-neural data set utilized a more stringent FDR of 1% and therefore identified a smaller number of pan-neural enriched transcripts (1,562 vs 1,592 in this study) ([8], this work). The data discussed in this manuscript are available in the NCBI Gene Expression Omnibus, series accession number GSE9485.

[Figure caption: WT-Pico and IVT Venn Diagram (48% overlap)]
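The enrichment criterion above combines a fold-change cutoff with an FDR cutoff. The following sketch shows only that final thresholding step. It does not reimplement SAM: the q-value (FDR estimate) for each probe set is assumed to have been produced by the statistical software, intensities are assumed to be on a log2 scale, and the function name and example values are hypothetical.

```python
import math

FOLD_CUTOFF = 1.5  # from the text: 1.5X elevated vs the reference
FDR_CUTOFF = 0.03  # from the text: FDR of 3%


def is_enriched(log2_sample, log2_reference, q_value):
    """True if a probe set passes both the fold-change and the FDR cutoff.

    log2_sample and log2_reference are assumed to be mean log2 intensities;
    q_value is assumed to come from an external analysis such as SAM.
    """
    log2_ratio = log2_sample - log2_reference
    return log2_ratio >= math.log2(FOLD_CUTOFF) and q_value <= FDR_CUTOFF


# Hypothetical probe sets: (mean log2 pan-neural, mean log2 reference, q-value).
print(is_enriched(9.0, 8.0, 0.01))  # 2-fold up, low q-value
print(is_enriched(8.5, 8.0, 0.01))  # about 1.41-fold, below the 1.5X cutoff
print(is_enriched(9.0, 8.0, 0.05))  # 2-fold up, but q-value above 3%
```

On a log2 scale the 1.5X cutoff corresponds to a difference of about 0.585 between the sample and reference means.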
[Figure caption: Promoter-GFP reporter genes confirm neural expression of transcripts from pan-neural enriched data sets]

RMA normalized intensity values for all data sets were imported into GeneSpring GX 7.3 [37] to generate the line graphs shown in Fig. 6. Each experimental data set was normalized vs the average intensity value for each probe set in the corresponding reference data set and plotted as log (Experimental/reference). Each vertical line represents an individual replicate for each experimental sample.
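The per-probe-set normalization described above can be sketched as follows. This is an illustration only and not the GeneSpring procedure itself: it assumes linear-scale intensities and a base-2 logarithm (the base is not stated in the text), and the function name, probe set names and values are hypothetical.

```python
import math


def log_ratios(sample, reference_replicates):
    """Return log2(sample / mean reference) for each probe set.

    sample: dict mapping probe set ID -> linear-scale intensity.
    reference_replicates: list of dicts with the same probe set IDs.
    """
    if not reference_replicates:
        raise ValueError("at least one reference replicate is required")
    ratios = {}
    for probe, value in sample.items():
        ref_mean = sum(rep[probe] for rep in reference_replicates) / len(reference_replicates)
        ratios[probe] = math.log2(value / ref_mean)
    return ratios


# Hypothetical intensities for two probe sets and two reference replicates.
pan_neural = {"probe_1": 800.0, "probe_2": 100.0}
reference = [
    {"probe_1": 150.0, "probe_2": 90.0},
    {"probe_1": 250.0, "probe_2": 110.0},
]
print(log_ratios(pan_neural, reference))  # {'probe_1': 2.0, 'probe_2': 0.0}
```

Positive values correspond to probe sets that are elevated relative to the reference, zero to unchanged, and negative values to depleted.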
p-values for total yield, number of present genes, and perfect vs mismatch probes were calculated using a two-tailed t-test with unequal variance.
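A two-sample t-test with unequal variance is commonly known as Welch's t-test. The sketch below computes the Welch t statistic and the Welch-Satterthwaite degrees of freedom using only the Python standard library. Converting these into a two-tailed p-value requires a t-distribution CDF from a statistics package, which is deliberately left out here. The function name and example values are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two values")
    # Squared standard error contributed by each group (sample variance / n).
    se_a = variance(sample_a) / n_a
    se_b = variance(sample_b) / n_b
    if se_a + se_b == 0:
        raise ValueError("both samples have zero variance")
    t_stat = (mean(sample_a) - mean(sample_b)) / math.sqrt(se_a + se_b)
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t_stat, df


# Hypothetical percent-present values for two groups of arrays.
group_1 = [55.0, 57.5, 54.0, 58.0, 56.5]
group_2 = [40.0, 42.5, 41.0]
t_stat, df = welch_t(group_1, group_2)
print(round(t_stat, 2), round(df, 2))
```

Unlike the equal-variance test, the degrees of freedom here depend on the two sample variances and are generally not an integer.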

3' bias analysis
IVT-only and WT-Pico only enriched transcripts were examined for 3' bias. C_elegans_target.fa was downloaded from Affymetrix. This file contains the reference sequence for each probeset on the array. The file Caenorhabditis_elegans.WB170.45.dna.seqlevel.fa was downloaded from Ensembl (Ensembl 45, based on Wormbase 170). Probesets were aligned to chromosomes using BLAT [31]. Where multiple alignments were found, the alignment that covered the longest portion of the probeset sequence was chosen. The genes and chromosomal locations of those genes were downloaded using Ensembl 45. The probeset distance from the 3' end of the gene was calculated. For genes on the (+) strand, the distance is given as (Gene End) -(Probeset End). For genes on the (-) strand, the distance is given as (Probeset Start) -(Gene Start). For probesets that correspond to multiple genes, the gene with the smallest absolute value of 3' distance was chosen. p-value was calculated using a 2-tailed t-test with equal variance.
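The strand-aware distance rule described above can be sketched as a pair of small functions. The formulas follow the text; the function names, the tuple layout and the example coordinates are hypothetical, and coordinates are assumed to satisfy start < end on the chromosome regardless of strand.

```python
def three_prime_distance(strand, gene_start, gene_end, probe_start, probe_end):
    """Distance from the probeset to the 3' end of the gene, per the rule in the text."""
    if strand == "+":
        return gene_end - probe_end
    if strand == "-":
        return probe_start - gene_start
    raise ValueError("strand must be '+' or '-'")


def closest_gene(probe_start, probe_end, genes):
    """Pick the gene with the smallest absolute 3' distance.

    genes: iterable of (name, strand, gene_start, gene_end) tuples.
    Returns (name, distance).
    """
    candidates = [
        (name, three_prime_distance(strand, start, end, probe_start, probe_end))
        for name, strand, start, end in genes
    ]
    if not candidates:
        raise ValueError("no candidate genes supplied")
    return min(candidates, key=lambda item: abs(item[1]))


# Hypothetical probeset at 4300-4800 overlapping two hypothetical genes.
genes = [("geneA", "+", 1000, 5000), ("geneB", "-", 4200, 9000)]
print(closest_gene(4300, 4800, genes))  # ('geneB', 100)
```

In this example the distance is 200 for the (+) strand gene and 100 for the (-) strand gene, so the (-) strand gene is selected.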

Microscopy and identification of GFP expressing cells
GFP-expressing animals were visualized by Differential Interference Contrast (DIC) and epifluorescence optics in either a Zeiss Axioplan or Axiovert compound microscope. Digital images were recorded with CCD cameras (ORCA I or ORCA ER, Hamamatsu Corporation, Bridgewater, NJ).