Our group has generated and deposited in public databases 163,586 ESTs derived from six developmental stages of S. mansoni. A total of 33,180 of these sequences were derived from adult worms. However, because of the normalizing approaches used to prepare the cDNA libraries for sequencing (ORESTES and traditional normalized cDNA libraries), our sequences offered only a qualitative view of the parasite transcriptome. Sequencing of these cDNA clones provided a glimpse of gene expression in different life-cycle stages of the parasite, with a dramatic impact on gene discovery. However, while cDNA sequencing from normalized libraries is a powerful tool for gene discovery, it is not adequate for determining quantitative gene expression patterns. As a complement to the qualitative analysis of the S. mansoni transcriptome, we have used SAGE to perform a quantitative evaluation of the adult-worm transcriptome. The adult worm is one of the most complex life-cycle stages of S. mansoni and expresses at least half of the genes transcribed in this organism. To quantify gene expression in adult worms, we produced a SAGE library and generated 68,238 tags that have been clustered and assigned to genes.
The SAGE technique involves the generation and sequencing of large numbers of short tags, defined by the occurrence of a recognition site for a type II restriction enzyme in the mRNA. Ideally, these tags are long enough to be unique to the transcript in question, and the number of copies of a given tag is proportional to the expression level of that transcript in the original mRNA pool. Limitations of the technique include the difficulty of tagging very rare transcripts when a reduced number of tags is generated, the possibility of non-specific tags (tags mapping to distinct transcripts), and transcripts that produce no tags because they lack the restriction site or the poly-A tail.
Microarrays are the most widely used approach for large-scale evaluation of gene expression. However, this approach relies on previous knowledge of gene sequences for the design of the array, and thus the transcriptome coverage depends on how well defined the gene set of the target organism is. Also, gene quantification using microarrays depends on the intensity of the hybridization signal, which can be affected by many factors, such as the location of the probe with respect to the 3' end of the message, the length and G+C content of the probe, and signal-to-noise ratios. Depending on the probe spotted, the intensity observed in microarray experiments may reflect the expression of either a single splicing isoform or multiple isoforms of a given gene, making comparisons with SAGE even more complex. Gene expression data produced by arrays are relative, while SAGE provides an absolute measure of expression. Unlike cDNA microarrays, gene expression analysis using SAGE does not depend on previous sequence knowledge, and thus it opens up the possibility of discovering and evaluating the expression of new transcripts. However, the process of constructing and sequencing a SAGE library is laborious and expensive, with a final cost that is 5–10 times higher than that of microarrays.
Another limitation of SAGE is that the analysis is restricted to genes that contain a restriction site for the enzyme used to construct the library. In an analysis of 364 full-length S. mansoni genes available in public databases, we could not identify restriction sites for NlaIII (the enzyme used in our library) in 35 (9.6%) of them. Extrapolating from this sample, the expression level of about 90% of the S. mansoni genes expressed in adults could be evaluated by the SAGE approach employed here.
On the other hand, when 8,669 S. mansoni UniGene cluster sequences were evaluated, we observed that 2,193 clusters contained ESTs derived from adult worms. Only 169 of these clusters contained full-length sequences. When the tags (rank 0 and rank 1) of these 169 clusters were considered, we observed that 132 (78%) were represented in our SAGE tag list. This alternative estimate indicates that our SAGE tags cover about 78% of the genes expressed in adult worms. We also noted that 39 UniGene clusters with no adult-worm-derived ESTs in their composition had their expression confirmed in this stage by our SAGE data.
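The restriction-site check described above can be illustrated with a minimal Python sketch. It assumes 10-nt tags taken immediately downstream of the 3'-most CATG site (consistent with the 10-nt tag sequences quoted in this paper) and input sequences whose poly-A tails have already been trimmed. The function names and the rule of skipping a site with fewer than 10 nt downstream are illustrative assumptions, not the pipeline used in this study.

```python
ANCHOR_SITE = "CATG"  # NlaIII recognition site
TAG_LENGTH = 10       # assumed tag length (10-nt tags, as quoted in the text)


def three_prime_tag(transcript, site=ANCHOR_SITE, tag_length=TAG_LENGTH):
    """Return the tag adjacent to the 3'-most anchoring site, or None.

    A site with fewer than `tag_length` bases downstream is skipped and the
    next site upstream is tried (an assumption made for this sketch).
    """
    seq = transcript.upper()
    pos = seq.rfind(site)
    while pos != -1:
        start = pos + len(site)
        tag = seq[start:start + tag_length]
        if len(tag) == tag_length:
            return tag
        # Look for the next site strictly upstream of the current one.
        pos = seq.rfind(site, 0, pos)
    return None


def taggable_fraction(transcripts):
    """Fraction of transcripts that yield at least one tag."""
    transcripts = list(transcripts)
    if not transcripts:
        return 0.0
    tagged = sum(1 for t in transcripts if three_prime_tag(t) is not None)
    return tagged / len(transcripts)


# Toy sequences (hypothetical, for illustration only).
toy = ["AAACATGACTATTCGGGTTT", "AAAAAAGGGTTTCCC"]
print(three_prime_tag(toy[0]))  # ACTATTCGGG
print(taggable_fraction(toy))   # 0.5
```

Applied to the figures reported above, 35 of 364 full-length genes without a site corresponds to `35 / 364`, or about 9.6%, leaving roughly 90% of genes accessible to the method.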
Comparing SAGE and EST data
To establish how the views of the transcriptome derived from SAGE and from ESTs compare with each other, we evaluated the relative distribution of SAGE and EST sequences over a set of 208 worm full-length mRNA sequences available in GenBank. The 208 full-length transcripts are covered by 26,888 ESTs and 9,589 SAGE tags. As expected, 42% of the SAGE tags that map to the set of 208 full-length genes are positioned in the last 20% of the transcripts. On the other hand, only 17% of the ESTs mapped to these genes cover this same 3' portion of the transcripts (see Additional file 3). This clearly results from the biased distribution of the ESTs that were produced using the ORESTES technique (94,308/110,328 ESTs available at the time of preparation of this manuscript) and shows the need to generate further S. mansoni ESTs from the 3' end of the transcripts for a more complete knowledge of the schistosome transcriptome. It also points to a reduced overlap between the SAGE and available EST data, which will result in poor coverage of genes expressed at low levels by non-normalized 3' UTR ESTs and in the failure of SAGE-to-transcript assignment.
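The positional comparison above reduces to asking, for each mapped tag or EST, whether it falls within the last 20% of its transcript. The following Python sketch shows one way to compute that fraction, assuming hits are supplied as hypothetical (start position, transcript length) pairs with 0-based positions. The function name and input layout are illustrative; for ESTs, which span a range, one might instead test for overlap with the 3' window.

```python
def fraction_in_3prime_window(hits, window=0.20):
    """Fraction of hits whose start lies in the last `window` of the transcript.

    hits: iterable of (start_position, transcript_length) pairs, where
    start_position is 0-based from the 5' end (assumed input layout).
    """
    hits = list(hits)
    if not hits:
        return 0.0
    threshold = 1.0 - window
    in_window = sum(1 for pos, length in hits if pos / length >= threshold)
    return in_window / len(hits)


# Toy example (hypothetical positions on 1,000-nt transcripts).
toy_hits = [(900, 1000), (100, 1000), (850, 1000), (500, 1000)]
print(fraction_in_3prime_window(toy_hits))  # 0.5
```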
Indeed, of the 6,263 tags with a frequency higher than one, 2,916 (46.6%) found no matches in the transcript databases used. As expected, this failure to find the corresponding gene for a specific tag was directly related to the low expression of the corresponding transcript and its reduced coverage by ESTs. In fact, this can be used as an indirect measure of the correlation between SAGE and EST coverage. Whereas 96% of the 50 most frequent tags and 92% of the top 100 tags could be identified in a transcript, only 53% of the 6,263 tags observed more than once and only 40% of all 15,655 tags could be assigned to their corresponding genes. As the S. mansoni SAGE tags are usually located 242 nt upstream of the 3' end of the transcripts (average position of the CATG tags in full-length transcripts), these data clearly demonstrate that more 3' sequences from normalized cDNA libraries are required for deciphering the transcriptome of this parasite.
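The relationship between tag abundance and successful tag-to-gene assignment can be summarized by computing the assignment rate within the top N tags for several values of N. A minimal Python sketch follows; the input layout (a list of hypothetical (count, is_assigned) pairs) and the function name are assumptions made for illustration.

```python
def assignment_rate_by_rank(tags, cutoffs):
    """Fraction of assigned tags among the top-N most frequent tags.

    tags: list of (count, is_assigned) pairs (assumed input layout).
    cutoffs: iterable of N values, e.g. (50, 100, 6263, 15655).
    Returns a dict {N: fraction assigned among the N most frequent tags}.
    """
    ranked = sorted(tags, key=lambda t: t[0], reverse=True)
    rates = {}
    for n in cutoffs:
        top = ranked[:n]
        if top:
            rates[n] = sum(1 for _, assigned in top if assigned) / len(top)
    return rates


# Toy data (hypothetical counts and assignment flags).
toy_tags = [(100, True), (50, True), (10, False), (5, True), (2, False), (1, False)]
print(assignment_rate_by_rank(toy_tags, [2, 4, 6]))  # {2: 1.0, 4: 0.75, 6: 0.5}
```

A rate that falls as N grows, as in the toy data, is the pattern reported above for the real tag list.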
Putative poly-adenylation in S. mansoni
While the same tag can be mapped to many transcripts (indicating conservation of a nucleotide motif), we also see that a single transcript might sometimes generate several different tags. This parallels what happens in proteomic studies, when the same protein sometimes generates different spots in a gel. Multiple tags deriving from the same transcript could result from methodological problems (such as incomplete digestion by the anchoring enzyme or the presence of false poly-A tails) or from biological features, such as splicing variants in the transcript region containing the most 3' tag or the use of multiple poly-adenylation sites. Whereas the use of SAGE tags to evaluate alternative splicing is more difficult, the occurrence of alternative poly-adenylation events can be evaluated with fewer assumptions. To reduce the impact of methodological aspects on the determination of alternative poly-adenylation events, we did not consider tags sequenced only once, ambiguous tags (those that could be mapped to different transcripts), or internal tags that appeared before long stretches of A's in the transcript, which could have been used as false poly-A tails during the cDNA synthesis step.
After applying the filters described above, consistent events of multiple tags in a single transcript were identified in 13 full-length genes. Alternative poly-adenylation events reduce the transcript size, blocking the transcription of portions of its 3' region together with the most 3' restriction site of the enzyme used for constructing the SAGE library. The reduction of the 3' UTR observed here, caused by alternative poly-adenylation, was usually accompanied by the removal of a significant portion of the putative ARE (Adenosine and Uridine-Rich Element) repertoire of the transcript. AREs are elements that can target host mRNAs towards rapid degradation (by a mechanism dependent on deadenylation), can repress their translation, or can increase their stability [reviewed in ], depending on the binding of ARE-binding proteins (ARE-BPs). The putative removal of AREs (observed in 11 out of the 16 putative poly-adenylation events) and the identification of ARE-BPs (such as hnRNPs, CUG-BP and nucleolin) in the transcriptome of S. mansoni suggest that this parasite employs this mechanism for regulating mRNA stability. We should note that partial digestion with NlaIII seems to be rare here, as in our list of 15,655 distinct tags not a single tag containing CATG (the restriction site for NlaIII) could be found.
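The filtering used to select candidate alternative poly-adenylation events can be sketched in Python as follows. This is an illustration under stated assumptions rather than the procedure actually run: the minimum A-run length of 8, the requirement that a tag map to exactly one position (a stricter reading of "ambiguous"), the data layout, and all function names are hypothetical.

```python
from collections import defaultdict

SITE = "CATG"  # NlaIII recognition site
TAG_LEN = 10   # assumed tag length


def is_false_polya_suspect(seq, tag_start, a_run=8):
    """True if an internal tag is followed by a run of A's before the next site.

    tag_start is the 0-based index of the first tag base (just after CATG).
    The a_run threshold is an assumed value, not taken from the study.
    """
    next_site = seq.find(SITE, tag_start)
    if next_site == -1:
        return False  # 3'-most tag, so not an internal tag
    return "A" * a_run in seq[tag_start + TAG_LEN:next_site]


def candidate_apa(tag_counts, mappings, transcripts, min_count=2, a_run=8):
    """Return {transcript_id: [(tag_start, tag), ...]} with >= 2 retained tags.

    tag_counts: {tag: observed count}
    mappings: {tag: [(transcript_id, tag_start), ...]}
    transcripts: {transcript_id: sequence}
    """
    kept = defaultdict(list)
    for tag, count in tag_counts.items():
        if count < min_count:
            continue  # filter 1: tags sequenced only once
        hits = mappings.get(tag, [])
        if len(hits) != 1:
            continue  # filter 2: unmapped or ambiguous tags
        tid, start = hits[0]
        if is_false_polya_suspect(transcripts[tid].upper(), start, a_run):
            continue  # filter 3: internal tag upstream of an A stretch
        kept[tid].append((start, tag))
    return {tid: sorted(v) for tid, v in kept.items() if len(v) >= 2}
```

With this layout, a transcript is reported only when at least two distinct, unambiguous, repeatedly observed tags survive all three filters.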
Comparing transcriptome and proteome data
Some reports of proteomic analyses of different developmental stages of S. mansoni have recently become available. Curwen et al. presented an analysis of the four commonly used schistosome soluble protein preparations (derived from cercariae, lung-stage worms, adults and eggs), finding 32 distinct proteins among the most expressed. In adult worms, 26 of the 40 most abundant spots were identified, and these corresponded to 22 different proteins. According to Curwen et al., the top 40 most abundant soluble proteins in adult worms accounted for 27.4% of the total protein content of this stage. In our SAGE analysis, we reached a similar value: the top 40 genes accounted for 12,364 tags, or 21% of the total tags. When the top 10 most abundant adult-worm soluble proteins identified by Curwen et al. are compared with our expression rank based on SAGE, we see that 5 of the 10 proteins are ranked among our top 20 most abundant transcripts (14-3-3 homolog, GST28, FABP, fructose 1,6-bisphosphate aldolase and GAPDH). The remaining proteins vary in our ranking from 21st to 253rd (see Additional file 2), suggesting higher stability and/or higher translation rates for the products of these less transcribed genes, with post-transcriptional events acting as a second mechanism in the regulation of protein abundance.
RNA analysis by SAGE enabled the evaluation of genes coding for proteins whose physicochemical properties impaired their analysis by 2D gel electrophoresis. An example is the determination of the transcript abundance of priority vaccine candidates of the World Health Organization (such as Sm23, the 793rd transcript with 13 tags, and paramyosin, the 1456th with 7 tags), which could not be evaluated by proteomic analysis because of technical limitations imposed by 2D gels, such as protein size or solubility.
Mapping SAGE tags to genes coding for proteins classified into Gene Ontology functional categories provides a general view of the parasite functions in terms of their relative frequency. From the data generated it is clear that, in the adult stage, the parasite still undergoes intense cellular activity, possibly due to its accelerated membrane turnover as well as to metabolic activities possibly involved in immune response evasion and intense egg-laying. Furthermore, the large proportion of proteins potentially involved in defense mechanisms suggests a dynamic interaction with the host and its immune defense system.
The use of SAGE to interrogate the S. mansoni transcriptome
The most abundant tag identified here is 'ACTATTCGGG', a sequence tag that matches diverse isoforms of the gene encoding SmP14, or F10 eggshell protein family. The frequency of this tag strongly suggests that this is the most abundant mRNA species found in adult worms. This abundance is highly significant, especially if we consider the larger biomass of male worms as well as the male bias found in the sex ratio of S. mansoni infections. Indeed, among the top 5% most abundant transcripts of adult worms, we can find other eggshell-related genes such as P40 (146th most abundant transcript, with 56 tags), P19 (202nd, with 42 tags) and P48 (356th, with 26 tags), which supports their importance in the early stages of eggshell formation. We should observe that no tags could be identified for egg-secreted proteins (such as ESP3-6 and ESP15), suggesting that they are expressed only in later stages of eggshell development. High expression of actin and myosin (heavy and light chains) was also observed, with their respective genes and paralogs identified among the top 100 transcripts, reflecting the musculature as one of the major worm tissues. Among the top 50 transcripts, as expected, we observe a high abundance of 12 ribosomal-protein genes as well as genes that encode proteins involved in protein and carbohydrate metabolism. It is also interesting to note the high abundance of the gene that codes for a protein similar to thymosin beta (17th most abundant transcript in adult worms), especially given its involvement in wound healing, its anti-inflammatory properties [34–36] and its possible involvement in the escape from the host immune system in malaria.