A proteomics approach to decipher the molecular nature of planarian stem cells

Background In recent years, planaria have emerged as an important model system for research into stem cells and regeneration. Attention is focused on their unique stem cells, the neoblasts, which can differentiate into any cell type present in the adult organism. Sequencing of the Schmidtea mediterranea genome and some expressed sequence tag projects have generated extensive data on the genetic profile of these cells. However, little information is available on their protein dynamics. Results We developed a proteomic strategy to identify neoblast-specific proteins. Here we describe the method and discuss the results in comparison to the genomic high-throughput analyses carried out in planaria and to proteomic studies using other stem cell systems. We also show functional data for some of the candidate genes selected in our proteomic approach. Conclusions We have developed an accurate and reliable mass-spectra-based proteomics approach to complement previous genomic studies and to further achieve a more accurate understanding and description of the molecular and cellular processes related to the neoblasts.


Background
As we move further into the post-genomic era it becomes increasingly clear that DNA sequence data alone are insufficient to explain complex cellular and molecular processes. Although the enormous volume of data generated by genome sequencing projects, expressed sequence tags (ESTs), and cDNA analyses has improved our understanding of many processes, such data often fail to reflect the influence of posttranscriptional modifications and protein interactions, and do not offer a true reflection of protein levels or activity. Consequently, the role of specific proteins is relatively difficult to determine with confidence on the basis of mRNA expression or genomic data alone [1,2].
Proteomic approaches offer a more realistic description of protein function and its influence on cell dynamics. Although comparative analysis of phenotypically different biological samples, such as in diseased versus healthy tissue [3], remains a challenge, those studies raise the possibility of identifying the protein "signatures" that underlie key biological phenomena [4]. Furthermore, the use of bioinformatics to integrate data obtained using genomic and proteomic techniques could help to bypass the limitations of each approach and achieve a more comprehensive view of the information flow within cells.
Planarians, an emerging model system for the investigation of stem cell and regenerative biology [5][6][7], have a unique population of stem cells called neoblasts (see Figure 1), which can give rise to all of the differentiated cell types present in the adult organism during regeneration or normal homeostasis [8,9]. Although a great deal is now known about the biology of these cells, most molecular data have come from cDNA and genomic analyses. The neoblasts are particularly suited to proteomic approaches, however, as they contain chromatoid bodies (CB) that are progressively lost during differentiation [10][11][12] and can be employed as a marker for undifferentiated cells. The CB complexes are mainly formed by proteins and latent mRNA molecules, which can distort the levels of gene expression in transcriptional analyses of neoblast samples. Moreover, since the neoblasts are the only dividing cells in planarians [5], they can be easily depleted by irradiation [13]. Thus, these unique characteristics make planarians an ideal system in which to explore the use of proteomics to analyze the biology of processes such as cell differentiation, stem cell behavior, homeostasis and an array of other events. As a first step in the development of such an approach, here we describe the methodological establishment and validation of a proteomic analysis of the planarian neoblast.

Establishment of the planarian proteomic approach
Different methods were tested to achieve a consistent and reproducible pattern on two-dimensional (2D) gels. To optimize sample preparation, proteins were extracted from dissociated cells or from whole animals. The yield from dissociated cells was insufficient to establish an efficient 2D procedure. Furthermore, the reproducibility of the 2D gel pattern was poor (data not shown). Prior to extraction from whole animals, a short treatment with 2% cysteine chloride in planarian water was used to eliminate mucus production, which is known to interfere with molecular techniques [14]. Based on our tests and previous work by Collet and Baguñà [15], we established a consistent method for 2D analysis of planarian samples (Figure 2 and Additional File 1). The different lysis buffers and sample cleaning procedures tested are shown in Table 1. Between 50 and 1000 μg of total planarian protein was loaded on 2D gels to establish the best sample quantity in terms of spot definition. Spot resolution was acceptable from 100 to 500 μg. We selected 500 μg as the optimal amount of protein to load onto 2D gels to achieve the maximum number of spots; a minimum of 100 μg was necessary for spot detection. Different immobilized pH gradient strips were used and the second-dimension protocol was modified to avoid streaking problems (Table 1). All these variables were tested on 12-cm 2D gels and scaled up to 24-cm gels for subsequent procedures.

Proteomic data
Figure 1 Neoblast depletion by irradiation and image of a neoblast shown by electron microscopy. Immunostaining with anti-phosphorylated histone H3 (αH3P), labelling mitotic neoblasts in 3-day head-regenerating organisms: A, control; B, 75 Gy irradiated 3 days after irradiation; and C, 75 Gy irradiated 14 days after irradiation. Whereas a high number of proliferating cells appear in control animals next to the blastema and some mitotic cells still remain 3 days after irradiation, no divisions are detected after 14 days, showing that neoblasts are completely eliminated at that time. D, Electron microscopy image of a neoblast cell. Cytoplasm (dim yellow) and nucleus (yellow) are highlighted for clarity. The red arrow indicates a chromatoid body. Scale bars: A-C = 0.5 mm, D = 3 μm.
In order to identify proteins specifically expressed in neoblasts, we compared 2D patterns of two samples: wild type (WT) versus irradiated animals (IA). This method has been extensively used to study the effects of neoblast depletion [8,13]. Extractions were done 14 days after irradiation, when animals remained viable but cell proliferation was absent (Figure 1). Once the protocol was set up and the spot patterns were reproducible (Figure 2 and Additional File 2), the spots were compared and selected. Although spot labelling by silver staining and DIGE was consistent in each case, we did not succeed in obtaining a uniform pattern with the two techniques. Follow-up analysis was therefore done separately. With the aim of establishing the real potential of the silver-staining technique, only clear and conserved qualitative differences based on silver staining were considered (spots present in the WT sample and absent from the irradiated sample). Image master 2D™ software (from Amersham Biosciences) was used to analyze the scanned gels. The potential bottleneck of this proteomic approach, however, is the image analysis. Many authors have highlighted the difficulties in obtaining good replicates [16], and this has now been partially overcome with the use of DIGE. Whereas our silver-staining results showed remarkable pattern conservation within replicates (Additional File 2), the numbers after spot image analysis showed some variability. In order to improve signal specificity we used two types of gels, one loaded with 100 μg and another with 500 μg of sample protein. The differences between irradiated and non-irradiated samples that were conserved in both sample loads and also had three surrounding reference spots in both experimental conditions were selected after reviewing the correspondence in isoelectric point (pI) and molecular weight (Mw) (Figure 2). These restrictions reduced the number of selected spots substantially, but ensured a high degree of confidence in the differences selected, providing a better platform for validation of the technique. For DIGE staining, the standard protocol was followed without modifications and the analysis software was used with the default parameters. Only clear and conserved quantitative changes (>2-fold changes) were selected, drastically reducing the number of final candidate spots (Figure 2). A total of 26 and 58 spots were selected for silver and DIGE staining, respectively (Table 2).
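The quantitative DIGE criterion used in this section (keep only spots that change more than 2-fold between wild type and irradiated samples) can be illustrated with a minimal Python sketch. The spot identifiers, the volume values and the function name `select_differential_spots` are hypothetical and serve only as an example; the actual selection was made in the DIGE analysis software with default parameters.

```python
def select_differential_spots(spots, min_fold=2.0):
    """Return spots whose IA/WT volume ratio changes by more than min_fold in either direction."""
    selected = []
    for spot_id, wt_volume, ia_volume in spots:
        if wt_volume <= 0 or ia_volume <= 0:
            # Ratio undefined; presence/absence differences need a separate, qualitative check.
            continue
        ratio = ia_volume / wt_volume
        fold = ratio if ratio >= 1 else 1 / ratio
        if fold > min_fold:
            direction = "increased" if ratio > 1 else "decreased"
            selected.append((spot_id, round(fold, 2), direction))
    return selected


# Hypothetical normalized spot volumes: (spot id, wild type, irradiated).
spots = [("s1", 1000.0, 400.0), ("s2", 800.0, 900.0), ("s3", 200.0, 650.0)]
differential = select_differential_spots(spots)
```

In this toy input, only the first and third spots exceed the 2-fold cut-off; the second changes by about 1.1-fold and is discarded.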

Computational analyses
Figure 2 Two-dimensional gels used for the selection of differential spots. The proteomic approach shown compares the protein profile of a sample containing neoblast cells with one in which these cells have been depleted by irradiation. Upper panels show a comparison between two silver-stained 2D gels of a whole proteome from wild type and irradiated animals. Spots not present in the proteome of irradiated planarians are shown and lettered in red. These spots were selected and analyzed by mass spectrometry. Bottom panels show DIGE comparison of irradiated and wild type planarian proteomes. Spots that increase or decrease in the irradiated planarian proteome are shown in red and blue, respectively. These spots were included in the mass spectrometry analyses.
Table 1 (Immobiline DryStrip gels, 24 cm): linear pH 4-7, linear pH 7-11, non-linear pH 3-11. All the different variables affecting protein sample production and 2D gel electrophoresis are listed in this table.
Table 2 Summarized data are shown for the 2D gel analyses. Image master 2D™ software (from Amersham Biosciences) was used to analyze the scanned gels. SIL, silver staining; 100-SIL, 100 μg of total protein extract loaded on the gel; 500-SIL, 500 μg of total protein extract loaded on the gel; DIGE, differential in gel electrophoresis.
MASCOT [17] was tested against different open reading frame (ORF) datasets derived from NCBI-nr/RefSeq [18,19], Schmidtea mediterranea ESTs [20], the contigs for the planarian genome WUSTL assembly version 3.1 [21], and S. mediterranea whole-genome shotgun reads (traces). Of those datasets only NCBI-nr and traces are discussed here; the former is routinely used in this kind of analysis, while the latter yielded the largest number of peptide assignments (unpublished results). MASCOT assigned 20,107 peptides to spectra for NCBI-nr, which mapped to 602 protein sequences. Sequences from traces contained in the "forward" database were reversed to produce a "decoy" database containing sequences of the same length and composition but with a different distribution of trypsin targets from those in the "forward" database; Figure 3 illustrates the whole process. MASCOT returned 50 hits per search on each trace database, both for "forward" and "decoy". This resulted in 100 hits per search, for a total MS-fingerprint of 83 different spots.
MASCOT predicted a total of 44,712 and 36,956 peptides for the forward and decoy databases, respectively, and these were mapped to 8300 unique ORFs (URFs), corresponding to 23,376 and 26,741 unique peptide sequences. When the same peptide was mapped to two or more URFs, the highest score was retained. Figure 4 shows the score distribution of the two sets of unique peptides. Because the decoy database comprised reversed sequences, none of the peptide hits found there would be expected to be real. However, some of the peptide sequences predicted for this set could, by chance, be similar to those from the forward database, so we can consider a false-negative error rate in order to determine a score threshold for both datasets. On this basis, for a 5% false-negative error rate in the decoy database, 1337 peptides would be above the threshold (5% of the 26,741 unique decoy peptides). After ranking the peptides by score and taking the 5% highest-scoring peptides, the score threshold was set at 55 (shown in all panels of Figure 4 as a vertical blue line). When that score cut-off was applied to the peptides obtained from the forward database, 1249 of 23,376 unique peptides (5.34%) from that database were "decoy" filtered. Translating this to the 8300 URFs used to detect the peptides, 1728 of these had at least one significant "decoy" peptide mapped onto them or were aligned with one such URF sequence. Therefore, 20.82% of the URFs can be considered more reliable than the rest.
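The thresholding logic described in this paragraph can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code used in the study: the function names, the toy scores and the inclusive comparison (`>=`) are all hypothetical choices. The sketch keeps the best score per peptide, ranks the decoy scores, takes the score at the boundary of the top 5%, and applies that cut-off to the forward peptides.

```python
def best_score_per_peptide(hits):
    """Keep the highest score for each peptide sequence across all URFs it maps to."""
    best = {}
    for peptide, score in hits:
        if score > best.get(peptide, float("-inf")):
            best[peptide] = score
    return best


def top_count(n_peptides, percent=5):
    """Number of peptides in the top `percent` of a ranked list (integer arithmetic)."""
    return max(1, n_peptides * percent // 100)


def decoy_threshold(decoy_scores, percent=5):
    """Score of the lowest-ranked peptide among the top `percent` of decoy scores."""
    ranked = sorted(decoy_scores, reverse=True)
    return ranked[top_count(len(ranked), percent) - 1]


def passes_threshold(best_scores, threshold):
    """Peptides whose best score reaches the threshold (inclusive; an assumption)."""
    return {pep: s for pep, s in best_scores.items() if s >= threshold}


# Toy data, not taken from the study.
forward_hits = [("PEPTIDEA", 30), ("PEPTIDEA", 60), ("PEPTIDEB", 10)]
decoy_scores = list(range(1, 101))  # 100 hypothetical decoy peptide scores

forward_best = best_score_per_peptide(forward_hits)
threshold = decoy_threshold(decoy_scores)
kept = passes_threshold(forward_best, threshold)
```

With the 26,741 unique decoy peptides reported above, `top_count(26741)` gives 1337, which matches the number of decoy peptides quoted as lying above the threshold.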
The sequences of all the URFs for the forward database were uploaded into the BLAST2GO software suite [22,23]. The first step was to compare those amino acid sequences to homologous proteins (using BLASTP against NCBI-nr, min e-value = 0.001, min hsp length = 25). Of the URFs with scores above the decoy threshold, 1416 (81.94%) had at least one significant BLAST hit. In contrast, only 636 out of 6572 URFs with scores below the decoy threshold (10.71%) also had one or more significant BLAST hits. It was then possible to obtain a functional Gene Ontology (GO) annotation for those URFs having a BLAST hit against a known functionally annotated protein. Results of the functional annotation are summarized in Figure 5.
Figure 3 Computational screening of protein candidates. Spectra fingerprints were analyzed by MASCOT, comparing the experimental peaks against those obtained in silico from sequence databases. URFs were derived from planarian genome traces. Small triangles correspond to peptides found by MASCOT, mapped on candidate protein sequences for both databases, RefSeqs and URFs. Due to the size of the URFs database, a decoy approach was taken to select significant protein sequences. Putative protein sequences were ranked prior to experimental validation, taking into account MASCOT scores, number of peptide hits per sequence, decoy score, as well as functional assignment by BLAST2GO.
Figure 4 Selection of candidate peptides by decoy score threshold. Upper panels: histograms showing the distribution of the peptide scores (the maximum score was chosen when a peptide was mapped more than once to different open reading frames). Lower panels: scatter-plots comparing those peptide scores with the information content, in bits. Above a bit score of 2.5 (orange line), the peptide sequences can be considered of low complexity or repetitive. Decoy score threshold is depicted on all the panels as a vertical blue line, set at a score of 55 for our data.
After GO assignment and the corresponding functional annotation of the sequences derived from our approach, enzyme codes were mapped by BLAST2GO when possible. With such codes it was possible to retrieve the KEGG pathway in which the protein may play a role in planarian molecular biology. However, fewer than one third of the sequences had a homologous gene/protein BLAST hit, especially in the URFs dataset, and of those many had a GO functional assignment.
A fraction of the sequences with at least one GO hit was linked to an enzyme code, which would be related to a component of the KEGG pathways: 1,670 of 2,804 clusters, mapping to 118 pathways, and 131 of 5,528 clusters, mapping to 35 pathways, for MASCOT results on RefSeq and URFs, respectively. All 35 pathways for URFs were also found using the RefSeq dataset. The lower ratio for the URFs set can be explained by species-specific sequences, proteins or functions that are not yet annotated in the reference databases. A total of 297 RefSeq clustered sequences matched 171 enzyme codes for proteins distributed across the 118 pathways, and 16 URF clustered sequences matched 9 enzyme codes for proteins distributed across the 35 pathways. An enzyme can appear in several pathways: owing to the hierarchical structure of KEGG, a match can be found both in a general route, such as "Metabolic pathway", and in a more specific process, such as "Glycolysis/Gluconeogenesis". Among the pathways found, metabolic routes for sugars and lipids were expected, as energy is required for cellular processes, regeneration among them. Nevertheless, a few candidate sequences deserve further analysis, as they appear in pathways related to development and regeneration: "Selenoamino acid metabolism", "Retinol metabolism in animals", and "mTOR signaling pathway". Additional data, including figures of all those pathways with color-highlighted boxes for the proteins found, are available on the planarian proteomics web page [24].

Gene profile
As depicted in Figure 5, the annotated proteins cover a wide range of biological processes, of which four main groups can be emphasized: proteins involved in energy production and metabolism (red dots in Figure 5); gene expression and transcription regulators (yellow squares); proteins related to development and differentiation (blue diamonds); and proteins involved in stress-response pathways and apoptosis (purple stars). This functional distribution resembles the distributions described in previous studies of embryonic stem (ES) cells [25], proliferating cells [26], and differentiating neural stem cells [27], among others [28][29][30] (see corresponding table in Additional File 3). Additional protein sequence comparisons were performed using NCBI BLAST [31] (E-value < 1e-3) to extensively compare sets of candidate proteins from our RefSeq and URFs databases with the sequences described in those studies as stem-cell related. The same analysis was applied to the genes reported in two studies using high-throughput approaches to detect neoblast genes by RNAi-feeding [32] and by expression macrochip [33] (see corresponding table in Additional File 3). A total of 822 sequences out of 2801 (29.35%) from the RefSeq dataset and 50 out of 309 (16.18%) from the URFs dataset presented homology with at least one sequence in any of the studies. Yet only 52 (1.86%) from RefSeq and none from the URFs dataset had homology with sequences reported in the planarian studies.

Functional studies
We performed functional analyses on some candidates from our lists to further assess the quality and accuracy of the approach used. Candidates were selected from the RefSeq and the URFs from the traces (see Table 3). In the case of RefSeq candidates, the sequence was mapped onto the draft genome and primers were designed to clone a longer fragment of the protein for subsequent characterization. Three main groups of genes were selected. The first two groups were proteins belonging to the Ras superfamily of small GTPases and the heat shock proteins (HSP) family. The third group encompassed unrelated genes from different spots. The first family includes the genes Rab-11B, Rab-39 (vesicle and membrane traffic) [34][35][36] and Rac-1 (cytoskeleton regulation and apoptosis) [37,38]. The second family contains HSPs (40, 60 and 70 kDa) involved in a wide variety of processes [39][40][41]. The last group contained the transcription factor Hunchback-like (related to Drosophila axial polarity and neuroblast lineage) [42], PrkC (a kinase linked to apoptosis and other processes) [43,44] and LSm proteins (RNA processing and regulation) [45][46][47]. This gene selection was done because no direct relation with neoblasts was established previously, with the exception of the HSPs.
To assess the relationship between these genes and the neoblasts, we analyzed their expression patterns and RNAi phenotypes (Figure 6). The observed expression patterns were variable. Some of the genes were expressed in the blastema (Figure 6C and 6E), where neoblasts migrate to after division in order to regenerate the missing body parts. Others were expressed in the post-blastema (Figure 6B, D, G, H and 6I), where the neoblast population is amplified by division to generate the cells that will form the blastema. Finally, some genes were expressed in both blastema and post-blastema (Figure 6A and 6F). These expression patterns disappeared in late stages of regeneration or developed over time to correspond to the typical expression pattern for neoblasts, distributed throughout the parenchyma with no expression in the pharynx or at the head tip anterior to the eyes [5]. In addition, for some of the genes, expression was only detectable under regeneration conditions, in which neoblasts are known to proliferate at higher rates. In that case, expression was barely detectable when only a basal number of neoblast cells was present in intact adult animals (Figure 6C, E and 6G). Therefore, the expression patterns for the candidate genes were consistent with neoblast expression. Since neoblasts are known to be the only source of cells for homeostasis and regeneration, the relationship between the selected genes and the neoblasts was validated by RNAi experiments [48,49]. All injected animals, both intact and regenerating, died within a few days or weeks, except in the case of Rab-39 and Hunchback-like (Figure 6B and 6G), for which no phenotype was observed in RNAi experiments. Intact planarians showed a gradual head regression followed by lysis after several weeks, as shown in Figure 6C, D and 6H. This phenotype has been linked to a lack of neoblast cells available for cell renewal [50].
In addition, regeneration was completely absent in fragments from RNAi-treated animals, which produced small blastemas that never differentiated, or no blastema at all with indented wounds, as illustrated in Figure 6A, E, F and 6I.
In a second screen to validate candidate URFs from the traces, the expression of some of these genes was analyzed by comparing intact and irradiated organisms. Whole-mount in situ hybridization in intact adult organisms revealed parenchymal expression consistent with a neoblast distribution, whereas this expression pattern was not present in irradiated animals (Figure 7A-B). This is consistent with neoblast-related genes, since high-dose irradiation destroys neoblasts. Some genes showed additional expression around the CNS that may have been associated with a non-dividing neural precursor cell type. While this expression pattern remained after irradiation, the signal in the parenchyma disappeared (Figure 7C-E). Finally, the planarian ortholog of C-type lectin-like was only expressed in the digestive system of irradiated organisms and never in intact animals (Figure 7F), suggesting a role in cell renewal under stress conditions, given that the gut has the fastest cell turnover of all tissues. These data provide further support for the involvement of these candidate genes in processes linked to neoblast biology, such as proliferation, cell migration or the regulation of differentiation.
Figure 6 (partial caption) In intact animals, the signal is weak for some of the genes analyzed, although the genes for which expression was detectable presented a pattern with a typical neoblast distribution. This pattern encompasses the parenchyma of the whole body excluding the gut, pharynx, and the anterior region of the head. Knock-down experiments by RNA interference were performed to further address the association of the selected genes with neoblasts (A3-I3). Detectable phenotypes were obtained in all cases except for B3 and G3. A3, E3, F3 and I3 show the phenotypes affecting the regeneration process, while C3, D3, and H3 show phenotypes affecting the intact animals. Scale bars: 250 μm.

Discussion
The results of this study show that we have successfully developed a rapid and reliable method for 2D analysis of planarian protein samples (Figure 2 and Additional files 1 and 2). This approach will provide the basis for future proteomics studies that will increase our understanding of a number of biological processes, in planarians and beyond, building upon data obtained using genomics and cDNA-based approaches.
Proteomic studies can help to fill gaps in the annotation of the planarian genome. Despite the large number of entries already submitted, sequence databases such as NCBI [51] or UniProt [52] are far from complete. Recent metagenomic projects have identified novel putative protein sequences not present in current sequence databases, thus extending the range of biological functions that may be represented [53]. For instance, Yooseph et al. [54] report up to 1 in 3 orphan ORFs from whole-genome shotgun sequencing of marine samples containing a mixture of prokaryotic organisms. Our findings indicate that MASCOT can assign substantially more peaks for the spots selected from 2D gels when using the Smed_URF database than when using NCBI-nr/RefSeq, as would be expected.
The use of ORF sequences in whole genomes without prior knowledge of where the genes, mainly the exons, are located presents a number of issues that can distort the measures used to discriminate between true and false peptide hits. These include the ratio of coding to non-coding sequences, which can be quite low (around 2% of coding regions for the human genome [55]), and the presence of more repetitive sequences in intergenic regions, despite the fact that some amino acid repeats are vital functional and structural regions in proteins [56]. Moreover, the experimental spectra are compared to simulated ones that were computed from putative protein-coding regions directly translated from genomic sequences of the same species, not from related homologs from different organisms at different phylogenetic distances.
Galindo et al. [57] described a novel family of eukaryotic coding genes consisting of peptides shorter than 50 amino acids (small ORFs, smORFs) with key biological functions during Drosophila development. Future searches will therefore have to take such sequences into account, for instance by removing any length constraint when building the ORF databases.

Identification of proteins
Apart from the presence of metabolic proteins that indicate the high metabolic rate of neoblasts, several of the proteins detected in this analysis seem to be good candidates to be involved in neoblast-related functions, and thus in regeneration and tissue homeostasis. One of those, Smed-SmB, from the LSm family, has been analyzed in detail and shown to be essential for neoblast proliferation and maintenance [58]. Moreover, other candidates belonging to the HSP class of proteins have been linked to the biology of neoblasts in recent studies [59][60][61]. The experimental results described in this paper support the use of an ORF database built upon genomic sequences from the same species, which yields, as one might expect, more reliable results in subsequent proteomic searches, despite assuming nothing about the coding content of those ORFs. This will bridge the gap between proteomic and genomic approaches to extend our knowledge of the functional components of emerging model organisms.

An initial proteomic picture of the neoblasts
The genes identified in this study represent the first list of neoblast-related candidate genes identified using a proteomic approach in planarians (Table 3 and Additional file 4). The results show little correspondence to those of previous genomic studies [32,33]. Interestingly, however, a number of the genes reported in this analysis were also present in studies designed to identify stem cell-specific genes in other model organisms [25][26][27][28][29][30]. In addition, five of the neoblast-related genes characterized through our proteomic approach (Hsp40, Hsp60, Hsp70, Chaperonin containing TCP1 theta subunit and Splicing factor 3b subunit 1) have also been analyzed in a planarian transcription macrochip, but only one of them was detected (Hsp60) [33]. These findings support our proteomic strategy as a complement to genomic approaches. Furthermore, the large number of putative neoblast-related proteins identified in this proteomic study will be of invaluable help in future research investigating the biology of the neoblast.

Conclusions
We have developed a proteomic approach to characterize specific planarian stem-cell (neoblast) proteins. An accurate and reproducible method for protein purification, 2D gel electrophoresis and MS analysis was defined and an ORF database of species-specific genomic DNA was developed for peptide assignment of the retrieved MS spectra. Subsequent computational analyses yielded a list of annotated candidate proteins, some of which were functionally validated as neoblast-specific genes by RNAi and whole-mount in situ hybridization. Substantial overlap was observed between the candidate genes identified in our study and those reported from previous analyses of embryonic stem cells, thus validating the specificity of the approach. In addition, we detected novel sequence candidates and expression changes that merit further investigation in future studies to determine their role in stem-cell biology.

Sequences
The genome of S. mediterranea (strain S2F2) was sequenced and assembled at the Genome Sequencing Center (GSC) at Washington University in Saint Louis (WUSTL) [62,63]. It contains around 800 Mbp distributed on four chromosomes (2n = 8). The latest assembly version, v3.1 [21], comprises up to 90,000 sequences, which were reduced to 45,000 by means of paired-end sequencing. Lengths of those sequences range from thousands to hundreds of thousands of nucleotides. During the assembly process, sequencing errors can be fixed by aligning different traces, but the software can also reduce polymorphisms and misplace those trace sequences because of the repeats. In order to overcome those limitations, a database of ORFs was produced directly from the set of whole-genome shotgun reads. About 16 million traces were downloaded from the NCBI Trace Archive [64] and translated, without prior masking, into the six possible reading frames, keeping only ORF sequences at least 50 amino acids long. The ORFs were stored in a MySQL relational database along with the original sequences, to make it possible to retrieve the original nucleotide sequences and design probes for experimental validation. To reduce the large amount of sequence data produced and thus speed up the peptide searches by MASCOT [65], a set of URFs was derived from the set of ORFs with a checksum function to generate hash keys as unique identifiers for every sequence. A total of 54,382,803 ORFs were retrieved from 16,580,722 shotgun reads. This resulted in 28,946,081 URFs with properly formatted sequences to populate a MASCOT database. As MASCOT was not able to work with databases larger than 24 million entries, the original set was split into two databases. MASCOT results for both sets were then merged to get the final set of ORFs that had at least one peptide matching spectra.
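The construction of the URF set described above can be illustrated with a minimal Python sketch. Several points are assumptions made for the example and are not stated in the text: an ORF is taken to be any stop-free stretch of a translated frame, the standard genetic code is used, SHA-256 stands in for the unspecified checksum function, and all function names are hypothetical.

```python
import hashlib
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate a DNA string codon by codon; unknown codons (e.g. with N) become X."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frame_orfs(read, min_len=50):
    """Yield stop-free translated segments of at least min_len residues in all six frames."""
    forward = read.upper()
    reverse = forward.translate(COMPLEMENT)[::-1]
    for strand in (forward, reverse):
        for frame in range(3):
            for segment in translate(strand[frame:]).split("*"):
                if len(segment) >= min_len:
                    yield segment


def unique_orfs(reads, min_len=50):
    """Map a hash key of each ORF sequence to the sequence, collapsing duplicates."""
    urfs = {}
    for read in reads:
        for orf in six_frame_orfs(read, min_len):
            key = hashlib.sha256(orf.encode("ascii")).hexdigest()
            urfs[key] = orf
    return urfs


def split_database(entries, max_entries):
    """Split a list of entries into consecutive chunks of at most max_entries."""
    return [entries[i:i + max_entries] for i in range(0, len(entries), max_entries)]


# Toy read: a start codon, 60 alanine codons and a stop codon.
read = "ATG" + "GCT" * 60 + "TAA"
orfs = list(six_frame_orfs(read))
urfs = unique_orfs([read, read])  # the duplicated read adds no new URFs
```

Using the hash of the sequence as the key means that identical ORFs from different reads collapse into one entry. With the figures given above, chunks of at most 24 million entries would split the 28,946,081 URFs into two databases.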
The probability of false matches increases when large databases, with millions of protein sequences, are used to detect a wide variety of possible candidate proteins in a sample [66,67]. To assess the significance of the peptide hits found by MASCOT, a decoy database was built by reversing all the URF sequences [68][69][70]. It was also split into two, as described above for the "forward" database. MASCOT was run separately on the decoy databases for all the mass fingerprints previously analysed with the original URF dataset.
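As a small illustration of the reversed-sequence decoy approach, the following Python sketch builds a decoy database and shows that reversing a sequence preserves its length and composition while changing its tryptic peptides. The toy sequences, the `decoy_` key prefix and the function names are hypothetical, and the cleavage rule (cut after every K or R, ignoring the usual proline exception) is a deliberate simplification.

```python
from collections import Counter


def make_decoy(forward_db):
    """Return a decoy database in which every sequence is the reverse of a forward entry."""
    return {"decoy_" + key: seq[::-1] for key, seq in forward_db.items()}


def tryptic_peptides(seq):
    """Cut after every K or R (simplified rule; the proline exception is ignored)."""
    peptides, current = [], ""
    for residue in seq:
        current += residue
        if residue in "KR":
            peptides.append(current)
            current = ""
    if current:
        peptides.append(current)
    return peptides


forward_db = {"urf1": "MKRAPLLK", "urf2": "GSTRWKEV"}  # toy sequences
decoy_db = make_decoy(forward_db)

forward_peptides = tryptic_peptides(forward_db["urf1"])
decoy_peptides = tryptic_peptides(decoy_db["decoy_urf1"])
```

Because the decoy entries share length and amino acid composition with the forward entries, matches against them give an estimate of how often a spectrum is assigned by chance in a database of this size.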

Sample preparation
Protein samples were obtained from whole animals using a lysis buffer and heating. See Additional File 1 for further details.

Running 2D gels
First-dimension isoelectric focusing was performed on immobilized pH gradient strips (24 cm, pH 3-10) using an Ettan IPGphor system. Second-dimension SDS-PAGE was performed by laying the strips on 12.5% isocratic Laemmli gels (24 × 20 cm) cast in low-fluorescence glass plates on an Ettan DALT system. Details of the procedure are available in the Additional File 1.

Sample analysis
Gel spots were extracted and digested before analysis by MS. MASCOT software (Matrix Science, London, UK) was then used to search those spectra against different databases. All spectra were processed with PRIDE Converter software [72] and submitted to the PRIDE database [73] (project accession number 15541). For details see Additional File 1. After careful selection of score thresholds for the predicted peptides (see the Results section for the values chosen and the final numbers of the filtered datasets), the sequences that allowed detection of the URFs were uploaded into BLAST2GO [22,23]. This software tool facilitates high-throughput integration of sequence data, homology to related species via NCBI-BLAST [31] and functional annotations of DNA or protein sequences based on the Gene Ontology (GO) classification [74]. MASCOT output files, selected peptide and protein sequences, as well as BLAST2GO results and KEGG summary, are available at the planarian proteomics materials web page [24].

RNA interference
Double-stranded RNAs (dsRNA) were produced by in vitro transcription (Roche) and injected into the gut of the planarians as described in Sánchez-Alvarado and Newmark [49]. Three aliquots of 32 nl (400-800 ng/μl) were injected on three consecutive days with a Drummond Scientific Nanoject injector (Broomall, PA). On the fourth or fifth day, some of the planarians were amputated while the rest were left intact. Control organisms were injected with water.

Additional material
Additional file 1: Details on Material and Methods. An extended description of the proteomics protocols applied to perform the analyses presented in this paper.
Additional file 2: Image scans of all silver-stained 2D gel replicates. Image scans of the independent silver-stained 2D gels used in the study. Panels A to D, and the respective zooms I to L of the regions delimited by red squares, come from samples loaded with 100 μg. Panels E to H, and the respective zooms M to P, correspond to samples loaded with 500 μg. A, C, E and G are control samples. B, D, F and H are irradiated samples. Although the staining and running conditions were not exactly equivalent, the spot pattern is reproducible across all the gels, which is most evident in the zoomed regions.
Additional file 3: Comparing the results presented in this manuscript with previously published studies relating to stem cells. Comparison of candidate neoblast protein sequences presented in this paper with genes reported in other proteomic studies to be related to stem cells [25][26][27][28][29][30] and with specific neoblast-related genes identified in two different high-throughput approaches [32,33]. From the URFs database, only those sequences with a positive decoy were selected. NCBI BLASTP [31] (min e-value = 0.001) was used for sequence comparison. Sequences were clustered according to their homology and are listed in the table by their original GI identifier from the corresponding NCBI database.
Additional file 4: Table of peptide candidates. List of the sequence candidates obtained from the computational analysis of the raw proteomics data against the RefSeq and URF datasets (see the corresponding sheet in the spreadsheet file). Only those with a significant BLAST hit are shown (using BLASTP against NCBI-nr, min e-value = 0.001, min HSP length = 25). Genes described in detail in Table 3 are not included. The sequences in this table were built from sets of URFs derived from traces; we provide the corresponding trace identifiers from GenBank TraceDB [64].
Barcelona for their help in the proteomic work and analyses. We thank all members of the Saló group for advice and critical reading of the manuscript and Dr. Iain Patten for editorial advice. We are also grateful to the reviewers of the earlier version of the manuscript for their helpful comments. This work was supported by grants BFU-2005-00422 and BFU2008-01544/BMC from the Ministerio de Educación y Ciencia, Spain, and grant 2009SGR1018 from AGAUR (Generalitat de Catalunya, Spain). JFA started this project as a Juan de la Cierva post-doctoral fellow. EFT and GRE received an FPI fellowship from the Ministerio de Ciencia y Cultura.
Authors' contributions
EFT, ES and JFA conceived of the study. EFT ran the 2D gels and counted the spots. JFA performed the computational analyses, compiled the sequence databases, processed the MASCOT results, and ran the GO functional and KEGG annotation. EFT ran the MASCOT searches and produced the initial BLAST annotation for RefSeq candidates. EFT and GRE performed the experimental validation of the selected protein candidates. All authors participated in the design and coordination of the study, helped to draft the manuscript, and read and approved the final manuscript.