Digital gene expression approach over multiple RNA-Seq data sets to detect neoblast transcriptional changes in Schmidtea mediterranea
BMC Genomics volume 16, Article number: 361 (2015)
The freshwater planarian Schmidtea mediterranea is recognised as a valuable model for research into adult stem cells and regeneration.
With the advent of the high-throughput sequencing technologies, it has become feasible to undertake detailed transcriptional analysis of its unique stem cell population, the neoblasts. Nonetheless, a reliable reference for this type of studies is still lacking.
Taking advantage of digital gene expression (DGE) sequencing technology we compare all the available transcriptomes for S. mediterranea and improve their annotation. These results are accessible via web for the community of researchers.
Using the quantitative nature of DGE, we describe the transcriptional profile of neoblasts and present 42 new neoblast genes, including several cancer-related genes and transcription factors. Furthermore, we describe in detail the Smed-meis-like gene and the three Nuclear Factor Y subunits Smed-nf-YA, Smed-nf-YB-2 and Smed-nf-YC.
DGE is a valuable tool for gene discovery, quantification and annotation. The application of DGE in S. mediterranea confirms the planarian stem cells or neoblasts as a complex population of pluripotent and multipotent cells regulated by a mixture of transcription factors and cancer-related genes.
During the last decade, there has been increasing interest in the use of Schmidtea mediterranea as a model organism for the study of stem cells. These freshwater planarians contain a population of adult stem cells known as neoblasts, which are essential for normal cell renewal during homeostasis and which confers them with amazing regeneration capabilities [1-4]. Although a number of studies based on massive RNA interference (RNAi) , gene inhibition , microarray , and proteomics [8,9] approaches have been carried out to identify the crucial neoblast genes responsible for their stemness, our understanding of their biology is far from complete. The use of next generation sequencing (NGS) technologies provides an opportunity to study these cells in depth at a transcriptional level. For that to be accomplished, however, a reliable transcriptome and genome references are required. Up to eight versions of the transcriptome for this organism have been published to date, making use of different RNA-Seq technologies [10-16], including one meta-assembly which slightly improves each one separately . Despite all these efforts, a consistent reference transcriptome is still lacking.
Some studies have provided quantitative data on transcripts and their respective assemblies, focusing on regeneration [13,17,18] or directly on neoblasts [11,14,15,19]. However, RNA-Seq suffers from an intrinsic bias that affects the quantification of transcript expression in a length-dependent manner. This bias is independent of the sequencing platform and cannot be avoided nor removed by increasing the sequencing coverage or the length of the reads. Furthermore, it cannot be corrected a posteriori during the statistical analysis (by transcript length normalization, for instance). Consequently, the quantification of the transcripts and the detection of differentially expressed genes is compromised [20-22]. Digital gene expression (DGE)  is a sequence-based approach for gene expression analyses, that generates a digital output at an unparalleled level of sensitivity [22,24]. The output is highly correlated with qPCR [25-27] and does not suffer from sequence-length bias. The combination of DGE and RNA-Seq data has been shown to help overcome the specific limitations of RNA-Seq , and the usefulness of DGE has been thoroughly demonstrated in research ranging from humans [26,29] to non-model organisms [22,24]. However, to date, DGE has not been extensively applied to the study of the planarian transcriptome.
Here, we have compiled and analyzed all the transcriptomic and genomic data available for S. mediterranea using DGE. This has facilitated an improved annotation and provided tools to ease the comparison and browsing of all the information available for the planarian community.
We have taken advantage of the resolution of DGE to quantitatively characterize isolated populations of proliferating neoblasts, their progeny, and differentiated cells through fluorescence-activated cell sorting (FACS) [30,31]. The resulting changes in transcription levels were analyzed to obtain transcript candidates for which an extensive experimental validation was performed. This has yielded new neoblast-specific genes, including many transcription factors and cancer-related homologous genes, confirming the validity of our strategy and the utility of the tools that we have implemented. Moreover, we provide a deeper molecular description of four of those candidates, the Smed-meis-like, and the three subunits of the Nuclear Factor Y (NF-Y) complex Smed-nf-YA, Smed-nf-YB-2, and Smed-nf-Y-C. Both families of genes are attractive candidates to be studied in planaria. The Meis family of transcription factors specify anterior cell fate and axial patterning , whereas the NF-Y complex is a heterotrimeric transcription factor that promotes chromatin opening and is involved in the regulation of a wide number of early developmental genes .
Results and discussion
Three DGE libraries were obtained from FACS-isolated cell populations X1 (proliferating stem cells, S/G2/M), X2 (a mix of stem cell progeny and proliferating, G0/G1), and Xin (differentiated cells, G0/G1)  (Additional file 1). 8,298,210 total reads were sequenced (X1: 3,641,099; X2: 3,488,712; Xin: 1,168,399), representing 98,156 distinct tags (X1: 70,849; X2: 24,621; Xin: 25,221), with an average of 84.5 reads per tag (X1: 51.4; X2: 141.7; Xin: 46.3). The distribution of the tags in each cell population can be observed in Additional file 2A. DGE is reported to achieve near saturation in genes detected after 6-8 million tags . Furthermore, for moderately to very highly expressed genes (>2 cpm) it occurs with three or even just two million tags [22,34]. Figure 1 shows that saturation was reached at around two million tags for most of the data sets which the distinct tags were mapped to, although the slope for the total number of distinct tags decreases without saturating. It is worth noting that all the reference transcriptome sets performed similarly, achieving a maximum near 20,000 mapped tags. However, when looking at how many distinct tags map to any of those transcriptomes, about 5,000 tags appear not to be shared among all of them (see the “All mapped” and the “All distinct” data series on Figure 1, and further details on mapping below).
A critical point in this kind of experiment has to do with the number of times a tag has to be seen so that it can be considered reliable. Discarding too many tags in an attempt to increase reliability will result in a loss of information whereas keeping all of them may generate background noise. To estimate the specificity of our tags and to establish an optimal cutoff for the minimum number of counts a tag should have in order not to be considered artefactual, we performed a series of simulations mapping iteratively randomized sets of our data. The results are summarized in Additional file 3 for the different cutoffs tested (1, 5, 10, 15 and 20 minimum occurrences of tags). For cutoffs higher than five there is no substantial gain in terms of specificity (the number of hits decreases less than one order of magnitude). Thus, we defined reliable tags as those sequenced five times or more and discarded the rest. Thereafter, for the subsequent computational and experimental analyses, only those tags occurring at least five times were considered. From the initial set of 98,156 distinct tags, 40,670 passed that cutoff (Additional file 2B).
The low technical variability of DGE and its high reproducibility, together with the digital quantification of transcripts, enables direct comparison of samples across different experiments, even from different laboratories [21,22,24-26,29,35]. That property allowed us to contrast our results with those from Galloni , who used DGE to identify neoblast genes by comparing irradiated versus control animals over the same strain of clonal S. mediterranea. A Venn diagram showing the similarity of the strategies can be seen in Additional file 4. From the total distinct tags, 31.38% (30,806 out of 98,156) were sequenced 10 times or more in our study, compared with just 11,28% (42,159 out of 373,532) in the irradiation strategy, indicating a greater representation of each tag. This suggests, as expected, that the cell-sorting approach has higher specificity. In addition, the strand-specific nature of DGE allows the discrimination of sense and antisense transcripts. Almost 30% of the transcripts successfully identified also presented antisense transcription, even though at lower levels than canonical transcription. This confirms the findings of the aforementioned study in planarians  and others , and shows that a large proportion of the genome is transcribed from both strands of the DNA. Although the purpose of these transcripts is still open to debate, evidences point to a post-transcriptional gene regulatory function .
Tag mapping to reference sequence data sets
An essential step in DGE is the recovery of the transcript represented by each tag. The nature of the DGE methodology, which generates reads of only 21 nucleotides, implies mapping short reads against a reference genome or a collection of ESTs to retrieve full-length sequences for the original transcripts. On the other hand, the short length facilitates the fast mapping of the tags against the reference sequence data set. To obtain the maximum number of transcripts, tags were mapped against the 94,876 S. mediterranea ESTs from the NCBI dbEST[39-42] and all the available transcriptomes (formally those can also be considered as ESTs libraries). 26,822 tags (65.95%) mapped over at least one set of ESTs/transcripts, leaving a huge number (34.05%) unmapped.
In an attempt to recover tags that did not map over the transcripts, tags were also mapped over the S. mediterranea genome assembly draft AUVC01 masked with the S. mediterranea repeats [23,43-45] (Table 1 and Figure 2). The overlap between transcriptomes was high. Although in most cases sets of reads mapping over a single transcriptome has a very low incidence, there were two cases where one could find a relatively small number of tags mapping to only one transcriptome: 327 tags (1.1%) for Labbé et al. 2012; 208 tags (0.7%) for Rohuana et al. 2012; 3,231 tags (10.7%) remarkably mapping only over the genome; and 26.1% of tags (10,617 out of 40,670) not mapping at all. For tags sequenced 10 times or more, the proportion of unmapped tags is similar: 20.5% (6,327 out of 30,806) (Additional file 2B). Even allowing up to two mismatches, 9.36% of the reads remain not mappable to the genome. This is still an important amount, considering that two mismatches is very permissive (it represents almost a 10% of nucleotide substitution in the read with respect to the reference sequence).
These results indicate that there will be a significant number of transcripts that are not represented yet neither in the current transcriptomic sets nor in the reference genome, despite their coverage depth [46-49], and may correspond, for instance, to weakly expressed genes . Mapping tags are expressed on average at 50.78 cpm, while non-mapping tags only at 19.85 cpm. Nonetheless, since transcriptomes currently available lack the complete annotation of 3’-UTR regions and the DGE libraries were made from the 3’-ends, reads that map to genomic sequences but not to current transcripts may potentially come from the 3’-UTR ends not yet sequenced. To evaluate this possibility, we have projected the transcriptome from Kao et al. 2013  over the genome and looked for the proximity of the tags mapping next to the 3’-end of the transcripts (Additional file 5). Downstream sequenced DGE tags account for 4.12% of all the possible CATG targets. This small amount of sequenced tags only mapping to the genome may correspond to potential novel unsequenced transcripts, alternative 3’-UTR exons of splicing isoforms, misannotated or alternative poly-adenylation sites, or even to non-coding RNAs not represented yet in the present transcriptome sets. Future RNA-Seq experiments may provide further sequence evidences supporting transcripts for those tags.
8,903 contigs from Smed454_90e—Smed454 from now on— showing significant expression changes (p < 0.001) were selected and, from those, 7,735 contigs presented a hit to a Pfam domain model (Figure 3). For those sequences having a significant hit to a known domain/protein, gene ontology (GO) analysis was performed in order to summarize changes on the biological processes and molecular functions due to the observed expression patterns of the enriched sets of transcripts. Those transcripts were classified according to the cell type in which they were mostly expressed, then their significant GO annotations were clustered (also taking into account their parent nodes in the ontology), to calculate the terms abundance log-odds ratio. Comparison of GO categories between transcripts predominantly expressed in X1, X2 or Xin cell fractions revealed significant patterns of enrichment as indicated in Additional file 6 (see also the “Transcriptomes” tables available from the web site—planarian.bio.ub.edu/SmedDGE—for specific GO terms assigned to each transcript).
The GO comparison between the neoblast population (X1) and the differentiated cells (Xin) reflects distinct functional signatures: X1 is enriched in ubiquitin-dependent protein catabolic process, nucleic acid binding, RNA-binding, helicase activity, ATP binding, translation, and nucleosome assembly; Xin most represented categories include actin binding, actin cytoskeleton organization, small GTPase mediated signal transduction, proteolysis, and calcium ion binding; whereas in X2, markers of secretory activity such as vacuolar transport are more abundant.
All tag mappings over the different transcriptome versions are available in the form of dynamic tables from our web site (planarian.bio.ub.edu/SmedDGE, Figure 4A). The relationship between Smed454, along with their domains and functional annotation, with the other reference transcriptomes described in this manuscript can be browsed on a subset of those tables. In order to establish the correspondence between the transcriptomes, a megablast—NCBI BLAST+ 2.2.29 —was performed, filtering the resulting hits afterwards by three levels of coverage (90%, 95% and 98%). Although the focus is set on Smed454, the user can reorder those tables by columns containing identifiers for other transcriptome versions or she can choose to jump to the transcriptome version specific summary table.
Moreover, the Smed454 contig browser [10,52]has been revamped into a more flexible interface based on GBrowse2 (planarian.bio.ub.edu/gbrowse/smed454_transcriptome). One can find there different types of annotation tracks: reads coverage, homology to known genes/proteins, hits to Pfam domains, and also the information of the tags mapped over the sequence. One track-specific GBrowse2 Perl module was modified to display DGE tags data, such as the sequence, counts and rank position. Further customization of the GBrowse2 configuration facilitates the access to most of that information in the form of pop-up summary boxes, but also by means of additional “Details” page (see yellow panel on the right side of Figure 4B).
This browser has been developed under the principle of easy accessibility, in the hope that it will become a useful and informative user friendly tool for experimental researchers in their daily work.
The validity of our approach is corroborated by the expression levels detected in 40 already known and well-characterized neoblast genes (Table 2), plus another 29 genes described in the literature with evidence of also being neoblast related (Table 3). As can be observed in Figure 5, both sets of genes show the expected expression pattern along the vertical right hyperbola, indicating a clear X1 specificity, with two exceptions overrepresented in X2: Smed-nlk-1 and Smed-prog-1, which is described to be found in postmitotic cells . Smed-dlx and Smed-sp6-9 are key genes in eye formation ; despite their localized activation, DGE was sensitive enough to identify both of them predominantly in the X1 subfraction. Moreover, we could detect expression of genes such as Smed-smg-1—which is described as broadly expressed through all tissues, including neoblasts —in both neoblasts and differentiated cells. Finally, 133 clones from two different studies [6,56] focussing on regeneration, stemness and tissue homeostasis are, indeed, significantly overexpressed in neoblasts (Additional file 7).
Based on their X1/Xin expression ratio, we selected a collection of potential new neoblast genes among the most represented in the X1 population. With the chosen candidates we performed expression pattern analysis by whole mount in situ hybridization (WISH) in irradiated animals. At different times after irradiation, as the neoblasts and its progeny decline, the hybridizationsignal disappears . The expression of 42 out of 47 genes tested was diminished or completely lost in irradiated animals (Table 4 and Additionalfile 8).
Although neoblasts are essential also during homeostasis for normal cell renewal, the phenotype becomes more evident during regeneration. Functional analyses were therefore carried out by RNAi followed by head and tail amputation in order to visualize defects in the regenerating process. From the 42 genes whose expression was affected by irradiation, 24 showed a phenotype after RNAi (Additional file 9), most of them preventing a successful regeneration and leading to the death of the animals, the usual phenotype for neoblast genes [58,59].
New neoblast genes
Interestingly, several of the new genes identified as neoblast genes correspond to transcription factors, which are key elements implicated in cell fate decisions. Furthermore, many are also homologous to cancer related genes. We briefly describe those that produce planarian regeneration impairment after RNAi (Additional file 9). The inhibition of six of them produce a reduced blastema with defective head and eyes. Smed-atf6A, is a cyclic AMP-dependent transcription factor, which interacts with the Nuclear Transcription Factor Y (NF-Y) complex (further analyzed later). Smed-ccar1, is a perinuclear phospho-protein that functions as a p53 coactivator modulating apoptosis and cell cycle arrest . Smed-hnrnpA1/A2B1, a component of the ribonucleosome, is involved in the packaging of pre-mRNA into hnRNP particles in embryonic invertebrate development  and in stem cells . Smed-srrt, modulates arsenic sensitivity, a carcinogenic compound that inhibits DNA repair . Smed-med7 and Smed-med27 belong to a mediator complex essential for the assembly of general transcription factors. Smed-ranbp2 is a member of the nuclear pore complex and is implicated in nuclear protein import. Within the same family, Smed-nup50 shows also a stronger phenotype. The knockdown of the other 14 genes prevents the formation of the blastema completely. Smed-gtf2E1 and Smed-gtf2F1, are components of the general transcription factors IIE and IIF. Smed-ncapD2 is necessary for the chromosome condensation during mitosis . Smed-pes1, is required in zebrafish for embryonic stem cell proliferation . Smed-rack1, is an intracellular adaptor of the protein kinase C in a variety of signaling processes. Smed-lin9, is related to the retinoblastoma pathway interacting with Retinoblastoma 1, which is required for cell cycle progression . All six different retinoblastoma binding proteins produce a non-blastema phenotype. The retinoblastoma pathway has been described to regulate stem cell proliferation in planarians  and some of its genes are already identified. Despite that, most of them are yet to be analyzed. Finally, Smed-rrM2B, is a subunit of the ribonucleotide reductase (RNR) complex required for DNA repair . Details on these genes as well as the rest of the genes tested from the X1 population can be examined in the Additional file 10.
The four remaining genes presenting an aberrant phenotype during regeneration when inhibited by RNAi are described in detail in the following two sections: the Smed-meis-like, a new member of the Meis family, and the three components of the Nuclear Factor Y complex, all of them found to be overexpressed in neoblasts.
Smed-meis-like is a member of the TALE-class homeobox family, similar to Meis genes, which was found to be overexpressed in the X1 subpopulation. This gene family is characterized by the presence of a homeobox domain with three extra amino acids between helices 1 and 2 . Some of its members can act as cofactors for Hox genes . In S. mediterranea, other members of the family have been described: Smed-prep , Smed-meis  and Smed-pbx [71,72].
WISH on intact animals shows that it is expressed in the cephalic ganglia, the pharynx, the tip of the head, and the parenchyma (Figure 6A). The downregulation observed three days after irradiation suggests that the parenchyma-associated expression is related to neoblasts and early postmitotic cells. To corroborate this, a double fluorescence in situ hybridization (FISH) together with the neoblast marker Smed-h2b  has been carried out (Figure 6B and Additional file 11A). Confocal microscopy shows colocalization of both genes in some cells, which confirms the expression of Smed-meis-like in neoblasts and, thus, the DGE results. Nevertheless, not all Smed-meis-like positive cells are expressing Smed-h2b, reinforcing the idea that Smed-meis-like is not exclusive of neoblasts.
Knockdown of Smed-meis-like through RNAi produced a diverse range of anterior regeneration phenotypes (Figure 6C), which can be explained by a different penetrance. The mildest phenotype produced a squared head with elongated and disorganized eyes. This phenotype was also clearly visible with fluorescence in situ hybridization (FISH) against Smed-opsin  and Smed-tph , which label the photoreceptor and the pigment cells of the eye (Figure 6D). In an intermediate phenotype, cyclopic animals are obtained, whereas in the strongest one there is no anterior blastema formation. This range of phenotypes can also be observed with the marker of brain branches Smed-gpas , which shows a gradual reduction of brain regeneration after Smed-meis-like inhibition. These results are also confirmed by the reduction of the brain signal of the pan-neural marker α-SYNAPSIN (Additional file 11B). Posterior regeneration was normal.
In the strongest phenotype, there is also no expression of the anterior markers Smed-notum  and Smed-sfrp-1 [76,77], and the marker of sensory-related cells Smed-cintillo (Figure 6E) . This indicates that Smed-meis-like is necessary for anterior identity. In contrast, expression of the posterior marker Smed-wnt-1  remains after Smed-meis-like inhibition. Thus, we can conclude that Smed-meis-like is necessary for anterior, but not for posterior regeneration.
Finally, immunohistochemistry against H3P (Figure 6F) shows a slight—but significant—decrease in proliferation in the whole animal (133.8 ±5.22 mitosis/mm2 in n=9 controls versus 94.6 ±4.06 cells/mm2 in n=9Smed-meis-like(RNAi), mean ±s.e.m.). This decline in mitosis is matched by the lack of progenitors of some anterior structures, indicating also defects in differentiation. Thus, eye progenitor cells, which are labeled with Smed-ovo , are not present in Smed-meis-like(RNAi) animals (Figure 6E).
The requirement for Smed-meis-like in anterior regeneration is similar to another member of the family, Smed-prep . This differential phenotype is also observed after the inhibition of other genes, such as Smed-egr4 , Smed-zicA [80,81] and Smed-FoxD . The milder phenotype, showing elongated eyes, is similar to the effect of Smed-meis(RNAi) , and also to the mild inhibition of Smed-bmp4 . Altogether, these results suggest that Smed-meis-like is important for eye and anterior regeneration, similarly to other members of the TALE-class homeobox family. However, given the lack of expression of Smed-meis-like in the eyes, the abnormal eye formation could be a consequence of the anomalous brain regeneration.
Nuclear Factor Y complex
The Nuclear Factor Y complex (NF-Y) is an important transcription factor composed by three subunits (NF-YA, NF-YB and NF-YC), each one encoded by a different gene. This heterotrimeric complex acts as both an activator and a repressor, and it regulates other transcription factors, including several growth-related genes, through the recognition of the consensus sequence CCAAT localized in the promoter region [84-88]. In addition, it has been reported that the NF-Y complex regulates the transcription of many important genes like Hoxb4, y-globin, TGF-beta receptor II, or the Major Histocompatibility Complex class II and Sox gene families . This large number of interactions makes the NF-Y complex an important mediator in a wide range of processes, from cell-cycle regulation and apoptosis-induced proliferation to development and several kinds of cancer .
In the sexual strain of S. mediterranea, an NF-YB is necessary to maintain spermatogonial stem cells . We have isolated a different NF-YB subunit (NF-YB-2), and also a member of the other two subunits (NF-YA and NF-YC). WISH shows that the three genes are expressed ubiquitously and in the cephalic ganglia (Figure 7A). Moreover, the expression decrease one day after irradiation indicating a linkage with stem cells, as described in other organisms . Double FISH of each NF-Y subunit together with Smed-h2b confirms the expression of this complex in neoblasts and also in some determined cells (Figure 7B and Additional file 12A).
It has been suggested that each NF-Y component could have a specific role . Therefore, to better understand the function of this complex, we knocked down each subunit separately. Although the penetrance varies depending on the subunit inhibited, the phenotype observed after RNAi treatment is the same. In intact non-regenerating animals, RNAi resulted in head regression, ventral curling and, finally, death by lysis (data not shown), as described for other neoblast-related genes [58,59]. After 11 days, head and tail amputated animals failed to regenerate properly, with a smaller brain and fewer brain ramifications as revealed by Smed-gpas (Figure 7C) and by α-SYNAPSIN (Additional file 12B). Furthermore, we observe an increase in the number of Smed-h2b + cells (Figure 7C,E), also in the area in front of the eyes, where there should not be undifferentiated neoblasts, even though mitosis are reduced (Figure 7D). There is also a decrease in the number of early postmitotic cells (Smed-nb.21.11e +) (Figure 7C,E), whereas late postmitotic cells (Smed-agat-1 +) do not present significant differences (Figure 7E) . These early progeny markers have recently been associated with epidermal renewal . Hence, the accumulation of neoblasts and the decrease of the subepidermal postmitotic population suggest a defect in the early stages of the differentiation process affecting the epidermal linage. The neural lineage may also be compromised according to the atrophied cephalic ganglia.
This work presents experimental validation of a collection of putative neoblast genes obtained from a DGE assay on cell fractions. As clearly depicted in the splashplot for the comparison of expression levels between X1, X2 and Xin fractions (Figure 5 and Additional file 13A), there are only a few transcripts specific to X2. The plot produced with the data provided by Labbé  from their RNA-Seq analysis on X1, X2 and Xin cell fractions for S. mediterranea shows a similar pattern (Additional file 13B). Moreover, comparison among the three sets using Pearson and Spearman correlations indicates that X1 and X2 are the most correlated populations (Additional file 14). Following these results, most of the transcripts expressed in X2 are also expressed in X1. Hence, X2 is a heterogeneous population that cannot be transcriptionally differentiated from X1 without a deeper discrimination method. In this regard, the strategy recently applied by van Wolfswinkel and collaborators using the last sequencing technology to obtain the transcriptome of individual cells , represents the most promising approach to deciphering the heterogenity of the neoblast progeny.
Randomization simulations also illustrate the specificity of the 21bp tags to detect real transcripts, corroborating previous estimations [29,46,48,49,95,96]. Furthermore, those results reinforce the assumption that most of the non-mapping tags will correspond to real transcripts [46-49], still lacking from reference data sets for this species. Antisense transcription was also detected, confirming previous reports [25,36,49]. Although further analysis will be required to determine whether this could explain a fraction of the “novel” tags, our primary focus was to characterize the canonical protein-coding transcripts. Due to the heterogeneity of this species genome, we would expect some variability-both at sequence and expression arising from individuals (the pool of animals taken for the samples), and cells (as they do not come from a cell culture). This could explain another fraction of tags not mapping onto the reference transcriptomes. Consequently, we were quite strict in the current manuscript to look for exact tag matches, taking into account that one or more mismatches represents a mappability issue even for finished transcriptomes of the quality of human  or Drosophila melanogaster .
DGE has proven to be reliable for transcript quantification and new gene identification in planaria. In this work, we have described a new member of the TALE-class homeobox family, Smed-meis-like. Similar to other members of this family, this gene seems to be involved exclusively on anterior polarity determination during regeneration. Given that the expression of this gene is not restricted to neoblasts, its role can also be important in committed cells. Our results with the NF-Y complex suggest that the knockdown of this complex blocks early differentiation of the epidermal and, probably, neural lineages, both belonging to the ectodermal line, generating a neoblast accumulation and deregulation. This effect has been shown in other organisms such as Drosophila, in which NF-Y knockout blocks differentiation of R7 neurons through senseless [89,99]. The majority of the new neoblast genes reported and validated in this study were found to participate in cell proliferation, cell cycle regulation, embryogenesis or development in other models, and many of them are involved in processes related to cancer. The pathways participating in tumorigenic processes and stem cell regulation are often the same, as has been proposed previously for planarians . These genes are probably fundamental for stem cell maintenance andthe control of proliferation in organisms with the capacity to regenerate , thus reinforcing the potential valueof S. mediterranea as an in vivo model for stem cell research .
Our DGE analysis pointed out a high resemblance among all the transcriptomes available for S. mediterranea. We have also shown the redundancy of the transcriptomes currently available for S. mediterranea in agreement with Kao , together with their incompleteness under the light of the DGE data. Although our results provide a comprehensive comparison among them, it would be desirable to agree on a unique transcriptome to be used by the whole community. To this end, the PlanMine initiative  is attempting to obtain consensus among the researchers on an appropriate reference. Nonetheless, the need for a completely sequenced and well-annotated genome remains. The DGE strategy can help in this endeavour, since short sequences can be rapidly projected over the reference genome or the transcriptome, even from different laboratories, in order to improve their annotation . Similarly, DGE allows the data generated to be reassessed as many times as required, as a more complete genome and transcriptome references for this species become available. Hence, the quantitative data provided here by DGE will prove useful in order to recover and annotate more undescribed genes in the future.
Planarians used in this study were from the asexual clonal line of S. mediterranea BCN10. Animals were maintained in artificial water and were starved at least seven days prior to experimentation.
Cell dissociation, cell sorting and RNA extraction
To trigger neoblast proliferation and differentation, two days head and tail regenerating animals were used for the preparation of the libraries. Three animals per library were used in order to obtain the required amount of RNA. Cell dissociation and FACS were carried out as described by Möritz  and Hayashi . Briefly, after cell staining with Calcein AM and Hoechst 33342 (Molecular Probes, Life Technologies), one million cells were separated for each population in a FACSAria sorter (Becton Dickinson) at the Scientific and Technological Centers of the University of Barcelona (CCiTUB) cytometry facilities. A representative plot of the cell populations after the sorting can be seen in Additional file 1A. Cells were directly collected in TRIzol LS (Life Technologies) at 4°C and maintained in ice to preserve RNA integrity. RNA extraction followed to obtain 1 μg of total RNA for each library. Quantification of RNA was assessed with a Nanodrop ND-1000 spectrophotometer (Thermo Scientific) and quality check was performed by capillary electrophoresis in an Agilent 2100 Bioanalyzer (Agilent Technologies) prior to librarypreparation.
Unlike RNA-Seq, this method only sequences a short read of a fixed length, named tag, derived from a single site proximal to the 3’-end of polyadenylated transcripts. This short read is later used to identify the full transcript. The number of times that the very same tag has been sequenced—its number of occurrences—is proportional to the abundance of the transcript which it belongs to. Since it only counts one sequence per transcript, its ability to quantify is not affected by the transcript length. For that reason, DGE is better suited for the detection of short transcripts and low expressed genes when compared with RNA-Seq [20-22].
Sequence tag preparation was done with Illumina’s DGE Tag Profiling Kit according to the manufacturer’s protocol as described . In short, the most relevant steps included the incubation of 1 μg of total RNA with oligo-dT beads to capture the polyadenlyated RNA fraction followed by cDNA synthesis. Then, samples were digested with NlaIII to retain a cDNA fragment from the most 3’ CATG proximal site to the poly(A)-tail. Subsequently, a second digestion with MmeI was performed, which cuts 17 bp downstream of the CATG site, generating, thus, the 21 bp tags.
Cluster generation was performed after applying 4pM of each sample to the individual lanes of the Illumina 1G flowcell. After hybridization of the sequencing primer to the single-stranded products, 18 cycles of base incorporation were carried out on the 1G analyzer according to the manufacturer’s instructions. Image analysis and base calling were performed using the Illumina pipeline, where tag sequences were obtained after purity filtering. Generation of expression matrices, data annotation, filtering and processing were performed by using the Biotag software (SkuldTech, France) .
Raw sequencing data in FASTQ format as well as processed tag sequences and their associated expressions have been deposited at NCBI Gene Expression Omnibus (GEO)  and are accessible through GEO Series accession number GSE51681 .
Comparison of expression data
Tag raw expression was normalized to counts per million (cpm). The statistical value of DGE data comparisons, as a function of tag counts, was calculated by assuming that each tag has an equal chance to be detected, in fair agreement with a binomial law. An internal algorithm allows the comparison between different libraries and measures the significance threshold for the observed variations and p-value calculation (see Mathematical Appendix of Piquemal et al. 2002 ).
Different Perl  scripts were designed for the subsequent analyses. All of them are available from the web site planarian.bio.ub.edu/SmedDGE.
A database with all the possible CATG + 17bp theoretical tag sequences was constructed for each one of the reference data sets. Tags were compared to these databases to identify all perfect matches and, when more than one tag mapped over the same transcript, only the tag closer to the 3’-end was considered. For the genome reference, 2 mismatches were also considered for unmappable tags with the SeqMap mapper [108,109].
In addition, tags were also mapped against a database of 8,662,308 CDS and 5,189 genomic sequences from bacteria directly downloaded from GenBank  repositories to check sample contaminations. Only two tags mapped on bacterial transcripts, confirming the purity of our libraries.
For the 3’-UTR prediction, all 23,020 contigs of the transcriptome from Kao et al. 2013 , were mapped over the genome using Exonerate 2.2.0  to characterize the putative 3’-UTR ends (poly-A sites were not predicted though). Apart from aligning the transcripts to the genomic contigs, the strand for the longest ORF contained was also considered to ensure proper transcript orientation. For each transcript, 1,000bp upstream and downstream regions around the genomic coordinate for the putative 3’-UTR ends found were considered to retrieve DGE tags (noted as transcripts 3’-end relative position in Additional file 5).
Libraries and reference sequence data sets randomization
Libraries and reference data sets were randomized using Perl  scripts and the Inline::C library to generate analogous sets of random sequences. This method resembles the original data sets in terms of size and nucleotide abundance in comparison with other approximations which generate virtual sequences based on mathematical distributions . 500 and 100 randomizations for each library and data sets respectively were generated. Mapping was performed using cutoffs of 1, 5, 10, 15 and 20 occurrences (Additional file 3).
Browsing data sets
The transcriptome browser shown in Figure 4B was initialized with the Smed454  contigs using the GBrowse2 engine . The browser also includeshigh-scoring segment pairs (HSPs) from whole-transcriptome BLAST searches performed over the UniProt database  (NCBI BLAST+ 2.2.29  with default parameters), as well as the Pfam [115,116] domains mapped by HMMER—with E-val =1 and domain E-val =1— on the six-frame translations for the contigs sequences. DGE tag sequences—together with the corresponding counts, normalized scores, their ranks, etc.—were uploaded to the GBrowse2 MySQL database, and they are shown in the browser using a customized version of the Bio::Graphics::Glyph::xyplot module.
Functional annotation was projected from the UniProt GO annotations over the homologous Smed454 contig sequences. Two-tailed hypergeometric test, which accounts for significant overrepresented (positive-tail) or under-represented (negative-tail), was performed by comparing the set of GO assigned to transcriptome contigs over-represented on each of the cell fractions against the set of GO annotations for the whole set of contigs. Significance threshold was set to p < 10-5 and the results are summarized in Additional file 6 for the different cell fraction sets.
New genes were named following the nomenclature proposed for S. mediterranea  based on their BLASTx homology—NCBI BLAST+ 2.2.29  with default parameters against the UniProt database —to its human homologous gene according to the official gene name approved by the HUGO Gene Nomenclature Committee (HGNC)  whenever possible, and trying to honor the names of other members of the family if they were already stated for S. mediterranea. When no significant homology for the corresponding gene was available, its characteristic domain found at the Pfam site [115,116] was used to identify it.
For experimental protocols requiring irradiated animals, irradiation was carried out at 75 Gy (1,66 Gy/minute) in a X-ray cabinet MaxiShot 200 (Yxlon Int.) at the facilities of the Scientific and Technological Centers of the University of Barcelona (CCiTUB).
In situ hybridization
WISH was conducted for gene expression analysis, as previously described [120,121]. Images from representative organisms of each experiment were captured with a ProgRes C3 camera (Jenoptik) through a Leica MZ16F stereomicroscope. Animals were fixed and hybridized at the indicated time points.
Fluorescence in situ hybridization
For double FISH animals were treated as described elsewhere . Confocal laser scanning microscopy was performed with a Leica SP2.
Immunostaining was carried out as described previously . The following antibodies were used: α-SYNORF-1, a monoclonal antibody specific for SYNAPSIN, which was used as a pan-neural marker  (1:50; Developmental Studies Hybridoma Bank); and α-phospho-histone H3 (H3P), which was used to detect mitotic cells (1:500; Cell Signaling Technology). Alexa 488-conjugated goat α-mouse (1:400) and Alexa 568-conjugated goat α-rabbit (1:1000; Molecular Probes) were used as secondary antibodies.
Double-stranded RNAs (dsRNA) were produced by in vitro transcription (Roche) and injected into the gut of the planarians as previously described . Three aliquots of 32 nl (400-800ng/ μl) were injected on three consecutive days with a Drummond Scientific Nanoject II injector. Head and tail ablation pre- and post-pharyngeally followed the fourth day. If no phenotype was observed after two weeks, a second round of injection and amputation was carried out in the same manner, unless otherwise stated. Control organisms were injected with gfp dsRNA.
Availability of supporting data
All data sets are fully available without restriction. Yet relevant data sets were already included within this article and its additional files, further supporting material, as well as updates, will be publicly available through the project web site [https://planarian.bio.ub.edu/SmedDGE].
Raw sequencing data in FASTQ format, along with processed tag sequences and their associated expressions, have been deposited at NCBI Gene Expression Omnibus (GEO) ; they are accessible through GEO Series accession number GSE51681 . Gene sequences and primers used for cloning are deposited at the GenBank  repository, the corresponding accession numbers for the gene sequences are listed on Table 4.
Three prime untranslated region
Counts per million
Digital gene expression
Expressed sequence tag
Fluorescence-activated cell sorting
Fluorescence in situ hybridization
Open reading frame
Three amino acid loop extension
Whole mount in situ hybridization
Reddien PW. Specialized progenitors and regeneration. Development. 2013; 140(5):951–7.
Rink J. Stem cell systems and regeneration in planaria. Dev Genes Evol. 2013; 223(1-2):67–84.
Elliott S, Sánchez Alvarado A. The history and enduring contributions of planarians to the study of animal regeneration. Rev Dev Biol. 2013; 2(3):301–26.
Baguñà J. The planarian neoblast: the rambling history of its origin and some current black boxes. Int J Dev Biol. 2012; 56(1-3):19–37.
Sánchez Alvarado A, Newmark PA. Double-stranded RNA specifically disrupts gene expression during planarian regeneration. Proc Natl Acad Sci USA. 1999; 96(9):5049–54.
Reddien PW, Bermange AL, Murfitt KJ, Jennings JR, Sánchez Alvarado A. Identification of genes needed for regeneration, stem cell function, and tissue homeostasis by systematic gene perturbation in planaria. Dev Cell. 2005; 8(5):635–49.
Rossi L, Salvetti A, Marincola FM, Lena A, Deri P, Mannini L, et al. Deciphering the molecular machinery of stem cells: a look at the neoblast gene expression profile. Genome Biol. 2007; 8(4):62.
Fernández-Taboada E, Rodríguez-Esteban G, Saló E, Abril JF. A proteomics approach to decipher the molecular nature of planarian stem cells. BMC Genomics. 2011; 12:133.
Böser A, Drexler HC, Reuter H, Schmitz H, Wu G, Schöler HR, et al. SILAC proteomics of planarians identifies ncoa5 as a conserved component of pluripotent stem cells. Cell Reports. 2013; 5(4):1142–55.
Abril JF, Cebrià F, Rodríguez-Esteban G, Horn T, Fraguas S, Calvo B, et al. Smed454 dataset: unravelling the transcriptome of Schmidtea mediterranea. BMC Genomics. 2010; 11(1):731.
Blythe MJ, Kao D, Malla S, Rowsell J, Wilson R, Evans D, et al. A dual platform approach to transcript discovery for the planarian Schmidtea mediterranea to establish RNAseq for stem cell and regeneration biology. PLoS One. 2010; 5(12):15617.
Adamidi C, Wang Y, Gruen D, Mastrobuoni G, You X, Tolle D, et al. De novo assembly and validation of planaria transcriptome by massive parallel sequencing and shotgun proteomics. Genome Res. 2011; 21(7):1193–200.
Sandmann T, Vogg MC, Owlarn S, Boutros M, Bartscherer K. The head-regeneration transcriptome of the planarian Schmidtea mediterranea. Genome Biol. 2011; 12(8):76.
Labbé RM, Irimia M, Currie KW, Lin A, Zhu SJ, Brown DD, et al. A comparative transcriptomic analysis reveals conserved features of stem cell pluripotency in planarians and mammals. Stem Cells. 2012; 30(8):1734–45.
Resch AM, Palakodeti D, Lu YC, Horowitz M, Graveley BR. Transcriptome analysis reveals strain-specific and conserved stemness genes in Schmidtea mediterranea. PLoS One. 2012; 7(4):34447.
Rouhana L, Vieira AP, Roberts-Galbraith RH, Newmark PA. PRMT5 and the role of symmetrical dimethylarginine in chromatoid bodies of planarian stem cells. Development. 2012; 139(6):1083–94.
Kao D, Felix D, Aboobaker A. The planarian regeneration transcriptome reveals a shared but temporally shifted regulatory program between opposing head and tail scenarios. BMC Genomics. 2013; 16(14):797.
Solana J, Kao D, Mihaylova Y, Jaber-Hijazi F, Malla S, Wilson R, et al. Defining the molecular profile of planarian pluripotent stem cells using a combinatorial RNAseq, RNA interference and irradiation approach. Genome Biol. 2012; 13(3):19.
Scimone M, Kravarik K, Lapan S, Reddien P. Neoblast specialization in regeneration of the planarian Schmidtea mediterranea. Stem Cell R. 2014; 3(2):339–52.
Oshlack A, Wakefield MJ. Transcript length bias in RNA-seq data confounds systems biology. Biol Direct. 2009; 4:14.
Raz T, Kapranov P, Lipson D, Letovsky S, Milos PM, Thompson JF. Protocol dependence of sequencing-based gene expression measurements. PLoS One. 2011; 6(5):19287.
Hong LZ, Li J, Schmidt-Kuntzel A, Warren WC, Barsh GS. Digital gene expression for non-model organisms. Genome Res. 2012; 21(11):1905–15.
Hanriot L, Keime C, Gay N, Faure C, Dossat C, Wincker P, et al. A combination of LongSAGE with Solexa sequencing is well suited to explore the depth and the complexity of transcriptome. BMC Genomics. 2008; 16(9):418.
Veitch NJ, Johnson PC, Trivedi U, Terry S, Wildridge D, MacLeod A. Digital gene expression analysis of two life cycle stages of the human-infective parasite, Trypanosoma brucei gambiense reveals differentially expressed clusters of co-regulated genes. BMC Genomics. 2010; 11(1):124.
’t Hoen PAC, Ariyurek Y, Thygesen HH, Vreugdenhil E, Vossen RHAM, de Menezes RX, et al. Deep sequencing-based expression analysis shows major advances in robustness, resolution and inter-lab portability over five microarray platforms. Nucleic Acids Res. 2008; 36(21):141.
Asmann YW, Klee EW, Thompson EA, Perez EA, Middha S, Oberg AL, et al. 3’ tag digital gene expression profiling of human brain and universal reference RNA using Illumina Genome Analyzer. BMC Genomics. 2009; 10(1):531.
de Lorgeril J, Zenagui R, Rosa RD, Piquemal D, Bachère E. Whole transcriptome profiling of successful immune response to vibrio infections in the oyster Crassostrea gigas by digital gene expression analysis. PLoS ONE. 2011; 6(8):23142.
Philippe N, Bou Samra E, Boureux A, Mancheron A, Ruffle F, Bai Q, et al. Combining DGE and RNA-sequencing data to identify new polyA+ non-coding transcripts in the human genome. Nucleic Acids Res. 2014; 42(5):2820–32.
Dong H, Ge X, Shen Y, Chen L, Kong Y, Zhang H, et al. Gene expression profile analysis of human hepatocellular carcinoma using SAGE and LongSAGE. BMC Med Genomics. 2009; 2(1):5.
Hayashi T, Asami M, Higuchi S, Shibata N, Agata K. Isolation of planarian X-ray-sensitive stem cells by fluorescence-activated cell sorting. Dev Growth Differ. 2006; 48(6):371–80.
Moritz S, Stöckle F, Ortmeier C, Schmitz H, Rodríguez-Esteban G, Key G, et al. Heterogeneity of planarian stem cells in the S/G2/M phase. Int J Dev Biol. 2012; 56(1-3):117–25.
Mukherjee K, Bürglin TR. Comprehensive analysis of animal TALE homeobox genes: new conserved motifs and cases of accelerated evolution. J Mol Evol. 2007; 65(2):137–53.
Bhattacharya A, Deng J, Zhang Z, Behringer R, de Crombrugghe B, Maity S. The B subunit of the CCAAT box binding transcription factor complex (CBF/NF-Y) is essential for early mouse development and cell proliferation. Cancer Res. 2003; 63(23):8167–72.
Zhang X, Hao L, Meng L, Liu M, Zhao L, Hu F, et al. Digital Gene Expression tag profiling analysis of the gene expression patterns regulating the early stage of mouse spermatogenesis. PLoS ONE. 2013; 8(3):58680.
Gowda M, Jantasuriyarat C, Dean RA, Wang GL. Robust-LongSAGE (RL-SAGE): a substantially improved LongSAGE method for gene discovery and transcriptome analysis. Plant Physiol. 2004; 134(3):890–7.
Galloni M. Global irradiation effects, stem cell genes and rare transcripts in the planarian transcriptome. Int J Dev Biol. 2012; 56(1-3):103–16.
Taft A, Vermeire J, Bernier J, Birkeland S, Cipriano M, Papa A, et al. Transcriptome analysis of Schistosoma mansoni larval development using serial analysis of gene expression (SAGE). Parasitology. 2009; 136(05):469.
Pelechano V, Steinmetz L. Gene regulation by antisense transcription. Nat Rev Genet. 2013; 14(12):880–93.
Boguski MS, Lowe TMJ, Tolstoshev CM. dbEST - database for “expressed sequence tags”. Nat Genet. 1993; 4(4):332–3.
Sánchez Alvarado A, Newmark PA, Robb SM, Juste R. The Schmidtea mediterranea database as a molecular resource for studying platyhelminthes, stem cells and regeneration. Development. 2002; 129(24):5659–65.
Zayas RM, Bold TD, Newmark PA. Spliced-leader trans-splicing in freshwater planarians. Mol Biol Evol. 2005; 22(10):2048–54.
Zayas RM, Hernández A, Habermann B, Wang Y, Stary JM, Newmark PA. The planarian Schmidtea mediterranea as a model for epigenetic germ cell specification: analysis of ESTs from the hermaphroditic strain. Proc Natl Acad Sci U S A. 2005; 102(51):18491–6.
The Schmidtea mediterranea Genome Sequencing Project. http://genome.wustl.edu/genomes/detail/schmidtea-mediterranea.
NCBI Genome Database: Schmidtea mediterranea. http://www.ncbi.nlm.nih.gov/genome/232.
Reverter A, McWilliam S, Barris W, Dalrymple B. A rapid method for computationally inferring transcriptome coverage and microarray sensitivity. Bioinformatics. 2004; 21(1):80–9.
Saha S, Sparks A, Rago C, Akmaev V, Wang C, Vogelstein B, et al. Using the transcriptome to annotate the genome. Nat Biotechnol. 2002; 20(5):508–12.
Chen J, Sun M, Lee S, Zhou G, Rowley J, Wang S. Identifying novel transcripts and novel genes in the human genome by using novel SAGE tags. Proc Natl Acad Sci U S A. 2002; 99(19):12257–62.
Pleasance E, Marra M, Jones S. Assessment of SAGE in transcript identification. Genome Res. 2003; 13(6):1203–15.
Keime C, Sémon M, Mouchiroud D, Duret L, Gandrillon O. Unexpected observations after mapping LongSAGE tags to the human genome. BMC Bioinf. 2007; 8(1):154.
Hene L, Sreenu V, Vuong M, Abidi S, Sutton J, Rowland-Jones S, et al. Deep analysis of cellular transcriptomes-LongSAGE versus classic MPSS. BMC Genomics. 2007; 8(1):333.
Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, et al. BLAST+: architecture and applications. BMC Bioinf. 2009; 10(1):421.
Smed, 454 Contig Browser. https://planarian.bio.ub.edu/datasets/454/planarian_menu.php.
Eisenhoffer G, Kang H, Sánchez Alvarado A. Molecular analysis of stem cells and their descendants during cell turnover and regeneration in the planarian Schmidtea mediterranea. Cell Stem Cell. 2008; 3(3):327–39.
Lapan S, Reddien P. Transcriptome analysis of the planarian eye identifies ovo as a specific regulator of eye regeneration. Cell Rep. 2012; 2(2):294–307.
González-Estévez C, Felix DA, Smith MD, Paps J, Morley SJ, James V, et al. SMG-1 and mTORC1 act antagonistically to regulate response to injury and growth in planarians. PLoS Genet. 2012; 8(3):1002619.
Wenemoser D, Lapan SW, Wilkinson AW, Bell GW, Reddien PW. A molecular wound response program associated with regeneration initiation in planarians. Genes Dev. 2012; 26(9):988–1002.
Wolff E, Dubois F. Sur la migration des cellules de regeneration chez les planaires. Rev Suisse Zool. 1948; 55:218–27.
Reddien P, Oviedo N, Jennings J, Jenkin J, Sánchez Alvarado A. SMEDWI-2 is a PIWI-Like protein that regulates planarian stem cells. Science. 2005; 310(5752):1327–30.
Guo T, Peters A, Newmark P. A Bruno-like gene is required for stem cell maintenance in planarians. Dev Cell. 2006; 11(2):159–69.
Kim JH, Yang CK, Heo K, Roeder RG, An W, Stallcup MR. CCAR1, a key regulator of mediator complex recruitment to nuclear receptor transcription complexes. Mol Cell. 2008; 31(4):510–9.
Ball EE, Rehm EJ, Goodman CS. Cloning of a grasshopper cDNA coding for a protein homologous to the A1, A2/B1 proteins of mammalian hnRNP. Nucleic Acids Res. 1991; 19(2):397.
Rigbolt KTG, Prokhorova TA, Akimov V, Henningsen J, Johansen PT, Kratchmarova I, et al. System-wide temporal characterization of the proteome and phosphoproteome of human embryonic stem cell differentiation. Sci Signal. 2011; 4(164):3.
Rossman T, Wang Z. Expression cloning for arsenite-resistance resulted in isolation of tumor-suppressor fau cdna: possible involvement of the ubiquitin system in arsenic carcinogenesis. Carcinogenesis. 1999; 20(2):311–6.
Kimura K, Cuvier O, Hirano T. Chromosome condensation by a human condensin complex in Xenopus egg extracts. J Biol Chem. 2001; 276(8):5417–20.
Allende ML, Amsterdam A, Becker T, Kawakami K, Gaiano N, Hopkins N. Insertional mutagenesis in zebrafish identifies two novel genes, pescadillo and dead eye, essential for embryonic development. Genes Dev. 1996; 10(24):3141–55.
Knight AS, Notaridou M, Watson RJ. A Lin-9 complex is recruited by B-Myb to activate transcription of G2/M genes in undifferentiated embryonal carcinoma cells. Oncogene. 2009; 28(15):1737–47.
Zhu SJ, Pearson BJ. The Retinoblastoma pathway regulates stem cell proliferation in freshwater planarians. Dev Biol. 2013; 373(2):442–52.
Nakamura Y, Tanaka H, Arakawa H, Yamaguchi T, Shiraishi K, Fukuda S, et al. A ribonucleotide reductase gene involved in a p53-dependent cell-cycle checkpoint for DNA damage. Nature. 2000; 404(6773):42–9.
Bertolino E, Reimund B, Wildt-Perinic D, Clerc RG. A novel homeobox protein which recognizes a TGT core and functionally interferes with a retinoid-responsive motif. J Biol Chem. 1995; 270(52):31178–88.
Felix DA, Aboobaker AA. The TALE class homeobox gene Smed-prep defines the anterior compartment for head regeneration. PLoS Genet. 2010; 6(4):1000915.
Chen CC, Wang IE, Reddien PW. pbx is required for pole and eye regeneration in planarians. Development. 2013; 140(4):719–29.
Blassberg RA, Felix DA, Tejada-Romero B, Aboobaker AA. PBX/extradenticle is required to re-establish axial structures and polarity during planarian regeneration. Development. 2013; 140(4):730–9.
Fraguas S, Barberán S, Cebrià F. EGFR signaling regulates cell proliferation, differentiation and morphogenesis during planarian regeneration and homeostasis. Dev Biol. 2011; 354(1):87–101.
Iglesias M, Almuedo-Castillo M, Aboobaker AA, Saló E. Early planarian brain regeneration is independent of blastema polarity mediated by the Wnt/ β-catenin pathway. Dev Biol. 2011; 358(1):68–78.
Petersen CP, Reddien PW. Polarized notum activation at wounds inhibits wnt function to promote planarian head regeneration. Science. 2011; 332(6031):852–5.
Gurley KA, Rink JC, Sánchez Alvarado A. Beta-catenin defines head versus tail identity during planarian regeneration and homeostasis. Science. 2008; 319(5861):323–7.
Petersen C, Reddien P. Smed-betacatenin-1 is required for anteroposterior blastema polarity in planarian regeneration. Science. 2008; 319(5861):327–30.
Oviedo NJ, Newmark PA, Sánchez Alvarado A. Allometric scaling and proportion regulation in the freshwater planarian Schmidtea mediterranea. Dev Dyn. 2003; 226(2):326–33.
Fraguas S, Barberán S, Iglesias M, Rodríguez-Esteban G, Cebrià F. egr-4, a target of EGFR signaling, is required for the formation of the brain primordia and head regeneration in planarians. Development. 2014; 141(9):1835–47.
Vogg MC, Owlarn S, Pérez Rico YA, Xie J, Suzuki Y, Gentile L, et al. Stem cell-dependent formation of a functional anterior regeneration pole in planarians requires Zic and Forkhead transcription factors. Dev Biol. 2014; 390(2):136–48.
Vásquez-Doorman C, Petersen CP. zic-1 expression in planarian neoblasts after injury controls anterior pole regeneration. PLoS Genet. 2014; 10(7):1004452.
Scimone ML, Lapan SW, Reddien PW. A forkhead transcription factor is wound-induced at the planarian midline and required for anterior pole regeneration. PLoS Genet. 2014; 10(1):1003999.
González-Sastre A, Molina MD, Saló E. Inhibitory Smads and bone morphogenetic protein (BMP) modulate anterior photoreceptor cell number during planarian eye regeneration. Int J Dev Biol. 2012; 56(1-2-3):155–63.
Dolfini D, Mantovani R. Targeting the Y/CCAAT box in cancer: YB-1 (YBX1) or NF-Y?Cell Death Differ. 2013; 20(5):676–85.
Maity S, Golumbek P, Karsenty G, de Crombrugghe B. Selective activation of transcription by a novel CCAAT binding factor. Science. 1988; 241(4865):582–5.
Hatamochi A, Golumbek P, Van Schaftingen E, de Crombrugghe B. A CCAAT DNA binding factor consisting of two different components that are both required for DNA binding. J Biol Chem. 1988; 263(12):5940–7.
Hooft van Huijsduijnen R, Bollekens J, Dom A, Benoist C, Mathis D. Properties of a CCAAT box-binding protein. Nucleic Acids Res. 1987; 15(18):7265–82.
Kim C, Sheffery M. Physical characterization of the purified CCAAT transcription factor, alpha-CP1. J Biol Chem. 1990; 265(22):13362–9.
Yoshioka Y, Ly L, Yamaguchi M. Transcription factor NF-Y is involved in differentiation of R7 photoreceptor cell in Drosophila. Biol Open. 2012; 1(1):19–29.
Ly L, Yoshida H, Yamaguchi M. Nuclear transcription factor y and its roles in cellular processes related to human disease. Am J Cancer Res. 2013; 3(4):339–46.
Wang Y, Stary J, Wilhelm J, Newmark P. A functional genomic screen in planarians identifies novel regulators of germ cell development. Genes Dev. 2010; 24(18):2081–92.
Zhu J, Zhang Y, Joe G, Pompetti R, Emerson S. NF-Ya activates multiple hematopoietic stem cell (HSC) regulatory genes and promotes HSC self-renewal. Proc Natl Acad Sci. 2005; 102(33):11728–33.
Benatti P, Dolfini D, Viganò A, Ravo M, Weisz A, Imbriano C. Specific inhibition of NF-Y subunits triggers different cell proliferation defects. Nucleic Acids Res. 2011; 39(13):5356–68.
van Wolfswinkel J, Wagner D, Reddien P. Single-cell analysis reveals functionally distinct classes within the planarian stem cell compartment. Cell Stem Cell. 2014; 15(3):326–39.
Unneberg P, Wennborg A, Larsson M. Transcript identification by analysis of short sequence tags-influence of tag length, restriction site and transcript database. Nucleic Acids Res. 2003; 31(8):2217–26.
Li B, Ruotti V, Stewart R, Thomson J, Dewey C. RNA-Seq gene expression estimation with read mapping uncertainty. Bioinformatics. 2009; 26(4):493–500.
Koehler R, Issac H, Cloonan N, Grimmond SM. The uniqueome: a mappability resource for short-tag sequencing. Bioinformatics. 2010; 27(2):272–4.
Sims D, Sudbery I, Ilott NE, Heger A, Ponting CP. Sequencing depth and coverage: key considerations in genomic analyses. Nat Rev Genet. 2014; 15(2):121–32.
Morey M, Yee S, Herman T, Nern A, Blanco E, Zipursky SL. Coordinate control of synaptic-layer specificity and rhodopsins in photoreceptor neurons. Nature. 2008; 456(7223):795–9.
Pearson BJ, Sánchez Alvarado A. A planarian p53 homolog regulates proliferation and self-renewal in adult stem cell lineages. Development. 2009; 137(2):213–21.
Oviedo N, Beane W. Regeneration: the origin of cancer or a possible cure?Semin Cell Dev Biol. 2009; 20(5):557–64.
Pearson B, Sánchez Alvarado A. Regeneration, stem cells, and the evolution of tumor suppression. Cold Spring Harb Symp Quant Biol. 2008; 73(0):565–72.
Piquemal D, Commes T, Manchon L, Lejeune M, Ferraz C, Pugnère D, et al. Transcriptome analysis of monocytic leukemia cell differentiation. Genomics. 2002; 80(3):361–71.
Barrett T, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M, et al. NCBI GEO: archive for functional genomics data sets-update. Nucleic Acids Res. 2013; 41(D1):991–5.
NCBI Gene Expression Omnibus. http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE51681.
The Perl Programming Language. https://www.perl.org.
Jiang H, Wong W. SeqMap: mapping massive amount of oligonucleotides to the genome. Bioinformatics. 2008; 24(20):2395–6.
SeqMap - A Tool for Mapping Millions of Short Sequences to the Genome. http://www-personal.umich.edu/~jianghui/seqmap.
Benson D, Clark K, Karsch-Mizrachi I, Lipman D, Ostell J, Sayers E. Genbank. Nucleic Acids Res. 2013; 42(D1):32–7.
Slater G, Birney E. Automated generation of heuristics for biological sequence comparison. BMC Bioinf. 2005; 6(1):31.
GBrowse, 2. http://gmod.org/wiki/GBrowse.
Dimmer E, Huntley R, Alam-Faruque Y, Sawford T, O’Donovan C, Martin M, et al. The UniProt-GO Annotation database in 2011. Nucleic Acids Res. 2011; 40(D1):565–70.
Finn RD, Bateman A, Clements J, Coggill P, Eberhardt RY, Eddy SR, et al. Pfam: the protein families database. Nucleic Acids Res. 2014; 42(D1):222–30.
Pfam Database of Protein Families. http://pfam.xfam.org.
HMMER: Biosequence Analysis Using Profile Hidden Markov Models. http://hmmer.org.
Reddien PW, Newmark PA, Sánchez Alvarado A. Gene nomenclature guidelines for the planarian Schmidtea mediterranea. Dev Dyn. 2008; 237(11):3099–101.
Gray KA, Daugherty LC, Gordon SM, Seal RL, Wright MW, Bruford EA. Genenames.org: the HGNC resources in 2013. Nucleic Acids Res. 2013; 41(D1):545–52.
Molina M, Saló E, Cebrià F. The BMP pathway is essential for re-specification and maintenance of the dorsoventral axis in regenerating and intact planarians. Dev Biol. 2007; 311(1):79–94.
Cebriá F, Guo T, Jopek J, Newmark P. Regeneration and maintenance of the planarian midline is regulated by a slit orthologue. Devel Biol. 2007; 307(2):394–406.
King R, Newmark P. In situ hybridization protocol for enhanced detection of gene expression in the planarian Schmidtea mediterranea. BMC Dev Biol. 2013; 13(1):8.
Cebrià F, Newmark P. Planarian homologs of netrin and netrin receptor are required for proper regeneration of the central nervous system and the maintenance of nervous system architecture. Development. 2005; 132(16):3691–703.
Cebrià F. Organization of the nervous system in the model planarian Schmidtea mediterranea: an immunocytochemical study. Neurosci Res. 2008; 61(4):375–84.
DGE libraries were produced by Skuldtech under supervision of David Piquemal. We thank Mireille Galloni (University of Montpellier 2 and CNRS) for sharing the DGE libraries of irradiated organisms before they were publicly available. Further thanks to Jaume Comas and Ricard Álvarez (Scientific and Technological Centers of the University of Barcelona) for their excellent technical assistance at the cytometry facilities. We are especially grateful to Luca Gentile and Sören Moritz (Max-Planck-Institut für molekulare Biomedizin) for the cell dissociation protocol and their inestimable advice. We also thank all the members of the Emili Saló and Francesc Cebrià labs for encouragement and discussion, and Enrique Blanco and Iain K. Patten for critical comments on the manuscript. We want to acknowledge the helpful comments made by the referees in order to improve the final manuscript version. This work was funded by grants BFU2008-01544 and BFU2011-22786 (MICINN), and 2009SGR-1018 and SGR2014-687 (AGAUR) to ES and BFU2009-09102 (MICINN) to JFA. GRE and JRL were supported by FPI (MICINN and MECC respectively), and AGS by FPU (MECD) fellowships. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of this manuscript.
The authors declare that they have no competing interests.
ES, GRE and JFA conceived the project. GRE prepared the cell fractions for the DGE sequencing, and did the screening of the selected genes. AGS and JIR carried out the experimental characterization of the Smed-meis-like and Smed-nf-Y genes respectively. GRE and JFA performed the computational analyses and set up the web material. GRE drafted the manuscript with contributions from all authors. All authors read and approved the final manuscript.
observed. The sensitivity of the cells in these populations to irradiation responds to their composition of neoblasts in different stages of the cell cycle and distinct levels of determination: X1, proliferating stem cells in S/G2/M, and X2, stem cell progeny and proliferating neoblasts in G0/G1. Neoblasts are the only proliferating cells in this organism.
Distribution of mapped and orphan tags by number of occurrences. A - Venn diagrams showing the tags overlap between the three cell populations, by occurrence (top), and by significative p-value, (p < 0.05, bottom). The number of mapping tags is detailed in italics. B - Frequency distribution of tags grouped by its number of occurrences, i.e., sequencing events, in all libraries. Tags detected in a low copy number are prone to be produced by sequencing errors—likely from more abundant tags. As can be appreciated, most of the tags with less than five occurrences do not map over any of the reference data sets, suggesting that those tags are less reliable , which is in agreement with the results of the randomization simulations (see the text and Additional file 3). Due to that, tags detected less than five times were discarded in further analysis.
Randomization simulations. Number of tags mapped over the randomized reference data sets, and vice versa, at different occurrences cutoffs. When compared with the theoretical number of matches expected by chance, this facilitates the assessment of the minimum number of counts for a tag to be considered reliable.
X1 and X2 in irradiated animals. Venn diagram showing the overlap between the results presented here and the DGE study conducted over irradiated planarians of the same clonal line by Galloni . The number of mapping tags out of the total is detailed in italics. It can easily be appreciated that most of the tags present in X1 and X2 are not detected by the irradiation approach.
Tags potentially mapping in the 3’-UTR regions. Y-axis represents the number of tags (tag counts) per nucleotide genomic position. The sequenced DGE tags were then classified in two groups: those mapping within the genomic region delimited by the transcript exons (green area), and those mapping outside (blue area). As position 0 depicts the last nucleotide for all the transcripts, we can only observe green marks upstream; blue marks can distribute across all the downstream region too. Background is defined by all those genomic CATG target sequences that do not match to any of the sequenced DGE tags (red areas). Dashed line depicts the average value for the downstream background tag counts.
Bar plots of the GO significant terms for different comparisons among X1, X2 and Xin annotation sets. Each panel presents a list of the significant functional annotations (p < 10-5, hypergeometric test), along with the corresponding GO code, that are over- or under-represented (computed as log-odds of the term abundance by sequence set) on each of the three ontology domains (Biological Processes, BP; Molecular Functions, MF; and Cellular Components, CC). Bar plots compare results obtained when considering the following four non-overlapping sets: X1-only (red bars), X2-only (green bars), the intersection between X1 and X2 not in Xin (orange), and Xin-only (blue bars). Bars color-filling is proportional to the p-value for the given GO code, thus darker colors corresponds to smaller p-values (all below the significant threshold anyway). A Venn diagram on top of each page represents the comparison made among the fraction sets.
Genes involved in stemness, regeneration or tissue homeostasis overexpressed in neoblasts and their progeny. DGE expression of clones reported in two experimental high-throughput screenings by Reddien  and Wenemoser  related to regeneration, stemness or tissue homeostasis identified as being overexpressed in neoblasts (p < 0.001).
Whole mount in situ hybridization of new neoblast genes. Expression by WISH of new neoblast genes in control (left panel) and irradiated planarians (right panel). 38 out of the 42 genes tested are presented here. The remaining four are characterized in Figures 6A and 7A. Time after irradiation in days is shown in the top right corner for each gene. As expected for neoblast genes, expression is reduced or disappears after irradiation.
RNA interference of new neoblast genes showing defects in regeneration. The stronger and most representative phenotype obtained after RNAi for those new neoblast genes producing aberrant regeneration after head and tail ablation. Days of regeneration and round of injection in superscript, and number of individuals affected with respect to the total are shown in the top right and bottom right corners of each panel. All pictures are dorsal except Smed-rbbp4-4, which illustrates the typical ventral curling of dying animals. The inhibition of most of the genes completely prevented the formation of the blastema. For those cases in which a small blastema was allowed to develop, a detail of the anterior part is shown to appreciate the defective head and eyes. For a regenerating control animal see Figures 6C and 7C.
Literature review of the new neoblast genes presented in this study. A description is provided for each one of the new neoblast genes proposed in this study (summarized in Table 4) based on the literature about their homologs in other species.
Double fluorescence in situ hybridization of Smed-h2b with Smed-meis-like . A - Double FISH of Smed-meis-like together with the neoblast marker Smed-h2b shows colocalization of both genes, demonstrating the expression of Smed-meis-like in neoblasts. Expression is also detected in differentiated cells. B - The pan-neural marker α-SYNAPSIN shows the different penetrance of phenotypes of Smed-meis-like(RNAi).
Double fluorescence in situ hybridization of Smed-h2b with Smed-nf-YA , Smed-nf-YB-2 , and Smed-nf-YC . A - Double FISH of Smed-nf-YA, Smed-nf-YB-2, and Smed-nf-YC shows colocalization of the NF-Y subunits with the neoblast marker Smed-h2b, corroborating the expression of this complex in neoblasts. Expression is also detected in differentiated cells. B - The pan-neural marker α-SYNAPSIN shows reduced cephalic ganglia of RNAi animals compared with gfp controls.
Splashplot projection of the X1/X2 versus Xin expression changes of upregulated contigs by cell population. A - Splashplot for overrepresented contigs in the three cell fractions X1, X2 and Xin according to our DGE data over the Smed454 transcriptome . B - Same representation using the data published by Labbé . Both plots show a similar composition, revealing a low number of transcripts overexpressed specifically in X2/progeny cells.
Pearson and Spearman correlations of the normalized expression levels among X1, X2 and Xin. Diagonal panels show violin plots with the distribution of the normalized expression levels for each of the three cell populations data sets. Panels on the upper diagonal summarize both Pearson (parametric) and Spearman (non-parametric) correlations, along with the p-values and the linear regression model estimates for the pairwise comparison between data sets. On the bottom diagonal panels, for each pair of cell fractions the scatterplots show differences in expression for each DGE tag. Blue dotted line is defined by the intercept and slope values for the linear regression model presented on the corresponding upper panel, confidence interval is drawn as a grey shadow along that regression line. Those tags having a normalized expression value of zero in one or both of the cell types, when considering each pair-wise comparisons, were removed before computing correlations and for the plots. One can notice that X1 and X2 are the more correlated pair of cell fractions, then X2 and Xin, and finally X1 and Xin. Those results match to what would be expected.
About this article
Cite this article
Rodríguez-Esteban, G., González-Sastre, A., Rojo-Laguna, J.I. et al. Digital gene expression approach over multiple RNA-Seq data sets to detect neoblast transcriptional changes in Schmidtea mediterranea . BMC Genomics 16, 361 (2015). https://doi.org/10.1186/s12864-015-1533-1
- Stem cell
- Transcription factor