The planarian regeneration transcriptome reveals a shared but temporally shifted regulatory program between opposing head and tail scenarios

Background Planarians can regenerate entire animals from a small fragment of the body. The regenerating fragment is able to create new tissues and remodel existing tissues to form a complete animal. Thus different fragments with very different starting components eventually converge on the same solution. In this study, we performed an extensive RNA-seq time-course on regenerating head and tail fragments to observe the differences and similarities of the transcriptional landscape between head and tail fragments during regeneration. Results We have consolidated existing transcriptomic data for S. mediterranea to generate a high confidence set of transcripts for use in genome wide expression studies. We performed a RNA-seq time-course on regenerating head and tail fragments from 0 hours to 3 days. We found that the transcriptome profiles of head and tail regeneration were very different at the start of regeneration; however, an unexpected convergence of transcriptional profiles occurred at 48 hours when head and tail fragments are still morphologically distinct. By comparing differentially expressed transcripts at various time-points, we revealed that this divergence/convergence pattern is caused by a shared regulatory program that runs early in heads and later in tails. Additionally, we also performed RNA-seq on smed-prep(RNAi) tail fragments which ultimately fail to regenerate anterior structures. We find the gene regulation program in response to smed-prep(RNAi) to display the opposite regulatory trend compared to the previously mentioned share regulatory program during regeneration. Using annotation data and comparative approaches, we also identified a set of approximately 4,800 triclad specific transcripts that were enriched amongst the genes displaying differential expression during the regeneration time-course. Conclusion The regeneration transcriptome of head and tail regeneration provides us with a rich resource for investigating the global expression changes that occurs during regeneration. We show that very different regenerative scenarios utilize a shared core regenerative program. Furthermore, our consolidated transcriptome and annotations allowed us to identity triclad specific transcripts that are enriched within this core regulatory program. Our data support the hypothesis that both conserved aspects of animal developmental programs and recent evolutionarily innovations work in concert to control regeneration. Electronic supplementary material The online version of this article (doi:10.1186/1471-2164-14-797) contains supplementary material, which is available to authorized users.


Background
Understanding how we might replace damaged and diseased tissue through the use of stem cell based therapies is an important goal for biomedical science. Despite natural occurrences that occur across the Animal Kingdom and work in a growing number of systems, regeneration is still poorly understood. Model systems are starting to shed light on the molecular processes that orchestrate various regenerative phenomena [1,2]. Among the various systems, planarians have a distinct advantage of being able to regenerate entire animals from small starting fragments [3][4][5][6][7][8]. To what extent this is due to novel mechanisms unique to planarians is unknown. The simple anatomy and highly accessible adult stem cell system of planarian flatworms make it a high value model system from which we can hope to form valuable paradigms for how regeneration is controlled.
Regenerative potential in these animals is dependent on a population of cycling pluripotent adult stem cells present throughout the parenchyma, except in the area in front of the photoreceptors and the region of the pharynx [3][4][5][6][7]. On wounding or amputation these adult stem cells undergo two characteristic peaks of cell division that produce stem cell progeny and can subsequently differentiate to replace missing or damaged tissue [9]. The amenability of planarians, and in particular the planarian Schmidtea mediterranea to functional genomic approaches in combination with advances in sequencing technology has produced several gene expression and transcriptomic studies on head regeneration [10], wounding [9], and neoblast dynamics [11][12][13][14].
Planarians are able to reconstitute the full adult body plan from very different starting scenarios. For example tail stump pieces will regenerate a new head, head pieces a new tail and trunk fragments both a head and a tail. In addition to all these potentially different regenerative scenarios, the planarian must also rescale and remodel the morphology of pre-existing tissues and organs to fit the size of the animals and to ensure sufficient functional integration [15]. Thus from very different beginnings all fragments converge to the same end point. This means that any starting fragment contains the information required to reconstitute the whole adult body. While we already know a little about some of the key events in this process, for example the signalling pathways that are required to ensure the correct polarity along the different axes of regenerating pieces, we still lack an understanding of how these events fit together globally [7,8,[16][17][18][19].
In this study we have amalgamated existing transcriptomes [10,13,14,20,21] to provide an improved resource as a service to the research community. This exercise includes re-annotation of transcripts, removal of likely chimeric transcripts and characterisation of transcripts that code for proteins novel to the phylum Platyhelminthes and/or the intensely studied Triclad group of planarians.
Using this new meta-transcriptome assembly we looked to investigate the potential for genome wide expression analysis for understanding differences and similarities in the regulatory program that underpins different regenerative scenarios. We investigated head and tail regeneration of the planarian Schmidtea mediterranea from 0 to 72 hours after amputation. Using this wealth of data, we were able to describe patterns of gene expression levels during regeneration across the whole transcriptome and perform comparisons between regeneration time-points and scenarios. This allowed us to describe the transcriptional changes that reflect key regulatory transitions during the regenerative process.
We found that head regeneration (head fragment regenerating tail) and tail regeneration (tail fragment regenerating head) transcriptomes initially reflect the differences in cellular content at the beginning of regeneration. They then diverge further over the first 12 hours of regeneration. However, we observed an unexpected convergence of expression profiles by 48 hours of regeneration between these two contrasting scenarios. This divergent/convergent pattern was underpinned by a core battery of more than 5,000 genes that are regulated in the same manner during between 6-12 hours of head regeneration and 36-48 hours of tail regeneration.
Both to internally validate our data at the genome wide level and to further define those genes that can be associated specifically with anterior regeneration we performed RNA-seq in the background of the well characterised Smed-prep(RNAi) phenotype, which specifically results in the loss of anterior structures [22]. This allowed the identification of putative direct and indirect targets of Smed-prep which were enriched among those genes we found to be involved in the shared regulatory transition.
From blast annotations against selected species, we were able to define lists of genes that are potentially unique to S. mediterranea, unique within the tricladida [23], and unique to the phylum Platyhelminthes. We found that differentially expressed transcripts during regeneration were enriched for triclad specific transcripts. These transcripts are potentially involved in novel mechanisms underpinning potent regenerative capacity, and suggest that as has been recently suggested in urodeles, some important aspects of regeneration may be lineage specific [24,25]. Our data provide new insight into the regulatory logic of regeneration, and act as a reference point against which to advance our understanding of the regulatory control of regeneration.

Consolidation of the available transcriptome data
There are currently five independently assembled S mediterranea transcriptomes with sufficient read depth and coverage to have aspirations to providing whole transcriptome coverage [10,13,14,20,21]. It is unclear in the literature to what extent, if at all, later assemblies have used data from earlier sources. We decided against a reassembly from the raw sequencing data that went into the 5 independent assemblies due to the varying error profiles of the raw data from different library preparation and sequencing chemistry. In addition, not all data was readily available and/or described fully in the repositories. We gathered the available transcriptome data from the 5 relevant publications (we refer to transcriptome datasets by the group leader's last name) along with the available EST datasets and performed a consolidation of the transcripts. We chose to include only 5 of the 6 available transcriptomes because the dataset provided by Abril et al. contained a high number of contigs (~192,000) suggesting a highly fragmented assembly [26]. We did not want to introduce more variability into the consolidation.
The consolidation process seeks to retain a high confidence set of transcripts, resolve transcript fusion events, and retain transcripts with the longest open reading frames. We first clustered transcripts from all 6 datasets by sequence similarity using CAP3 assembler [27]. Each assembled contig can be represented as a cluster with contributing transcripts from one of the 6 data sources. We kept only clusters that had transcripts from at least 2 different sources to ensure a high confidence set of transcripts resulting in 23,802 clusters. We then removed potential fusion transcripts from each cluster by analysing the position of top blast hits along the cluster length to complete proteome sets. Removal of potential fusion transcripts split 441 clusters into 1,014 clusters. To retain the sequence with the longest ORF for each cluster, we took the transcript or CAP3 contig with the longest ORF in each cluster.
To make sure we are including known S. mediterranea transcripts that might not be in the 5 transcriptomes due to low expression, we blasted the consolidated transcripts to 179 known S. mediterranea mRNA sequences. 16 transcripts were not found in the consolidated transcriptome including genes with very low expression levels or very localised expression patterns (e.g. Smed-wnt-1, Smed-noggin-like 4, Smed-noggin-like 6) and many neuropeptide pro-hormones with restricted expression patterns [16,28,29]. These data are indicative that genes with expression restricted to small populations of cells may well have escaped current sequencing efforts.
A detailed description of the process is available as a Additional files 1, 2, 3, 4, 5, 6, 7, 8 containing all relevant scripts and details. The consolidated transcriptome contains 23,545 transcripts of which, 2,859 are unmodified transcripts taken from one of the transcriptome sources and the remaining 20,686 transcripts are assembled CAP3 contigs. A comparison of average transcript length and N50 length to the 5 transcriptomes shows a substantial length increase with respect to both transcript length and open reading frame (ORF) length ( Figure 1A and B). We  assessed transcriptome coverage by blasting each transcriptome to the Core Eukaryotic Gene Mapping Approach (CEGMA) database [30] which contains a core set of genes found in a wide range of eukaryotic organisms. The consolidated transcriptome had hits to 449 CEGMA genes out of a possible 458 and a significantly larger ortholog hit ratio of 0.93 compared to the 5 transcriptomes (Table 1).
Overall our analyses show that a simple consolidation of the extant published data provides an improved high confidence S. mediterranea transcriptome with respect to representation, total length and coding potential. This will be of significance for future genome wide expression analyses exploiting this important model system.

A gene expression time-course of anterior and posterior regeneration
Planarian regeneration is able to confidently restore whole individuals, with all organs scaled to the correct size from any starting piece [15]. Thus from very different beginnings, the same end result is obtained. In the first instance we wished to understand how this process is reflected in whole transcriptome gene expression changes. One would expect early expression profiles to be very different depending on the cell and tissue contents, for example brain and neural tissues in the head versus gut tissues in the tail. The expression profiles should eventually converge as the missing tissues are regenerated.
Our goal was to describe these trajectories from different starting scenarios to obtain an overview of which genes are differentially expressed during the first 72 hours of regeneration. We hypothesised that differentially expressed genes would represent the unique regulatory solutions each regenerative scenario uses to arrive at the reconstitution of a whole animal. To this end we performed transcriptome sequencing on regenerating head and tail fragments at 0, 6, 12, 24, 36, 48 and 72 hours after amputation.
In total, 514,384,160 reads were mapped to the transcriptome across replicate samples at each regeneration time-point in each of the two scenarios. On average, the correlation between replicate samples was 0.99. While having 2 replicates for each time-point/fragment is not ideal for modelling the variance encountered in RNA-seq, our sample preparation of including multiple individuals (20 fragments in each library) does offer some variance stabilization through biological averaging. Having more replicates would add more resolution to our individual transcript expression profiles, but for the purpose of observing global trends in expression, we resorted to statistical optimizations of filtering our dataset more stringently and setting a higher adjusted p-value threshold.
We filtered our raw count data for transcripts that had less than 20 reads mapping in all libraries, leaving us with 15,423 transcripts for differential expression analysis. We performed differential expression analysis with EdgeR [31] on pairs of consecutive time-points (0-6 hours, 6-12 hours, 12-24 hours…etc) and also on pairs of fragments at the same time-point (0 hour head vs 0 hour tail, 6 hour head vs 6 hour tail…etc). The significance threshold for differential expression was set at p-value of 0.01 (Figure 2A-D).
The largest number of differentially expressed transcripts during head and tail regeneration are 3,228 transcripts down-regulated from 6-12 hours in heads and 5,646 transcripts down-regulated from 36-48 hours in tails ( Figure 2A and B). We will refer to lists of differentially expressed transcripts by their abbreviated fragment type and time-period. For example, transcripts downregulated from 6 to 12 hours in head fragments will be abbreviated as H6-12-down. We also observed that between fragments there is considerable increase in differentially up-regulated transcripts in both head and tail fragments between 12 and 36 hours. ( Figure 2C). This increase hints at a divergence of expression profiles between head and tail regeneration starting at 12 hours and ending after 36 hours. This divergence may be representative of a differential program utilized by head and tail fragments. At higher fold-changes, there are also more transcripts up-regulated in tail fragments compared to head fragments between 12 and 36 hours suggesting that tail fragments undergoes a more drastic expression regulation during the divergence than head fragments. A possible reason for this may be that tail fragments need to regenerate a brain which contains a rich population of genes and isoforms.
Overall our data set describes the transcriptome of head and tail fragments during the first 72 hours of regeneration. This time-course data reflects the processes of regenerating new tissues in addition to remodelling existing tissue since whole fragments were used instead of just the regenerating blastema. This dataset presents a valuable resource for data mining the transcriptional behaviour of planarians genes during regeneration. Head and tail fragment enriched transcripts implicates genes involved in early anterior and posterior regeneration As a simple validation of our expression data, we looked for head and tail enriched transcripts by comparing head and tail fragments at 0 hours, immediately after amputation. We were able to find many known anterior markers (arrestin [32], opsin [33], tyrosinase [34], eyes absent [35], wnt2-1 [16], otxA [36], six1 [35], prep [22]) and posterior markers (wnt11-1, wnt11-2 [37], AbdBa [38], hox-D [36], axin [39]) enriched in head and tail piecess respectively (Tables 2 and 3). We also find many membrane voltage gated channels and neurotransmitter receptors involved in nervous system function, metalloproteinases, and several homeobox genes (NK-1, cutlike, lim, orthopedia, vsx-1) in head enriched transcripts. We will refer to head and tail enriched transcripts at the beginning of regeneration as F-head and F-tail (Table 4). We expected head and tail enriched transcripts to be representative of the existing anterior/posterior tissues. F-head and F-tail should be consistently up-regulated in their respective fragments until remodelling of the existing tissues occurs. We looked at the expression profile of these head and tail enriched transcripts during regeneration and found F-head transcripts to be consistently up-regulated in heads compared to tail fragments at the same time-points suggesting remodelling does not occur within the first 72 hours of head regeneration. In contrast, for tail enriched transcripts, there is a down-regulation in tail fragments at 48 hours ( Figure 3) suggesting remodelling is taking place at this time.
There is also a slight up-regulation of head enriched transcripts during early tail regeneration that might reflect the development of the early brain structure shown previously to occur in anterior facing wounds [39,47]. We found 277 transcripts out of a total 1,193 head enriched transcripts are up-regulated in tail fragments at 24 hours compared to 0 hours. Among these are several S. mediterranea transcripts involved in anterior development (smed-prep [22], smed-ndk [48], smed-six [34]) and transcripts known to be expressed anteriorly (smed-wnt2-1 [37], smed-sfrp [41]) ( Table 5). This group of 277 genes could be important in the early regeneration of anterior stuctures. Interestingly there are also several metalloproteinases (MMP) which have been implicated in neural tissue remodelling and cell migration in this list. However, early brain structures have previously been shown to develop in the blastema, not existing tissues and we also do not observe remodelling until 48 hours suggesting the function of MMPs at the stage is perhaps confined only to the blastema tissues or a small portion of the existing tissue.   [46] A Shared regulatory program between head and tail regeneration We performed a hierarchical clustering of expression profiles using correlation distance on all libraries to observe when tail and head regeneration diverges (Figure 4). The clustering generated two separate groups of expression profiles (red, yellow) from 12 hours in heads and 6 hours in tails indicative of an early divergence in expression profiles. Surprisingly, we observed a subsequent convergence of expression profiles at 48 hours indicated by the blue cluster, grouping heads and tail fragments at 48 and 72 hours. This convergence suggest both head and tail fragments reach a similar regenerative state at the transcriptome level as early as 48 hours. At this time point tails will have just formed cephalic ganglia tissue and started to form the beginnings of the photoreceptors [34,47]. Nonetheless, we found this convergence surprising, as head and tail fragments are still morphologically distinct at this early stage. Together this suggested to us that much of the similarity of expression levels might represent underlying gene regulatory and cellular behaviours rather than the formation of equivalent tissues.
To investigate the convergence further we looked for shared batteries of genes between differentially regulated genes at each time point ( Figure 5). The most significant overlapping regulatory programs with a hypergeometric p-value of effectively 0 were observed for 2 differentially expressed lists: H6-12-down and T36-48-down ( Figure 6A), H6-12-up and T36-48-up ( Figure 6C). We conclude that the H6-12 regulatory transition is primarily responsible for the early divergence of expression profiles between head and tail fragments and the subsequent convergence at 48 hours is caused by tails going through a very similar regulatory transition during T36-48 ( Figure 6B and D). We will refer to the shared regulatory program between H6-12 and T36-48 as O and we will refer the shared up/down-regulation program as O-up and O-down.
O-up and O-down both represent approximately one third of transcripts assessed for differential expression. We found 69 potential transcription factors out of a total 467 potential transcription factors in our data set were present in O-down and 29 in O-up. Known S. mediterranea transcript factors found in O-down include smed-prep, smed-dlx, smed-gata, smed-prox1, and smedsix. In O-up, we found smed-junli, smed-tcf15, smed-e2f-like.
There are more transcripts up/down-regulated in T36-48 compared to H6-12. 91% of the transcripts in H6-12-down overlap with 52% of transcripts in T36-48down and 81.7% of H6-12-up transcripts overlap with 49% of T36-48-up transcripts meaning there are 2,696 and 2,677 transcripts that are exclusively regulated (not shared with H6-12) in T36-48-down and T36-48-up. While these transcripts were not found to be differentially expressed at H6-12, they still conform to the convergence of expression profile at 48 hours (Figure 7). Instead of being sharply regulated during H6-12, these transcripts gradually up/down-regulate during head regeneration to eventually match tail expression level at 48 hours.
Together our analysis reveals a previously unknown shared regulatory program between two very different regenerative scenarios. This program includes a large proportion of the genome and runs at different times and scenarios. This gives us for the first time insight into how whole body regeneration is regulated and suggests H6-12-down -Transcripts down-regulated from 6 to 12 hours in regenerating head fragments H6-12-up -Transcripts up-regulated from 6-12 hours in regenerating head fragments.
T Up and down-regulated transcripts in tail fragments. Time-period is indicated after the abbreviation. that different regenerative scenarios may use a core regenerative program that is activated after scenario specific events have occurred. Future work investigating an even wider set of regenerative scenarios will test this model, but the use of shared program between the opposite scenarios investigates here is strongly suggestive this is the case. This program is likely to represent key shared events, such as elaboration of axial fates, replacement of major tissues such as the gut [49], excretory system and nervous system and re-establishment of the stem cell and stem cell progeny populations [50], and remodelling of existing tissues to their correct proportions.

Smed-prep(RNAi) disrupts the expression of transcripts found in the two major regulatory events during regeneration
Smed-prep is a TALE class homeodomain gene that has been found to be required for anterior fate and patterning during regeneration [22]. Upon RNAi knock-down of smed-prep, regenerating tail fragments fail to develop a discernible anterior compartment. In order to provide a genome wide validation of our time-course dataset and to investigate possible down-stream genes regulated directly or indirectly by smed-prep, we performed RNA-seq on tail fragments of smed-prep(RNAi) animals 24 hours after amputation along with GFP dsRNA injected controls at the same time-point. We will refer to this 24 hour tail fragment comparison between gfp and smed-prep(RNAi) animals as P. Transcripts down and up regulated in response to smed-prep(RNAi) will be referred to as P-down and P-up. We generated two lists of differentially expressed transcripts for P-down and P-up at a fold-change of 2 or more. There are 1,236 transcripts in P-up and 591 in P-down. The larger number of transcripts up-regulated in response to smed-prep(RNAi) versus down-regulation  suggest a direct or indirect transcriptional repressive role of smed-prep. As a validation of our smed-prep(RNAi) data, we looked at whether F-head transcripts are affected by smed-prep(RNAi). We found 51 F-head transcripts in Pdown and 47 in P-up. While there are some voltage gated channels and neurotransmitter transcripts in these lists, no known S. mediterranea transcripts were found and only two transcription factors were found in both P-down and F-head (an achaete-scute homolog and zinc finger protein). This result was to be expected as F-head transcripts are not up-regulated in tail fragments during the regeneration time-course. There was also no significant enrichment of the F-head transcripts that were up-regulated in tail regeneration at 24 hours in either P-down or P-up. This suggests that smed-prep is not involved in early brain development, in agreement with previous work that has investigated early brain regeneration [47].
We looked at the expression profiles of P during regeneration and found that 330 P-down transcripts are in T36-48-up and 878 P-up transcripts are in T36-48-down ( Figure 8) with significant enrichment (5.5e-41 and 3.4e-185). While P-up overlapped significantly with O-down, P-down overlapped with transcripts up-regulated in T36-48 exclusively (and not in H6-12-up) (Figure 8). The P-up data is showing that smed-prep is possibly playing a direct or indirect repressive role which facilitates O-down. Since the phenotype of smed-prep(RNAi) is loss of the anterior structures, we can reasonably assume that some of P-down is involved in anterior development. The observation that P-down seem to only  Tail 00h Tail 06h  Tail 00h Tail 06h  Tail 06h Tail 12h  Tail 06h Tail 12h  Tail 12h Tail 24h  Tail 12h Tail 24h  Tail 24h Tail 36h  Tail 24h Tail 36h  Tail 36h Tail 48h  Tail 36h Tail 48h  Tail 48h Tail 72h  Tail 48h Tail Figure 5 Overlapping transcripts between sets of differentially expressed transcripts. A heatmap displaying the number of overlapping transcripts between various differential expression lists. Differential expression is defined here as p-value < 0.01 and fold-change > 2. Both upper left and lower right triangles display the same information, but in different metrics. The upper left triangle (red to blue gradient) shows the percentage overlaps between two differential expression lists. Percentage overlap is calculated as the average of the two overlap percentages generated by dividing number of share transcripts to each of the two differential expression list totals. The lower right triangle (yellow to black) shows the absolute numbers of overlaps between two differential expression lists.
relate to T36-48-up exclusively suggest T36-48 is when smed-prep is up-regulating these anterior developmental transcripts; whereas it was not necessary to do so in head fragments since anterior structures already exists. The expression profile of P-up and P-down during regeneration allowed us to observe the differential role of smed-prep in both head and tail fragments. In head fragments, smed-prep plays a repressive role during the early major regulatory transition in heads. In tail fragments, in addition to also playing the same repressive role during the later major regulatory transition, it activates the anterior structure regeneration program.
We categorized the transcripts based on species hits to generate lists of transcripts that were potentially S. mediterranea specific, triclad specific, and platyhelminth specific (Figure 9). We performed this analysis on increasing e-value strictness and found 2,932 potentially S. mediterranea specific transcripts, 4,825 triclad specific transcripts, and 4,949 platyhelminthes specific transcripts at the highest strictness level (Table 6).
In addition to blast annotations, we also performed protein domain predictions on the transcriptome using models from the PFAM database [53] resulting in 13,217 transcripts with domain annotations. Using this domain information, we were able to look at the composition of domains within the strict platyhelminth and triclad specific lists of transcripts.
Within the platyhelminthes specific transcripts, 289 transcripts had pfam annotations. The low number of domain annotations probably reflects the heavy bias of known protein domains towards nematode, insect and vertebrate systems. We found that the most abundantly represented domain in platyhelminth specific list was the 7 transmembrane receptor domain (7tm) of the rhodopsin family (PF00001) found in 13 transcripts. This agrees with a previous study which catalogued the repertoire of G protein-coupled receptors (GPCR) in S. mediterranea and found a large expansion of platyhelminth specific GPCR of the rhodopsin subfamily [54].
Within the triclad specific transcripts, 262 transcripts had pfam annotations with the ubiquitin domain (PF00240) being most represented in 12 transcripts. The ubiquitin protease system (UPS) is the main cellular proteolytic mechanism that uses ubiquitin to tag and target proteins for degradation. Several studies have implicated UPS in drosophila development [55] and regeneration in various systems [56][57][58] making UPS a potential target for further research.
We also looked at the expression profile of triclad specific transcripts during regeneration and found there was significant enrichment in differentially expressed transcripts (p-value < 0.01 and fold-change > 2) during both head and tail regeneration (Figure 10, Tables 7 and 8).
Together our data identify a large set of potentially novel and/or rapidly evolving genes that are clearly differentially expressed during regeneration. Previous studies in planarians and other regenerative models have highlighted a role of conserved genes, known from studies of development, as key regulators of regeneration. More recent genome wide studies using transcriptomics have revealed that lineage specific genes may also be important and require study [24,25]. Our data also support that this may be the case during planarian regeneration. In particular we uncover an enrichment of novel genes during a regulatory transition that is shared between different regenerative scenarios. Our data suggest that the potent regenerative capacity in planarians may be partly due to novel mechanisms conserved within the highly regenerative triclad clade and pave the way for functional study of these genes.

Conclusion
In this study we have generated a consolidated transcriptome from 5 independently assembled transcriptomes and available ESTs. This consolidated dataset represents a high confidence set of transcripts providing a valuable resource for future expression studies. Our regeneration transcriptome consisting of regenerating head and tail fragments from 0 to 3 days reveals a shared regulatory program consisting of over 5,000 transcripts active at temporally shifted time-periods between regenerating head (6-12 hours) and in tail fragments (36-48 hours). Additional RNA-seq experiments on smed-prep(RNAi) animals versus control tail fragments allowed us to find transcripts that are regulated differentially in response to smed-prep. We observed that these smed-prep response transcripts are enriched during the shared regulatory program during regeneration suggesting an involvement in brain regeneration. We also performed BLAST alignment to 15 species across eukaryotes to identify novel or divergent genes and found lists of S. mediterranea, triclad, and platyhelminth specific transcripts. Triclad specific transcripts are found to be enriched in differentially expressed transcripts throughout regeneration suggesting novel mechanisms may contribute to the animal's potent regenerative capacity.

Animal culture
A clonal line of the asexual strain of Schmidtea mediterranea, AAANOTBIOL01, was used for all experiments. Animals were reared at 20°C in tap water filtered through activated charcoal and buffered with 0.5 ml/L 1 M NaHCO3. Planarians were fed veal liver and starved for at least one week prior to experiments or amputation. All worms used were 7-8 mm in length. The animals used in these experiments do not require approval from the ethical committee.

Smed-prep(RNAi) for RNAseq
RNAi for Smed-prep was perfomed by two rounds of injection as previously described [22].

Preparation of RNA form a regenerating timecourse
Regenerating head and tail fragments from 20 worms at each timepoint were collected and snap frozen at 6, 12, 24, 36, 48, and 72 hours of anterior and posterior regeneration in two replicate samples for total RNA preparation (Trizol). RNA was also prepared from Smed-prep (RNAi) regenerating tails at 24 hours. Total RNA was also prepared and pooled from a regenerative time course of a sexual strain of G. tigrina. 454 sequencing and assembly was performed from G. tigrina total RNA as previously described [20].

Transcriptome consolidation
Detailed information on the transcriptome consolidation process is included in the Additional files 1, 2, 3, 4, 5, 6, 7, 8. tricladida, and S. mediterranea Figure 9 Transcripts specific to platyhelminth, tricladida, and S. mediterranea. 15 proteomes/transcriptomes across the animal kingdom were used to blast against the consolidated transcriptome. Several different e-value thresholds were used to generate lists of platyhelminth, tricladida, and S. mediterranea specific transcripts. Triclad specific No hit to non-triclads at e-value < 1e-5 and hit to only triclads at e-value < 1e-5:

Platyhelminthes specific
No hit to non-platyelminth at e-value < 1e-5 and hit to only platyelminthes at e-value < 1e-5: 6,912 No hit to non-platyelminth at e-value < 1e-5 and hit to only platyelminthes at e-value < 1e-10: 5,594 No hit to non-platyelminth at e-value < 1e-5 and hit to only platyelminthes at e-value < 1e-15: 4,949 Homo sapiens, Mus musculus, Schistosoma mansoni, Clonorchis sinensis proteomes and TBLASTX was performed against Girardia tigrina transcriptome. A constant database size of total base pairs of all sequences was used to ensure comparable e-values. PFAM annotations was performed with HMMScan using the PFAM-A database. The gathering cut-off threshold was used for HMMScan. Transcription factors were identified using manually curated PFAM domain profiles from DBD: Transcription factor prediction database.

Read mapping and differential expression analysis
Reads were mapped with the ABI LifeScope software's single fragment mapping module. Uniquely mapped reads were counted for each transcript on both strands with HTSeq-count [59] using a mapping quality filter of 30. Outliers were filtered out by removing transcripts with tag counts that were more than 1% of the total library in at least 3 sequenced libraries. Transcripts with less than 20 reads in all libraries were also removed. Normalized counts and differential analysis was performed with EdgeR [31]. The generalized linear model function of edgeR was used across all the libraries. Read mapping and tag counting for smed-prep (RNAi) RNAseq samples was performed the same way as the regeneration time-course. Outliers were also removed based on more than 1% of total reads in at least 2 libraries. Transcripts with less than 50 reads were removed from all libraries. Since there were no replicates for these samples, we did not use edgeR to calculate differential expression as EdgeR requires replicates to effectively model the dispersion among libraries of the same conditions. We instead relied on a high filter of 50 reads for lowly expressed transcripts and a foldchange threshold of at least 2 for assessing differential expression.