Genome size measurement
The genome sizes of five naid species were estimated using the Feulgen image analysis densitometry method. Individuals were obtained from laboratory cultures of the following species: P. leidyi (Carolina Biological Supply Company), Allonais paraguayensis (Wards Natural Science), Dero digitata (originally collected from Edwards Lake, University of Maryland at College Park, USA), Dero furcata (Connecticut Valley Biological Supply), and Paranais litoralis (originally collected from Herrington Bay, Deale, MD, USA). Fifty or more nuclei were measured from each sample. The integrated optical density (IOD) of the sample was converted to a genome size value (in picograms) using Gallus gallus domesticus (1.25 pg) as a standard.
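The conversion from IOD to picograms is not spelled out in the text. A minimal sketch, assuming the usual proportional scaling against the standard (mean sample IOD divided by mean standard IOD, multiplied by the standard's 1.25 pg), could look like the following. The function name and all IOD readings are hypothetical and serve only as illustration.

```python
from statistics import mean

# Genome size of the standard named in the text (Gallus gallus domesticus).
CHICKEN_STANDARD_PG = 1.25


def genome_size_pg(sample_iods, standard_iods, standard_pg=CHICKEN_STANDARD_PG):
    """Scale the mean sample IOD by the mean standard IOD.

    Assumption: stain uptake is proportional to DNA content, so the ratio of
    mean IODs equals the ratio of genome sizes.
    """
    if not sample_iods or not standard_iods:
        raise ValueError("at least one nucleus is needed for sample and standard")
    return mean(sample_iods) / mean(standard_iods) * standard_pg


# Hypothetical IOD readings in arbitrary units (not data from the study).
sample = [40.0, 42.0, 38.0]
standard = [100.0, 98.0, 102.0]
print(round(genome_size_pg(sample, standard), 3))
```

In practice each list would hold the fifty or more nuclei measured per sample.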
Worm culture, sampling, and RNA extraction
To generate material for this sequencing effort, we established twelve replicate lab cultures of a single clonal line of Pristina leidyi (PRIle(cbs)cloneA). Each culture was initially started with 100 worms and was maintained at room temperature in 20 cm glass bowls filled with ~1 liter of commercially purchased Poland Spring Water (PSW). To ensure purity of the samples, worm cultures were rinsed frequently to remove algae and debris and cultures were routinely inspected visually for the presence of small metazoans (e.g. rotifers). Worms were fed dried Spirulina powder twice weekly, and water was changed at least 1-2 times per week.
Possible contamination by the dried Spirulina food source was assayed via PCR, using cDNA samples derived from live Arthrospira platensis (Spirulina) as a reference. RNA was extracted using TRIReagent (Applied Biosystems), and cDNA was constructed using random oligos and Superscript III reverse transcriptase (Invitrogen). Primers were constructed against the large subunit of rubisco (rbcL) [GenBank:AY147205.1] and c-phycocyanin (cpc) [GenBank:AF164139.1] genes of A. platensis (Additional file 5).
Worms were collected at a range of stages of regeneration and fission (Figure 1). For the fission material, 1,000 worms that were actively growing and fissioning were collected and starved for 24 hours. To generate regenerating material, 3,485 worms were amputated anteriorly and posteriorly and allowed to regenerate for various lengths of time before collection. Most worms were actively fissioning and consisted of chains of linked zooids at time of amputation. A cut was made 2 body segments anterior to the most anterior fission zone to elicit posterior regeneration. A second cut was made after the 6th body segment of the most posterior zooid to elicit anterior regeneration. If a worm did not consist of at least two nearly formed zooids, a single cut after the 6th body segment was made to elicit anterior regeneration. Because the initiation of regeneration processes holds particular significance for future studies, 1,985 of the regenerating worms in the sample were allowed to regenerate between 0 and 24 hours, which is roughly coincident with the start of blastema formation. Batches of 250 worms were also collected at 1.25 days post-amputation (dpa), 1.75 dpa, 2 dpa, 2.5 dpa, 3 dpa, and 3.5 dpa, when differentiation of adult morphology is nearly complete.
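As a quick consistency check on the sampling design, the worm counts given above can be tallied directly. The variable names are illustrative; the numbers are taken from the text.

```python
# Worms collected between 0 and 24 hours after amputation.
EARLY_WORMS = 1985

# Later collection points in days post-amputation (dpa), one batch each.
LATE_TIMEPOINTS_DPA = [1.25, 1.75, 2, 2.5, 3, 3.5]
BATCH_SIZE = 250

# Worms collected for the fission material.
FISSION_WORMS = 1000

late_worms = BATCH_SIZE * len(LATE_TIMEPOINTS_DPA)
regenerating_total = EARLY_WORMS + late_worms
pooled_total = regenerating_total + FISSION_WORMS

print(late_worms)          # 1500
print(regenerating_total)  # 3485
print(pooled_total)        # 4485
```

The regenerating total matches the 3,485 amputated worms stated above, so roughly 57% of the regenerating material comes from the first 24 hours.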
Fissioning and regenerating worms were washed 5x in PSW prior to RNA extraction. RNA was extracted using TRIReagent (Applied Biosystems), and RNA from all samples was then pooled together.
cDNA library construction
We constructed a pooled cDNA library consisting of a normalized fraction to capture lowly expressed transcripts and a non-normalized fraction to capture large transcripts that might be lost during the PCR amplification steps of the normalization process.
First-strand cDNA (F.S. cDNA) was made using a MINT full-length cDNA synthesis kit (Evrogen) following the manufacturer’s instructions. A modified oligo-dT primer with breaks in the homopolymer-T run was used to minimize the negative effects of an extensive homopolymer run on 454 sequence quality (Additional file 5). A portion of the F.S. cDNA was incubated for 2 hours at 15°C with NEB Buffer 2, DNA Polymerase I (New England Biolabs), and RNase H in order to make full-length double-strand cDNA. A fraction of F.S. cDNA was then normalized with Evrogen’s duplex-specific nuclease (DSN). F.S. cDNA-RNA duplexes in hybridization buffer were denatured at 98°C for 3 minutes and then allowed to hybridize at 70°C for 5 hours. Preheated DSN at 1/4x concentration was then added and incubated for 20 minutes at 70°C. DSN stop solution was then added, and the sample was incubated for 5 minutes at 70°C. Normalized cDNA was then PCR amplified using an Encyclo PCR kit (Evrogen). PCR conditions were: 1 cycle × 95°C-1 min.; 17 cycles × 95°C-15 sec., 66°C-20 sec., 72°C-3 min.; 1 cycle × 66°C-15 sec., 72°C-3 min. Normalization efficiency was assayed via gel smear and qPCR of genes with known relative abundance (Additional file 5). qPCR analysis was performed using LinRegPCR [91, 92].
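The amplification program above can be written down as data, which makes it easy to check the total programmed hold time. This is only a sketch: the structure and names are illustrative, and ramp times between temperatures are ignored because they depend on the thermocycler.

```python
# Each step is (temperature in degrees C, hold time in seconds), as given in the text.
INITIAL_DENATURATION = [(95, 60)]
CYCLE = [(95, 15), (66, 20), (72, 180)]
FINAL_CYCLE = [(66, 15), (72, 180)]
N_CYCLES = 17


def hold_seconds(steps):
    """Sum the hold times of a list of (temperature, seconds) steps."""
    return sum(seconds for _, seconds in steps)


total_seconds = (
    hold_seconds(INITIAL_DENATURATION)
    + N_CYCLES * hold_seconds(CYCLE)
    + hold_seconds(FINAL_CYCLE)
)

print(total_seconds)                 # 3910
print(round(total_seconds / 60, 1))  # 65.2
```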
The non-normalized and normalized cDNA libraries were then pooled in a 1:2 ratio (Figure 1).
Five μg of pooled cDNA library was sent to the Roy J. Carver Biotechnology Center at the University of Illinois for sequencing. The cDNA library was sheared to 500-800 bp in length, 454 sequencing adaptors were ligated onto ends (Additional file 5), and the library was then converted to a single-stranded template library. Three titration runs (each of 1/16 lane) were performed to optimize sequencing conditions. A full plate was then sequenced on a Roche/454 GS FLX Sequencer using Titanium reagents.
Assembly of 454 sequence reads
Reads from the full plate and three titration runs were assembled with the Newbler Assembler v2.3 (Roche) using default parameters under the cDNA option. Prior to assembly, the following primers and adaptors were trimmed: the oligo-dT primer, the MINT PlugOligo adapter and PCR primer (Evrogen), and the 454 sequencing adaptors (Additional file 5).
Determination of fraction of captured transcripts
The coverage statistic developed by Susko and Roger (2004) estimates the proportion of genes from a cDNA library that is actually represented in the sequence data. Using this method, the unbiased estimate of coverage was calculated for our transcriptome with the equation $\hat{C} = 1 - n_1/n$, where $n_1$ is the number of singletons in the assembly and $n$ is the total number of reads [39, 40]. The new gene discovery rate was estimated using the term $1/(1 - \hat{C})$.
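The coverage calculation described above is simple enough to sketch in a few lines. The sketch assumes that coverage is one minus the singleton fraction and that the discovery term is the reciprocal of one minus coverage. Function names and read counts are hypothetical and are not values from this study.

```python
def coverage(n_singletons, n_reads):
    """Coverage estimate: 1 - (number of singletons / total number of reads)."""
    if n_reads <= 0 or not 0 <= n_singletons <= n_reads:
        raise ValueError("need 0 <= n_singletons <= n_reads and n_reads > 0")
    return 1.0 - n_singletons / n_reads


def discovery_term(cov):
    """The term 1 / (1 - coverage); grows without bound as coverage nears 1."""
    if cov >= 1.0:
        return float("inf")
    return 1.0 / (1.0 - cov)


# Hypothetical counts for illustration only.
cov = coverage(n_singletons=50_000, n_reads=1_000_000)
print(round(cov, 4))
print(round(discovery_term(cov), 2))
```

With these made-up counts, coverage is 0.95 and the discovery term is about 20, which is commonly read as the number of further reads needed, on average, to encounter one new gene.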
Annotation and analysis of BLAST hits
The set of 95,644 isotigs was input into the EST2uni annotation pipeline using default parameters, but with PCR marker integration, microarray printing, reciprocal BLAST for orthologues, Gene Ontology, and RFLP integration options turned off. Within EST2uni, the CAP3 assembly parameters were adjusted to -f 2 -g 100 -p 100 -d 110 to produce an assembly of all singletons, thereby preserving the isotigs produced by Newbler. Isotigs were annotated if they produced BLASTX matches against UniProtKB Release 2010_04 (23-Mar-2010) with E-values less than 1e-10. Parsing of BLAST data for Table 2 was done with custom Perl scripts (available upon request).
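The E-value filter used for annotation can be illustrated with a short sketch. This is not the authors' Perl code or the EST2uni implementation. It assumes BLASTX results in the standard 12-column tabular layout (query ID in column 1, subject ID in column 2, E-value in column 11), and the isotig and accession identifiers are made up.

```python
import csv
import io

E_VALUE_CUTOFF = 1e-10  # annotation threshold stated in the text


def annotated_queries(tabular_text, cutoff=E_VALUE_CUTOFF):
    """Return {query_id: (subject_id, evalue)} for the best hit below the cutoff."""
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        query, subject, evalue = row[0], row[1], float(row[10])
        if evalue < cutoff and (query not in best or evalue < best[query][1]):
            best[query] = (subject, evalue)
    return best


# Hypothetical hits: two for one isotig, one weak hit for another.
rows = [
    ["isotig00001", "P12345", "62.1", "210", "70", "3", "1", "630", "5", "214", "3e-45", "180"],
    ["isotig00001", "Q99999", "40.0", "100", "55", "2", "10", "309", "20", "119", "2e-12", "70"],
    ["isotig00002", "P54321", "30.0", "80", "50", "4", "1", "240", "1", "80", "1e-04", "40"],
]
blast_text = "\n".join("\t".join(r) for r in rows)

hits = annotated_queries(blast_text)
print(hits)
```

Only the first isotig is annotated here, because the second isotig's sole hit has an E-value above the cutoff.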
Completeness analysis of annotated isotigs (Table 3) was performed using only the largest isotig from each isogroup as a representative for its putative gene locus. A Perl script utilizing BioPerl modules (available upon request) was used for this analysis. An isotig was considered complete on either end if its match extended to within ten amino acids of the corresponding end of the UniProt sequence.
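The completeness criterion can be sketched as follows. This is an illustration in Python rather than the BioPerl script used in the study. It assumes 1-based match coordinates on the UniProt sequence and reads "within ten amino acids" as at most ten unmatched residues at that end. The `Hit` record, field names, and example values are hypothetical.

```python
from dataclasses import dataclass

END_TOLERANCE_AA = 10  # tolerance stated in the text


@dataclass
class Hit:
    isogroup: str
    isotig: str
    isotig_len: int  # isotig length in nucleotides
    s_start: int     # first matched residue on the UniProt sequence (1-based)
    s_end: int       # last matched residue on the UniProt sequence
    s_len: int       # full length of the UniProt sequence


def largest_per_isogroup(hits):
    """Keep the longest isotig of each isogroup as its representative."""
    best = {}
    for hit in hits:
        current = best.get(hit.isogroup)
        if current is None or hit.isotig_len > current.isotig_len:
            best[hit.isogroup] = hit
    return best


def completeness(hit, tol=END_TOLERANCE_AA):
    """Return (N-terminal complete, C-terminal complete) for one hit."""
    n_complete = (hit.s_start - 1) <= tol
    c_complete = (hit.s_len - hit.s_end) <= tol
    return n_complete, c_complete


# Hypothetical hits for a single isogroup.
example_hits = [
    Hit("isogroup00001", "isotig00001", 1200, 5, 395, 400),
    Hit("isogroup00001", "isotig00002", 800, 150, 400, 400),
]
representative = largest_per_isogroup(example_hits)["isogroup00001"]
print(representative.isotig, completeness(representative))
```

In this example the longer isotig is chosen and counts as complete at both ends, whereas the shorter one would be complete only at the C-terminal end.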
The set of largest isotigs (one per isogroup) was also used to identify Gene Ontology (GO) designations using the program Blast2GO [44, 45]. Results from BLAST searches against the UniProt database were imported from EST2uni, and GO annotation in Blast2GO was performed with default parameters (E-value threshold of 1e-6).
Identification of candidate regeneration genes
TBLASTX was used to search the P. leidyi isotig dataset for homologs of genes implicated in animal regeneration in the literature. A reciprocal TBLASTX search was then performed against UniProt or the nr database via NCBI to verify the putative identity of candidate regeneration genes in P. leidyi.
Validation of transcript assembly
Transcript validation assays were performed for two isogroups, isogroup08478 (a putative piwi-like homolog) and isogroup03233 (a putative frizzled homolog). Isotigs of each isogroup were aligned together using ClustalX v2.1 with manual editing in Seaview v4.0 [93, 94]. PCR was then performed to verify contigs and isotigs of each isogroup (Additional file 5). PCR amplicons of the expected size were either sequenced directly or cloned into the pGEM-T Easy vector (Promega) prior to sequencing. Sequencing was performed using an Applied Biosystems 3730xl DNA Analyzer.
Transcript assembly was also verified for four previously known P. leidyi genes. BLAST searches against the 454 dataset were performed using the published gene sequences of Pl-en [GenBank: AF336055.1], Pl-otx1 [GenBank: AF336056.1], Pl-otx2 [GenBank: AF336057.1], and Pl-nos [GenBank: GQ369728.1]. GenBank sequences were aligned to transcriptome sequences using Sequencher v.4.7 (Gene Codes Corporation).
Analysis of gene expression by whole mount in situ hybridization
A ~1250 bp fragment of Pl-fzA (isogroup23343) and a ~1300 bp fragment of Pl-β-cat (isogroup01340) were amplified by PCR (Additional file 5). Synthesis of sense and antisense riboprobes and in situ hybridization were performed as previously described.