Transcriptional response to sulfide in the Echiuran Worm Urechis unicinctus by digital gene expression analysis

Urechis unicinctus, an echiuran worm inhabiting the U-shaped burrows in the coastal mud flats, is an important commercial and ecological invertebrate in Northeast Asian countries, which has potential applications in the study of animal evolution, coastal sediment improvement and marine drug development. Furthermore, the worm can tolerate and utilize well-known toxicant-sulfide. However, knowledge is limited on the molecular mechanism of U. unicinctus responding to sulfide due to deficiency of its genetic information. In this study, we performed Illumina sequencing to obtain the first Urechis unicinctus transcriptome data. Sequenced reads were assembled and then annotated using blast searches against Nr, Nt, Swiss-Prot, KEGG and COG. The clean tags from four digital gene expression (DGE) libraries were mapped to the U. unicinctus transcriptome. DGE analysis and functional annotation were then performed to reveal its response to sulfide. The expressions of 12 candidate genes were validated using quantitative real-time PCR. The results of qRT-PCR were regressed against the DGE analysis, with a correlation coefficient and p-value reported for each of them. Here we first present a draft of U. unicinctus transcriptome using the Illumina HiSeqTM 2000 platform and 52,093 unique sequences were assembled with the average length of 738 bp and N50 of 1131 bp. About 51.6 % of the transcriptome were functionally annotated based on the databases of Nr, Nt, Swiss-Prot, KEGG and COG. Then based on the transcriptome, the digital gene expression analysis was conducted to examine the transcriptional response to sulfide during 6, 24 and 48 h exposure, and finally 1705, 1181 and 1494 tag-mapped genes were identified as differentially expressed genes in the 6-h, 24-h and 48-h libraries, then were further subjected to pathway analyses. In the DGE database of U. unicinctus, the alterations in certain known sulfide-related pathways indicate similar changes in response to sulfide. For more than 80 % of the identified pathway members, this is the first report on their association with sulfide stress, among which glycolysis pathway and PIDD involving pathways were unique and discussed in details, and were thought to play important roles in the sulfide tolerance of U. unicinctus. All the results are helpful to explain the mechanism of sulfide tolerance and detoxification.


Background
Exogenous sulfide, a common environmental toxin, harms organisms by reducing the affinity of hemoglobin for oxygen [1], inhibiting the cytochrome c and succinate oxidase complexes [2,3], depolarizing mitochondria [4], inducing apoptosis [5], and causing oxidative damage to RNA and DNA [6]. Moreover, sulfide is produced by mammals in vivo [7,8] and endogenous sulfide is involved in multiple physiological functions, including inflammation [9], up-regulation of the antioxidant system [10], and activation of K ATP channels [11]. Thus, exploring the physiological effects of sulfide and organismal sulfide tolerance mechanisms is important.
Responses to sulfide exposure are distinct in different organisms. The nematode Caenorhabditis elegans survives in the presence 50 ppm sulfide, which increases its thermotolerance and lifespan [12]. Mice exposed to 80 ppm sulfide enter a hibernation-like state to overcome the conditions inducing a low metabolic rate and decreased core body temperature without any apparent ill effects [13]. Organisms living in sulfide-rich habitats adopt a variety of defensive responses to avoid prolonged toxic exposure to sulfide; for example, endosymbiotic bacteria sulfide metabolism and mitochondrial sulfide oxidation are two primary strategies for sulfide tolerance in invertebrates [14]. The hydrothermal vent tube worm Riftia pachyptila and the clam Solemya velum use hemoglobin to deliver sulfide to their endosymbiotic bacteria [15][16][17]. Metatranscriptome pyrosequencing of the endosymbiotic bacteria of the clam S. velum revealed 28 genes involved in diverse pathways associated with sulfide metabolism, including the dissimilatory sulfite reductase pathway, the adenosine-5′-phosphosulfate pathway and the sulfur oxidation pathway [18]. For some species that lack endosymbionts, such as echiuran worm Urechis unicinctus, mitochondrial sulfide oxidation is important for their survival in sulfide-rich habitats, and several key enzymes, including sulfide:quinone oxidoreductase (SQR), sulfur dioxygenase (SDO), and sulfur transferase (ST), play important roles in sulfide detoxification [19][20][21][22]; however, due to the lack of sufficient high-throughput molecular data, the full complement of genes and signalling pathways involved in sulfide tolerance and utilization and the mechanisms that regulate these pathways are unknown.
The echiuran worm, U. unicinctus, inhabiting "U-shaped" burrows in intertidal and subtidal mudflats, is widely distributed in Russia, Japan, the Korean Peninsula and China, which belongs to phylum Echiura, a special group of invertebrate protostomes. Preliminary studies have been presented that the worm U. unicinctus is able to metabolize exogenous sulfide and utilize it as energy [23][24][25]. Moreover, characterizations of the key genes for mitochondrial sulfide oxidation [21,22,[26][27][28] have been revealed. Here, a high-throughput methods-digital gene expression (DGE) technique was conducted to analyze the transcriptional response to sulfide in U. unicinctus. Before the DGE analysis, the transcriptome data of U. unicinctus is necessary, so we performed a large-scale transcriptome sequencing of U. unicinctus by Illumina HiSeq TM 2000 platform (BGI, Shenzhen, China) to obtain the first version of the U. unicinctus transcriptome.
The aims of the present study are to (i) characterize the U. unicinctus transcriptome; (ii) identify the genes and signalling pathways responding to sulfide in U. unicinctus. The results will provide a comprehensive understanding for further exploring the molecular mechanism of sulfide metabolism in U. unicinctus.

Transcriptome analysis
A total of 59,007,614 raw PE reads in the fastq format were generated with a length of 2 × 90 bp, which have been deposited in the NCBI database (accession number: SRA122323). After removing the low quality reads, we obtained a total of 53,874,422 high-quality PE reads for optimizing the de novo assembly and analysis of the U. unicinctus transcriptome. The GC content, Q20 (Phred quality score >20) and unknown bases were 45.54, 97.91 and 0.00 %, respectively ( Table 1). Assembly of the 53,874,422 high-quality reads generated 115,682 contigs with an average length of 337 bp, ranging from 101 to 5279 bp (Fig. 1a). Approximately 75 % (38,984,834) of the high-quality reads contributed to the assembly of the contigs. Among these contigs, 46,653 (41 %) were longer than 200 bp, and 18,663 (16 %) were longer than 500 bp. As expected for a randomly fragmented transcriptome, a positive relationship was determined between the length of a given contig and the number of reads mapped into it (Fig. 1c). Furthermore, these contigs were assembled into 52,093 unigenes with an average length of 738 bp and an N50 of 1131 bp, ranging from 200 to 7184 bp (Fig. 1b). An overview of the sequencing and assembly process is presented in Table 2.
In this study, 26,889 gene sequences (51.6 % of the total consensus sequences) returned an above cut-off BLAST result, and 51.4 % of the assembled sequences could be matched to the known proteins. There were 8707 mapped  (Fig. 2a). More than 21 % of the total sequences had a similarity that was higher than 60 %, while the rest ranged from 14 % to 60 % (Fig. 2b). Moreover, 16.34 % of the distinct sequences had top matches with those from Branchiostoma floridae, which was followed by Saccoglossus kowalevskii (13.77 %) and Strongylocentrotus purpuratus (5.93 %) (Fig. 2c). COG (Cluster of orthologous Groups of proteins) analysis showed that 9447 unigenes were classified into 25 categories assigned by COG (Fig. 3). In this classification, the largest group was involved in the general function, containing 3749 unigenes (39.68 %), which was followed by translation, ribosomal structure and biogenesis (1360, 14.40 %); replication, recombination and repair (1332, 14.10 %); transcription (1206, 12.77 %), carbohydrate transport and metabolism (1116, 11.81 %); and posttranslational modification, protein turnover, and chaperones  (1110, 11.75 %). Only 2 unigenes (0.02 %) belong to the nuclear structure, which was the smallest group. GO (Gene ontology) analysis showed that a total of 13,458 unigenes (51.37 % of the total annotated sequences) were assigned at least one GO term. The transcripts were divided into 63 functional groups within the following three ontologies: molecular function, cellular component and biological process. For the biological process, the most representations were for cellular process (8576, 63.72 % of the total 13,458) and metabolic process (6796, 50.50 %); the cell (7985, 59.33 %), cell part (7979, 59.29 %) and organelle (5601, 41.62 %) for the cellular component; and the binding (7276, 50.04 %) and catalytic activity (6347, 47.16 %) for the molecular function, respectively. Only 60 and 9 unigenes were predicted to act in the functional group electron carrier and translation regulator activity, which were the smallest two parts (Fig. 4).
KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway analysis showed that 18,884 unigenes (72.73 % of the total annotated sequences) were mapped to 254 KEGG pathways. Of these, metabolic pathways contained 2820 unigenes and were no doubt larger than other pathways, such as pathways in focal adhesion (717), cancer (654) and regulation of the actin cytoskeleton (645). Approximately 18 % of the unigenes were related to such pathways as immunity, stress, and growth.

DGE library sequencing and tag mapping
To investigate the transcriptome profile of U. unicinctus in response to sulfide at 6, 24 and 48 h, DGE analysis was performed. There were a total of 3.8-4.2 million raw tags. After removing the adaptor tags and the low-quality tags, more than 95 % of the raw tags were retained, including 3,909,160, 3,613,737, 4,057,279 and 4,027,932 clean tags for the control, 6-h, 24-h and 48-h libraries, respectively (Table 3). These results suggested that the sequencing data were of sufficient quality for subsequent analysis.
The distribution of the total clean tags and the distinct clean tags in each tag-abundance category was similar among the four DGE libraries (Fig. 5). Sequencing saturation analysis of the four libraries revealed that the number of detected genes plateaued shortly after 1 M tags were sequenced, and no new genes were identified when the total sequence number approached 2 M, indicating that the approximately 4 M tags in each library represented complete coverage of the transcriptome (Fig. 6).    The raw data are tags obtained from sequencing without processing. The clean tags are tags that remained after filtering out dirty (low-quality) tags from the raw data. The distinct tags are tags that do not overlap. The unambiguous tags are the clean tags that remained after the removal of tags mapped to reference sequences from multiple genes Fig. 5 Distribution of the total clean tags and distinct clean tags in Urechis unicinctus exposed to 50 μM sulfide for different periods  (Table 3).  (Fig. 7).

DEG (differentially expressed gene) identification, validation and pathway enrich analysis
To validate the digital gene expression (DGE) results, 12 candidate unigenes representing different biological processes were selected for qRT-PCR analysis. The results of qRT-PCR were regressed against the DGE analysis, with a correlation coefficient (R 2 ) and p-value reported for each of the 12 unigenes investigated (Fig. 8). Of the selected unigenes, eight returned a correlation coefficient with a p-value < 0.05 or <0.01, showing the consistence between DGE and qRT-PCR. The left 4 unigenes did not show a significant correlation. However, all the candidate KEGG pathway analysis of the DEGs revealed that 175, 166 and 176 metabolic pathways were affected in U. unicinctus exposed to 50 μM sulfide for 6, 24, and 48 h, respectively and the pathways including oxidative phosphorylation, ribosome, proteasome, complement and coagulation cascades, and metabolism were all significantly enriched in the 6-h, 24-h and 48-h library (Additional file 2).

Discussion
U. unicinctus, belonging to the phylum Echiura, can tolerate high concentrations of sulfide living in the "U-shaped" burrows in intertidal and subtidal mudflats. In this study, we first demonstrated a de novo assembly of the U. unicinctus transcriptome using Illumina Hiseq2000 platform. This is also the first transcriptome report in echiuran worms. In the obtained transcriptome of U. unicinctus, the assembly of the 53,874,422 clean reads produced 52,093 unique sequences with an average size of 738 bp which is comparable with other studies generating transcriptome data [29][30][31]. The number of protein-coding genes in the U. unicinctus genome is unknown; however, if we assume general conservation based on estimates for the representative of the phylum Annelida Capitella teleta (32,415 genes; JGI annotation pipeline),~83 % of protein-coding genes (26,889) were assembled in this study. All the transcriptome data obtained in U. unicinctus indicated that it is appropriate to conduct DGE analysis to identify transcriptional response to sulfide in U. unicinctus.
In the DGE analysis, a total of 1705, 1181 and 1494 tag-mapped DEGs were identified, and 175, 166 and 176 metabolic pathways were affected when the worms were exposed to 50 μM sulfide for 6, 24 or 48 h, respectively. These DEGs displayed many biological functions, suggesting molecular mechanisms involved in the sulfide response. In mammals, several pathways-such as MAPK, NF-κB and apoptosis-are affected by sulfide and might be important for the sulfide tolerance [5,9,[32][33][34][35][36][37]; we also identified some DEGs in these pathways such as ribosomal protein S6 kinase 2 alpha (RSK2) and phospholipase A2 (cPLA2) in the MAPK pathway, Toll-like receptor 4 (TLR4) and myeloid differentiation primary response protein 88 (Myd88) in the NF-κB pathway and caspase-3 and caspase-7 in the apoptosis pathway (Table 4), which are consistent with the previous results. Furthermore, in fission yeast Schizosaccharomyces pombe, it is revealed by microarray analysis that a large number of genes encoding mitochondrial proteins were down regulated after sulfide treatment, such as NADH dehydrogenase, succinate dehydrogenase, sulfide-quinone oxidoreductase (sqr), glycerol-3-phosphate dehydrogenase (gut), ubiquinol cytochrome c reductase complex subunit 5 (rip1) and cytochrome c oxidase subunit VIA (cox13), indicating that sulfide could affect the expressions of many mitochondrial proteins at transcriptional level [38]. In our DGE results, the mRNA expression levels of genes encoding mitochondrial complex I (NADH dehydrogenase) and complex II (succinate dehydrogenase) were downregulated (Table 4), which was similar with the results of [38]. Meanwhile the expression of genes such as sqr, sdo and st encoding mitochondrial sulfide oxidation enzymes increased with the sulfide exposure time (Table 4), which agreed with our previous studies [26,27]. These results demonstrate that our databases are reliable.
In this study, several unique pathways and more than 80 % of the identified pathway members are the first report, which are associated with sulfide stress in U. unicinctus exposed to sulfide. Among these pathways, ATP generated pathways (glycolytic and oxidative phosphorylation pathway) as well as the PIDD (p53-induced protein with a death domain) induced pathways involving in the DNA damage response attracted our interests.
Usually the anaerobic glycolytic pathway typically replaces oxidative phosphorylation for ATP production and the maintenance of the intracellular ATP balance when mouse or human cells are cultured in media containing 10 μM sulfide, and the expression of nearly all of the genes in the anaerobic glycolytic pathway are up-regulated [39,40]. In U. unicinctus, it is confirmed that activities of key enzymes in the glycolytic pathway, phosphorylase, phosphofructokinase, and pyruvate kinase become markedly high after fertilization in U. unicinctus eggs [41,42], and enzymes associated with oxidative phosphorylation for ATP production have also been elucidated in echiuran respiration [43][44][45][46][47]. In this study, we identified a large number of genes related to glycolytic and oxidative phosphorylation pathways, such as pyruvate kinase, phosphorylase, phosphofructokinase, ADP-dependent glucokinase and phosphoglycerate kinase (Table 4). Interestingly, the expressions of most unigenes involved in glycolysis such as ADP-dependent glucokinase and phosphoglycerate kinase were significantly down-regulated when the worms were For each qRT-PCR validation, five technical replications were performed. The correlation coefficient (R 2 ) of the regression between qRT-PCR and DGE for each of the 12 unigenes investigated was analyzed and shown. *p < 0.05 and **p < 0.01 exposed to sulfide (Table 4); moreover, the mRNA levels of both complex I and complex II were also down-regulated (Table 4). These results suggested unlike mammals, U. unicinctus may adopt a non-glycolytic pathway to produce ATP when sulfide exposure. Several sulfide-tolerant organisms-such as the ribbed mussel Geukensia demissa, echiuran worm U. unicinctus and the lugworm Arenicola marina-utilize sulfide to produce ATP [24,[48][49][50]. Hildebrandt [51] suggested that a super-complex composed of SQR and complexes III and IV initiates ATP production using sulfide as a material resource when sulfide enters the respiratory chain of rat mitochondria. In this study, the genes encoding SQR and complexes III and IV (Table 4) were all up-regulated in U. unicinctus exposed to sulfide. Therefore, we propose that the super-complex (SQR, complexes III and IV) catalytic activity by using sulfide as a material resource replaces the glycolysis to maintain the ATP balance when these worms are exposed to sulfide. Further studies are needed to confirm these suggestions.
It is known sulfide generates significant oxidative damage to genomic DNA [52]. Typically, PIDD, a key member of the p53 DNA damage response pathway, activates two distinct pathways in response to DNA damage. One is to activate NF-κB via the ubiquitination of NEMO to stimulate DNA repair and promote organismal survival [53][54][55]; alternatively, PIDD induces genotoxic stressmediated apoptosis and activates the mitochondrial apoptosis pathway to protect genome integrity [56]. In this study, all the members involving in this pathway were all in up-regulated expression when U. unicinctus were exposed to sulfide (Table 4). Furthermore, it is suggested that PIDD acts in response to sulfide-induced DNA damage based on probably the increase of the PIDD mRNA level. Further researches need to be conducted to verify the suggestions from the DGE analysis.

Conclusions
In this study, a transcriptome from the U. unicinctus adult tissues was sequenced. This database is currently the largest in echiuran worms, and 52,093 unigenes are assembled. Many unigenes have been identified and are involved in various fields. Then based on the transcriptome, transcriptional profile of U. unicinctus exposed to 50 μM was revealed via DGE analysis. Approximately 1705, 1181 and 1494 DEGs were identified at 6, 24 and 48 h after sulfide exposure, respectively. These DEGs are involved in many biological processes. In the DGE database of U. unicinctus, the alterations in certain known sulfide-related pathways indicate similar changes in response to sulfide, whereas U. unicinctus exhibited differences from other species with respect to its modulation of glycolysis in response to sulfide. In addition, this is the first report on PIDD in response to sulfide-induced DNA damage in U. unicinctus. These data facilitates the identification of sulfide-related genes and elucidates the molecular mechanisms by which organisms tolerate sulfide stress. This study provides a framework for additional functional studies to examine sulfide toxicology in this system.

Ethics statement
Each of the procedures that were used to handle and treat the Echiuran worms during this study was approved by the Ocean University of China Institutional Animal Care and Use Committee (OUC-IACUC) prior to the initiation of the study.

Animals and sulfide treatment
Urechis unicinctus (mean fresh mass of 33.4 ± 10.4 g) were purchased from an aquatic product market, which collected worms from a coastal intertidal flat in Yantai, China. The worms were temporarily maintained in aerated seawater (17 ± 1°C, pH 8.0, and salinity 32 PSU) for a week without feeding. Then 30 healthy worms were randomly assigned to three groups (10 worms per group), and for each group the worms were maintained in a sealed aquarium containing 30 L of seawater. The sulfide concentration was maintained at 50 μM by adding a sulfide stock solution (10 mM Na 2 S, pH 8.0) every 2 h as necessary based on the sulfide concentration, which was determined using the methylene blue method [57]. Six worms were sampled (i.e., two individuals from each aquarium) before sulfide addition (control) and at 6, 24 and 48 h after the initiation of sulfide exposure, frozen in liquid nitrogen, and stored at −80°C for subsequent RNA extraction.

RNA extraction and quality control
Total RNAs from each stored tissues were extracted with TRIzol® Reagent (Invitrogen, Carlsbad, CA, USA) following the manufacturer's instructions. The yield, purity and integrity of the RNA samples were analyzed using a 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA).

cDNA library construction and illumina sequencing
The mRNA was purified using a PolyATract® mRNA Isolation System (Ambion, Austin, TX, USA). Fragmentation buffer was added to interrupt mRNA to short fragments. With these short fragments as templates, random hexamer-primer was used to synthesize the firststrand cDNA. The second-strand cDNA was synthesized using buffer, dNTPs, RNase H and DNA polymerase I. Short fragments were purified with QiaQuick PCR extraction kit (Qiagen, Hilden, Germany) and resolved with EB buffer for end reparation and tailing A. Next, the short fragments were connected with sequencing adapters. After agarose gel electrophoresis, suitable fragments were selected for the PCR amplification as templates. Finally, the library could be sequenced using Illumina HiSeq™ 2000, which could generate paired-end (PE) reads with a length of 90 bp.

De novo assembly and transcriptome annotation
Raw reads from Illumina sequencing were first preprocessed for quality control, such as filtering of highquality reads based on the score value given in fastq files, removal of reads containing primer/adaptor sequences and trimming the read length using common programs. After comparison, the SOAP-denovo program was most effective for the assembly of 25-bp segments that are randomly clipped from the reads obtained in long parts with the help of the de Bruijn graph algorithm and was then used for de novo assembly of our PE reads and generation of non-redundant unique sequences. Unigenes were firstly aligned for homologues by blastX (E-value < 10 −5 ) to protein databases Nr, Swiss-Prot, KEGG and COG and then aligned by blastn to nucleotide databases Nt (E-value < 10 −5 ). GO terms were assigned utilizing Blast2GO [58] for blast output obtained from the Nr database. After getting GO annotation for every unigene, WEGO software [59] was used to do GO functional classification for all unigenes and to understand the distribution of gene functions of the species from the macro level. With the help of KEGG database, pathway annotation for unigenes was obtained.

DGE library construction, sequencing and annotation
Four DGE libraries were constructed from the total RNA sample of the worms exposed to 50 μM sulfide for 0, 6, 24 or 48 h. For all libraries, mRNA was purified from 6 μg of total RNA sample using oligo(dT) magnetic beads and was reverse-transcribed into double-stranded cDNA (SuperScript II, Invitrogen). Two restriction enzymes were used to generate sequencing tags. The beadbound cDNA was digested using NlaIII (CATG recognition site) and ligated to Illumina adaptor 1. The junction of Illumina adapter 1 and CATG generated the recognition site for MmeI. Thereafter, MmeI was used to cut the cDNA fragments 17 bp downstream of the CATG site. After removing the 3′ fragment via magnetic bead precipitation, Illumina adapter 2 was introduced at the site of MmeI cleavage, resulting in a library of tagged cDNA containing different adaptors on each end. Fifteen cycles of PCR amplification were performed to enrich the 21-bp tags in the samples. Massively parallel sequencing-by-synthesis was performed using an Illumina HiSeq™ 2000. Clean tags were obtained by filtering the adaptor sequences and low-quality tags using the Illumina data processing pipeline. The distinct clean tags were classified according to their copy number in the libraries, and the saturation of the four libraries was analyzed. The distinct clean tags were mapped to the transcriptome of U. unicinctus sequenced before. Only the tags displaying zero or one mismatch were annotated based on the reference genes.

Identification and validation of DEGs
To compare the gene expression in U. unicinctus at different periods of sulfide exposure, the number of raw clean tags in each library was first normalized to the number of transcripts per million clean tags (TPM). The expression analysis was then carried out using TPM of sulfide treatment groups compared with that of the control group. The thresholds for determining significant differences in gene expression were a FDR (false discovery rate) ≤ 0.001 and an absolute value of log 2 Ratio ≥1. All DEGs were subjected to KEGG pathway enrichment analysis compared with the transcriptome background using hypergeometric test. The calculated P-value goes through as a threshold. KEGG pathway fulfilling corrected P-value ≤ 0.05 from Bonferroni correction were defined as significantly enriched terms in DEGs.
The expression of tag-mapped genes in the four libraries was validated using quantitative real-time reversetranscription polymerase chain reaction (qRT-PCR). Primers were designed using Primer 5.0 software for 12 selected genes, including 8 up-regulated genes, 3 down-regulated genes and 1 gene not displaying differential expression (Additional file 3: Table S1). Total RNA sample (1 μg) from the stock samples used to construct the sequencing libraries was reverse-transcribed into cDNA using the Primescript™ RT Reagent Kit with gDNA Eraser (Takara, Otsu, Japan). All amplification reactions were performed using a LightCycler™ 480 (Roche, Basel, Switzerland) in a 20-μL volume containing 1 × SYBR Premix Ex Tap (Takara), 0.2 mM of each primer and 1.5 μL of cDNA. The amplification was conducted according to the following standard protocol: 95°C for 10 min, followed by 40 cycles of 95°C for 15 s, 60°C for 1 min, and a thermal denaturing step to generate the melting curves for the verification of amplification specificity. The U. unicinctus β-actin (GenBank accession number: GU592178.1) was used as an internal standard, and the relative gene expression levels were calculated using the 2 -ΔΔCt method, where -ΔCt is the difference in the Ct between β-actin and the target gene.