De novo transcriptome analysis of Perna viridis highlights tissue-specific patterns for environmental studies

Background The tropical green-lipped mussel Perna viridis is a common biomonitor throughout the Indo-Pacific region that is used for environmental monitoring and ecotoxicological investigations. However, there is limited molecular data available regarding this species. We sought to establish a global transcriptome database from the tissues of adductor muscle, gills and the hepatopancreas of P. viridis in an effort to advance our understanding of the molecular aspects involved during specific toxicity responses in this sentinel species. Results Illumina sequencing results yielded 544,272,542 high-quality filtered reads. After de novo assembly using Trinity, 233,257 contigs were generated with an average length of 1,264 bp and an N50 length of 2,868 bp; 192,879 assembled transcripts and 150,111 assembled unigenes were obtained after clustering. A total of 93,668 assembled transcripts (66,692 assembled genes) with putative functions for protein domains were predicted based on InterProScan analysis. Based on similarity searches, 44,713 assembled transcripts and 25,319 assembled unigenes were annotated with at least one BLAST hit. A total of 21,262 assembled transcripts (11,947 assembled genes) were annotated with at least one well-defined Gene Ontology (GO) and 5,131 assembled transcripts (3,181 assembled unigenes) were assigned to 329 Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. The quantity of assembled unigenes and transcripts obtained from male and female mussels were similar but varied among the three studied tissues, with the highest numbers recorded in the gills, followed by the hepatopancreas, and then the adductor muscle. Multivariate analyses revealed strong tissue-specific patterns among the three different tissues, but not between sexes in terms of expression profiles for annotated genes in various GO terms, and genes associated with stress responses and degradation of xenobiotics. The expression profiles of certain selected genes in each tissue type were further validated using real-time quantitative polymerase chain reaction assays and a similar tissue-specific trend was seen. Conclusions The extensive sequence data generated from this study will provide a valuable molecular resource for facilitating environmental studies with P. viridis, and highlight the importance of tissue-specific approaches in the future. Electronic supplementary material The online version of this article (doi:10.1186/1471-2164-15-804) contains supplementary material, which is available to authorized users.


Background
Over the past 30 years, mussels have served as useful biomonitors in assessing the quality of our marine ecosystems with respect to the impact of chemicals [1]. Mytilid mussels, in particular Mytilus species, have been extensively applied to environmental monitoring programs and toxicological studies in temperate regions [2]. More recently, a tropical/subtropical member of mytilids, the green-lipped mussel Perna viridis, has been adopted as a biomonitor throughout the Indo-Pacific region [3] for assessing a wide range of chemical pollutants [4][5][6][7][8]. Because of its wide distribution, and high tolerance to environmental stresses, this mussel is also regarded as highly invasive and a biofouling species [9].
In conventional biomonitoring, the body burden of common pollutants and a limited number of physiological and molecular biomarkers are used as the end-points for assessing the effects of chemical pollution on mussels [3]. To understand the toxic effects of chemical pollutants, and to develop an early warning signal of pollutant stress on a biomonitor, it is essential to have an in-depth knowledge of their toxic mechanisms at a molecular level. However, like many other non-model marine organisms, the limited genomic data available for P. viridis has hindered studies attempting to elucidate molecular mechanisms related to specific toxic stress responses.
A transcriptome of a given organism refers to a complete set of transcripts present in a cell type at a specific developmental stage or physiological condition [10]. The underlying molecular mechanisms and phenotypic plasticity of an organism against environmental influences can be interpreted from the dynamics of its transcriptome. Given the advent of next-generation sequencing (NGS) technology, high throughput RNA-sequencing analysis of transcripts with greater gene coverage, higher sensitivity, and better reproducibility of transcriptomes that are highly complex, can now be applied to any genome [10]. RNA-sequencing analysis has been applied to several marine bivalve species, including the Mediterranean mussel Mytilus galloprovincialis [11], the deep-sea hydrothermal vent mussel Bathymodiolus azoricus [12], the Manila clam Ruditapes philippinarum [13], the Yesso scallop Mizuhopecten yessoensis (formerly known as Patinopecten yessoensis) [14], the pearl oyster Pinctada fucata [15] and the Pacific oyster Crassostrea gigas [16]. Some of these studies have highlighted the species specificity of the transcriptome, and also demonstrated specific transcript profiles among different tissues. The extensive and novel genomic data generated from these non-model animals have revealed much about their underlying molecular mechanisms in response to different environmental stressors [13,17].
Compared with other mytilids, genomic and protein databases are largely unavailable for P. viridis. To date, the number of sequence entries available for this species in the National Center for Biotechnology Information (NCBI) is limited, with 451 nucleotide sequences, 4 expressed sequence tags (ESTs), 252 protein sequences, and 1 highthroughput DNA and RNA sequence read archive (SRA) ( Table 1, Data search performed on 15 May 2014). Molecular characterization of stress-associated genes, such as metallothioneins [18] and glutathione S-transferases [19], and the adhesion associated proteins such as the 3, 4-dihydroxyphenyl-L-alanine (DOPA)-containing proteins from the byssus threads [20] have been conducted for P. viridis. A recent study revealed the adhesive byssal complex and various adhesive proteins in the foot tissue of P. viridis, using RNA-sequencing and proteomics [21]. A number of microsatellite markers were also characterized for genetic investigations [22,23]. P. viridis has the shortest reported mitochondrial genome (16,014 bp) among marine mussels, and differs from that of other Mytilus species in terms of genome organization [24]. A cytogenetic study suggested that P. viridis could be a more primitive species of the Mytilidae, that diverged from other members of the Perna and Mytilus genera [25]. Such distinctions indicate that the existing genomic data from other Mytilidae member may not be suitable to serve as a basis for homologous references in P. viridis.
Given the importance of P. viridis to ecotoxicology and environmental applications, a comprehensive genomic database is urgently needed for this species to advance future marine environmental studies in the Indo-Pacific region. In the current study, we applied Illumina sequencing to characterize a de novo transcriptome assembled from a mixed sample of three target tissues (adductor muscle, gills and hepatopancreas) of male and female P. viridis collected from the field and from various designated exposure scenarios. Comparative analyses were also performed to determine any tissueand sex-specific transcriptome profiles.

Results and discussion
Sequencing and assembly An overall profile of the transcriptome of P. viridis was obtained through Illumina sequencing of six cDNA libraries (2 sexes × 3 tissues) and pooled mRNA samples from the adductor muscle, gills and hepatopancreas of male and female. The number of reads generated for each target tissue ranged from 84,607,612-102,182,018 with a Q30 values (denoting the accuracy of a base call to be 99.9%) ranging from 83.78-89.19%. The highest GC content was seen in the adductor muscle (42%), followed by the hepatopancreas (40%), and the gills (39%) (Additional file 1). A global de novo transcriptome of P. viridis was assembled from the pool of the reads for the six cDNA libraries using Trinity software. We obtained 544,272,542 highquality reads after filtering and subjecting to de novo assembly, generating 233,257 contigs. After clustering with CD-Hit, the resulting assembled transcriptome comprised 192,879 assembled transcripts and 150,111 assembled unigenes ( Table 2). The lengths of the assembled transcripts ranged from 201-43,014 bp, with an average length of 1,264 bp and an N50 length of 2,868 bp ( Table 2). The read data for the six samples was deposited in the NCBI SRA database (BioProject Accession Number SRP043984).

Sequence abundance
The expression levels of assembled unigenes/transcripts were measured using fragments per kilobases per million reads (FPKM). The numbers of assembled unigenes/transcripts that were expressed (cut-off value at FPKM ≥ 0.05) in male and female P. viridis were similar, however strong variations were observed for unigene/transcript abundance among tissues ( Figure 1). The largest transcriptome size was obtained from the gills, followed by the hepatopancreas, and then the adductor muscle. The numbers of assembled unigenes/transcripts in the gills and hepatopancreas were very similar, while those detected in the adductor muscle were comparatively low. A recent study outlined the potential errors in using whole organism or composite structures (organs) in RNA-sequencing analysis as a common practice for small invertebrates like insects or larvae. The same researchers demonstrated the significance of using a structure/organ of interest to correctly interpret specific expression of genes [26]. We have demonstrated the possibility of high variability in transcriptome sizes occurring in different tissues of P. viridis.

Functional annotation
Annotation results for the non-redundant assembled transcripts/unigenes are summarized in Table 3. We performed a BLAST search against the NCBI nr database and the three molluscan EST databases; 44,713 assembled transcripts and 25,319 assembled genes with at least one BLAST hit were returned, with 41,686 assembled transcripts and 23,392 assembled unigenes that had been annotated by the NBCI nr database. The BLAST search of the 27,651 assembled transcripts had their best match with sequences from the Pacific oyster, Crassostrea gigas (Additional file 2).
Annotation results from the BLAST search for de novo transcriptomes of the non-model species were shown to be highly dependent on the availability of annotated sequence information for a reference genome, and the sizes of their contig sequences [27,28]. The number of transcripts obtained for P. viridis from this study was comparatively higher than for other non-model bivalve species subjected to NGS technology. There were 38,942 reported sequences (14,638 contigs plus 24,304 singletons) [14] and 14,240 transcripts in the Yesso scallop Mizuhopecten yessoensis [29], 22,023 transcripts in the deep-sea hydrothermal vent mussel Bathymodiolus azoricus [12], 18,196 contigs in the striped venus Chamelea gallina [27], 9,747 transcripts in the Manila clam Ruditapes philippinarum [13] and 1,023 contigs in the Mediterranean mussel Mytilus galloprovincialis [11]. At the gene and transcript levels, the annotation results for P. viridis that we obtained were comparable to transcriptomes obtained from multiple organs and at various developmental stages of the Pacific oyster C. gigas (45,771 transcripts [17] and 28,027 protein-coding predicted genes [16]). The improvement in the annotation of assembled transcripts of P. viridis could be explained by the higher sequence coverage obtained because of better sequence length (average contig length: 1,264 bp; N50: 2,868 bp), and the substantial annotated genomic data available from the recently sequenced genome of C. gigas [16].
A total of 93,668 assembled transcripts and 66,692 assembled unigenes were predicted to have putative functions in protein domains based on InterProScan analysis. Gene Ontology (GO) functional terms were assigned according to NCBI nr annotations and the InterProScan results. From the global transcriptome, 21,262 assembled transcripts and 11,947 assembled unigenes were associated with at least one well-defined GO term. These were further classified into 31 functional groups (GO term of level 2) according to three main functional categories: biological process; cellular component; and molecular function (Additional file 3). The number of genes assigned to "metabolic process" (24.4%) was the highest under the biological process classification, while genes annotated with functions related to "cell" (64.8%) and "binding" (57.7%) were dominant for the cellular component and molecular function categories, respectively. The composition of various GO terms for genes from individual tissue types and sexes can be seen in Additional file 4. Variations in the GO term compositions across the three tissue types and between the sexes were not apparent. Kinoshita et al. [15] also reported that there was no difference in GO composition of the annotated genes from the pallium, mental edge and pearl sac of the pearl oyster Pinctada fucata. However, a strong tissue-specific expression pattern was found among the three different tissues of P. viridis that we examined, without any apparent sex-specific patterns when multivariate analyses were applied ( Figure 2A). A recent transcriptomic study on the Mediterranean mussel M. galloprovincialis also showed significant tissue-specific differences among four tissues with respect to transcript expression profiles in metabolic pathways [11]. We have successfully demonstrated that GO compositions did not vary among different tissues, but expression profiles exhibited distinctive tissue-specific patterns that were probably associated with the unique functions of the adductor muscle, gills, and the hepatopancreas.
For KEGG annotation, 5,131 assembled transcripts and 3,181 assembled unigenes were assigned to 329 KEGG pathways under the main groups: metabolism (1,126 genes); genetic information processing (928 genes); environmental information processing (597 genes); cellular processes (618 genes); organismal systems (746 genes); and human diseases pathway (874 genes) (Additional file 5). The functional annotations for GO terms and KEGG pathways offer a useful resource for future studies of specific pathways, cellular structures and protein functions of P. viridis. Subgroups of genes potentially involved in stress-associated responses and detoxification of xenobiotics are of particular interest to environmental researchers and ecotoxicologists.
Functional genes related to stress-associated response and xenobiotic degradation Tissue-specific expression patterns Based on GO term assignment and KEGG annotation, 187 assembled unigenes were annotated in "response to stress" under the level 3 classification for "biological process", while 96 assembled unigenes were associated with the pathways of "xenobiotics biodegradation and metabolism" under the KEGG metabolism pathway. It is  believed that genes from these two groups are highly relevant to the responses of P. viridis upon challenge with chemicals or other environmental factors. Multivariate analysis using principle coordinate analysis plot (PCO) revealed that expression patterns of the genes from these two categories exhibited distinct tissue-specific patterns ( Figure 2B and C). Additionally, the top 20 most expressed assembled unigenes of each tissue type from the two categories representing > 90% and > 60% of genes expressing from the "response to stress" and "xenobiotics biodegradation and metabolism" groups, respectively, are summarized (Tables 4 and 5). Compared with the top 20 genes under "response to stress" in each tissue, relatively higher expression levels were reported for genes in the hepatopancreas. In "xenobiotics biodegradation and metabolism", the top 20 genes in the gills and hepatopancreas exhibited higher expression levels than those in the adductor muscle ( Figure 3). The expression of some of the selected assembled unigenes involved in stress-associated responses and xenobiotic degradation were further validated using quantitative PCR (qPCR) assays (Additional file 6). Most of these genes also demonstrated tissue-specific expression patterns with generally higher expression levels noticed in the hepatopancreas and/or the gills compared with those in the adductor muscle.

Heat-shock protein (HSP) genes
Heat-shock proteins (HSPs) are a group of stressinducible chaperones that play crucial roles in modulating stress responses and offering cytoprotection [30]. HSPs in bivalves can be induced by various environmental stressors and chemical contaminants [31][32][33][34]. A number of HSPs were abundant among the top expressed genes in the "response to stress" category. These included HSP71, HSP90, HSP60, HSP24.1, HSP22, and other smaller HSPs.
Tissue-specific expression patterns of HSPs have been reported in bivalve species. At basal levels, the HSP70 gene in the oyster Crassostrea hongkongensis was more highly expressed in muscle than in other tissues such as gills, the digestive gland, mantle and heart. This gene was induced differentially and in a time-dependent manner in various tissues, with a faster and more potent response noticed in the digestive gland upon heat stress [32]. Expression of some of the identified HSP genes in individual tissues were also validated using qPCR analysis (Additional file 6). HSP60 was expressed to a greater extent in the gills and hepatopancreas than in the adductor muscle, and was generally up-regulated by chemical exposure (Table 4, Additional file 6). The small HSP encoded by HSP22 was present at high levels in the adductor muscle, and the protein encoded by HSP24.1 was highly expressed in the gills and hepatopancreas but generally down-regulated by chemical stressors (Table 4, Additional file 6). A previous study suggested that some of the small HSPs in the mussel would play an essential chaperone role in maintaining cytoskeletal structural elements under adverse conditions induced by a combination of stressors when other HSPs, such as HSP70 and HSP90, were suppressed [35]. Other genes encoding HSPs were also annotated in the present study (Additional file 7), including members from the HSP40, HSP60, HSP70, HSP90 classes; a total of 19 members of HSPs were annotated for P. viridis.

Cytochrome P450 (CYP) genes
With respect to detoxification, members involved in the metabolism of xenobiotics via the cytochrome P450 pathway were particularly abundant among the top expressed genes under the KEGG pathways of "xenobiotics biodegradation and metabolism", making up 40%, 60% and 50% of that category in the adductor muscle, gills and hepatopancreas of P. viridis respectively. The CYP genes encode one of the largest groups of enzymes that catalyze the oxidation of organic substances; they and play a central role in the Phase I detoxification system, and have a range of diverse functions in endogenous metabolism [36]. CYPs are classified into different clans according to homology between families; 11 clans were reported for animals [37], with at least four main clans (CYP2, CYP3, CYP4 clans and mitochondrial clan) being found in protostomes including mollusks [36,38]. In our study, 14 families (33 members) of CYP genes were annotated for P. viridis  Zhikong scallop Azumapecten farreri (formerly known as Chlamys farreri) via RNA sequencing analysis. With the completion of the C. gigas genome project, the number of CYP genes identified in oysters has further increased to 136 genes [16]. The majority of annotated CYP genes (11 members) in P. viridis were identified as members of clan 2 (Additional file 8).
The CYP genes of marine invertebrates, including mollusks, polychaetes and crustaceans, have been observed in a wide range of tissues and are generally enriched in organs associated with food processing [36,41]. CYPs show tissue specificity with respect to their functions; for instance, CYPs in hepatic tissues are generally involved in the detoxification of a wide range of xenobiotics [36]. A member from CYP family 3 in the xenobiotics degradation pathway was one of the top 20 expressed genes in the hepatopancreas of P. viridis. The mRNA expression levels of this CYP were highest in this tissue (Additional file 6). A member of the CYP12 family belonged to the mitochondrial clan and showed high expression levels in the gills, and was up-regulated by chemical stressors (Additional file 6). Although the function of CYP12 genes has not been extensively studied in mussels, previous reports suggest that these genes are capable of metabolizing xenobiotics such as pesticides in arthropods [42,43].

Glutathione S-transferase (GST) genes
Another predominant group contributing to the KEGG pathways of "xenobiotic biodegradation and metabolism" was the group of genes encoding GSTs. These genes comprised 25, 35 and 15% of the top expressed genes in the gills, adductor muscle, and hepatopancreas, respectively ( Table 5). The GST proteins are a group of multifunctional enzymes that catalyze the conjunction and detoxification of xenobiotics during phase II enzyme detoxification [44]. Three major families of GSTs (cytosolic, mitochondrial, and microsomal) and at least 15 different classes of GSTs (alpha, beta, delta, epsilon, kappa, lambda, mu, omega, phi, pi, sigma, tau, theta, zeta, and rho) have been identified in numerous organisms based on substrate specificity, and catalytic and immunological properties [44]. To date, only a limited number of GSTs have been characterized in mussels. GST pi and omega were identified in P. viridis [19], while alpha, pi and sigma classes have been identified in M. galloprovincialis [45,46], and pi class GST was found in M. edulis [47]. Characterization of GSTs has been conducted for other marine bivalves, such as oysters (C. gigas [48] and C. ariakensis [49]), and clams (R. philippinarum [50][51][52], Laternula elliptica [53,54], Mercenaria mercenaria [55], Solen grandis [56] and A. farreri [57]). From our analysis 19 GSTs were annotated for P. viridis, with most of them matching entries for C. gigas and M. galloprovincialis (Additional file 9). At least six classes of GSTs (alpha, pi, sigma, omega, theta and zeta), along with microsomal GSTs, and the C-and N-terminal domains were annotated for P. viridis (Additional file 9). The tissue-specific expression patterns of GST genes in bivalves have been previously shown [19,51]. A recent study on P. viridis showed that GST pi and omega classes showed higher transcriptional levels in the hepatopancreas than in the gonad, gills and mantle [19]. It was also reported that GST pi and GST sigma 1 in M. galloprovincialis were more highly expressed in the hepatopancreas than in the gills; GSTs alpha, sigma 2 and sigma 3 were found to be more abundant in hemocytes than in the hepatopancreas, gills, mantle, gonad and muscle [45,46]. We also noticed strong tissue-specific expression patterns for the GST genes of P. viridis (Table 5). These strong tissue-specific expression patterns were also demonstrated by gene expression analysis for some of the selected GST members from the three tissues we examined (Additional file 6). Studies involving microarrays have also shown that differential expression of various GST genes from the gills and digestive gland of R. philippinarum were detected in samples collected from polluted areas [13]. Our novel findings regarding various GST genes in P. viridis clearly show that tissue-and classspecificity should be considered in future application of GST genes as biomarkers in biomonitoring.

Conclusions
A de novo transcriptome of P viridis has been successfully generated; this transcriptome was based on the mRNA from three target tissues (adductor muscle, gills and hepatopancreas). Tissue-specific patterns were associated with transcriptome size, expression profiles of genes in GO terms, and subgroups of genes associated with stress responses and degradation of xenobiotics. Our findings provide an informative transcriptomic platform for further in-depth mechanistic environmental studies using the biomonitor P. viridis, and at the same time emphasizes the advantage of tissue-specific approaches in future investigations.

Sample collection and preparation
We obtained adult P viridis (4-5 cm shell length; Figure 4) for RNA sequencing from the field (field-type) and from mariculture farms. Animals from mariculture farms were exposed to various physical and chemical stressors (exposed-type) so as to obtain a wide spectrum of transcripts associated with environmental conditions. A similar approach was employed in the generation of other transcriptome databases [58]. According to the Animals (Control of Experiments) Ordinance, Chapter 340 (Department of Health, Hong Kong SAR), there is no requirement for permission to perform experiments involving mussels (invertebrate). The field-type samples were collected from different locations in Hong Kong waters. Locations encompassed the western (Butterfly Beach, 22°22'19.73"N and 113°57'38.31"E), southern (Po Toi, 22°9'49.61"N and 114°15'7.73"E) and eastern waters (Tung Lung Island, 22°15'24.16"N and 114°17'13.76"E; Sam Mun Tsai, 22°27'8.87"N and 114°12'28.90"E) as each area had different hydrographic condition. The field-type samples were dissected to obtain target tissues (gills, adductor muscle and hepatopancreas) within 24 h of collection. Sex was determined by examining the external characteristic of the gonad, that is a female gonad was in bright orange brick red colour while male gonad was in milky colour [9].
Two mussels, a male and a female, from each field site and from each exposure treatment (n = 44 mussels) were chosen for dissection to obtain the tissue samples of adductor muscle (ad), gills (g) and hepatopancreas (hp). These three types of tissue we examined were common targets that have been used in environmental studies [3].
Dissected tissue samples were immediately fixed in RNAlater® (Applied Biosystems, Warrington, UK), and then stored at −20°C until required.

RNA preparation and sequencing
Total RNA for each sample was extracted with the RNeasy® Mini Kit and digested with DNase I following the manufacturer's instructions (QIAGEN, Hilden, Germany). Integrity and size distribution of RNA samples were verified using an Agilent 2100 Bioanalyser (Agilent Technologies, Germany). Samples with an RNA Integrity Number ≥ 8.0 were used for cDNA library preparation. The concentration of RNA in each extracted sample was measured using a Qubit 1.0 Fluorometer (Invitrogen, Carlsbad, USA).
Six cDNA libraries were generated from the three target tissues (male-hp, female-hp, male-g, female-g, male-ad, and female-ad), with each library comprising pooled samples of field-type and exposed-type mussels. The cDNA libraries were assigned to six individual sequencing lanes, and one dedicated PhiX control lane for matrix and phasing calculations was applied during sequencing. The cDNA libraries were prepared using Illumina® TruSeq RNA Sample Preparation Kits v2 (Catalog # RS-122-2001), and the Illumina sequencing was performed at the Centre of Genomic Sciences (The University of Hong Kong). In brief, poly-A containing mRNA was collected using poly-T oligo-attached magnetic beads. The purified mRNA was broken down into short fragments and was applied as a template to synthesize the first-strand cDNA using a random hexamer-primer and reverse transcriptase (SuperScript® II Reverse Transcriptase; Invitrogen, Catalog # 18064014). For second-strand cDNA synthesis, the mRNA template was removed and a replacement strand was generated to form double-stranded (ds) cDNA. The ds cDNA underwent end repair, 3' adenylation and indexed adaptor ligation, and the adaptorligated libraries were enriched by 10 cycles of PCR. Qualified and quantified libraries were then sequenced using an Illumina Genome Analyzer IIx (GAIIx; Illumina, California, USA). Image analysis and base calling was performed with SCS2.8/RTA1.8. FASTQ file generation and removal of failed reads were performed using CASAVA version 1.8.2.

De novo assembly and annotation
The raw reads from the six samples were merged, and then each read was pre-processed by the trimming of adaptors and the removal of 10 bp from the 3' end. Lowquality reads were filtered out based on three criteria: reads with more than 5% unknown "N" bases; reads having more than 50% bases with a quality value less than 10; and reads < 35 bp. The resulting high-quality reads were then used for de novo assembly with Trinity version trinityrnaseq_r2012-10-25 [62] using default parameters.
Trinity used the greedy k-mer extension (k-mer 25) to assemble raw reads into unique sequences of transcripts, followed by clustering of the contigs to construct complete de Bruijn graphs, and finally processed the individual graphs to generate full-length transcripts. The assembled transcripts were further clustered using CD-HIT [63] to reduce redundancy. Briefly, CD-HIT employed a 'longest sequence first' algorithm by identifying a representative sequence and then clustered each subsequent sequence as a redundant or a new cluster sequence based on a similarity score set at ≥ 95% for our analysis. The longest transcript in each gene cluster defined by Trinity was selected as a "unigene" for downstream analysis.
The annotation of assembled transcripts/unigenes was performed based on sequence similarity searches using BLASTx against the NCBI nr database (08 February 2013), and BLASTn and tBLASTn searches against the EST databases from the molluscan genome projects of Aplysia californica (GenBank Accession Number AASC00000000, 255,605 EST entries), Crassostrea gigas (AFTI00000000, 206,647 EST entries) and Lottia gigantea (AMQO00000000, 252,091 EST entries), with an E-value threshold of 1 × 10 −5 and a high-scoring segment pairs cut-off length of 33. Functional classification was carried out based on GO using Blast2GO [64]. GO terms were assigned and annotated to the transcripts/ unigenes according to three main categories: cell component; molecular function; and biological process. The presence of a conserved domain in transcript/unigene was analysed using InterProScan [65] within Blast2GO. KEGG pathways were assigned to the transcripts using the online KEGG Automatic Annotation Sever (KAAS; http://www.genome.jp/kegg/kaas/) using the bi-directional best-hit method.

Transcriptome quantification for individual tissues and sexes
The high-quality reads from the six libraries (2 sexes × 3 tissues) were individually mapped to the global assembled de novo trancriptome to estimate the abundance of transcripts/unigenes using RNA-Seq by Expectation Maximization (RSEM) version 1.2.3 [66] with a pipeline script provided by Trinity. High-quality reads were mapped to the transcripts and the resulting alignment file was inputted into RSEM to calculate the expression value for each transcript. Expression level was presented in a normalized expressed value as FPKM, which corrects for read differences in libraries and gene length. A transcript was only considered expressed when the FPKM cut-off value was ≥ 0.05.

Real-time qPCR assays
Tissue-specific expression patterns for 15 selected genes (Additional file 10) associated with stress responses/ detoxification were revealed using qPCR analysis. Mussel samples were chosen from selected treatments (i.e., control, 730 μg/L CdCl 2 , 13 mg/L nZnO, 40 μg/L DDT, and 0.32 μg/L TPTCl) of the exposed-type that used in the library generation. Primer pairs for qPCR assays were designed using Primer3 version 0.4.0 (http://bioinfo.ut.ee/ primer3-0.4.0/). For amplification, 1 mM reverse and forward primers and 2 μL of cDNA were used per 20 μL reaction on a CFX96™ Real-Time System (Bio-Rad, Hercules, USA) with iQ™ SYBR® green supermix (Bio-Rad, Philadelphia, USA), following the manufacturer's recommended protocol and default settings . Thermal cycling conditions involved 3 min of pre-heating at 95°C, followed by 40 cycles of amplification (10 s at 95°C, 30 s at 60°C); a melt curve analysis was performed after the 40th cycle (10 s at 95°C, then 65-95°C in 0.5°C increments every 5 s). Relative expression levels of target genes were calculated using the 2 -ΔΔC T method [67], with the 18S rRNA gene used as a reference and pooled samples from all tissues as calibrator samples. The 18S gene was shown to be a suitable reference gene in mussels and its expression was stable among different chemical treatments [18,68]. Expression of the 18S gene was not significantly different among the various control and treatment groups (ANOVA, p > 0.05), and between sexes (t-test, p > 0.05).