qRT-PCR evaluation of the transcriptional response of zebra mussel to heavy metals

Background The transcriptional response of adult zebra mussels (Dreissena polymorpha) to heavy metals (mercury, copper, and cadmium) was analyzed by quantitative Real-Time Polymerase Chain Reaction (qRT-PCR) to study the coordinated regulation of different metal-, oxidative stress- and xenobiotic defence-related genes in gills and digestive gland. Regulatory network analyses allowed the comparison of this response between different species and taxa. Results Chemometric analyses allowed identifying the effects of these metals clearly separating control and treated samples of both tissues. Interactions between the different genes, either in the same or between both tissues, were analysed to identify correlations and to propose stress-related genes’ regulatory networks. These networks were finally compared with existing data from human, mouse, zebrafish, Drosophila and the roundworm to evaluate their mechanistically-known response to metals (and to stressors in general) with the correlations observed in the still poorly-known, invasive zebra mussel. Conclusions Our analyses found a general conservation of regulation genes and of their interactions among the different considered species, and may serve as a guide to extrapolate regulatory data from model species to lesser-known environmentally (or medically) relevant species. Electronic supplementary material The online version of this article (doi:10.1186/s12864-015-1567-4) contains supplementary material, which is available to authorized users.


Background
The survival of organisms to the ever-changing environmental conditions depends on their capacity to cope with the multiple stressors they are exposed to. The coordinated activation of different stress mechanisms is a fundamental element of the overall response to pollutants and to other potentially deleterious external inputs [1]. On the very roots of these coordinated responses lies an intricate network of regulatory elements at genetic level, adapting the cell metabolism first to survive to the sudden external changes and afterwards to acclimate to the new, and often unfavourable, conditions. DNA microarrays (and ultimately, high-throughput sequencing) are the standard instrumental technique to monitor changes in gene expression of essentially all genes [2]. There has been an increasing interest in the literature on chemometric data pre-treatment and data analysis methods dealing with microarray data [3,4]. However, other instrumental techniques can also monitor gene expression variations in multiple samples. One of these techniques is the quantitative Real-Time Polymerase Chain Reaction (qRT-PCR) that allows detecting and quantifying target DNA molecules [5]. The main advantage of this method is that it allows quantitation of changes in mRNA levels (usually related to gene expression variations) in a very wide range of values (>10 7 -folds), resulting in assays with very high sensibility, selectivity, and reproducibility [5,6]. In addition, high-throughput systems allow analysing hundreds of transcripts for many samples simultaneously, which allow obtaining a large quantity of data in a single experiment. Different studies using qRT-PCR have appeared in the recent years in the literature, studying the response of different organisms at the gene expression level in so diverse research fields such as drug discovery, cancer research, environmental assays [7][8][9][10][11]. However, the analysis of qRT-PCR data by means of chemometric methods has not yet received the same attention as the analysis of DNA microarrays data, and only a small number of studies about this topic can be found in the literature [12][13][14].
In this work, variations in the gene expression of the zebra mussel (Dreissena polymorpha) associated with environmental stresses, such as the presence of pollutants, are investigated by means of chemometric analysis of qRT-PCR data. This freshwater mussel species has been selected due to its invasive character, which brought it to expand from its natural geographic distribution in the Caspian and Black seas to a real worldwide distribution in the last few decades [15]. In some places, this expansion has led to large infestations with significant economic and environmental consequences [15]. One of these colonisations has occurred in the Ebro river basin (North East Spain) where it has become a danger to native species [16]. Zebra mussel is the only freshwater bivalve that can be legally collected for environmental monitoring. This circumstance, together with the known ability of the zebra mussel to bioaccumulate contaminants, has increased the interest of this species as a sentinel notably for biomonitoring purposes and quality control of water ecosystems [17][18][19][20][21][22][23].
As a training dataset, we used previously reported qRT-PCR data from gills and digestive glands of adult zebra mussels exposed to different heavy metals concentrations (copper, cadmium and mercury) [22]. Several multivariate data analysis approaches have been tested with the final goal of monitoring and distinguishing between effects caused by heavy metals and exposure time, and with the goal of identifying the genes most affected by the investigated pollutants. Finally, biological interpretation has been obtained from a comparison with genetic and regulatory networks in different model species. Figure 1 shows the mean-centred data matrix composed of 40 samples and 18 genes. The visual representation of this data matrix did not show any feature easy to be interpreted. For instance, the heat map representation of the data (Figure 1a) did not allow gathering any relevant information about possible relationships between genes and samples directly. Therefore, different multivariate data analysis methods were tested to investigate relationships between genes and samples.

Graphical investigation of gene correlations
Relationships between genes from the same and different tissues were investigated. Correlation matrix plot between the 18 considered genes is shown in Figure 2a. From this representation, a preliminary interpretation can be obtained. First, it is worth to focus the attention on correlations between genes from the same tissue. Close to the diagonal of the correlation matrix (samples 1 to 9 for gills, and samples 10 to 18 for digestive gland), genes had positive correlation values whereas genes Figure 1 Representation of qRT-PCR experimental data. a) Heat map representation of the mean-centered data (40 samples and 18 genes according to Additional file 3 Figure S2). Plot of the experimental data b) before, and c) after mean-centering. placed farther from the diagonal (corresponding to the other tissue) showed either no correlation or inverse (negative) correlation. Next, correlation values between SOD, CAT and COI genes of digestive glands showed high positive correlations, as well as for gill tissues, but of lower intensity.
Correlation diagrams showed relationships already commented above, but a deeper analysis of the data was attempted to extract more information. A representation of the correlation matrix as a gene network is shown in Figure 2b. This plot shows relationships between genes in a clearer and quicker way. Interpretation of this network diagram demonstrated that there were no relevant relations between gene expressions of the two considered tissues (correlations between genes from different tissues were weak). In contrast, relationships between genes from the same tissue were strong. So, in digestive glands a cluster of genes with strong correlations included SOD, COI, CAT and, also, HSP90. Other genes such as GPx, GST or MT showed weaker correlations. For gill tissue genes, correlation between SOD and CAT was also high, although, in this case, correlation with GST was greater than that for COI. It is also worth to mention the behaviour of the P-gp1 gene. In digestive gland tissue, this gene showed a weak correlation with the SOD-COI-CAT cluster, wherein of gills, this correlation was inappreciable. Conversely, there was a strong correlation between the expressions of these genes in both tissues, which was the only case of such an intertissue correlation observed for this dataset. Finally, the behaviour of the gill GPx gene did not show any correlation with any other of the considered genes from either tissue.
Similar results were obtained when experimental data were analysed by means of unsupervised hierarchical clustering. In this case, no previous information was provided to the algorithm and genes were clustered iteratively in an agglomerative manner using Ward's [24] and Euclidian distance methods.
In dendrogram of Figure 3, genes GPx and HSP70 from gills had a totally different behaviour since they did not show any similarity with other genes. Apart from them, two main groups of genes were distinguished which could be assigned to either of the two investigated tissues. In the upper part of the dendrogram, genes were related to digestive gland tissue whereas those in its lower part were related to gills tissue. It is worth to highlight that SOD and GST genes from gills were located in the branch of the dendrogram associated with the digestive gland, which is probably due to the similarity between the gene expression variations caused by heavy metals in digestive glands (specially GST and, in a minor extent, SOD, COI and GPx), and that of SOD and GST in gills. It is also important to point out that genes of digestive glands that appeared in the branch of the dendrogram mostly associated to gills are HSP70 (which was the gene with the most different behaviour) and P-gp1 (which formed a cluster with the same gene from gills).
As in the previous case, Figure 3b shows the gene network diagram built from the reciprocal of the distances obtained between genes in the hierarchical clustering approach described above. This representation showed some advantages with respect to previous dendrogram approach. For instance, relationships between genes can be seen in a more intuitive and visual way allowing an easier association of the genes from different clusters.
In this figure, clusters built up by the genes were most strongly correlated in both gills and digestive glands. SOD, COI, GST and CAT formed a group because of their correlation in both tissues. Relations between genes in both tissues can be observed at the edges that connect nodes of gills and digestive glands. For instance, the relationship between SOD genes in gills and GPx genes in digestive glands were stronger than other relations observed for genes of the same tissues. Finally, HSP70 and GPx in gills and HSP70 in digestive glands were confirmed not to have any correlation with the other investigated genes.
When interpretation of these results using nonsupervised data analysis methods is complemented with the information available in the literature, it is observed that genes with stronger correlations in both gills and digestive glands (SOD, CAT, CAI, GST, and GPx) are known to be related to the oxidative metabolism which is significantly affected by heavy metals exposure.

Principal component analysis results
Principal component analysis (PCA) is used to extract relevant information about samples clustering and effects of metal exposure treatments. The information given by PCA analysis will be compared with that obtained through graphical means in the previous section. First principal component, PC1, already explained 45.1% of the observed variance of the experimental data. The rest of principal components explained a lower amount of experimental variance: PC2 -19.5%, PC3 -10.5% and PC4 -7.7%. From PC5, the amount of explained variance was lower than 5%. Figure 4 shows the scatter scores plots that related the first two principal components with the samples labelled according to exposure time, treatment, and the combination of exposure time and treatment. Additional file 1 Figure S1 in shows the scatter plots considering the first three components. In these plots, control samples were close to the origin due to mean-subtraction pre-treatment of control samples prior to data analysis.
From the analysis of these plots, it can be observed the influence of the exposure time on PC1 (Figures 4a). Samples with one-day exposure time showed high negative values while samples with one-week exposure time appeared closer to the origin of coordinates. In Figures 4b Figure 3 Clustering analysis. a) Dendrogram showing the relations between genes, and b) Gene network corresponding based on the matrix of distances used to build the dendrogram (width of the lines is inversely proportional to the distance).
separation of samples according to heavy metal treatment is presented. In PC2 vs. PC1 plots, Cu -Hg -Cd could be differentiated along PC2 (samples related with each applied metal could be seen). If PC3 was considered, a cluster of Cd treated samples was separated from the rest of samples.
Considering both treatment effects simultaneously (metal and exposure time), the exposure time separated each metal cluster into two sub-clusters within each type of metal group. For instance, in the PC2 vs. PC1 scatter plot ( Figure 4c) the three considered metals could be clearly distinguished when considering one-day samples whereas this differentiation was not so obvious when considering one-week samples.
In Figure 5, loading plots identified what genes were the most related with each principal component. Differentiation between gills and digestive gland genes was mainly displayed by the second principal component. Genes associated with gills at PC2 had significantly lower values than genes linked to digestive glands (Figure 5a-2 nd plot, and Figures 5b and c). Therefore, two types of gene clusters were differentiated based on the tissue from which they were obtained.
In Figure 5a, PC1 was mainly influenced by the HSP70 gene from both tissues. The main effect observed on PC2 was showing the separation of genes by tissue as discussed above. Genes that exhibited a higher contribution on PC2 were HSP90 and CAT in case of digestive glands, and GPx and HSP70 in the case of gills. SOD, CAT and COI (and in a lesser extent GST and GPx) showed a significant contribution only for genes of digestive glands in PC2 ( Figure 5a). However, all genes behaved similarly in both tissues. This could be checked in the scatter loadings plots where genes were close to each other for each tissue and, in addition, they were in a closer region when both tissues were considered.
Summarizing PCA results, PC1 could be related with the exposure time of samples. Among genes that mostly contributed to PC1, the HSP70 gene could be identified as the one showing higher positive score values (in both tissues). This indicates that this gene allowed differentiating samples according to accumulative toxic effects across time. On the same manner, the diagonal trend in PC2 vs. PC1 scores plot enabled the differentiation among samples as a function of heavy metal treatment.
From gene loadings, HSP90 and CAT of the digestive glands were correlated with copper treated samples while HSP70 of gills was mainly correlated to cadmium treated samples. The determination of the genes most correlated with the Hg treated samples was not straightforward due to their closeness to the origin.
From PC2 loadings plots differentiation of genes according to tissue type was possible. Samples behaved differently and only in the case of cadmium treated samples, a cluster was identified. Cd treated samples with one-day of exposure time showed negative values of PC2 which could be related to gills genes expression, while Cd treated samples with one-week of exposure time showed positive PC2 values which could be linked with digestive gland genes.

ANOVA simultaneous component analysis results
For ANOVA simultaneous component analysis (ASCA), the data matrix was rearranged as can be seen in Figure 6a. Note that in ASCA, ANOVA is applied to multivariate gene responses. Three experimental factors were considered in the ASCA analysis: tissues (gills or digestive gland), exposure time (one or seven days) and type of treatment (control, cadmium, mercury or copper).
Statistical significance of these factors was estimated by using a permutation test approach. In this work, the number of permutations was set to 100000. Only individual effects of exposure time X e (p time = 0.00505) and treatment X t (p treatment = 0.00003) were significant, and allow rejecting the null hypothesis H 0 . In all the other cases, the null hypothesis (H 1 ) was accepted (there was no significant effect of the considered factor or interaction). The triple interaction treatment-tissue-time and the double interactions tissue-time, tissue-treatment, and time-treatment provided p-values rather close to 1. Finally, individual effects of tissue (X T ) was not considered statistically significant (p tissue = 0.4540). Figure 6 shows the representation of scores and loadings matrices related to the individual effect of exposure time and metal treatment. For exposure time individual factor matrix, only one principal component was needed to explain most of the variance. Figure 6b shows the projected scores where one-day and one-week samples can be distinguished. Loadings plot (Figure 6c) displays the high influence of the HSP90, MT and P-gp1 genes in the two first principal components.
For metal treatment, three principal components were needed. Effects in projected scores ( Figure 6d) and loadings (Figure 6e) can be interpreted in a similar way than that for the PCA analysis. Treated samples could be clearly distinguished from the control samples on the first principal component whereas the second component allowed grouping the different metal treatments with some overlapping. In the case of the loadings plot, effects of gene HSP70 and MT were distinguished. These results also were concordant with those obtained in the PCA analysis. (Figure 6d).

Partial least squares discriminant analysis results
Partial least squares discriminant analysis models were used to identify the more discriminant variables among different type of samples considering exposure time and metal treatment as possible factors.
In the case of exposure time, two PLS-DA models were built up: one for the discrimination of one-day samples and the other for the discrimination of one-week samples. In both cases, discrimination between samples classes achieved by the PLS-DA model was good as can be seen in the obtained sensitivity, specificity, and accuracy parameters (see Table 1). When VIP scores of the each PLS-DA model were considered (Figure 7a), the most relevant genes, for discriminating between samples, were obtained. In the case of one-day exposure time, the HSP70 gene (both in gills and digestive gland tissues) was the more discriminating variable. On a minor extent, MT gene of gills also allowed discriminating one-day samples. In the case of one-week exposure time, CAT (digestive gland) gene was the most relevant together with MT, HSP70, and HSP90, at a lower extent.
In the case of samples treated with heavy metals, three PLS-DA models were built up corresponding for each type of treatments (Cu-treated, Cd-treated and Hgtreated samples). Figures of merit of PLS-DA models shown in Table 1 indicated discrimination was acceptable in the three cases with similar results. VIP scores for the three models represented in Figure 7b were rather similar. HSP70 gene (both for gills and digestive glands tissues) was the more discriminant variable in the three cases, probably due to the strong effect of the exposure time discussed above. Apart from this strong influence on HSP70 genes, Cu treated samples were also influenced by MT (gills and digestive glands) and HSP90 (digestive glands) genes. In Cd treated samples, MT gene was the discriminating variable (in both tissues) and, also, in a minor extent, CAT and HSP90 genes. Finally, in Hg treated samples MT (gills) and CAT (digestive glands) genes were the more relevant discriminant variables.

Phylogenetic analysis of co-regulated genes
Genetic and regulatory interactions between stress genes in D. polymorpha (Figures 2 and 3) were also explored in different model species using the respective putative orthologs ( Table 2). While some uncertainties are unavoidable when adscribing orthologs for genes from D. polymorpha in other species, some of the co-regulatory   interactions shown in Figures 2 and 3 were also seen in phylogenetic distant species (Figure 8).
For example, GST-SOD and GST-CAT co-expression seem to be quasi-universal, at least within Metazoans, as well as the interactions between HSP90 and HSP70 ( Figure 9). Conversely, co-expression between MT and HSP90 was only found in two species (mouse and C. elegans), whereas COI and P-gp1 genetic interaction with other stress genes was rarely (if ever) observed in the other species (Figure 9). Given the wide evolutionary gap between D. polymorpha and vertebrates or D. melanogaster and C. elegans, the existence of common co-expression patterns indicates that the correlation analyses were able to define deeply rooted regulatory networks among Metazoans. GST, SOD, and CAT are part of the cellular defence mechanism against oxidative stress, although they act at  very different levels [25,26]. Similarly, HSP90 and HSP70 share the heat-shock responsive element (also found in some metallothionein genes, [27,25]), so its co-expression may well be mediated by this particular regulatory network. The mechanisms for the co-expression of COI (an essential mitochondrial component required for cellular respiration) and the components of the oxidative stress cluster seems to be a unique of Dreissena, and may indicate a subjacent defence mechanism not characterized yet. The same apply to the correlation between Pgp-1 and SOD, only observed in gills (Figure 9). The possible meaning of these observations will be only understood as our knowledge of the defence mechanisms in molluscs increases. Our results suggest some mechanism(s) linking the presence of stress agents (in this case, heavy metals) to at least four levels of cellular defence: 1) Out flux of the exogenous agent (Pgp-1); 2) Chelation and neutralization of divalent metals (MT); 3) Heat-shock response (HSP70 and HSP90), probably related to the presence of denatured proteins; and 4) Oxidative stress defence (GST, SOD, CAT). The ASCA analysis suggests a temporal gradation of these mechanisms, being HSP90 and Pgp-1 expression more related to the early response (one-day, Figure 6b-6c), and MT and HSP70 associated to the chronic exposure (Figure 6b-6c). At this point, it should be considered that longer exposure times imply two independent and somewhat contradictory mechanisms. In the first place, tissue damage may accumulate over time, increasing the toxic effects. At the same time, acclimation processes occur, by which cells (and tissues) compensate the presence of the toxicant and reduce its effective toxicity. These two opposite effects may well be the reason for the negative, quasilinear correlation of PC1 and PC2 scores in Figure 5b. The fact that the correlation analyses were able to identify these different defence modules and following a coexpression pattern similar to, or at least compatible with, those already known for reference model species demonstrate the utility of these statistical methods to explore regulatory networks in species, like D. polymorpha, for which very little genetic information is available.

Agent-and tissue-specificity of the stress response
While a direct evaluation of the severity of the toxic effects is not possible with the current data, clustering of the different samples shows that mercury-treated samples were closer to controls than Cd-or Cu-treated ones, suggesting that these two heavy metals were more toxic to D. polymorpha than mercury. This conclusion was also drawn from the preliminary analysis of this data [22] as well as from a transcriptomic analysis of D. polymorpha populations along a pollution gradient in the Ebro River (Spain, [23]). The pattern of response seemed to differ for the two analyzed tissues as shown in Figures 2 and 3. However, expression of two genes (Pgp-1 and MT) appeared to be highly coordinated in both tissues (Figures 2 and 3, see also [22]). It is important to note that these two genes directly interact with the toxic agent (extruding it out of the cell in the first case, and chelating it in the second case), whereas the other mechanisms are compensating potentially deleterious alterations in the cell components (oxidation, denatured proteins). Therefore, it is not unrealistic to think that Pgp-1 and MT expression reflected the effective concentration of the metals in both tissues (bound to be relatively similar), whereas expression of the other genes would depend upon the extent of these internal damages, which very likely differ for both cell types.

Conclusions
In this work, the application of different chemometric methods allowed the extraction of relevant information from qRT-PCR data. Results from different methods appeared to be complementary focusing on various data features. Information provided by gene network diagrams can make rather easy the interpretation of the possible correlations between investigated genes. Chemometric results showed that genes were clustered according to the type of tissue, and separation of samples Figure 9 Comparison of the co-expression gene data between species. a) bivariate correlations (see Figure 2) between zebra mussel data and the different model species shown in Figure 8. b) clustering analysis (see Figure 3). Only gene pairs with 2 the highest correlations and/or lowest distance values are considered. Distance values are represented as reciprocals in b). Green cells with plus signals indicate correlations also observed for the corresponding model species; pink cells with minus signals indicate otherwise. N/A in a yellow background indicates nonapplicable interspecies correlations.
was achieved according to their time evolution (one-day versus one-week treatment) and heavy metal treatment. It is remarkable the conservation of at least some of the regulatory networks within Metazoans, and the ability of the presented method to define these genetic interactions using only a limited number of experiments and conditions in species, such as D. polymorpha, for which very little genetic information is available.

Data studied
A short introduction about qRT-PCR is presented below. qRT-PCR measures the fluorescence of the PCR reaction products during a cycle threshold (C p ). Above this threshold value, the fluorescence of the samples is considered to be above the background contribution [5]. This C p value is related to the initial concentration of RNA that allows its absolute quantification, according to Equation 1: However, relative quantification is usually performed by calculating the difference between Cp values of the considered gene and of the housekeeping (control) DNA sequence [28,29]: In this work, qRT-PCR measurements allowed building up a data matrix of 120 rows (samples) and 24 columns (variables). These 120 rows included 3 technical replicates of 20 different mussel samples (5 control samples, 5 treated samples with Cd, 5 treated samples with Cu and 5 treated samples with Hg) measured at two different treatment times (1 day and 7 days after metal exposure), respectively. For each sample, 24 measurements were obtained corresponding to the expression responses of 12 selected genes (S3, EF1, BAct, MT, HSP70, HSP90, GST, SOD, GPx, CAT, COI, and P-gp1, details in Additional file 2 Table S1) from gills and digestive glands of the same individuals. More details about the experimental procedure related to data acquisition can be found at Navarro et al. [22].

Data preparation and pre-treatment
Experimental qRT-PCR data presented some initial problems that hampered their direct exploration. First, since approximately 8% of data values were missing, the average of the three technical replicates was calculated to obtain a value for each combination sample-gene. This strategy gave a total number of measurements reduced to 40. Next, for relative quantization estimations, one reference gene should be selected among those that explain minimum variance for both tissue types (gills and digestive gland). In both cases, the best housekeeping gene was S3. Other genes that showed minor variations across samples were EF1 or BAct, but they were not selected as a reference gene and therefore discarded for further analysis. Final size of experimental data matrix was 40 rows (20 samples after one day and 20 samples after one week of exposure) and 18 columns (9 genes for gills and 9 genes for digestive gland). A schematic representation of the dataset built up is shown in Additional file 1: Figure S1.
For ASCA analysis, this data matrix was rearranged with the goal of obtaining more information from the study. Gene measurements from different tissues were considered as different samples generating a final matrix of 80 rows (40 samples for gills and 40 samples for digestive glands) and 9 columns (corresponding to the 9 different genes) Finally, data were mean centred prior to chemometric analysis. Figure 1 shows experimental data after S3 reference gene subtraction before ( Figure 1b) and after mean centring pre-treatment ( Figure 1c). Moreover, mean responses of control samples for each tissue were subtracted for PCA and PLS-DA analysis.

Data mining and phylogenetic comparative analyses
Putative orthologues for the nine stress genes analysed in D. polymorpha (MT, HSP70, HSP90, GST, SOD, GPx, CAT, COI, and P-gp1) in five reference model species: human, Homo sapiens; mouse, Mus musculus; zebrafish, Danio rerio; the fruit fly Drosophila melanogaster; and the nematode Caenorhabditis elegans, were identified by the BLAST algorithm at NCBI server, (http://www.ncbi.nlm.nih.gov/blast/Blast.cgi) using the complete sequences of the corresponding D. polymorpha genes (Listed in Table 2). Genetic and regulatory interactions between the different genes in each model species were explored using the GeneMANIA web page (http://genemania.org) [30].

Data analysis methods
Gene inspection, clustering and networking Initially, the gene correlation data matrix was investigated. Two visualization tools were used in order to extract the main relevant information from this correlation matrix in a natural and intuitive manner. On one side, the heat map graphical representation of the correlation matrix was considered. An ellipse-based color codification was used to indicate if correlation values are positive (blue) or negative (red). Moreover, the length of the minor axis of the ellipse indicates the strength of the correlation (the shorter the minor axis is, the stronger correlation exists) [31]. On the other side, the qgraph tool of the R environment [32] was used to visualize gene interactions in a network. This tool allows for data representation in a simple network where each variable (gene) is a node, and each edge shows the correlation between two genes. The thickness of the line on these edges is related to the size of this correlation.
Secondly, non-supervised hierarchical clustering analysis (HCA) was used to display correlations between genes. Cluster analysis is used to classify objects, characterized by the values of a set of variables, into clusters or groups [33]; in such a way that one object within a cluster is more closely related to one object of the same cluster than to another object assigned to a different cluster. In order to build up these clusters or groups, a measurement of the similarity or distance between the various objects is needed. Examples of possible distance measures are the Euclidean, City block or Mahalanobis distances. Additionally, there are several agglomerative linkage cluster methods such as Nearest Neighbor, Furthest Neighbor, Centroid, Median or Ward's Method [24]. In this work, the Euclidean Distance and the agglomerative Ward's method have been selected.
Two different display outputs can be obtained using this clustering process, the dendrogram, which is a treelike diagram illustrating HCA clusters and the matrix of gene distances values [33]. Distances matrix can be used as an initial basis for visual representation of the gene network, as the correlation matrix. Similarly, gene network also displays the interactions and correlations between different genes.

Principal component analysis
Principal component analysis (PCA) is based on the fulfilment of a bilinear model that decomposes experimental data in the product of two factor matrices related respectively to sample (rows) and variable (columns) contributions, using a minimum number of components to explain most of the data variance [34]: In this Equation PCA, X is the experimental data matrix of size m samples (rows) and n genes (variables, columns), T is the factor matrix related to sample contributions (usually known as scores) of size m number of samples and Ns number of principal components selected in the analysis, P T corresponds to the matrix related to the gene contributions (to the variables, usually known as loadings) of size Ns number of principal components and n number of genes. Finally, matrix E (of size m samples and n genes) contains the variance not explained by the bilinear model for the considered number of principal components, Ns. Every one of these Ns components is characterized by two vector profiles related respectively to the individual samples responses and with its gene expression. Hence, from these two profiles associated with each principal component, biological interpretation of each one of these components can be inferred [35]. In summary, PCA has been used to identify relationships between treated samples according to their gene expression and, also, to investigate possible relations between genes.

ANOVA simultaneous component analysis
There are different data analysis approaches that combine the power of ANOVA with that of PCA to identify and separate variance sources [36,24]. In this work, the selected approach has been the ANOVA simultaneous component analysis (ASCA) method [37].
In ASCA, SCA (similar to previously described PCA) is applied separately to each effect matrix and to all possible interaction matrices. First, the data matrix X is split into effect matrices containing the level averages for each factor and interaction matrices that describes the interaction between the considered factors [38]. In the case of the three factors considered in this work (tissue, exposure time and metal treatment), this is written as: In this equation, X is the experimental data matrix of i rows and j variables, X is the grand mean data matrix, X T is the effect of tissue factor (gills or digestive glands), X e is the effect of exposure time factor (one-day or oneweek), X t is the effect of metal treatment factor (control samples, copper, cadmium or mercury treated samples), X Te is the interaction of tissue and exposure time factors, X Tt is the interaction of tissue and metal treatment factors, X et is the interaction of exposure time, and metal treatment factors and X Tet is the global interaction of tissue, exposure time and metal treatment factors. In addition, 1 is a vector of ones of i rows and m is a vector of the overall means of the experimental matrix (j rows). For each submodel of factors or interactions, there are the associated component scores (T T , T e , T t , T Te , T Tt , T et and T Tet ) and component loadings (P T T , P T e , P T t , P T Te , P T Tt , P T et and P T Tet ). Finally, E corresponds to the residuals of all submodels of the global ASCA model: E = E T + E e + E t + E Te + E Tt + E et + E Tet .
Since different PCA models are fitted to each effect matrix that contains the averages of the measurements with the same factor settings, they do not represent the natural variation of the data [37]. This fact causes that the ASCA scores do not show the variation between replicates for each combination of factor levels. Hence, the estimation of the replicates variation in the PCA subspace of a factor is given by Equation 5: The projection for each factor Y k describes the variation among replicates in the principal component subspace of the considered factor k. Effect matrix (X k ), residual matrix (E) and loadings (P k ) are used to obtain the projection matrix.
The assessment of the statistically significance of the effects of all factors and of their interactions is checked under the null hypothesis H 0 of no experimental effect (no difference between the level averages of the effect matrices) against the alterative hypothesis of the presence of an experimental effect with a p confidence level. The estimation of this p-value is obtained by a permutation test, in which the original data matrix is permuted a number of times and the sum of the squares (SSQ) of the k effect matrix is recalculated (i.e. 100000 permutations) [39]: where the first summation (i) correspond to the total number of samples (N) and the second summation (m) to the considered principal component (maximum 2). The probability p-value is estimated from the number of permutations that give an SSQ value that is larger than the SSQ obtained for the experimental data. expression analysis. JJ and RT performed statistical data analysis. BP performed phylogenetic comparative analyses. JJ, RT and BP drafted the manuscript. All authors read and approved the final manuscript.