In the present study we profiled transcriptomic libraries of two confamiliar fish species from different temperature realms for signatures of thermal adaptation at the sequence level. The phylogenetic analyses (Figure1A, B) indicate a close relationship between P. brachycephalum and Z. viviparus, which is supported by transcriptomic sequence comparisons finding a pool of 4,155 comparable sequence segments with a mean identity of 96%. As only minor sequence changes are sufficient to adapt kinetic properties of orthologous proteins, this tight relationship is essential for identifying a potential thermal signature at the genomic and transcriptomic level in species from thermally distinct habitats.
Habitat temperature may be an important trigger for the observed patterns. As Z. viviparus faces similar cold temperatures in winter like P. brachycephalum, a signature for thermal shifts may appear. As optimal growth performance constitutes an important marker for maximal protein synthesis, which was determined for the upper third of the temperature window[22, 25], we postulate that the transcriptome is optimized to this temperature range and not to the cold edge, where metabolic activity and performance of the eurythermal species is comparably low.
It is important to note the assumptions we made to argue that the observed changes are adaptive. Garland and Adolph[32] generally argued against comparisons of only two species for making appropriate conclusions about adaptation, since the probability for finding a difference for any trait between two species is always 50% and no valid null hypothesis can be made. According to their critical statement[32], the limitation could be overcome by analysing several traits (about 5 independent) as this will reduce the type I error rate to the excepted rate (p < 0.05). As we used several thousand mRNA/protein sequences for the analyses from the beginning, we can assume that there are still enough independent traits left to reach a p < 0.05. This is essentially true for the analyses of the synonymous codon exchanges, which have no impact on function and can be stated as “neutral” at best. For the non-synonymous codon exchanges we can postulate the same, as the exchange of one specific amino acid in one protein will not cause a definite amino acid in another protein, even if both proteins belong to the same trait. If only a genetic drift has taken place after separation of the two species from the common ancestor, no change in codon or amino acid usage would be expected when analyzing thousands of traits. Since we found a clear signal, we have to conclude that this change is adaptive. The question remaining to be answered is whether the driving force is temperature (alone) or a combination with further factors (see below).
Codon and amino acid usage were found highly correlated with expression levels at least in fast growing bacteria and yeast[33]. Specific needs of an organism may shape a specific codon-usage-profile with a concomitant optimization in the translation apparatus (tRNA amount, stability of secondary structures of RNA, amino acid levels, etc.). Here, we compared qualitatively the profiles of two (related) species. The normalized cDNA libraries were synthesized with the same protocol by the same laboratory but sequenced with different procedures. Nevertheless, both libraries proved to be comparable on the level of transcript diversity (Figure2) as well as qualitatively on the level of attributable functions (c.f. Figure3). Comparable subsets of transcriptomic sequences comprising reliable sequence information for alignments on amino acid and RNA level were generated based on the BLAST results, covering 40 to 60% of functions seen in a related reference genome. In summary, shifts in codon usage between the species seem not biased by expression level per se but possibly by evolutionary forces requiring specific expression levels in a certain (thermal) environment. This aspect is not further addressed within this study.
Functional coverage compared to a reference genome
It is noteworthy that translated coding sequences of G. aculeatus are part of the eggNOG orthologies. Therefore, we expect that a comparison in a subset of these orthologies, i.e., metazoan (meNOGs), is useful to serve as a scale for inter-comparisons of the present cDNA libraries. Nevertheless, comparisons with G. aculeatus genomic coding sequences might show an overestimation of coverage due to the unresolved multiplicities by splice variants present in the assemblies of the transcriptomes. This fact might explain the observed higher coverage of functional terms in the KOG/COG categories N (cell motility) and W (extracellular structures) of the zoarcid fish compared to the reference genome.
Screening for occurrence of library-specific terms in principle allows for detection of signatures of the environmental conditions under which the transcriptomes were obtained, and not necessarily indicates presence or absence of genes in one of the species genomes. As a result transcripts related to COI were detected in elevated proportion in Z. viviparus, which is against the expectation that the normalization step in library preparation should reduce over-abundant RNA-variants in both libraries to a comparable degree. The same holds for the Antarctic species, for which the frequency of ubiquitin annotation appears in a higher rank. Discovering library specific functional terms, which would be expected to be included in similar ranks in both transcriptomes, reflects differences in the efficiency of normalization.
Other terms, specific for the cDNA library of Z. viviparus are related to complement protein (meNOG13752) and extracellular matrix constituent protein (meNOG18150). These terms summarize functions for immune response and integrity of the tissue, comprising cell communication, growth factors and wound healing. The term for tripartite motif (TRIM) proteins, which is predominantly found in the P. brachycephalum library, summarizes sequences of a protein family associated to pathogen-recognition, regulation of the concomitant transcriptional responses, constituting an important part of the immune system[34]. These differences may indicate adaptive forces acting on the immune system or simply different life histories of the sampled specimens. Responses of the innate immune system in fish were apparently more robust and diverse than in higher vertebrates[35]. Similarly, e.g., Atlantic cod comprises a unique immune system with substantial gene losses in adaptive components without being exceptionally susceptible to disease under natural conditions[36]. Consequently specific adaptations in the immune systems in the two eelpout species due to the different habitat conditions seem possible.
Another overrepresented term in P. brachycephalum is a carboxylase, which belongs to the KOG/COG category I - lipid transport and metabolism, as listed in the meNOG tables. The lipid fraction in liver of P. brachycephalum kept at 0°C is threefold higher than in Z. viviparus at habitat temperature[37], in line with the general finding that Antarctic species have high capacities for catabolism of fatty acids[38]. Furthermore, a recent study demonstrated relationships between preferred energy fuels and transcript levels of respective genes in P. brachycephalum[39]. Although the overrepresentation of the carboxylase fits into the current picture, further genes could have been expected to occur substantiating this evidence. This mismatch may diminish when the entire genome sequence data will become available. In summary, some of the library specific functions identified in our study may point to transcriptomic and possibly genomic contrasts due to the different environmental conditions of the discrete habitats of the two zoarcid fish under study.
Amino acid usage
With the advent of more genomic data from species from various habitats analyses of temperature effects on amino acid usage profiles and the correlated GC content became possible[13, 40]. Gu and Hilser[41] analyzed intra-protein interaction energies on the level of primary and secondary structural sequence segments in thermophilic and psychrophilic proteomes, supporting the flexibility hypothesis of cold adaptation. Furthermore, local and global adaptation patterns were differentiated, depending on different protein families.
By comparing the amino acid composition of a comprehensive set of orthologous sequence pairs globally we discovered a distinct pattern of replacements, which may result in structurally important changes in the functional structure of proteins supporting cold adaptation of primary protein structures (Figure5, Table2). In the following section all observed changes of amino acid usage are discussed in respect of the Antarctic species when compared to the Z. viviparus transcriptome.
The most obvious shifts found for P. brachycephalum are the net loss of glutamic acid and asparagine and the gain of serine (compare values of Table2 and Figure5). The frequency of all acidic residues is reduced and contrasts with slightly higher frequencies for the basic amino acids lysine and arginine as well as histidine. Due to the eminent loss of glutamic acid, the total amount of charged residues for stabilizing salt bridges is reduced. This may contribute to weaker interaction involved in substrate binding and protein interaction. The net loss of glutamic acid and a preferential exchange with 85 aspartic acid residues preserves the acidic function at reduced flexibility of the side chain. These exchanges may cause increments in charge density at the surface favouring increased solvent interaction as postulated earlier[11, 42]. A reduction of salt links results from the exchange of the charged glutamic acid by polar glutamine (30 Q over E), in line with earlier assumptions drawn from analyses of individual proteins[11, 42]. Arginine as a bulky amino acid was identified to be less abundant in cold-adapted species[11, 42]. However, replacements of 30 serine into arginine and a slight increase of arginine on a global level contrasts with both former assumptions.
Based on genomic comparisons Wang and Lercher[30] developed a simple, reduced predictor based on replacement frequencies of the charged amino acids glutamine (E), arginine (R) and lysine (K), which were found most often exchanged in a thermal cline. Higher values for the ERK-proxy are found to be characteristic for hyperthermophiles compared to thermophiles and mesophiles. However, the ERK signal in our study is purely based on the large E change (Figure5), as a minor change of R+K in the reverse direction does not confirm the ERK-hypothesis perfectly. We interpret this partial contrasting outcome as a consequence of extrapolation, i.e., application of a thermophile-mesophile hypothesis below the mesophile scope seems to be appropriate only for the E signal. Furthermore, we are aware of the finding of Lobry and Necsulea[40] that thermophile-mesophile signatures of cold adaptation are more prominent[43] than mesophile-psychrophile signatures, at least in analyses of codon-usage trends in coding sequences of complete genomes.
Within the group of polar amino acids the picture is heterogeneous due to the large gain of serine and the net loss of asparagine. Comparing the total counts of gains and losses of polar with unpolar residues a gain of 102 polar residues remains. This indicates that despite this pooling unpolar to polar shift -supporting increased solvent interaction- cannot be resolved as a strong signal within our study.
By comparing a limited set of prokaryotic proteome sequences amino acid frequencies canonically discriminating psychropilic, mesophilic and thermophilic species were determined after modelling of candidate genes on existing three-dimensional structures[43]. In this way the analysis of particular orthologuous sequences were extended to part of the proteomes, and amino acid frequencies were assessed separately in structural categories ‘buried’ and ‘surface’. In general, a trend towards polar residues (in particular serine) resemble the finding of a (solely significant) preferential usage of serine and a net gain of polar residues for a cold-adapted species. Our study confirms these findings as a cold induced positive serine shift is one of the dominating signals in our study. Therefore an increased protein surface-solvent interaction in cold-adapted proteins can be hypothesized, at least on the basis of the net serine signal. However, for a confirmation, further sequence analyses are required to gain structural discrimination of solvent exposure. A diverse pattern is detectable for threonine, accounting to a small net gain similar to serine: the surplus of 82 exchanges over alanine significantly increases the polarity, possibly contributing to less hydrophobic interactions within and improved solvent interactions at the surface of proteins. Similarly, the net gain of threonine over asparagine contributes to the overall reduction of the latter and may support better solvent interaction through increased polarity. A surplus of 30 serine exchanges for threonine was detected. At preserved polarity this exchange is likely to increase the accessibility of the hydroxyl group. In contrast, methionine is preferred over threonine (46 M over T), decreasing the polarity. In summary, the multitude and diversity of exchanges within the polar category of amino acids together with the strong signal found in an earlier study[43] point to their importance for thermal adaptation of proteins.
Within the group of unpolar residues no prominent net gains or losses were detected (Figure5). It should be noted that a moderate gain in the usage of proline for P. brachycephalum is detectable, contrasting conclusions of existing views[11, 42] as this amino acid causes a large negative impact on structural flexibility. Several amino acid exchanges with conserved functions became apparent in the hydrophobic domain: net gains of 31 alanine in exchange for leucine reflect the preference of shorter residues, i.e., reducing entropic and enthalpic net contributions of side chains to structural stability of protein cores[8, 10, 11]. The finding of 51 valine over alanine and 65 isoleucine over valine contrast with the global pattern of reduced hydrophobic interaction. However, it is likely that an enlargement in unpolar residue length may enhance the protein–lipid interaction, e.g., within membranes, and must not be in contradiction with the flexibility hypothesis.
As only minor changes in sequence seem to be necessary to adapt kinetic properties, adaptive changes may be difficult to be identified unless proteins of closely related species from different thermal habitats are compared[5, 8]. Furthermore, studies based on structural attribution are restricted to proteins with available 3D structures, which is possibly biased towards soluble proteins with enzymatic function as these proteins are overrepresented in common databases. The approach of our study considers all types of expressed proteins within the transcriptome including membrane proteins, structural proteins, etc. without further assumptions.
We claim that our study is less distorted by a probable bias from a specific selection of species, in that we restrict our analyses on model orthologous sequences from two comparable transcriptomes. Furthermore, we focussed on aligned segments, excluding insertions, i.e., we hypothesize that adaptation can be analysed on a purely local basis at single amino acid positions.
In summary, the observed amino acid substitutions represent a global net pattern of molecular shifts in the local, i.e., position-specific amino acid composition for two species of the same family, inhabiting thermally very distinct sites.
Codon usage
A holistic approach for studying temperature-dependent evolutionary profiles on genetic up to proteomic levels was initially subjected to archaea, bacteria and only some eukaryotes on a large thermal scale[14, 44], uncovering trends of a higher GC usage for species with a higher OGT. As the GC content reflects the degree of hydrogen bonding in nucleic acid chain molecules, higher GC contents lead to an increased thermal stability in (deoxy-) ribonuclein acid chains[45, 46]. Thermal adaptation was analyzed in structural ribosomal RNAs for prokaryotic 16SrRNA finding rising GC contents in species with higher OGTs[47]. In vertebrates similar observations were detected in the 18SrRNA[48] with a trend for higher GC content in endotherm animals compared to poikilotherms showing a correlation to the environmental habitat temperature of the species under study. Similarly, the coding parts of genomes of cold blooded vertebrates and mammals are proposed to be separated by a “major compositional transition” in the GC content, resulting in nearly 100% GC3 levels in mammals[15]. In contrast, genomes of poikilotherm vertebrates are characterized throughout by lower GC3 levels.
Based on the identified segment pairs on the coding sequence level, we focused on a subset of transcript sequences of equal size in both libraries to uncover temperature-related patterns in the codon usage. This filter is supposed to allow for a more specific statistical comparison in that it is not distorted by effects of insertions. We furthermore filtered for highly similar and aligned sequences (80% minimum translated sequence identity) to reduce false pairing. Admittedly, total codon frequencies are on comparable levels for the two species (Figure7). However, by aggregating all pairwise comparisons in a single Fisher’s test we were able to detect a significant signal (p = 5*10-4) of increased AT3 for the Antarctic eelpout (Figure7). A set of biochemically meaningful hypotheses may be proposed to explain the latter findings. Firstly, under energetically constrained conditions such as in polar environments, proof reading of DNA might be too cost-inefficient and an increment of A/T in the last codon position encoding for synonymous amino acids might be advantageous without having any impact on protein function. A subsequently lowered GC content is therefore discussed for the cold adapted diatom Fragilariopsis cylindrus (Thomas Mock, personal communication). Secondly, translation itself can be repressed kinetically through increased GC in codons, over-stabilizing enzymatic transition states within the chain of reaction steps of translation in the cold. Thirdly, to avoid over-stabilized secondary structure elements in messenger RNA in the cold lowering GC content could be an evolutionary resort. For example, the molecular function of cold shock proteins (CSPs) has been described to prevent mRNA from frozen, i.e., over-stabilized hairpin conformations in the cold[49]. An adaptation to the cold by means of reducing GC in the mRNA would imply a reduced pressure to express CSPs permanently at high levels, with implications for metabolic cost.